CI/CD Integration with Ephemeral Deployment Model (#14)

* feat: Complete migration from Prefect to Temporal

BREAKING CHANGE: Replaces Prefect workflow orchestration with Temporal

## Major Changes
- Replace Prefect with Temporal for workflow orchestration
- Implement vertical worker architecture (rust, android)
- Replace Docker registry with MinIO for unified storage
- Refactor activities to be co-located with workflows
- Update all API endpoints for Temporal compatibility

## Infrastructure
- New: docker-compose.temporal.yaml (Temporal + MinIO + workers)
- New: workers/ directory with rust and android vertical workers
- New: backend/src/temporal/ (manager, discovery)
- New: backend/src/storage/ (S3-cached storage with MinIO)
- New: backend/toolbox/common/ (shared storage activities)
- Deleted: docker-compose.yaml (old Prefect setup)
- Deleted: backend/src/core/prefect_manager.py
- Deleted: backend/src/services/prefect_stats_monitor.py
- Deleted: Docker registry and insecure-registries requirement

## Workflows
- Migrated: security_assessment workflow to Temporal
- New: rust_test workflow (example/test workflow)
- Deleted: secret_detection_scan (Prefect-based, to be reimplemented)
- Activities now co-located with workflows for independent testing

## API Changes
- Updated: backend/src/api/workflows.py (Temporal submission)
- Updated: backend/src/api/runs.py (Temporal status/results)
- Updated: backend/src/main.py (727 lines, TemporalManager integration)
- Updated: All 16 MCP tools to use TemporalManager

## Testing
- All services healthy (Temporal, PostgreSQL, MinIO, workers, backend)
- All API endpoints functional
- End-to-end workflow test passed (72 findings from vulnerable_app)
- MinIO storage integration working (target upload/download, results)
- Worker activity discovery working (6 activities registered)
- Tarball extraction working
- SARIF report generation working

## Documentation
- ARCHITECTURE.md: Complete Temporal architecture documentation
- QUICKSTART_TEMPORAL.md: Getting started guide
- MIGRATION_DECISION.md: Why we chose Temporal over Prefect
- IMPLEMENTATION_STATUS.md: Migration progress tracking
- workers/README.md: Worker development guide

## Dependencies
- Added: temporalio>=1.6.0
- Added: boto3>=1.34.0 (MinIO S3 client)
- Removed: prefect>=3.4.18

* feat: Add Python fuzzing vertical with Atheris integration

This commit implements a complete Python fuzzing workflow using Atheris:

## Python Worker (workers/python/)
- Dockerfile with Python 3.11, Atheris, and build tools
- Generic worker.py for dynamic workflow discovery
- requirements.txt with temporalio, boto3, atheris dependencies
- Added to docker-compose.temporal.yaml with dedicated cache volume

## AtherisFuzzer Module (backend/toolbox/modules/fuzzer/)
- Reusable module extending BaseModule
- Auto-discovers fuzz targets (fuzz_*.py, *_fuzz.py, fuzz_target.py)
- Recursive search to find targets in nested directories
- Dynamically loads TestOneInput() function
- Configurable max_iterations and timeout
- Real-time stats callback support for live monitoring
- Returns findings as ModuleFinding objects
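
The target auto-discovery described in the list above can be sketched roughly as follows. This is an illustration only, not the module's actual code: the helper name `find_fuzz_targets` is hypothetical, and only the three filename patterns come from this commit message.

```python
from fnmatch import fnmatch
from pathlib import Path

# Filename patterns listed in the commit message.
FUZZ_TARGET_PATTERNS = ("fuzz_*.py", "*_fuzz.py", "fuzz_target.py")


def find_fuzz_targets(root: Path) -> list[Path]:
    """Recursively collect Python files under `root` whose name matches a pattern.

    Hypothetical helper: the real AtherisFuzzer may order or filter differently.
    """
    matches = [
        path
        for path in root.rglob("*.py")
        if path.is_file()
        and any(fnmatch(path.name, pattern) for pattern in FUZZ_TARGET_PATTERNS)
    ]
    return sorted(matches)
```

Using `rglob` keeps the search recursive, so a harness placed in a nested directory is found without extra configuration.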

## Atheris Fuzzing Workflow (backend/toolbox/workflows/atheris_fuzzing/)
- Temporal workflow for orchestrating fuzzing
- Downloads user code from MinIO
- Executes AtherisFuzzer module
- Uploads results to MinIO
- Cleans up cache after execution
- metadata.yaml with vertical: python for routing

## Test Project (test_projects/python_fuzz_waterfall/)
- Demonstrates stateful waterfall vulnerability
- main.py with check_secret() that leaks progress
- fuzz_target.py with Atheris TestOneInput() harness
- Complete README with usage instructions

## Backend Fixes
- Fixed parameter merging in REST API endpoints (workflows.py)
- Changed workflow parameter passing from positional args to kwargs (manager.py)
- Default parameters now properly merged with user parameters

## Testing
- Worker discovered AtherisFuzzingWorkflow
- Workflow executed end-to-end successfully
- Fuzz target auto-discovered in nested directories
- Atheris ran 100,000 iterations
- Results uploaded and cache cleaned

* chore: Complete Temporal migration with updated CLI/SDK/docs

This commit includes all remaining Temporal migration changes:

## CLI Updates (cli/)
- Updated workflow execution commands for Temporal
- Enhanced error handling and exceptions
- Updated dependencies in uv.lock

## SDK Updates (sdk/)
- Client methods updated for Temporal workflows
- Updated models for new workflow execution
- Updated dependencies in uv.lock

## Documentation Updates (docs/)
- Architecture documentation for Temporal
- Workflow concept documentation
- Resource management documentation (new)
- Debugging guide (new)
- Updated tutorials and how-to guides
- Troubleshooting updates

## README Updates
- Main README with Temporal instructions
- Backend README
- CLI README
- SDK README

## Other
- Updated IMPLEMENTATION_STATUS.md
- Removed old vulnerable_app.tar.gz

These changes complete the Temporal migration and ensure the
CLI/SDK work correctly with the new backend.

* fix: Use positional args instead of kwargs for Temporal workflows

The Temporal Python SDK's start_workflow() method doesn't accept
a 'kwargs' parameter. Workflows must receive parameters as positional
arguments via the 'args' parameter.

Changed to:
  args=workflow_args  # Positional arguments

This fixes the error:
  TypeError: Client.start_workflow() got an unexpected keyword argument 'kwargs'

Workflows now correctly receive parameters in order:
- security_assessment: [target_id, scanner_config, analyzer_config, reporter_config]
- atheris_fuzzing: [target_id, target_file, max_iterations, timeout_seconds]
- rust_test: [target_id, test_message]
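
A minimal sketch of how a parameter dict could be turned into the ordered list passed via `args`. The helper `build_workflow_args` and the choice to pass `None` for missing parameters are assumptions for illustration; only the parameter orders come from the list above.

```python
from typing import Any

# Parameter order per workflow, as listed in this commit message.
WORKFLOW_PARAM_ORDER = {
    "security_assessment": ["target_id", "scanner_config", "analyzer_config", "reporter_config"],
    "atheris_fuzzing": ["target_id", "target_file", "max_iterations", "timeout_seconds"],
    "rust_test": ["target_id", "test_message"],
}


def build_workflow_args(workflow: str, params: dict[str, Any]) -> list[Any]:
    """Return the values of `params` in the positional order the workflow expects.

    Hypothetical helper: missing parameters become None so positions stay aligned.
    """
    return [params.get(name) for name in WORKFLOW_PARAM_ORDER[workflow]]


workflow_args = build_workflow_args(
    "rust_test", {"test_message": "hello", "target_id": "test-123"}
)
# The list would then be handed to the Temporal client, for example:
#   await client.start_workflow("RustTestWorkflow", args=workflow_args,
#                               id="...", task_queue="rust-queue")
```

Keeping the order in one table avoids relying on dict insertion order at the call site.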

* fix: Filter metadata-only parameters from workflow arguments

SecurityAssessmentWorkflow was receiving 7 arguments instead of 2-5.
The issue was that target_path and volume_mode from default_parameters
were being passed to the workflow, when they should only be used by
the system for configuration.

Now filters out metadata-only parameters (target_path, volume_mode)
before passing arguments to workflow execution.
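
The filtering step might look like the sketch below. The function name is hypothetical; the two parameter names are the ones named in this commit message.

```python
from typing import Any

# Parameters used only by the system for configuration, per this commit message.
METADATA_ONLY_PARAMS = frozenset({"target_path", "volume_mode"})


def strip_metadata_params(params: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `params` without the metadata-only keys."""
    return {key: value for key, value in params.items() if key not in METADATA_ONLY_PARAMS}
```

Returning a copy leaves the merged defaults untouched for any code that still needs the configuration values.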

* refactor: Remove Prefect leftovers and volume mounting legacy

Complete cleanup of Prefect migration artifacts:

Backend:
- Delete registry.py and workflow_discovery.py (Prefect-specific files)
- Remove Docker validation from setup.py (no longer needed)
- Remove ResourceLimits and VolumeMount models
- Remove target_path and volume_mode from WorkflowSubmission
- Remove supported_volume_modes from API and discovery
- Clean up metadata.yaml files (remove volume/path fields)
- Simplify parameter filtering in manager.py

SDK:
- Remove volume_mode parameter from client methods
- Remove ResourceLimits and VolumeMount models
- Remove Prefect error patterns from docker_logs.py
- Clean up WorkflowSubmission and WorkflowMetadata models

CLI:
- Remove Volume Modes display from workflow info

All removed features are Prefect-specific or Docker volume mounting
artifacts. Temporal workflows use MinIO storage exclusively.

* feat: Add comprehensive test suite and benchmark infrastructure

- Add 68 unit tests for fuzzer, scanner, and analyzer modules
- Implement pytest-based test infrastructure with fixtures
- Add 6 performance benchmarks with category-specific thresholds
- Configure GitHub Actions for automated testing and benchmarking
- Add test and benchmark documentation

Test coverage:
- AtherisFuzzer: 8 tests
- CargoFuzzer: 14 tests
- FileScanner: 22 tests
- SecurityAnalyzer: 24 tests

All tests passing (68/68)
All benchmarks passing (6/6)

* fix: Resolve all ruff linting violations across codebase

Fixed 27 ruff violations in 12 files:
- Removed unused imports (Depends, Dict, Any, Optional, etc.)
- Fixed undefined workflow_info variable in workflows.py
- Removed dead code with undefined variables in atheris_fuzzer.py
- Changed f-string to regular string where no placeholders used

All files now pass ruff checks for CI/CD compliance.

* fix: Configure CI for unit tests only

- Renamed docker-compose.temporal.yaml → docker-compose.yml for CI compatibility
- Commented out integration-tests job (no integration tests yet)
- Updated test-summary to only depend on lint and unit-tests

CI will now run successfully with 68 unit tests. Integration tests can be added later.

* feat: Add CI/CD integration with ephemeral deployment model

Implements comprehensive CI/CD support for FuzzForge with on-demand worker management:

**Worker Management (v0.7.0)**
- Add WorkerManager for automatic worker lifecycle control
- Auto-start workers from stopped state when workflows execute
- Auto-stop workers after workflow completion
- Health checks and startup timeout handling (90s default)

**CI/CD Features**
- `--fail-on` flag: Fail builds based on SARIF severity levels (error/warning/note/info)
- `--export-sarif` flag: Export findings in SARIF 2.1.0 format
- `--auto-start`/`--auto-stop` flags: Control worker lifecycle
- Exit code propagation: Returns 1 on blocking findings, 0 on success
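
One way the `--fail-on` decision could be derived from a SARIF document is sketched below. The threshold semantics (fail when any result is at or above the chosen level) and the function name are assumptions, not the CLI's confirmed behavior; the sketch covers only the `note`, `warning`, and `error` levels.

```python
# Ordered from least to most severe. These are SARIF result levels; the exact
# set of values accepted by --fail-on is an assumption in this sketch.
SEVERITY_ORDER = ["note", "warning", "error"]


def exit_code_for(sarif: dict, fail_on: str) -> int:
    """Return 1 if any SARIF result is at or above `fail_on`, otherwise 0."""
    threshold = SEVERITY_ORDER.index(fail_on)
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            level = result.get("level", "warning")  # SARIF treats a missing level as "warning"
            if level in SEVERITY_ORDER and SEVERITY_ORDER.index(level) >= threshold:
                return 1
    return 0


report = {"version": "2.1.0", "runs": [{"results": [{"level": "warning"}, {"level": "note"}]}]}
blocking = exit_code_for(report, "error")  # no error-level results in this report
```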

**Exit Code Fix**
- Add `except typer.Exit: raise` handlers at 3 critical locations
- Move worker cleanup to finally block for guaranteed execution
- Exit codes now propagate correctly even when build fails
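
The pattern behind this fix can be sketched without Typer installed. `typer.Exit` is Click's `Exit`, a `RuntimeError` subclass, so a broad `except Exception` handler would otherwise swallow it. The `Exit` class below is a local stand-in and `run_with_cleanup` is a hypothetical name.

```python
class Exit(RuntimeError):
    """Stand-in for typer.Exit so the sketch runs without Typer installed."""

    def __init__(self, code: int = 0) -> None:
        super().__init__(code)
        self.exit_code = code


def run_with_cleanup(body, cleanup) -> int:
    """Run `body`, always run `cleanup`, and return the resulting exit code."""
    try:
        try:
            body()
        except Exit:
            raise  # let the requested exit code through untouched
        except Exception as exc:  # generic failures become exit code 1
            print(f"error: {exc}")
            raise Exit(1)
        return 0
    except Exit as exit_request:
        return exit_request.exit_code
    finally:
        cleanup()  # e.g. stopping workers; runs on every path
```

The `except Exit: raise` clause has to come before the generic handler, otherwise the generic handler wins and the original exit code is lost.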

**CI Scripts & Examples**
- ci-start.sh: Start FuzzForge services with health checks
- ci-stop.sh: Clean shutdown with volume preservation option
- GitHub Actions workflow example (security-scan.yml)
- GitLab CI pipeline example (.gitlab-ci.example.yml)
- docker-compose.ci.yml: CI-optimized compose file with profiles

**OSS-Fuzz Integration**
- New ossfuzz_campaign workflow for running OSS-Fuzz projects
- OSS-Fuzz worker with Docker-in-Docker support
- Configurable campaign duration and project selection

**Documentation**
- Comprehensive CI/CD integration guide (docs/how-to/cicd-integration.md)
- Updated architecture docs with worker lifecycle details
- Updated workspace isolation documentation
- CLI README with worker management examples

**SDK Enhancements**
- Add get_workflow_worker_info() endpoint
- Worker vertical metadata in workflow responses

**Testing**
- All workflows tested: security_assessment, atheris_fuzzing, secret_detection, cargo_fuzzing
- All monitoring commands tested: stats, crashes, status, finding
- Full CI pipeline simulation verified
- Exit codes verified for success/failure scenarios

Ephemeral CI/CD model: ~3-4GB RAM, ~60-90s startup, runs entirely in CI containers.

* fix: Resolve ruff linting violations in CI/CD code

- Remove unused variables (run_id, defaults, result)
- Remove unused imports
- Fix f-string without placeholders

All CI/CD integration files now pass ruff checks.
tduhamel42
2025-10-14 10:13:45 +02:00
committed by GitHub
parent 056cde35b2
commit ec812461d6
167 changed files with 26101 additions and 5703 deletions
+165
@@ -0,0 +1,165 @@
name: Benchmarks
on:
  # Run on schedule (nightly)
  schedule:
    - cron: '0 2 * * *'  # 2 AM UTC every day
  # Allow manual trigger
  workflow_dispatch:
    inputs:
      compare_with:
        description: 'Baseline commit to compare against (optional)'
        required: false
        default: ''
  # Run on PR when benchmarks are modified
  pull_request:
    paths:
      - 'backend/benchmarks/**'
      - 'backend/toolbox/modules/**'
      - '.github/workflows/benchmark.yml'
jobs:
  benchmark:
    name: Run Benchmarks
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Fetch all history for comparison
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install system dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y build-essential
      - name: Install Python dependencies
        working-directory: ./backend
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"
          pip install pytest pytest-asyncio pytest-benchmark pytest-benchmark[histogram]
      - name: Run benchmarks
        working-directory: ./backend
        run: |
          pytest benchmarks/ \
            -v \
            --benchmark-only \
            --benchmark-json=benchmark-results.json \
            --benchmark-histogram=benchmark-histogram
      - name: Store benchmark results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results-${{ github.run_number }}
          path: |
            backend/benchmark-results.json
            backend/benchmark-histogram.svg
      - name: Download baseline benchmarks
        if: github.event_name == 'pull_request'
        uses: dawidd6/action-download-artifact@v3
        continue-on-error: true
        with:
          workflow: benchmark.yml
          branch: ${{ github.base_ref }}
          name: benchmark-results-*
          path: ./baseline
          search_artifacts: true
      - name: Compare with baseline
        if: github.event_name == 'pull_request' && hashFiles('baseline/benchmark-results.json') != ''
        run: |
          python -c "
          import json
          import sys
          with open('backend/benchmark-results.json') as f:
              current = json.load(f)
          with open('baseline/benchmark-results.json') as f:
              baseline = json.load(f)
          print('\\n## Benchmark Comparison\\n')
          print('| Benchmark | Current | Baseline | Change |')
          print('|-----------|---------|----------|--------|')
          regressions = []
          for bench in current['benchmarks']:
              name = bench['name']
              current_time = bench['stats']['mean']
              # Find matching baseline
              baseline_bench = next((b for b in baseline['benchmarks'] if b['name'] == name), None)
              if baseline_bench:
                  baseline_time = baseline_bench['stats']['mean']
                  change = ((current_time - baseline_time) / baseline_time) * 100
                  print(f'| {name} | {current_time:.4f}s | {baseline_time:.4f}s | {change:+.2f}% |')
                  # Flag regressions > 10%
                  if change > 10:
                      regressions.append((name, change))
              else:
                  print(f'| {name} | {current_time:.4f}s | N/A | NEW |')
          if regressions:
              print('\\n⚠️ **Performance Regressions Detected:**')
              for name, change in regressions:
                  print(f'- {name}: +{change:.2f}%')
              sys.exit(1)
          else:
              print('\\n✅ No significant performance regressions detected')
          "
      - name: Comment PR with results
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const results = JSON.parse(fs.readFileSync('backend/benchmark-results.json', 'utf8'));
            // Single backslashes: a YAML literal block passes the script to JavaScript unescaped
            let body = '## Benchmark Results\n\n';
            body += '| Category | Benchmark | Mean Time | Std Dev |\n';
            body += '|----------|-----------|-----------|---------|\n';
            for (const bench of results.benchmarks) {
              const group = bench.group || 'ungrouped';
              const name = bench.name.split('::').pop();
              const mean = bench.stats.mean.toFixed(4);
              const stddev = bench.stats.stddev.toFixed(4);
              body += `| ${group} | ${name} | ${mean}s | ${stddev}s |\n`;
            }
            body += '\n📊 Full benchmark results available in artifacts.';
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: body
            });
  benchmark-summary:
    name: Benchmark Summary
    runs-on: ubuntu-latest
    needs: benchmark
    if: always()
    steps:
      - name: Check results
        run: |
          if [ "${{ needs.benchmark.result }}" != "success" ]; then
            echo "Benchmarks failed or detected regressions"
            exit 1
          fi
          echo "Benchmarks completed successfully!"
@@ -0,0 +1,152 @@
# FuzzForge CI/CD Example - Security Scanning
#
# This workflow demonstrates how to integrate FuzzForge into your CI/CD pipeline
# for automated security testing on pull requests and pushes.
#
# Features:
# - Runs entirely in GitHub Actions (no external infrastructure needed)
# - Auto-starts FuzzForge services on-demand
# - Fails builds on error-level SARIF findings
# - Uploads SARIF results to GitHub Security tab
# - Exports findings as artifacts
#
# Prerequisites:
# - Ubuntu runner with Docker support
# - At least 4GB RAM available
# - ~90 seconds startup time
name: Security Scan Example
on:
  pull_request:
    branches: [main, develop]
  push:
    branches: [main]
jobs:
  security-scan:
    name: Security Assessment
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Start FuzzForge
        run: |
          bash scripts/ci-start.sh
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install FuzzForge CLI
        run: |
          pip install ./cli
      - name: Initialize FuzzForge
        run: |
          ff init --api-url http://localhost:8000 --name "GitHub Actions Security Scan"
      - name: Run Security Assessment
        run: |
          ff workflow run security_assessment . \
            --wait \
            --fail-on error \
            --export-sarif results.sarif
      - name: Upload SARIF to GitHub Security
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif
      - name: Upload findings as artifact
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: security-findings
          path: results.sarif
          retention-days: 30
      - name: Stop FuzzForge
        if: always()
        run: |
          bash scripts/ci-stop.sh
  secret-scan:
    name: Secret Detection
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - name: Start FuzzForge
        run: bash scripts/ci-start.sh
      - name: Install CLI
        run: |
          pip install ./cli
      - name: Initialize & Scan
        run: |
          ff init --api-url http://localhost:8000 --name "Secret Detection"
          ff workflow run secret_detection . \
            --wait \
            --fail-on all \
            --export-sarif secrets.sarif
      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: secret-scan-results
          path: secrets.sarif
          retention-days: 30
      - name: Cleanup
        if: always()
        run: bash scripts/ci-stop.sh
  # Example: Nightly fuzzing campaign (long-running)
  nightly-fuzzing:
    name: Nightly Fuzzing
    runs-on: ubuntu-latest
    timeout-minutes: 120
    # Only run on schedule (this requires adding a `schedule:` trigger under `on:` above)
    if: github.event_name == 'schedule'
    steps:
      - uses: actions/checkout@v4
      - name: Start FuzzForge
        run: bash scripts/ci-start.sh
      - name: Install CLI
        run: pip install ./cli
      - name: Run Fuzzing Campaign
        run: |
          ff init --api-url http://localhost:8000
          ff workflow run atheris_fuzzing . \
            max_iterations=100000000 \
            timeout_seconds=7200 \
            --wait \
            --export-sarif fuzzing-results.sarif
        # Don't fail on fuzzing findings, just report
        continue-on-error: true
      - name: Upload fuzzing results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: fuzzing-results
          path: fuzzing-results.sarif
          retention-days: 90
      - name: Cleanup
        if: always()
        run: bash scripts/ci-stop.sh
+155
@@ -0,0 +1,155 @@
name: Tests
on:
  push:
    branches: [ main, master, develop, feature/** ]
  pull_request:
    branches: [ main, master, develop ]
jobs:
  lint:
    name: Lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install ruff mypy
      - name: Run ruff
        run: ruff check backend/src backend/toolbox backend/tests backend/benchmarks --output-format=github
      - name: Run mypy (continue on error)
        run: mypy backend/src backend/toolbox || true
        continue-on-error: true
  unit-tests:
    name: Unit Tests
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.11', '3.12']
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install system dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y build-essential
      - name: Install Python dependencies
        working-directory: ./backend
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"
          pip install pytest pytest-asyncio pytest-cov pytest-xdist
      - name: Run unit tests
        working-directory: ./backend
        run: |
          pytest tests/unit/ \
            -v \
            --cov=toolbox/modules \
            --cov=src \
            --cov-report=xml \
            --cov-report=term \
            --cov-report=html \
            -n auto
      - name: Upload coverage to Codecov
        if: matrix.python-version == '3.11'
        uses: codecov/codecov-action@v4
        with:
          file: ./backend/coverage.xml
          flags: unittests
          name: codecov-backend
      - name: Upload coverage HTML
        if: matrix.python-version == '3.11'
        uses: actions/upload-artifact@v4
        with:
          name: coverage-report
          path: ./backend/htmlcov/
  # integration-tests:
  #   name: Integration Tests
  #   runs-on: ubuntu-latest
  #   needs: unit-tests
  #
  #   services:
  #     postgres:
  #       image: postgres:15
  #       env:
  #         POSTGRES_USER: postgres
  #         POSTGRES_PASSWORD: postgres
  #         POSTGRES_DB: fuzzforge_test
  #       options: >-
  #         --health-cmd pg_isready
  #         --health-interval 10s
  #         --health-timeout 5s
  #         --health-retries 5
  #       ports:
  #         - 5432:5432
  #
  #   steps:
  #     - uses: actions/checkout@v4
  #
  #     - name: Set up Python
  #       uses: actions/setup-python@v5
  #       with:
  #         python-version: '3.11'
  #
  #     - name: Set up Docker Buildx
  #       uses: docker/setup-buildx-action@v3
  #
  #     - name: Install Python dependencies
  #       working-directory: ./backend
  #       run: |
  #         python -m pip install --upgrade pip
  #         pip install -e ".[dev]"
  #         pip install pytest pytest-asyncio
  #
  #     - name: Start services (Temporal, MinIO)
  #       run: |
  #         docker-compose -f docker-compose.yml up -d temporal minio
  #         sleep 30
  #
  #     - name: Run integration tests
  #       working-directory: ./backend
  #       run: |
  #         pytest tests/integration/ -v --tb=short
  #       env:
  #         DATABASE_URL: postgresql://postgres:postgres@localhost:5432/fuzzforge_test
  #         TEMPORAL_ADDRESS: localhost:7233
  #         MINIO_ENDPOINT: localhost:9000
  #
  #     - name: Shutdown services
  #       if: always()
  #       run: docker-compose down
  test-summary:
    name: Test Summary
    runs-on: ubuntu-latest
    needs: [lint, unit-tests]
    if: always()
    steps:
      - name: Check test results
        run: |
          if [ "${{ needs.unit-tests.result }}" != "success" ]; then
            echo "Unit tests failed"
            exit 1
          fi
          echo "All tests passed!"
+121
@@ -0,0 +1,121 @@
# FuzzForge CI/CD Example - GitLab CI
#
# This file demonstrates how to integrate FuzzForge into your GitLab CI/CD pipeline.
# Copy this to `.gitlab-ci.yml` in your project root to enable security scanning.
#
# Features:
# - Runs entirely in GitLab runners (no external infrastructure)
# - Auto-starts FuzzForge services on-demand
# - Fails pipelines on critical/high severity findings
# - Uploads SARIF reports to GitLab Security Dashboard
# - Exports findings as artifacts
#
# Prerequisites:
# - GitLab Runner with Docker support (docker:dind)
# - At least 4GB RAM available
# - ~90 seconds startup time
stages:
  - security
variables:
  FUZZFORGE_API_URL: "http://localhost:8000"
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: ""
# Base template for all FuzzForge jobs
.fuzzforge_template:
  image: docker:24
  services:
    - docker:24-dind
  before_script:
    # Install dependencies
    - apk add --no-cache bash curl python3 py3-pip git
    # Start FuzzForge
    - bash scripts/ci-start.sh
    # Install CLI
    - pip3 install ./cli --break-system-packages
    # Initialize project
    - ff init --api-url $FUZZFORGE_API_URL --name "GitLab CI Security Scan"
  after_script:
    # Cleanup
    - bash scripts/ci-stop.sh || true
# Security Assessment - Comprehensive code analysis
security:scan:
  extends: .fuzzforge_template
  stage: security
  timeout: 30 minutes
  script:
    - ff workflow run security_assessment . --wait --fail-on error --export-sarif results.sarif
  artifacts:
    when: always
    reports:
      sast: results.sarif
    paths:
      - results.sarif
    expire_in: 30 days
  only:
    - merge_requests
    - main
    - develop
# Secret Detection - Scan for exposed credentials
security:secrets:
  extends: .fuzzforge_template
  stage: security
  timeout: 15 minutes
  script:
    - ff workflow run secret_detection . --wait --fail-on all --export-sarif secrets.sarif
  artifacts:
    when: always
    paths:
      - secrets.sarif
    expire_in: 30 days
  only:
    - merge_requests
    - main
# Nightly Fuzzing - Long-running fuzzing campaign (scheduled only)
security:fuzzing:
  extends: .fuzzforge_template
  stage: security
  timeout: 2 hours
  script:
    - |
      ff workflow run atheris_fuzzing . \
        max_iterations=100000000 \
        timeout_seconds=7200 \
        --wait \
        --export-sarif fuzzing-results.sarif
  artifacts:
    when: always
    paths:
      - fuzzing-results.sarif
    expire_in: 90 days
  allow_failure: true  # Don't fail pipeline on fuzzing findings
  only:
    - schedules
# OSS-Fuzz Campaign (for supported projects)
security:ossfuzz:
  extends: .fuzzforge_template
  stage: security
  timeout: 1 hour
  script:
    - |
      ff workflow run ossfuzz_campaign . \
        project_name=your-project-name \
        campaign_duration_hours=0.5 \
        --wait \
        --export-sarif ossfuzz-results.sarif
  artifacts:
    when: always
    paths:
      - ossfuzz-results.sarif
    expire_in: 90 days
  allow_failure: true
  only:
    - schedules
  # Uncomment and set your project name
  # when: manual
+1068
File diff suppressed because it is too large
File diff suppressed because it is too large
+421
@@ -0,0 +1,421 @@
# FuzzForge Temporal Architecture - Quick Start Guide
This guide walks you through starting and testing the new Temporal-based architecture.
## Prerequisites
- Docker and Docker Compose installed
- At least 2GB free RAM (core services only, workers start on-demand)
- Ports available: 7233, 8233, 9000, 9001, 8000
## Step 1: Start Core Services
```bash
# From project root
cd /path/to/fuzzforge_ai
# Start core services (Temporal, MinIO, Backend)
docker-compose up -d
# Workers are pre-built but don't auto-start (saves ~6-7GB RAM)
# They'll start automatically when workflows need them
# Check status
docker-compose ps
```
**Expected output:**
```
NAME                            STATUS       PORTS
fuzzforge-minio                 healthy      0.0.0.0:9000-9001->9000-9001/tcp
fuzzforge-temporal              healthy      0.0.0.0:7233->7233/tcp
fuzzforge-temporal-postgresql   healthy      5432/tcp
fuzzforge-backend               healthy      0.0.0.0:8000->8000/tcp
fuzzforge-minio-setup           exited (0)
# Workers NOT running (will start on-demand)
```
**First startup takes ~30-60 seconds** for health checks to pass.
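If you script the startup, a small readiness poll avoids a fixed `sleep`. The sketch below is an assumption-laden example, not part of FuzzForge: the helper name `wait_for_http` is hypothetical, and it treats any HTTP response from the URL as "up" because the backend's health endpoint path is not stated in this guide.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 90.0, interval: float = 2.0) -> bool:
    """Poll `url` until it answers with any HTTP response or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # the server is up, even if this path returns an error status
        except (urllib.error.URLError, OSError):
            time.sleep(interval)  # not reachable yet; retry until the deadline
    return False
```

For example, `wait_for_http("http://localhost:8000")` would block until the backend port from the table above starts answering, or return `False` after the timeout.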
## Step 2: Verify Worker Discovery
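Workers start on demand (see Step 2.5), so logs exist only after a worker has been started. Once it has, check its logs to ensure workflows are discovered: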
```bash
docker logs fuzzforge-worker-rust
```
**Expected output:**
```
============================================================
FuzzForge Vertical Worker: rust
============================================================
Temporal Address: temporal:7233
Task Queue: rust-queue
Max Concurrent Activities: 5
============================================================
Discovering workflows for vertical: rust
Importing workflow module: toolbox.workflows.rust_test.workflow
✓ Discovered workflow: RustTestWorkflow from rust_test (vertical: rust)
Discovered 1 workflows for vertical 'rust'
Connecting to Temporal at temporal:7233...
✓ Connected to Temporal successfully
Creating worker on task queue: rust-queue
✓ Worker created successfully
============================================================
🚀 Worker started for vertical 'rust'
📦 Registered 1 workflows
⚙️ Registered 3 activities
📨 Listening on task queue: rust-queue
============================================================
Worker is ready to process tasks...
```
## Step 2.5: Worker Lifecycle Management (New in v0.7.0)
Workers start on-demand when workflows need them:
```bash
# Check worker status (should show Exited or not running)
docker ps -a --filter "name=fuzzforge-worker"
# Run a workflow - worker starts automatically
ff workflow run ossfuzz_campaign . project_name=zlib
# Worker is now running
docker ps --filter "name=fuzzforge-worker-ossfuzz"
```
**Configuration** (`.fuzzforge/config.yaml`):
```yaml
workers:
  auto_start_workers: true    # Default: auto-start
  auto_stop_workers: false    # Default: keep running
  worker_startup_timeout: 60  # Startup timeout in seconds
```
**CLI Control**:
```bash
# Disable auto-start
ff workflow run ossfuzz_campaign . --no-auto-start
# Enable auto-stop after completion
ff workflow run ossfuzz_campaign . --wait --auto-stop
```
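The auto-start and auto-stop behavior can be pictured as a context manager around workflow execution. This is a sketch under assumptions, not the project's WorkerManager: the function name `worker_running` and the injectable `run` callable are hypothetical, while the `fuzzforge-worker-<vertical>` container naming and the defaults (start on, stop off) follow this guide.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence


@contextmanager
def worker_running(
    vertical: str,
    auto_start: bool = True,
    auto_stop: bool = False,
    run: Callable[[Sequence[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
) -> Iterator[str]:
    """Start the worker container before the block and optionally stop it afterwards."""
    container = f"fuzzforge-worker-{vertical}"  # naming as seen in this guide
    if auto_start:
        run(["docker", "start", container])
    try:
        yield container
    finally:
        if auto_stop:
            run(["docker", "stop", container])  # runs even if the block raises
```

Passing `run` as a parameter keeps the sketch testable without Docker; in real use the default shells out to `docker start` and `docker stop`.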
## Step 3: Access Web UIs
### Temporal Web UI
- URL: http://localhost:8233
- View workflows, executions, and task queues
### MinIO Console
- URL: http://localhost:9001
- Login: `fuzzforge` / `fuzzforge123`
- View uploaded targets and results
## Step 4: Test Workflow Execution
### Option A: Using Temporal CLI (tctl)
```bash
# Install tctl (if not already installed)
brew install tctl # macOS
# or download from https://github.com/temporalio/tctl/releases
# Execute test workflow (--address is a global flag, so it precedes the subcommand)
tctl --address localhost:7233 workflow run \
  --taskqueue rust-queue \
  --workflow_type RustTestWorkflow \
  --input '{"target_id": "test-123", "test_message": "Hello Temporal!"}'
```
### Option B: Using Python Client
Create `test_workflow.py`:
```python
import asyncio
from temporalio.client import Client


async def main():
    # Connect to Temporal
    client = await Client.connect("localhost:7233")
    # Start workflow
    result = await client.execute_workflow(
        "RustTestWorkflow",
        {"target_id": "test-123", "test_message": "Hello Temporal!"},
        id="test-workflow-1",
        task_queue="rust-queue"
    )
    print("Workflow result:", result)


if __name__ == "__main__":
    asyncio.run(main())
```
```bash
python test_workflow.py
```
### Option C: Upload Target and Run (Full Flow)
```python
# upload_and_run.py
import asyncio
import boto3
from pathlib import Path
from temporalio.client import Client


async def main():
    # 1. Upload target to MinIO
    s3 = boto3.client(
        's3',
        endpoint_url='http://localhost:9000',
        aws_access_key_id='fuzzforge',
        aws_secret_access_key='fuzzforge123',
        region_name='us-east-1'
    )
    # Create a test file
    test_file = Path('/tmp/test_target.txt')
    test_file.write_text('This is a test target file')
    # Upload to MinIO
    target_id = 'my-test-target-001'
    s3.upload_file(
        str(test_file),
        'targets',
        f'{target_id}/target'
    )
    print(f"✓ Uploaded target: {target_id}")
    # 2. Run workflow
    client = await Client.connect("localhost:7233")
    result = await client.execute_workflow(
        "RustTestWorkflow",
        {"target_id": target_id, "test_message": "Full flow test!"},
        id=f"workflow-{target_id}",
        task_queue="rust-queue"
    )
    print("✓ Workflow completed!")
    print("Results:", result)


if __name__ == "__main__":
    asyncio.run(main())
```
```bash
# Install dependencies
pip install temporalio boto3
# Run test
python upload_and_run.py
```
## Step 5: Monitor Execution
### View in Temporal UI
1. Open http://localhost:8233
2. Click on "Workflows"
3. Find your workflow by ID
4. Click to see:
- Execution history
- Activity results
- Error stack traces (if any)
### View Logs
```bash
# Worker logs (shows activity execution)
docker logs -f fuzzforge-worker-rust
# Temporal server logs
docker logs -f fuzzforge-temporal
```
### Check MinIO Storage
1. Open http://localhost:9001
2. Login: `fuzzforge` / `fuzzforge123`
3. Browse buckets:
- `targets/` - Uploaded target files
- `results/` - Workflow results (if uploaded)
- `cache/` - Worker cache (temporary)
## Troubleshooting
### Services Not Starting
```bash
# Check logs for all services
docker-compose logs
# Check specific service
docker-compose logs temporal
docker-compose logs minio
docker-compose logs worker-rust
```
### Worker Not Discovering Workflows
**Issue**: Worker logs show "No workflows found for vertical: rust"
**Solution**:
1. Check toolbox mount: `docker exec fuzzforge-worker-rust ls /app/toolbox/workflows`
2. Verify metadata.yaml exists and has `vertical: rust`
3. Check workflow.py has `@workflow.defn` decorator
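To check the second point outside the container, the discovery rule can be approximated in a few lines. This is a simplified sketch, not the worker's discovery code: `workflows_for_vertical` is a hypothetical name, and a regex stands in for a YAML parser to stay dependency-free, so it only recognizes a plain top-level `vertical: name` line without a trailing comment.

```python
import re
from pathlib import Path

# Matches a top-level line such as `vertical: rust` (optionally quoted).
_VERTICAL_RE = re.compile(r"^vertical:\s*['\"]?([\w-]+)['\"]?\s*$", re.MULTILINE)


def workflows_for_vertical(workflows_dir: Path, vertical: str) -> list[str]:
    """List workflow directories whose metadata.yaml declares the given vertical."""
    found = []
    for entry in sorted(workflows_dir.iterdir()):
        meta = entry / "metadata.yaml"
        if entry.is_dir() and meta.exists():
            match = _VERTICAL_RE.search(meta.read_text())
            if match and match.group(1) == vertical:
                found.append(entry.name)
    return found
```

An empty result for `rust` points at a missing or mislabeled `metadata.yaml` rather than at the Temporal connection.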
### Cannot Connect to Temporal
**Issue**: `Failed to connect to Temporal`
**Solution**:
```bash
# Wait for Temporal to be healthy
docker-compose ps
# Check Temporal health manually
curl http://localhost:8233
# Restart Temporal if needed
docker-compose restart temporal
```
### MinIO Connection Failed
**Issue**: `Failed to download target`
**Solution**:
```bash
# Check MinIO is running
docker ps | grep minio
# Check buckets exist
docker exec fuzzforge-minio mc ls fuzzforge/
# Verify target was uploaded
docker exec fuzzforge-minio mc ls fuzzforge/targets/
```
### Workflow Hangs
**Issue**: Workflow starts but never completes
**Check**:
1. Worker logs for errors: `docker logs fuzzforge-worker-rust`
2. Activity timeouts in workflow code
3. Target file actually exists in MinIO
## Scaling
### Add More Workers
```bash
# Scale rust workers horizontally
docker-compose up -d --scale worker-rust=3
# Verify all workers are running
docker ps | grep worker-rust
```
### Increase Concurrent Activities
Edit `docker-compose.yml`:
```yaml
worker-rust:
  environment:
    MAX_CONCURRENT_ACTIVITIES: 10  # Increase from 5
```
```bash
# Apply changes
docker-compose up -d worker-rust
```
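On the worker side, a setting like this is typically read from the environment at startup. The sketch below is a guess at that shape, not the actual `worker.py`: the function name is hypothetical, and only the variable name and the default of 5 come from this guide.

```python
import os


def max_concurrent_activities(env=os.environ, default: int = 5) -> int:
    """Read MAX_CONCURRENT_ACTIVITIES, falling back to `default` when unset or invalid."""
    raw = env.get("MAX_CONCURRENT_ACTIVITIES", "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


# The value could then be passed to the Temporal worker, e.g.
#   Worker(client, task_queue="rust-queue", ..., max_concurrent_activities=limit)
limit = max_concurrent_activities({"MAX_CONCURRENT_ACTIVITIES": "10"})
```

Falling back to the default on a non-numeric or non-positive value keeps a typo in the compose file from stopping the worker.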
## Cleanup
```bash
# Stop all services
docker-compose down
# Remove volumes (WARNING: deletes all data)
docker-compose down -v
# Remove everything including images
docker-compose down -v --rmi all
```
## Next Steps
1. **Add More Workflows**: Create workflows in `backend/toolbox/workflows/`
2. **Add More Verticals**: Create new worker types (android, web, etc.) - see `workers/README.md`
3. **Integrate with Backend**: Update FastAPI backend to use Temporal client
4. **Update CLI**: Modify `ff` CLI to work with Temporal workflows
## Useful Commands
```bash
# View all logs
docker-compose -f docker-compose.temporal.yaml logs -f
# View specific service logs
docker-compose -f docker-compose.temporal.yaml logs -f worker-rust
# Restart a service
docker-compose -f docker-compose.temporal.yaml restart worker-rust
# Check service status
docker-compose -f docker-compose.temporal.yaml ps
# Execute command in worker
docker exec -it fuzzforge-worker-rust bash
# View worker Python environment
docker exec fuzzforge-worker-rust pip list
# Check workflow discovery manually
docker exec fuzzforge-worker-rust python -c "
from pathlib import Path
import yaml
for w in Path('/app/toolbox/workflows').iterdir():
    if w.is_dir():
        meta = w / 'metadata.yaml'
        if meta.exists():
            print(f'{w.name}: {yaml.safe_load(meta.read_text()).get(\"vertical\")}')"
```
## Architecture Overview
```
┌─────────────┐     ┌──────────────┐     ┌──────────────┐
│  Temporal   │────▶│  Task Queue  │────▶│ Worker-Rust  │
│   Server    │     │  rust-queue  │     │ (Long-lived) │
└─────────────┘     └──────────────┘     └──────┬───────┘
       │                                        │
       │                                        │
       ▼                                        ▼
┌─────────────┐                          ┌──────────────┐
│  Postgres   │                          │    MinIO     │
│   (State)   │                          │  (Storage)   │
└─────────────┘                          └──────────────┘
                                         ┌──────┴──────┐
                                         │             │
                                    ┌────▼────┐  ┌─────▼────┐
                                    │ Targets │  │ Results  │
                                    └─────────┘  └──────────┘
```
## Support
- **Documentation**: See `ARCHITECTURE.md` for detailed design
- **Worker Guide**: See `workers/README.md` for adding verticals
- **Issues**: Open GitHub issue with logs and steps to reproduce
+20 -13
@@ -131,31 +131,38 @@ uv tool install --python python3.12 .
## ⚡ Quickstart
Run your first workflow :
Run your first workflow with **Temporal orchestration** and **automatic file upload**:
```bash
# 1. Clone the repo
git clone https://github.com/fuzzinglabs/fuzzforge_ai.git
cd fuzzforge_ai
# 2. Build & run with Docker
# Set registry host for your OS (local registry is mandatory)
# macOS/Windows (Docker Desktop):
export REGISTRY_HOST=host.docker.internal
# Linux (default):
# export REGISTRY_HOST=localhost
docker compose up -d
# 2. Start FuzzForge with Temporal
docker-compose -f docker-compose.temporal.yaml up -d
```
> The first launch can take 5-10 minutes due to Docker image building - a good time for a coffee break
> The first launch can take 2-3 minutes for services to initialize
```bash
# 3. Run your first workflow
cd test_projects/vulnerable_app/ # Go into the test directory
fuzzforge init # Init a fuzzforge project
ff workflow run security_assessment . # Start a workflow (you can also use ff command)
# 3. Run your first workflow (files are automatically uploaded)
cd test_projects/vulnerable_app/
fuzzforge init # Initialize FuzzForge project
ff workflow run security_assessment . # Start workflow - CLI uploads files automatically!
# The CLI will:
# - Detect the local directory
# - Create a compressed tarball
# - Upload to backend (via MinIO)
# - Start the workflow on vertical worker
```
**What's running:**
- **Temporal**: Workflow orchestration (UI at http://localhost:8233)
- **MinIO**: File storage for targets (Console at http://localhost:9001)
- **Vertical Workers**: Pre-built workers with security toolchains
- **Backend API**: FuzzForge REST API (http://localhost:8000)
### Manual Workflow Setup
![Manual Workflow Demo](docs/static/videos/manual_workflow.gif)
+3 -3
@@ -78,7 +78,7 @@ def create_a2a_app():
print("\033[0m") # Reset color
# Create A2A app
print(f"🚀 Starting FuzzForge A2A Server")
print("🚀 Starting FuzzForge A2A Server")
print(f" Model: {fuzzforge.model}")
if fuzzforge.cognee_url:
print(f" Memory: Cognee at {fuzzforge.cognee_url}")
@@ -86,7 +86,7 @@ def create_a2a_app():
app = create_custom_a2a_app(fuzzforge.adk_agent, port=port, executor=fuzzforge.executor)
print(f"\n✅ FuzzForge A2A Server ready!")
print("\n✅ FuzzForge A2A Server ready!")
print(f" Agent card: http://localhost:{port}/.well-known/agent-card.json")
print(f" A2A endpoint: http://localhost:{port}/")
print(f"\n📡 Other agents can register FuzzForge at: http://localhost:{port}")
@@ -101,7 +101,7 @@ def main():
app = create_a2a_app()
port = int(os.getenv('FUZZFORGE_PORT', 10100))
print(f"\n🎯 Starting server with uvicorn...")
print("\n🎯 Starting server with uvicorn...")
uvicorn.run(app, host="127.0.0.1", port=port)
-1
@@ -18,7 +18,6 @@ from typing import Optional, Union
from starlette.applications import Starlette
from starlette.responses import Response, FileResponse
from starlette.routing import Route
from google.adk.a2a.executor.a2a_agent_executor import A2aAgentExecutor
from google.adk.a2a.utils.agent_card_builder import AgentCardBuilder
+1 -1
@@ -15,7 +15,7 @@ Defines what FuzzForge can do and how others can discover it
from dataclasses import dataclass
from typing import List, Optional, Dict, Any
from typing import List, Dict, Any
@dataclass
class AgentSkill:
+3 -7
@@ -12,7 +12,6 @@
import asyncio
import base64
import time
import uuid
import json
@@ -392,7 +391,7 @@ class FuzzForgeExecutor:
user_email = f"project_{config.get_project_context()['project_id']}@fuzzforge.example"
user = await get_user(user_email)
cognee.set_user(user)
except Exception as e:
except Exception:
pass # User context not critical
# Use cognee search directly for maximum flexibility
@@ -583,7 +582,6 @@ class FuzzForgeExecutor:
pattern: Glob pattern (e.g. '*.py', '**/*.js', '')
"""
try:
from pathlib import Path
# Get project root from config
config = ProjectConfigManager()
@@ -648,7 +646,6 @@ class FuzzForgeExecutor:
max_lines: Maximum lines to read (0 for all, default 200 for large files)
"""
try:
from pathlib import Path
# Get project root from config
config = ProjectConfigManager()
@@ -711,7 +708,6 @@ class FuzzForgeExecutor:
"""
try:
import re
from pathlib import Path
# Get project root from config
config = ProjectConfigManager()
@@ -757,7 +753,7 @@ class FuzzForgeExecutor:
result = f"Found '{search_pattern}' in {len(matches)} locations (searched {files_searched} files):\n"
result += "\n".join(matches[:50])
if len(matches) >= 50:
result += f"\n... (showing first 50 matches)"
result += "\n... (showing first 50 matches)"
return result
else:
return f"No matches found for '{search_pattern}' in {files_searched} files matching '{file_pattern}'"
@@ -1088,7 +1084,7 @@ class FuzzForgeExecutor:
def _build_instruction(self) -> str:
"""Build the agent's instruction prompt"""
instruction = f"""You are FuzzForge, an intelligent A2A orchestrator with dual memory systems.
instruction = """You are FuzzForge, an intelligent A2A orchestrator with dual memory systems.
## Your Core Responsibilities:
+5 -12
@@ -26,7 +26,6 @@ import random
from datetime import datetime
from contextlib import contextmanager
from pathlib import Path
from typing import Any
from dotenv import load_dotenv
@@ -90,18 +89,12 @@ except ImportError:
from rich.console import Console
from rich.table import Table
from rich.panel import Panel
from rich.prompt import Prompt
from rich import box
from google.adk.events.event import Event
from google.adk.events.event_actions import EventActions
from google.genai import types as gen_types
from .agent import FuzzForgeAgent
from .agent_card import get_fuzzforge_agent_card
from .config_manager import ConfigManager
from .config_bridge import ProjectConfigManager
from .remote_agent import RemoteAgentConnection
console = Console()
@@ -243,7 +236,7 @@ class FuzzForgeCLI:
)
)
if self.agent.executor.agentops_trace:
console.print(f"Tracking: [medium_purple1]AgentOps active[/medium_purple1]")
console.print("Tracking: [medium_purple1]AgentOps active[/medium_purple1]")
# Show skills
console.print("\nSkills:")
@@ -320,7 +313,7 @@ class FuzzForgeCLI:
url=args.strip(),
description=description
)
console.print(f" [dim]Saved to config for auto-registration[/dim]")
console.print(" [dim]Saved to config for auto-registration[/dim]")
else:
console.print(f"[red]Failed: {result['error']}[/red]")
@@ -346,9 +339,9 @@ class FuzzForgeCLI:
# Remove from config
if self.config_manager.remove_registered_agent(name=agent_to_remove['name'], url=agent_to_remove['url']):
console.print(f"✅ Unregistered: [bold]{agent_to_remove['name']}[/bold]")
console.print(f" [dim]Removed from config (won't auto-register next time)[/dim]")
console.print(" [dim]Removed from config (won't auto-register next time)[/dim]")
else:
console.print(f"[yellow]Agent unregistered from session but not found in config[/yellow]")
console.print("[yellow]Agent unregistered from session but not found in config[/yellow]")
async def cmd_list(self, args: str = "") -> None:
"""List registered agents"""
@@ -699,7 +692,7 @@ class FuzzForgeCLI:
)
console.print(table)
console.print(f"\n[dim]Use /artifacts <id> to view artifact content[/dim]")
console.print("\n[dim]Use /artifacts <id> to view artifact content[/dim]")
async def cmd_tasks(self, args: str = "") -> None:
"""List tasks or show details for a specific task."""
+1 -3
@@ -16,9 +16,7 @@ Can be reused by external agents and other components
import os
import asyncio
import json
from typing import Dict, List, Any, Optional, Union
from typing import Dict, Any, Optional
from pathlib import Path
+1 -3
@@ -15,11 +15,9 @@ Provides integrated Cognee functionality for codebase analysis and knowledge gra
import os
import asyncio
import logging
from pathlib import Path
from typing import Dict, List, Any, Optional
from datetime import datetime
from typing import Dict, List, Any
logger = logging.getLogger(__name__)
+1 -1
@@ -13,7 +13,7 @@
try:
from fuzzforge_cli.config import ProjectConfigManager as _ProjectConfigManager
except ImportError as exc: # pragma: no cover - used when CLI not available
except ImportError: # pragma: no cover - used when CLI not available
class _ProjectConfigManager: # type: ignore[no-redef]
"""Fallback implementation that raises a helpful error."""
+1 -4
@@ -16,15 +16,12 @@ Separate from Cognee which will be used for RAG/codebase analysis
import os
import json
from typing import Dict, List, Any, Optional
from datetime import datetime
from typing import Dict, Any
import logging
# ADK Memory imports
from google.adk.memory import InMemoryMemoryService, BaseMemoryService
from google.adk.memory.base_memory_service import SearchMemoryResponse
from google.adk.memory.memory_entry import MemoryEntry
# Optional VertexAI Memory Bank
try:
+5 -9
@@ -17,25 +17,21 @@ RUN apt-get update && apt-get install -y \
# Docker client configuration removed - localhost:5001 doesn't require insecure registry config
# Install uv for faster package management
RUN pip install uv
# Copy project files
COPY pyproject.toml ./
COPY uv.lock ./
# Install dependencies
RUN uv sync --no-dev
# Install dependencies with pip
RUN pip install --no-cache-dir -e .
# Copy source code
COPY . .
# Expose port
EXPOSE 8000
# Expose ports (API on 8000, MCP on 8010)
EXPOSE 8000 8010
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
# Start the application
CMD ["uv", "run", "uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
+101 -27
@@ -1,6 +1,6 @@
# FuzzForge Backend
A stateless API server for security testing workflow orchestration using Prefect. This system dynamically discovers workflows, executes them in isolated Docker containers with volume mounting, and returns findings in SARIF format.
A stateless API server for security testing workflow orchestration using Temporal. This system dynamically discovers workflows, executes them in isolated worker environments, and returns findings in SARIF format.
## Architecture Overview
@@ -8,17 +8,17 @@ A stateless API server for security testing workflow orchestration using Prefect
1. **Workflow Discovery System**: Automatically discovers workflows at startup
2. **Module System**: Reusable components (scanner, analyzer, reporter) with a common interface
3. **Prefect Integration**: Handles container orchestration, workflow execution, and monitoring
4. **Volume Mounting**: Secure file access with configurable permissions (ro/rw)
3. **Temporal Integration**: Handles workflow orchestration, execution, and monitoring with vertical workers
4. **File Upload & Storage**: HTTP multipart upload to MinIO for target files
5. **SARIF Output**: Standardized security findings format
### Key Features
- **Stateless**: No persistent data, fully scalable
- **Generic**: No hardcoded workflows, automatic discovery
- **Isolated**: Each workflow runs in its own Docker container
- **Isolated**: Each workflow runs in specialized vertical workers
- **Extensible**: Easy to add new workflows and modules
- **Secure**: Read-only volume mounts by default, path validation
- **Secure**: File upload with MinIO storage, automatic cleanup via lifecycle policies
- **Observable**: Comprehensive logging and status tracking
## Quick Start
@@ -32,19 +32,17 @@ A stateless API server for security testing workflow orchestration using Prefect
From the project root, start all services:
```bash
docker-compose up -d
docker-compose -f docker-compose.temporal.yaml up -d
```
This will start:
- Prefect server (API at http://localhost:4200/api)
- PostgreSQL database
- Redis cache
- Docker registry (port 5001)
- Prefect worker (for running workflows)
- Temporal server (Web UI at http://localhost:8233, gRPC at :7233)
- MinIO (S3 storage at http://localhost:9000, Console at http://localhost:9001)
- PostgreSQL database (for Temporal state)
- Vertical workers (worker-rust, worker-android, worker-web, etc.)
- FuzzForge backend API (port 8000)
- FuzzForge MCP server (port 8010)
**Note**: The Prefect UI at http://localhost:4200 is not currently accessible from the host due to the API being configured for inter-container communication. Use the REST API or MCP interface instead.
**Note**: MinIO console login: `fuzzforge` / `fuzzforge123`
## API Endpoints
@@ -54,7 +52,8 @@ This will start:
- `GET /workflows/{name}/metadata` - Get workflow metadata and parameters
- `GET /workflows/{name}/parameters` - Get workflow parameter schema
- `GET /workflows/metadata/schema` - Get metadata.yaml schema
- `POST /workflows/{name}/submit` - Submit a workflow for execution
- `POST /workflows/{name}/submit` - Submit a workflow for execution (path-based, legacy)
- `POST /workflows/{name}/upload-and-submit` - **Upload local files and submit workflow** (recommended)
### Runs
@@ -68,12 +67,13 @@ Each workflow must have:
```
toolbox/workflows/{workflow_name}/
workflow.py # Prefect flow definition
metadata.yaml # Mandatory metadata (parameters, version, etc.)
Dockerfile # Optional custom container definition
requirements.txt # Optional Python dependencies
workflow.py # Temporal workflow definition
metadata.yaml # Mandatory metadata (parameters, version, vertical, etc.)
requirements.txt # Optional Python dependencies (installed in vertical worker)
```
**Note**: With Temporal architecture, workflows run in pre-built vertical workers (e.g., `worker-rust`, `worker-android`), not individual Docker containers. The workflow code is mounted as a volume and discovered at runtime.
### Example metadata.yaml
```yaml
@@ -82,6 +82,7 @@ version: "1.0.0"
description: "Comprehensive security analysis workflow"
author: "FuzzForge Team"
category: "comprehensive"
vertical: "rust" # Routes to worker-rust
tags:
- "security"
- "analysis"
@@ -169,6 +170,57 @@ curl -X POST "http://localhost:8000/workflows/security_assessment/submit" \
Resource precedence: User limits > Workflow requirements > System defaults
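
The precedence rule can be read as a layered merge in which later layers override earlier ones. A minimal sketch, assuming resources are plain dictionaries; the key names and default values here are hypothetical and not taken from the backend:

```python
# Hypothetical defaults for illustration only; the real values live in the backend config.
SYSTEM_DEFAULTS = {"cpu_limit": "1", "memory_limit": "512Mi", "timeout": 1800}


def resolve_resources(user_limits=None, workflow_requirements=None, system_defaults=None):
    """Merge resource settings so that user limits win over workflow requirements,
    which in turn win over system defaults."""
    resolved = dict(system_defaults or SYSTEM_DEFAULTS)
    # Apply the lower-priority layer first so the higher-priority one overwrites it.
    for layer in (workflow_requirements, user_limits):
        for key, value in (layer or {}).items():
            if value is not None:
                resolved[key] = value
    return resolved
```

A key that only the workflow sets is kept, a key set by both is taken from the user, and anything unset falls back to the defaults.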
## File Upload and Target Access
### Upload Endpoint
The backend provides an upload endpoint for submitting workflows with local files:
```
POST /workflows/{workflow_name}/upload-and-submit
Content-Type: multipart/form-data
Parameters:
file: File upload (supports .tar.gz for directories)
parameters: JSON string of workflow parameters (optional)
volume_mode: "ro" or "rw" (default: "ro")
timeout: Execution timeout in seconds (optional)
```
Example using curl:
```bash
# Upload a directory (create tarball first)
tar -czf project.tar.gz /path/to/project
curl -X POST "http://localhost:8000/workflows/security_assessment/upload-and-submit" \
-F "file=@project.tar.gz" \
-F "parameters={\"check_secrets\":true}" \
-F "volume_mode=ro"
# Upload a single file
curl -X POST "http://localhost:8000/workflows/security_assessment/upload-and-submit" \
-F "file=@binary.elf" \
-F "volume_mode=ro"
```
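
For scripted uploads, the tarball step can also be done from Python. A minimal sketch using only the standard library; the function name `make_target_tarball` is illustrative and not part of FuzzForge, and storing paths relative to the project root is an assumption about what the backend expects:

```python
import tarfile
from pathlib import Path


def make_target_tarball(source_dir: Path, output: Path) -> Path:
    """Pack source_dir into a gzip-compressed tarball with paths relative to its root."""
    with tarfile.open(output, "w:gz") as tar:
        # arcname="." keeps absolute host paths out of the archive.
        tar.add(source_dir, arcname=".")
    return output
```

Write the tarball outside `source_dir`, otherwise the archive may try to include itself. The resulting file can be passed to the `file` form field shown in the curl examples.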
### Storage Flow
1. **CLI/API uploads file** via HTTP multipart
2. **Backend receives file** and streams to temporary location (max 10GB)
3. **Backend uploads to MinIO** with generated `target_id`
4. **Workflow is submitted** to Temporal with `target_id`
5. **Worker downloads target** from MinIO to local cache
6. **Workflow processes target** from cache
7. **MinIO lifecycle policy** deletes files after 7 days
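
The flow above can be sketched with an in-memory stand-in for MinIO. This is an illustration of the upload, download, and cache steps only; the class names and the `targets/<target_id>` key layout are assumptions, and no real S3 or Temporal calls are made:

```python
import uuid


class InMemoryObjectStore:
    """Stand-in for MinIO: maps object keys to bytes."""

    def __init__(self):
        self.objects = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]


def upload_target(store: InMemoryObjectStore, payload: bytes) -> str:
    """Steps 2-3: store the uploaded file under a generated target_id."""
    target_id = str(uuid.uuid4())
    store.put(f"targets/{target_id}", payload)
    return target_id


class Worker:
    """Steps 5-6: download a target once, then serve it from the local cache."""

    def __init__(self, store: InMemoryObjectStore):
        self.store = store
        self.cache = {}
        self.downloads = 0

    def fetch_target(self, target_id: str) -> bytes:
        if target_id not in self.cache:
            self.cache[target_id] = self.store.get(f"targets/{target_id}")
            self.downloads += 1
        return self.cache[target_id]
```

Only the `target_id` travels with the workflow submission, so the worker needs access to the object store but not to the host filesystem that produced the file.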
### Advantages
- **No host filesystem access required** - workers can run anywhere
- **Automatic cleanup** - lifecycle policies prevent disk exhaustion
- **Caching** - repeated workflows reuse cached targets
- **Multi-host ready** - targets accessible from any worker
- **Secure** - isolated storage, no arbitrary host path access
## Module Development
Modules implement the `BaseModule` interface:
@@ -198,7 +250,21 @@ class MyModule(BaseModule):
## Submitting a Workflow
### With File Upload (Recommended)
```bash
# Create a tarball of the project, then upload it
tar -czf project.tar.gz /home/user/project
curl -X POST "http://localhost:8000/workflows/security_assessment/upload-and-submit" \
-F "file=@project.tar.gz" \
-F "parameters={\"scanner_config\":{\"patterns\":[\"*.py\"]},\"analyzer_config\":{\"check_secrets\":true}}" \
-F "volume_mode=ro"
```
### Legacy Path-Based Submission
```bash
# Only works if backend and target are on same machine
curl -X POST "http://localhost:8000/workflows/security_assessment/submit" \
-H "Content-Type: application/json" \
-d '{
@@ -235,23 +301,31 @@ Returns SARIF-formatted findings:
## Security Considerations
1. **Volume Mounting**: Only allowed directories can be mounted
2. **Read-Only Default**: Volumes mounted as read-only unless explicitly set
3. **Container Isolation**: Each workflow runs in an isolated container
4. **Resource Limits**: Can set CPU/memory limits via Prefect
5. **Network Isolation**: Containers use bridge networking
1. **File Upload Security**: Files uploaded to MinIO with isolated storage
2. **Read-Only Default**: Target files accessed as read-only unless explicitly set
3. **Worker Isolation**: Each workflow runs in isolated vertical workers
4. **Resource Limits**: Can set CPU/memory limits per worker
5. **Automatic Cleanup**: MinIO lifecycle policies delete old files after 7 days
## Development
### Adding a New Workflow
1. Create directory: `toolbox/workflows/my_workflow/`
2. Add `workflow.py` with a Prefect flow
3. Add mandatory `metadata.yaml`
4. Restart backend: `docker-compose restart fuzzforge-backend`
2. Add `workflow.py` with a Temporal workflow (using `@workflow.defn`)
3. Add mandatory `metadata.yaml` with `vertical` field
4. Restart the appropriate worker: `docker-compose -f docker-compose.temporal.yaml restart worker-rust`
5. Worker will automatically discover and register the new workflow
### Adding a New Module
1. Create module in `toolbox/modules/{category}/`
2. Implement `BaseModule` interface
3. Use in workflows via import
3. Use in workflows via import
### Adding a New Vertical Worker
1. Create worker directory: `workers/{vertical}/`
2. Create `Dockerfile` with required tools
3. Add worker to `docker-compose.temporal.yaml`
4. Worker will automatically discover workflows with matching `vertical` in metadata
+184
@@ -0,0 +1,184 @@
# FuzzForge Benchmark Suite
Performance benchmarking infrastructure organized by module category.
## Directory Structure
```
benchmarks/
├── conftest.py              # Benchmark fixtures
├── category_configs.py      # Category-specific thresholds
├── by_category/             # Benchmarks organized by category
│   ├── fuzzer/
│   │   ├── bench_cargo_fuzz.py
│   │   └── bench_atheris.py
│   ├── scanner/
│   │   └── bench_file_scanner.py
│   ├── secret_detection/
│   │   ├── bench_gitleaks.py
│   │   └── bench_trufflehog.py
│   └── analyzer/
│       └── bench_security_analyzer.py
├── fixtures/                # Benchmark test data
│   ├── small/               # ~1K LOC
│   ├── medium/              # ~10K LOC
│   └── large/               # ~100K LOC
└── results/                 # Benchmark results (JSON)
```
## Module Categories
### Fuzzer
**Expected Metrics**: execs/sec, coverage_rate, time_to_crash, memory_usage
**Performance Thresholds**:
- Min 1000 execs/sec
- Max 10s for small projects
- Max 2GB memory
### Scanner
**Expected Metrics**: files/sec, LOC/sec, findings_count
**Performance Thresholds**:
- Min 100 files/sec
- Min 10K LOC/sec
- Max 512MB memory
### Secret Detection
**Expected Metrics**: patterns/sec, precision, recall, F1
**Performance Thresholds**:
- Min 90% precision
- Min 95% recall
- Max 5 false positives per 100 secrets
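
As a worked example of how these detection metrics relate, the sketch below computes precision, recall, and F1 from raw counts. The function name is illustrative and not part of the benchmark suite:

```python
def detection_metrics(true_positives: int, false_positives: int, false_negatives: int):
    """Return (precision, recall, f1) for a secret-detection run."""
    reported = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / reported if reported else 0.0
    recall = true_positives / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# 100 planted secrets: 95 found, 5 missed, 5 spurious reports.
precision, recall, f1 = detection_metrics(95, 5, 5)
```

With these counts both precision and recall come out at 0.95, which meets the 90% precision and 95% recall thresholds listed above.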
### Analyzer
**Expected Metrics**: analysis_depth, files/sec, accuracy
**Performance Thresholds**:
- Min 10 files/sec (deep analysis)
- Min 85% accuracy
- Max 2GB memory
## Running Benchmarks
### All Benchmarks
```bash
cd backend
pytest benchmarks/ --benchmark-only -v
```
### Specific Category
```bash
pytest benchmarks/by_category/fuzzer/ --benchmark-only -v
```
### With Comparison
```bash
# Run and save baseline
pytest benchmarks/ --benchmark-only --benchmark-save=baseline
# Compare against baseline
pytest benchmarks/ --benchmark-only --benchmark-compare=baseline
```
### Generate Histogram
```bash
pytest benchmarks/ --benchmark-only --benchmark-histogram=histogram
```
## Benchmark Results
Results are saved as JSON and include:
- Mean execution time
- Standard deviation
- Min/Max values
- Iterations per second
- Memory usage
Example output:
```
------------------------ benchmark: fuzzer --------------------------
Name                          Mean      StdDev    Ops/Sec
bench_cargo_fuzz[discovery]   0.0012s   0.0001s   833.33
bench_cargo_fuzz[execution]   0.1250s   0.0050s   8.00
bench_cargo_fuzz[memory]      0.0100s   0.0005s   100.00
---------------------------------------------------------------------
```
## CI/CD Integration
Benchmarks run:
- **Nightly**: Full benchmark suite, track trends
- **On PR**: When files under `benchmarks/` or `modules/` change
- **Manual**: Via workflow_dispatch
### Regression Detection
Benchmarks automatically fail if:
- Performance degrades >10%
- Memory usage exceeds thresholds
- Throughput drops below minimum
See `.github/workflows/benchmark.yml` for configuration.
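
The ">10%" rule amounts to a relative comparison against a saved baseline. A minimal sketch of that check, assuming mean execution time is the compared statistic; the function name is illustrative and the actual CI logic may differ:

```python
def is_regression(baseline_mean_s: float, current_mean_s: float, tolerance: float = 0.10) -> bool:
    """Return True if the mean time grew by more than `tolerance` relative to the baseline."""
    if baseline_mean_s <= 0:
        raise ValueError("baseline mean must be positive")
    return (current_mean_s - baseline_mean_s) / baseline_mean_s > tolerance
```

pytest-benchmark offers a comparable built-in check through `--benchmark-compare-fail` (for example `mean:10%`), which may be simpler than a hand-written comparison.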
## Adding New Benchmarks
### 1. Create benchmark file in category directory
```python
# benchmarks/by_category/fuzzer/bench_new_fuzzer.py
import pytest
from benchmarks.category_configs import ModuleCategory, get_threshold


@pytest.mark.benchmark(group="fuzzer")
def test_execution_performance(benchmark, new_fuzzer, benchmark_config, test_workspace):
    """Benchmark execution speed"""
    # new_fuzzer, benchmark_config and test_workspace are fixtures you define for your module.
    # If execute() is a coroutine (as in CargoFuzzer), wrap the call in asyncio.run().
    result = benchmark(new_fuzzer.execute, benchmark_config, test_workspace)

    # Validate against threshold
    threshold = get_threshold(ModuleCategory.FUZZER, "max_execution_time_small")
    assert result.execution_time < threshold
```
### 2. Update category_configs.py if needed
Add new thresholds or metrics for your module.
### 3. Run locally
```bash
pytest benchmarks/by_category/fuzzer/bench_new_fuzzer.py --benchmark-only -v
```
## Best Practices
1. **Use mocking** for external dependencies (network, disk I/O)
2. **Fixed iterations** for consistent benchmarking
3. **Warm-up runs** for JIT-compiled code
4. **Category-specific metrics** aligned with module purpose
5. **Realistic fixtures** that represent actual use cases
6. **Memory profiling** using tracemalloc
7. **Compare apples to apples** within the same category
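
For the memory-profiling practice, a small helper around `tracemalloc` keeps the start/stop bookkeeping in one place. A minimal sketch; the helper name `peak_memory_mb` is illustrative and not part of the suite:

```python
import tracemalloc


def peak_memory_mb(func, *args, **kwargs):
    """Run func and return (result, peak traced memory in MB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing so a failing call does not leave tracemalloc enabled.
        tracemalloc.stop()
    return result, peak / 1024 / 1024
```

Note that `tracemalloc` only sees allocations made through Python's allocator, so memory used by external processes such as a fuzzer binary is not counted.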
## Interpreting Results
### Good Performance
- ✅ Execution time below threshold
- ✅ Memory usage within limits
- ✅ Throughput meets minimum
- ✅ <5% variance across runs
### Performance Issues
- ⚠️ Execution time 10-20% over threshold
- ❌ Execution time >20% over threshold
- ❌ Memory leaks (increasing over iterations)
- ❌ High variance (>10%) indicates instability
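
These bands can be expressed as a small classifier. A minimal sketch under two assumptions that the list above leaves open: overshoot of up to 10% is treated as a warning, and "variance" is read as standard deviation relative to the mean. The function names are illustrative:

```python
def classify_execution_time(measured_s: float, threshold_s: float) -> str:
    """Return 'ok', 'warn', or 'fail' for a measured time against its threshold."""
    if threshold_s <= 0:
        raise ValueError("threshold must be positive")
    overshoot = (measured_s - threshold_s) / threshold_s
    if overshoot <= 0:
        return "ok"
    if overshoot > 0.20:
        return "fail"
    # Assumption: anything over the threshold but within 20% is a warning.
    return "warn"


def is_unstable(mean_s: float, stddev_s: float, max_relative_spread: float = 0.10) -> bool:
    """Return True if the run-to-run spread exceeds the given fraction of the mean."""
    return mean_s > 0 and stddev_s / mean_s > max_relative_spread
```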
## Tracking Performance Over Time
Benchmark results are stored as artifacts with:
- Commit SHA
- Timestamp
- Environment details (Python version, OS)
- Full metrics
Use these to track long-term performance trends and detect gradual degradation.
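
A stored result might look like the record built below. This is a sketch of one possible shape only; the field names and the `build_result_record` helper are assumptions and not the format the CI workflow actually writes:

```python
import json
import platform
from datetime import datetime, timezone


def build_result_record(commit_sha: str, metrics: dict) -> dict:
    """Bundle benchmark metrics with the context needed to compare runs over time."""
    return {
        "commit_sha": commit_sha,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "environment": {
            "python_version": platform.python_version(),
            "os": platform.system(),
        },
        "metrics": metrics,
    }


# Placeholder commit SHA and metrics, shown only to illustrate the JSON shape.
record_json = json.dumps(build_result_record("abc1234", {"mean_s": 0.125}), indent=2)
```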
@@ -0,0 +1,221 @@
"""
Benchmarks for CargoFuzzer module
Tests performance characteristics of Rust fuzzing:
- Execution throughput (execs/sec)
- Coverage rate
- Memory efficiency
- Time to first crash
"""
import pytest
import asyncio
from pathlib import Path
from unittest.mock import AsyncMock, patch
import sys
sys.path.insert(0, str(Path(__file__).resolve().parents[3] / "toolbox"))
from modules.fuzzer.cargo_fuzzer import CargoFuzzer
from benchmarks.category_configs import ModuleCategory, get_threshold
@pytest.fixture
def cargo_fuzzer():
    """Create CargoFuzzer instance for benchmarking"""
    return CargoFuzzer()


@pytest.fixture
def benchmark_config():
    """Benchmark-optimized configuration"""
    return {
        "target_name": None,
        "max_iterations": 10000,  # Fixed iterations for consistent benchmarking
        "timeout_seconds": 30,
        "sanitizer": "address"
    }


@pytest.fixture
def mock_rust_workspace(tmp_path):
    """Create a minimal Rust workspace for benchmarking"""
    workspace = tmp_path / "rust_project"
    workspace.mkdir()

    # Cargo.toml
    (workspace / "Cargo.toml").write_text("""[package]
name = "bench_project"
version = "0.1.0"
edition = "2021"
""")

    # src/lib.rs
    src = workspace / "src"
    src.mkdir()
    (src / "lib.rs").write_text("""
pub fn benchmark_function(data: &[u8]) -> Vec<u8> {
    data.to_vec()
}
""")

    # fuzz structure
    fuzz = workspace / "fuzz"
    fuzz.mkdir()
    (fuzz / "Cargo.toml").write_text("""[package]
name = "bench_project-fuzz"
version = "0.0.0"
edition = "2021"

[dependencies]
libfuzzer-sys = "0.4"

[dependencies.bench_project]
path = ".."

[[bin]]
name = "fuzz_target_1"
path = "fuzz_targets/fuzz_target_1.rs"
""")

    targets = fuzz / "fuzz_targets"
    targets.mkdir()
    (targets / "fuzz_target_1.rs").write_text("""#![no_main]
use libfuzzer_sys::fuzz_target;
use bench_project::benchmark_function;

fuzz_target!(|data: &[u8]| {
    let _ = benchmark_function(data);
});
""")

    return workspace


class TestCargoFuzzerPerformance:
    """Benchmark CargoFuzzer performance metrics"""

    @pytest.mark.benchmark(group="fuzzer")
    def test_target_discovery_performance(self, benchmark, cargo_fuzzer, mock_rust_workspace):
        """Benchmark fuzz target discovery speed"""
        def discover():
            return asyncio.run(cargo_fuzzer._discover_fuzz_targets(mock_rust_workspace))

        result = benchmark(discover)
        assert len(result) > 0

    @pytest.mark.benchmark(group="fuzzer")
    def test_config_validation_performance(self, benchmark, cargo_fuzzer, benchmark_config):
        """Benchmark configuration validation speed"""
        result = benchmark(cargo_fuzzer.validate_config, benchmark_config)
        assert result is True

    @pytest.mark.benchmark(group="fuzzer")
    def test_module_initialization_performance(self, benchmark):
        """Benchmark module instantiation time"""
        def init_module():
            return CargoFuzzer()

        module = benchmark(init_module)
        assert module is not None


class TestCargoFuzzerThroughput:
    """Benchmark execution throughput"""

    @pytest.mark.benchmark(group="fuzzer")
    def test_execution_throughput(self, benchmark, cargo_fuzzer, mock_rust_workspace, benchmark_config):
        """Benchmark fuzzing execution throughput"""

        # Mock actual fuzzing to focus on orchestration overhead
        async def mock_run(workspace, target, config, callback):
            # Simulate 10K execs at 1000 execs/sec
            if callback:
                await callback({
                    "total_execs": 10000,
                    "execs_per_sec": 1000.0,
                    "crashes": 0,
                    "coverage": 50,
                    "corpus_size": 10,
                    "elapsed_time": 10
                })
            return [], {"total_executions": 10000, "execution_time": 10.0}

        with patch.object(cargo_fuzzer, '_build_fuzz_target', new_callable=AsyncMock, return_value=True):
            with patch.object(cargo_fuzzer, '_run_fuzzing', side_effect=mock_run):
                with patch.object(cargo_fuzzer, '_parse_crash_artifacts', new_callable=AsyncMock, return_value=[]):
                    def run_fuzzer():
                        # Run in new event loop
                        loop = asyncio.new_event_loop()
                        try:
                            return loop.run_until_complete(
                                cargo_fuzzer.execute(benchmark_config, mock_rust_workspace)
                            )
                        finally:
                            loop.close()

                    result = benchmark(run_fuzzer)
                    assert result.status == "success"

                    # Verify performance threshold
                    threshold = get_threshold(ModuleCategory.FUZZER, "max_execution_time_small")
                    assert result.execution_time < threshold, \
                        f"Execution time {result.execution_time}s exceeds threshold {threshold}s"


class TestCargoFuzzerMemory:
    """Benchmark memory efficiency"""

    @pytest.mark.benchmark(group="fuzzer")
    def test_memory_overhead(self, benchmark, cargo_fuzzer, mock_rust_workspace, benchmark_config):
        """Benchmark memory usage during execution"""
        import tracemalloc

        def measure_memory():
            tracemalloc.start()
            # Simulate operations
            cargo_fuzzer.validate_config(benchmark_config)
            asyncio.run(cargo_fuzzer._discover_fuzz_targets(mock_rust_workspace))
            current, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            return peak / 1024 / 1024  # Convert to MB

        peak_mb = benchmark(measure_memory)

        # Check against threshold
        max_memory = get_threshold(ModuleCategory.FUZZER, "max_memory_mb")
        assert peak_mb < max_memory, \
            f"Peak memory {peak_mb:.2f}MB exceeds threshold {max_memory}MB"


class TestCargoFuzzerScalability:
    """Benchmark scalability characteristics"""

    @pytest.mark.benchmark(group="fuzzer")
    def test_multiple_target_discovery(self, benchmark, cargo_fuzzer, tmp_path):
        """Benchmark discovery with multiple targets"""
        workspace = tmp_path / "multi_target"
        workspace.mkdir()

        # Create workspace with 10 fuzz targets
        (workspace / "Cargo.toml").write_text("[package]\nname = \"test\"\nversion = \"0.1.0\"\nedition = \"2021\"")
        src = workspace / "src"
        src.mkdir()
        (src / "lib.rs").write_text("pub fn test() {}")

        fuzz = workspace / "fuzz"
        fuzz.mkdir()
        targets = fuzz / "fuzz_targets"
        targets.mkdir()

        for i in range(10):
            (targets / f"fuzz_target_{i}.rs").write_text("// Target")

        def discover():
            return asyncio.run(cargo_fuzzer._discover_fuzz_targets(workspace))

        result = benchmark(discover)
        assert len(result) == 10
+151
@@ -0,0 +1,151 @@
"""
Category-specific benchmark configurations
Defines expected metrics and performance thresholds for each module category.
"""
from dataclasses import dataclass
from typing import List, Dict
from enum import Enum


class ModuleCategory(str, Enum):
    """Module categories for benchmarking"""
    FUZZER = "fuzzer"
    SCANNER = "scanner"
    ANALYZER = "analyzer"
    SECRET_DETECTION = "secret_detection"
    REPORTER = "reporter"


@dataclass
class CategoryBenchmarkConfig:
    """Benchmark configuration for a module category"""
    category: ModuleCategory
    expected_metrics: List[str]
    performance_thresholds: Dict[str, float]
    description: str


# Fuzzer category configuration
FUZZER_CONFIG = CategoryBenchmarkConfig(
    category=ModuleCategory.FUZZER,
    expected_metrics=[
        "execs_per_sec",
        "coverage_rate",
        "time_to_first_crash",
        "corpus_efficiency",
        "execution_time",
        "peak_memory_mb"
    ],
    performance_thresholds={
        "min_execs_per_sec": 1000,  # Minimum executions per second
        "max_execution_time_small": 10.0,  # Max time for small project (seconds)
        "max_execution_time_medium": 60.0,  # Max time for medium project
        "max_memory_mb": 2048,  # Maximum memory usage
        "min_coverage_rate": 1.0,  # Minimum new coverage per second
    },
    description="Fuzzing modules: coverage-guided fuzz testing"
)

# Scanner category configuration
SCANNER_CONFIG = CategoryBenchmarkConfig(
    category=ModuleCategory.SCANNER,
    expected_metrics=[
        "files_per_sec",
        "loc_per_sec",
        "execution_time",
        "peak_memory_mb",
        "findings_count"
    ],
    performance_thresholds={
        "min_files_per_sec": 100,  # Minimum files scanned per second
        "min_loc_per_sec": 10000,  # Minimum lines of code per second
        "max_execution_time_small": 1.0,
        "max_execution_time_medium": 10.0,
        "max_memory_mb": 512,
    },
    description="File scanning modules: fast pattern-based scanning"
)

# Secret detection category configuration
SECRET_DETECTION_CONFIG = CategoryBenchmarkConfig(
    category=ModuleCategory.SECRET_DETECTION,
    expected_metrics=[
        "patterns_per_sec",
        "precision",
        "recall",
        "f1_score",
        "false_positive_rate",
        "execution_time",
        "peak_memory_mb"
    ],
    performance_thresholds={
        "min_patterns_per_sec": 1000,
        "min_precision": 0.90,  # 90% precision target
        "min_recall": 0.95,  # 95% recall target
        "max_false_positives": 5,  # Max false positives per 100 secrets
        "max_execution_time_small": 2.0,
        "max_execution_time_medium": 20.0,
        "max_memory_mb": 1024,
    },
    description="Secret detection modules: high precision pattern matching"
)

# Analyzer category configuration
ANALYZER_CONFIG = CategoryBenchmarkConfig(
    category=ModuleCategory.ANALYZER,
    expected_metrics=[
        "analysis_depth",
        "files_analyzed_per_sec",
        "execution_time",
        "peak_memory_mb",
        "findings_count",
        "accuracy"
    ],
    performance_thresholds={
        "min_files_per_sec": 10,  # Slower than scanners due to deep analysis
        "max_execution_time_small": 5.0,
        "max_execution_time_medium": 60.0,
        "max_memory_mb": 2048,
        "min_accuracy": 0.85,  # 85% accuracy target
    },
    description="Code analysis modules: deep semantic analysis"
)

# Reporter category configuration
REPORTER_CONFIG = CategoryBenchmarkConfig(
    category=ModuleCategory.REPORTER,
    expected_metrics=[
        "report_generation_time",
        "findings_per_sec",
        "peak_memory_mb"
    ],
    performance_thresholds={
        "max_report_time_100_findings": 1.0,  # Max 1 second for 100 findings
        "max_report_time_1000_findings": 10.0,  # Max 10 seconds for 1000 findings
        "max_memory_mb": 256,
    },
    description="Reporting modules: fast report generation"
)

# Category configurations map
CATEGORY_CONFIGS = {
ModuleCategory.FUZZER: FUZZER_CONFIG,
ModuleCategory.SCANNER: SCANNER_CONFIG,
ModuleCategory.SECRET_DETECTION: SECRET_DETECTION_CONFIG,
ModuleCategory.ANALYZER: ANALYZER_CONFIG,
ModuleCategory.REPORTER: REPORTER_CONFIG,
}
def get_category_config(category: ModuleCategory) -> CategoryBenchmarkConfig:
"""Get benchmark configuration for a category"""
return CATEGORY_CONFIGS[category]
def get_threshold(category: ModuleCategory, metric: str) -> float:
"""Get performance threshold for a specific metric"""
config = get_category_config(category)
return config.performance_thresholds.get(metric, 0.0)
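A minimal sketch of how the thresholds above might be applied to measured results. `check_thresholds` is a hypothetical helper that is not part of this PR. It assumes the `min_`/`max_` key prefix encodes the comparison direction and that the measured value is reported under the name that follows the prefix.

```python
from typing import Dict, List


def check_thresholds(measured: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return human-readable violations; an empty list means all checks passed.

    Hypothetical helper: assumes 'min_<metric>' is a lower bound and
    'max_<metric>' an upper bound on measured['<metric>'].
    """
    violations = []
    for key, limit in thresholds.items():
        prefix, _, metric = key.partition("_")
        if prefix not in ("min", "max") or metric not in measured:
            continue  # Unknown direction or metric not measured: skip.
        value = measured[metric]
        if prefix == "min" and value < limit:
            violations.append(f"{metric}={value} is below minimum {limit}")
        elif prefix == "max" and value > limit:
            violations.append(f"{metric}={value} is above maximum {limit}")
    return violations


# Values mirror two of the SCANNER_CONFIG thresholds above.
scanner_thresholds = {"min_files_per_sec": 100, "max_memory_mb": 512}
measured = {"files_per_sec": 80.0, "memory_mb": 300.0}
violations = check_thresholds(measured, scanner_thresholds)
print(violations)
```

Note that several threshold keys in the configs (for example `max_memory_mb` and `max_execution_time_small`) do not map onto the `expected_metrics` names (`peak_memory_mb`, `execution_time`) by prefix stripping alone, so a real checker would probably need an explicit key-to-metric mapping.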
+60
@@ -0,0 +1,60 @@
"""
Benchmark fixtures and configuration
"""
import sys
from pathlib import Path
import pytest
# Add parent directories to path
BACKEND_ROOT = Path(__file__).resolve().parents[1]
TOOLBOX = BACKEND_ROOT / "toolbox"
if str(BACKEND_ROOT) not in sys.path:
sys.path.insert(0, str(BACKEND_ROOT))
if str(TOOLBOX) not in sys.path:
sys.path.insert(0, str(TOOLBOX))
# ============================================================================
# Benchmark Fixtures
# ============================================================================
@pytest.fixture(scope="session")
def benchmark_fixtures_dir():
"""Path to benchmark fixtures directory"""
return Path(__file__).parent / "fixtures"
@pytest.fixture(scope="session")
def small_project_fixture(benchmark_fixtures_dir):
"""Small project fixture (~1K LOC)"""
return benchmark_fixtures_dir / "small"
@pytest.fixture(scope="session")
def medium_project_fixture(benchmark_fixtures_dir):
"""Medium project fixture (~10K LOC)"""
return benchmark_fixtures_dir / "medium"
@pytest.fixture(scope="session")
def large_project_fixture(benchmark_fixtures_dir):
"""Large project fixture (~100K LOC)"""
return benchmark_fixtures_dir / "large"
# ============================================================================
# pytest-benchmark Configuration
# ============================================================================
def pytest_configure(config):
"""Configure pytest-benchmark"""
config.addinivalue_line(
"markers", "benchmark: mark test as a benchmark"
)
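def pytest_benchmark_group_stats(config, benchmarks, group_by):
    """Defer to pytest-benchmark's built-in grouping"""
    # This hook must return an iterable of (group_name, benchmarks) pairs.
    # Returning the group_by string would break result reporting, so return
    # None and let the plugin's default implementation do the grouping.
    return None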
+17 -1
@@ -7,7 +7,8 @@ readme = "README.md"
requires-python = ">=3.11"
dependencies = [
"fastapi>=0.116.1",
"prefect>=3.4.18",
"temporalio>=1.6.0",
"boto3>=1.34.0",
"pydantic>=2.0.0",
"pyyaml>=6.0",
"docker>=7.0.0",
@@ -21,5 +22,20 @@ dependencies = [
dev = [
"pytest>=8.0.0",
"pytest-asyncio>=0.23.0",
"pytest-benchmark>=4.0.0",
"pytest-cov>=5.0.0",
"pytest-xdist>=3.5.0",
"pytest-mock>=3.12.0",
"httpx>=0.27.0",
"ruff>=0.1.0",
]
[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests", "benchmarks"]
python_files = ["test_*.py", "bench_*.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
markers = [
"benchmark: mark test as a benchmark",
]
+4 -4
@@ -14,8 +14,8 @@ API endpoints for fuzzing workflow management and real-time monitoring
# Additional attribution and requirements are provided in the NOTICE file.
import logging
from typing import List, Dict, Any
from fastapi import APIRouter, HTTPException, Depends, WebSocket, WebSocketDisconnect
from typing import List, Dict
from fastapi import APIRouter, HTTPException, WebSocket, WebSocketDisconnect
from fastapi.responses import StreamingResponse
import asyncio
import json
@@ -25,7 +25,6 @@ from src.models.findings import (
FuzzingStats,
CrashReport
)
from src.core.workflow_discovery import WorkflowDiscovery
logger = logging.getLogger(__name__)
@@ -126,12 +125,13 @@ async def update_fuzzing_stats(run_id: str, stats: FuzzingStats):
# Debug: log reception for live instrumentation
try:
logger.info(
"Received fuzzing stats update: run_id=%s exec=%s eps=%.2f crashes=%s corpus=%s elapsed=%ss",
"Received fuzzing stats update: run_id=%s exec=%s eps=%.2f crashes=%s corpus=%s coverage=%s elapsed=%ss",
run_id,
stats.executions,
stats.executions_per_sec,
stats.crashes,
stats.corpus_size,
stats.coverage,
stats.elapsed_time,
)
except Exception:
+49 -56
@@ -14,7 +14,6 @@ API endpoints for workflow run management and findings retrieval
# Additional attribution and requirements are provided in the NOTICE file.
import logging
from typing import Dict, Any
from fastapi import APIRouter, HTTPException, Depends
from src.models.findings import WorkflowFindings, WorkflowStatus
@@ -24,22 +23,22 @@ logger = logging.getLogger(__name__)
router = APIRouter(prefix="/runs", tags=["runs"])
def get_prefect_manager():
"""Dependency to get the Prefect manager instance"""
from src.main import prefect_mgr
return prefect_mgr
def get_temporal_manager():
"""Dependency to get the Temporal manager instance"""
from src.main import temporal_mgr
return temporal_mgr
@router.get("/{run_id}/status", response_model=WorkflowStatus)
async def get_run_status(
run_id: str,
prefect_mgr=Depends(get_prefect_manager)
temporal_mgr=Depends(get_temporal_manager)
) -> WorkflowStatus:
"""
Get the current status of a workflow run.
Args:
run_id: The flow run ID
run_id: The workflow run ID
Returns:
Status information including state, timestamps, and completion flags
@@ -48,25 +47,23 @@ async def get_run_status(
HTTPException: 404 if run not found
"""
try:
status = await prefect_mgr.get_flow_run_status(run_id)
status = await temporal_mgr.get_workflow_status(run_id)
# Find workflow name from deployment
workflow_name = "unknown"
workflow_deployment_id = status.get("workflow", "")
for name, deployment_id in prefect_mgr.deployments.items():
if str(deployment_id) == str(workflow_deployment_id):
workflow_name = name
break
# Map Temporal status to response format
workflow_status = status.get("status", "UNKNOWN")
is_completed = workflow_status in ["COMPLETED", "FAILED", "CANCELLED"]
is_failed = workflow_status == "FAILED"
is_running = workflow_status == "RUNNING"
return WorkflowStatus(
run_id=status["run_id"],
workflow=workflow_name,
status=status["status"],
is_completed=status["is_completed"],
is_failed=status["is_failed"],
is_running=status["is_running"],
created_at=status["created_at"],
updated_at=status["updated_at"]
run_id=run_id,
workflow="unknown", # Workflow type is not read from the Temporal status dict here
status=workflow_status,
is_completed=is_completed,
is_failed=is_failed,
is_running=is_running,
created_at=status.get("start_time"),
updated_at=status.get("close_time") or status.get("execution_time")
)
except Exception as e:
@@ -80,13 +77,13 @@ async def get_run_status(
@router.get("/{run_id}/findings", response_model=WorkflowFindings)
async def get_run_findings(
run_id: str,
prefect_mgr=Depends(get_prefect_manager)
temporal_mgr=Depends(get_temporal_manager)
) -> WorkflowFindings:
"""
Get the findings from a completed workflow run.
Args:
run_id: The flow run ID
run_id: The workflow run ID
Returns:
SARIF-formatted findings from the workflow execution
@@ -96,50 +93,46 @@ async def get_run_findings(
"""
try:
# Get run status first
status = await prefect_mgr.get_flow_run_status(run_id)
status = await temporal_mgr.get_workflow_status(run_id)
workflow_status = status.get("status", "UNKNOWN")
if not status["is_completed"]:
if status["is_running"]:
if workflow_status not in ["COMPLETED", "FAILED", "CANCELLED"]:
if workflow_status == "RUNNING":
raise HTTPException(
status_code=400,
detail=f"Run {run_id} is still running. Current status: {status['status']}"
)
elif status["is_failed"]:
raise HTTPException(
status_code=400,
detail=f"Run {run_id} failed. Status: {status['status']}"
detail=f"Run {run_id} is still running. Current status: {workflow_status}"
)
else:
raise HTTPException(
status_code=400,
detail=f"Run {run_id} not completed. Status: {status['status']}"
detail=f"Run {run_id} not completed. Status: {workflow_status}"
)
# Get the findings
findings = await prefect_mgr.get_flow_run_findings(run_id)
if workflow_status == "FAILED":
raise HTTPException(
status_code=400,
detail=f"Run {run_id} failed. Status: {workflow_status}"
)
# Find workflow name
workflow_name = "unknown"
workflow_deployment_id = status.get("workflow", "")
for name, deployment_id in prefect_mgr.deployments.items():
if str(deployment_id) == str(workflow_deployment_id):
workflow_name = name
break
# Get the workflow result
result = await temporal_mgr.get_workflow_result(run_id)
# Get workflow version if available
# Extract SARIF from result (handle None for backwards compatibility)
if isinstance(result, dict):
sarif = result.get("sarif") or {}
else:
sarif = {}
# Metadata
metadata = {
"completion_time": status["updated_at"],
"completion_time": status.get("close_time"),
"workflow_version": "unknown"
}
if workflow_name in prefect_mgr.workflows:
workflow_info = prefect_mgr.workflows[workflow_name]
metadata["workflow_version"] = workflow_info.metadata.get("version", "unknown")
return WorkflowFindings(
workflow=workflow_name,
workflow="unknown",
run_id=run_id,
sarif=findings,
sarif=sarif,
metadata=metadata
)
@@ -157,7 +150,7 @@ async def get_run_findings(
async def get_workflow_findings(
workflow_name: str,
run_id: str,
prefect_mgr=Depends(get_prefect_manager)
temporal_mgr=Depends(get_temporal_manager)
) -> WorkflowFindings:
"""
Get findings for a specific workflow run.
@@ -166,7 +159,7 @@ async def get_workflow_findings(
Args:
workflow_name: Name of the workflow
run_id: The flow run ID
run_id: The workflow run ID
Returns:
SARIF-formatted findings from the workflow execution
@@ -174,11 +167,11 @@ async def get_workflow_findings(
Raises:
HTTPException: 404 if workflow or run not found, 400 if run not completed
"""
if workflow_name not in prefect_mgr.workflows:
if workflow_name not in temporal_mgr.workflows:
raise HTTPException(
status_code=404,
detail=f"Workflow not found: {workflow_name}"
)
# Delegate to the main findings endpoint
return await get_run_findings(run_id, prefect_mgr)
return await get_run_findings(run_id, temporal_mgr)
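The status handling in `get_run_status` can be sketched as a small pure function. `status_flags` and `TERMINAL_STATUSES` are hypothetical names, and the sketch assumes `temporal_mgr.get_workflow_status` returns the upper-case status name as a string. Temporal's own status enum spells the cancelled state `CANCELED` and also defines `TERMINATED` and `TIMED_OUT`, so the sketch treats those as terminal as well, which the handlers above do not.

```python
from typing import Dict

# Assumption: statuses arrive as upper-case names. Both spellings of
# "cancelled" are accepted because Temporal's enum uses CANCELED.
TERMINAL_STATUSES = frozenset(
    {"COMPLETED", "FAILED", "CANCELED", "CANCELLED", "TERMINATED", "TIMED_OUT"}
)


def status_flags(workflow_status: str) -> Dict[str, bool]:
    """Derive the boolean flags exposed by the status endpoint."""
    return {
        "is_completed": workflow_status in TERMINAL_STATUSES,
        "is_failed": workflow_status == "FAILED",
        "is_running": workflow_status == "RUNNING",
    }


print(status_flags("RUNNING"))
print(status_flags("CANCELED"))
```

Whether the extra terminal states matter depends on how the manager maps Temporal's enum to strings, which this diff does not show.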
+307 -59
@@ -15,8 +15,9 @@ API endpoints for workflow management with enhanced error handling
import logging
import traceback
import tempfile
from typing import List, Dict, Any, Optional
from fastapi import APIRouter, HTTPException, Depends
from fastapi import APIRouter, HTTPException, Depends, UploadFile, File, Form
from pathlib import Path
from src.models.findings import (
@@ -25,10 +26,20 @@ from src.models.findings import (
WorkflowListItem,
RunSubmissionResponse
)
from src.core.workflow_discovery import WorkflowDiscovery
from src.temporal.discovery import WorkflowDiscovery
logger = logging.getLogger(__name__)
# Configuration for file uploads
MAX_UPLOAD_SIZE = 10 * 1024 * 1024 * 1024 # 10 GB
ALLOWED_CONTENT_TYPES = [
"application/gzip",
"application/x-gzip",
"application/x-tar",
"application/x-compressed-tar",
"application/octet-stream", # Generic binary
]
router = APIRouter(prefix="/workflows", tags=["workflows"])
@@ -68,15 +79,15 @@ def create_structured_error_response(
return error_response
def get_prefect_manager():
"""Dependency to get the Prefect manager instance"""
from src.main import prefect_mgr
return prefect_mgr
def get_temporal_manager():
"""Dependency to get the Temporal manager instance"""
from src.main import temporal_mgr
return temporal_mgr
@router.get("/", response_model=List[WorkflowListItem])
async def list_workflows(
prefect_mgr=Depends(get_prefect_manager)
temporal_mgr=Depends(get_temporal_manager)
) -> List[WorkflowListItem]:
"""
List all discovered workflows with their metadata.
@@ -85,7 +96,7 @@ async def list_workflows(
author, and tags.
"""
workflows = []
for name, info in prefect_mgr.workflows.items():
for name, info in temporal_mgr.workflows.items():
workflows.append(WorkflowListItem(
name=name,
version=info.metadata.get("version", "0.6.0"),
@@ -111,7 +122,7 @@ async def get_metadata_schema() -> Dict[str, Any]:
@router.get("/{workflow_name}/metadata", response_model=WorkflowMetadata)
async def get_workflow_metadata(
workflow_name: str,
prefect_mgr=Depends(get_prefect_manager)
temporal_mgr=Depends(get_temporal_manager)
) -> WorkflowMetadata:
"""
Get complete metadata for a specific workflow.
@@ -126,8 +137,8 @@ async def get_workflow_metadata(
Raises:
HTTPException: 404 if workflow not found
"""
if workflow_name not in prefect_mgr.workflows:
available_workflows = list(prefect_mgr.workflows.keys())
if workflow_name not in temporal_mgr.workflows:
available_workflows = list(temporal_mgr.workflows.keys())
error_response = create_structured_error_response(
error_type="WorkflowNotFound",
message=f"Workflow '{workflow_name}' not found",
@@ -143,7 +154,7 @@ async def get_workflow_metadata(
detail=error_response
)
info = prefect_mgr.workflows[workflow_name]
info = temporal_mgr.workflows[workflow_name]
metadata = info.metadata
return WorkflowMetadata(
@@ -154,9 +165,7 @@ async def get_workflow_metadata(
tags=metadata.get("tags", []),
parameters=metadata.get("parameters", {}),
default_parameters=metadata.get("default_parameters", {}),
required_modules=metadata.get("required_modules", []),
supported_volume_modes=metadata.get("supported_volume_modes", ["ro", "rw"]),
has_custom_docker=info.has_docker
required_modules=metadata.get("required_modules", [])
)
@@ -164,14 +173,14 @@ async def get_workflow_metadata(
async def submit_workflow(
workflow_name: str,
submission: WorkflowSubmission,
prefect_mgr=Depends(get_prefect_manager)
temporal_mgr=Depends(get_temporal_manager)
) -> RunSubmissionResponse:
"""
Submit a workflow for execution with volume mounting.
Submit a workflow for execution.
Args:
workflow_name: Name of the workflow to execute
submission: Submission parameters including target path and volume mode
submission: Submission parameters including target path and parameters
Returns:
Run submission response with run_id and initial status
@@ -179,8 +188,8 @@ async def submit_workflow(
Raises:
HTTPException: 404 if workflow not found, 400 for invalid parameters
"""
if workflow_name not in prefect_mgr.workflows:
available_workflows = list(prefect_mgr.workflows.keys())
if workflow_name not in temporal_mgr.workflows:
available_workflows = list(temporal_mgr.workflows.keys())
error_response = create_structured_error_response(
error_type="WorkflowNotFound",
message=f"Workflow '{workflow_name}' not found",
@@ -197,31 +206,36 @@ async def submit_workflow(
)
try:
# Convert ResourceLimits to dict if provided
resource_limits_dict = None
if submission.resource_limits:
resource_limits_dict = {
"cpu_limit": submission.resource_limits.cpu_limit,
"memory_limit": submission.resource_limits.memory_limit,
"cpu_request": submission.resource_limits.cpu_request,
"memory_request": submission.resource_limits.memory_request
}
# Upload target file to MinIO and get target_id
target_path = Path(submission.target_path)
if not target_path.exists():
raise ValueError(f"Target path does not exist: {submission.target_path}")
# Submit the workflow with enhanced parameters
flow_run = await prefect_mgr.submit_workflow(
workflow_name=workflow_name,
target_path=submission.target_path,
volume_mode=submission.volume_mode,
parameters=submission.parameters,
resource_limits=resource_limits_dict,
additional_volumes=submission.additional_volumes,
timeout=submission.timeout
# Upload target (using anonymous user for now)
target_id = await temporal_mgr.upload_target(
file_path=target_path,
user_id="api-user",
metadata={"workflow": workflow_name}
)
run_id = str(flow_run.id)
# Merge default parameters with user parameters
workflow_info = temporal_mgr.workflows[workflow_name]
metadata = workflow_info.metadata or {}
defaults = metadata.get("default_parameters", {})
user_params = submission.parameters or {}
workflow_params = {**defaults, **user_params}
# Start workflow execution
handle = await temporal_mgr.run_workflow(
workflow_name=workflow_name,
target_id=target_id,
workflow_params=workflow_params
)
run_id = handle.id
# Initialize fuzzing tracking if this looks like a fuzzing workflow
workflow_info = prefect_mgr.workflows.get(workflow_name, {})
workflow_info = temporal_mgr.workflows.get(workflow_name, {})
workflow_tags = workflow_info.metadata.get("tags", []) if hasattr(workflow_info, 'metadata') else []
if "fuzzing" in workflow_tags or "fuzz" in workflow_name.lower():
from src.api.fuzzing import initialize_fuzzing_tracking
@@ -229,7 +243,7 @@ async def submit_workflow(
return RunSubmissionResponse(
run_id=run_id,
status=flow_run.state.name if flow_run.state else "PENDING",
status="RUNNING",
workflow=workflow_name,
message=f"Workflow '{workflow_name}' submitted successfully"
)
@@ -261,17 +275,13 @@ async def submit_workflow(
error_type = "WorkflowSubmissionError"
# Detect specific error patterns
if "deployment" in error_message.lower():
error_type = "DeploymentError"
deployment_info = {
"status": "failed",
"error": error_message
}
if "workflow" in error_message.lower() and "not found" in error_message.lower():
error_type = "WorkflowError"
suggestions.extend([
"Check if Prefect server is running and accessible",
"Verify Docker is running and has sufficient resources",
"Check container image availability",
"Ensure volume paths exist and are accessible"
"Check if Temporal server is running and accessible",
"Verify workflow workers are running",
"Check if workflow is registered with correct vertical",
"Ensure Docker is running and has sufficient resources"
])
elif "volume" in error_message.lower() or "mount" in error_message.lower():
@@ -324,25 +334,200 @@ async def submit_workflow(
)
@router.get("/{workflow_name}/parameters")
async def get_workflow_parameters(
@router.post("/{workflow_name}/upload-and-submit", response_model=RunSubmissionResponse)
async def upload_and_submit_workflow(
workflow_name: str,
prefect_mgr=Depends(get_prefect_manager)
file: UploadFile = File(..., description="Target file or tarball to analyze"),
parameters: Optional[str] = Form(None, description="JSON-encoded workflow parameters"),
timeout: Optional[int] = Form(None, description="Timeout in seconds"),
temporal_mgr=Depends(get_temporal_manager)
) -> RunSubmissionResponse:
"""
Upload a target file/tarball and submit workflow for execution.
This endpoint accepts multipart/form-data uploads and is the recommended
way to submit workflows from remote CLI clients.
Args:
workflow_name: Name of the workflow to execute
file: Target file or tarball (compressed directory)
parameters: JSON string of workflow parameters (optional)
timeout: Execution timeout in seconds (optional)
Returns:
Run submission response with run_id and initial status
Raises:
HTTPException: 404 if workflow not found, 400 for invalid parameters,
413 if file too large
"""
if workflow_name not in temporal_mgr.workflows:
available_workflows = list(temporal_mgr.workflows.keys())
error_response = create_structured_error_response(
error_type="WorkflowNotFound",
message=f"Workflow '{workflow_name}' not found",
workflow_name=workflow_name,
suggestions=[
f"Available workflows: {', '.join(available_workflows)}",
"Use GET /workflows/ to see all available workflows"
]
)
raise HTTPException(status_code=404, detail=error_response)
temp_file_path = None
try:
# Validate file size
file_size = 0
chunk_size = 1024 * 1024 # 1MB chunks
# Create temporary file
temp_fd, temp_file_path = tempfile.mkstemp(suffix=".tar.gz")
logger.info(f"Receiving file upload for workflow '{workflow_name}': {file.filename}")
# Stream file to disk
with open(temp_fd, 'wb') as temp_file:
while True:
chunk = await file.read(chunk_size)
if not chunk:
break
file_size += len(chunk)
# Check size limit
if file_size > MAX_UPLOAD_SIZE:
raise HTTPException(
status_code=413,
detail=create_structured_error_response(
error_type="FileTooLarge",
message=f"File size exceeds maximum allowed size of {MAX_UPLOAD_SIZE / (1024**3):.1f} GB",
workflow_name=workflow_name,
suggestions=[
"Reduce the size of your target directory",
"Exclude unnecessary files (build artifacts, dependencies, etc.)",
"Consider splitting into smaller analysis targets"
]
)
)
temp_file.write(chunk)
logger.info(f"Received file: {file_size / (1024**2):.2f} MB")
# Parse parameters
workflow_params = {}
if parameters:
try:
import json
workflow_params = json.loads(parameters)
if not isinstance(workflow_params, dict):
raise ValueError("Parameters must be a JSON object")
except (json.JSONDecodeError, ValueError) as e:
raise HTTPException(
status_code=400,
detail=create_structured_error_response(
error_type="InvalidParameters",
message=f"Invalid parameters JSON: {e}",
workflow_name=workflow_name,
suggestions=["Ensure parameters is a valid JSON object"]
)
)
# Upload to MinIO
target_id = await temporal_mgr.upload_target(
file_path=Path(temp_file_path),
user_id="api-user",
metadata={
"workflow": workflow_name,
"original_filename": file.filename,
"upload_method": "multipart"
}
)
logger.info(f"Uploaded to MinIO with target_id: {target_id}")
# Merge default parameters with user parameters
workflow_info = temporal_mgr.workflows.get(workflow_name)
metadata = workflow_info.metadata or {}
defaults = metadata.get("default_parameters", {})
workflow_params = {**defaults, **workflow_params}
# Start workflow execution
handle = await temporal_mgr.run_workflow(
workflow_name=workflow_name,
target_id=target_id,
workflow_params=workflow_params
)
run_id = handle.id
# Initialize fuzzing tracking if needed
workflow_info = temporal_mgr.workflows.get(workflow_name, {})
workflow_tags = workflow_info.metadata.get("tags", []) if hasattr(workflow_info, 'metadata') else []
if "fuzzing" in workflow_tags or "fuzz" in workflow_name.lower():
from src.api.fuzzing import initialize_fuzzing_tracking
initialize_fuzzing_tracking(run_id, workflow_name)
return RunSubmissionResponse(
run_id=run_id,
status="RUNNING",
workflow=workflow_name,
message=f"Workflow '{workflow_name}' submitted successfully with uploaded target"
)
except HTTPException:
raise
except Exception as e:
logger.error(f"Failed to upload and submit workflow '{workflow_name}': {e}")
logger.error(f"Traceback: {traceback.format_exc()}")
error_response = create_structured_error_response(
error_type="WorkflowSubmissionError",
message=f"Failed to process upload and submit workflow: {str(e)}",
workflow_name=workflow_name,
suggestions=[
"Check if the uploaded file is a valid tarball",
"Verify MinIO storage is accessible",
"Check backend logs for detailed error information",
"Ensure Temporal workers are running"
]
)
raise HTTPException(status_code=500, detail=error_response)
finally:
# Cleanup temporary file
if temp_file_path and Path(temp_file_path).exists():
try:
Path(temp_file_path).unlink()
logger.debug(f"Cleaned up temp file: {temp_file_path}")
except Exception as e:
logger.warning(f"Failed to cleanup temp file {temp_file_path}: {e}")
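The parameter handling in `upload_and_submit_workflow` (parse the JSON form field, then overlay it on the workflow defaults) can be sketched in isolation. `merge_parameters` is a hypothetical helper and the parameter names in the example are invented for illustration.

```python
import json
from typing import Any, Dict, Optional


def merge_parameters(defaults: Dict[str, Any], raw: Optional[str]) -> Dict[str, Any]:
    """Parse a JSON-encoded form field and overlay it on the defaults.

    Raises ValueError if raw is not valid JSON or not a JSON object.
    """
    user_params: Dict[str, Any] = {}
    if raw:
        user_params = json.loads(raw)  # JSONDecodeError subclasses ValueError
        if not isinstance(user_params, dict):
            raise ValueError("Parameters must be a JSON object")
    # Shallow merge: user keys replace default keys wholesale.
    return {**defaults, **user_params}


# Hypothetical defaults; real ones come from the workflow metadata.
defaults = {"max_iterations": 1000, "scanner": {"depth": 2, "follow_links": False}}
merged = merge_parameters(defaults, '{"scanner": {"depth": 5}}')
print(merged)
```

Because the merge is shallow, a user-supplied nested object replaces the default nested object entirely: `follow_links` is dropped in the example rather than inherited.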
@router.get("/{workflow_name}/worker-info")
async def get_workflow_worker_info(
workflow_name: str,
temporal_mgr=Depends(get_temporal_manager)
) -> Dict[str, Any]:
"""
Get the parameters schema for a workflow.
Get worker information for a workflow.
Returns details about which worker is required to execute this workflow,
including container name, task queue, and vertical.
Args:
workflow_name: Name of the workflow
Returns:
Parameters schema with types, descriptions, and defaults
Worker information including vertical, container name, and task queue
Raises:
HTTPException: 404 if workflow not found
"""
if workflow_name not in prefect_mgr.workflows:
available_workflows = list(prefect_mgr.workflows.keys())
if workflow_name not in temporal_mgr.workflows:
available_workflows = list(temporal_mgr.workflows.keys())
error_response = create_structured_error_response(
error_type="WorkflowNotFound",
message=f"Workflow '{workflow_name}' not found",
@@ -357,7 +542,70 @@ async def get_workflow_parameters(
detail=error_response
)
info = prefect_mgr.workflows[workflow_name]
info = temporal_mgr.workflows[workflow_name]
metadata = info.metadata
# Extract vertical from metadata
vertical = metadata.get("vertical")
if not vertical:
error_response = create_structured_error_response(
error_type="MissingVertical",
message=f"Workflow '{workflow_name}' does not specify a vertical in metadata",
workflow_name=workflow_name,
suggestions=[
"Check workflow metadata.yaml for 'vertical' field",
"Contact workflow author for support"
]
)
raise HTTPException(
status_code=500,
detail=error_response
)
return {
"workflow": workflow_name,
"vertical": vertical,
"worker_container": f"fuzzforge-worker-{vertical}",
"task_queue": f"{vertical}-queue",
"required": True
}
@router.get("/{workflow_name}/parameters")
async def get_workflow_parameters(
workflow_name: str,
temporal_mgr=Depends(get_temporal_manager)
) -> Dict[str, Any]:
"""
Get the parameters schema for a workflow.
Args:
workflow_name: Name of the workflow
Returns:
Parameters schema with types, descriptions, and defaults
Raises:
HTTPException: 404 if workflow not found
"""
if workflow_name not in temporal_mgr.workflows:
available_workflows = list(temporal_mgr.workflows.keys())
error_response = create_structured_error_response(
error_type="WorkflowNotFound",
message=f"Workflow '{workflow_name}' not found",
workflow_name=workflow_name,
suggestions=[
f"Available workflows: {', '.join(available_workflows)}",
"Use GET /workflows/ to see all available workflows"
]
)
raise HTTPException(
status_code=404,
detail=error_response
)
info = temporal_mgr.workflows[workflow_name]
metadata = info.metadata
# Return parameters with enhanced schema information
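The size-capped streaming used by the upload endpoint in this file can be illustrated with a synchronous analogue. `copy_with_limit` and `UploadTooLarge` are hypothetical names. The real handler awaits `UploadFile.read` and raises an HTTP 413 instead, but the chunk loop follows the same shape.

```python
import io
from typing import BinaryIO


class UploadTooLarge(Exception):
    """Raised when the incoming stream exceeds the configured limit."""


def copy_with_limit(src: BinaryIO, dst: BinaryIO, max_bytes: int,
                    chunk_size: int = 1024 * 1024) -> int:
    """Copy src to dst in chunks, aborting once more than max_bytes were read.

    Returns the number of bytes written.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            # Check before writing so the oversized chunk never reaches dst.
            raise UploadTooLarge(f"stream exceeds {max_bytes} bytes")
        dst.write(chunk)
    return total


sink = io.BytesIO()
written = copy_with_limit(io.BytesIO(b"x" * 10), sink, max_bytes=16, chunk_size=4)
print(written)
```

Counting bytes while streaming bounds memory use to one chunk and rejects oversized uploads without trusting a client-supplied length, at the cost of having already written up to `max_bytes` to disk before the limit trips.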
-770
@@ -1,770 +0,0 @@
"""
Prefect Manager - Core orchestration for workflow deployment and execution
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import logging
import os
import platform
import re
from pathlib import Path
from typing import Dict, Optional, Any
from prefect import get_client
from prefect.docker import DockerImage
from prefect.client.schemas import FlowRun
from src.core.workflow_discovery import WorkflowDiscovery, WorkflowInfo
logger = logging.getLogger(__name__)
def get_registry_url(context: str = "default") -> str:
"""
Get the container registry URL to use for a given operation context.
Goals:
- Work reliably across Linux and macOS Docker Desktop
- Prefer in-network service discovery when running inside containers
- Allow full override via env vars from docker-compose
Env overrides:
- FUZZFORGE_REGISTRY_PUSH_URL: used for image builds/pushes
- FUZZFORGE_REGISTRY_PULL_URL: used for workers to pull images
"""
# Normalize context
ctx = (context or "default").lower()
# Always honor explicit overrides first
if ctx in ("push", "build"):
push_url = os.getenv("FUZZFORGE_REGISTRY_PUSH_URL")
if push_url:
logger.debug("Using FUZZFORGE_REGISTRY_PUSH_URL: %s", push_url)
return push_url
# Default to host-published registry for Docker daemon operations
return "localhost:5001"
if ctx == "pull":
pull_url = os.getenv("FUZZFORGE_REGISTRY_PULL_URL")
if pull_url:
logger.debug("Using FUZZFORGE_REGISTRY_PULL_URL: %s", pull_url)
return pull_url
# Prefect worker pulls via host Docker daemon as well
return "localhost:5001"
# Default/fallback
return os.getenv("FUZZFORGE_REGISTRY_PULL_URL", os.getenv("FUZZFORGE_REGISTRY_PUSH_URL", "localhost:5001"))
def _compose_project_name(default: str = "fuzzforge") -> str:
"""Return the docker-compose project name used for network/volume naming.
Always returns 'fuzzforge' regardless of environment variables.
"""
return "fuzzforge"
class PrefectManager:
"""
Manages Prefect deployments and flow runs for discovered workflows.
This class handles:
- Workflow discovery and registration
- Docker image building through Prefect
- Deployment creation and management
- Flow run submission with volume mounting
- Findings retrieval from completed runs
"""
def __init__(self, workflows_dir: Path = None):
"""
Initialize the Prefect manager.
Args:
workflows_dir: Path to the workflows directory (default: toolbox/workflows)
"""
if workflows_dir is None:
workflows_dir = Path("toolbox/workflows")
self.discovery = WorkflowDiscovery(workflows_dir)
self.workflows: Dict[str, WorkflowInfo] = {}
self.deployments: Dict[str, str] = {} # workflow_name -> deployment_id
# Security: Define allowed and forbidden paths for host mounting
self.allowed_base_paths = [
"/tmp",
"/home",
"/Users", # macOS users
"/opt",
"/var/tmp",
"/workspace", # Common container workspace
"/app" # Container application directory (for test projects)
]
self.forbidden_paths = [
"/etc",
"/root",
"/var/run",
"/sys",
"/proc",
"/dev",
"/boot",
"/var/lib/docker", # Critical Docker data
"/var/log", # System logs
"/usr/bin", # System binaries
"/usr/sbin",
"/sbin",
"/bin"
]
@staticmethod
def _parse_memory_to_bytes(memory_str: str) -> int:
"""
Parse memory string (like '512Mi', '1Gi') to bytes.
Args:
memory_str: Memory string with unit suffix
Returns:
Memory in bytes
Raises:
ValueError: If format is invalid
"""
if not memory_str:
return 0
match = re.match(r'^(\d+(?:\.\d+)?)\s*([GMK]i?)$', memory_str.strip())
if not match:
raise ValueError(f"Invalid memory format: {memory_str}. Expected format like '512Mi', '1Gi'")
value, unit = match.groups()
value = float(value)
# Convert to bytes based on unit (binary units: Ki, Mi, Gi)
if unit in ['K', 'Ki']:
multiplier = 1024
elif unit in ['M', 'Mi']:
multiplier = 1024 * 1024
elif unit in ['G', 'Gi']:
multiplier = 1024 * 1024 * 1024
else:
raise ValueError(f"Unsupported memory unit: {unit}")
return int(value * multiplier)
@staticmethod
def _parse_cpu_to_millicores(cpu_str: str) -> int:
"""
Parse CPU string (like '500m', '1', '2.5') to millicores.
Args:
cpu_str: CPU string
Returns:
CPU in millicores (1 core = 1000 millicores)
Raises:
ValueError: If format is invalid
"""
if not cpu_str:
return 0
cpu_str = cpu_str.strip()
# Handle millicores format (e.g., '500m')
if cpu_str.endswith('m'):
try:
return int(cpu_str[:-1])
except ValueError:
raise ValueError(f"Invalid CPU format: {cpu_str}")
# Handle core format (e.g., '1', '2.5')
try:
cores = float(cpu_str)
return int(cores * 1000) # Convert to millicores
except ValueError:
raise ValueError(f"Invalid CPU format: {cpu_str}")
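
The CPU parsing follows the same pattern. The sketch below is a hypothetical standalone equivalent (the name `parse_cpu_to_millicores` without the leading underscore is an assumption) that folds the two `try` blocks into one and otherwise mirrors the method above.

```python
def parse_cpu_to_millicores(cpu_str: str) -> int:
    """Hypothetical standalone version of _parse_cpu_to_millicores."""
    if not cpu_str:
        return 0
    cpu_str = cpu_str.strip()
    try:
        if cpu_str.endswith("m"):
            # Millicore form, e.g. '500m'
            return int(cpu_str[:-1])
        # Core form, e.g. '1' or '2.5' (1 core = 1000 millicores)
        return int(float(cpu_str) * 1000)
    except ValueError:
        raise ValueError(f"Invalid CPU format: {cpu_str}") from None


print(parse_cpu_to_millicores("500m"))  # 500
print(parse_cpu_to_millicores("2.5"))   # 2500
```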
def _extract_resource_requirements(self, workflow_info: WorkflowInfo) -> Dict[str, str]:
"""
Extract resource requirements from workflow metadata.
Args:
workflow_info: Workflow information with metadata
Returns:
Dictionary with resource requirements in Docker format
"""
metadata = workflow_info.metadata
requirements = metadata.get("requirements", {})
resources = requirements.get("resources", {})
resource_config = {}
# Extract memory requirement
memory = resources.get("memory")
if memory:
try:
# Validate memory format and store original string for Docker
self._parse_memory_to_bytes(memory)
resource_config["memory"] = memory
except ValueError as e:
logger.warning(f"Invalid memory requirement in {workflow_info.name}: {e}")
# Extract CPU requirement
cpu = resources.get("cpu")
if cpu:
try:
# Validate CPU format and store original string for Docker
self._parse_cpu_to_millicores(cpu)
resource_config["cpus"] = cpu
except ValueError as e:
logger.warning(f"Invalid CPU requirement in {workflow_info.name}: {e}")
# Extract timeout
timeout = resources.get("timeout")
if timeout and isinstance(timeout, int):
resource_config["timeout"] = str(timeout)
return resource_config
async def initialize(self):
"""
Initialize the manager by discovering and deploying all workflows.
This method:
1. Discovers all valid workflows in the workflows directory
2. Validates their metadata
3. Deploys each workflow to Prefect with Docker images
"""
try:
# Discover workflows
self.workflows = await self.discovery.discover_workflows()
if not self.workflows:
logger.warning("No workflows discovered")
return
logger.info(f"Discovered {len(self.workflows)} workflows: {list(self.workflows.keys())}")
# Deploy each workflow
for name, info in self.workflows.items():
try:
await self._deploy_workflow(name, info)
except Exception as e:
logger.error(f"Failed to deploy workflow '{name}': {e}")
except Exception as e:
logger.error(f"Failed to initialize Prefect manager: {e}")
raise
async def _deploy_workflow(self, name: str, info: WorkflowInfo):
"""
Deploy a single workflow to Prefect with Docker image.
Args:
name: Workflow name
info: Workflow information including metadata and paths
"""
logger.info(f"Deploying workflow '{name}'...")
# Get the flow function from registry
flow_func = self.discovery.get_flow_function(name)
if not flow_func:
logger.error(
f"Failed to get flow function for '{name}' from registry. "
f"Ensure the workflow is properly registered in toolbox/workflows/registry.py"
)
return
# Use the mandatory Dockerfile with absolute paths for Docker Compose
# Get absolute paths for build context and dockerfile
toolbox_path = info.path.parent.parent.resolve()
dockerfile_abs_path = info.dockerfile.resolve()
# Calculate relative dockerfile path from toolbox context
try:
dockerfile_rel_path = dockerfile_abs_path.relative_to(toolbox_path)
except ValueError:
# If relative path fails, use the workflow-specific path
dockerfile_rel_path = Path("workflows") / name / "Dockerfile"
# Determine deployment strategy based on Dockerfile presence
base_image = "prefecthq/prefect:3-python3.11"
has_custom_dockerfile = info.has_docker and info.dockerfile.exists()
logger.info(f"=== DEPLOYMENT DEBUG for '{name}' ===")
logger.info(f"info.has_docker: {info.has_docker}")
logger.info(f"info.dockerfile: {info.dockerfile}")
logger.info(f"info.dockerfile.exists(): {info.dockerfile.exists()}")
logger.info(f"has_custom_dockerfile: {has_custom_dockerfile}")
logger.info(f"toolbox_path: {toolbox_path}")
logger.info(f"dockerfile_rel_path: {dockerfile_rel_path}")
if has_custom_dockerfile:
logger.info(f"Workflow '{name}' has custom Dockerfile - building custom image")
# Decide whether to use registry or keep images local to host engine
import os
# Default to using the local registry; set FUZZFORGE_USE_REGISTRY=false to bypass (not recommended)
use_registry = os.getenv("FUZZFORGE_USE_REGISTRY", "true").lower() == "true"
if use_registry:
registry_url = get_registry_url(context="push")
image_spec = DockerImage(
name=f"{registry_url}/fuzzforge/{name}",
tag="latest",
dockerfile=str(dockerfile_rel_path),
context=str(toolbox_path)
)
deploy_image = f"{registry_url}/fuzzforge/{name}:latest"
build_custom = True
push_custom = True
logger.info(f"Using registry: {registry_url} for '{name}'")
else:
# Single-host mode: build into host engine cache; no push required
image_spec = DockerImage(
name=f"fuzzforge/{name}",
tag="latest",
dockerfile=str(dockerfile_rel_path),
context=str(toolbox_path)
)
deploy_image = f"fuzzforge/{name}:latest"
build_custom = True
push_custom = False
logger.info("Using single-host image (no registry push): %s", deploy_image)
else:
logger.info(f"Workflow '{name}' using base image - no custom dependencies needed")
deploy_image = base_image
build_custom = False
push_custom = False
# Pre-validate registry connectivity when pushing
if push_custom:
try:
from .setup import validate_registry_connectivity
await validate_registry_connectivity(registry_url)
logger.info(f"Registry connectivity validated for {registry_url}")
except Exception as e:
logger.error(f"Registry connectivity validation failed for {registry_url}: {e}")
raise RuntimeError(f"Cannot deploy workflow '{name}': Registry {registry_url} is not accessible. {e}")
# Deploy the workflow
try:
# Ensure any previous deployment is removed so job variables are updated
try:
async with get_client() as client:
existing = await client.read_deployment_by_name(
f"{name}/{name}-deployment"
)
if existing:
logger.info(f"Removing existing deployment for '{name}' to refresh settings...")
await client.delete_deployment(existing.id)
except Exception:
# If not found or deletion fails, continue with deployment
pass
# Extract resource requirements from metadata
workflow_resource_requirements = self._extract_resource_requirements(info)
logger.info(f"Workflow '{name}' resource requirements: {workflow_resource_requirements}")
# Build job variables with resource requirements
job_variables = {
"image": deploy_image, # Use the worker-accessible registry name
"volumes": [], # Populated at run submission with toolbox mount
"env": {
"PYTHONPATH": "/opt/prefect/toolbox:/opt/prefect",
"WORKFLOW_NAME": name
}
}
# Add resource requirements to job variables if present
if workflow_resource_requirements:
job_variables["resources"] = workflow_resource_requirements
# Prepare deployment parameters
deploy_params = {
"name": f"{name}-deployment",
"work_pool_name": "docker-pool",
"image": image_spec if has_custom_dockerfile else deploy_image,
"push": push_custom,
"build": build_custom,
"job_variables": job_variables
}
deployment = await flow_func.deploy(**deploy_params)
self.deployments[name] = str(deployment.id) if hasattr(deployment, 'id') else name
logger.info(f"Successfully deployed workflow '{name}'")
except Exception as e:
# Enhanced error reporting with more context
import traceback
logger.error(f"Failed to deploy workflow '{name}': {e}")
logger.error(f"Deployment traceback: {traceback.format_exc()}")
# Try to capture Docker-specific context
error_context = {
"workflow_name": name,
"has_dockerfile": has_custom_dockerfile,
"image_name": deploy_image if 'deploy_image' in locals() else "unknown",
"registry_url": registry_url if 'registry_url' in locals() else "unknown",
"error_type": type(e).__name__,
"error_message": str(e)
}
# Check for specific error patterns with detailed categorization
error_msg_lower = str(e).lower()
if "registry" in error_msg_lower and ("no such host" in error_msg_lower or "connection" in error_msg_lower):
error_context["category"] = "registry_connectivity_error"
error_context["solution"] = f"Cannot reach registry at {error_context['registry_url']}. Check Docker network and registry service."
elif "docker" in error_msg_lower:
error_context["category"] = "docker_error"
if "build" in error_msg_lower:
error_context["subcategory"] = "image_build_failed"
error_context["solution"] = "Check Dockerfile syntax and dependencies."
elif "pull" in error_msg_lower:
error_context["subcategory"] = "image_pull_failed"
error_context["solution"] = "Check if image exists in registry and network connectivity."
elif "push" in error_msg_lower:
error_context["subcategory"] = "image_push_failed"
error_context["solution"] = f"Check registry connectivity and push permissions to {error_context['registry_url']}."
elif "registry" in error_msg_lower:
error_context["category"] = "registry_error"
error_context["solution"] = "Check registry configuration and accessibility."
elif "prefect" in error_msg_lower:
error_context["category"] = "prefect_error"
error_context["solution"] = "Check Prefect server connectivity and deployment configuration."
else:
error_context["category"] = "unknown_deployment_error"
error_context["solution"] = "Check logs for more specific error details."
logger.error(f"Deployment error context: {error_context}")
# Raise enhanced exception with context
enhanced_error = Exception(f"Deployment failed for workflow '{name}': {str(e)} | Context: {error_context}")
enhanced_error.original_error = e
enhanced_error.context = error_context
raise enhanced_error
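
The error categorization in the `except` branch above is order-dependent: the registry connectivity check runs before the generic Docker check, which in turn runs before the generic registry check. A minimal sketch of that ordering, using a hypothetical helper name `categorize_deployment_error` and only the top-level categories from the removed code:

```python
def categorize_deployment_error(message: str) -> str:
    """Hypothetical helper mirroring the category order used in _deploy_workflow."""
    msg = message.lower()
    if "registry" in msg and ("no such host" in msg or "connection" in msg):
        return "registry_connectivity_error"
    if "docker" in msg:
        return "docker_error"
    if "registry" in msg:
        return "registry_error"
    if "prefect" in msg:
        return "prefect_error"
    return "unknown_deployment_error"


print(categorize_deployment_error("lookup registry: no such host"))  # registry_connectivity_error
print(categorize_deployment_error("Docker build failed"))            # docker_error
```

Because matching is by substring, a message that mentions both "docker" and "registry" without a connectivity phrase is classified as `docker_error`.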
async def submit_workflow(
self,
workflow_name: str,
target_path: str,
volume_mode: str = "ro",
parameters: Dict[str, Any] = None,
resource_limits: Dict[str, str] = None,
additional_volumes: list = None,
timeout: int = None
) -> FlowRun:
"""
Submit a workflow for execution with volume mounting.
Args:
workflow_name: Name of the workflow to execute
target_path: Host path to mount as volume
volume_mode: Volume mount mode ("ro" for read-only, "rw" for read-write)
parameters: Workflow-specific parameters
resource_limits: CPU/memory limits for container
additional_volumes: List of additional volume mounts
timeout: Timeout in seconds
Returns:
FlowRun object with run information
Raises:
ValueError: If workflow not found or volume mode not supported
"""
if workflow_name not in self.workflows:
raise ValueError(f"Unknown workflow: {workflow_name}")
# Validate volume mode
workflow_info = self.workflows[workflow_name]
supported_modes = workflow_info.metadata.get("supported_volume_modes", ["ro", "rw"])
if volume_mode not in supported_modes:
raise ValueError(
f"Workflow '{workflow_name}' doesn't support volume mode '{volume_mode}'. "
f"Supported modes: {supported_modes}"
)
# Validate target path with security checks
self._validate_target_path(target_path)
# Validate additional volumes if provided
if additional_volumes:
for volume in additional_volumes:
self._validate_target_path(volume.host_path)
async with get_client() as client:
# Get the deployment, auto-redeploy once if missing
try:
deployment = await client.read_deployment_by_name(
f"{workflow_name}/{workflow_name}-deployment"
)
except Exception as e:
import traceback
logger.error(f"Failed to find deployment for workflow '{workflow_name}': {e}")
logger.error(f"Deployment lookup traceback: {traceback.format_exc()}")
# Attempt a one-time auto-deploy to recover from startup races
try:
logger.info(f"Auto-deploying missing workflow '{workflow_name}' and retrying...")
await self._deploy_workflow(workflow_name, workflow_info)
deployment = await client.read_deployment_by_name(
f"{workflow_name}/{workflow_name}-deployment"
)
except Exception as redeploy_exc:
# Enhanced error with context
error_context = {
"workflow_name": workflow_name,
"error_type": type(e).__name__,
"error_message": str(e),
"redeploy_error": str(redeploy_exc),
"available_deployments": list(self.deployments.keys()),
}
enhanced_error = ValueError(
f"Deployment not found and redeploy failed for workflow '{workflow_name}': {e} | Context: {error_context}"
)
enhanced_error.context = error_context
raise enhanced_error
# Determine the Docker Compose network name and volume names
# Hardcoded to 'fuzzforge' to avoid directory name dependencies
import os
compose_project = "fuzzforge"
docker_network = "fuzzforge_default"
# Build volume mounts
# Add toolbox volume mount for workflow code access
backend_toolbox_path = "/app/toolbox" # Path in backend container
# Hardcoded volume names
prefect_storage_volume = "fuzzforge_prefect_storage"
toolbox_code_volume = "fuzzforge_toolbox_code"
volumes = [
f"{target_path}:/workspace:{volume_mode}",
f"{prefect_storage_volume}:/prefect-storage", # Shared storage for results
f"{toolbox_code_volume}:/opt/prefect/toolbox:ro" # Mount workflow code
]
# Add additional volumes if provided
if additional_volumes:
for volume in additional_volumes:
volume_spec = f"{volume.host_path}:{volume.container_path}:{volume.mode}"
volumes.append(volume_spec)
# Build environment variables
env_vars = {
"PREFECT_API_URL": "http://prefect-server:4200/api", # Use internal network hostname
"PREFECT_LOGGING_LEVEL": "INFO",
"PREFECT_LOCAL_STORAGE_PATH": "/prefect-storage", # Use shared storage
"PREFECT_RESULTS_PERSIST_BY_DEFAULT": "true", # Enable result persistence
"PREFECT_DEFAULT_RESULT_STORAGE_BLOCK": "local-file-system/fuzzforge-results", # Use our storage block
"WORKSPACE_PATH": "/workspace",
"VOLUME_MODE": volume_mode,
"WORKFLOW_NAME": workflow_name
}
# Add additional volume paths to environment for easy access
if additional_volumes:
for i, volume in enumerate(additional_volumes):
env_vars[f"ADDITIONAL_VOLUME_{i}_PATH"] = volume.container_path
# Determine which image to use based on workflow configuration
workflow_info = self.workflows[workflow_name]
has_custom_dockerfile = workflow_info.has_docker and workflow_info.dockerfile.exists()
# Use pull context for worker to pull from registry
registry_url = get_registry_url(context="pull")
workflow_image = f"{registry_url}/fuzzforge/{workflow_name}:latest" if has_custom_dockerfile else "prefecthq/prefect:3-python3.11"
logger.debug(f"Worker will pull image: {workflow_image} (Registry: {registry_url})")
# Configure job variables with volume mounting and network access
job_variables = {
# Use custom image if available, otherwise base Prefect image
"image": workflow_image,
"volumes": volumes,
"networks": [docker_network], # Connect to Docker Compose network
"env": {
**env_vars,
"PYTHONPATH": "/opt/prefect/toolbox:/opt/prefect/toolbox/workflows",
"WORKFLOW_NAME": workflow_name
}
}
# Apply resource requirements from workflow metadata and user overrides
workflow_resource_requirements = self._extract_resource_requirements(workflow_info)
final_resource_config = {}
# Start with workflow requirements as base
if workflow_resource_requirements:
final_resource_config.update(workflow_resource_requirements)
# Apply user-provided resource limits (overrides workflow defaults)
if resource_limits:
user_resource_config = {}
if resource_limits.get("cpu_limit"):
user_resource_config["cpus"] = resource_limits["cpu_limit"]
if resource_limits.get("memory_limit"):
user_resource_config["memory"] = resource_limits["memory_limit"]
# Note: cpu_request and memory_request are not directly supported by Docker
# but could be used for Kubernetes in the future
# User overrides take precedence
final_resource_config.update(user_resource_config)
# Apply final resource configuration
if final_resource_config:
job_variables["resources"] = final_resource_config
logger.info(f"Applied resource limits: {final_resource_config}")
# Merge parameters with defaults from metadata
default_params = workflow_info.metadata.get("default_parameters", {})
final_params = {**default_params, **(parameters or {})}
# Set flow parameters that match the flow signature
final_params["target_path"] = "/workspace" # Container path where volume is mounted
final_params["volume_mode"] = volume_mode
# Create and submit the flow run
# Pass job_variables to ensure network, volumes, and environment are configured
logger.info(f"Submitting flow with job_variables: {job_variables}")
logger.info(f"Submitting flow with parameters: {final_params}")
# Prepare flow run creation parameters
flow_run_params = {
"deployment_id": deployment.id,
"parameters": final_params,
"job_variables": job_variables
}
# Note: Timeout is handled through workflow-level configuration
# Additional timeout configuration can be added to deployment metadata if needed
flow_run = await client.create_flow_run_from_deployment(**flow_run_params)
logger.info(
f"Submitted workflow '{workflow_name}' with run_id: {flow_run.id}, "
f"target: {target_path}, mode: {volume_mode}"
)
return flow_run
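
The resource precedence in `submit_workflow` (workflow metadata first, user-supplied limits on top) can be illustrated in isolation. This is a sketch under assumptions: `merge_resource_config` is a hypothetical name, and the sample values are made up for the example.

```python
from typing import Dict, Optional


def merge_resource_config(
    workflow_requirements: Dict[str, str],
    resource_limits: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: workflow defaults, overridden by user limits."""
    final = dict(workflow_requirements)
    limits = resource_limits or {}
    # Only cpu_limit and memory_limit are mapped; *_request keys are ignored,
    # as noted in the comments of the removed code.
    if limits.get("cpu_limit"):
        final["cpus"] = limits["cpu_limit"]
    if limits.get("memory_limit"):
        final["memory"] = limits["memory_limit"]
    return final


merged = merge_resource_config(
    {"memory": "512Mi", "cpus": "500m", "timeout": "1800"},
    {"memory_limit": "1Gi"},
)
print(merged)  # {'memory': '1Gi', 'cpus': '500m', 'timeout': '1800'}
```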
async def get_flow_run_findings(self, run_id: str) -> Dict[str, Any]:
"""
Retrieve findings from a completed flow run.
Args:
run_id: The flow run ID
Returns:
Dictionary containing SARIF-formatted findings
Raises:
ValueError: If run not completed or not found
"""
async with get_client() as client:
flow_run = await client.read_flow_run(run_id)
if not flow_run.state.is_completed():
raise ValueError(
f"Flow run {run_id} not completed. Current status: {flow_run.state.name}"
)
# Get the findings from the flow run result
try:
findings = await flow_run.state.result()
return findings
except Exception as e:
logger.error(f"Failed to retrieve findings for run {run_id}: {e}")
raise ValueError(f"Failed to retrieve findings: {e}")
async def get_flow_run_status(self, run_id: str) -> Dict[str, Any]:
"""
Get the current status of a flow run.
Args:
run_id: The flow run ID
Returns:
Dictionary with status information
"""
async with get_client() as client:
flow_run = await client.read_flow_run(run_id)
return {
"run_id": str(flow_run.id),
"workflow": flow_run.deployment_id,
"status": flow_run.state.name,
"is_completed": flow_run.state.is_completed(),
"is_failed": flow_run.state.is_failed(),
"is_running": flow_run.state.is_running(),
"created_at": flow_run.created,
"updated_at": flow_run.updated
}
def _validate_target_path(self, target_path: str) -> None:
"""
Validate target path for security before mounting as volume.
Args:
target_path: Host path to validate
Raises:
ValueError: If path is not allowed for security reasons
"""
target = Path(target_path)
# Path must be absolute
if not target.is_absolute():
raise ValueError(f"Target path must be absolute: {target_path}")
# Resolve path to handle symlinks and relative components
try:
resolved_path = target.resolve()
except (OSError, RuntimeError) as e:
raise ValueError(f"Cannot resolve target path: {target_path} - {e}")
resolved_str = str(resolved_path)
# Check against forbidden paths first (more restrictive)
for forbidden in self.forbidden_paths:
if resolved_str.startswith(forbidden):
raise ValueError(
f"Access denied: Path '{target_path}' resolves to forbidden directory '{forbidden}'. "
f"This path contains sensitive system files and cannot be mounted."
)
# Check if path starts with any allowed base path
path_allowed = False
for allowed in self.allowed_base_paths:
if resolved_str.startswith(allowed):
path_allowed = True
break
if not path_allowed:
allowed_list = ", ".join(self.allowed_base_paths)
raise ValueError(
f"Access denied: Path '{target_path}' is not in allowed directories. "
f"Allowed base paths: {allowed_list}"
)
# Additional security checks
if resolved_str == "/":
raise ValueError("Cannot mount root filesystem")
# Warn if path doesn't exist (but don't block - it might be created later)
if not resolved_path.exists():
logger.warning(f"Target path does not exist: {target_path}")
logger.info(f"Path validation passed for: {target_path} -> {resolved_str}")
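
One caveat about prefix checks like the ones in `_validate_target_path`: a plain string test such as `startswith` also matches sibling directories (`/etcetera` begins with `/etc`, and `/tmpfoo` begins with `/tmp`). A component-wise comparison avoids this. The sketch below uses a hypothetical helper `is_within` and `PurePosixPath`, so it does not touch the filesystem; real validation would still need to resolve symlinks first, as the method above does.

```python
from pathlib import PurePosixPath


def is_within(path: str, base: str) -> bool:
    """Hypothetical helper: True if `path` equals `base` or lies underneath it."""
    p, b = PurePosixPath(path), PurePosixPath(base)
    return p == b or b in p.parents


print("/etcetera/data".startswith("/etc"))  # True (string prefix match)
print(is_within("/etcetera/data", "/etc"))  # False
print(is_within("/etc/passwd", "/etc"))     # True
```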
+10 -367
@@ -1,5 +1,5 @@
"""
Setup utilities for Prefect infrastructure
Setup utilities for FuzzForge infrastructure
"""
# Copyright (c) 2025 FuzzingLabs
@@ -14,364 +14,21 @@ Setup utilities for Prefect infrastructure
# Additional attribution and requirements are provided in the NOTICE file.
import logging
from prefect import get_client
from prefect.client.schemas.actions import WorkPoolCreate
from prefect.client.schemas.objects import WorkPool
from .prefect_manager import get_registry_url
logger = logging.getLogger(__name__)
async def setup_docker_pool():
"""
Create or update the Docker work pool for container execution.
This work pool is configured to:
- Connect to the local Docker daemon
- Support volume mounting at runtime
- Clean up containers after execution
- Use bridge networking by default
"""
import os
async with get_client() as client:
pool_name = "docker-pool"
# Add force recreation flag for debugging fresh install issues
force_recreate = os.getenv('FORCE_RECREATE_WORK_POOL', 'false').lower() == 'true'
debug_setup = os.getenv('DEBUG_WORK_POOL_SETUP', 'false').lower() == 'true'
if force_recreate:
logger.warning(f"FORCE_RECREATE_WORK_POOL=true - Will recreate work pool regardless of existing configuration")
if debug_setup:
logger.warning(f"DEBUG_WORK_POOL_SETUP=true - Enhanced logging enabled")
# Temporarily set logging level to DEBUG for this function
original_level = logger.level
logger.setLevel(logging.DEBUG)
try:
# Check if pool already exists and supports custom images
existing_pools = await client.read_work_pools()
existing_pool = None
for pool in existing_pools:
if pool.name == pool_name:
existing_pool = pool
break
if existing_pool and not force_recreate:
logger.info(f"Found existing work pool '{pool_name}' - validating configuration...")
# Check if the existing pool has the correct configuration
base_template = existing_pool.base_job_template or {}
logger.debug(f"Base template keys: {list(base_template.keys())}")
job_config = base_template.get("job_configuration", {})
logger.debug(f"Job config keys: {list(job_config.keys())}")
image_config = job_config.get("image", "")
has_image_variable = "{{ image }}" in str(image_config)
logger.debug(f"Image config: '{image_config}' -> has_image_variable: {has_image_variable}")
# Check if volume defaults include toolbox mount
variables = base_template.get("variables", {})
properties = variables.get("properties", {})
volume_config = properties.get("volumes", {})
volume_defaults = volume_config.get("default", [])
has_toolbox_volume = any("toolbox_code" in str(vol) for vol in volume_defaults) if volume_defaults else False
logger.debug(f"Volume defaults: {volume_defaults}")
logger.debug(f"Has toolbox volume: {has_toolbox_volume}")
# Check if environment defaults include required settings
env_config = properties.get("env", {})
env_defaults = env_config.get("default", {})
has_api_url = "PREFECT_API_URL" in env_defaults
has_storage_path = "PREFECT_LOCAL_STORAGE_PATH" in env_defaults
has_results_persist = "PREFECT_RESULTS_PERSIST_BY_DEFAULT" in env_defaults
has_required_env = has_api_url and has_storage_path and has_results_persist
logger.debug(f"Environment defaults: {env_defaults}")
logger.debug(f"Has API URL: {has_api_url}, Has storage path: {has_storage_path}, Has results persist: {has_results_persist}")
logger.debug(f"Has required env: {has_required_env}")
# Log the full validation result
logger.info(f"Work pool validation - Image: {has_image_variable}, Toolbox: {has_toolbox_volume}, Environment: {has_required_env}")
if has_image_variable and has_toolbox_volume and has_required_env:
logger.info(f"Docker work pool '{pool_name}' already exists with correct configuration")
return
else:
reasons = []
if not has_image_variable:
reasons.append("missing image template")
if not has_toolbox_volume:
reasons.append("missing toolbox volume mount")
if not has_required_env:
if not has_api_url:
reasons.append("missing PREFECT_API_URL")
if not has_storage_path:
reasons.append("missing PREFECT_LOCAL_STORAGE_PATH")
if not has_results_persist:
reasons.append("missing PREFECT_RESULTS_PERSIST_BY_DEFAULT")
logger.warning(f"Docker work pool '{pool_name}' exists but lacks: {', '.join(reasons)}. Recreating...")
# Delete the old pool and recreate it
try:
await client.delete_work_pool(pool_name)
logger.info(f"Deleted old work pool '{pool_name}'")
except Exception as e:
logger.warning(f"Failed to delete old work pool: {e}")
elif force_recreate and existing_pool:
logger.warning(f"Force recreation enabled - deleting existing work pool '{pool_name}'")
try:
await client.delete_work_pool(pool_name)
logger.info(f"Deleted existing work pool for force recreation")
except Exception as e:
logger.warning(f"Failed to delete work pool for force recreation: {e}")
logger.info(f"Creating Docker work pool '{pool_name}' with custom image support...")
# Create the work pool with proper Docker configuration
work_pool = WorkPoolCreate(
name=pool_name,
type="docker",
description="Docker work pool for FuzzForge workflows with custom image support",
base_job_template={
"job_configuration": {
"image": "{{ image }}", # Template variable for custom images
"volumes": "{{ volumes }}", # List of volume mounts
"env": "{{ env }}", # Environment variables
"networks": "{{ networks }}", # Docker networks
"stream_output": True,
"auto_remove": True,
"privileged": False,
"network_mode": None, # Use networks instead
"labels": {},
"command": None # Let the image's CMD/ENTRYPOINT run
},
"variables": {
"type": "object",
"properties": {
"image": {
"type": "string",
"title": "Docker Image",
"default": "prefecthq/prefect:3-python3.11",
"description": "Docker image for the flow run"
},
"volumes": {
"type": "array",
"title": "Volume Mounts",
"default": [
"fuzzforge_prefect_storage:/prefect-storage",
"fuzzforge_toolbox_code:/opt/prefect/toolbox:ro"
],
"description": "Volume mounts in format 'host:container:mode'",
"items": {
"type": "string"
}
},
"networks": {
"type": "array",
"title": "Docker Networks",
"default": ["fuzzforge_default"],
"description": "Docker networks to connect container to",
"items": {
"type": "string"
}
},
"env": {
"type": "object",
"title": "Environment Variables",
"default": {
"PREFECT_API_URL": "http://prefect-server:4200/api",
"PREFECT_LOCAL_STORAGE_PATH": "/prefect-storage",
"PREFECT_RESULTS_PERSIST_BY_DEFAULT": "true"
},
"description": "Environment variables for the container",
"additionalProperties": {
"type": "string"
}
}
}
}
}
)
await client.create_work_pool(work_pool)
logger.info(f"Created Docker work pool '{pool_name}'")
except Exception as e:
logger.error(f"Failed to setup Docker work pool: {e}")
raise
finally:
# Restore original logging level if debug mode was enabled
if debug_setup and 'original_level' in locals():
logger.setLevel(original_level)
def get_actual_compose_project_name():
"""
Return the hardcoded compose project name for FuzzForge.
Always returns 'fuzzforge' as per system requirements.
"""
logger.info("Using hardcoded compose project name: fuzzforge")
return "fuzzforge"
async def setup_result_storage():
"""
Create or update Prefect result storage block for findings persistence.
Set up result storage (MinIO).
This sets up a LocalFileSystem storage block pointing to the shared
/prefect-storage volume for result persistence.
MinIO is used for both target upload and result storage.
This is a placeholder for any MinIO-specific setup if needed.
"""
from prefect.filesystems import LocalFileSystem
storage_name = "fuzzforge-results"
try:
# Create the storage block, overwrite if it exists
logger.info(f"Setting up storage block '{storage_name}'...")
storage = LocalFileSystem(basepath="/prefect-storage")
block_doc_id = await storage.save(name=storage_name, overwrite=True)
logger.info(f"Storage block '{storage_name}' configured successfully")
return str(block_doc_id)
except Exception as e:
logger.error(f"Failed to setup result storage: {e}")
# Don't raise the exception - continue without storage block
logger.warning("Continuing without result storage block - findings may not persist")
return None
async def validate_docker_connection():
"""
Validate that Docker is accessible and running.
Note: In containerized deployments with Docker socket proxy,
the backend doesn't need direct Docker access.
Raises:
RuntimeError: If Docker is not accessible
"""
import os
# Skip Docker validation if running in container without socket access
if os.path.exists("/.dockerenv") and not os.path.exists("/var/run/docker.sock"):
logger.info("Running in container without Docker socket - skipping Docker validation")
return
try:
import docker
client = docker.from_env()
client.ping()
logger.info("Docker connection validated")
except Exception as e:
logger.error(f"Docker is not accessible: {e}")
raise RuntimeError(
"Docker is not running or not accessible. "
"Please ensure Docker is installed and running."
)
async def validate_registry_connectivity(registry_url: str = None):
"""
Validate that the Docker registry is accessible.
Args:
registry_url: URL of the Docker registry to validate (auto-detected if None)
Raises:
RuntimeError: If registry is not accessible
"""
# Resolve a reachable test URL from within this process
if registry_url is None:
# If not specified, prefer internal service name in containers, host port on host
import os
if os.path.exists('/.dockerenv'):
registry_url = "registry:5000"
else:
registry_url = "localhost:5001"
# If we're running inside a container and asked to probe localhost:PORT,
# the probe would hit the container, not the host. Use host.docker.internal instead.
import os
try:
host_part, port_part = registry_url.split(":", 1)
except ValueError:
host_part, port_part = registry_url, "80"
if os.path.exists('/.dockerenv') and host_part in ("localhost", "127.0.0.1"):
test_host = "host.docker.internal"
else:
test_host = host_part
test_url = f"http://{test_host}:{port_part}/v2/"
import aiohttp
import asyncio
logger.info(f"Validating registry connectivity to {registry_url}...")
try:
async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=10)) as session:
async with session.get(test_url) as response:
if response.status == 200:
logger.info(f"Registry at {registry_url} is accessible (tested via {test_host})")
return
else:
raise RuntimeError(f"Registry returned status {response.status}")
except asyncio.TimeoutError:
raise RuntimeError(f"Registry at {registry_url} is not responding (timeout)")
except aiohttp.ClientError as e:
raise RuntimeError(f"Registry at {registry_url} is not accessible: {e}")
except Exception as e:
raise RuntimeError(f"Failed to validate registry connectivity: {e}")
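
The host and port resolution in `validate_registry_connectivity` can be separated from the HTTP probe, which makes it easy to check on its own. A minimal sketch, assuming a hypothetical pure function `build_registry_probe_url` whose `in_container` flag stands in for the `/.dockerenv` check:

```python
def build_registry_probe_url(registry_url: str, in_container: bool) -> str:
    """Hypothetical helper: derive the /v2/ probe URL from 'host:port'."""
    try:
        host, port = registry_url.split(":", 1)
    except ValueError:
        # No port given; the removed code falls back to port 80.
        host, port = registry_url, "80"
    # Inside a container, localhost would point at the container itself.
    if in_container and host in ("localhost", "127.0.0.1"):
        host = "host.docker.internal"
    return f"http://{host}:{port}/v2/"


print(build_registry_probe_url("localhost:5001", in_container=True))
# http://host.docker.internal:5001/v2/
print(build_registry_probe_url("registry:5000", in_container=True))
# http://registry:5000/v2/
```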
async def validate_docker_network(network_name: str):
"""
Validate that the specified Docker network exists.
Args:
network_name: Name of the Docker network to validate
Raises:
RuntimeError: If network doesn't exist
"""
import os
# Skip network validation if running in container without Docker socket
if os.path.exists("/.dockerenv") and not os.path.exists("/var/run/docker.sock"):
logger.info("Running in container without Docker socket - skipping network validation")
return
try:
import docker
client = docker.from_env()
# List all networks
networks = client.networks.list(names=[network_name])
if not networks:
# Try to find networks with similar names
all_networks = client.networks.list()
similar_networks = [n.name for n in all_networks if "fuzzforge" in n.name.lower()]
error_msg = f"Docker network '{network_name}' not found."
if similar_networks:
error_msg += f" Available networks: {similar_networks}"
else:
error_msg += " Please ensure Docker Compose is running."
raise RuntimeError(error_msg)
logger.info(f"Docker network '{network_name}' validated")
except Exception as e:
if isinstance(e, RuntimeError):
raise
logger.error(f"Network validation failed: {e}")
raise RuntimeError(f"Failed to validate Docker network: {e}")
logger.info("Result storage (MinIO) configured")
# MinIO is configured via environment variables in docker-compose
# No additional setup needed here
return True
async def validate_infrastructure():
@@ -382,21 +39,7 @@ async def validate_infrastructure():
"""
logger.info("Validating infrastructure...")
# Validate Docker connection
await validate_docker_connection()
# Validate registry connectivity for custom image building
await validate_registry_connectivity()
# Validate network (hardcoded to avoid directory name dependencies)
import os
compose_project = "fuzzforge"
docker_network = "fuzzforge_default"
try:
await validate_docker_network(docker_network)
except RuntimeError as e:
logger.warning(f"Network validation failed: {e}")
logger.warning("Workflows may not be able to connect to Prefect services")
# Setup storage (MinIO)
await setup_result_storage()
logger.info("Infrastructure validation completed")
-459
@@ -1,459 +0,0 @@
"""
Workflow Discovery - Registry-based discovery and loading of workflows
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import logging
import yaml
from pathlib import Path
from typing import Dict, Optional, Any, Callable
from pydantic import BaseModel, Field, ConfigDict
logger = logging.getLogger(__name__)
class WorkflowInfo(BaseModel):
"""Information about a discovered workflow"""
name: str = Field(..., description="Workflow name")
path: Path = Field(..., description="Path to workflow directory")
workflow_file: Path = Field(..., description="Path to workflow.py file")
dockerfile: Path = Field(..., description="Path to Dockerfile")
has_docker: bool = Field(..., description="Whether workflow has custom Dockerfile")
metadata: Dict[str, Any] = Field(..., description="Workflow metadata from YAML")
flow_function_name: str = Field(default="main_flow", description="Name of the flow function")
model_config = ConfigDict(arbitrary_types_allowed=True)
class WorkflowDiscovery:
"""
Discovers workflows from the filesystem and validates them against the registry.
This system:
1. Scans for workflows with metadata.yaml files
2. Cross-references them with the manual registry
3. Provides registry-based flow functions for deployment
Workflows must have:
- workflow.py: Contains the Prefect flow
- metadata.yaml: Mandatory metadata file
- Entry in toolbox/workflows/registry.py: Manual registration
- Dockerfile: Mandatory container definition (loading fails without it)
- requirements.txt (optional): Python dependencies
"""
def __init__(self, workflows_dir: Path):
"""
Initialize workflow discovery.
Args:
workflows_dir: Path to the workflows directory
"""
self.workflows_dir = workflows_dir
if not self.workflows_dir.exists():
self.workflows_dir.mkdir(parents=True, exist_ok=True)
logger.info(f"Created workflows directory: {self.workflows_dir}")
# Import registry - this validates it on import
try:
from toolbox.workflows.registry import WORKFLOW_REGISTRY, list_registered_workflows
self.registry = WORKFLOW_REGISTRY
logger.info(f"Loaded workflow registry with {len(self.registry)} registered workflows")
except ImportError as e:
logger.error(f"Failed to import workflow registry: {e}")
self.registry = {}
except Exception as e:
logger.error(f"Registry validation failed: {e}")
self.registry = {}
# Cache for discovered workflows
self._workflow_cache: Optional[Dict[str, WorkflowInfo]] = None
self._cache_timestamp: Optional[float] = None
self._cache_ttl = 60.0 # Cache TTL in seconds
async def discover_workflows(self) -> Dict[str, WorkflowInfo]:
"""
Discover workflows by cross-referencing filesystem with registry.
Uses caching to avoid frequent filesystem scans.
Returns:
Dictionary mapping workflow names to their information
"""
# Check cache validity
import time
current_time = time.time()
if (self._workflow_cache is not None and
self._cache_timestamp is not None and
(current_time - self._cache_timestamp) < self._cache_ttl):
# Return cached results
logger.debug(f"Returning cached workflow discovery ({len(self._workflow_cache)} workflows)")
return self._workflow_cache
workflows = {}
discovered_dirs = set()
registry_names = set(self.registry.keys())
if not self.workflows_dir.exists():
logger.warning(f"Workflows directory does not exist: {self.workflows_dir}")
return workflows
# Recursively scan all directories and subdirectories
await self._scan_directory_recursive(self.workflows_dir, workflows, discovered_dirs)
# Check for registry entries without corresponding directories
missing_dirs = registry_names - discovered_dirs
if missing_dirs:
logger.warning(
f"Registry contains workflows without filesystem directories: {missing_dirs}. "
f"These workflows cannot be deployed."
)
logger.info(
f"Discovery complete: {len(workflows)} workflows ready for deployment, "
f"{len(missing_dirs)} registry entries missing directories, "
f"{len(discovered_dirs - registry_names)} filesystem workflows not registered"
)
# Update cache
self._workflow_cache = workflows
self._cache_timestamp = current_time
return workflows
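The time-based caching used by `discover_workflows` can be illustrated in isolation. This is a minimal sketch, not the removed implementation: the `TTLCache` class, its `loader` callback, and the injectable `clock` are hypothetical names introduced here, and it uses `time.monotonic` where the original code calls `time.time()`.

```python
import time
from typing import Callable, Dict, Optional


class TTLCache:
    """Hypothetical helper: caches a loader's result for `ttl` seconds."""

    def __init__(
        self,
        loader: Callable[[], Dict[str, str]],
        ttl: float = 60.0,  # same default as _cache_ttl in the diff
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl
        self._clock = clock
        self._value: Optional[Dict[str, str]] = None
        self._stamp: Optional[float] = None

    def get(self) -> Dict[str, str]:
        now = self._clock()
        fresh = (
            self._value is not None
            and self._stamp is not None
            and (now - self._stamp) < self._ttl
        )
        if fresh:
            return self._value  # cached result is still within the TTL
        self._value = self._loader()  # rescan and remember when we did it
        self._stamp = now
        return self._value

    def invalidate(self) -> None:
        """Drop the cached value so the next get() reloads."""
        self._value = None
        self._stamp = None
```

Passing the clock as a parameter is only a testing convenience; it lets a caller advance time without sleeping.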
async def _scan_directory_recursive(self, directory: Path, workflows: Dict[str, WorkflowInfo], discovered_dirs: set):
"""
Recursively scan directory for workflows.
Args:
directory: Directory to scan
workflows: Dictionary to populate with discovered workflows
discovered_dirs: Set to track discovered workflow names
"""
for item in directory.iterdir():
if not item.is_dir():
continue
if item.name.startswith('_') or item.name.startswith('.'):
continue # Skip hidden or private directories
# Check if this directory contains workflow files (workflow.py and metadata.yaml)
workflow_file = item / "workflow.py"
metadata_file = item / "metadata.yaml"
if workflow_file.exists() and metadata_file.exists():
# This is a workflow directory
workflow_name = item.name
discovered_dirs.add(workflow_name)
# Only process workflows that are in the registry
if workflow_name not in self.registry:
logger.warning(
f"Workflow '{workflow_name}' found in filesystem but not in registry. "
f"Add it to toolbox/workflows/registry.py to enable deployment."
)
continue
try:
workflow_info = await self._load_workflow(item)
if workflow_info:
workflows[workflow_info.name] = workflow_info
logger.info(f"Discovered and registered workflow: {workflow_info.name}")
except Exception as e:
logger.error(f"Failed to load workflow from {item}: {e}")
else:
# This is a category directory, recurse into it
await self._scan_directory_recursive(item, workflows, discovered_dirs)
async def _load_workflow(self, workflow_dir: Path) -> Optional[WorkflowInfo]:
"""
Load and validate a single workflow.
Args:
workflow_dir: Path to the workflow directory
Returns:
WorkflowInfo if valid, None otherwise
"""
workflow_name = workflow_dir.name
# Check for mandatory files
workflow_file = workflow_dir / "workflow.py"
metadata_file = workflow_dir / "metadata.yaml"
if not workflow_file.exists():
logger.warning(f"Workflow {workflow_name} missing workflow.py")
return None
if not metadata_file.exists():
logger.error(f"Workflow {workflow_name} missing mandatory metadata.yaml")
return None
# Load and validate metadata
try:
metadata = self._load_metadata(metadata_file)
if not self._validate_metadata(metadata, workflow_name):
return None
except Exception as e:
logger.error(f"Failed to load metadata for {workflow_name}: {e}")
return None
# Check for mandatory Dockerfile
dockerfile = workflow_dir / "Dockerfile"
if not dockerfile.exists():
logger.error(f"Workflow {workflow_name} missing mandatory Dockerfile")
return None
has_docker = True # Always True since Dockerfile is mandatory
# Get flow function name from metadata or use default
flow_function_name = metadata.get("flow_function", "main_flow")
return WorkflowInfo(
name=workflow_name,
path=workflow_dir,
workflow_file=workflow_file,
dockerfile=dockerfile,
has_docker=has_docker,
metadata=metadata,
flow_function_name=flow_function_name
)
def _load_metadata(self, metadata_file: Path) -> Dict[str, Any]:
"""
Load metadata from YAML file.
Args:
metadata_file: Path to metadata.yaml
Returns:
Dictionary containing metadata
"""
with open(metadata_file, 'r') as f:
metadata = yaml.safe_load(f)
if metadata is None:
raise ValueError("Empty metadata file")
return metadata
def _validate_metadata(self, metadata: Dict[str, Any], workflow_name: str) -> bool:
"""
Validate that metadata contains all required fields.
Args:
metadata: Metadata dictionary
workflow_name: Name of the workflow for logging
Returns:
True if valid, False otherwise
"""
required_fields = ["name", "version", "description", "author", "category", "parameters", "requirements"]
missing_fields = []
for field in required_fields:
if field not in metadata:
missing_fields.append(field)
if missing_fields:
logger.error(
f"Workflow {workflow_name} metadata missing required fields: {missing_fields}"
)
return False
# Validate version format (semantic versioning)
version = metadata.get("version", "")
if not self._is_valid_version(version):
logger.error(f"Workflow {workflow_name} has invalid version format: {version}")
return False
# Validate parameters structure
parameters = metadata.get("parameters", {})
if not isinstance(parameters, dict):
logger.error(f"Workflow {workflow_name} parameters must be a dictionary")
return False
return True
def _is_valid_version(self, version: str) -> bool:
"""
Check if version follows semantic versioning (x.y.z).
Args:
version: Version string
Returns:
True if valid semantic version
"""
try:
parts = version.split('.')
if len(parts) != 3:
return False
for part in parts:
int(part) # Check if each part is a number
return True
except (ValueError, AttributeError):
return False
def invalidate_cache(self) -> None:
"""
Invalidate the workflow discovery cache.
Useful when workflows are added or modified.
"""
self._workflow_cache = None
self._cache_timestamp = None
logger.debug("Workflow discovery cache invalidated")
def get_flow_function(self, workflow_name: str) -> Optional[Callable]:
"""
Get the flow function from the registry.
Args:
workflow_name: Name of the workflow
Returns:
The flow function if found in registry, None otherwise
"""
if workflow_name not in self.registry:
logger.error(
f"Workflow '{workflow_name}' not found in registry. "
f"Available workflows: {list(self.registry.keys())}"
)
return None
try:
from toolbox.workflows.registry import get_workflow_flow
flow_func = get_workflow_flow(workflow_name)
logger.debug(f"Retrieved flow function for '{workflow_name}' from registry")
return flow_func
except Exception as e:
logger.error(f"Failed to get flow function for '{workflow_name}': {e}")
return None
def get_registry_info(self, workflow_name: str) -> Optional[Dict[str, Any]]:
"""
Get registry information for a workflow.
Args:
workflow_name: Name of the workflow
Returns:
Registry information if found, None otherwise
"""
if workflow_name not in self.registry:
return None
try:
from toolbox.workflows.registry import get_workflow_info
return get_workflow_info(workflow_name)
except Exception as e:
logger.error(f"Failed to get registry info for '{workflow_name}': {e}")
return None
@staticmethod
def get_metadata_schema() -> Dict[str, Any]:
"""
Get the JSON schema for workflow metadata.
Returns:
JSON schema dictionary
"""
return {
"type": "object",
"required": ["name", "version", "description", "author", "category", "parameters", "requirements"],
"properties": {
"name": {
"type": "string",
"description": "Workflow name"
},
"version": {
"type": "string",
"pattern": "^\\d+\\.\\d+\\.\\d+$",
"description": "Semantic version (x.y.z)"
},
"description": {
"type": "string",
"description": "Workflow description"
},
"author": {
"type": "string",
"description": "Workflow author"
},
"category": {
"type": "string",
"enum": ["comprehensive", "specialized", "fuzzing", "focused"],
"description": "Workflow category"
},
"tags": {
"type": "array",
"items": {"type": "string"},
"description": "Workflow tags for categorization"
},
"requirements": {
"type": "object",
"required": ["tools", "resources"],
"properties": {
"tools": {
"type": "array",
"items": {"type": "string"},
"description": "Required security tools"
},
"resources": {
"type": "object",
"required": ["memory", "cpu", "timeout"],
"properties": {
"memory": {
"type": "string",
"pattern": "^\\d+[GMK]i$",
"description": "Memory limit (e.g., 1Gi, 512Mi)"
},
"cpu": {
"type": "string",
"pattern": "^\\d+m?$",
"description": "CPU limit (e.g., 1000m, 2)"
},
"timeout": {
"type": "integer",
"minimum": 60,
"maximum": 7200,
"description": "Workflow timeout in seconds"
}
}
}
}
},
"parameters": {
"type": "object",
"description": "Workflow parameters schema"
},
"default_parameters": {
"type": "object",
"description": "Default parameter values"
},
"required_modules": {
"type": "array",
"items": {"type": "string"},
"description": "Required module names"
},
"supported_volume_modes": {
"type": "array",
"items": {"enum": ["ro", "rw"]},
"default": ["ro", "rw"],
"description": "Supported volume mount modes"
},
"flow_function": {
"type": "string",
"default": "main_flow",
"description": "Name of the flow function in workflow.py"
}
}
}
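The metadata checks in the removed file can be condensed into a small standalone sketch. The function name `check_metadata` and its list-of-problems return value are hypothetical; the required fields and the version pattern are copied from the schema above. One difference is deliberate: this sketch validates the version with the schema's regular expression, whereas `_is_valid_version` relies on `int()`, which also accepts strings such as `"+1"`.

```python
import re
from typing import Any, Dict, List

# Required fields and version pattern as listed in the removed schema.
REQUIRED_FIELDS = [
    "name", "version", "description", "author",
    "category", "parameters", "requirements",
]
VERSION_PATTERN = re.compile(r"\d+\.\d+\.\d+")


def check_metadata(metadata: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the metadata passed."""
    problems = [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS
        if field not in metadata
    ]
    version = metadata.get("version", "")
    if not isinstance(version, str) or not VERSION_PATTERN.fullmatch(version):
        problems.append(f"invalid version: {version!r}")
    if not isinstance(metadata.get("parameters", {}), dict):
        problems.append("parameters must be a dictionary")
    return problems
```

Collecting every problem instead of returning on the first one is a design choice of this sketch; the removed code logs and returns `False` at the first failing check.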
+171 -310
@@ -12,7 +12,6 @@
import asyncio
import logging
import os
from uuid import UUID
from contextlib import AsyncExitStack, asynccontextmanager, suppress
from typing import Any, Dict, Optional, List
@@ -23,31 +22,20 @@ from starlette.routing import Mount
from fastmcp.server.http import create_sse_app
from src.core.prefect_manager import PrefectManager
from src.core.setup import setup_docker_pool, setup_result_storage, validate_infrastructure
from src.core.workflow_discovery import WorkflowDiscovery
from src.temporal.manager import TemporalManager
from src.core.setup import setup_result_storage, validate_infrastructure
from src.api import workflows, runs, fuzzing
from src.services.prefect_stats_monitor import prefect_stats_monitor
from fastmcp import FastMCP
from prefect.client.orchestration import get_client
from prefect.client.schemas.filters import (
FlowRunFilter,
FlowRunFilterDeploymentId,
FlowRunFilterState,
FlowRunFilterStateType,
)
from prefect.client.schemas.sorting import FlowRunSort
from prefect.states import StateType
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
prefect_mgr = PrefectManager()
temporal_mgr = TemporalManager()
class PrefectBootstrapState:
"""Tracks Prefect initialization progress for API and MCP consumers."""
class TemporalBootstrapState:
"""Tracks Temporal initialization progress for API and MCP consumers."""
def __init__(self) -> None:
self.ready: bool = False
@@ -64,19 +52,19 @@ class PrefectBootstrapState:
}
prefect_bootstrap_state = PrefectBootstrapState()
temporal_bootstrap_state = TemporalBootstrapState()
# Configure retry strategy for bootstrapping Prefect + infrastructure
# Configure retry strategy for bootstrapping Temporal + infrastructure
STARTUP_RETRY_SECONDS = max(1, int(os.getenv("FUZZFORGE_STARTUP_RETRY_SECONDS", "5")))
STARTUP_RETRY_MAX_SECONDS = max(
STARTUP_RETRY_SECONDS,
int(os.getenv("FUZZFORGE_STARTUP_RETRY_MAX_SECONDS", "60")),
)
prefect_bootstrap_task: Optional[asyncio.Task] = None
temporal_bootstrap_task: Optional[asyncio.Task] = None
# ---------------------------------------------------------------------------
# FastAPI application (REST API remains unchanged)
# FastAPI application (REST API)
# ---------------------------------------------------------------------------
app = FastAPI(
@@ -90,20 +78,19 @@ app.include_router(runs.router)
app.include_router(fuzzing.router)
def get_prefect_status() -> Dict[str, Any]:
"""Return a snapshot of Prefect bootstrap state for diagnostics."""
status = prefect_bootstrap_state.as_dict()
status["workflows_loaded"] = len(prefect_mgr.workflows)
status["deployments_tracked"] = len(prefect_mgr.deployments)
def get_temporal_status() -> Dict[str, Any]:
"""Return a snapshot of Temporal bootstrap state for diagnostics."""
status = temporal_bootstrap_state.as_dict()
status["workflows_loaded"] = len(temporal_mgr.workflows)
status["bootstrap_task_running"] = (
prefect_bootstrap_task is not None and not prefect_bootstrap_task.done()
temporal_bootstrap_task is not None and not temporal_bootstrap_task.done()
)
return status
def _prefect_not_ready_status() -> Optional[Dict[str, Any]]:
"""Return status details if Prefect is not ready yet."""
status = get_prefect_status()
def _temporal_not_ready_status() -> Optional[Dict[str, Any]]:
"""Return status details if Temporal is not ready yet."""
status = get_temporal_status()
if status.get("ready"):
return None
return status
@@ -111,19 +98,19 @@ def _prefect_not_ready_status() -> Optional[Dict[str, Any]]:
@app.get("/")
async def root() -> Dict[str, Any]:
status = get_prefect_status()
status = get_temporal_status()
return {
"name": "FuzzForge API",
"version": "0.6.0",
"status": "ready" if status.get("ready") else "initializing",
"workflows_loaded": status.get("workflows_loaded", 0),
"prefect": status,
"temporal": status,
}
@app.get("/health")
async def health() -> Dict[str, str]:
status = get_prefect_status()
status = get_temporal_status()
health_status = "healthy" if status.get("ready") else "initializing"
return {"status": health_status}
@@ -165,65 +152,61 @@ _fastapi_mcp_imported = False
mcp = FastMCP(name="FuzzForge MCP")
async def _bootstrap_prefect_with_retries() -> None:
"""Initialize Prefect infrastructure with exponential backoff retries."""
async def _bootstrap_temporal_with_retries() -> None:
"""Initialize Temporal infrastructure with exponential backoff retries."""
attempt = 0
while True:
attempt += 1
prefect_bootstrap_state.task_running = True
prefect_bootstrap_state.status = "starting"
prefect_bootstrap_state.ready = False
prefect_bootstrap_state.last_error = None
temporal_bootstrap_state.task_running = True
temporal_bootstrap_state.status = "starting"
temporal_bootstrap_state.ready = False
temporal_bootstrap_state.last_error = None
try:
logger.info("Bootstrapping Prefect infrastructure...")
logger.info("Bootstrapping Temporal infrastructure...")
await validate_infrastructure()
await setup_docker_pool()
await setup_result_storage()
await prefect_mgr.initialize()
await prefect_stats_monitor.start_monitoring()
await temporal_mgr.initialize()
prefect_bootstrap_state.ready = True
prefect_bootstrap_state.status = "ready"
prefect_bootstrap_state.task_running = False
logger.info("Prefect infrastructure ready")
temporal_bootstrap_state.ready = True
temporal_bootstrap_state.status = "ready"
temporal_bootstrap_state.task_running = False
logger.info("Temporal infrastructure ready")
return
except asyncio.CancelledError:
prefect_bootstrap_state.status = "cancelled"
prefect_bootstrap_state.task_running = False
logger.info("Prefect bootstrap task cancelled")
temporal_bootstrap_state.status = "cancelled"
temporal_bootstrap_state.task_running = False
logger.info("Temporal bootstrap task cancelled")
raise
except Exception as exc: # pragma: no cover - defensive logging on infra startup
logger.exception("Prefect bootstrap failed")
prefect_bootstrap_state.ready = False
prefect_bootstrap_state.status = "error"
prefect_bootstrap_state.last_error = str(exc)
logger.exception("Temporal bootstrap failed")
temporal_bootstrap_state.ready = False
temporal_bootstrap_state.status = "error"
temporal_bootstrap_state.last_error = str(exc)
# Ensure partial initialization does not leave stale state behind
prefect_mgr.workflows.clear()
prefect_mgr.deployments.clear()
await prefect_stats_monitor.stop_monitoring()
temporal_mgr.workflows.clear()
wait_time = min(
STARTUP_RETRY_SECONDS * (2 ** (attempt - 1)),
STARTUP_RETRY_MAX_SECONDS,
)
logger.info("Retrying Prefect bootstrap in %s second(s)", wait_time)
logger.info("Retrying Temporal bootstrap in %s second(s)", wait_time)
try:
await asyncio.sleep(wait_time)
except asyncio.CancelledError:
prefect_bootstrap_state.status = "cancelled"
prefect_bootstrap_state.task_running = False
temporal_bootstrap_state.status = "cancelled"
temporal_bootstrap_state.task_running = False
raise
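The retry delay computed in the bootstrap loop doubles after each failed attempt and is capped at the maximum. The sketch below reproduces only that arithmetic, using the default values of `FUZZFORGE_STARTUP_RETRY_SECONDS` (5) and `FUZZFORGE_STARTUP_RETRY_MAX_SECONDS` (60) from the diff; `backoff_schedule` is a hypothetical helper and not part of the change.

```python
from typing import List


def backoff_schedule(base: int = 5, cap: int = 60, attempts: int = 6) -> List[int]:
    """Wait time after each failed attempt: base * 2**(attempt - 1), capped at `cap`."""
    return [
        min(base * (2 ** (attempt - 1)), cap)
        for attempt in range(1, attempts + 1)
    ]


print(backoff_schedule())  # [5, 10, 20, 40, 60, 60]
```

With the defaults, the delay reaches the cap on the fifth failed attempt and stays there, since the loop retries indefinitely until it succeeds or is cancelled.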
def _lookup_workflow(workflow_name: str):
info = prefect_mgr.workflows.get(workflow_name)
info = temporal_mgr.workflows.get(workflow_name)
if not info:
return None
metadata = info.metadata
@@ -248,24 +231,23 @@ def _lookup_workflow(workflow_name: str):
"required_modules": metadata.get("required_modules", []),
"supported_volume_modes": supported_modes,
"default_target_path": default_target_path,
"default_volume_mode": default_volume_mode,
"has_custom_docker": bool(info.has_docker),
"default_volume_mode": default_volume_mode
}
@mcp.tool
async def list_workflows_mcp() -> Dict[str, Any]:
"""List all discovered workflows and their metadata summary."""
not_ready = _prefect_not_ready_status()
not_ready = _temporal_not_ready_status()
if not_ready:
return {
"workflows": [],
"prefect": not_ready,
"message": "Prefect infrastructure is still initializing",
"temporal": not_ready,
"message": "Temporal infrastructure is still initializing",
}
workflows_summary = []
for name, info in prefect_mgr.workflows.items():
for name, info in temporal_mgr.workflows.items():
metadata = info.metadata
defaults = metadata.get("default_parameters", {})
workflows_summary.append({
@@ -279,20 +261,19 @@ async def list_workflows_mcp() -> Dict[str, Any]:
or defaults.get("volume_mode")
or "ro",
"default_target_path": metadata.get("default_target_path")
or defaults.get("target_path"),
"has_custom_docker": bool(info.has_docker),
or defaults.get("target_path")
})
return {"workflows": workflows_summary, "prefect": get_prefect_status()}
return {"workflows": workflows_summary, "temporal": get_temporal_status()}
@mcp.tool
async def get_workflow_metadata_mcp(workflow_name: str) -> Dict[str, Any]:
"""Fetch detailed metadata for a workflow."""
not_ready = _prefect_not_ready_status()
not_ready = _temporal_not_ready_status()
if not_ready:
return {
"error": "Prefect infrastructure not ready",
"prefect": not_ready,
"error": "Temporal infrastructure not ready",
"temporal": not_ready,
}
data = _lookup_workflow(workflow_name)
@@ -304,11 +285,11 @@ async def get_workflow_metadata_mcp(workflow_name: str) -> Dict[str, Any]:
@mcp.tool
async def get_workflow_parameters_mcp(workflow_name: str) -> Dict[str, Any]:
"""Return the parameter schema and defaults for a workflow."""
not_ready = _prefect_not_ready_status()
not_ready = _temporal_not_ready_status()
if not_ready:
return {
"error": "Prefect infrastructure not ready",
"prefect": not_ready,
"error": "Temporal infrastructure not ready",
"temporal": not_ready,
}
data = _lookup_workflow(workflow_name)
@@ -323,72 +304,41 @@ async def get_workflow_parameters_mcp(workflow_name: str) -> Dict[str, Any]:
@mcp.tool
async def get_workflow_metadata_schema_mcp() -> Dict[str, Any]:
"""Return the JSON schema describing workflow metadata files."""
from src.temporal.discovery import WorkflowDiscovery
return WorkflowDiscovery.get_metadata_schema()
@mcp.tool
async def submit_security_scan_mcp(
workflow_name: str,
target_path: str | None = None,
volume_mode: str | None = None,
target_id: str,
parameters: Dict[str, Any] | None = None,
) -> Dict[str, Any] | Dict[str, str]:
"""Submit a Prefect workflow via MCP."""
"""Submit a Temporal workflow via MCP."""
try:
not_ready = _prefect_not_ready_status()
not_ready = _temporal_not_ready_status()
if not_ready:
return {
"error": "Prefect infrastructure not ready",
"prefect": not_ready,
"error": "Temporal infrastructure not ready",
"temporal": not_ready,
}
workflow_info = prefect_mgr.workflows.get(workflow_name)
workflow_info = temporal_mgr.workflows.get(workflow_name)
if not workflow_info:
return {"error": f"Workflow '{workflow_name}' not found"}
metadata = workflow_info.metadata or {}
defaults = metadata.get("default_parameters", {})
resolved_target_path = target_path or metadata.get("default_target_path") or defaults.get("target_path")
if not resolved_target_path:
return {
"error": (
"target_path is required and no default_target_path is defined in metadata"
),
"metadata": {
"workflow": workflow_name,
"default_target_path": metadata.get("default_target_path"),
},
}
requested_volume_mode = volume_mode or metadata.get("default_volume_mode") or defaults.get("volume_mode")
if not requested_volume_mode:
requested_volume_mode = "ro"
normalised_volume_mode = (
str(requested_volume_mode).strip().lower().replace("-", "_")
)
if normalised_volume_mode in {"read_only", "readonly", "ro"}:
normalised_volume_mode = "ro"
elif normalised_volume_mode in {"read_write", "readwrite", "rw"}:
normalised_volume_mode = "rw"
else:
supported_modes = metadata.get("supported_volume_modes", ["ro", "rw"])
if isinstance(supported_modes, list) and normalised_volume_mode in supported_modes:
pass
else:
normalised_volume_mode = "ro"
parameters = parameters or {}
cleaned_parameters: Dict[str, Any] = {**defaults, **parameters}
# Ensure *_config structures default to dicts so Prefect validation passes.
# Ensure *_config structures default to dicts
for key, value in list(cleaned_parameters.items()):
if isinstance(key, str) and key.endswith("_config") and value is None:
cleaned_parameters[key] = {}
# Some workflows expect configuration dictionaries even when omitted.
# Some workflows expect configuration dictionaries even when omitted
parameter_definitions = (
metadata.get("parameters", {}).get("properties", {})
if isinstance(metadata.get("parameters"), dict)
@@ -403,20 +353,19 @@ async def submit_security_scan_mcp(
elif cleaned_parameters[key] is None:
cleaned_parameters[key] = {}
flow_run = await prefect_mgr.submit_workflow(
# Start workflow
handle = await temporal_mgr.run_workflow(
workflow_name=workflow_name,
target_path=resolved_target_path,
volume_mode=normalised_volume_mode,
parameters=cleaned_parameters,
target_id=target_id,
workflow_params=cleaned_parameters,
)
return {
"run_id": str(flow_run.id),
"status": flow_run.state.name if flow_run.state else "PENDING",
"run_id": handle.id,
"status": "RUNNING",
"workflow": workflow_name,
"message": f"Workflow '{workflow_name}' submitted successfully",
"target_path": resolved_target_path,
"volume_mode": normalised_volume_mode,
"target_id": target_id,
"parameters": cleaned_parameters,
"mcp_enabled": True,
}
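The first parameter-cleaning step in `submit_security_scan_mcp` (defaults overlaid with caller-supplied values, then `None` values for `*_config` keys replaced by empty dicts) can be sketched on its own. `merge_parameters` is a hypothetical name; the second normalization pass based on the metadata's parameter definitions is partly elided in this hunk and is not reproduced here.

```python
from typing import Any, Dict, Optional


def merge_parameters(
    defaults: Dict[str, Any],
    parameters: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Overlay caller parameters on defaults and normalize *_config keys."""
    # Caller-supplied values win over metadata defaults.
    merged: Dict[str, Any] = {**defaults, **(parameters or {})}
    # A *_config entry explicitly set to None becomes an empty dict.
    for key, value in list(merged.items()):
        if isinstance(key, str) and key.endswith("_config") and value is None:
            merged[key] = {}
    return merged
```

For example, `merge_parameters({"depth": 1, "scan_config": None}, {"depth": 3})` yields a dict with `depth` set to 3 and `scan_config` set to `{}`.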
@@ -427,43 +376,38 @@ async def submit_security_scan_mcp(
@mcp.tool
async def get_comprehensive_scan_summary(run_id: str) -> Dict[str, Any] | Dict[str, str]:
"""Return a summary for the given flow run via MCP."""
"""Return a summary for the given workflow run via MCP."""
try:
not_ready = _prefect_not_ready_status()
not_ready = _temporal_not_ready_status()
if not_ready:
return {
"error": "Prefect infrastructure not ready",
"prefect": not_ready,
"error": "Temporal infrastructure not ready",
"temporal": not_ready,
}
status = await prefect_mgr.get_flow_run_status(run_id)
findings = await prefect_mgr.get_flow_run_findings(run_id)
workflow_name = "unknown"
deployment_id = status.get("workflow", "")
for name, deployment in prefect_mgr.deployments.items():
if str(deployment) == str(deployment_id):
workflow_name = name
break
status = await temporal_mgr.get_workflow_status(run_id)
# Try to get result if completed
total_findings = 0
severity_summary = {"critical": 0, "high": 0, "medium": 0, "low": 0, "info": 0}
if findings and "sarif" in findings:
sarif = findings["sarif"]
if isinstance(sarif, dict):
total_findings = sarif.get("total_findings", 0)
if status.get("status") == "COMPLETED":
try:
result = await temporal_mgr.get_workflow_result(run_id)
if isinstance(result, dict):
summary = result.get("summary", {})
total_findings = summary.get("total_findings", 0)
except Exception as e:
logger.debug(f"Could not retrieve result for {run_id}: {e}")
return {
"run_id": run_id,
"workflow": workflow_name,
"workflow": "unknown", # Temporal doesn't track workflow name in status
"status": status.get("status", "unknown"),
"is_completed": status.get("is_completed", False),
"is_completed": status.get("status") == "COMPLETED",
"total_findings": total_findings,
"severity_summary": severity_summary,
"scan_duration": status.get("updated_at", "")
if status.get("is_completed")
else "In progress",
"scan_duration": status.get("close_time", "In progress"),
"recommendations": (
[
"Review high and critical severity findings first",
@@ -482,32 +426,26 @@ async def get_comprehensive_scan_summary(run_id: str) -> Dict[str, Any] | Dict[s
@mcp.tool
async def get_run_status_mcp(run_id: str) -> Dict[str, Any]:
"""Return current status information for a Prefect run."""
"""Return current status information for a Temporal run."""
try:
not_ready = _prefect_not_ready_status()
not_ready = _temporal_not_ready_status()
if not_ready:
return {
"error": "Prefect infrastructure not ready",
"prefect": not_ready,
"error": "Temporal infrastructure not ready",
"temporal": not_ready,
}
status = await prefect_mgr.get_flow_run_status(run_id)
workflow_name = "unknown"
deployment_id = status.get("workflow", "")
for name, deployment in prefect_mgr.deployments.items():
if str(deployment) == str(deployment_id):
workflow_name = name
break
status = await temporal_mgr.get_workflow_status(run_id)
return {
"run_id": status["run_id"],
"workflow": workflow_name,
"run_id": run_id,
"workflow": "unknown",
"status": status["status"],
"is_completed": status["is_completed"],
"is_failed": status["is_failed"],
"is_running": status["is_running"],
"created_at": status["created_at"],
"updated_at": status["updated_at"],
"is_completed": status["status"] in ["COMPLETED", "FAILED", "CANCELLED"],
"is_failed": status["status"] == "FAILED",
"is_running": status["status"] == "RUNNING",
"created_at": status.get("start_time"),
"updated_at": status.get("close_time") or status.get("execution_time"),
}
except Exception as exc:
logger.exception("MCP run status failed")
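The boolean flags returned by `get_run_status_mcp` are now derived from a single status string rather than read from the status payload. A minimal sketch of that mapping follows; `derive_flags` is a hypothetical helper, and the only status values assumed are the ones that appear in this diff (`RUNNING`, `COMPLETED`, `FAILED`, `CANCELLED`). Whether `TemporalManager` can report other values is not shown here.

```python
from typing import Dict

# Statuses treated as terminal in get_run_status_mcp.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def derive_flags(status: str) -> Dict[str, bool]:
    """Map a status string to the flags exposed by the MCP status tool."""
    return {
        "is_completed": status in TERMINAL_STATES,
        "is_failed": status == "FAILED",
        "is_running": status == "RUNNING",
    }
```

Note that `is_completed` here means "reached a terminal state", so a failed run reports both `is_completed` and `is_failed`. `get_comprehensive_scan_summary` in the same diff uses a narrower test, `status == "COMPLETED"`, so the two tools can disagree on this flag for failed or cancelled runs.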
@@ -518,38 +456,30 @@ async def get_run_status_mcp(run_id: str) -> Dict[str, Any]:
async def get_run_findings_mcp(run_id: str) -> Dict[str, Any]:
"""Return SARIF findings for a completed run."""
try:
not_ready = _prefect_not_ready_status()
not_ready = _temporal_not_ready_status()
if not_ready:
return {
"error": "Prefect infrastructure not ready",
"prefect": not_ready,
"error": "Temporal infrastructure not ready",
"temporal": not_ready,
}
status = await prefect_mgr.get_flow_run_status(run_id)
if not status.get("is_completed"):
status = await temporal_mgr.get_workflow_status(run_id)
if status.get("status") != "COMPLETED":
return {"error": f"Run {run_id} not completed. Status: {status.get('status')}"}
findings = await prefect_mgr.get_flow_run_findings(run_id)
workflow_name = "unknown"
deployment_id = status.get("workflow", "")
for name, deployment in prefect_mgr.deployments.items():
if str(deployment) == str(deployment_id):
workflow_name = name
break
result = await temporal_mgr.get_workflow_result(run_id)
metadata = {
"completion_time": status.get("updated_at"),
"completion_time": status.get("close_time"),
"workflow_version": "unknown",
}
info = prefect_mgr.workflows.get(workflow_name)
if info:
metadata["workflow_version"] = info.metadata.get("version", "unknown")
sarif = result.get("sarif", {}) if isinstance(result, dict) else {}
return {
"workflow": workflow_name,
"workflow": "unknown",
"run_id": run_id,
"sarif": findings,
"sarif": sarif,
"metadata": metadata,
}
except Exception as exc:
@@ -561,16 +491,15 @@ async def get_run_findings_mcp(run_id: str) -> Dict[str, Any]:
async def list_recent_runs_mcp(
limit: int = 10,
workflow_name: str | None = None,
states: List[str] | None = None,
) -> Dict[str, Any]:
"""List recent Prefect runs with optional workflow/state filters."""
"""List recent Temporal runs with optional workflow filter."""
not_ready = _prefect_not_ready_status()
not_ready = _temporal_not_ready_status()
if not_ready:
return {
"runs": [],
"prefect": not_ready,
"message": "Prefect infrastructure is still initializing",
"temporal": not_ready,
"message": "Temporal infrastructure is still initializing",
}
try:
@@ -579,116 +508,49 @@ async def list_recent_runs_mcp(
limit_value = 10
limit_value = max(1, min(limit_value, 100))
deployment_map = {
str(deployment_id): workflow
for workflow, deployment_id in prefect_mgr.deployments.items()
}
try:
# Build filter query
filter_query = None
if workflow_name:
workflow_info = temporal_mgr.workflows.get(workflow_name)
if workflow_info:
filter_query = f'WorkflowType="{workflow_info.workflow_type}"'
deployment_filter_value = None
if workflow_name:
deployment_id = prefect_mgr.deployments.get(workflow_name)
if not deployment_id:
return {
"runs": [],
"prefect": get_prefect_status(),
"error": f"Workflow '{workflow_name}' has no registered deployment",
}
try:
deployment_filter_value = UUID(str(deployment_id))
except ValueError:
return {
"runs": [],
"prefect": get_prefect_status(),
"error": (
f"Deployment id '{deployment_id}' for workflow '{workflow_name}' is invalid"
),
}
workflows = await temporal_mgr.list_workflows(filter_query, limit_value)
desired_state_types: List[StateType] = []
if states:
for raw_state in states:
if not raw_state:
continue
normalised = raw_state.strip().upper()
if normalised == "ALL":
desired_state_types = []
break
try:
desired_state_types.append(StateType[normalised])
except KeyError:
continue
if not desired_state_types:
desired_state_types = [
StateType.RUNNING,
StateType.COMPLETED,
StateType.FAILED,
StateType.CANCELLED,
]
results: List[Dict[str, Any]] = []
for wf in workflows:
results.append({
"run_id": wf["workflow_id"],
"workflow": workflow_name or "unknown",
"state": wf["status"],
"state_type": wf["status"],
"is_completed": wf["status"] in ["COMPLETED", "FAILED", "CANCELLED"],
"is_running": wf["status"] == "RUNNING",
"is_failed": wf["status"] == "FAILED",
"created_at": wf.get("start_time"),
"updated_at": wf.get("close_time"),
})
flow_filter = FlowRunFilter()
if desired_state_types:
flow_filter.state = FlowRunFilterState(
type=FlowRunFilterStateType(any_=desired_state_types)
)
if deployment_filter_value:
flow_filter.deployment_id = FlowRunFilterDeploymentId(
any_=[deployment_filter_value]
)
return {"runs": results, "temporal": get_temporal_status()}
async with get_client() as client:
flow_runs = await client.read_flow_runs(
limit=limit_value,
flow_run_filter=flow_filter,
sort=FlowRunSort.START_TIME_DESC,
)
results: List[Dict[str, Any]] = []
for flow_run in flow_runs:
deployment_id = getattr(flow_run, "deployment_id", None)
workflow = deployment_map.get(str(deployment_id), "unknown")
state = getattr(flow_run, "state", None)
state_name = getattr(state, "name", None) if state else None
state_type = getattr(state, "type", None) if state else None
results.append(
{
"run_id": str(flow_run.id),
"workflow": workflow,
"deployment_id": str(deployment_id) if deployment_id else None,
"state": state_name or (state_type.name if state_type else None),
"state_type": state_type.name if state_type else None,
"is_completed": bool(getattr(state, "is_completed", lambda: False)()),
"is_running": bool(getattr(state, "is_running", lambda: False)()),
"is_failed": bool(getattr(state, "is_failed", lambda: False)()),
"created_at": getattr(flow_run, "created", None),
"updated_at": getattr(flow_run, "updated", None),
"expected_start_time": getattr(flow_run, "expected_start_time", None),
"start_time": getattr(flow_run, "start_time", None),
}
)
# Normalise datetimes to ISO 8601 strings for serialization
for entry in results:
for key in ("created_at", "updated_at", "expected_start_time", "start_time"):
value = entry.get(key)
if value is None:
continue
try:
entry[key] = value.isoformat()
except AttributeError:
entry[key] = str(value)
return {"runs": results, "prefect": get_prefect_status()}
except Exception as exc:
logger.exception("Failed to list runs")
return {
"runs": [],
"temporal": get_temporal_status(),
"error": str(exc)
}
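The run-listing logic above can be sketched in isolation. This is a minimal sketch, not the backend's code: `build_filter_query` and `summarize_run` are hypothetical helper names, and the set of terminal states is an assumption based on the Temporal Python SDK's `WorkflowExecutionStatus` names (which spell the cancelled state `CANCELED`), with the `CANCELLED` spelling from the diff kept alongside it.

```python
from typing import Any, Dict, Optional

# Assumption: status strings follow the Temporal SDK enum names. The diff checks
# "CANCELLED", so both spellings are accepted here.
TERMINAL_STATES = {
    "COMPLETED", "FAILED", "CANCELED", "CANCELLED", "TERMINATED", "TIMED_OUT",
}


def build_filter_query(workflow_type: Optional[str]) -> Optional[str]:
    """Build the same visibility filter string the diff uses, or None for no filter."""
    if not workflow_type:
        return None
    return f'WorkflowType="{workflow_type}"'


def summarize_run(wf: Dict[str, Any], workflow_name: Optional[str] = None) -> Dict[str, Any]:
    """Map one workflow listing entry to the run summary returned to MCP clients."""
    status = str(wf.get("status", "UNKNOWN")).upper()
    return {
        "run_id": wf["workflow_id"],
        "workflow": workflow_name or "unknown",
        "state": status,
        "state_type": status,
        "is_completed": status in TERMINAL_STATES,
        "is_running": status == "RUNNING",
        "is_failed": status == "FAILED",
        "created_at": wf.get("start_time"),
        "updated_at": wf.get("close_time"),
    }
```

Keeping the mapping in a pure function makes it testable without a running Temporal server.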
@mcp.tool
async def get_fuzzing_stats_mcp(run_id: str) -> Dict[str, Any]:
"""Return fuzzing statistics for a run if available."""
not_ready = _prefect_not_ready_status()
not_ready = _temporal_not_ready_status()
if not_ready:
return {
"error": "Prefect infrastructure not ready",
"prefect": not_ready,
"error": "Temporal infrastructure not ready",
"temporal": not_ready,
}
stats = fuzzing.fuzzing_stats.get(run_id)
@@ -708,11 +570,11 @@ async def get_fuzzing_stats_mcp(run_id: str) -> Dict[str, Any]:
@mcp.tool
async def get_fuzzing_crash_reports_mcp(run_id: str) -> Dict[str, Any]:
"""Return crash reports collected for a fuzzing run."""
not_ready = _prefect_not_ready_status()
not_ready = _temporal_not_ready_status()
if not_ready:
return {
"error": "Prefect infrastructure not ready",
"prefect": not_ready,
"error": "Temporal infrastructure not ready",
"temporal": not_ready,
}
reports = fuzzing.crash_reports.get(run_id)
@@ -725,11 +587,11 @@ async def get_fuzzing_crash_reports_mcp(run_id: str) -> Dict[str, Any]:
async def get_backend_status_mcp() -> Dict[str, Any]:
"""Expose backend readiness, workflows, and registered MCP tools."""
status = get_prefect_status()
response: Dict[str, Any] = {"prefect": status}
status = get_temporal_status()
response: Dict[str, Any] = {"temporal": status}
if status.get("ready"):
response["workflows"] = list(prefect_mgr.workflows.keys())
response["workflows"] = list(temporal_mgr.workflows.keys())
try:
tools = await mcp._tool_manager.list_tools()
@@ -775,12 +637,12 @@ def create_mcp_transport_app() -> Starlette:
# ---------------------------------------------------------------------------
# Combined lifespan: Prefect init + dedicated MCP transports
# Combined lifespan: Temporal init + dedicated MCP transports
# ---------------------------------------------------------------------------
@asynccontextmanager
async def combined_lifespan(app: FastAPI):
global prefect_bootstrap_task, _fastapi_mcp_imported
global temporal_bootstrap_task, _fastapi_mcp_imported
logger.info("Starting FuzzForge backend...")
@@ -793,12 +655,12 @@ async def combined_lifespan(app: FastAPI):
except Exception as exc:
logger.exception("Failed to import FastAPI endpoints into MCP", exc_info=exc)
# Kick off Prefect bootstrap in the background if needed
if prefect_bootstrap_task is None or prefect_bootstrap_task.done():
prefect_bootstrap_task = asyncio.create_task(_bootstrap_prefect_with_retries())
logger.info("Prefect bootstrap task started")
# Kick off Temporal bootstrap in the background if needed
if temporal_bootstrap_task is None or temporal_bootstrap_task.done():
temporal_bootstrap_task = asyncio.create_task(_bootstrap_temporal_with_retries())
logger.info("Temporal bootstrap task started")
else:
logger.info("Prefect bootstrap task already running")
logger.info("Temporal bootstrap task already running")
# Start MCP transports on shared port (HTTP + SSE)
mcp_app = create_mcp_transport_app()
@@ -846,18 +708,17 @@ async def combined_lifespan(app: FastAPI):
mcp_server.force_exit = True
await asyncio.gather(mcp_task, return_exceptions=True)
if prefect_bootstrap_task and not prefect_bootstrap_task.done():
prefect_bootstrap_task.cancel()
if temporal_bootstrap_task and not temporal_bootstrap_task.done():
temporal_bootstrap_task.cancel()
with suppress(asyncio.CancelledError):
await prefect_bootstrap_task
prefect_bootstrap_state.task_running = False
if not prefect_bootstrap_state.ready:
prefect_bootstrap_state.status = "stopped"
prefect_bootstrap_state.next_retry_seconds = None
prefect_bootstrap_task = None
await temporal_bootstrap_task
temporal_bootstrap_state.task_running = False
if not temporal_bootstrap_state.ready:
temporal_bootstrap_state.status = "stopped"
temporal_bootstrap_task = None
logger.info("Shutting down Prefect statistics monitor...")
await prefect_stats_monitor.stop_monitoring()
# Close Temporal client
await temporal_mgr.close()
logger.info("Shutting down FuzzForge backend...")
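The lifespan above starts the Temporal bootstrap as a background task and cancels it on shutdown. The body of `_bootstrap_temporal_with_retries` is not shown in this diff, so the following is only a sketch of that pattern under assumed names: `bootstrap_with_retries`, `stop_background_task`, and the exponential backoff policy are all hypothetical.

```python
import asyncio
from contextlib import suppress
from typing import Awaitable, Callable, Optional


async def bootstrap_with_retries(
    connect: Callable[[], Awaitable[None]],
    attempts: int = 5,
    base_delay: float = 1.0,
) -> bool:
    """Call `connect` until it succeeds or attempts run out; return success."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            await connect()
            return True
        except Exception:
            if attempt == attempts:
                return False
            await asyncio.sleep(delay)
            delay *= 2  # assumed backoff policy, not taken from the diff
    return False


async def stop_background_task(task: Optional[asyncio.Task]) -> None:
    """Cancel a still-running task and wait for it, as the shutdown path does."""
    if task and not task.done():
        task.cancel()
        with suppress(asyncio.CancelledError):
            await task
```

Awaiting the cancelled task inside `suppress(asyncio.CancelledError)` lets shutdown continue once the task has actually finished unwinding.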
+7 -65
@@ -13,10 +13,9 @@ Models for workflow findings and submissions
#
# Additional attribution and requirements are provided in the NOTICE file.
from pydantic import BaseModel, Field, field_validator
from pydantic import BaseModel, Field
from typing import Dict, Any, Optional, Literal, List
from datetime import datetime
from pathlib import Path
class WorkflowFindings(BaseModel):
@@ -27,47 +26,13 @@ class WorkflowFindings(BaseModel):
metadata: Dict[str, Any] = Field(default_factory=dict, description="Additional metadata")
class ResourceLimits(BaseModel):
"""Resource limits for workflow execution"""
cpu_limit: Optional[str] = Field(None, description="CPU limit (e.g., '2' for 2 cores, '500m' for 0.5 cores)")
memory_limit: Optional[str] = Field(None, description="Memory limit (e.g., '1Gi', '512Mi')")
cpu_request: Optional[str] = Field(None, description="CPU request (guaranteed)")
memory_request: Optional[str] = Field(None, description="Memory request (guaranteed)")
class VolumeMount(BaseModel):
"""Volume mount specification"""
host_path: str = Field(..., description="Host path to mount")
container_path: str = Field(..., description="Container path for mount")
mode: Literal["ro", "rw"] = Field(default="ro", description="Mount mode")
@field_validator("host_path")
@classmethod
def validate_host_path(cls, v):
"""Validate that the host path is absolute (existence checked at runtime)"""
path = Path(v)
if not path.is_absolute():
raise ValueError(f"Host path must be absolute: {v}")
# Note: Path existence is validated at workflow runtime
# We can't validate existence here as this runs inside Docker container
return str(path)
@field_validator("container_path")
@classmethod
def validate_container_path(cls, v):
"""Validate that the container path is absolute"""
if not v.startswith('/'):
raise ValueError(f"Container path must be absolute: {v}")
return v
class WorkflowSubmission(BaseModel):
"""Submit a workflow with configurable settings"""
target_path: str = Field(..., description="Absolute path to analyze")
volume_mode: Literal["ro", "rw"] = Field(
default="ro",
description="Volume mount mode: read-only (ro) or read-write (rw)"
)
"""
Submit a workflow with configurable settings.
Note: This model is deprecated in favor of the /upload-and-submit endpoint
which handles file uploads directly.
"""
parameters: Dict[str, Any] = Field(
default_factory=dict,
description="Workflow-specific parameters"
@@ -78,25 +43,6 @@ class WorkflowSubmission(BaseModel):
ge=1,
le=604800 # Max 7 days to support fuzzing campaigns
)
resource_limits: Optional[ResourceLimits] = Field(
None,
description="Resource limits for workflow container"
)
additional_volumes: List[VolumeMount] = Field(
default_factory=list,
description="Additional volume mounts (e.g., for corpus, output directories)"
)
@field_validator("target_path")
@classmethod
def validate_path(cls, v):
"""Validate that the target path is absolute (existence checked at runtime)"""
path = Path(v)
if not path.is_absolute():
raise ValueError(f"Path must be absolute: {v}")
# Note: Path existence is validated at workflow runtime when volumes are mounted
# We can't validate existence here as this runs inside Docker container
return str(path)
class WorkflowStatus(BaseModel):
@@ -131,10 +77,6 @@ class WorkflowMetadata(BaseModel):
default=["ro", "rw"],
description="Supported volume mount modes"
)
has_custom_docker: bool = Field(
default=False,
description="Whether workflow has custom Dockerfile"
)
class WorkflowListItem(BaseModel):
@@ -1,394 +0,0 @@
"""
Generic Prefect Statistics Monitor Service
This service monitors ALL workflows for structured live data logging and
updates the appropriate statistics APIs. Works with any workflow that follows
the standard LIVE_STATS logging pattern.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import asyncio
import json
import logging
from datetime import datetime, timedelta, timezone
from typing import Dict, Any, Optional
from prefect.client.orchestration import get_client
from prefect.client.schemas.objects import FlowRun, TaskRun
from src.models.findings import FuzzingStats
from src.api.fuzzing import fuzzing_stats, initialize_fuzzing_tracking, active_connections
logger = logging.getLogger(__name__)
class PrefectStatsMonitor:
"""Monitors Prefect flows and tasks for live statistics from any workflow"""
def __init__(self):
self.monitoring = False
self.monitor_task = None
self.monitored_runs = set()
self.last_log_ts: Dict[str, datetime] = {}
self._client = None
self._client_refresh_time = None
self._client_refresh_interval = 300 # Refresh connection every 5 minutes
async def start_monitoring(self):
"""Start the Prefect statistics monitoring service"""
if self.monitoring:
logger.warning("Prefect stats monitor already running")
return
self.monitoring = True
self.monitor_task = asyncio.create_task(self._monitor_flows())
logger.info("Started Prefect statistics monitor")
async def stop_monitoring(self):
"""Stop the monitoring service"""
self.monitoring = False
if self.monitor_task:
self.monitor_task.cancel()
try:
await self.monitor_task
except asyncio.CancelledError:
pass
logger.info("Stopped Prefect statistics monitor")
async def _get_or_refresh_client(self):
"""Get or refresh Prefect client with connection pooling."""
now = datetime.now(timezone.utc)
if (self._client is None or
self._client_refresh_time is None or
(now - self._client_refresh_time).total_seconds() > self._client_refresh_interval):
if self._client:
try:
await self._client.aclose()
except Exception:
pass
self._client = get_client()
self._client_refresh_time = now
await self._client.__aenter__()
return self._client
async def _monitor_flows(self):
"""Main monitoring loop that watches Prefect flows"""
try:
while self.monitoring:
try:
# Use connection pooling for better performance
client = await self._get_or_refresh_client()
# Get recent flow runs (limit to reduce load)
flow_runs = await client.read_flow_runs(
limit=50,
sort="START_TIME_DESC",
)
# Only consider runs from the last 15 minutes
recent_cutoff = datetime.now(timezone.utc) - timedelta(minutes=15)
for flow_run in flow_runs:
created = getattr(flow_run, "created", None)
if created is None:
continue
try:
# Ensure timezone-aware comparison
if created.tzinfo is None:
created = created.replace(tzinfo=timezone.utc)
if created >= recent_cutoff:
await self._monitor_flow_run(client, flow_run)
except Exception:
# If comparison fails, attempt monitoring anyway
await self._monitor_flow_run(client, flow_run)
await asyncio.sleep(5) # Check every 5 seconds
except Exception as e:
logger.error(f"Error in Prefect monitoring: {e}")
await asyncio.sleep(10)
except asyncio.CancelledError:
logger.info("Prefect monitoring cancelled")
except Exception as e:
logger.error(f"Fatal error in Prefect monitoring: {e}")
finally:
# Clean up client on exit
if self._client:
try:
await self._client.__aexit__(None, None, None)
except Exception:
pass
self._client = None
async def _monitor_flow_run(self, client, flow_run: FlowRun):
"""Monitor a specific flow run for statistics"""
run_id = str(flow_run.id)
workflow_name = flow_run.name or "unknown"
try:
# Initialize tracking if not exists - only for workflows that might have live stats
if run_id not in fuzzing_stats:
initialize_fuzzing_tracking(run_id, workflow_name)
self.monitored_runs.add(run_id)
# Skip corrupted entries (should not happen after startup cleanup, but defensive)
elif not isinstance(fuzzing_stats[run_id], FuzzingStats):
logger.warning(f"Skipping corrupted stats entry for {run_id}, reinitializing")
initialize_fuzzing_tracking(run_id, workflow_name)
self.monitored_runs.add(run_id)
# Get task runs for this flow
task_runs = await client.read_task_runs(
flow_run_filter={"id": {"any_": [flow_run.id]}},
limit=25,
)
# Check all tasks for live statistics logging
for task_run in task_runs:
await self._extract_stats_from_task(client, run_id, task_run, workflow_name)
# Also scan flow-level logs as a fallback
await self._extract_stats_from_flow_logs(client, run_id, flow_run, workflow_name)
except Exception as e:
logger.warning(f"Error monitoring flow run {run_id}: {e}")
async def _extract_stats_from_task(self, client, run_id: str, task_run: TaskRun, workflow_name: str):
"""Extract statistics from any task that logs live stats"""
try:
# Get task run logs
logs = await client.read_logs(
log_filter={
"task_run_id": {"any_": [task_run.id]}
},
limit=100,
sort="TIMESTAMP_ASC"
)
# Parse logs for LIVE_STATS entries (generic pattern for any workflow)
latest_stats = None
for log in logs:
# Prefer structured extra field if present
extra_data = getattr(log, "extra", None) or getattr(log, "extra_fields", None) or None
if isinstance(extra_data, dict):
stat_type = extra_data.get("stats_type")
if stat_type in ["fuzzing_live_update", "scan_progress", "analysis_update", "live_stats"]:
latest_stats = extra_data
continue
# Fallback to parsing from message text
if ("FUZZ_STATS" in log.message or "LIVE_STATS" in log.message):
stats = self._parse_stats_from_log(log.message)
if stats:
latest_stats = stats
# Update statistics if we found any
if latest_stats:
# Calculate elapsed time from task start
elapsed_time = 0
if task_run.start_time:
# Ensure timezone-aware arithmetic
now = datetime.now(timezone.utc)
try:
elapsed_time = int((now - task_run.start_time).total_seconds())
except Exception:
# Fallback to naive UTC if types mismatch
elapsed_time = int((datetime.utcnow() - task_run.start_time.replace(tzinfo=None)).total_seconds())
updated_stats = FuzzingStats(
run_id=run_id,
workflow=workflow_name,
executions=latest_stats.get("executions", 0),
executions_per_sec=latest_stats.get("executions_per_sec", 0.0),
crashes=latest_stats.get("crashes", 0),
unique_crashes=latest_stats.get("unique_crashes", 0),
corpus_size=latest_stats.get("corpus_size", 0),
elapsed_time=elapsed_time
)
# Update the global stats
previous = fuzzing_stats.get(run_id)
fuzzing_stats[run_id] = updated_stats
# Broadcast to any active WebSocket clients for this run
if active_connections.get(run_id):
# Handle both Pydantic objects and plain dicts
if isinstance(updated_stats, dict):
stats_data = updated_stats
elif hasattr(updated_stats, 'model_dump'):
stats_data = updated_stats.model_dump()
elif hasattr(updated_stats, 'dict'):
stats_data = updated_stats.dict()
else:
stats_data = updated_stats.__dict__
message = {
"type": "stats_update",
"data": stats_data,
}
disconnected = []
for ws in active_connections[run_id]:
try:
await ws.send_text(json.dumps(message))
except Exception:
disconnected.append(ws)
# Clean up disconnected sockets
for ws in disconnected:
try:
active_connections[run_id].remove(ws)
except ValueError:
pass
logger.debug(f"Updated Prefect stats for {run_id}: {updated_stats.executions} execs")
except Exception as e:
logger.warning(f"Error extracting stats from task {task_run.id}: {e}")
async def _extract_stats_from_flow_logs(self, client, run_id: str, flow_run: FlowRun, workflow_name: str):
"""Extract statistics by scanning flow-level logs for LIVE/FUZZ stats"""
try:
logs = await client.read_logs(
log_filter={
"flow_run_id": {"any_": [flow_run.id]}
},
limit=200,
sort="TIMESTAMP_ASC"
)
latest_stats = None
last_seen = self.last_log_ts.get(run_id)
max_ts = last_seen
for log in logs:
# Skip logs we've already processed
ts = getattr(log, "timestamp", None)
if last_seen and ts and ts <= last_seen:
continue
if ts and (max_ts is None or ts > max_ts):
max_ts = ts
# Prefer structured extra field if available
extra_data = getattr(log, "extra", None) or getattr(log, "extra_fields", None) or None
if isinstance(extra_data, dict):
stat_type = extra_data.get("stats_type")
if stat_type in ["fuzzing_live_update", "scan_progress", "analysis_update", "live_stats"]:
latest_stats = extra_data
continue
# Fallback to message parse
if ("FUZZ_STATS" in log.message or "LIVE_STATS" in log.message):
stats = self._parse_stats_from_log(log.message)
if stats:
latest_stats = stats
if max_ts:
self.last_log_ts[run_id] = max_ts
if latest_stats:
# Use flow_run timestamps for elapsed time if available
elapsed_time = 0
start_time = getattr(flow_run, "start_time", None) or getattr(flow_run, "start_time", None)
if start_time:
now = datetime.now(timezone.utc)
try:
if start_time.tzinfo is None:
start_time = start_time.replace(tzinfo=timezone.utc)
elapsed_time = int((now - start_time).total_seconds())
except Exception:
elapsed_time = int((datetime.utcnow() - start_time.replace(tzinfo=None)).total_seconds())
updated_stats = FuzzingStats(
run_id=run_id,
workflow=workflow_name,
executions=latest_stats.get("executions", 0),
executions_per_sec=latest_stats.get("executions_per_sec", 0.0),
crashes=latest_stats.get("crashes", 0),
unique_crashes=latest_stats.get("unique_crashes", 0),
corpus_size=latest_stats.get("corpus_size", 0),
elapsed_time=elapsed_time
)
fuzzing_stats[run_id] = updated_stats
# Broadcast if listeners exist
if active_connections.get(run_id):
# Handle both Pydantic objects and plain dicts
if isinstance(updated_stats, dict):
stats_data = updated_stats
elif hasattr(updated_stats, 'model_dump'):
stats_data = updated_stats.model_dump()
elif hasattr(updated_stats, 'dict'):
stats_data = updated_stats.dict()
else:
stats_data = updated_stats.__dict__
message = {
"type": "stats_update",
"data": stats_data,
}
disconnected = []
for ws in active_connections[run_id]:
try:
await ws.send_text(json.dumps(message))
except Exception:
disconnected.append(ws)
for ws in disconnected:
try:
active_connections[run_id].remove(ws)
except ValueError:
pass
except Exception as e:
logger.warning(f"Error extracting stats from flow logs {run_id}: {e}")
def _parse_stats_from_log(self, log_message: str) -> Optional[Dict[str, Any]]:
"""Parse statistics from a log message"""
try:
import re
# Prefer explicit JSON after marker tokens
m = re.search(r'(?:FUZZ_STATS|LIVE_STATS)\s+(\{.*\})', log_message)
if m:
try:
return json.loads(m.group(1))
except Exception:
pass
# Fallback: Extract the extra= dict and coerce to JSON
stats_match = re.search(r'extra=({.*?})', log_message)
if not stats_match:
return None
extra_str = stats_match.group(1)
extra_str = extra_str.replace("'", '"')
extra_str = extra_str.replace('None', 'null')
extra_str = extra_str.replace('True', 'true')
extra_str = extra_str.replace('False', 'false')
stats_data = json.loads(extra_str)
# Support multiple stat types for different workflows
stat_type = stats_data.get("stats_type")
if stat_type in ["fuzzing_live_update", "scan_progress", "analysis_update", "live_stats"]:
return stats_data
except Exception as e:
logger.debug(f"Error parsing log stats: {e}")
return None
# Global instance
prefect_stats_monitor = PrefectStatsMonitor()
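The removed `_parse_stats_from_log` turns a Python `extra={...}` repr into JSON by string replacement, which breaks on values that contain quotes or the words `None`, `True`, or `False`. If the `LIVE_STATS` pattern is reimplemented on Temporal, one alternative is to parse the repr with `ast.literal_eval`. The sketch below is a swapped-in technique and not the original method; `parse_stats_from_log` and `KNOWN_STAT_TYPES` are names chosen for illustration.

```python
import ast
import json
import re
from typing import Any, Dict, Optional

KNOWN_STAT_TYPES = {"fuzzing_live_update", "scan_progress", "analysis_update", "live_stats"}

_MARKER_RE = re.compile(r"(?:FUZZ_STATS|LIVE_STATS)\s+(\{.*\})")
_EXTRA_RE = re.compile(r"extra=(\{.*?\})")


def parse_stats_from_log(message: str) -> Optional[Dict[str, Any]]:
    """Extract a stats dict from a log line, or return None if there is none."""
    # Preferred form: explicit JSON after a marker token.
    match = _MARKER_RE.search(message)
    if match:
        try:
            parsed = json.loads(match.group(1))
            if isinstance(parsed, dict):
                return parsed
        except json.JSONDecodeError:
            pass
    # Fallback: a Python dict repr after "extra=", parsed as a literal.
    match = _EXTRA_RE.search(message)
    if not match:
        return None
    try:
        data = ast.literal_eval(match.group(1))
    except (ValueError, SyntaxError):
        return None
    if isinstance(data, dict) and data.get("stats_type") in KNOWN_STAT_TYPES:
        return data
    return None
```

As in the original, the non-greedy `extra=` pattern stops at the first closing brace, so nested dicts in `extra` are not supported by this fallback.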
+10
@@ -0,0 +1,10 @@
"""
Storage abstraction layer for FuzzForge.
Provides unified interface for storing and retrieving targets and results.
"""
from .base import StorageBackend
from .s3_cached import S3CachedStorage
__all__ = ["StorageBackend", "S3CachedStorage"]
+153
@@ -0,0 +1,153 @@
"""
Base storage backend interface.
All storage implementations must implement this interface.
"""
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Optional, Dict, Any
class StorageBackend(ABC):
"""
Abstract base class for storage backends.
Implementations handle storage and retrieval of:
- Uploaded targets (code, binaries, etc.)
- Workflow results
- Temporary files
"""
@abstractmethod
async def upload_target(
self,
file_path: Path,
user_id: str,
metadata: Optional[Dict[str, Any]] = None
) -> str:
"""
Upload a target file to storage.
Args:
file_path: Local path to file to upload
user_id: ID of user uploading the file
metadata: Optional metadata to store with file
Returns:
Target ID (unique identifier for retrieval)
Raises:
FileNotFoundError: If file_path doesn't exist
StorageError: If upload fails
"""
pass
@abstractmethod
async def get_target(self, target_id: str) -> Path:
"""
Get target file from storage.
Args:
target_id: Unique identifier from upload_target()
Returns:
Local path to cached file
Raises:
FileNotFoundError: If target doesn't exist
StorageError: If download fails
"""
pass
@abstractmethod
async def delete_target(self, target_id: str) -> None:
"""
Delete target from storage.
Args:
target_id: Unique identifier to delete
Raises:
StorageError: If deletion fails (doesn't raise if not found)
"""
pass
@abstractmethod
async def upload_results(
self,
workflow_id: str,
results: Dict[str, Any],
results_format: str = "json"
) -> str:
"""
Upload workflow results to storage.
Args:
workflow_id: Workflow execution ID
results: Results dictionary
results_format: Format (json, sarif, etc.)
Returns:
URL to uploaded results
Raises:
StorageError: If upload fails
"""
pass
@abstractmethod
async def get_results(self, workflow_id: str) -> Dict[str, Any]:
"""
Get workflow results from storage.
Args:
workflow_id: Workflow execution ID
Returns:
Results dictionary
Raises:
FileNotFoundError: If results don't exist
StorageError: If download fails
"""
pass
@abstractmethod
async def list_targets(
self,
user_id: Optional[str] = None,
limit: int = 100
) -> list[Dict[str, Any]]:
"""
List uploaded targets.
Args:
user_id: Filter by user ID (None = all users)
limit: Maximum number of results
Returns:
List of target metadata dictionaries
Raises:
StorageError: If listing fails
"""
pass
@abstractmethod
async def cleanup_cache(self) -> int:
"""
Clean up local cache (LRU eviction).
Returns:
Number of files removed
Raises:
StorageError: If cleanup fails
"""
pass
class StorageError(Exception):
"""Base exception for storage operations."""
pass
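To show how the interface above might be exercised in tests without MinIO, here is a minimal in-memory sketch. `InMemoryStorage` is hypothetical and not part of this PR. It mirrors only five of the abstract methods and is duck-typed so the snippet runs standalone. In the repository it would presumably subclass `StorageBackend` and implement `list_targets` and `cleanup_cache` as well.

```python
import asyncio
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional
from uuid import uuid4


class InMemoryStorage:
    """Hypothetical test double mirroring part of the StorageBackend interface."""

    def __init__(self) -> None:
        self._targets: Dict[str, bytes] = {}
        self._metadata: Dict[str, Dict[str, Any]] = {}
        self._results: Dict[str, Dict[str, Any]] = {}
        self._cache_dir = Path(tempfile.mkdtemp(prefix="inmem-storage-"))

    async def upload_target(
        self, file_path: Path, user_id: str, metadata: Optional[Dict[str, Any]] = None
    ) -> str:
        if not file_path.exists():
            raise FileNotFoundError(f"File not found: {file_path}")
        target_id = str(uuid4())
        self._targets[target_id] = file_path.read_bytes()
        self._metadata[target_id] = {
            "user_id": user_id, "filename": file_path.name, **(metadata or {})
        }
        return target_id

    async def get_target(self, target_id: str) -> Path:
        if target_id not in self._targets:
            raise FileNotFoundError(f"Target {target_id} not found in storage")
        # Materialise the bytes on disk because callers expect a local path.
        cached = self._cache_dir / target_id / "target"
        if not cached.exists():
            cached.parent.mkdir(parents=True, exist_ok=True)
            cached.write_bytes(self._targets[target_id])
        return cached

    async def delete_target(self, target_id: str) -> None:
        # Deleting a missing target is not an error, matching the docstring above.
        self._targets.pop(target_id, None)
        self._metadata.pop(target_id, None)

    async def upload_results(
        self, workflow_id: str, results: Dict[str, Any], results_format: str = "json"
    ) -> str:
        self._results[workflow_id] = dict(results)
        return f"memory://results/{workflow_id}/results.{results_format}"

    async def get_results(self, workflow_id: str) -> Dict[str, Any]:
        if workflow_id not in self._results:
            raise FileNotFoundError(f"Results for workflow {workflow_id} not found")
        return self._results[workflow_id]
```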
+423
@@ -0,0 +1,423 @@
"""
S3-compatible storage backend with local caching.
Works with MinIO (dev/prod) or AWS S3 (cloud).
"""
import json
import logging
import os
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional, Dict, Any
from uuid import uuid4
import boto3
from botocore.exceptions import ClientError
from .base import StorageBackend, StorageError
logger = logging.getLogger(__name__)
class S3CachedStorage(StorageBackend):
"""
S3-compatible storage with local caching.
Features:
- Upload targets to S3/MinIO
- Download with local caching (LRU eviction)
- Lifecycle management (auto-cleanup old files)
- Metadata tracking
"""
def __init__(
self,
endpoint_url: Optional[str] = None,
access_key: Optional[str] = None,
secret_key: Optional[str] = None,
bucket: str = "targets",
region: str = "us-east-1",
use_ssl: bool = False,
cache_dir: Optional[Path] = None,
cache_max_size_gb: int = 10
):
"""
Initialize S3 storage backend.
Args:
endpoint_url: S3 endpoint (None = S3_ENDPOINT env var, falling back to http://minio:9000)
access_key: S3 access key (None = from env)
secret_key: S3 secret key (None = from env)
bucket: S3 bucket name
region: AWS region
use_ssl: Use HTTPS
cache_dir: Local cache directory
cache_max_size_gb: Maximum cache size in GB
"""
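# Explicit arguments win; environment variables fill in only falsy values.
# bucket and region have non-empty defaults, so S3_BUCKET and S3_REGION apply only when an empty value is passed.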
self.endpoint_url = endpoint_url or os.getenv('S3_ENDPOINT', 'http://minio:9000')
self.access_key = access_key or os.getenv('S3_ACCESS_KEY', 'fuzzforge')
self.secret_key = secret_key or os.getenv('S3_SECRET_KEY', 'fuzzforge123')
self.bucket = bucket or os.getenv('S3_BUCKET', 'targets')
self.region = region or os.getenv('S3_REGION', 'us-east-1')
self.use_ssl = use_ssl or os.getenv('S3_USE_SSL', 'false').lower() == 'true'
# Cache configuration
self.cache_dir = cache_dir or Path(os.getenv('CACHE_DIR', '/tmp/fuzzforge-cache'))
self.cache_max_size = cache_max_size_gb * (1024 ** 3) # Convert to bytes
# Ensure cache directory exists
self.cache_dir.mkdir(parents=True, exist_ok=True)
# Initialize S3 client
try:
self.s3_client = boto3.client(
's3',
endpoint_url=self.endpoint_url,
aws_access_key_id=self.access_key,
aws_secret_access_key=self.secret_key,
region_name=self.region,
use_ssl=self.use_ssl
)
logger.info(f"Initialized S3 storage: {self.endpoint_url}/{self.bucket}")
except Exception as e:
logger.error(f"Failed to initialize S3 client: {e}")
raise StorageError(f"S3 initialization failed: {e}")
async def upload_target(
self,
file_path: Path,
user_id: str,
metadata: Optional[Dict[str, Any]] = None
) -> str:
"""Upload target file to S3/MinIO."""
if not file_path.exists():
raise FileNotFoundError(f"File not found: {file_path}")
# Generate unique target ID
target_id = str(uuid4())
# Prepare metadata
upload_metadata = {
'user_id': user_id,
'uploaded_at': datetime.now().isoformat(),
'filename': file_path.name,
'size': str(file_path.stat().st_size)
}
if metadata:
upload_metadata.update(metadata)
# Upload to S3
s3_key = f'{target_id}/target'
try:
logger.info(f"Uploading target to s3://{self.bucket}/{s3_key}")
self.s3_client.upload_file(
str(file_path),
self.bucket,
s3_key,
ExtraArgs={
'Metadata': upload_metadata
}
)
file_size_mb = file_path.stat().st_size / (1024 * 1024)
logger.info(
f"✓ Uploaded target {target_id} "
f"({file_path.name}, {file_size_mb:.2f} MB)"
)
return target_id
except ClientError as e:
logger.error(f"S3 upload failed: {e}", exc_info=True)
raise StorageError(f"Failed to upload target: {e}")
except Exception as e:
logger.error(f"Upload failed: {e}", exc_info=True)
raise StorageError(f"Upload error: {e}")
async def get_target(self, target_id: str) -> Path:
"""Get target from cache or download from S3/MinIO."""
# Check cache first
cache_path = self.cache_dir / target_id
cached_file = cache_path / "target"
if cached_file.exists():
# Update access time for LRU
cached_file.touch()
logger.info(f"Cache HIT: {target_id}")
return cached_file
# Cache miss - download from S3
logger.info(f"Cache MISS: {target_id}, downloading from S3...")
try:
# Create cache directory
cache_path.mkdir(parents=True, exist_ok=True)
# Download from S3
s3_key = f'{target_id}/target'
logger.info(f"Downloading s3://{self.bucket}/{s3_key}")
self.s3_client.download_file(
self.bucket,
s3_key,
str(cached_file)
)
# Verify download
if not cached_file.exists():
raise StorageError(f"Downloaded file not found: {cached_file}")
file_size_mb = cached_file.stat().st_size / (1024 * 1024)
logger.info(f"✓ Downloaded target {target_id} ({file_size_mb:.2f} MB)")
return cached_file
except ClientError as e:
error_code = e.response.get('Error', {}).get('Code')
if error_code in ['404', 'NoSuchKey']:
logger.error(f"Target not found: {target_id}")
raise FileNotFoundError(f"Target {target_id} not found in storage")
else:
logger.error(f"S3 download failed: {e}", exc_info=True)
raise StorageError(f"Download failed: {e}")
except Exception as e:
logger.error(f"Download error: {e}", exc_info=True)
# Cleanup partial download
if cache_path.exists():
shutil.rmtree(cache_path, ignore_errors=True)
raise StorageError(f"Download error: {e}")
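`get_target` touches a cached file on every hit, which refreshes its modification time, so a cleanup pass can order entries by `st_mtime`. The `cleanup_cache` implementation is not shown in this section, so the following is a sketch of one way LRU eviction could work over the `<cache_dir>/<target_id>/target` layout used above. `evict_lru` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def evict_lru(cache_dir: Path, max_bytes: int) -> int:
    """Remove least-recently-touched target directories until the cache fits.

    Returns the number of cache entries removed.
    """
    entries = []
    total = 0
    for entry in cache_dir.iterdir():
        cached_file = entry / "target"
        if not cached_file.is_file():
            continue  # skip anything that is not a cached target directory
        stat = cached_file.stat()
        entries.append((stat.st_mtime, stat.st_size, entry))
        total += stat.st_size

    removed = 0
    # Oldest modification time first: those were touched least recently.
    for _mtime, size, entry in sorted(entries, key=lambda item: item[0]):
        if total <= max_bytes:
            break
        shutil.rmtree(entry, ignore_errors=True)
        total -= size
        removed += 1
    return removed
```

With the constructor above, a call might look like `evict_lru(storage.cache_dir, storage.cache_max_size)`.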
async def delete_target(self, target_id: str) -> None:
"""Delete target from S3/MinIO."""
try:
s3_key = f'{target_id}/target'
logger.info(f"Deleting s3://{self.bucket}/{s3_key}")
self.s3_client.delete_object(
Bucket=self.bucket,
Key=s3_key
)
# Also delete from cache if present
cache_path = self.cache_dir / target_id
if cache_path.exists():
shutil.rmtree(cache_path, ignore_errors=True)
logger.info(f"✓ Deleted target {target_id} from S3 and cache")
else:
logger.info(f"✓ Deleted target {target_id} from S3")
except ClientError as e:
logger.error(f"S3 delete failed: {e}", exc_info=True)
# Don't raise error if object doesn't exist
if e.response.get('Error', {}).get('Code') not in ['404', 'NoSuchKey']:
raise StorageError(f"Delete failed: {e}")
except Exception as e:
logger.error(f"Delete error: {e}", exc_info=True)
raise StorageError(f"Delete error: {e}")
async def upload_results(
self,
workflow_id: str,
results: Dict[str, Any],
results_format: str = "json"
) -> str:
"""Upload workflow results to S3/MinIO."""
try:
# Prepare results content
if results_format == "json":
content = json.dumps(results, indent=2).encode('utf-8')
content_type = 'application/json'
file_ext = 'json'
elif results_format == "sarif":
content = json.dumps(results, indent=2).encode('utf-8')
content_type = 'application/sarif+json'
file_ext = 'sarif'
else:
content = json.dumps(results, indent=2).encode('utf-8')
content_type = 'application/json'
file_ext = 'json'
# Upload to results bucket
results_bucket = 'results'
s3_key = f'{workflow_id}/results.{file_ext}'
logger.info(f"Uploading results to s3://{results_bucket}/{s3_key}")
self.s3_client.put_object(
Bucket=results_bucket,
Key=s3_key,
Body=content,
ContentType=content_type,
Metadata={
'workflow_id': workflow_id,
'format': results_format,
'uploaded_at': datetime.now().isoformat()
}
)
# Construct URL
results_url = f"{self.endpoint_url}/{results_bucket}/{s3_key}"
logger.info(f"✓ Uploaded results: {results_url}")
return results_url
except Exception as e:
logger.error(f"Results upload failed: {e}", exc_info=True)
raise StorageError(f"Results upload failed: {e}")
async def get_results(self, workflow_id: str) -> Dict[str, Any]:
"""Get workflow results from S3/MinIO."""
try:
results_bucket = 'results'
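# Only the default JSON results are looked up here; SARIF uploads are stored
# by upload_results() under results.sarif and are not found by this method
s3_key = f'{workflow_id}/results.json'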
logger.info(f"Downloading results from s3://{results_bucket}/{s3_key}")
response = self.s3_client.get_object(
Bucket=results_bucket,
Key=s3_key
)
content = response['Body'].read().decode('utf-8')
results = json.loads(content)
logger.info(f"✓ Downloaded results for workflow {workflow_id}")
return results
except ClientError as e:
error_code = e.response.get('Error', {}).get('Code')
if error_code in ['404', 'NoSuchKey']:
logger.error(f"Results not found: {workflow_id}")
raise FileNotFoundError(f"Results for workflow {workflow_id} not found")
else:
logger.error(f"Results download failed: {e}", exc_info=True)
raise StorageError(f"Results download failed: {e}")
except Exception as e:
logger.error(f"Results download error: {e}", exc_info=True)
raise StorageError(f"Results download error: {e}")
async def list_targets(
self,
user_id: Optional[str] = None,
limit: int = 100
) -> list[Dict[str, Any]]:
"""List uploaded targets."""
try:
targets = []
paginator = self.s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=self.bucket, PaginationConfig={'MaxItems': limit}):
for obj in page.get('Contents', []):
# Get object metadata
try:
metadata_response = self.s3_client.head_object(
Bucket=self.bucket,
Key=obj['Key']
)
metadata = metadata_response.get('Metadata', {})
# Filter by user_id if specified
if user_id and metadata.get('user_id') != user_id:
continue
targets.append({
'target_id': obj['Key'].split('/')[0],
'key': obj['Key'],
'size': obj['Size'],
'last_modified': obj['LastModified'].isoformat(),
'metadata': metadata
})
except Exception as e:
logger.warning(f"Failed to get metadata for {obj['Key']}: {e}")
continue
logger.info(f"Listed {len(targets)} targets (user_id={user_id})")
return targets
except Exception as e:
logger.error(f"List targets failed: {e}", exc_info=True)
raise StorageError(f"List targets failed: {e}")
async def cleanup_cache(self) -> int:
"""Clean up local cache using LRU eviction."""
try:
cache_files = []
total_size = 0
# Gather all cached files with metadata
for cache_file in self.cache_dir.rglob('*'):
if cache_file.is_file():
try:
stat = cache_file.stat()
cache_files.append({
'path': cache_file,
'size': stat.st_size,
'atime': stat.st_atime # Last access time (may not update on noatime mounts)
})
total_size += stat.st_size
except Exception as e:
logger.warning(f"Failed to stat {cache_file}: {e}")
continue
# Check if cleanup is needed
if total_size <= self.cache_max_size:
logger.info(
f"Cache size OK: {total_size / (1024**3):.2f} GB / "
f"{self.cache_max_size / (1024**3):.2f} GB"
)
return 0
# Sort by access time (oldest first)
cache_files.sort(key=lambda x: x['atime'])
# Remove files until under limit
removed_count = 0
for file_info in cache_files:
if total_size <= self.cache_max_size:
break
try:
file_info['path'].unlink()
total_size -= file_info['size']
removed_count += 1
logger.debug(f"Evicted from cache: {file_info['path']}")
except Exception as e:
logger.warning(f"Failed to delete {file_info['path']}: {e}")
continue
logger.info(
f"✓ Cache cleanup: removed {removed_count} files, "
f"new size: {total_size / (1024**3):.2f} GB"
)
return removed_count
except Exception as e:
logger.error(f"Cache cleanup failed: {e}", exc_info=True)
raise StorageError(f"Cache cleanup failed: {e}")
def get_cache_stats(self) -> Dict[str, Any]:
"""Get cache statistics."""
try:
total_size = 0
file_count = 0
for cache_file in self.cache_dir.rglob('*'):
if cache_file.is_file():
total_size += cache_file.stat().st_size
file_count += 1
return {
'total_size_bytes': total_size,
'total_size_gb': total_size / (1024 ** 3),
'file_count': file_count,
'max_size_gb': self.cache_max_size / (1024 ** 3),
'usage_percent': (total_size / self.cache_max_size) * 100 if self.cache_max_size else 0.0
}
except Exception as e:
logger.error(f"Failed to get cache stats: {e}")
return {'error': str(e)}
+10
@@ -0,0 +1,10 @@
"""
Temporal integration for FuzzForge.
Handles workflow execution, monitoring, and management.
"""
from .manager import TemporalManager
from .discovery import WorkflowDiscovery
__all__ = ["TemporalManager", "WorkflowDiscovery"]
+257
@@ -0,0 +1,257 @@
"""
Workflow Discovery for Temporal
Discovers workflows from the toolbox/workflows directory
and provides metadata about available workflows.
"""
import logging
import yaml
from pathlib import Path
from typing import Dict, Any
from pydantic import BaseModel, Field, ConfigDict
logger = logging.getLogger(__name__)
class WorkflowInfo(BaseModel):
"""Information about a discovered workflow"""
name: str = Field(..., description="Workflow name")
path: Path = Field(..., description="Path to workflow directory")
workflow_file: Path = Field(..., description="Path to workflow.py file")
metadata: Dict[str, Any] = Field(..., description="Workflow metadata from YAML")
workflow_type: str = Field(..., description="Workflow class name")
vertical: str = Field(..., description="Vertical (worker type) for this workflow")
model_config = ConfigDict(arbitrary_types_allowed=True)
class WorkflowDiscovery:
"""
Discovers workflows from the filesystem.
Scans toolbox/workflows/ for directories containing:
- metadata.yaml (required)
- workflow.py (required)
Each workflow declares its vertical (rust, android, web, etc.)
which determines which worker pool will execute it.
"""
def __init__(self, workflows_dir: Path):
"""
Initialize workflow discovery.
Args:
workflows_dir: Path to the workflows directory
"""
self.workflows_dir = workflows_dir
if not self.workflows_dir.exists():
self.workflows_dir.mkdir(parents=True, exist_ok=True)
logger.info(f"Created workflows directory: {self.workflows_dir}")
async def discover_workflows(self) -> Dict[str, WorkflowInfo]:
"""
Discover workflows by scanning the workflows directory.
Returns:
Dictionary mapping workflow names to their information
"""
workflows = {}
logger.info(f"Scanning for workflows in: {self.workflows_dir}")
for workflow_dir in self.workflows_dir.iterdir():
if not workflow_dir.is_dir():
continue
# Skip special directories
if workflow_dir.name.startswith('.') or workflow_dir.name == '__pycache__':
continue
metadata_file = workflow_dir / "metadata.yaml"
if not metadata_file.exists():
logger.debug(f"No metadata.yaml in {workflow_dir.name}, skipping")
continue
workflow_file = workflow_dir / "workflow.py"
if not workflow_file.exists():
logger.warning(
f"Workflow {workflow_dir.name} has metadata but no workflow.py, skipping"
)
continue
try:
# Parse metadata
with open(metadata_file) as f:
# An empty metadata.yaml parses to None; treat it as an empty mapping
metadata = yaml.safe_load(f) or {}
# Validate required fields
if 'name' not in metadata:
logger.warning(f"Workflow {workflow_dir.name} metadata missing 'name' field")
metadata['name'] = workflow_dir.name
if 'vertical' not in metadata:
logger.warning(
f"Workflow {workflow_dir.name} metadata missing 'vertical' field"
)
continue
# Infer workflow class name from metadata or use convention
workflow_type = metadata.get('workflow_class')
if not workflow_type:
# Convention: convert snake_case to PascalCase + Workflow
# e.g., rust_test -> RustTestWorkflow
parts = workflow_dir.name.split('_')
workflow_type = ''.join(part.capitalize() for part in parts) + 'Workflow'
# Create workflow info
info = WorkflowInfo(
name=metadata['name'],
path=workflow_dir,
workflow_file=workflow_file,
metadata=metadata,
workflow_type=workflow_type,
vertical=metadata['vertical']
)
workflows[info.name] = info
logger.info(
f"✓ Discovered workflow: {info.name} "
f"(vertical: {info.vertical}, class: {info.workflow_type})"
)
except Exception as e:
logger.error(
f"Error discovering workflow {workflow_dir.name}: {e}",
exc_info=True
)
continue
logger.info(f"Discovered {len(workflows)} workflows")
return workflows
def get_workflows_by_vertical(
self,
workflows: Dict[str, WorkflowInfo],
vertical: str
) -> Dict[str, WorkflowInfo]:
"""
Filter workflows by vertical.
Args:
workflows: All discovered workflows
vertical: Vertical name to filter by
Returns:
Filtered workflows dictionary
"""
return {
name: info
for name, info in workflows.items()
if info.vertical == vertical
}
def get_available_verticals(self, workflows: Dict[str, WorkflowInfo]) -> list[str]:
"""
Get list of all verticals from discovered workflows.
Args:
workflows: All discovered workflows
Returns:
List of unique vertical names
"""
return sorted({info.vertical for info in workflows.values()})
@staticmethod
def get_metadata_schema() -> Dict[str, Any]:
"""
Get the JSON schema for workflow metadata.
Returns:
JSON schema dictionary
"""
return {
"type": "object",
"required": ["name", "version", "description", "author", "vertical", "parameters"],
"properties": {
"name": {
"type": "string",
"description": "Workflow name"
},
"version": {
"type": "string",
"pattern": "^\\d+\\.\\d+\\.\\d+$",
"description": "Semantic version (x.y.z)"
},
"vertical": {
"type": "string",
"description": "Vertical worker type (rust, android, web, etc.)"
},
"description": {
"type": "string",
"description": "Workflow description"
},
"author": {
"type": "string",
"description": "Workflow author"
},
"category": {
"type": "string",
"enum": ["comprehensive", "specialized", "fuzzing", "focused"],
"description": "Workflow category"
},
"tags": {
"type": "array",
"items": {"type": "string"},
"description": "Workflow tags for categorization"
},
"requirements": {
"type": "object",
"required": ["tools", "resources"],
"properties": {
"tools": {
"type": "array",
"items": {"type": "string"},
"description": "Required security tools"
},
"resources": {
"type": "object",
"required": ["memory", "cpu", "timeout"],
"properties": {
"memory": {
"type": "string",
"pattern": "^\\d+[GMK]i$",
"description": "Memory limit (e.g., 1Gi, 512Mi)"
},
"cpu": {
"type": "string",
"pattern": "^\\d+m?$",
"description": "CPU limit (e.g., 1000m, 2)"
},
"timeout": {
"type": "integer",
"minimum": 60,
"maximum": 7200,
"description": "Workflow timeout in seconds"
}
}
}
}
},
"parameters": {
"type": "object",
"description": "Workflow parameters schema"
},
"default_parameters": {
"type": "object",
"description": "Default parameter values"
},
"required_modules": {
"type": "array",
"items": {"type": "string"},
"description": "Required module names"
}
}
}
+371
@@ -0,0 +1,371 @@
"""
Temporal Manager - Workflow execution and management
Handles:
- Workflow discovery from toolbox
- Workflow execution (submit to Temporal)
- Status monitoring
- Results retrieval
"""
import logging
import os
from pathlib import Path
from typing import Dict, Optional, Any
from uuid import uuid4
from temporalio.client import Client, WorkflowHandle
from temporalio.common import RetryPolicy
from datetime import timedelta
from .discovery import WorkflowDiscovery, WorkflowInfo
from src.storage import S3CachedStorage
logger = logging.getLogger(__name__)
class TemporalManager:
"""
Manages Temporal workflow execution for FuzzForge.
This class:
- Discovers available workflows from toolbox
- Submits workflow executions to Temporal
- Monitors workflow status
- Retrieves workflow results
"""
def __init__(
self,
workflows_dir: Optional[Path] = None,
temporal_address: Optional[str] = None,
temporal_namespace: str = "default",
storage: Optional[S3CachedStorage] = None
):
"""
Initialize Temporal manager.
Args:
workflows_dir: Path to workflows directory (default: toolbox/workflows)
temporal_address: Temporal server address (default: from env or localhost:7233)
temporal_namespace: Temporal namespace
storage: Storage backend for file uploads (default: S3CachedStorage)
"""
if workflows_dir is None:
workflows_dir = Path("toolbox/workflows")
self.temporal_address = temporal_address or os.getenv(
'TEMPORAL_ADDRESS',
'localhost:7233'
)
self.temporal_namespace = temporal_namespace
self.discovery = WorkflowDiscovery(workflows_dir)
self.workflows: Dict[str, WorkflowInfo] = {}
self.client: Optional[Client] = None
# Initialize storage backend
self.storage = storage or S3CachedStorage()
logger.info(
f"TemporalManager initialized: {self.temporal_address} "
f"(namespace: {self.temporal_namespace})"
)
async def initialize(self):
"""Initialize the manager by discovering workflows and connecting to Temporal."""
try:
# Discover workflows
self.workflows = await self.discovery.discover_workflows()
if not self.workflows:
logger.warning("No workflows discovered")
else:
logger.info(
f"Discovered {len(self.workflows)} workflows: "
f"{list(self.workflows.keys())}"
)
# Connect to Temporal
self.client = await Client.connect(
self.temporal_address,
namespace=self.temporal_namespace
)
logger.info(f"✓ Connected to Temporal: {self.temporal_address}")
except Exception as e:
logger.error(f"Failed to initialize Temporal manager: {e}", exc_info=True)
raise
async def close(self):
"""Close Temporal client connection."""
if self.client:
# Temporal client doesn't need explicit close in Python SDK
pass
async def get_workflows(self) -> Dict[str, WorkflowInfo]:
"""
Get all discovered workflows.
Returns:
Dictionary mapping workflow names to their info
"""
return self.workflows
async def get_workflow(self, name: str) -> Optional[WorkflowInfo]:
"""
Get workflow info by name.
Args:
name: Workflow name
Returns:
WorkflowInfo or None if not found
"""
return self.workflows.get(name)
async def upload_target(
self,
file_path: Path,
user_id: str,
metadata: Optional[Dict[str, Any]] = None
) -> str:
"""
Upload target file to storage.
Args:
file_path: Local path to file
user_id: User ID
metadata: Optional metadata
Returns:
Target ID for use in workflow execution
"""
target_id = await self.storage.upload_target(file_path, user_id, metadata)
logger.info(f"Uploaded target: {target_id}")
return target_id
async def run_workflow(
self,
workflow_name: str,
target_id: str,
workflow_params: Optional[Dict[str, Any]] = None,
workflow_id: Optional[str] = None
) -> WorkflowHandle:
"""
Execute a workflow.
Args:
workflow_name: Name of workflow to execute
target_id: Target ID (from upload_target)
workflow_params: Additional workflow parameters
workflow_id: Optional workflow ID (generated if not provided)
Returns:
WorkflowHandle for monitoring/results
Raises:
ValueError: If workflow not found or client not initialized
"""
if not self.client:
raise ValueError("Temporal client not initialized. Call initialize() first.")
# Get workflow info
workflow_info = self.workflows.get(workflow_name)
if not workflow_info:
raise ValueError(f"Workflow not found: {workflow_name}")
# Generate workflow ID if not provided
if not workflow_id:
workflow_id = f"{workflow_name}-{str(uuid4())[:8]}"
# Prepare workflow input arguments
workflow_params = workflow_params or {}
# Build args list: [target_id, ...workflow_params values]
# The workflow parameters are passed as individual positional args
workflow_args = [target_id]
# Values are appended in dict insertion order, so callers must supply
# workflow_params in the order the workflow signature expects, e.g.
# security_assessment: scanner_config, analyzer_config, reporter_config
# atheris_fuzzing: target_file, max_iterations, timeout_seconds
if workflow_params:
workflow_args.extend(workflow_params.values())
# Determine task queue from workflow vertical
vertical = workflow_info.metadata.get("vertical", "default")
task_queue = f"{vertical}-queue"
logger.info(
f"Starting workflow: {workflow_name} "
f"(id={workflow_id}, queue={task_queue}, target={target_id})"
)
try:
# Start workflow execution with positional arguments
handle = await self.client.start_workflow(
workflow=workflow_info.workflow_type, # Workflow class name
args=workflow_args, # Positional arguments
id=workflow_id,
task_queue=task_queue,
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=1),
maximum_interval=timedelta(minutes=1),
maximum_attempts=3
)
)
logger.info(f"✓ Workflow started: {workflow_id}")
return handle
except Exception as e:
logger.error(f"Failed to start workflow {workflow_name}: {e}", exc_info=True)
raise
async def get_workflow_status(self, workflow_id: str) -> Dict[str, Any]:
"""
Get workflow execution status.
Args:
workflow_id: Workflow execution ID
Returns:
Status dictionary with workflow state
Raises:
ValueError: If client not initialized or workflow not found
"""
if not self.client:
raise ValueError("Temporal client not initialized")
try:
# Get workflow handle
handle = self.client.get_workflow_handle(workflow_id)
# Describe the execution; this returns the current state without waiting for completion
description = await handle.describe()
status = {
"workflow_id": workflow_id,
"status": description.status.name,
"start_time": description.start_time.isoformat() if description.start_time else None,
"execution_time": description.execution_time.isoformat() if description.execution_time else None,
"close_time": description.close_time.isoformat() if description.close_time else None,
"task_queue": description.task_queue,
}
logger.info(f"Workflow {workflow_id} status: {status['status']}")
return status
except Exception as e:
logger.error(f"Failed to get workflow status: {e}", exc_info=True)
raise
async def get_workflow_result(
self,
workflow_id: str,
timeout: Optional[timedelta] = None
) -> Any:
"""
Get workflow execution result (blocking).
Args:
workflow_id: Workflow execution ID
timeout: Maximum time to wait for result
Returns:
Workflow result
Raises:
ValueError: If client not initialized
asyncio.TimeoutError: If timeout exceeded (an alias of TimeoutError on Python 3.11+)
"""
if not self.client:
raise ValueError("Temporal client not initialized")
try:
handle = self.client.get_workflow_handle(workflow_id)
logger.info(f"Waiting for workflow result: {workflow_id}")
# Wait for workflow to complete and get result
if timeout:
# Use asyncio timeout if provided
import asyncio
result = await asyncio.wait_for(handle.result(), timeout=timeout.total_seconds())
else:
result = await handle.result()
logger.info(f"✓ Workflow {workflow_id} completed")
return result
except Exception as e:
logger.error(f"Failed to get workflow result: {e}", exc_info=True)
raise
async def cancel_workflow(self, workflow_id: str) -> None:
"""
Cancel a running workflow.
Args:
workflow_id: Workflow execution ID
Raises:
ValueError: If client not initialized
"""
if not self.client:
raise ValueError("Temporal client not initialized")
try:
handle = self.client.get_workflow_handle(workflow_id)
await handle.cancel()
logger.info(f"✓ Workflow cancelled: {workflow_id}")
except Exception as e:
logger.error(f"Failed to cancel workflow: {e}", exc_info=True)
raise
async def list_workflows(
self,
filter_query: Optional[str] = None,
limit: int = 100
) -> list[Dict[str, Any]]:
"""
List workflow executions.
Args:
filter_query: Optional Temporal list filter query
limit: Maximum number of results
Returns:
List of workflow execution info
Raises:
ValueError: If client not initialized
"""
if not self.client:
raise ValueError("Temporal client not initialized")
try:
workflows = []
# Use Temporal's list API
async for workflow in self.client.list_workflows(filter_query):
workflows.append({
"workflow_id": workflow.id,
"workflow_type": workflow.workflow_type,
"status": workflow.status.name,
"start_time": workflow.start_time.isoformat() if workflow.start_time else None,
"close_time": workflow.close_time.isoformat() if workflow.close_time else None,
"task_queue": workflow.task_queue,
})
if len(workflows) >= limit:
break
logger.info(f"Listed {len(workflows)} workflows")
return workflows
except Exception as e:
logger.error(f"Failed to list workflows: {e}", exc_info=True)
raise
+119
@@ -0,0 +1,119 @@
# FuzzForge Test Suite
Comprehensive test infrastructure for FuzzForge modules and workflows.
## Directory Structure
```
tests/
├── conftest.py                  # Shared pytest fixtures
├── unit/                        # Fast, isolated unit tests
│   ├── test_modules/            # Module-specific tests
│   │   ├── test_cargo_fuzzer.py
│   │   └── test_atheris_fuzzer.py
│   ├── test_workflows/          # Workflow tests
│   └── test_api/                # API endpoint tests
├── integration/                 # Integration tests (requires Docker)
└── fixtures/                    # Test data and projects
    ├── test_projects/           # Vulnerable projects for testing
    └── expected_results/        # Expected output for validation
```
## Running Tests
### All Tests
```bash
cd backend
pytest tests/ -v
```
### Unit Tests Only (Fast)
```bash
pytest tests/unit/ -v
```
### Integration Tests (Requires Docker)
```bash
# Start services
docker-compose up -d
# Run integration tests
pytest tests/integration/ -v
# Cleanup
docker-compose down
```
### With Coverage
```bash
pytest tests/ --cov=toolbox/modules --cov=src --cov-report=html
```
### Parallel Execution
```bash
pytest tests/unit/ -n auto
```
## Available Fixtures
### Workspace Fixtures
- `temp_workspace`: Empty temporary workspace
- `python_test_workspace`: Python project with vulnerabilities
- `rust_test_workspace`: Rust project with fuzz targets
### Module Fixtures
- `atheris_fuzzer`: AtherisFuzzer instance
- `cargo_fuzzer`: CargoFuzzer instance
- `file_scanner`: FileScanner instance
### Configuration Fixtures
- `atheris_config`: Default Atheris configuration
- `cargo_fuzz_config`: Default cargo-fuzz configuration
- `gitleaks_config`: Default Gitleaks configuration
### Mock Fixtures
- `mock_stats_callback`: Mock stats callback for fuzzing
- `mock_temporal_context`: Mock Temporal activity context
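As a rough illustration of how `mock_stats_callback` collects updates, the sketch below rebuilds the fixture body as a plain factory and feeds it from a stand-in coroutine. The names `make_stats_callback` and `fake_fuzz_loop` are hypothetical and exist only for this example; the real fixture lives in `conftest.py`.

```python
import asyncio
from typing import Any, Dict, List


def make_stats_callback():
    """Standalone equivalent of the mock_stats_callback fixture body."""
    stats_received: List[Dict[str, Any]] = []

    async def callback(stats: Dict[str, Any]) -> None:
        stats_received.append(stats)

    # Expose the collected updates on the callback itself, as the fixture does
    callback.stats_received = stats_received
    return callback


async def fake_fuzz_loop(stats_callback, rounds: int = 3) -> None:
    """Hypothetical stand-in for a fuzzer that reports progress each round."""
    for i in range(1, rounds + 1):
        await stats_callback({"total_execs": i * 100, "crashes": 0})


callback = make_stats_callback()
asyncio.run(fake_fuzz_loop(callback))
print(f"received {len(callback.stats_received)} stats updates")
```

In a real test the fixture is requested by name and the assertions run against `mock_stats_callback.stats_received` after the module under test has executed.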
## Writing Tests
### Unit Test Example
```python
import pytest

@pytest.mark.asyncio
async def test_module_execution(cargo_fuzzer, rust_test_workspace, cargo_fuzz_config):
    """Test module execution"""
    result = await cargo_fuzzer.execute(cargo_fuzz_config, rust_test_workspace)
    assert result.status == "success"
    assert result.execution_time > 0
```
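Tests under `tests/unit/test_workflows/` can often cover pure logic without a running Temporal server. The sketch below assumes the conventions used by workflow discovery and submission in this change: the class name is derived from the directory name (snake_case to PascalCase plus `Workflow`) and the task queue is `<vertical>-queue`. The helpers `infer_workflow_class` and `task_queue_for` are hypothetical and written only for this example; they are not part of the codebase.

```python
def infer_workflow_class(dir_name: str) -> str:
    """Hypothetical helper mirroring the snake_case -> PascalCase + 'Workflow' convention."""
    parts = dir_name.split("_")
    return "".join(part.capitalize() for part in parts) + "Workflow"


def task_queue_for(vertical: str) -> str:
    """Hypothetical helper mirroring the '<vertical>-queue' naming."""
    return f"{vertical}-queue"


def test_workflow_class_convention():
    assert infer_workflow_class("rust_test") == "RustTestWorkflow"
    assert infer_workflow_class("security_assessment") == "SecurityAssessmentWorkflow"


def test_task_queue_convention():
    assert task_queue_for("rust") == "rust-queue"
    assert task_queue_for("android") == "android-queue"
```

Keeping such checks free of I/O means they stay in the fast unit tier and need no Docker services.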
### Integration Test Example
```python
import pytest

@pytest.mark.integration
@pytest.mark.asyncio
async def test_end_to_end_workflow():
    """Test complete workflow execution"""
    # Test full workflow with real services
    pass
```
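Integration tests depend on services being up, so one option is to skip them when the Temporal frontend is not reachable rather than let them fail. The following is a minimal sketch, assuming Temporal listens on `localhost:7233` (the default address used by `TemporalManager`); `service_reachable` is a hypothetical helper, not an existing fixture.

```python
import socket


def service_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors
        return False


# Possible use in a test module (assumes pytest is installed):
#
#   requires_temporal = pytest.mark.skipif(
#       not service_reachable("localhost", 7233),
#       reason="Temporal is not reachable on localhost:7233",
#   )
```

A TCP check only shows that the port accepts connections; it does not confirm that the workers for a given vertical have registered.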
## CI/CD Integration
Tests run automatically on:
- **Push to main/develop**: Full test suite
- **Pull requests**: Full test suite + coverage
- **Nightly**: Extended integration tests
See `.github/workflows/test.yml` for configuration.
## Code Coverage
Target coverage: **80%+** for core modules
View coverage report:
```bash
pytest tests/ --cov --cov-report=html
open htmlcov/index.html
```
+211
@@ -11,9 +11,220 @@
import sys
from pathlib import Path
from typing import Dict, Any
import pytest
# Ensure project root is on sys.path so `src` is importable
ROOT = Path(__file__).resolve().parents[1]
if str(ROOT) not in sys.path:
sys.path.insert(0, str(ROOT))
# Add toolbox to path for module imports
TOOLBOX = ROOT / "toolbox"
if str(TOOLBOX) not in sys.path:
sys.path.insert(0, str(TOOLBOX))
# ============================================================================
# Workspace Fixtures
# ============================================================================
@pytest.fixture
def temp_workspace(tmp_path):
"""Create a temporary workspace directory for testing"""
workspace = tmp_path / "workspace"
workspace.mkdir()
return workspace
@pytest.fixture
def python_test_workspace(temp_workspace):
"""Create a Python test workspace with sample files"""
# Create a simple Python project structure
(temp_workspace / "main.py").write_text("""
def process_data(data):
# Intentional bug: no bounds checking
return data[0:100]
def divide(a, b):
# Division by zero vulnerability
return a / b
""")
(temp_workspace / "config.py").write_text("""
# Hardcoded secrets for testing
API_KEY = "sk_test_1234567890abcdef"
DATABASE_URL = "postgresql://admin:password123@localhost/db"
AWS_SECRET = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
""")
return temp_workspace
@pytest.fixture
def rust_test_workspace(temp_workspace):
"""Create a Rust test workspace with fuzz targets"""
# Create Cargo.toml
(temp_workspace / "Cargo.toml").write_text("""[package]
name = "test_project"
version = "0.1.0"
edition = "2021"
[dependencies]
""")
# Create src/lib.rs
src_dir = temp_workspace / "src"
src_dir.mkdir()
(src_dir / "lib.rs").write_text("""
pub fn process_buffer(data: &[u8]) -> Vec<u8> {
if data.len() < 4 {
return Vec::new();
}
// Vulnerability: bounds checking issue
let size = data[0] as usize;
let mut result = Vec::new();
for i in 0..size {
result.push(data[i]);
}
result
}
""")
# Create fuzz directory structure
fuzz_dir = temp_workspace / "fuzz"
fuzz_dir.mkdir()
(fuzz_dir / "Cargo.toml").write_text("""[package]
name = "test_project-fuzz"
version = "0.0.0"
edition = "2021"
[dependencies]
libfuzzer-sys = "0.4"
[dependencies.test_project]
path = ".."
[[bin]]
name = "fuzz_target_1"
path = "fuzz_targets/fuzz_target_1.rs"
""")
fuzz_targets_dir = fuzz_dir / "fuzz_targets"
fuzz_targets_dir.mkdir()
(fuzz_targets_dir / "fuzz_target_1.rs").write_text("""#![no_main]
use libfuzzer_sys::fuzz_target;
use test_project::process_buffer;
fuzz_target!(|data: &[u8]| {
let _ = process_buffer(data);
});
""")
return temp_workspace
# ============================================================================
# Module Configuration Fixtures
# ============================================================================
@pytest.fixture
def atheris_config():
"""Default Atheris fuzzer configuration"""
return {
"target_file": "auto-discover",
"max_iterations": 1000,
"timeout_seconds": 10,
"corpus_dir": None
}
@pytest.fixture
def cargo_fuzz_config():
"""Default cargo-fuzz configuration"""
return {
"target_name": None,
"max_iterations": 1000,
"timeout_seconds": 10,
"sanitizer": "address"
}
@pytest.fixture
def gitleaks_config():
"""Default Gitleaks configuration"""
return {
"config_path": None,
"scan_uncommitted": True
}
@pytest.fixture
def file_scanner_config():
"""Default file scanner configuration"""
return {
"scan_patterns": ["*.py", "*.rs", "*.js"],
"exclude_patterns": ["*.test.*", "*.spec.*"],
"max_file_size": 1048576 # 1MB
}
# ============================================================================
# Module Instance Fixtures
# ============================================================================
@pytest.fixture
def atheris_fuzzer():
"""Create an AtherisFuzzer instance"""
from modules.fuzzer.atheris_fuzzer import AtherisFuzzer
return AtherisFuzzer()
@pytest.fixture
def cargo_fuzzer():
"""Create a CargoFuzzer instance"""
from modules.fuzzer.cargo_fuzzer import CargoFuzzer
return CargoFuzzer()
@pytest.fixture
def file_scanner():
"""Create a FileScanner instance"""
from modules.scanner.file_scanner import FileScanner
return FileScanner()
# ============================================================================
# Mock Fixtures
# ============================================================================
@pytest.fixture
def mock_stats_callback():
"""Mock stats callback for fuzzing"""
stats_received = []
async def callback(stats: Dict[str, Any]):
stats_received.append(stats)
callback.stats_received = stats_received
return callback
@pytest.fixture
def mock_temporal_context():
"""Mock Temporal activity context"""
class MockActivityInfo:
def __init__(self):
self.workflow_id = "test-workflow-123"
self.activity_id = "test-activity-1"
self.attempt = 1
class MockContext:
def __init__(self):
self.info = MockActivityInfo()
return MockContext()
@@ -1,82 +0,0 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import asyncio
from datetime import datetime, timezone, timedelta
from src.services.prefect_stats_monitor import PrefectStatsMonitor
from src.api import fuzzing
class FakeLog:
def __init__(self, message: str):
self.message = message
class FakeClient:
def __init__(self, logs):
self._logs = logs
async def read_logs(self, log_filter=None, limit=100, sort="TIMESTAMP_ASC"):
return self._logs
class FakeTaskRun:
def __init__(self):
self.id = "task-1"
self.start_time = datetime.now(timezone.utc) - timedelta(seconds=5)
def test_parse_stats_from_log_fuzzing():
mon = PrefectStatsMonitor()
msg = (
"INFO LIVE_STATS extra={'stats_type': 'fuzzing_live_update', "
"'executions': 42, 'executions_per_sec': 3.14, 'crashes': 1, 'unique_crashes': 1, 'corpus_size': 9}"
)
stats = mon._parse_stats_from_log(msg)
assert stats is not None
assert stats["stats_type"] == "fuzzing_live_update"
assert stats["executions"] == 42
def test_extract_stats_updates_and_broadcasts():
mon = PrefectStatsMonitor()
run_id = "run-123"
workflow = "wf"
fuzzing.initialize_fuzzing_tracking(run_id, workflow)
# Prepare a fake websocket to capture messages
sent = []
class FakeWS:
async def send_text(self, text: str):
sent.append(text)
fuzzing.active_connections[run_id] = [FakeWS()]
# Craft a log line the parser understands
msg = (
"INFO LIVE_STATS extra={'stats_type': 'fuzzing_live_update', "
"'executions': 10, 'executions_per_sec': 1.5, 'crashes': 0, 'unique_crashes': 0, 'corpus_size': 2}"
)
fake_client = FakeClient([FakeLog(msg)])
task_run = FakeTaskRun()
asyncio.run(mon._extract_stats_from_task(fake_client, run_id, task_run, workflow))
# Verify stats updated
stats = fuzzing.fuzzing_stats[run_id]
assert stats.executions == 10
assert stats.executions_per_sec == 1.5
# Verify a message was sent to WebSocket
assert sent, "Expected a stats_update message to be sent"
@@ -0,0 +1,177 @@
"""
Unit tests for AtherisFuzzer module
"""
import pytest
from unittest.mock import AsyncMock, patch
@pytest.mark.asyncio
class TestAtherisFuzzerMetadata:
"""Test AtherisFuzzer metadata"""
async def test_metadata_structure(self, atheris_fuzzer):
"""Test that module metadata is properly defined"""
metadata = atheris_fuzzer.get_metadata()
assert metadata.name == "atheris_fuzzer"
assert metadata.category == "fuzzer"
assert "fuzzing" in metadata.tags
assert "python" in metadata.tags
@pytest.mark.asyncio
class TestAtherisFuzzerConfigValidation:
"""Test configuration validation"""
async def test_valid_config(self, atheris_fuzzer, atheris_config):
"""Test validation of valid configuration"""
assert atheris_fuzzer.validate_config(atheris_config) is True
async def test_invalid_max_iterations(self, atheris_fuzzer):
"""Test validation fails with invalid max_iterations"""
config = {
"target_file": "fuzz_target.py",
"max_iterations": -1,
"timeout_seconds": 10
}
with pytest.raises(ValueError, match="max_iterations"):
atheris_fuzzer.validate_config(config)
async def test_invalid_timeout(self, atheris_fuzzer):
"""Test validation fails with invalid timeout"""
config = {
"target_file": "fuzz_target.py",
"max_iterations": 1000,
"timeout_seconds": 0
}
with pytest.raises(ValueError, match="timeout_seconds"):
atheris_fuzzer.validate_config(config)
@pytest.mark.asyncio
class TestAtherisFuzzerDiscovery:
"""Test fuzz target discovery"""
async def test_auto_discover(self, atheris_fuzzer, python_test_workspace):
"""Test auto-discovery of Python fuzz targets"""
# Create a fuzz target file
(python_test_workspace / "fuzz_target.py").write_text("""
import atheris
import sys
def TestOneInput(data):
pass
if __name__ == "__main__":
atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
""")
# Pass None for auto-discovery
target = atheris_fuzzer._discover_target(python_test_workspace, None)
assert target is not None
assert "fuzz_target.py" in str(target)
@pytest.mark.asyncio
class TestAtherisFuzzerExecution:
"""Test fuzzer execution logic"""
    async def test_execution_creates_result(self, atheris_fuzzer, python_test_workspace, atheris_config):
        """Test that execution returns a ModuleResult"""
        # Create a simple fuzz target
        (python_test_workspace / "fuzz_target.py").write_text("""
import atheris
import sys
def TestOneInput(data):
    if len(data) > 0:
        pass
if __name__ == "__main__":
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
""")
        # Use a very short timeout for testing
        test_config = {
            "target_file": "fuzz_target.py",
            "max_iterations": 10,
            "timeout_seconds": 1
        }
        # Mock the fuzzing subprocess to avoid actual execution
        with patch.object(atheris_fuzzer, '_run_fuzzing', new_callable=AsyncMock, return_value=([], {"total_executions": 10})):
            result = await atheris_fuzzer.execute(test_config, python_test_workspace)
        assert result.module == "atheris_fuzzer"
        assert result.status in ["success", "partial", "failed"]
        assert isinstance(result.execution_time, float)
@pytest.mark.asyncio
class TestAtherisFuzzerStatsCallback:
"""Test stats callback functionality"""
async def test_stats_callback_invoked(self, atheris_fuzzer, python_test_workspace, atheris_config, mock_stats_callback):
"""Test that stats callback is invoked during fuzzing"""
(python_test_workspace / "fuzz_target.py").write_text("""
import atheris
import sys
def TestOneInput(data):
pass
if __name__ == "__main__":
atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
""")
        # Mock fuzzing to simulate stats
        async def mock_run_fuzzing(test_one_input, target_path, workspace, max_iterations, timeout_seconds, stats_callback):
            if stats_callback:
                await stats_callback({
                    "total_execs": 100,
                    "execs_per_sec": 10.0,
                    "crashes": 0,
                    "coverage": 5,
                    "corpus_size": 2,
                    "elapsed_time": 10
                })
            # Return the same (findings, stats) shape as the mocked _run_fuzzing in the execution test
            return [], {"total_executions": 100}
        with patch.object(atheris_fuzzer, '_run_fuzzing', side_effect=mock_run_fuzzing):
            with patch.object(atheris_fuzzer, '_load_target_module', return_value=lambda x: None):
                # Put stats_callback in config dict, not as kwarg
                atheris_config["target_file"] = "fuzz_target.py"
                atheris_config["stats_callback"] = mock_stats_callback
                await atheris_fuzzer.execute(atheris_config, python_test_workspace)
        # Verify callback was invoked
        assert len(mock_stats_callback.stats_received) > 0
@pytest.mark.asyncio
class TestAtherisFuzzerFindingGeneration:
"""Test finding generation from crashes"""
async def test_create_crash_finding(self, atheris_fuzzer):
"""Test crash finding creation"""
finding = atheris_fuzzer.create_finding(
title="Crash: Exception in TestOneInput",
description="IndexError: list index out of range",
severity="high",
category="crash",
file_path="fuzz_target.py",
metadata={
"crash_type": "IndexError",
"stack_trace": "Traceback..."
}
)
assert finding.title == "Crash: Exception in TestOneInput"
assert finding.severity == "high"
assert finding.category == "crash"
assert "IndexError" in finding.metadata["crash_type"]
@@ -0,0 +1,177 @@
"""
Unit tests for CargoFuzzer module
"""
import pytest
from unittest.mock import AsyncMock, patch
@pytest.mark.asyncio
class TestCargoFuzzerMetadata:
"""Test CargoFuzzer metadata"""
async def test_metadata_structure(self, cargo_fuzzer):
"""Test that module metadata is properly defined"""
metadata = cargo_fuzzer.get_metadata()
assert metadata.name == "cargo_fuzz"
assert metadata.version == "0.11.2"
assert metadata.category == "fuzzer"
assert "fuzzing" in metadata.tags
assert "rust" in metadata.tags
@pytest.mark.asyncio
class TestCargoFuzzerConfigValidation:
"""Test configuration validation"""
async def test_valid_config(self, cargo_fuzzer, cargo_fuzz_config):
"""Test validation of valid configuration"""
assert cargo_fuzzer.validate_config(cargo_fuzz_config) is True
async def test_invalid_max_iterations(self, cargo_fuzzer):
"""Test validation fails with invalid max_iterations"""
config = {
"max_iterations": -1,
"timeout_seconds": 10,
"sanitizer": "address"
}
with pytest.raises(ValueError, match="max_iterations"):
cargo_fuzzer.validate_config(config)
async def test_invalid_timeout(self, cargo_fuzzer):
"""Test validation fails with invalid timeout"""
config = {
"max_iterations": 1000,
"timeout_seconds": 0,
"sanitizer": "address"
}
with pytest.raises(ValueError, match="timeout_seconds"):
cargo_fuzzer.validate_config(config)
async def test_invalid_sanitizer(self, cargo_fuzzer):
"""Test validation fails with invalid sanitizer"""
config = {
"max_iterations": 1000,
"timeout_seconds": 10,
"sanitizer": "invalid_sanitizer"
}
with pytest.raises(ValueError, match="sanitizer"):
cargo_fuzzer.validate_config(config)
@pytest.mark.asyncio
class TestCargoFuzzerWorkspaceValidation:
"""Test workspace validation"""
async def test_valid_workspace(self, cargo_fuzzer, rust_test_workspace):
"""Test validation of valid workspace"""
assert cargo_fuzzer.validate_workspace(rust_test_workspace) is True
async def test_nonexistent_workspace(self, cargo_fuzzer, tmp_path):
"""Test validation fails with nonexistent workspace"""
nonexistent = tmp_path / "does_not_exist"
with pytest.raises(ValueError, match="does not exist"):
cargo_fuzzer.validate_workspace(nonexistent)
async def test_workspace_is_file(self, cargo_fuzzer, tmp_path):
"""Test validation fails when workspace is a file"""
file_path = tmp_path / "file.txt"
file_path.write_text("test")
with pytest.raises(ValueError, match="not a directory"):
cargo_fuzzer.validate_workspace(file_path)
@pytest.mark.asyncio
class TestCargoFuzzerDiscovery:
"""Test fuzz target discovery"""
async def test_discover_targets(self, cargo_fuzzer, rust_test_workspace):
"""Test discovery of fuzz targets"""
targets = await cargo_fuzzer._discover_fuzz_targets(rust_test_workspace)
assert len(targets) == 1
assert "fuzz_target_1" in targets
async def test_no_fuzz_directory(self, cargo_fuzzer, temp_workspace):
"""Test discovery with no fuzz directory"""
targets = await cargo_fuzzer._discover_fuzz_targets(temp_workspace)
assert targets == []
@pytest.mark.asyncio
class TestCargoFuzzerExecution:
"""Test fuzzer execution logic"""
async def test_execution_creates_result(self, cargo_fuzzer, rust_test_workspace, cargo_fuzz_config):
"""Test that execution returns a ModuleResult"""
# Mock the build and run methods to avoid actual fuzzing
with patch.object(cargo_fuzzer, '_build_fuzz_target', new_callable=AsyncMock, return_value=True):
with patch.object(cargo_fuzzer, '_run_fuzzing', new_callable=AsyncMock, return_value=([], {"total_executions": 0, "crashes_found": 0})):
with patch.object(cargo_fuzzer, '_parse_crash_artifacts', new_callable=AsyncMock, return_value=[]):
result = await cargo_fuzzer.execute(cargo_fuzz_config, rust_test_workspace)
assert result.module == "cargo_fuzz"
assert result.status == "success"
assert isinstance(result.execution_time, float)
assert result.execution_time >= 0
async def test_execution_with_no_targets(self, cargo_fuzzer, temp_workspace, cargo_fuzz_config):
"""Test execution fails gracefully with no fuzz targets"""
result = await cargo_fuzzer.execute(cargo_fuzz_config, temp_workspace)
assert result.status == "failed"
assert "No fuzz targets found" in result.error
@pytest.mark.asyncio
class TestCargoFuzzerStatsCallback:
"""Test stats callback functionality"""
    async def test_stats_callback_invoked(self, cargo_fuzzer, rust_test_workspace, cargo_fuzz_config, mock_stats_callback):
        """Test that stats callback is invoked during fuzzing"""
        # Mock build/run to simulate stats generation
        async def mock_run_fuzzing(workspace, target, config, callback):
            # Simulate stats callback
            if callback:
                await callback({
                    "total_execs": 1000,
                    "execs_per_sec": 100.0,
                    "crashes": 0,
                    "coverage": 10,
                    "corpus_size": 5,
                    "elapsed_time": 10
                })
            return [], {"total_executions": 1000}
        with patch.object(cargo_fuzzer, '_build_fuzz_target', new_callable=AsyncMock, return_value=True):
            with patch.object(cargo_fuzzer, '_run_fuzzing', side_effect=mock_run_fuzzing):
                with patch.object(cargo_fuzzer, '_parse_crash_artifacts', new_callable=AsyncMock, return_value=[]):
                    await cargo_fuzzer.execute(cargo_fuzz_config, rust_test_workspace, stats_callback=mock_stats_callback)
        # Verify callback was invoked
        assert len(mock_stats_callback.stats_received) > 0
        assert mock_stats_callback.stats_received[0]["total_execs"] == 1000
@pytest.mark.asyncio
class TestCargoFuzzerFindingGeneration:
"""Test finding generation from crashes"""
async def test_create_finding_from_crash(self, cargo_fuzzer):
"""Test finding creation"""
finding = cargo_fuzzer.create_finding(
title="Crash: Segmentation Fault",
description="Test crash",
severity="critical",
category="crash",
file_path="fuzz/fuzz_targets/fuzz_target_1.rs",
metadata={"crash_type": "SIGSEGV"}
)
assert finding.title == "Crash: Segmentation Fault"
assert finding.severity == "critical"
assert finding.category == "crash"
assert finding.file_path == "fuzz/fuzz_targets/fuzz_target_1.rs"
assert finding.metadata["crash_type"] == "SIGSEGV"
@@ -0,0 +1,349 @@
"""
Unit tests for FileScanner module
"""
import sys
from pathlib import Path
import pytest
sys.path.insert(0, str(Path(__file__).resolve().parents[3] / "toolbox"))
@pytest.mark.asyncio
class TestFileScannerMetadata:
"""Test FileScanner metadata"""
async def test_metadata_structure(self, file_scanner):
"""Test that metadata has correct structure"""
metadata = file_scanner.get_metadata()
assert metadata.name == "file_scanner"
assert metadata.version == "1.0.0"
assert metadata.category == "scanner"
assert "files" in metadata.tags
assert "enumeration" in metadata.tags
assert metadata.requires_workspace is True
@pytest.mark.asyncio
class TestFileScannerConfigValidation:
"""Test configuration validation"""
async def test_valid_config(self, file_scanner):
"""Test that valid config passes validation"""
config = {
"patterns": ["*.py", "*.js"],
"max_file_size": 1048576,
"check_sensitive": True,
"calculate_hashes": False
}
assert file_scanner.validate_config(config) is True
async def test_default_config(self, file_scanner):
"""Test that empty config uses defaults"""
config = {}
assert file_scanner.validate_config(config) is True
async def test_invalid_patterns_type(self, file_scanner):
"""Test that non-list patterns raises error"""
config = {"patterns": "*.py"}
with pytest.raises(ValueError, match="patterns must be a list"):
file_scanner.validate_config(config)
async def test_invalid_max_file_size(self, file_scanner):
"""Test that invalid max_file_size raises error"""
config = {"max_file_size": -1}
with pytest.raises(ValueError, match="max_file_size must be a positive integer"):
file_scanner.validate_config(config)
async def test_invalid_max_file_size_type(self, file_scanner):
"""Test that non-integer max_file_size raises error"""
config = {"max_file_size": "large"}
with pytest.raises(ValueError, match="max_file_size must be a positive integer"):
file_scanner.validate_config(config)
@pytest.mark.asyncio
class TestFileScannerExecution:
"""Test scanner execution"""
async def test_scan_python_files(self, file_scanner, python_test_workspace):
"""Test scanning Python files"""
config = {
"patterns": ["*.py"],
"check_sensitive": False,
"calculate_hashes": False
}
result = await file_scanner.execute(config, python_test_workspace)
assert result.module == "file_scanner"
assert result.status == "success"
assert len(result.findings) > 0
# Check that Python files were found
python_files = [f for f in result.findings if f.file_path.endswith('.py')]
assert len(python_files) > 0
async def test_scan_all_files(self, file_scanner, python_test_workspace):
"""Test scanning all files with wildcard"""
config = {
"patterns": ["*"],
"check_sensitive": False,
"calculate_hashes": False
}
result = await file_scanner.execute(config, python_test_workspace)
assert result.status == "success"
assert len(result.findings) > 0
assert result.summary["total_files"] > 0
async def test_scan_with_multiple_patterns(self, file_scanner, python_test_workspace):
"""Test scanning with multiple patterns"""
config = {
"patterns": ["*.py", "*.txt"],
"check_sensitive": False,
"calculate_hashes": False
}
result = await file_scanner.execute(config, python_test_workspace)
assert result.status == "success"
assert len(result.findings) > 0
async def test_empty_workspace(self, file_scanner, temp_workspace):
"""Test scanning empty workspace"""
config = {
"patterns": ["*.py"],
"check_sensitive": False
}
result = await file_scanner.execute(config, temp_workspace)
assert result.status == "success"
assert len(result.findings) == 0
assert result.summary["total_files"] == 0
@pytest.mark.asyncio
class TestFileScannerSensitiveDetection:
"""Test sensitive file detection"""
async def test_detect_env_file(self, file_scanner, temp_workspace):
"""Test detection of .env file"""
# Create .env file
(temp_workspace / ".env").write_text("API_KEY=secret123")
config = {
"patterns": ["*"],
"check_sensitive": True,
"calculate_hashes": False
}
result = await file_scanner.execute(config, temp_workspace)
assert result.status == "success"
# Check for sensitive file finding
sensitive_findings = [f for f in result.findings if f.category == "sensitive_file"]
assert len(sensitive_findings) > 0
assert any(".env" in f.title for f in sensitive_findings)
async def test_detect_private_key(self, file_scanner, temp_workspace):
"""Test detection of private key file"""
# Create private key file
(temp_workspace / "id_rsa").write_text("-----BEGIN RSA PRIVATE KEY-----")
config = {
"patterns": ["*"],
"check_sensitive": True
}
result = await file_scanner.execute(config, temp_workspace)
assert result.status == "success"
sensitive_findings = [f for f in result.findings if f.category == "sensitive_file"]
assert len(sensitive_findings) > 0
async def test_no_sensitive_detection_when_disabled(self, file_scanner, temp_workspace):
"""Test that sensitive detection can be disabled"""
(temp_workspace / ".env").write_text("API_KEY=secret123")
config = {
"patterns": ["*"],
"check_sensitive": False
}
result = await file_scanner.execute(config, temp_workspace)
assert result.status == "success"
sensitive_findings = [f for f in result.findings if f.category == "sensitive_file"]
assert len(sensitive_findings) == 0
@pytest.mark.asyncio
class TestFileScannerHashing:
"""Test file hashing functionality"""
async def test_hash_calculation(self, file_scanner, temp_workspace):
"""Test SHA256 hash calculation"""
# Create test file
test_file = temp_workspace / "test.txt"
test_file.write_text("Hello World")
config = {
"patterns": ["*.txt"],
"calculate_hashes": True
}
result = await file_scanner.execute(config, temp_workspace)
assert result.status == "success"
# Find the test.txt finding
txt_findings = [f for f in result.findings if "test.txt" in f.file_path]
assert len(txt_findings) > 0
# Check that hash was calculated
finding = txt_findings[0]
assert finding.metadata.get("file_hash") is not None
assert len(finding.metadata["file_hash"]) == 64 # SHA256 hex length
    async def test_no_hash_when_disabled(self, file_scanner, temp_workspace):
        """Test that hashing can be disabled"""
        test_file = temp_workspace / "test.txt"
        test_file.write_text("Hello World")
        config = {
            "patterns": ["*.txt"],
            "calculate_hashes": False
        }
        result = await file_scanner.execute(config, temp_workspace)
        assert result.status == "success"
        txt_findings = [f for f in result.findings if "test.txt" in f.file_path]
        # Require a finding so the hash check below cannot be skipped silently
        assert len(txt_findings) > 0
        finding = txt_findings[0]
        assert finding.metadata.get("file_hash") is None
@pytest.mark.asyncio
class TestFileScannerFileTypes:
"""Test file type detection"""
async def test_detect_python_type(self, file_scanner, temp_workspace):
"""Test detection of Python file type"""
(temp_workspace / "script.py").write_text("print('hello')")
config = {"patterns": ["*.py"]}
result = await file_scanner.execute(config, temp_workspace)
assert result.status == "success"
py_findings = [f for f in result.findings if "script.py" in f.file_path]
assert len(py_findings) > 0
assert "python" in py_findings[0].metadata["file_type"]
async def test_detect_javascript_type(self, file_scanner, temp_workspace):
"""Test detection of JavaScript file type"""
(temp_workspace / "app.js").write_text("console.log('hello')")
config = {"patterns": ["*.js"]}
result = await file_scanner.execute(config, temp_workspace)
assert result.status == "success"
js_findings = [f for f in result.findings if "app.js" in f.file_path]
assert len(js_findings) > 0
assert "javascript" in js_findings[0].metadata["file_type"]
async def test_file_type_summary(self, file_scanner, temp_workspace):
"""Test that file type summary is generated"""
(temp_workspace / "script.py").write_text("print('hello')")
(temp_workspace / "app.js").write_text("console.log('hello')")
(temp_workspace / "readme.txt").write_text("Documentation")
config = {"patterns": ["*"]}
result = await file_scanner.execute(config, temp_workspace)
assert result.status == "success"
assert "file_types" in result.summary
assert len(result.summary["file_types"]) > 0
@pytest.mark.asyncio
class TestFileScannerSizeLimits:
"""Test file size handling"""
async def test_skip_large_files(self, file_scanner, temp_workspace):
"""Test that large files are skipped"""
# Create a "large" file
large_file = temp_workspace / "large.txt"
large_file.write_text("x" * 1000)
config = {
"patterns": ["*.txt"],
"max_file_size": 500 # Set limit smaller than file
}
result = await file_scanner.execute(config, temp_workspace)
# Should succeed but skip the large file
assert result.status == "success"
# The file should still be counted but not have a detailed finding
assert result.summary["total_files"] > 0
async def test_process_small_files(self, file_scanner, temp_workspace):
"""Test that small files are processed"""
small_file = temp_workspace / "small.txt"
small_file.write_text("small content")
config = {
"patterns": ["*.txt"],
"max_file_size": 1048576 # 1MB
}
result = await file_scanner.execute(config, temp_workspace)
assert result.status == "success"
txt_findings = [f for f in result.findings if "small.txt" in f.file_path]
assert len(txt_findings) > 0
@pytest.mark.asyncio
class TestFileScannerSummary:
"""Test result summary generation"""
async def test_summary_structure(self, file_scanner, python_test_workspace):
"""Test that summary has correct structure"""
config = {"patterns": ["*"]}
result = await file_scanner.execute(config, python_test_workspace)
assert result.status == "success"
assert "total_files" in result.summary
assert "total_size_bytes" in result.summary
assert "file_types" in result.summary
assert "patterns_scanned" in result.summary
assert isinstance(result.summary["total_files"], int)
assert isinstance(result.summary["total_size_bytes"], int)
assert isinstance(result.summary["file_types"], dict)
assert isinstance(result.summary["patterns_scanned"], list)
async def test_summary_counts(self, file_scanner, temp_workspace):
"""Test that summary counts are accurate"""
# Create known files
(temp_workspace / "file1.py").write_text("content1")
(temp_workspace / "file2.py").write_text("content2")
(temp_workspace / "file3.txt").write_text("content3")
config = {"patterns": ["*"]}
result = await file_scanner.execute(config, temp_workspace)
assert result.status == "success"
assert result.summary["total_files"] == 3
assert result.summary["total_size_bytes"] > 0
@@ -0,0 +1,493 @@
"""
Unit tests for SecurityAnalyzer module
"""
import pytest
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parents[3] / "toolbox"))
from modules.analyzer.security_analyzer import SecurityAnalyzer
@pytest.fixture
def security_analyzer():
"""Create SecurityAnalyzer instance"""
return SecurityAnalyzer()
@pytest.mark.asyncio
class TestSecurityAnalyzerMetadata:
"""Test SecurityAnalyzer metadata"""
async def test_metadata_structure(self, security_analyzer):
"""Test that metadata has correct structure"""
metadata = security_analyzer.get_metadata()
assert metadata.name == "security_analyzer"
assert metadata.version == "1.0.0"
assert metadata.category == "analyzer"
assert "security" in metadata.tags
assert "vulnerabilities" in metadata.tags
assert metadata.requires_workspace is True
@pytest.mark.asyncio
class TestSecurityAnalyzerConfigValidation:
"""Test configuration validation"""
async def test_valid_config(self, security_analyzer):
"""Test that valid config passes validation"""
config = {
"file_extensions": [".py", ".js"],
"check_secrets": True,
"check_sql": True,
"check_dangerous_functions": True
}
assert security_analyzer.validate_config(config) is True
async def test_default_config(self, security_analyzer):
"""Test that empty config uses defaults"""
config = {}
assert security_analyzer.validate_config(config) is True
async def test_invalid_extensions_type(self, security_analyzer):
"""Test that non-list extensions raises error"""
config = {"file_extensions": ".py"}
with pytest.raises(ValueError, match="file_extensions must be a list"):
security_analyzer.validate_config(config)
@pytest.mark.asyncio
class TestSecurityAnalyzerSecretDetection:
"""Test hardcoded secret detection"""
async def test_detect_api_key(self, security_analyzer, temp_workspace):
"""Test detection of hardcoded API key"""
code_file = temp_workspace / "config.py"
code_file.write_text("""
# Configuration file
api_key = "apikey_live_abcdefghijklmnopqrstuvwxyzabcdefghijk"
database_url = "postgresql://localhost/db"
""")
config = {
"file_extensions": [".py"],
"check_secrets": True,
"check_sql": False,
"check_dangerous_functions": False
}
result = await security_analyzer.execute(config, temp_workspace)
assert result.status == "success"
secret_findings = [f for f in result.findings if f.category == "hardcoded_secret"]
assert len(secret_findings) > 0
assert any("API Key" in f.title for f in secret_findings)
async def test_detect_password(self, security_analyzer, temp_workspace):
"""Test detection of hardcoded password"""
code_file = temp_workspace / "auth.py"
code_file.write_text("""
def connect():
password = "mySecretP@ssw0rd"
return connect_db(password)
""")
config = {
"file_extensions": [".py"],
"check_secrets": True,
"check_sql": False,
"check_dangerous_functions": False
}
result = await security_analyzer.execute(config, temp_workspace)
assert result.status == "success"
secret_findings = [f for f in result.findings if f.category == "hardcoded_secret"]
assert len(secret_findings) > 0
async def test_detect_aws_credentials(self, security_analyzer, temp_workspace):
"""Test detection of AWS credentials"""
code_file = temp_workspace / "aws_config.py"
code_file.write_text("""
aws_access_key = "AKIAIOSFODNN7REALKEY"
aws_secret_key = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYREALKEY"
""")
config = {
"file_extensions": [".py"],
"check_secrets": True
}
result = await security_analyzer.execute(config, temp_workspace)
assert result.status == "success"
aws_findings = [f for f in result.findings if "AWS" in f.title]
assert len(aws_findings) >= 2 # Both access key and secret key
async def test_no_secret_detection_when_disabled(self, security_analyzer, temp_workspace):
"""Test that secret detection can be disabled"""
code_file = temp_workspace / "config.py"
code_file.write_text('api_key = "sk_live_1234567890abcdef"')
config = {
"file_extensions": [".py"],
"check_secrets": False
}
result = await security_analyzer.execute(config, temp_workspace)
assert result.status == "success"
secret_findings = [f for f in result.findings if f.category == "hardcoded_secret"]
assert len(secret_findings) == 0
@pytest.mark.asyncio
class TestSecurityAnalyzerSQLInjection:
"""Test SQL injection detection"""
async def test_detect_string_concatenation(self, security_analyzer, temp_workspace):
"""Test detection of SQL string concatenation"""
code_file = temp_workspace / "db.py"
code_file.write_text("""
def get_user(user_id):
query = "SELECT * FROM users WHERE id = " + user_id
return execute(query)
""")
config = {
"file_extensions": [".py"],
"check_secrets": False,
"check_sql": True,
"check_dangerous_functions": False
}
result = await security_analyzer.execute(config, temp_workspace)
assert result.status == "success"
sql_findings = [f for f in result.findings if f.category == "sql_injection"]
assert len(sql_findings) > 0
async def test_detect_f_string_sql(self, security_analyzer, temp_workspace):
"""Test detection of f-string in SQL"""
code_file = temp_workspace / "db.py"
code_file.write_text("""
def get_user(name):
query = f"SELECT * FROM users WHERE name = '{name}'"
return execute(query)
""")
config = {
"file_extensions": [".py"],
"check_sql": True
}
result = await security_analyzer.execute(config, temp_workspace)
assert result.status == "success"
sql_findings = [f for f in result.findings if f.category == "sql_injection"]
assert len(sql_findings) > 0
async def test_detect_dynamic_query_building(self, security_analyzer, temp_workspace):
"""Test detection of dynamic query building"""
code_file = temp_workspace / "queries.py"
code_file.write_text("""
def search(keyword):
query = "SELECT * FROM products WHERE name LIKE " + keyword
execute(query + " ORDER BY price")
""")
config = {
"file_extensions": [".py"],
"check_sql": True
}
result = await security_analyzer.execute(config, temp_workspace)
assert result.status == "success"
sql_findings = [f for f in result.findings if f.category == "sql_injection"]
assert len(sql_findings) > 0
async def test_no_sql_detection_when_disabled(self, security_analyzer, temp_workspace):
"""Test that SQL detection can be disabled"""
code_file = temp_workspace / "db.py"
code_file.write_text('query = "SELECT * FROM users WHERE id = " + user_id')
config = {
"file_extensions": [".py"],
"check_sql": False
}
result = await security_analyzer.execute(config, temp_workspace)
assert result.status == "success"
sql_findings = [f for f in result.findings if f.category == "sql_injection"]
assert len(sql_findings) == 0
@pytest.mark.asyncio
class TestSecurityAnalyzerDangerousFunctions:
"""Test dangerous function detection"""
async def test_detect_eval(self, security_analyzer, temp_workspace):
"""Test detection of eval() usage"""
code_file = temp_workspace / "dangerous.py"
code_file.write_text("""
def process_input(user_input):
result = eval(user_input)
return result
""")
config = {
"file_extensions": [".py"],
"check_secrets": False,
"check_sql": False,
"check_dangerous_functions": True
}
result = await security_analyzer.execute(config, temp_workspace)
assert result.status == "success"
dangerous_findings = [f for f in result.findings if f.category == "dangerous_function"]
assert len(dangerous_findings) > 0
assert any("eval" in f.title.lower() for f in dangerous_findings)
async def test_detect_exec(self, security_analyzer, temp_workspace):
"""Test detection of exec() usage"""
code_file = temp_workspace / "runner.py"
code_file.write_text("""
def run_code(code):
exec(code)
""")
config = {
"file_extensions": [".py"],
"check_dangerous_functions": True
}
result = await security_analyzer.execute(config, temp_workspace)
assert result.status == "success"
dangerous_findings = [f for f in result.findings if f.category == "dangerous_function"]
assert len(dangerous_findings) > 0
async def test_detect_os_system(self, security_analyzer, temp_workspace):
"""Test detection of os.system() usage"""
code_file = temp_workspace / "commands.py"
code_file.write_text("""
import os
def run_command(cmd):
os.system(cmd)
""")
config = {
"file_extensions": [".py"],
"check_dangerous_functions": True
}
result = await security_analyzer.execute(config, temp_workspace)
assert result.status == "success"
dangerous_findings = [f for f in result.findings if f.category == "dangerous_function"]
assert len(dangerous_findings) > 0
assert any("os.system" in f.title for f in dangerous_findings)
async def test_detect_pickle_loads(self, security_analyzer, temp_workspace):
"""Test detection of pickle.loads() usage"""
code_file = temp_workspace / "serializer.py"
code_file.write_text("""
import pickle
def deserialize(data):
return pickle.loads(data)
""")
config = {
"file_extensions": [".py"],
"check_dangerous_functions": True
}
result = await security_analyzer.execute(config, temp_workspace)
assert result.status == "success"
dangerous_findings = [f for f in result.findings if f.category == "dangerous_function"]
assert len(dangerous_findings) > 0
async def test_detect_javascript_eval(self, security_analyzer, temp_workspace):
"""Test detection of eval() in JavaScript"""
code_file = temp_workspace / "app.js"
code_file.write_text("""
function processInput(userInput) {
return eval(userInput);
}
""")
config = {
"file_extensions": [".js"],
"check_dangerous_functions": True
}
result = await security_analyzer.execute(config, temp_workspace)
assert result.status == "success"
dangerous_findings = [f for f in result.findings if f.category == "dangerous_function"]
assert len(dangerous_findings) > 0
async def test_detect_innerHTML(self, security_analyzer, temp_workspace):
"""Test detection of innerHTML (XSS risk)"""
code_file = temp_workspace / "dom.js"
code_file.write_text("""
function updateContent(html) {
document.getElementById("content").innerHTML = html;
}
""")
config = {
"file_extensions": [".js"],
"check_dangerous_functions": True
}
result = await security_analyzer.execute(config, temp_workspace)
assert result.status == "success"
dangerous_findings = [f for f in result.findings if f.category == "dangerous_function"]
assert len(dangerous_findings) > 0
async def test_no_dangerous_detection_when_disabled(self, security_analyzer, temp_workspace):
"""Test that dangerous function detection can be disabled"""
code_file = temp_workspace / "code.py"
code_file.write_text('result = eval(user_input)')
config = {
"file_extensions": [".py"],
"check_dangerous_functions": False
}
result = await security_analyzer.execute(config, temp_workspace)
assert result.status == "success"
dangerous_findings = [f for f in result.findings if f.category == "dangerous_function"]
assert len(dangerous_findings) == 0
@pytest.mark.asyncio
class TestSecurityAnalyzerMultipleIssues:
"""Test detection of multiple issues in same file"""
    async def test_detect_multiple_vulnerabilities(self, security_analyzer, temp_workspace):
        """Test detection of multiple vulnerability types"""
        code_file = temp_workspace / "vulnerable.py"
        code_file.write_text("""
import os
# Hardcoded credentials
api_key = "apikey_live_abcdefghijklmnopqrstuvwxyzabcdef"
password = "MySecureP@ssw0rd"
def process_query(user_input):
    # SQL injection
    query = "SELECT * FROM users WHERE name = " + user_input
    # Dangerous function
    result = eval(user_input)
    # Command injection
    os.system(user_input)
    return result
""")
        config = {
            "file_extensions": [".py"],
            "check_secrets": True,
            "check_sql": True,
            "check_dangerous_functions": True
        }
        result = await security_analyzer.execute(config, temp_workspace)
        assert result.status == "success"
        # Should find multiple types of issues
        secret_findings = [f for f in result.findings if f.category == "hardcoded_secret"]
        sql_findings = [f for f in result.findings if f.category == "sql_injection"]
        dangerous_findings = [f for f in result.findings if f.category == "dangerous_function"]
        assert len(secret_findings) > 0
        assert len(sql_findings) > 0
        assert len(dangerous_findings) > 0
@pytest.mark.asyncio
class TestSecurityAnalyzerSummary:
"""Test result summary generation"""
async def test_summary_structure(self, security_analyzer, temp_workspace):
"""Test that summary has correct structure"""
(temp_workspace / "test.py").write_text("print('hello')")
config = {"file_extensions": [".py"]}
result = await security_analyzer.execute(config, temp_workspace)
assert result.status == "success"
assert "files_analyzed" in result.summary
assert "total_findings" in result.summary
assert "extensions_scanned" in result.summary
assert isinstance(result.summary["files_analyzed"], int)
assert isinstance(result.summary["total_findings"], int)
assert isinstance(result.summary["extensions_scanned"], list)
async def test_empty_workspace(self, security_analyzer, temp_workspace):
"""Test analyzing empty workspace"""
config = {"file_extensions": [".py"]}
result = await security_analyzer.execute(config, temp_workspace)
assert result.status == "partial" # No files found
assert result.summary["files_analyzed"] == 0
async def test_analyze_multiple_file_types(self, security_analyzer, temp_workspace):
"""Test analyzing multiple file types"""
(temp_workspace / "app.py").write_text("print('hello')")
(temp_workspace / "script.js").write_text("console.log('hello')")
(temp_workspace / "index.php").write_text("<?php echo 'hello'; ?>")
config = {"file_extensions": [".py", ".js", ".php"]}
result = await security_analyzer.execute(config, temp_workspace)
assert result.status == "success"
assert result.summary["files_analyzed"] == 3
@pytest.mark.asyncio
class TestSecurityAnalyzerFalsePositives:
"""Test false positive filtering"""
async def test_skip_test_secrets(self, security_analyzer, temp_workspace):
"""Test that test/example secrets are filtered"""
code_file = temp_workspace / "test_config.py"
code_file.write_text("""
# Test configuration - should be filtered
api_key = "test_key_example"
password = "dummy_password_123"
token = "sample_token_placeholder"
""")
config = {
"file_extensions": [".py"],
"check_secrets": True
}
result = await security_analyzer.execute(config, temp_workspace)
assert result.status == "success"
# These should be filtered as false positives
secret_findings = [f for f in result.findings if f.category == "hardcoded_secret"]
# Should have fewer or no findings due to false positive filtering
assert len(secret_findings) == 0 or all(
not any(fp in f.description.lower() for fp in ['test', 'example', 'dummy', 'sample'])
for f in secret_findings
)
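The false-positive test above relies on the analyzer discarding obvious placeholder values. A minimal sketch of one such heuristic follows; `PLACEHOLDER_MARKERS` and `looks_like_placeholder` are hypothetical names chosen for illustration and are not taken from the analyzer, whose actual filtering rules are not shown in this diff.

```python
# Hypothetical marker list, mirroring the words the test checks for.
PLACEHOLDER_MARKERS = ("test", "example", "dummy", "sample", "placeholder")


def looks_like_placeholder(value: str) -> bool:
    """Return True if the candidate secret contains a placeholder marker."""
    lowered = value.lower()
    return any(marker in lowered for marker in PLACEHOLDER_MARKERS)
```

Plain substring matching is deliberately crude: a real credential that happens to contain one of these words would also be dropped, so a production filter would likely need stricter rules.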
@@ -0,0 +1,369 @@
"""
FuzzForge Common Storage Activities
Activities for interacting with MinIO storage:
- get_target_activity: Download target from MinIO to local cache
- cleanup_cache_activity: Remove target from local cache
- upload_results_activity: Upload workflow results to MinIO
"""
import logging
import os
import shutil
from pathlib import Path
import boto3
from botocore.exceptions import ClientError
from temporalio import activity
# Configure logging
logger = logging.getLogger(__name__)
# Initialize S3 client (MinIO)
s3_client = boto3.client(
's3',
endpoint_url=os.getenv('S3_ENDPOINT', 'http://minio:9000'),
aws_access_key_id=os.getenv('S3_ACCESS_KEY', 'fuzzforge'),
aws_secret_access_key=os.getenv('S3_SECRET_KEY', 'fuzzforge123'),
region_name=os.getenv('S3_REGION', 'us-east-1'),
use_ssl=os.getenv('S3_USE_SSL', 'false').lower() == 'true'
)
# Configuration
S3_BUCKET = os.getenv('S3_BUCKET', 'targets')
CACHE_DIR = Path(os.getenv('CACHE_DIR', '/cache'))
CACHE_MAX_SIZE_GB = int(os.getenv('CACHE_MAX_SIZE', '10').rstrip('GB'))
@activity.defn(name="get_target")
async def get_target_activity(
target_id: str,
run_id: str = None,
workspace_isolation: str = "isolated"
) -> str:
"""
Download target from MinIO to local cache.
Args:
target_id: UUID of the uploaded target
run_id: Workflow run ID for isolation (required for isolated and copy-on-write modes)
workspace_isolation: Isolation mode - "isolated" (default), "shared", or "copy-on-write"
Returns:
Local path to the cached target workspace
Raises:
FileNotFoundError: If target doesn't exist in MinIO
ValueError: If workspace_isolation is invalid, or run_id is missing for isolated or copy-on-write mode
Exception: For other download errors
"""
logger.info(
f"Activity: get_target (target_id={target_id}, run_id={run_id}, "
f"isolation={workspace_isolation})"
)
# Validate isolation mode
valid_modes = ["isolated", "shared", "copy-on-write"]
if workspace_isolation not in valid_modes:
raise ValueError(
f"Invalid workspace_isolation mode: {workspace_isolation}. "
f"Must be one of: {valid_modes}"
)
# Require run_id for isolated and copy-on-write modes
if workspace_isolation in ["isolated", "copy-on-write"] and not run_id:
raise ValueError(
f"run_id is required for workspace_isolation='{workspace_isolation}'"
)
# Define cache paths based on isolation mode
if workspace_isolation == "isolated":
# Each run gets its own isolated workspace
cache_path = CACHE_DIR / target_id / run_id
cached_file = cache_path / "target"
elif workspace_isolation == "shared":
# All runs share the same workspace (legacy behavior)
cache_path = CACHE_DIR / target_id
cached_file = cache_path / "target"
else: # copy-on-write
# Shared download, run-specific copy
shared_cache_path = CACHE_DIR / target_id / "shared"
cache_path = CACHE_DIR / target_id / run_id
cached_file = shared_cache_path / "target"
# Handle copy-on-write mode
if workspace_isolation == "copy-on-write":
# Check if shared cache exists
if cached_file.exists():
logger.info(f"Copy-on-write: Shared cache HIT for {target_id}")
# Copy shared workspace to run-specific path
shared_workspace = shared_cache_path / "workspace"
run_workspace = cache_path / "workspace"
if shared_workspace.exists():
logger.info(f"Copying workspace to isolated run path: {run_workspace}")
cache_path.mkdir(parents=True, exist_ok=True)
shutil.copytree(shared_workspace, run_workspace)
return str(run_workspace)
else:
# Shared file exists but not extracted (non-tarball)
run_file = cache_path / "target"
cache_path.mkdir(parents=True, exist_ok=True)
shutil.copy2(cached_file, run_file)
return str(run_file)
# If shared cache doesn't exist, fall through to download
# Check if target is already cached (isolated or shared mode)
elif cached_file.exists():
# Update access time for LRU
cached_file.touch()
logger.info(f"Cache HIT: {target_id} (mode: {workspace_isolation})")
# Check if workspace directory exists (extracted tarball)
workspace_dir = cache_path / "workspace"
if workspace_dir.exists() and workspace_dir.is_dir():
logger.info(f"Returning cached workspace: {workspace_dir}")
return str(workspace_dir)
else:
# Return cached file (not a tarball)
return str(cached_file)
# Cache miss - download from MinIO
logger.info(
f"Cache MISS: {target_id} (mode: {workspace_isolation}), "
f"downloading from MinIO..."
)
try:
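# Create cache directories. In copy-on-write mode the download goes to the
# shared directory, which is not the same as the run-specific cache_path.
cache_path.mkdir(parents=True, exist_ok=True)
cached_file.parent.mkdir(parents=True, exist_ok=True)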
# Download from S3/MinIO
s3_key = f'{target_id}/target'
logger.info(f"Downloading s3://{S3_BUCKET}/{s3_key} -> {cached_file}")
s3_client.download_file(
Bucket=S3_BUCKET,
Key=s3_key,
Filename=str(cached_file)
)
# Verify file was downloaded
if not cached_file.exists():
raise FileNotFoundError(f"Downloaded file not found: {cached_file}")
file_size = cached_file.stat().st_size
logger.info(
f"✓ Downloaded target {target_id} "
f"({file_size / 1024 / 1024:.2f} MB)"
)
# Extract tarball if it's an archive
import tarfile
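# Extract next to the downloaded file. In copy-on-write mode this is the
# shared directory, so the later copytree() targets a fresh run path.
workspace_dir = cached_file.parent / "workspace"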
if tarfile.is_tarfile(str(cached_file)):
logger.info(f"Extracting tarball to {workspace_dir}...")
workspace_dir.mkdir(parents=True, exist_ok=True)
with tarfile.open(str(cached_file), 'r:*') as tar:
tar.extractall(path=workspace_dir)
logger.info(f"✓ Extracted tarball to {workspace_dir}")
# For copy-on-write mode, copy to run-specific path
if workspace_isolation == "copy-on-write":
run_cache_path = CACHE_DIR / target_id / run_id
run_workspace = run_cache_path / "workspace"
logger.info(f"Copy-on-write: Copying to {run_workspace}")
run_cache_path.mkdir(parents=True, exist_ok=True)
shutil.copytree(workspace_dir, run_workspace)
return str(run_workspace)
return str(workspace_dir)
else:
# Not a tarball
if workspace_isolation == "copy-on-write":
# Copy file to run-specific path
run_cache_path = CACHE_DIR / target_id / run_id
run_file = run_cache_path / "target"
logger.info(f"Copy-on-write: Copying file to {run_file}")
run_cache_path.mkdir(parents=True, exist_ok=True)
shutil.copy2(cached_file, run_file)
return str(run_file)
return str(cached_file)
except ClientError as e:
error_code = e.response['Error']['Code']
if error_code in ('404', 'NoSuchKey'):
logger.error(f"Target not found in MinIO: {target_id}")
raise FileNotFoundError(f"Target {target_id} not found in storage") from e
else:
logger.error(f"S3/MinIO error downloading target: {e}", exc_info=True)
raise
except Exception as e:
logger.error(f"Failed to download target {target_id}: {e}", exc_info=True)
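# Cleanup partial download. Remove the file itself as well: in copy-on-write
# mode it lives in the shared directory and would otherwise look like a cache hit.
cached_file.unlink(missing_ok=True)
if cache_path.exists():
shutil.rmtree(cache_path, ignore_errors=True)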
raise
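The cache layout chosen for each isolation mode can be summarised as a small pure function. This is a minimal sketch rather than the activity's code: `resolve_cache_paths` and its return shape are hypothetical names introduced for illustration, and the directory scheme (`<cache>/<target_id>/<run_id>` for run data, `<cache>/<target_id>/shared` for the copy-on-write download) is read off the comments in `get_target_activity`.

```python
from pathlib import Path
from typing import Optional, Tuple

VALID_MODES = ("isolated", "shared", "copy-on-write")


def resolve_cache_paths(
    cache_dir: Path,
    target_id: str,
    run_id: Optional[str],
    mode: str = "isolated",
) -> Tuple[Path, Path]:
    """Return (download_dir, run_dir) for a target under the given mode.

    download_dir holds the object fetched from storage;
    run_dir is the directory the workflow run actually works in.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"Invalid mode {mode!r}; expected one of {VALID_MODES}")
    if mode != "shared" and not run_id:
        raise ValueError(f"run_id is required for mode {mode!r}")

    target_root = cache_dir / target_id
    if mode == "shared":
        # Every run reuses the same directory.
        return target_root, target_root
    run_dir = target_root / run_id
    if mode == "isolated":
        # Download and work directory are the same per-run path.
        return run_dir, run_dir
    # copy-on-write: download once into a shared directory, copy per run.
    return target_root / "shared", run_dir
```

Keeping the path logic separate from the download makes the three modes easy to unit test without a storage backend.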
@activity.defn(name="cleanup_cache")
async def cleanup_cache_activity(
target_path: str,
workspace_isolation: str = "isolated"
) -> None:
"""
Remove target from local cache after workflow completes.
Args:
target_path: Path to the cached target workspace (from get_target_activity)
workspace_isolation: Isolation mode used - determines cleanup scope
Notes:
- "isolated" mode: Removes the entire run-specific directory
- "copy-on-write" mode: Removes run-specific directory, keeps shared cache
- "shared" mode: Does NOT remove cache (shared across runs)
"""
logger.info(
f"Activity: cleanup_cache (path={target_path}, "
f"isolation={workspace_isolation})"
)
try:
target = Path(target_path)
# For shared mode, don't clean up (cache is shared across runs)
if workspace_isolation == "shared":
logger.info(
f"Skipping cleanup for shared workspace (mode={workspace_isolation})"
)
return
# For isolated and copy-on-write modes, clean up run-specific directory
# Navigate up to the run-specific directory: /cache/{target_id}/{run_id}/
# Both return shapes (a ".../workspace" directory or a ".../target" file)
# sit directly inside the run directory, so go up one level.
run_dir = target.parent
# Validate it's in cache and looks like a run-specific path
if run_dir.exists() and run_dir.is_relative_to(CACHE_DIR):
# Check if parent is target_id directory (validate structure)
target_id_dir = run_dir.parent
if target_id_dir.is_relative_to(CACHE_DIR):
shutil.rmtree(run_dir)
logger.info(
f"✓ Cleaned up run-specific directory: {run_dir} "
f"(mode={workspace_isolation})"
)
else:
logger.warning(
f"Unexpected cache structure, skipping cleanup: {run_dir}"
)
else:
logger.warning(
f"Cache path not in CACHE_DIR or doesn't exist: {run_dir}"
)
except Exception as e:
# Don't fail workflow if cleanup fails
logger.error(
f"Failed to cleanup cache {target_path}: {e}",
exc_info=True
)
@activity.defn(name="upload_results")
async def upload_results_activity(
workflow_id: str,
results: dict,
results_format: str = "json"
) -> str:
"""
Upload workflow results to MinIO.
Args:
workflow_id: Workflow execution ID
results: Results dictionary to upload
results_format: Format for results (json, sarif, etc.)
Returns:
S3 URL to the uploaded results
"""
logger.info(
f"Activity: upload_results "
f"(workflow_id={workflow_id}, format={results_format})"
)
try:
import json
# Prepare results content
if results_format == "json":
content = json.dumps(results, indent=2).encode('utf-8')
content_type = 'application/json'
file_ext = 'json'
elif results_format == "sarif":
content = json.dumps(results, indent=2).encode('utf-8')
content_type = 'application/sarif+json'
file_ext = 'sarif'
else:
# Default to JSON
content = json.dumps(results, indent=2).encode('utf-8')
content_type = 'application/json'
file_ext = 'json'
# Upload to MinIO
s3_key = f'{workflow_id}/results.{file_ext}'
logger.info(f"Uploading results to s3://results/{s3_key}")
s3_client.put_object(
Bucket='results',
Key=s3_key,
Body=content,
ContentType=content_type,
Metadata={
'workflow_id': workflow_id,
'format': results_format
}
)
# Construct S3 URL
s3_endpoint = os.getenv('S3_ENDPOINT', 'http://minio:9000')
s3_url = f"{s3_endpoint}/results/{s3_key}"
logger.info(f"✓ Uploaded results: {s3_url}")
return s3_url
except Exception as e:
logger.error(
f"Failed to upload results for workflow {workflow_id}: {e}",
exc_info=True
)
raise
def _check_cache_size():
"""Check total cache size and log warning if exceeding limit"""
try:
total_size = 0
for item in CACHE_DIR.rglob('*'):
if item.is_file():
total_size += item.stat().st_size
total_size_gb = total_size / (1024 ** 3)
if total_size_gb > CACHE_MAX_SIZE_GB:
logger.warning(
f"Cache size ({total_size_gb:.2f} GB) exceeds "
f"limit ({CACHE_MAX_SIZE_GB} GB). Consider cleanup."
)
except Exception as e:
logger.error(f"Failed to check cache size: {e}")
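`_check_cache_size` only logs a warning when the limit is exceeded. One way to act on that warning is to evict the least recently used run directories, sketched below. `evict_lru_runs` and its helpers are hypothetical names, not part of this module; the sketch assumes the `<cache>/<target_id>/<run_id>` layout of the isolated and copy-on-write modes and uses the modification time of the cached `target` file (refreshed by `touch()` on a cache hit) as the recency signal.

```python
import shutil
from pathlib import Path
from typing import List


def _dir_size(path: Path) -> int:
    """Total size in bytes of all regular files below path."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def _last_used(run_dir: Path) -> float:
    """Recency signal: mtime of the cached 'target' file, else of the directory."""
    marker = run_dir / "target"
    return (marker if marker.exists() else run_dir).stat().st_mtime


def evict_lru_runs(cache_dir: Path, max_bytes: int) -> List[Path]:
    """Delete the oldest <target_id>/<run_id> directories until under max_bytes."""
    run_dirs = [
        run
        for target in cache_dir.iterdir() if target.is_dir()
        for run in target.iterdir() if run.is_dir()
    ]
    run_dirs.sort(key=_last_used)  # oldest first
    total = _dir_size(cache_dir)
    removed: List[Path] = []
    for run in run_dirs:
        if total <= max_bytes:
            break
        total -= _dir_size(run)
        shutil.rmtree(run, ignore_errors=True)
        removed.append(run)
    return removed
```

The sketch does not handle the shared layout, where `workspace` sits directly under the target directory, and it does not coordinate with runs that are still in progress.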
@@ -16,7 +16,7 @@ Security Analyzer Module - Analyzes code for security vulnerabilities
import logging
import re
from pathlib import Path
-from typing import Dict, Any, List, Optional
+from typing import Dict, Any, List
try:
from toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
@@ -17,7 +17,6 @@ from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Any, List, Optional
from pydantic import BaseModel, Field
from datetime import datetime
import logging
logger = logging.getLogger(__name__)
@@ -0,0 +1,10 @@
"""
Fuzzing modules for FuzzForge
This package contains fuzzing modules for different fuzzing engines.
"""
from .atheris_fuzzer import AtherisFuzzer
from .cargo_fuzzer import CargoFuzzer
__all__ = ["AtherisFuzzer", "CargoFuzzer"]
@@ -0,0 +1,608 @@
"""
Atheris Fuzzer Module
Reusable module for fuzzing Python code using Atheris.
Discovers and fuzzes user-provided Python targets with TestOneInput() function.
"""
import asyncio
import base64
import importlib.util
import logging
import multiprocessing
import os
import sys
import time
from datetime import datetime
from pathlib import Path
from typing import Dict, Any, List, Optional, Callable
import uuid
import httpx
from modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
logger = logging.getLogger(__name__)
def _run_atheris_in_subprocess(
target_path_str: str,
corpus_dir_str: str,
max_iterations: int,
timeout_seconds: int,
shared_crashes: Any,
exec_counter: multiprocessing.Value,
crash_counter: multiprocessing.Value,
coverage_counter: multiprocessing.Value
):
"""
Run atheris.Fuzz() in a separate process to isolate os._exit() calls.
This function runs in a subprocess and loads the target module,
sets up atheris, and runs fuzzing. Stats are communicated via shared memory.
Args:
target_path_str: String path to target file
corpus_dir_str: String path to corpus directory
max_iterations: Maximum fuzzing iterations
timeout_seconds: Timeout in seconds
shared_crashes: Manager().list() for storing crash details
exec_counter: Shared counter for executions
crash_counter: Shared counter for crashes
coverage_counter: Shared counter for coverage edges
"""
import atheris
import importlib.util
import traceback
from pathlib import Path
target_path = Path(target_path_str)
total_executions = 0
# NOTE: Crash details are written directly to shared_crashes (Manager().list())
# so they can be accessed by parent process after subprocess exits.
# We don't use a local crashes list because os._exit() prevents cleanup code.
try:
# Load target module in subprocess
module_name = f"fuzz_target_{uuid.uuid4().hex[:8]}"
spec = importlib.util.spec_from_file_location(module_name, target_path)
if spec is None or spec.loader is None:
raise ImportError(f"Could not load module from {target_path}")
module = importlib.util.module_from_spec(spec)
sys.modules[module_name] = module
spec.loader.exec_module(module)
if not hasattr(module, "TestOneInput"):
raise AttributeError("Module does not have TestOneInput() function")
test_one_input = module.TestOneInput
# Wrapper to track executions and crashes
def fuzz_wrapper(data):
nonlocal total_executions
total_executions += 1
# Update shared counter for live stats
with exec_counter.get_lock():
exec_counter.value += 1
try:
test_one_input(data)
except Exception as e:
# Capture crash details to shared memory
crash_info = {
"input": bytes(data), # Convert to bytes for serialization
"exception_type": type(e).__name__,
"exception_message": str(e),
"stack_trace": traceback.format_exc(),
"execution": total_executions
}
# Write to shared memory so parent process can access crash details
shared_crashes.append(crash_info)
# Update shared crash counter
with crash_counter.get_lock():
crash_counter.value += 1
# Re-raise so Atheris detects it
raise
# Check for dictionary file in target directory
dict_args = []
target_dir = target_path.parent
for dict_name in ["fuzz.dict", "fuzzing.dict", "dict.txt"]:
dict_path = target_dir / dict_name
if dict_path.exists():
dict_args.append(f"-dict={dict_path}")
break
# Configure Atheris
atheris_args = [
"atheris_fuzzer",
f"-runs={max_iterations}",
f"-max_total_time={timeout_seconds}",
"-print_final_stats=1"
] + dict_args + [corpus_dir_str] # Corpus directory as positional arg
atheris.Setup(atheris_args, fuzz_wrapper)
# Run fuzzing (this will call os._exit() when done)
atheris.Fuzz()
except SystemExit:
# Atheris exits when done - this is normal
# Crash details already written to shared_crashes
pass
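except Exception:
# Fatal error. Exceptions raised by the target were already recorded in
# shared_crashes by fuzz_wrapper. Setup failures (for example an import
# error in the target) are not recorded there, so print the traceback
# instead of swallowing it silently.
traceback.print_exc()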
class AtherisFuzzer(BaseModule):
"""
Atheris fuzzing module - discovers and fuzzes Python code.
This module can be used by any workflow to fuzz Python targets.
"""
def __init__(self):
super().__init__()
self.crashes = []
self.total_executions = 0
self.start_time = None
self.last_stats_time = 0
self.run_id = None
def get_metadata(self) -> ModuleMetadata:
"""Return module metadata"""
return ModuleMetadata(
name="atheris_fuzzer",
version="1.0.0",
description="Python fuzzing using Atheris - discovers and fuzzes TestOneInput() functions",
author="FuzzForge Team",
category="fuzzer",
tags=["fuzzing", "atheris", "python", "coverage"],
input_schema={
"type": "object",
"properties": {
"target_file": {
"type": "string",
"description": "Python file with TestOneInput() function (auto-discovered if not specified)"
},
"max_iterations": {
"type": "integer",
"description": "Maximum fuzzing iterations",
"default": 100000
},
"timeout_seconds": {
"type": "integer",
"description": "Fuzzing timeout in seconds",
"default": 300
},
"stats_callback": {
"description": "Optional callback for real-time statistics"
}
}
},
requires_workspace=True
)
def validate_config(self, config: Dict[str, Any]) -> bool:
"""Validate fuzzing configuration"""
max_iterations = config.get("max_iterations", 100000)
if not isinstance(max_iterations, int) or max_iterations <= 0:
raise ValueError(f"max_iterations must be positive integer, got: {max_iterations}")
timeout = config.get("timeout_seconds", 300)
if not isinstance(timeout, int) or timeout <= 0:
raise ValueError(f"timeout_seconds must be positive integer, got: {timeout}")
return True
async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
"""
Execute Atheris fuzzing on user code.
Args:
config: Fuzzing configuration
workspace: Path to user's uploaded code
Returns:
ModuleResult with crash findings
"""
self.start_timer()
self.start_time = time.time()
# Validate configuration
self.validate_config(config)
self.validate_workspace(workspace)
# Extract config
target_file = config.get("target_file")
max_iterations = config.get("max_iterations", 100000)
timeout_seconds = config.get("timeout_seconds", 300)
stats_callback = config.get("stats_callback")
self.run_id = config.get("run_id")
logger.info(
f"Starting Atheris fuzzing (max_iterations={max_iterations}, "
f"timeout={timeout_seconds}s, target={target_file or 'auto-discover'})"
)
try:
# Step 1: Discover or load target
target_path = self._discover_target(workspace, target_file)
logger.info(f"Using fuzz target: {target_path}")
# Step 2: Load target module
test_one_input = self._load_target_module(target_path)
logger.info(f"Loaded TestOneInput function from {target_path}")
# Step 3: Run fuzzing
await self._run_fuzzing(
test_one_input=test_one_input,
target_path=target_path,
workspace=workspace,
max_iterations=max_iterations,
timeout_seconds=timeout_seconds,
stats_callback=stats_callback
)
# Step 4: Generate findings from crashes
findings = await self._generate_findings(target_path)
logger.info(
f"Fuzzing completed: {self.total_executions} executions, "
f"{len(self.crashes)} crashes found"
)
# Generate SARIF report (always, even with no findings)
from modules.reporter import SARIFReporter
reporter = SARIFReporter()
reporter_config = {
"findings": findings,
"tool_name": "Atheris Fuzzer",
"tool_version": self._metadata.version
}
reporter_result = await reporter.execute(reporter_config, workspace)
sarif_report = reporter_result.sarif
return ModuleResult(
module=self._metadata.name,
version=self._metadata.version,
status="success",
execution_time=self.get_execution_time(),
findings=findings,
summary={
"total_executions": self.total_executions,
"crashes_found": len(self.crashes),
"execution_time": self.get_execution_time(),
"target_file": str(target_path.relative_to(workspace))
},
metadata={
"max_iterations": max_iterations,
"timeout_seconds": timeout_seconds
},
sarif=sarif_report
)
except Exception as e:
logger.error(f"Fuzzing failed: {e}", exc_info=True)
return self.create_result(
findings=[],
status="failed",
error=str(e)
)
def _discover_target(self, workspace: Path, target_file: Optional[str]) -> Path:
"""
Discover fuzz target in workspace.
Args:
workspace: Path to workspace
target_file: Explicit target file or None for auto-discovery
Returns:
Path to target file
"""
if target_file:
# Use specified target
target_path = workspace / target_file
if not target_path.exists():
raise FileNotFoundError(f"Target file not found: {target_file}")
return target_path
# Auto-discover: look for fuzz_*.py or *_fuzz.py
logger.info("Auto-discovering fuzz targets...")
candidates = []
# Use rglob for recursive search (searches all subdirectories)
for pattern in ["fuzz_*.py", "*_fuzz.py", "fuzz_target.py"]:
matches = list(workspace.rglob(pattern))
candidates.extend(matches)
if not candidates:
raise FileNotFoundError(
"No fuzz targets found. Expected files matching: fuzz_*.py, *_fuzz.py, or fuzz_target.py"
)
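# De-duplicate (fuzz_target.py also matches fuzz_*.py) and sort so the
# choice is deterministic, then use the first candidate
candidates = sorted(set(candidates))
target = candidates[0]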
if len(candidates) > 1:
logger.warning(
f"Multiple fuzz targets found: {[str(c) for c in candidates]}. "
f"Using: {target.name}"
)
return target
def _load_target_module(self, target_path: Path) -> Callable:
"""
Load target module and get TestOneInput function.
Args:
target_path: Path to Python file with TestOneInput
Returns:
TestOneInput function
"""
# Add target directory to sys.path
target_dir = target_path.parent
if str(target_dir) not in sys.path:
sys.path.insert(0, str(target_dir))
# Load module dynamically
module_name = target_path.stem
spec = importlib.util.spec_from_file_location(module_name, target_path)
if spec is None or spec.loader is None:
raise ImportError(f"Cannot load module from {target_path}")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
# Get TestOneInput function
if not hasattr(module, "TestOneInput"):
raise AttributeError(
f"Module {module_name} does not have TestOneInput() function. "
"Atheris requires a TestOneInput(data: bytes) function."
)
return module.TestOneInput
async def _run_fuzzing(
self,
test_one_input: Callable,
target_path: Path,
workspace: Path,
max_iterations: int,
timeout_seconds: int,
stats_callback: Optional[Callable] = None
):
"""
Run Atheris fuzzing with real-time monitoring.
Args:
test_one_input: TestOneInput function to fuzz (not used, loaded in subprocess)
target_path: Path to target file
workspace: Path to workspace directory
max_iterations: Max iterations
timeout_seconds: Timeout in seconds
stats_callback: Optional callback for stats
"""
self.crashes = []
self.total_executions = 0
# Create corpus directory in workspace
corpus_dir = workspace / ".fuzzforge_corpus"
corpus_dir.mkdir(exist_ok=True)
logger.info(f"Using corpus directory: {corpus_dir}")
logger.info(f"Starting Atheris fuzzer in subprocess (max_runs={max_iterations}, timeout={timeout_seconds}s)...")
# Create shared memory for subprocess communication
ctx = multiprocessing.get_context('spawn')
manager = ctx.Manager()
shared_crashes = manager.list() # Shared list for crash details
exec_counter = ctx.Value('i', 0) # Shared execution counter
crash_counter = ctx.Value('i', 0) # Shared crash counter
coverage_counter = ctx.Value('i', 0) # Shared coverage counter
# Start fuzzing in subprocess
process = ctx.Process(
target=_run_atheris_in_subprocess,
args=(str(target_path), str(corpus_dir), max_iterations, timeout_seconds, shared_crashes, exec_counter, crash_counter, coverage_counter)
)
# Run fuzzing in a separate task with monitoring
async def monitor_stats():
"""Monitor and report stats every 0.5 seconds"""
while True:
await asyncio.sleep(0.5)
if stats_callback:
elapsed = time.time() - self.start_time
# Read from shared counters
current_execs = exec_counter.value
current_crashes = crash_counter.value
current_coverage = coverage_counter.value
execs_per_sec = current_execs / elapsed if elapsed > 0 else 0
# Count corpus files
try:
corpus_size = len(list(corpus_dir.iterdir())) if corpus_dir.exists() else 0
except Exception:
corpus_size = 0
# TODO: Get real coverage from Atheris
# For now use corpus_size as proxy
coverage_value = current_coverage if current_coverage > 0 else corpus_size
await stats_callback({
"total_execs": current_execs,
"execs_per_sec": execs_per_sec,
"crashes": current_crashes,
"corpus_size": corpus_size,
"coverage": coverage_value, # Using corpus as coverage proxy
"elapsed_time": int(elapsed)
})
# Start monitoring task
monitor_task = None
if stats_callback:
monitor_task = asyncio.create_task(monitor_stats())
try:
# Start subprocess
process.start()
logger.info(f"Fuzzing subprocess started (PID: {process.pid})")
# Wait for subprocess to complete
while process.is_alive():
await asyncio.sleep(0.1)
# NOTE: We cannot use result_queue because Atheris calls os._exit()
# which terminates immediately without putting results in the queue.
# Instead, we rely on shared memory (Manager().list() and Value counters).
# Read final values from shared memory
self.total_executions = exec_counter.value
total_crashes = crash_counter.value
# Read crash details from shared memory and convert to our format
self.crashes = []
for crash_data in shared_crashes:
# Reconstruct crash info with exception object
crash_info = {
"input": crash_data["input"],
"exception": Exception(crash_data["exception_message"]),
"exception_type": crash_data["exception_type"],
"stack_trace": crash_data["stack_trace"],
"execution": crash_data["execution"]
}
self.crashes.append(crash_info)
logger.warning(
f"Crash found (execution {crash_data['execution']}): "
f"{crash_data['exception_type']}: {crash_data['exception_message']}"
)
logger.info(f"Fuzzing completed: {self.total_executions} executions, {total_crashes} crashes found")
# Send final stats update
if stats_callback:
elapsed = time.time() - self.start_time
execs_per_sec = self.total_executions / elapsed if elapsed > 0 else 0
# Count final corpus size
try:
final_corpus_size = len(list(corpus_dir.iterdir())) if corpus_dir.exists() else 0
except Exception:
final_corpus_size = 0
# TODO: Parse coverage from Atheris output
# For now, use corpus size as proxy (corpus grows with coverage)
# libFuzzer writes coverage to stdout but sys.stdout redirection
# doesn't work because it writes to FD 1 directly from C++
final_coverage = coverage_counter.value if coverage_counter.value > 0 else final_corpus_size
await stats_callback({
"total_execs": self.total_executions,
"execs_per_sec": execs_per_sec,
"crashes": total_crashes,
"corpus_size": final_corpus_size,
"coverage": final_coverage,
"elapsed_time": int(elapsed)
})
# Wait for process to fully terminate
process.join(timeout=5)
if process.exitcode is not None and process.exitcode != 0:
logger.warning(f"Subprocess exited with code: {process.exitcode}")
except Exception as e:
logger.error(f"Fuzzing execution error: {e}")
if process.is_alive():
logger.warning("Terminating fuzzing subprocess...")
process.terminate()
process.join(timeout=5)
if process.is_alive():
process.kill()
raise
finally:
# Stop monitoring
if monitor_task:
monitor_task.cancel()
try:
await monitor_task
except asyncio.CancelledError:
pass
async def _generate_findings(self, target_path: Path) -> List[ModuleFinding]:
"""
Generate ModuleFinding objects from crashes.
Args:
target_path: Path to target file
Returns:
List of findings
"""
findings = []
for idx, crash in enumerate(self.crashes):
# Encode crash input for storage
crash_input_b64 = base64.b64encode(crash["input"]).decode()
finding = self.create_finding(
title=f"Crash: {crash['exception_type']}",
description=(
f"Atheris found crash during fuzzing:\n"
f"Exception: {crash['exception_type']}\n"
f"Message: {str(crash['exception'])}\n"
f"Execution: {crash['execution']}"
),
severity="critical",
category="crash",
file_path=str(target_path),
metadata={
"crash_input_base64": crash_input_b64,
"crash_input_hex": crash["input"].hex(),
"exception_type": crash["exception_type"],
"stack_trace": crash["stack_trace"],
"execution_number": crash["execution"]
},
recommendation=(
"Review the crash stack trace and input to identify the vulnerability. "
"The crash input is provided in base64 and hex formats for reproduction."
)
)
findings.append(finding)
# Report crash to backend for real-time monitoring
if self.run_id:
try:
crash_report = {
"run_id": self.run_id,
"crash_id": f"crash_{idx + 1}",
"timestamp": datetime.utcnow().isoformat(),
"crash_type": crash["exception_type"],
"stack_trace": crash["stack_trace"],
"input_file": crash_input_b64,
"severity": "critical",
"exploitability": "unknown"
}
backend_url = os.getenv("BACKEND_URL", "http://backend:8000")
async with httpx.AsyncClient(timeout=5.0) as client:
await client.post(
f"{backend_url}/fuzzing/{self.run_id}/crash",
json=crash_report
)
logger.debug(f"Crash report sent to backend: {crash_report['crash_id']}")
except Exception as e:
logger.debug(f"Failed to post crash report to backend: {e}")
return findings
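Each finding stores the crashing input as base64 in `crash_input_base64`, so a crash can be replayed outside the fuzzer. The sketch below shows one way to do that; `replay_crash` is a hypothetical helper, not part of this module, and it assumes the caller has already imported the target's `TestOneInput` function.

```python
import base64
from typing import Callable, Optional


def replay_crash(
    test_one_input: Callable[[bytes], None],
    crash_input_base64: str,
) -> Optional[Exception]:
    """Decode a stored crash input and run it once against the target.

    Returns the exception raised by the target, or None if the input
    no longer reproduces a crash.
    """
    data = base64.b64decode(crash_input_base64)
    try:
        test_one_input(data)
    except Exception as exc:  # the target raising is the expected outcome
        return exc
    return None
```

Returning the exception rather than re-raising it lets a caller compare its type against the `exception_type` recorded in the finding's metadata.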
@@ -0,0 +1,455 @@
"""
Cargo Fuzzer Module
Reusable module for fuzzing Rust code using cargo-fuzz (libFuzzer).
Discovers and fuzzes user-provided Rust targets with fuzz_target!() macros.
"""
import asyncio
import logging
import os
import re
import time
from pathlib import Path
from typing import Dict, Any, List, Optional, Callable
from modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
logger = logging.getLogger(__name__)
class CargoFuzzer(BaseModule):
"""
Cargo-fuzz (libFuzzer) fuzzer module for Rust code.
Discovers fuzz targets in user's Rust project and runs cargo-fuzz
to find crashes, undefined behavior, and memory safety issues.
"""
def get_metadata(self) -> ModuleMetadata:
"""Get module metadata"""
return ModuleMetadata(
name="cargo_fuzz",
version="0.11.2",
description="Fuzz Rust code using cargo-fuzz with libFuzzer backend",
author="FuzzForge Team",
category="fuzzer",
tags=["fuzzing", "rust", "cargo-fuzz", "libfuzzer", "memory-safety"],
input_schema={
"type": "object",
"properties": {
"target_name": {
"type": "string",
"description": "Fuzz target name (auto-discovered if not specified)"
},
"max_iterations": {
"type": "integer",
"default": 1000000,
"description": "Maximum fuzzing iterations"
},
"timeout_seconds": {
"type": "integer",
"default": 1800,
"description": "Fuzzing timeout in seconds"
},
"sanitizer": {
"type": "string",
"enum": ["address", "memory", "undefined"],
"default": "address",
"description": "Sanitizer to use (address, memory, undefined)"
}
}
},
output_schema={
"type": "object",
"properties": {
"findings": {
"type": "array",
"description": "Crashes and memory safety issues found"
},
"summary": {
"type": "object",
"description": "Fuzzing execution summary"
}
}
}
)
def validate_config(self, config: Dict[str, Any]) -> bool:
"""Validate configuration"""
max_iterations = config.get("max_iterations", 1000000)
if not isinstance(max_iterations, int) or max_iterations < 1:
raise ValueError("max_iterations must be a positive integer")
timeout = config.get("timeout_seconds", 1800)
if not isinstance(timeout, int) or timeout < 1:
raise ValueError("timeout_seconds must be a positive integer")
sanitizer = config.get("sanitizer", "address")
if sanitizer not in ["address", "memory", "undefined"]:
raise ValueError("sanitizer must be one of: address, memory, undefined")
return True
async def execute(
self,
config: Dict[str, Any],
workspace: Path,
stats_callback: Optional[Callable] = None
) -> ModuleResult:
"""
Execute cargo-fuzz on user's Rust code.
Args:
config: Fuzzer configuration
workspace: Path to workspace directory containing Rust project
stats_callback: Optional callback for real-time stats updates
Returns:
ModuleResult containing findings and summary
"""
self.start_timer()
try:
# Validate inputs
self.validate_config(config)
self.validate_workspace(workspace)
logger.info(f"Running cargo-fuzz on {workspace}")
# Step 1: Discover fuzz targets
targets = await self._discover_fuzz_targets(workspace)
if not targets:
return self.create_result(
findings=[],
status="failed",
error="No fuzz targets found. Expected fuzz targets in fuzz/fuzz_targets/"
)
# Get target name from config or use first discovered target
target_name = config.get("target_name")
if not target_name:
target_name = targets[0]
logger.info(f"No target specified, using first discovered target: {target_name}")
elif target_name not in targets:
return self.create_result(
findings=[],
status="failed",
error=f"Target '{target_name}' not found. Available targets: {', '.join(targets)}"
)
# Step 2: Build fuzz target
logger.info(f"Building fuzz target: {target_name}")
build_success = await self._build_fuzz_target(workspace, target_name, config)
if not build_success:
return self.create_result(
findings=[],
status="failed",
error=f"Failed to build fuzz target: {target_name}"
)
# Step 3: Run fuzzing
logger.info(f"Starting fuzzing: {target_name}")
findings, stats = await self._run_fuzzing(
workspace,
target_name,
config,
stats_callback
)
# Step 4: Parse crash artifacts
crash_findings = await self._parse_crash_artifacts(workspace, target_name)
findings.extend(crash_findings)
logger.info(f"Fuzzing completed: {len(findings)} crashes found")
return self.create_result(
findings=findings,
status="success",
summary=stats
)
except Exception as e:
logger.error(f"Cargo fuzzer failed: {e}")
return self.create_result(
findings=[],
status="failed",
error=str(e)
)
async def _discover_fuzz_targets(self, workspace: Path) -> List[str]:
"""
Discover fuzz targets in the project.
Looks for fuzz targets in fuzz/fuzz_targets/ directory.
"""
fuzz_targets_dir = workspace / "fuzz" / "fuzz_targets"
if not fuzz_targets_dir.exists():
logger.warning(f"No fuzz targets directory found: {fuzz_targets_dir}")
return []
targets = []
# Sort so that "first discovered target" does not depend on filesystem order
for file in sorted(fuzz_targets_dir.glob("*.rs")):
target_name = file.stem
targets.append(target_name)
logger.info(f"Discovered fuzz target: {target_name}")
return targets
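The discovery step above can be sketched on its own, assuming the conventional cargo-fuzz layout in which every `fuzz/fuzz_targets/<name>.rs` file is one target. The function name `discover_fuzz_targets` is hypothetical and not part of the module. The sorted return order is an illustrative choice that keeps "first discovered target" deterministic.

```python
import tempfile
from pathlib import Path
from typing import List


def discover_fuzz_targets(workspace: Path) -> List[str]:
    """Return fuzz target names found under <workspace>/fuzz/fuzz_targets/."""
    fuzz_targets_dir = workspace / "fuzz" / "fuzz_targets"
    if not fuzz_targets_dir.is_dir():
        # No cargo-fuzz layout in this workspace
        return []
    # Each *.rs file is one target; the file stem is the target name.
    return sorted(path.stem for path in fuzz_targets_dir.glob("*.rs"))
```

Calling it on a workspace that lacks the `fuzz/fuzz_targets/` directory yields an empty list, which the caller can turn into the "No fuzz targets found" failure result.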
async def _build_fuzz_target(
self,
workspace: Path,
target_name: str,
config: Dict[str, Any]
) -> bool:
"""Build the fuzz target with instrumentation"""
try:
sanitizer = config.get("sanitizer", "address")
# Build command
cmd = [
"cargo", "fuzz", "build",
target_name,
f"--sanitizer={sanitizer}"
]
logger.debug(f"Build command: {' '.join(cmd)}")
proc = await asyncio.create_subprocess_exec(
*cmd,
cwd=workspace,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE
)
stdout, stderr = await proc.communicate()
if proc.returncode != 0:
logger.error(f"Build failed: {stderr.decode()}")
return False
logger.info("Build successful")
return True
except Exception as e:
logger.error(f"Build error: {e}")
return False
async def _run_fuzzing(
self,
workspace: Path,
target_name: str,
config: Dict[str, Any],
stats_callback: Optional[Callable]
) -> tuple[List[ModuleFinding], Dict[str, Any]]:
"""
Run cargo-fuzz and collect statistics.
Returns:
Tuple of (findings, stats_dict)
"""
max_iterations = config.get("max_iterations", 1000000)
timeout_seconds = config.get("timeout_seconds", 1800)
sanitizer = config.get("sanitizer", "address")
findings = []
stats = {
"total_executions": 0,
"crashes_found": 0,
"corpus_size": 0,
"coverage": 0.0,
"execution_time": 0.0
}
try:
# Cargo fuzz run command
cmd = [
"cargo", "fuzz", "run",
target_name,
f"--sanitizer={sanitizer}",
"--",
f"-runs={max_iterations}",
f"-max_total_time={timeout_seconds}"
]
logger.debug(f"Fuzz command: {' '.join(cmd)}")
start_time = time.time()
proc = await asyncio.create_subprocess_exec(
*cmd,
cwd=workspace,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.STDOUT
)
# Monitor output and extract stats
last_stats_time = time.time()
async for line in proc.stdout:
line_str = line.decode('utf-8', errors='ignore').strip()
# Parse libFuzzer stats
# Example: "#12345 NEW cov: 123 ft: 456 corp: 10/234b"
stats_match = re.match(r'#(\d+)\s+.*cov:\s*(\d+).*corp:\s*(\d+)', line_str)
if stats_match:
execs = int(stats_match.group(1))
cov = int(stats_match.group(2))
corp = int(stats_match.group(3))
stats["total_executions"] = execs
stats["coverage"] = float(cov)
stats["corpus_size"] = corp
stats["execution_time"] = time.time() - start_time
# Invoke stats callback for real-time monitoring
if stats_callback and time.time() - last_stats_time >= 0.5:
await stats_callback({
"total_execs": execs,
"execs_per_sec": execs / stats["execution_time"] if stats["execution_time"] > 0 else 0,
"crashes": stats["crashes_found"],
"coverage": cov,
"corpus_size": corp,
"elapsed_time": int(stats["execution_time"])
})
last_stats_time = time.time()
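# Detect crash line. Sanitizer and libFuzzer crash reports typically print
# both an "ERROR:" line and a "SUMMARY:" line for the same crash, so count
# only the summary line to avoid counting one crash twice.
if "SUMMARY:" in line_str:
logger.info(f"Detected crash: {line_str}")
stats["crashes_found"] += 1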
await proc.wait()
stats["execution_time"] = time.time() - start_time
# Send final stats update
if stats_callback:
await stats_callback({
"total_execs": stats["total_executions"],
"execs_per_sec": stats["total_executions"] / stats["execution_time"] if stats["execution_time"] > 0 else 0,
"crashes": stats["crashes_found"],
"coverage": stats["coverage"],
"corpus_size": stats["corpus_size"],
"elapsed_time": int(stats["execution_time"])
})
logger.info(
f"Fuzzing completed: {stats['total_executions']} execs, "
f"{stats['crashes_found']} crashes"
)
except Exception as e:
logger.error(f"Fuzzing error: {e}")
return findings, stats
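The stats extraction inside `_run_fuzzing` can be exercised in isolation. The sketch below reuses the same regular expression. The helper name `parse_libfuzzer_stats` and the returned key names are assumptions made for illustration.

```python
import re
from typing import Dict, Optional

# Same pattern as in _run_fuzzing: execution count, coverage, corpus entries.
_STATS_RE = re.compile(r"#(\d+)\s+.*cov:\s*(\d+).*corp:\s*(\d+)")


def parse_libfuzzer_stats(line: str) -> Optional[Dict[str, int]]:
    """Parse one libFuzzer status line, or return None for any other line."""
    match = _STATS_RE.match(line.strip())
    if match is None:
        return None
    execs, cov, corp = (int(group) for group in match.groups())
    return {"total_executions": execs, "coverage": cov, "corpus_size": corp}
```

For the example line quoted in the code comment, `#12345 NEW cov: 123 ft: 456 corp: 10/234b`, the corpus size is the number before the slash (10); the byte total after it is ignored.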
async def _parse_crash_artifacts(
self,
workspace: Path,
target_name: str
) -> List[ModuleFinding]:
"""
Parse crash artifacts from fuzz/artifacts directory.
Cargo-fuzz stores crashes in: fuzz/artifacts/<target_name>/
"""
findings = []
artifacts_dir = workspace / "fuzz" / "artifacts" / target_name
if not artifacts_dir.exists():
logger.info("No crash artifacts found")
return findings
# Find all crash files
for crash_file in artifacts_dir.glob("crash-*"):
try:
finding = await self._analyze_crash(workspace, target_name, crash_file)
if finding:
findings.append(finding)
except Exception as e:
logger.warning(f"Failed to analyze crash {crash_file}: {e}")
logger.info(f"Parsed {len(findings)} crash artifacts")
return findings
async def _analyze_crash(
self,
workspace: Path,
target_name: str,
crash_file: Path
) -> Optional[ModuleFinding]:
"""
Analyze a single crash file.
Runs cargo-fuzz with the crash input to reproduce and get stack trace.
"""
try:
# Read crash input
crash_input = crash_file.read_bytes()
# Reproduce crash to get stack trace
cmd = [
"cargo", "fuzz", "run",
target_name,
str(crash_file)
]
proc = await asyncio.create_subprocess_exec(
*cmd,
cwd=workspace,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.STDOUT,
env={**os.environ, "RUST_BACKTRACE": "1"}
)
stdout, _ = await proc.communicate()
output = stdout.decode('utf-8', errors='ignore')
# Parse stack trace and error type
error_type = "Unknown Crash"
stack_trace = output
# Extract error type
if "SEGV" in output:
error_type = "Segmentation Fault"
severity = "critical"
elif "heap-use-after-free" in output:
error_type = "Use After Free"
severity = "critical"
elif "heap-buffer-overflow" in output:
error_type = "Heap Buffer Overflow"
severity = "critical"
elif "stack-buffer-overflow" in output:
error_type = "Stack Buffer Overflow"
severity = "high"
elif "panic" in output.lower():
error_type = "Panic"
severity = "medium"
else:
severity = "high"
# Create finding
finding = self.create_finding(
title=f"Crash: {error_type} in {target_name}",
description=f"Cargo-fuzz discovered a crash in target '{target_name}'. "
f"Error type: {error_type}. "
f"Input size: {len(crash_input)} bytes.",
severity=severity,
category="crash",
file_path=f"fuzz/fuzz_targets/{target_name}.rs",
code_snippet=stack_trace[:500],
recommendation="Review the crash details and fix the underlying bug. "
"Use AddressSanitizer to identify memory safety issues. "
"Consider adding bounds checks or using safer APIs.",
metadata={
"error_type": error_type,
"crash_file": crash_file.name,
"input_size": len(crash_input),
"reproducer": crash_file.name,
"stack_trace": stack_trace
}
)
return finding
except Exception as e:
logger.warning(f"Failed to analyze crash {crash_file}: {e}")
return None
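The error-type detection in `_analyze_crash` is an ordered substring match. A table-driven sketch of the same mapping is shown below. `classify_crash` and `_CRASH_SIGNATURES` are hypothetical names, and the markers and severities are copied from the if/elif chain above rather than from any sanitizer specification.

```python
from typing import List, Tuple

# (marker substring, error type, severity), checked in this order.
_CRASH_SIGNATURES: List[Tuple[str, str, str]] = [
    ("SEGV", "Segmentation Fault", "critical"),
    ("heap-use-after-free", "Use After Free", "critical"),
    ("heap-buffer-overflow", "Heap Buffer Overflow", "critical"),
    ("stack-buffer-overflow", "Stack Buffer Overflow", "high"),
]


def classify_crash(output: str) -> Tuple[str, str]:
    """Map reproducer output to (error_type, severity)."""
    for marker, error_type, severity in _CRASH_SIGNATURES:
        if marker in output:
            return error_type, severity
    # Rust panics are matched case-insensitively, after the sanitizer markers.
    if "panic" in output.lower():
        return "Panic", "medium"
    # Anything unrecognized is still reported, with a conservative severity.
    return "Unknown Crash", "high"
```

Keeping the signatures in a list makes the precedence explicit: output that contains both `SEGV` and a panic message is reported as a segmentation fault.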
@@ -17,7 +17,6 @@ import logging
from pathlib import Path
from typing import Dict, Any, List
from datetime import datetime
import json
try:
from toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
@@ -16,16 +16,16 @@ File Scanner Module - Scans and enumerates files in the workspace
import logging
import mimetypes
from pathlib import Path
-from typing import Dict, Any, List
+from typing import Dict, Any
import hashlib
try:
-from toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
+from toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult
except ImportError:
try:
-from modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
+from modules.base import BaseModule, ModuleMetadata, ModuleResult
except ImportError:
-from src.toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
+from src.toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult
logger = logging.getLogger(__name__)
@@ -0,0 +1,9 @@
"""
Atheris Fuzzing Workflow
Fuzzes user-provided Python code using Atheris.
"""
from .workflow import AtherisFuzzingWorkflow
__all__ = ["AtherisFuzzingWorkflow"]
@@ -0,0 +1,122 @@
"""
Atheris Fuzzing Workflow Activities
Activities specific to the Atheris fuzzing workflow.
"""
import logging
import sys
from datetime import datetime
from pathlib import Path
from typing import Dict, Any
import os
import httpx
from temporalio import activity
# Configure logging
logger = logging.getLogger(__name__)
# Add toolbox to path for module imports
sys.path.insert(0, '/app/toolbox')
@activity.defn(name="fuzz_with_atheris")
async def fuzz_activity(workspace_path: str, config: dict) -> dict:
"""
Fuzzing activity using the AtherisFuzzer module on user code.
This activity:
1. Imports the reusable AtherisFuzzer module
2. Sets up real-time stats callback
3. Executes fuzzing on user's TestOneInput() function
4. Returns the ModuleResult serialized as a dictionary
Args:
workspace_path: Path to the workspace directory (user's uploaded code)
config: Fuzzer configuration (target_file, max_iterations, timeout_seconds)
Returns:
Fuzzer results dictionary (findings, summary, metadata)
"""
logger.info(f"Activity: fuzz_with_atheris (workspace={workspace_path})")
try:
# Import reusable AtherisFuzzer module
from modules.fuzzer import AtherisFuzzer
workspace = Path(workspace_path)
if not workspace.exists():
raise FileNotFoundError(f"Workspace not found: {workspace_path}")
# Use the Temporal workflow ID as the run identifier for real-time stats
info = activity.info()
run_id = info.workflow_id
# Define stats callback for real-time monitoring
async def stats_callback(stats_data: Dict[str, Any]):
"""Callback for live fuzzing statistics"""
try:
# Prepare stats payload for backend
coverage_value = stats_data.get("coverage", 0)
logger.debug(f"Coverage from stats_data = {coverage_value}")
stats_payload = {
"run_id": run_id,
"workflow": "atheris_fuzzing",
"executions": stats_data.get("total_execs", 0),
"executions_per_sec": stats_data.get("execs_per_sec", 0.0),
"crashes": stats_data.get("crashes", 0),
"unique_crashes": stats_data.get("crashes", 0),
"coverage": coverage_value,
"corpus_size": stats_data.get("corpus_size", 0),
"elapsed_time": stats_data.get("elapsed_time", 0),
"last_crash_time": None
}
# POST stats to backend API for real-time monitoring
backend_url = os.getenv("BACKEND_URL", "http://backend:8000")
async with httpx.AsyncClient(timeout=5.0) as client:
try:
await client.post(
f"{backend_url}/fuzzing/{run_id}/stats",
json=stats_payload
)
except Exception as http_err:
logger.debug(f"Failed to post stats to backend: {http_err}")
# Also log for debugging
logger.info("LIVE_STATS", extra={
"stats_type": "fuzzing_live_update",
"workflow_type": "atheris_fuzzing",
"run_id": run_id,
"executions": stats_data.get("total_execs", 0),
"executions_per_sec": stats_data.get("execs_per_sec", 0.0),
"crashes": stats_data.get("crashes", 0),
"corpus_size": stats_data.get("corpus_size", 0),
"coverage": stats_data.get("coverage", 0.0),
"elapsed_time": stats_data.get("elapsed_time", 0),
"timestamp": datetime.utcnow().isoformat()
})
except Exception as e:
logger.warning(f"Error in stats callback: {e}")
# Add stats callback and run_id to config
config["stats_callback"] = stats_callback
config["run_id"] = run_id
# Execute the fuzzer module
fuzzer = AtherisFuzzer()
result = await fuzzer.execute(config, workspace)
logger.info(
f"✓ Fuzzing completed: "
f"{result.summary.get('total_executions', 0)} executions, "
f"{result.summary.get('crashes_found', 0)} crashes"
)
return result.dict()
except Exception as e:
logger.error(f"Fuzzing failed: {e}", exc_info=True)
raise
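The payload that the stats callback posts to the backend is a fixed mapping from the fuzzer's keys to the API's keys. A standalone sketch is shown below, with the hypothetical helper name `build_stats_payload`. The field names are taken from the callback above, and the backend's actual request schema is not shown in this diff.

```python
from typing import Any, Dict


def build_stats_payload(run_id: str, workflow: str, stats_data: Dict[str, Any]) -> Dict[str, Any]:
    """Translate fuzzer stats into the payload shape used by the stats callback."""
    crashes = stats_data.get("crashes", 0)
    return {
        "run_id": run_id,
        "workflow": workflow,
        "executions": stats_data.get("total_execs", 0),
        "executions_per_sec": stats_data.get("execs_per_sec", 0.0),
        "crashes": crashes,
        # The callback reports the same count for both fields (no deduplication).
        "unique_crashes": crashes,
        "coverage": stats_data.get("coverage", 0),
        "corpus_size": stats_data.get("corpus_size", 0),
        "elapsed_time": stats_data.get("elapsed_time", 0),
        "last_crash_time": None,
    }
```

Missing keys fall back to zero, so a partial stats update still produces a complete payload.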
@@ -0,0 +1,65 @@
name: atheris_fuzzing
version: "1.0.0"
vertical: python
description: "Fuzz Python code using Atheris with real-time monitoring. Automatically discovers and fuzzes TestOneInput() functions in user code."
author: "FuzzForge Team"
tags:
- "fuzzing"
- "atheris"
- "python"
- "coverage"
- "security"
# Workspace isolation mode (system-level configuration)
# - "isolated" (default): Each workflow run gets its own isolated workspace (safe for concurrent fuzzing)
# - "shared": All runs share the same workspace (for read-only analysis workflows)
# - "copy-on-write": Download once, copy for each run (balances performance and isolation)
workspace_isolation: "isolated"
default_parameters:
target_file: null
max_iterations: 1000000
timeout_seconds: 1800
parameters:
type: object
properties:
target_file:
type: string
description: "Python file with TestOneInput() function (auto-discovered if not specified)"
max_iterations:
type: integer
default: 1000000
description: "Maximum fuzzing iterations"
timeout_seconds:
type: integer
default: 1800
description: "Fuzzing timeout in seconds (30 minutes)"
output_schema:
type: object
properties:
findings:
type: array
description: "Crashes and vulnerabilities found during fuzzing"
items:
type: object
properties:
title:
type: string
severity:
type: string
category:
type: string
metadata:
type: object
summary:
type: object
description: "Fuzzing execution summary"
properties:
total_executions:
type: integer
crashes_found:
type: integer
execution_time:
type: number
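How the backend applies `default_parameters` is not part of this diff. The following is only a sketch of one plausible resolution rule, in which user-supplied values override the defaults and `None` means "not specified". `DEFAULT_PARAMETERS` mirrors the YAML above and `resolve_parameters` is a hypothetical name.

```python
from typing import Any, Dict

# Mirrors default_parameters from the atheris_fuzzing metadata above.
DEFAULT_PARAMETERS: Dict[str, Any] = {
    "target_file": None,
    "max_iterations": 1000000,
    "timeout_seconds": 1800,
}


def resolve_parameters(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user overrides onto the defaults, rejecting unknown keys."""
    unknown = sorted(set(overrides) - set(DEFAULT_PARAMETERS))
    if unknown:
        raise ValueError(f"Unknown parameters: {', '.join(unknown)}")
    resolved = dict(DEFAULT_PARAMETERS)
    # Treat None as "not specified" so the default is kept.
    resolved.update({key: value for key, value in overrides.items() if value is not None})
    return resolved
```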
@@ -0,0 +1,175 @@
"""
Atheris Fuzzing Workflow - Temporal Version
Fuzzes user-provided Python code using Atheris with real-time monitoring.
"""
from datetime import timedelta
from typing import Dict, Any, Optional
from temporalio import workflow
from temporalio.common import RetryPolicy
# Mark these imports as passed through so the workflow sandbox does not reload them
with workflow.unsafe.imports_passed_through():
import logging
logger = logging.getLogger(__name__)
@workflow.defn
class AtherisFuzzingWorkflow:
"""
Fuzz Python code using Atheris.
User workflow:
1. User runs: ff workflow run atheris_fuzzing .
2. CLI uploads project to MinIO
3. Worker downloads project
4. Worker fuzzes TestOneInput() function
5. Crashes reported as findings
"""
@workflow.run
async def run(
self,
target_id: str, # MinIO UUID of uploaded user code
target_file: Optional[str] = None, # Optional: specific file to fuzz
max_iterations: int = 1000000,
timeout_seconds: int = 1800 # 30 minutes default for fuzzing
) -> Dict[str, Any]:
"""
Main workflow execution.
Args:
target_id: UUID of the uploaded target in MinIO
target_file: Optional specific Python file with TestOneInput() (auto-discovered if None)
max_iterations: Maximum fuzzing iterations
timeout_seconds: Fuzzing timeout in seconds
Returns:
Dictionary containing findings and summary
"""
workflow_id = workflow.info().workflow_id
workflow.logger.info(
f"Starting AtherisFuzzingWorkflow "
f"(workflow_id={workflow_id}, target_id={target_id}, "
f"target_file={target_file or 'auto-discover'}, max_iterations={max_iterations}, "
f"timeout_seconds={timeout_seconds})"
)
results = {
"workflow_id": workflow_id,
"target_id": target_id,
"status": "running",
"steps": []
}
try:
# Get run ID for workspace isolation
run_id = workflow.info().run_id
# Step 1: Download user's project from MinIO
workflow.logger.info("Step 1: Downloading user code from MinIO")
target_path = await workflow.execute_activity(
"get_target",
args=[target_id, run_id, "isolated"], # target_id, run_id, workspace_isolation
start_to_close_timeout=timedelta(minutes=5),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=1),
maximum_interval=timedelta(seconds=30),
maximum_attempts=3
)
)
results["steps"].append({
"step": "download_target",
"status": "success",
"target_path": target_path
})
workflow.logger.info(f"✓ User code downloaded to: {target_path}")
# Step 2: Run Atheris fuzzing
workflow.logger.info("Step 2: Running Atheris fuzzing")
# Use defaults if parameters are None
actual_max_iterations = max_iterations if max_iterations is not None else 1000000
actual_timeout_seconds = timeout_seconds if timeout_seconds is not None else 1800
fuzz_config = {
"target_file": target_file,
"max_iterations": actual_max_iterations,
"timeout_seconds": actual_timeout_seconds
}
fuzz_results = await workflow.execute_activity(
"fuzz_with_atheris",
args=[target_path, fuzz_config],
start_to_close_timeout=timedelta(seconds=actual_timeout_seconds + 60),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=2),
maximum_interval=timedelta(seconds=60),
maximum_attempts=1 # Fuzzing shouldn't retry
)
)
results["steps"].append({
"step": "fuzzing",
"status": "success",
"executions": fuzz_results.get("summary", {}).get("total_executions", 0),
"crashes": fuzz_results.get("summary", {}).get("crashes_found", 0)
})
workflow.logger.info(
f"✓ Fuzzing completed: "
f"{fuzz_results.get('summary', {}).get('total_executions', 0)} executions, "
f"{fuzz_results.get('summary', {}).get('crashes_found', 0)} crashes"
)
# Step 3: Upload results to MinIO
workflow.logger.info("Step 3: Uploading results")
try:
results_url = await workflow.execute_activity(
"upload_results",
args=[workflow_id, fuzz_results, "json"],
start_to_close_timeout=timedelta(minutes=2)
)
results["results_url"] = results_url
workflow.logger.info(f"✓ Results uploaded to: {results_url}")
except Exception as e:
workflow.logger.warning(f"Failed to upload results: {e}")
results["results_url"] = None
# Step 4: Clean up cache
workflow.logger.info("Step 4: Cleaning up cache")
try:
await workflow.execute_activity(
"cleanup_cache",
args=[target_path, "isolated"], # target_path, workspace_isolation
start_to_close_timeout=timedelta(minutes=1)
)
workflow.logger.info("✓ Cache cleaned up")
except Exception as e:
workflow.logger.warning(f"Cache cleanup failed: {e}")
# Mark workflow as successful
results["status"] = "success"
results["findings"] = fuzz_results.get("findings", [])
results["summary"] = fuzz_results.get("summary", {})
results["sarif"] = fuzz_results.get("sarif") or {}
workflow.logger.info(
f"✓ Workflow completed successfully: {workflow_id} "
f"({results['summary'].get('crashes_found', 0)} crashes found)"
)
return results
except Exception as e:
workflow.logger.error(f"Workflow failed: {e}")
results["status"] = "error"
results["error"] = str(e)
results["steps"].append({
"step": "error",
"status": "failed",
"error": str(e)
})
raise
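The workflow gives the fuzzing activity slightly more time than the fuzzer itself, so the activity can finish parsing results after the fuzzer stops. A minimal sketch of that computation follows. `fuzz_activity_timeout` is a hypothetical helper, and the 60-second margin and 1800-second default are the values used in the workflow above.

```python
from datetime import timedelta
from typing import Optional


def fuzz_activity_timeout(
    timeout_seconds: Optional[int],
    margin_seconds: int = 60,
    default_seconds: int = 1800,
) -> timedelta:
    """Return the start-to-close timeout for the fuzzing activity."""
    # Fall back to the default when the caller passed None.
    effective = timeout_seconds if timeout_seconds is not None else default_seconds
    # The margin covers setup and result collection around the fuzzing run itself.
    return timedelta(seconds=effective + margin_seconds)
```

Because the fuzzing activity runs with `maximum_attempts=1`, exceeding this timeout fails the workflow instead of restarting a long fuzzing run.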
@@ -0,0 +1,5 @@
"""Cargo Fuzzing Workflow"""
from .workflow import CargoFuzzingWorkflow
__all__ = ["CargoFuzzingWorkflow"]
@@ -0,0 +1,203 @@
"""
Cargo Fuzzing Workflow Activities
Activities specific to the cargo-fuzz fuzzing workflow.
"""
import logging
import sys
from datetime import datetime
from pathlib import Path
from typing import Dict, Any
import os
import httpx
from temporalio import activity
# Configure logging
logger = logging.getLogger(__name__)
# Add toolbox to path for module imports
sys.path.insert(0, '/app/toolbox')
@activity.defn(name="fuzz_with_cargo")
async def fuzz_activity(workspace_path: str, config: dict) -> dict:
"""
Fuzzing activity using the CargoFuzzer module on user code.
This activity:
1. Imports the reusable CargoFuzzer module
2. Sets up real-time stats callback
3. Executes fuzzing on user's fuzz_target!() functions
4. Converts the ModuleResult to a dictionary and attaches a SARIF report
Args:
workspace_path: Path to the workspace directory (user's uploaded Rust project)
config: Fuzzer configuration (target_name, max_iterations, timeout_seconds, sanitizer)
Returns:
Fuzzer results dictionary (findings, summary, metadata)
"""
logger.info(f"Activity: fuzz_with_cargo (workspace={workspace_path})")
try:
# Import reusable CargoFuzzer module
from modules.fuzzer import CargoFuzzer
workspace = Path(workspace_path)
if not workspace.exists():
raise FileNotFoundError(f"Workspace not found: {workspace_path}")
# Use the Temporal workflow ID as the run identifier for real-time stats
info = activity.info()
run_id = info.workflow_id
# Define stats callback for real-time monitoring
async def stats_callback(stats_data: Dict[str, Any]):
"""Callback for live fuzzing statistics"""
try:
# Prepare stats payload for backend
coverage_value = stats_data.get("coverage", 0)
stats_payload = {
"run_id": run_id,
"workflow": "cargo_fuzzing",
"executions": stats_data.get("total_execs", 0),
"executions_per_sec": stats_data.get("execs_per_sec", 0.0),
"crashes": stats_data.get("crashes", 0),
"unique_crashes": stats_data.get("crashes", 0),
"coverage": coverage_value,
"corpus_size": stats_data.get("corpus_size", 0),
"elapsed_time": stats_data.get("elapsed_time", 0),
"last_crash_time": None
}
# POST stats to backend API for real-time monitoring
backend_url = os.getenv("BACKEND_URL", "http://backend:8000")
async with httpx.AsyncClient(timeout=5.0) as client:
try:
await client.post(
f"{backend_url}/fuzzing/{run_id}/stats",
json=stats_payload
)
except Exception as http_err:
logger.debug(f"Failed to post stats to backend: {http_err}")
# Also log for debugging
logger.info("LIVE_STATS", extra={
"stats_type": "fuzzing_live_update",
"workflow_type": "cargo_fuzzing",
"run_id": run_id,
"executions": stats_data.get("total_execs", 0),
"executions_per_sec": stats_data.get("execs_per_sec", 0.0),
"crashes": stats_data.get("crashes", 0),
"corpus_size": stats_data.get("corpus_size", 0),
"coverage": stats_data.get("coverage", 0.0),
"elapsed_time": stats_data.get("elapsed_time", 0),
"timestamp": datetime.utcnow().isoformat()
})
except Exception as e:
logger.error(f"Stats callback error: {e}")
# Initialize CargoFuzzer module
fuzzer = CargoFuzzer()
# Execute fuzzing with stats callback
module_result = await fuzzer.execute(
config=config,
workspace=workspace,
stats_callback=stats_callback
)
# Convert ModuleResult to dictionary
result_dict = {
"findings": [],
"summary": module_result.summary,
"metadata": module_result.metadata,
"status": module_result.status,
"error": module_result.error
}
# Convert findings to dict format
for finding in module_result.findings:
finding_dict = {
"id": finding.id,
"title": finding.title,
"description": finding.description,
"severity": finding.severity,
"category": finding.category,
"file_path": finding.file_path,
"line_start": finding.line_start,
"line_end": finding.line_end,
"code_snippet": finding.code_snippet,
"recommendation": finding.recommendation,
"metadata": finding.metadata
}
result_dict["findings"].append(finding_dict)
# Generate SARIF report from findings
if module_result.findings:
# Convert findings to SARIF format
severity_map = {
"critical": "error",
"high": "error",
"medium": "warning",
"low": "note",
"info": "note"
}
results = []
for finding in module_result.findings:
result = {
"ruleId": finding.metadata.get("rule_id", finding.category),
"level": severity_map.get(finding.severity, "warning"),
"message": {"text": finding.description},
"locations": []
}
if finding.file_path:
location = {
"physicalLocation": {
"artifactLocation": {"uri": finding.file_path},
"region": {
"startLine": finding.line_start or 1,
"endLine": finding.line_end or finding.line_start or 1
}
}
}
result["locations"].append(location)
results.append(result)
result_dict["sarif"] = {
"version": "2.1.0",
"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
"runs": [{
"tool": {
"driver": {
"name": "cargo-fuzz",
"version": "0.11.2"
}
},
"results": results
}]
}
else:
result_dict["sarif"] = {
"version": "2.1.0",
"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
"runs": []
}
logger.info(
f"Fuzzing activity completed: {len(module_result.findings)} crashes found, "
f"{module_result.summary.get('total_executions', 0)} executions"
)
return result_dict
except Exception as e:
logger.error(f"Fuzzing activity failed: {e}", exc_info=True)
raise
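The SARIF conversion in the activity above can be sketched as a pure function over plain dictionaries. `findings_to_sarif` and `SEVERITY_TO_LEVEL` are hypothetical names. Unlike the activity, this sketch always emits one run, even when there are no findings, and it omits the hard-coded tool version.

```python
from typing import Any, Dict, List

SARIF_SCHEMA = "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json"

# Same severity-to-level mapping as the activity above.
SEVERITY_TO_LEVEL = {
    "critical": "error",
    "high": "error",
    "medium": "warning",
    "low": "note",
    "info": "note",
}


def findings_to_sarif(findings: List[Dict[str, Any]], tool_name: str = "cargo-fuzz") -> Dict[str, Any]:
    """Convert finding dictionaries into a minimal SARIF 2.1.0 log."""
    results = []
    for finding in findings:
        metadata = finding.get("metadata") or {}
        result: Dict[str, Any] = {
            "ruleId": metadata.get("rule_id", finding.get("category")),
            "level": SEVERITY_TO_LEVEL.get(finding.get("severity"), "warning"),
            "message": {"text": finding.get("description", "")},
            "locations": [],
        }
        if finding.get("file_path"):
            # Fall back to line 1 when the finding has no line information.
            start = finding.get("line_start") or 1
            end = finding.get("line_end") or start
            result["locations"].append({
                "physicalLocation": {
                    "artifactLocation": {"uri": finding["file_path"]},
                    "region": {"startLine": start, "endLine": end},
                }
            })
        results.append(result)
    return {
        "version": "2.1.0",
        "$schema": SARIF_SCHEMA,
        "runs": [{"tool": {"driver": {"name": tool_name}}, "results": results}],
    }
```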
@@ -0,0 +1,71 @@
name: cargo_fuzzing
version: "1.0.0"
vertical: rust
description: "Fuzz Rust code using cargo-fuzz with real-time monitoring. Automatically discovers and fuzzes fuzz_target!() functions in user code."
author: "FuzzForge Team"
tags:
- "fuzzing"
- "cargo-fuzz"
- "rust"
- "libfuzzer"
- "memory-safety"
# Workspace isolation mode (system-level configuration)
# - "isolated" (default): Each workflow run gets its own isolated workspace (safe for concurrent fuzzing)
# - "shared": All runs share the same workspace (for read-only analysis workflows)
# - "copy-on-write": Download once, copy for each run (balances performance and isolation)
workspace_isolation: "isolated"
default_parameters:
target_name: null
max_iterations: 1000000
timeout_seconds: 1800
sanitizer: "address"
parameters:
type: object
properties:
target_name:
type: string
description: "Fuzz target name from fuzz/fuzz_targets/ (auto-discovered if not specified)"
max_iterations:
type: integer
default: 1000000
description: "Maximum fuzzing iterations"
timeout_seconds:
type: integer
default: 1800
description: "Fuzzing timeout in seconds (30 minutes)"
sanitizer:
type: string
enum: ["address", "memory", "undefined"]
default: "address"
description: "Sanitizer to use (address, memory, undefined)"
output_schema:
type: object
properties:
findings:
type: array
description: "Crashes and memory safety issues found during fuzzing"
items:
type: object
properties:
title:
type: string
severity:
type: string
category:
type: string
metadata:
type: object
summary:
type: object
description: "Fuzzing execution summary"
properties:
total_executions:
type: integer
crashes_found:
type: integer
execution_time:
type: number
@@ -0,0 +1,180 @@
"""
Cargo Fuzzing Workflow - Temporal Version
Fuzzes user-provided Rust code using cargo-fuzz with real-time monitoring.
"""
from datetime import timedelta
from typing import Dict, Any, Optional
from temporalio import workflow
from temporalio.common import RetryPolicy
# Mark these imports as passed through so the workflow sandbox does not reload them
with workflow.unsafe.imports_passed_through():
import logging
logger = logging.getLogger(__name__)
@workflow.defn
class CargoFuzzingWorkflow:
"""
Fuzz Rust code using cargo-fuzz (libFuzzer).
User workflow:
1. User runs: ff workflow run cargo_fuzzing .
2. CLI uploads Rust project to MinIO
3. Worker downloads project
4. Worker discovers fuzz targets in fuzz/fuzz_targets/
5. Worker fuzzes the target with cargo-fuzz
6. Crashes reported as findings
"""
@workflow.run
async def run(
self,
target_id: str, # MinIO UUID of uploaded user code
target_name: Optional[str] = None, # Optional: specific fuzz target name
max_iterations: int = 1000000,
timeout_seconds: int = 1800, # 30 minutes default for fuzzing
sanitizer: str = "address"
) -> Dict[str, Any]:
"""
Main workflow execution.
Args:
target_id: UUID of the uploaded target in MinIO
target_name: Optional specific fuzz target name (auto-discovered if None)
max_iterations: Maximum fuzzing iterations
timeout_seconds: Fuzzing timeout in seconds
sanitizer: Sanitizer to use (address, memory, undefined)
Returns:
Dictionary containing findings and summary
"""
workflow_id = workflow.info().workflow_id
workflow.logger.info(
f"Starting CargoFuzzingWorkflow "
f"(workflow_id={workflow_id}, target_id={target_id}, "
f"target_name={target_name or 'auto-discover'}, max_iterations={max_iterations}, "
f"timeout_seconds={timeout_seconds}, sanitizer={sanitizer})"
)
results = {
"workflow_id": workflow_id,
"target_id": target_id,
"status": "running",
"steps": []
}
try:
# Get run ID for workspace isolation
run_id = workflow.info().run_id
# Step 1: Download user's Rust project from MinIO
workflow.logger.info("Step 1: Downloading user code from MinIO")
target_path = await workflow.execute_activity(
"get_target",
args=[target_id, run_id, "isolated"], # target_id, run_id, workspace_isolation
start_to_close_timeout=timedelta(minutes=5),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=1),
maximum_interval=timedelta(seconds=30),
maximum_attempts=3
)
)
results["steps"].append({
"step": "download_target",
"status": "success",
"target_path": target_path
})
workflow.logger.info(f"✓ User code downloaded to: {target_path}")
# Step 2: Run cargo-fuzz
workflow.logger.info("Step 2: Running cargo-fuzz")
# Use defaults if parameters are None
actual_max_iterations = max_iterations if max_iterations is not None else 1000000
actual_timeout_seconds = timeout_seconds if timeout_seconds is not None else 1800
actual_sanitizer = sanitizer if sanitizer is not None else "address"
fuzz_config = {
"target_name": target_name,
"max_iterations": actual_max_iterations,
"timeout_seconds": actual_timeout_seconds,
"sanitizer": actual_sanitizer
}
fuzz_results = await workflow.execute_activity(
"fuzz_with_cargo",
args=[target_path, fuzz_config],
start_to_close_timeout=timedelta(seconds=actual_timeout_seconds + 120),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=2),
maximum_interval=timedelta(seconds=60),
maximum_attempts=1 # Fuzzing shouldn't retry
)
)
results["steps"].append({
"step": "fuzzing",
"status": "success",
"executions": fuzz_results.get("summary", {}).get("total_executions", 0),
"crashes": fuzz_results.get("summary", {}).get("crashes_found", 0)
})
workflow.logger.info(
f"✓ Fuzzing completed: "
f"{fuzz_results.get('summary', {}).get('total_executions', 0)} executions, "
f"{fuzz_results.get('summary', {}).get('crashes_found', 0)} crashes"
)
# Step 3: Upload results to MinIO
workflow.logger.info("Step 3: Uploading results")
try:
results_url = await workflow.execute_activity(
"upload_results",
args=[workflow_id, fuzz_results, "json"],
start_to_close_timeout=timedelta(minutes=2)
)
results["results_url"] = results_url
workflow.logger.info(f"✓ Results uploaded to: {results_url}")
except Exception as e:
workflow.logger.warning(f"Failed to upload results: {e}")
results["results_url"] = None
# Step 4: Clean up cache
workflow.logger.info("Step 4: Cleaning up cache")
try:
await workflow.execute_activity(
"cleanup_cache",
args=[target_path, "isolated"], # target_path, workspace_isolation
start_to_close_timeout=timedelta(minutes=1)
)
workflow.logger.info("✓ Cache cleaned up")
except Exception as e:
workflow.logger.warning(f"Cache cleanup failed: {e}")
# Mark workflow as successful
results["status"] = "success"
results["findings"] = fuzz_results.get("findings", [])
results["summary"] = fuzz_results.get("summary", {})
results["sarif"] = fuzz_results.get("sarif") or {}
workflow.logger.info(
f"✓ Workflow completed successfully: {workflow_id} "
f"({results['summary'].get('crashes_found', 0)} crashes found)"
)
return results
except Exception as e:
workflow.logger.error(f"Workflow failed: {e}")
results["status"] = "error"
results["error"] = str(e)
results["steps"].append({
"step": "error",
"status": "failed",
"error": str(e)
})
raise
@@ -1,12 +0,0 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
@@ -1,47 +0,0 @@
# Secret Detection Workflow Dockerfile
FROM prefecthq/prefect:3-python3.11
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
wget \
git \
ca-certificates \
gnupg \
&& rm -rf /var/lib/apt/lists/*
# Install TruffleHog (use direct binary download to avoid install script issues)
RUN curl -sSfL "https://github.com/trufflesecurity/trufflehog/releases/download/v3.63.2/trufflehog_3.63.2_linux_amd64.tar.gz" -o trufflehog.tar.gz \
&& tar -xzf trufflehog.tar.gz \
&& mv trufflehog /usr/local/bin/ \
&& rm trufflehog.tar.gz
# Install Gitleaks (use specific version to avoid API rate limiting)
RUN wget https://github.com/gitleaks/gitleaks/releases/download/v8.18.2/gitleaks_8.18.2_linux_x64.tar.gz \
&& tar -xzf gitleaks_8.18.2_linux_x64.tar.gz \
&& mv gitleaks /usr/local/bin/ \
&& rm gitleaks_8.18.2_linux_x64.tar.gz
# Verify installations
RUN trufflehog --version && gitleaks version
# Set working directory
WORKDIR /opt/prefect
# Create toolbox directory structure
RUN mkdir -p /opt/prefect/toolbox
# Set environment variables
ENV PYTHONPATH=/opt/prefect/toolbox:/opt/prefect/toolbox/workflows
ENV WORKFLOW_NAME=secret_detection_scan
# The toolbox code will be mounted at runtime from the backend container
# This includes:
# - /opt/prefect/toolbox/modules/base.py
# - /opt/prefect/toolbox/modules/secret_detection/ (TruffleHog, Gitleaks modules)
# - /opt/prefect/toolbox/modules/reporter/ (SARIF reporter)
# - /opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan/
VOLUME /opt/prefect/toolbox
# Set working directory for execution
WORKDIR /opt/prefect
@@ -1,58 +0,0 @@
# Secret Detection Workflow Dockerfile - Self-Contained Version
# This version copies all required modules into the image for complete isolation
FROM prefecthq/prefect:3-python3.11
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
wget \
git \
ca-certificates \
gnupg \
&& rm -rf /var/lib/apt/lists/*
# Install TruffleHog
RUN curl -sSfL https://raw.githubusercontent.com/trufflesecurity/trufflehog/main/scripts/install.sh | sh -s -- -b /usr/local/bin
# Install Gitleaks (pinned release; release assets are published under versioned file names)
RUN wget https://github.com/gitleaks/gitleaks/releases/download/v8.18.2/gitleaks_8.18.2_linux_x64.tar.gz \
&& tar -xzf gitleaks_8.18.2_linux_x64.tar.gz \
&& mv gitleaks /usr/local/bin/ \
&& rm gitleaks_8.18.2_linux_x64.tar.gz
# Verify installations
RUN trufflehog --version && gitleaks version
# Set working directory
WORKDIR /opt/prefect
# Create directory structure
RUN mkdir -p /opt/prefect/toolbox/modules/secret_detection \
/opt/prefect/toolbox/modules/reporter \
/opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan
# Copy the base module and required modules
COPY toolbox/modules/base.py /opt/prefect/toolbox/modules/base.py
COPY toolbox/modules/__init__.py /opt/prefect/toolbox/modules/__init__.py
COPY toolbox/modules/secret_detection/ /opt/prefect/toolbox/modules/secret_detection/
COPY toolbox/modules/reporter/ /opt/prefect/toolbox/modules/reporter/
# Copy the workflow code
COPY toolbox/workflows/comprehensive/secret_detection_scan/ /opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan/
# Copy toolbox init files
COPY toolbox/__init__.py /opt/prefect/toolbox/__init__.py
COPY toolbox/workflows/__init__.py /opt/prefect/toolbox/workflows/__init__.py
COPY toolbox/workflows/comprehensive/__init__.py /opt/prefect/toolbox/workflows/comprehensive/__init__.py
# Install Python dependencies for the modules
# (asyncio is part of the Python standard library and must not be installed from PyPI)
RUN pip install --no-cache-dir pydantic
# Set environment variables
ENV PYTHONPATH=/opt/prefect/toolbox:/opt/prefect/toolbox/workflows
ENV WORKFLOW_NAME=secret_detection_scan
# Set default command (can be overridden)
CMD ["python", "-m", "toolbox.workflows.comprehensive.secret_detection_scan.workflow"]
@@ -1,130 +0,0 @@
# Secret Detection Scan Workflow
This workflow performs comprehensive secret detection using multiple industry-standard tools:
- **TruffleHog**: Comprehensive secret detection with verification capabilities
- **Gitleaks**: Git-specific secret scanning and leak detection
## Features
- **Parallel Execution**: Runs TruffleHog and Gitleaks concurrently for faster results
- **Deduplication**: Automatically removes duplicate findings across tools
- **SARIF Output**: Generates standardized SARIF reports for integration with security tools
- **Configurable**: Supports extensive configuration for both tools
## Dependencies
### Required Modules
- `toolbox.modules.secret_detection.trufflehog`
- `toolbox.modules.secret_detection.gitleaks`
- `toolbox.modules.reporter` (SARIF reporter)
- `toolbox.modules.base` (Base module interface)
### External Tools
- TruffleHog v3.63.2+
- Gitleaks v8.18.0+
## Docker Deployment
This workflow provides two Docker deployment approaches:
### 1. Volume-Based Approach (Default: `Dockerfile`)
**Advantages:**
- Live code updates without rebuilding images
- Smaller image sizes
- Consistent module versions across workflows
- Faster development iteration
**How it works:**
- Docker image contains only external tools (TruffleHog, Gitleaks)
- Python modules are mounted at runtime from the backend container
- Backend manages code synchronization via shared volumes
### 2. Self-Contained Approach (`Dockerfile.self-contained`)
**Advantages:**
- Complete isolation and reproducibility
- No runtime dependencies on backend code
- Can run independently of FuzzForge platform
- Better for CI/CD integration
**How it works:**
- All required Python modules are copied into the Docker image
- Image is completely self-contained
- Larger image size but fully portable
## Configuration
### TruffleHog Configuration
```json
{
"trufflehog_config": {
"verify": true, // Verify discovered secrets
"concurrency": 10, // Number of concurrent workers
"max_depth": 10, // Maximum directory depth
"include_detectors": [], // Specific detectors to include
"exclude_detectors": [] // Specific detectors to exclude
}
}
```
### Gitleaks Configuration
```json
{
"gitleaks_config": {
"scan_mode": "detect", // "detect" or "protect"
"redact": true, // Redact secrets in output
"max_target_megabytes": 100, // Maximum file size (MB)
"no_git": false, // Scan without Git context
"config_file": "", // Custom Gitleaks config
"baseline_file": "" // Baseline file for known findings
}
}
```
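The workflow applies caller-supplied settings on top of built-in defaults, so a request only needs to list the keys it changes. The following is a minimal Python sketch of that shallow merge. The helper name `merge_config` is hypothetical, and the default values are assumed to match those set in the workflow's `main_flow`.

```python
from typing import Any, Dict, Optional

# Assumption: these mirror the Gitleaks defaults set in the workflow's main_flow.
DEFAULT_GITLEAKS_CONFIG: Dict[str, Any] = {
    "scan_mode": "detect",
    "redact": True,
    "max_target_megabytes": 100,
    "no_git": True,
}


def merge_config(
    defaults: Dict[str, Any], overrides: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return a new dict in which keys from overrides replace the defaults.

    The merge is shallow: nested dicts are replaced, not merged key by key.
    """
    return {**defaults, **(overrides or {})}


# Only the overridden key changes; every other default is kept.
merged = merge_config(DEFAULT_GITLEAKS_CONFIG, {"max_target_megabytes": 200})
```

The `//` annotations in the configuration blocks above are explanatory only. JSON has no comment syntax, so leave them out of real request bodies.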
## Usage Example
```bash
curl -X POST "http://localhost:8000/workflows/secret_detection_scan/submit" \
-H "Content-Type: application/json" \
-d '{
"target_path": "/path/to/scan",
"volume_mode": "ro",
"parameters": {
"trufflehog_config": {
"verify": true,
"concurrency": 15
},
"gitleaks_config": {
"scan_mode": "detect",
"max_target_megabytes": 200
}
}
}'
```
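The same submission can be built from Python. This is a minimal sketch using only the standard library; the helper name `build_submit_request` is hypothetical, while the endpoint path and payload fields are taken from the curl example above. It builds the request without sending it, so it can be inspected without a running backend.

```python
import json
import urllib.request
from typing import Any, Dict


def build_submit_request(
    base_url: str,
    workflow: str,
    target_path: str,
    parameters: Dict[str, Any],
    volume_mode: str = "ro",
) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = {
        "target_path": target_path,
        "volume_mode": volume_mode,
        "parameters": parameters,
    }
    return urllib.request.Request(
        url=f"{base_url}/workflows/{workflow}/submit",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_submit_request(
    "http://localhost:8000",
    "secret_detection_scan",
    "/path/to/scan",
    {
        "trufflehog_config": {"verify": True, "concurrency": 15},
        "gitleaks_config": {"scan_mode": "detect", "max_target_megabytes": 200},
    },
)
# To send it against a running backend: urllib.request.urlopen(request)
```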
## Output Format
The workflow generates a SARIF report containing:
- All unique findings from both tools
- Severity levels mapped to standard scale
- File locations and line numbers
- Detailed descriptions and recommendations
- Tool-specific metadata
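A consumer of the report will usually want a quick severity breakdown. The sketch below counts results per SARIF `level` across all runs. The function name `summarize_sarif` and the sample report are illustrative assumptions, not part of the workflow; only the `runs[].results[].level` layout comes from SARIF 2.1.0.

```python
from collections import Counter
from typing import Any, Dict


def summarize_sarif(report: Dict[str, Any]) -> Dict[str, int]:
    """Count results per SARIF level across every run in the report."""
    counts: Counter = Counter()
    for run in report.get("runs", []):
        for result in run.get("results", []):
            # A result without an explicit level is counted as "warning" here,
            # the SARIF default when no rule configuration overrides it.
            counts[result.get("level", "warning")] += 1
    return dict(counts)


# Hypothetical report used only to exercise the function.
sample_report = {
    "version": "2.1.0",
    "runs": [
        {
            "results": [
                {"ruleId": "aws-key", "level": "error"},
                {"ruleId": "generic-token", "level": "error"},
                {"ruleId": "private-key"},
            ]
        }
    ],
}
summary = summarize_sarif(sample_report)
```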
## Performance Considerations
- **TruffleHog**: CPU-intensive with verification enabled
- **Gitleaks**: Memory-intensive for large repositories
- **Recommended Resources**: 512Mi memory, 500m CPU
- **Typical Runtime**: 1-5 minutes for small repos, 10-30 minutes for large ones
## Security Notes
- Secrets are redacted in output by default
- Verified secrets are marked with higher severity
- Both tools support custom rules and exclusions
- Consider using baseline files for known false positives
@@ -1,17 +0,0 @@
"""
Secret Detection Scan Workflow
This package contains the comprehensive secret detection workflow that combines
multiple secret detection tools for thorough analysis.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
@@ -1,113 +0,0 @@
name: secret_detection_scan
version: "2.0.0"
description: "Comprehensive secret detection using TruffleHog and Gitleaks"
author: "FuzzForge Team"
category: "comprehensive"
tags:
- "secrets"
- "credentials"
- "detection"
- "trufflehog"
- "gitleaks"
- "comprehensive"
supported_volume_modes:
- "ro"
- "rw"
default_volume_mode: "ro"
default_target_path: "/workspace"
requirements:
tools:
- "trufflehog"
- "gitleaks"
resources:
memory: "512Mi"
cpu: "500m"
timeout: 1800
has_docker: true
default_parameters:
target_path: "/workspace"
volume_mode: "ro"
trufflehog_config: {}
gitleaks_config: {}
reporter_config: {}
parameters:
type: object
properties:
target_path:
type: string
default: "/workspace"
description: "Path to analyze"
volume_mode:
type: string
enum: ["ro", "rw"]
default: "ro"
description: "Volume mount mode"
trufflehog_config:
type: object
description: "TruffleHog configuration"
properties:
verify:
type: boolean
description: "Verify discovered secrets"
concurrency:
type: integer
description: "Number of concurrent workers"
max_depth:
type: integer
description: "Maximum directory depth to scan"
include_detectors:
type: array
items:
type: string
description: "Specific detectors to include"
exclude_detectors:
type: array
items:
type: string
description: "Specific detectors to exclude"
gitleaks_config:
type: object
description: "Gitleaks configuration"
properties:
scan_mode:
type: string
enum: ["detect", "protect"]
description: "Scan mode"
redact:
type: boolean
description: "Redact secrets in output"
max_target_megabytes:
type: integer
description: "Maximum file size to scan (MB)"
no_git:
type: boolean
description: "Scan files without Git context"
config_file:
type: string
description: "Path to custom configuration file"
baseline_file:
type: string
description: "Path to baseline file"
reporter_config:
type: object
description: "SARIF reporter configuration"
properties:
output_file:
type: string
description: "Output SARIF file name"
include_code_flows:
type: boolean
description: "Include code flow information"
output_schema:
type: object
properties:
sarif:
type: object
description: "SARIF-formatted security findings"
@@ -1,290 +0,0 @@
"""
Secret Detection Scan Workflow
This workflow performs comprehensive secret detection using multiple tools:
- TruffleHog: Comprehensive secret detection with verification
- Gitleaks: Git-specific secret scanning
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import sys
import logging
from pathlib import Path
from typing import Dict, Any, List, Optional
from prefect import flow, task
from prefect.artifacts import create_markdown_artifact, create_table_artifact
import asyncio
import json
# Add modules to path
sys.path.insert(0, '/app')
# Import modules
from toolbox.modules.secret_detection.trufflehog import TruffleHogModule
from toolbox.modules.secret_detection.gitleaks import GitleaksModule
from toolbox.modules.reporter import SARIFReporter
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@task(name="trufflehog_scan")
async def run_trufflehog_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
"""
Task to run TruffleHog secret detection.
Args:
workspace: Path to the workspace
config: TruffleHog configuration
Returns:
TruffleHog results
"""
logger.info("Running TruffleHog secret detection")
module = TruffleHogModule()
result = await module.execute(config, workspace)
logger.info(f"TruffleHog completed: {result.summary.get('total_secrets', 0)} secrets found")
return result.dict()
@task(name="gitleaks_scan")
async def run_gitleaks_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
"""
Task to run Gitleaks secret detection.
Args:
workspace: Path to the workspace
config: Gitleaks configuration
Returns:
Gitleaks results
"""
logger.info("Running Gitleaks secret detection")
module = GitleaksModule()
result = await module.execute(config, workspace)
logger.info(f"Gitleaks completed: {result.summary.get('total_leaks', 0)} leaks found")
return result.dict()
@task(name="aggregate_findings")
async def aggregate_findings_task(
trufflehog_results: Dict[str, Any],
gitleaks_results: Dict[str, Any],
config: Dict[str, Any],
workspace: Path
) -> Dict[str, Any]:
"""
Task to aggregate findings from all secret detection tools.
Args:
trufflehog_results: Results from TruffleHog
gitleaks_results: Results from Gitleaks
config: Reporter configuration
workspace: Path to workspace
Returns:
Aggregated SARIF report
"""
logger.info("Aggregating secret detection findings")
# Combine all findings
all_findings = []
# Add TruffleHog findings
trufflehog_findings = trufflehog_results.get("findings", [])
all_findings.extend(trufflehog_findings)
# Add Gitleaks findings
gitleaks_findings = gitleaks_results.get("findings", [])
all_findings.extend(gitleaks_findings)
# Deduplicate findings on file path, start line, and the first 50 characters of the title
unique_findings = []
seen_signatures = set()
for finding in all_findings:
# Create signature for deduplication
signature = (
finding.get("file_path", ""),
finding.get("line_start", 0),
finding.get("title", "").lower()[:50] # First 50 chars of title
)
if signature not in seen_signatures:
seen_signatures.add(signature)
unique_findings.append(finding)
else:
logger.debug(f"Deduplicated finding: {signature}")
logger.info(f"Aggregated {len(unique_findings)} unique findings from {len(all_findings)} total")
# Generate SARIF report
reporter = SARIFReporter()
reporter_config = {
**config,
"findings": unique_findings,
"tool_name": "FuzzForge Secret Detection",
"tool_version": "1.0.0",
"tool_description": "Comprehensive secret detection using TruffleHog and Gitleaks"
}
result = await reporter.execute(reporter_config, workspace)
return result.dict().get("sarif", {})
@flow(name="secret_detection_scan", log_prints=True)
async def main_flow(
target_path: str = "/workspace",
volume_mode: str = "ro",
trufflehog_config: Optional[Dict[str, Any]] = None,
gitleaks_config: Optional[Dict[str, Any]] = None,
reporter_config: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
"""
Main secret detection workflow.
This workflow:
1. Runs TruffleHog for comprehensive secret detection
2. Runs Gitleaks for Git-specific secret detection
3. Aggregates and deduplicates findings
4. Generates a unified SARIF report
Args:
target_path: Path to the mounted workspace (default: /workspace)
volume_mode: Volume mount mode (ro/rw)
trufflehog_config: Configuration for TruffleHog
gitleaks_config: Configuration for Gitleaks
reporter_config: Configuration for SARIF reporter
Returns:
SARIF-formatted findings report
"""
logger.info("Starting comprehensive secret detection workflow")
logger.info(f"Workspace: {target_path}, Mode: {volume_mode}")
# Set workspace path
workspace = Path(target_path)
if not workspace.exists():
logger.error(f"Workspace does not exist: {workspace}")
return {
"error": f"Workspace not found: {workspace}",
"sarif": None
}
# Default configurations - merge with provided configs to ensure defaults are always applied
default_trufflehog_config = {
"verify": False,
"concurrency": 10,
"max_depth": 10,
"no_git": True # Add no_git for filesystem scanning
}
trufflehog_config = {**default_trufflehog_config, **(trufflehog_config or {})}
default_gitleaks_config = {
"scan_mode": "detect",
"redact": True,
"max_target_megabytes": 100,
"no_git": True # Critical for non-git directories
}
gitleaks_config = {**default_gitleaks_config, **(gitleaks_config or {})}
default_reporter_config = {
"include_code_flows": False
}
reporter_config = {**default_reporter_config, **(reporter_config or {})}
try:
# Run secret detection tools in parallel
logger.info("Phase 1: Running secret detection tools")
# Create tasks for parallel execution
trufflehog_task_result = run_trufflehog_task(workspace, trufflehog_config)
gitleaks_task_result = run_gitleaks_task(workspace, gitleaks_config)
# Wait for both to complete
trufflehog_results, gitleaks_results = await asyncio.gather(
trufflehog_task_result,
gitleaks_task_result,
return_exceptions=True
)
# Handle any exceptions
if isinstance(trufflehog_results, Exception):
logger.error(f"TruffleHog failed: {trufflehog_results}")
trufflehog_results = {"findings": [], "status": "failed"}
if isinstance(gitleaks_results, Exception):
logger.error(f"Gitleaks failed: {gitleaks_results}")
gitleaks_results = {"findings": [], "status": "failed"}
# Aggregate findings
logger.info("Phase 2: Aggregating findings")
sarif_report = await aggregate_findings_task(
trufflehog_results,
gitleaks_results,
reporter_config,
workspace
)
# Log summary
if sarif_report and "runs" in sarif_report:
results_count = len(sarif_report["runs"][0].get("results", []))
logger.info(f"Workflow completed successfully with {results_count} unique secret findings")
# Log tool-specific stats
trufflehog_count = len(trufflehog_results.get("findings", []))
gitleaks_count = len(gitleaks_results.get("findings", []))
logger.info(f"Tool results - TruffleHog: {trufflehog_count}, Gitleaks: {gitleaks_count}")
else:
logger.info("Workflow completed successfully with no findings")
return sarif_report
except Exception as e:
logger.error(f"Secret detection workflow failed: {e}")
# Return error in SARIF format
return {
"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
"version": "2.1.0",
"runs": [
{
"tool": {
"driver": {
"name": "FuzzForge Secret Detection",
"version": "1.0.0"
}
},
"results": [],
"invocations": [
{
"executionSuccessful": False,
"exitCode": 1,
"exitCodeDescription": str(e)
}
]
}
]
}
if __name__ == "__main__":
# For local testing
import asyncio
asyncio.run(main_flow(
target_path="/tmp/test",
trufflehog_config={"verify": True, "max_depth": 5},
gitleaks_config={"scan_mode": "detect"}
))
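The deduplication step in `aggregate_findings_task` can be exercised on its own. The sketch below reproduces the same signature (file path, start line, lowercased title prefix) as a standalone function; the name `dedupe_findings` and the sample findings are hypothetical.

```python
from typing import Any, Dict, List, Set, Tuple

Signature = Tuple[str, int, str]


def dedupe_findings(findings: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first finding for each (file_path, line_start, title prefix)."""
    seen: Set[Signature] = set()
    unique: List[Dict[str, Any]] = []
    for finding in findings:
        signature = (
            finding.get("file_path", ""),
            finding.get("line_start", 0),
            finding.get("title", "").lower()[:50],  # case-insensitive prefix
        )
        if signature not in seen:
            seen.add(signature)
            unique.append(finding)
    return unique


# Hypothetical findings: the first two differ only in title case.
sample = [
    {"file_path": "app/.env", "line_start": 3, "title": "AWS Access Key"},
    {"file_path": "app/.env", "line_start": 3, "title": "aws access key"},
    {"file_path": "app/.env", "line_start": 3, "title": "Generic API Key"},
]
unique = dedupe_findings(sample)
```

Because the title is part of the signature, two tools that report the same line under different titles are both kept; only reports with matching titles collapse into one.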
@@ -0,0 +1,113 @@
name: ossfuzz_campaign
version: "1.0.0"
vertical: ossfuzz
description: "Generic OSS-Fuzz fuzzing campaign. Automatically reads project configuration from OSS-Fuzz repo and runs fuzzing using Google's infrastructure."
author: "FuzzForge Team"
tags:
- "fuzzing"
- "oss-fuzz"
- "libfuzzer"
- "afl"
- "honggfuzz"
- "memory-safety"
- "security"
# Workspace isolation mode
# OSS-Fuzz campaigns use isolated mode for safe concurrent campaigns
workspace_isolation: "isolated"
default_parameters:
project_name: null
campaign_duration_hours: 1
override_engine: null
override_sanitizer: null
max_iterations: null
parameters:
type: object
required:
- project_name
properties:
project_name:
type: string
description: "OSS-Fuzz project name (e.g., 'curl', 'sqlite3', 'libxml2')"
examples:
- "curl"
- "sqlite3"
- "libxml2"
- "openssl"
- "zlib"
campaign_duration_hours:
type: integer
default: 1
minimum: 1
maximum: 168 # 1 week max
description: "How many hours to run the fuzzing campaign"
override_engine:
type: string
enum: ["libfuzzer", "afl", "honggfuzz"]
description: "Override fuzzing engine from project.yaml (optional)"
override_sanitizer:
type: string
enum: ["address", "memory", "undefined", "dataflow"]
description: "Override sanitizer from project.yaml (optional)"
max_iterations:
type: integer
minimum: 1000
description: "Limit on fuzzing iterations (optional)"
output_schema:
type: object
properties:
project_name:
type: string
description: "OSS-Fuzz project that was fuzzed"
summary:
type: object
description: "Campaign execution summary"
properties:
total_executions:
type: integer
crashes_found:
type: integer
unique_crashes:
type: integer
duration_hours:
type: number
engine_used:
type: string
sanitizer_used:
type: string
crashes:
type: array
description: "List of crash file paths"
items:
type: string
sarif:
type: object
description: "SARIF-formatted crash reports (future)"
examples:
- name: "Fuzz curl for 1 hour"
parameters:
project_name: "curl"
campaign_duration_hours: 1
- name: "Fuzz sqlite3 with AFL"
parameters:
project_name: "sqlite3"
campaign_duration_hours: 2
override_engine: "afl"
- name: "Fuzz libxml2 with memory sanitizer"
parameters:
project_name: "libxml2"
campaign_duration_hours: 6
override_sanitizer: "memory"
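The constraints in the `parameters` schema above can be checked before a campaign is submitted. This is a minimal sketch, assuming the limits and enums listed in the metadata; the function name `validate_campaign_params` is hypothetical and the real backend may validate differently.

```python
from typing import Any, Dict, List

# Enum values copied from the metadata above.
ENGINES = {"libfuzzer", "afl", "honggfuzz"}
SANITIZERS = {"address", "memory", "undefined", "dataflow"}


def validate_campaign_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []

    if not isinstance(params.get("project_name"), str):
        errors.append("project_name is required and must be a string")

    hours = params.get("campaign_duration_hours", 1)
    if not isinstance(hours, int) or not 1 <= hours <= 168:
        errors.append("campaign_duration_hours must be an integer from 1 to 168")

    engine = params.get("override_engine")
    if engine is not None and engine not in ENGINES:
        errors.append(f"override_engine must be one of {sorted(ENGINES)}")

    sanitizer = params.get("override_sanitizer")
    if sanitizer is not None and sanitizer not in SANITIZERS:
        errors.append(f"override_sanitizer must be one of {sorted(SANITIZERS)}")

    max_iterations = params.get("max_iterations")
    if max_iterations is not None and (
        not isinstance(max_iterations, int) or max_iterations < 1000
    ):
        errors.append("max_iterations must be an integer of at least 1000")

    return errors


ok = validate_campaign_params(
    {"project_name": "sqlite3", "campaign_duration_hours": 2, "override_engine": "afl"}
)
bad = validate_campaign_params(
    {"campaign_duration_hours": 200, "override_engine": "centipede"}
)
```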
@@ -0,0 +1,219 @@
"""
OSS-Fuzz Campaign Workflow - Temporal Version
Generic workflow for running OSS-Fuzz campaigns using Google's infrastructure.
Automatically reads project configuration from OSS-Fuzz project.yaml files.
"""
import asyncio
from datetime import timedelta
from typing import Dict, Any, Optional
from temporalio import workflow
from temporalio.common import RetryPolicy
# Pass this import through the Temporal workflow sandbox so it is not reloaded per run
with workflow.unsafe.imports_passed_through():
import logging
logger = logging.getLogger(__name__)
@workflow.defn
class OssfuzzCampaignWorkflow:
"""
Generic OSS-Fuzz fuzzing campaign workflow.
User workflow:
1. User runs: ff workflow run ossfuzz_campaign . project_name=curl
2. Worker loads project config from OSS-Fuzz repo
3. Worker builds project using OSS-Fuzz's build system
4. Worker runs fuzzing with engines from project.yaml
5. Crashes and corpus reported as findings
"""
@workflow.run
async def run(
self,
target_id: str, # Required by FuzzForge (not used, OSS-Fuzz downloads from Google)
project_name: str, # Required: OSS-Fuzz project name (e.g., "curl", "sqlite3")
campaign_duration_hours: int = 1,
override_engine: Optional[str] = None, # Override engine from project.yaml
override_sanitizer: Optional[str] = None, # Override sanitizer from project.yaml
max_iterations: Optional[int] = None # Optional: limit fuzzing iterations
) -> Dict[str, Any]:
"""
Main workflow execution.
Args:
target_id: UUID of uploaded target (not used, required by FuzzForge)
project_name: Name of OSS-Fuzz project (e.g., "curl", "sqlite3", "libxml2")
campaign_duration_hours: How many hours to fuzz (default: 1)
override_engine: Override fuzzing engine from project.yaml
override_sanitizer: Override sanitizer from project.yaml
max_iterations: Optional limit on fuzzing iterations
Returns:
Dictionary containing crash file paths and summary stats (SARIF generation is still a TODO)
"""
workflow_id = workflow.info().workflow_id
workflow.logger.info(
f"Starting OSS-Fuzz Campaign for project '{project_name}' "
f"(workflow_id={workflow_id}, duration={campaign_duration_hours}h)"
)
results = {
"workflow_id": workflow_id,
"project_name": project_name,
"status": "running",
"steps": []
}
try:
# Step 1: Load OSS-Fuzz project configuration
workflow.logger.info(f"Step 1: Loading project config for '{project_name}'")
project_config = await workflow.execute_activity(
"load_ossfuzz_project",
args=[project_name],
start_to_close_timeout=timedelta(minutes=5),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=1),
maximum_interval=timedelta(seconds=30),
maximum_attempts=3
)
)
results["steps"].append({
"step": "load_config",
"status": "success",
"language": project_config.get("language"),
"engines": project_config.get("fuzzing_engines", []),
"sanitizers": project_config.get("sanitizers", [])
})
workflow.logger.info(
f"✓ Loaded config: language={project_config.get('language')}, "
f"engines={project_config.get('fuzzing_engines')}"
)
# Step 2: Build project using OSS-Fuzz infrastructure
workflow.logger.info(f"Step 2: Building project '{project_name}'")
build_result = await workflow.execute_activity(
"build_ossfuzz_project",
args=[
project_name,
project_config,
override_sanitizer,
override_engine
],
start_to_close_timeout=timedelta(minutes=30),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=2),
maximum_interval=timedelta(seconds=60),
maximum_attempts=2
)
)
results["steps"].append({
"step": "build_project",
"status": "success",
"fuzz_targets": len(build_result.get("fuzz_targets", [])),
"sanitizer": build_result.get("sanitizer_used"),
"engine": build_result.get("engine_used")
})
workflow.logger.info(
f"✓ Build completed: {len(build_result.get('fuzz_targets', []))} fuzz targets found"
)
if not build_result.get("fuzz_targets"):
raise Exception(f"No fuzz targets found for project {project_name}")
# Step 3: Run fuzzing on discovered targets
workflow.logger.info(f"Step 3: Fuzzing {len(build_result['fuzz_targets'])} targets")
# Determine which engine to use
engine_to_use = override_engine if override_engine else build_result["engine_used"]
duration_seconds = campaign_duration_hours * 3600
# Fuzz each target (in parallel if multiple targets)
fuzz_futures = []
for target_path in build_result["fuzz_targets"]:
future = workflow.execute_activity(
"fuzz_target",
args=[target_path, engine_to_use, duration_seconds, None, None],
start_to_close_timeout=timedelta(seconds=duration_seconds + 300),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=2),
maximum_interval=timedelta(seconds=60),
maximum_attempts=1 # Fuzzing shouldn't retry
)
)
fuzz_futures.append(future)
# Wait for all fuzzing to complete
fuzz_results = await asyncio.gather(*fuzz_futures, return_exceptions=True)
# Aggregate results
total_execs = 0
total_crashes = 0
all_crashes = []
for i, result in enumerate(fuzz_results):
if isinstance(result, Exception):
workflow.logger.error(f"Fuzzing failed for target {i}: {result}")
continue
total_execs += result.get("total_executions", 0)
total_crashes += result.get("crashes", 0)
all_crashes.extend(result.get("crash_files", []))
results["steps"].append({
"step": "fuzzing",
"status": "success",
"total_executions": total_execs,
"crashes_found": total_crashes,
"targets_fuzzed": len(build_result["fuzz_targets"])
})
workflow.logger.info(
f"✓ Fuzzing completed: {total_execs} executions, {total_crashes} crashes"
)
# Step 4: Generate SARIF report
workflow.logger.info("Step 4: Generating SARIF report")
# TODO: Implement crash minimization and SARIF generation
# For now, return raw results
results["status"] = "success"
results["summary"] = {
"project": project_name,
"total_executions": total_execs,
"crashes_found": total_crashes,
"unique_crashes": len(set(all_crashes)),
"duration_hours": campaign_duration_hours,
"engine_used": engine_to_use,
"sanitizer_used": build_result.get("sanitizer_used")
}
results["crashes"] = all_crashes[:100] # Limit to first 100 crashes
workflow.logger.info(
f"✓ Campaign completed: {project_name} - "
f"{total_execs} execs, {total_crashes} crashes"
)
return results
except Exception as e:
workflow.logger.error(f"Workflow failed: {e}")
results["status"] = "error"
results["error"] = str(e)
results["steps"].append({
"step": "error",
"status": "failed",
"error": str(e)
})
raise
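The fan-out in Step 3 relies on `asyncio.gather(..., return_exceptions=True)`, which returns failures as values instead of raising on the first one. The sketch below shows that aggregation pattern with plain coroutines standing in for the `fuzz_target` activity. `fake_fuzz_target`, its result values, and the `failed_targets` counter are illustrative assumptions and not part of the workflow.

```python
import asyncio
from typing import Any, Dict, List


async def fake_fuzz_target(name: str, fail: bool = False) -> Dict[str, Any]:
    """Stand-in for the fuzz_target activity, returning made-up numbers."""
    await asyncio.sleep(0)
    if fail:
        raise RuntimeError(f"fuzzing {name} failed")
    return {
        "total_executions": 1000,
        "crashes": 1,
        "crash_files": [f"/crashes/{name}-0"],
    }


def aggregate(results: List[Any]) -> Dict[str, Any]:
    """Sum per-target results and count the targets that raised."""
    summary: Dict[str, Any] = {
        "total_executions": 0,
        "crashes_found": 0,
        "crash_files": [],
        "failed_targets": 0,  # assumption: not tracked by the original workflow
    }
    for result in results:
        # BaseException also covers asyncio.CancelledError.
        if isinstance(result, BaseException):
            summary["failed_targets"] += 1
            continue
        summary["total_executions"] += result.get("total_executions", 0)
        summary["crashes_found"] += result.get("crashes", 0)
        summary["crash_files"].extend(result.get("crash_files", []))
    return summary


async def run_campaign() -> Dict[str, Any]:
    # gather preserves argument order, so results line up with the targets.
    results = await asyncio.gather(
        fake_fuzz_target("a"),
        fake_fuzz_target("b", fail=True),
        fake_fuzz_target("c"),
        return_exceptions=True,
    )
    return aggregate(results)
```

Tracking the failure count separately makes it possible to tell a campaign with no crashes apart from one where some targets never ran.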
@@ -1,187 +0,0 @@
"""
Manual Workflow Registry for Prefect Deployment
This file contains the manual registry of all workflows that can be deployed.
Developers MUST add their workflows here after creating them.
This approach is required because:
1. Prefect cannot deploy dynamically imported flows
2. Docker deployment needs static flow references
3. Explicit registration provides better control and visibility
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
from typing import Dict, Any, Callable
import logging
logger = logging.getLogger(__name__)
# Import only essential workflows
# Import each workflow individually to handle failures gracefully
security_assessment_flow = None
secret_detection_flow = None
# Try to import each workflow individually
try:
from .security_assessment.workflow import main_flow as security_assessment_flow
except ImportError as e:
logger.warning(f"Failed to import security_assessment workflow: {e}")
try:
from .comprehensive.secret_detection_scan.workflow import main_flow as secret_detection_flow
except ImportError as e:
logger.warning(f"Failed to import secret_detection_scan workflow: {e}")
# Manual registry - developers add workflows here after creation
# Only include workflows that were successfully imported
WORKFLOW_REGISTRY: Dict[str, Dict[str, Any]] = {}
# Add workflows that were successfully imported
if security_assessment_flow is not None:
WORKFLOW_REGISTRY["security_assessment"] = {
"flow": security_assessment_flow,
"module_path": "toolbox.workflows.security_assessment.workflow",
"function_name": "main_flow",
"description": "Comprehensive security assessment workflow that scans files, analyzes code for vulnerabilities, and generates SARIF reports",
"version": "1.0.0",
"author": "FuzzForge Team",
"tags": ["security", "scanner", "analyzer", "static-analysis", "sarif"]
}
if secret_detection_flow is not None:
WORKFLOW_REGISTRY["secret_detection_scan"] = {
"flow": secret_detection_flow,
"module_path": "toolbox.workflows.comprehensive.secret_detection_scan.workflow",
"function_name": "main_flow",
"description": "Comprehensive secret detection using TruffleHog and Gitleaks for thorough credential scanning",
"version": "1.0.0",
"author": "FuzzForge Team",
"tags": ["secrets", "credentials", "detection", "trufflehog", "gitleaks", "comprehensive"]
}
#
# To add a new workflow, follow this pattern:
#
# "my_new_workflow": {
# "flow": my_new_flow_function, # Import the flow function above
# "module_path": "toolbox.workflows.my_new_workflow.workflow",
# "function_name": "my_new_flow_function",
# "description": "Description of what this workflow does",
# "version": "1.0.0",
# "author": "Developer Name",
# "tags": ["tag1", "tag2"]
# }
def get_workflow_flow(workflow_name: str) -> Callable:
"""
Get the flow function for a workflow.
Args:
workflow_name: Name of the workflow
Returns:
Flow function
Raises:
KeyError: If workflow not found in registry
"""
if workflow_name not in WORKFLOW_REGISTRY:
available = list(WORKFLOW_REGISTRY.keys())
raise KeyError(
f"Workflow '{workflow_name}' not found in registry. "
f"Available workflows: {available}. "
f"Please add the workflow to toolbox/workflows/registry.py"
)
return WORKFLOW_REGISTRY[workflow_name]["flow"]
def get_workflow_info(workflow_name: str) -> Dict[str, Any]:
"""
Get registry information for a workflow.
Args:
workflow_name: Name of the workflow
Returns:
Registry information dictionary
Raises:
KeyError: If workflow not found in registry
"""
if workflow_name not in WORKFLOW_REGISTRY:
available = list(WORKFLOW_REGISTRY.keys())
raise KeyError(
f"Workflow '{workflow_name}' not found in registry. "
f"Available workflows: {available}"
)
return WORKFLOW_REGISTRY[workflow_name]
def list_registered_workflows() -> Dict[str, Dict[str, Any]]:
"""
Get all registered workflows.
Returns:
Dictionary of all workflow registry entries
"""
return WORKFLOW_REGISTRY.copy()
def validate_registry() -> bool:
"""
Validate the workflow registry for consistency.
Returns:
True if valid, raises exceptions if not
Raises:
ValueError: If registry is invalid
"""
if not WORKFLOW_REGISTRY:
raise ValueError("Workflow registry is empty")
required_fields = ["flow", "module_path", "function_name", "description"]
for name, entry in WORKFLOW_REGISTRY.items():
# Check required fields
missing_fields = [field for field in required_fields if field not in entry]
if missing_fields:
raise ValueError(
f"Workflow '{name}' missing required fields: {missing_fields}"
)
# Check if flow is callable
if not callable(entry["flow"]):
raise ValueError(f"Workflow '{name}' flow is not callable")
# Check if flow has the required Prefect attributes
if not hasattr(entry["flow"], "deploy"):
raise ValueError(
f"Workflow '{name}' flow is not a Prefect flow (missing deploy method)"
)
logger.info(f"Registry validation passed. {len(WORKFLOW_REGISTRY)} workflows registered.")
return True
# Validate registry on import
try:
validate_registry()
logger.info(f"Workflow registry loaded successfully with {len(WORKFLOW_REGISTRY)} workflows")
except Exception as e:
logger.error(f"Workflow registry validation failed: {e}")
raise
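The registry pattern in the removed file (explicit registration, required-field checks, and lookups that list the available names on failure) can be reduced to a few lines. This is a simplified sketch with hypothetical names (`register_workflow`, `demo_flow`); it omits the Prefect-specific `deploy` check and is not the code that was deleted.

```python
from typing import Any, Callable, Dict

REQUIRED_FIELDS = ("flow", "module_path", "function_name", "description")
REGISTRY: Dict[str, Dict[str, Any]] = {}


def register_workflow(name: str, entry: Dict[str, Any]) -> None:
    """Add an entry after checking required fields and that flow is callable."""
    missing = [field for field in REQUIRED_FIELDS if field not in entry]
    if missing:
        raise ValueError(f"Workflow '{name}' missing required fields: {missing}")
    if not callable(entry["flow"]):
        raise ValueError(f"Workflow '{name}' flow is not callable")
    REGISTRY[name] = entry


def get_workflow_flow(name: str) -> Callable[..., Any]:
    """Return the registered flow, or raise KeyError listing what is available."""
    if name not in REGISTRY:
        raise KeyError(
            f"Workflow '{name}' not found in registry. "
            f"Available workflows: {sorted(REGISTRY)}"
        )
    return REGISTRY[name]["flow"]


def demo_flow() -> str:
    """Placeholder flow used only for this example."""
    return "ran"


register_workflow(
    "demo",
    {
        "flow": demo_flow,
        "module_path": "toolbox.workflows.demo.workflow",
        "function_name": "demo_flow",
        "description": "Example entry",
    },
)
```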
@@ -1,30 +0,0 @@
FROM prefecthq/prefect:3-python3.11
WORKDIR /app
# Create toolbox directory structure to match expected import paths
RUN mkdir -p /app/toolbox/workflows /app/toolbox/modules
# Copy base module infrastructure
COPY modules/__init__.py /app/toolbox/modules/
COPY modules/base.py /app/toolbox/modules/
# Copy only required modules (manual selection)
COPY modules/scanner /app/toolbox/modules/scanner
COPY modules/analyzer /app/toolbox/modules/analyzer
COPY modules/reporter /app/toolbox/modules/reporter
# Copy this workflow
COPY workflows/security_assessment /app/toolbox/workflows/security_assessment
# Install workflow-specific requirements if they exist
RUN if [ -f /app/toolbox/workflows/security_assessment/requirements.txt ]; then pip install --no-cache-dir -r /app/toolbox/workflows/security_assessment/requirements.txt; fi
# Install common requirements
RUN pip install --no-cache-dir pyyaml
# Set Python path
ENV PYTHONPATH=/app:$PYTHONPATH
# Create workspace directory
RUN mkdir -p /workspace
@@ -0,0 +1,150 @@
"""
Security Assessment Workflow Activities
Activities specific to the security assessment workflow:
- scan_files_activity: Scan files in the workspace
- analyze_security_activity: Analyze security vulnerabilities
- generate_sarif_report_activity: Generate SARIF report from findings
"""
import logging
import sys
from pathlib import Path
from temporalio import activity
# Configure logging
logger = logging.getLogger(__name__)
# Add toolbox to path for module imports
sys.path.insert(0, '/app/toolbox')
@activity.defn(name="scan_files")
async def scan_files_activity(workspace_path: str, config: dict) -> dict:
"""
Scan files in the workspace.
Args:
workspace_path: Path to the workspace directory
config: Scanner configuration
Returns:
Scanner results dictionary
"""
logger.info(f"Activity: scan_files (workspace={workspace_path})")
try:
from modules.scanner import FileScanner
workspace = Path(workspace_path)
if not workspace.exists():
raise FileNotFoundError(f"Workspace not found: {workspace_path}")
scanner = FileScanner()
result = await scanner.execute(config, workspace)
logger.info(
f"✓ File scanning completed: "
f"{result.summary.get('total_files', 0)} files scanned"
)
return result.dict()
except Exception as e:
logger.error(f"File scanning failed: {e}", exc_info=True)
raise
@activity.defn(name="analyze_security")
async def analyze_security_activity(workspace_path: str, config: dict) -> dict:
"""
Analyze security vulnerabilities in the workspace.
Args:
workspace_path: Path to the workspace directory
config: Analyzer configuration
Returns:
Analysis results dictionary
"""
logger.info(f"Activity: analyze_security (workspace={workspace_path})")
try:
from modules.analyzer import SecurityAnalyzer
workspace = Path(workspace_path)
if not workspace.exists():
raise FileNotFoundError(f"Workspace not found: {workspace_path}")
analyzer = SecurityAnalyzer()
result = await analyzer.execute(config, workspace)
logger.info(
f"✓ Security analysis completed: "
f"{result.summary.get('total_findings', 0)} findings"
)
return result.dict()
except Exception as e:
logger.error(f"Security analysis failed: {e}", exc_info=True)
raise
@activity.defn(name="generate_sarif_report")
async def generate_sarif_report_activity(
scan_results: dict,
analysis_results: dict,
config: dict,
workspace_path: str
) -> dict:
"""
Generate SARIF report from scan and analysis results.
Args:
scan_results: Results from file scanner
analysis_results: Results from security analyzer
config: Reporter configuration
workspace_path: Path to the workspace
Returns:
SARIF report dictionary
"""
logger.info("Activity: generate_sarif_report")
try:
from modules.reporter import SARIFReporter
workspace = Path(workspace_path)
# Combine findings from all modules
all_findings = []
# Add scanner findings (only sensitive files, not all files)
scanner_findings = scan_results.get("findings", [])
sensitive_findings = [f for f in scanner_findings if f.get("severity") != "info"]
all_findings.extend(sensitive_findings)
# Add analyzer findings
analyzer_findings = analysis_results.get("findings", [])
all_findings.extend(analyzer_findings)
# Prepare reporter config
reporter_config = {
**config,
"findings": all_findings,
"tool_name": "FuzzForge Security Assessment",
"tool_version": "1.0.0"
}
reporter = SARIFReporter()
result = await reporter.execute(reporter_config, workspace)
# Extract SARIF from result
sarif = result.dict().get("sarif", {})
logger.info(f"✓ SARIF report generated with {len(all_findings)} findings")
return sarif
except Exception as e:
logger.error(f"SARIF report generation failed: {e}", exc_info=True)
raise
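Review note: the finding-merging step inside `generate_sarif_report_activity` is plain dictionary handling, so it can be checked without Temporal or the `SARIFReporter` module. The sketch below is a minimal illustration, assuming findings are dicts with an optional `severity` key as the activity code suggests; the helper name `combine_findings` is hypothetical and not part of this PR.

```python
from typing import Any, Dict, List


def combine_findings(
    scan_results: Dict[str, Any],
    analysis_results: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Merge scanner and analyzer findings the way the activity does.

    Hypothetical helper for illustration; it mirrors the filtering logic
    in generate_sarif_report_activity.
    """
    # Scanner findings with severity "info" are dropped
    scanner_findings = scan_results.get("findings", [])
    sensitive = [f for f in scanner_findings if f.get("severity") != "info"]
    # Analyzer findings are kept unfiltered
    return sensitive + list(analysis_results.get("findings", []))


scan = {"findings": [{"id": "a", "severity": "info"}, {"id": "b", "severity": "high"}]}
analysis = {"findings": [{"id": "c", "severity": "medium"}]}
merged = combine_findings(scan, analysis)
print([f["id"] for f in merged])
```

One consequence worth noting: a scanner finding with no `severity` key at all passes the filter, because `f.get("severity")` returns `None`, which is not equal to `"info"`.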
@@ -1,8 +1,8 @@
name: security_assessment
version: "2.0.0"
vertical: rust
description: "Comprehensive security assessment workflow that scans files, analyzes code for vulnerabilities, and generates SARIF reports"
author: "FuzzForge Team"
category: "comprehensive"
tags:
- "security"
- "scanner"
@@ -11,28 +11,14 @@ tags:
- "sarif"
- "comprehensive"
supported_volume_modes:
- "ro"
- "rw"
default_volume_mode: "ro"
default_target_path: "/workspace"
requirements:
tools:
- "file_scanner"
- "security_analyzer"
- "sarif_reporter"
resources:
memory: "512Mi"
cpu: "500m"
timeout: 1800
has_docker: true
# Workspace isolation mode (system-level configuration)
# - "isolated" (default): Each workflow run gets its own isolated workspace (safe for concurrent fuzzing)
# - "shared": All runs share the same workspace (for read-only analysis workflows)
# - "copy-on-write": Download once, copy for each run (balances performance and isolation)
# Using "shared" mode for read-only security analysis (no file modifications)
workspace_isolation: "shared"
default_parameters:
target_path: "/workspace"
volume_mode: "ro"
scanner_config: {}
analyzer_config: {}
reporter_config: {}
@@ -40,15 +26,6 @@ default_parameters:
parameters:
type: object
properties:
target_path:
type: string
default: "/workspace"
description: "Path to analyze"
volume_mode:
type: string
enum: ["ro", "rw"]
default: "ro"
description: "Volume mount mode"
scanner_config:
type: object
description: "File scanner configuration"
@@ -1,4 +0,0 @@
# Requirements for security assessment workflow
pydantic>=2.0.0
pyyaml>=6.0
aiofiles>=23.0.0
@@ -1,5 +1,7 @@
"""
Security Assessment Workflow - Comprehensive security analysis using multiple modules
Security Assessment Workflow - Temporal Version
Comprehensive security analysis using multiple modules.
"""
# Copyright (c) 2025 FuzzingLabs
@@ -13,240 +15,219 @@ Security Assessment Workflow - Comprehensive security analysis using multiple mo
#
# Additional attribution and requirements are provided in the NOTICE file.
import sys
import logging
from pathlib import Path
from datetime import timedelta
from typing import Dict, Any, Optional
from prefect import flow, task
import json
# Add modules to path
sys.path.insert(0, '/app')
from temporalio import workflow
from temporalio.common import RetryPolicy
# Import modules
from toolbox.modules.scanner import FileScanner
from toolbox.modules.analyzer import SecurityAnalyzer
from toolbox.modules.reporter import SARIFReporter
# Import activity interfaces (will be executed by worker)
with workflow.unsafe.imports_passed_through():
import logging
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@task(name="file_scanning")
async def scan_files_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
@workflow.defn
class SecurityAssessmentWorkflow:
"""
Task to scan files in the workspace.
Args:
workspace: Path to the workspace
config: Scanner configuration
Returns:
Scanner results
"""
logger.info(f"Starting file scanning in {workspace}")
scanner = FileScanner()
result = await scanner.execute(config, workspace)
logger.info(f"File scanning completed: {result.summary.get('total_files', 0)} files found")
return result.dict()
@task(name="security_analysis")
async def analyze_security_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
"""
Task to analyze security vulnerabilities.
Args:
workspace: Path to the workspace
config: Analyzer configuration
Returns:
Analysis results
"""
logger.info("Starting security analysis")
analyzer = SecurityAnalyzer()
result = await analyzer.execute(config, workspace)
logger.info(
f"Security analysis completed: {result.summary.get('total_findings', 0)} findings"
)
return result.dict()
@task(name="report_generation")
async def generate_report_task(
scan_results: Dict[str, Any],
analysis_results: Dict[str, Any],
config: Dict[str, Any],
workspace: Path
) -> Dict[str, Any]:
"""
Task to generate SARIF report from all findings.
Args:
scan_results: Results from scanner
analysis_results: Results from analyzer
config: Reporter configuration
workspace: Path to the workspace
Returns:
SARIF report
"""
logger.info("Generating SARIF report")
reporter = SARIFReporter()
# Combine findings from all modules
all_findings = []
# Add scanner findings (only sensitive files, not all files)
scanner_findings = scan_results.get("findings", [])
sensitive_findings = [f for f in scanner_findings if f.get("severity") != "info"]
all_findings.extend(sensitive_findings)
# Add analyzer findings
analyzer_findings = analysis_results.get("findings", [])
all_findings.extend(analyzer_findings)
# Prepare reporter config
reporter_config = {
**config,
"findings": all_findings,
"tool_name": "FuzzForge Security Assessment",
"tool_version": "1.0.0"
}
result = await reporter.execute(reporter_config, workspace)
# Extract SARIF from result
sarif = result.dict().get("sarif", {})
logger.info(f"Report generated with {len(all_findings)} total findings")
return sarif
@flow(name="security_assessment", log_prints=True)
async def main_flow(
target_path: str = "/workspace",
volume_mode: str = "ro",
scanner_config: Optional[Dict[str, Any]] = None,
analyzer_config: Optional[Dict[str, Any]] = None,
reporter_config: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
"""
Main security assessment workflow.
Comprehensive security assessment workflow.
This workflow:
1. Scans files in the workspace
2. Analyzes code for security vulnerabilities
3. Generates a SARIF report with all findings
Args:
target_path: Path to the mounted workspace (default: /workspace)
volume_mode: Volume mount mode (ro/rw)
scanner_config: Configuration for file scanner
analyzer_config: Configuration for security analyzer
reporter_config: Configuration for SARIF reporter
Returns:
SARIF-formatted findings report
1. Downloads target from MinIO
2. Scans files in the workspace
3. Analyzes code for security vulnerabilities
4. Generates a SARIF report with all findings
5. Uploads results to MinIO
6. Cleans up cache
"""
logger.info(f"Starting security assessment workflow")
logger.info(f"Workspace: {target_path}, Mode: {volume_mode}")
# Set workspace path
workspace = Path(target_path)
@workflow.run
async def run(
self,
target_id: str,
scanner_config: Optional[Dict[str, Any]] = None,
analyzer_config: Optional[Dict[str, Any]] = None,
reporter_config: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
"""
Main workflow execution.
if not workspace.exists():
logger.error(f"Workspace does not exist: {workspace}")
return {
"error": f"Workspace not found: {workspace}",
"sarif": None
}
Args:
target_id: UUID of the uploaded target in MinIO
scanner_config: Configuration for file scanner
analyzer_config: Configuration for security analyzer
reporter_config: Configuration for SARIF reporter
# Default configurations
if not scanner_config:
scanner_config = {
"patterns": ["*"],
"check_sensitive": True,
"calculate_hashes": False,
"max_file_size": 10485760 # 10MB
}
Returns:
Dictionary containing SARIF report and summary
"""
workflow_id = workflow.info().workflow_id
if not analyzer_config:
analyzer_config = {
"file_extensions": [".py", ".js", ".java", ".php", ".rb", ".go"],
"check_secrets": True,
"check_sql": True,
"check_dangerous_functions": True
}
if not reporter_config:
reporter_config = {
"include_code_flows": False
}
try:
# Execute workflow tasks
logger.info("Phase 1: File scanning")
scan_results = await scan_files_task(workspace, scanner_config)
logger.info("Phase 2: Security analysis")
analysis_results = await analyze_security_task(workspace, analyzer_config)
logger.info("Phase 3: Report generation")
sarif_report = await generate_report_task(
scan_results,
analysis_results,
reporter_config,
workspace
workflow.logger.info(
f"Starting SecurityAssessmentWorkflow "
f"(workflow_id={workflow_id}, target_id={target_id})"
)
# Log summary
if sarif_report and "runs" in sarif_report:
results_count = len(sarif_report["runs"][0].get("results", []))
logger.info(f"Workflow completed successfully with {results_count} findings")
else:
logger.info("Workflow completed successfully")
# Default configurations
if not scanner_config:
scanner_config = {
"patterns": ["*"],
"check_sensitive": True,
"calculate_hashes": False,
"max_file_size": 10485760 # 10MB
}
return sarif_report
if not analyzer_config:
analyzer_config = {
"file_extensions": [".py", ".js", ".java", ".php", ".rb", ".go"],
"check_secrets": True,
"check_sql": True,
"check_dangerous_functions": True
}
except Exception as e:
logger.error(f"Workflow failed: {e}")
# Return error in SARIF format
return {
"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
"version": "2.1.0",
"runs": [
{
"tool": {
"driver": {
"name": "FuzzForge Security Assessment",
"version": "1.0.0"
}
},
"results": [],
"invocations": [
{
"executionSuccessful": False,
"exitCode": 1,
"exitCodeDescription": str(e)
}
]
}
]
if not reporter_config:
reporter_config = {
"include_code_flows": False
}
results = {
"workflow_id": workflow_id,
"target_id": target_id,
"status": "running",
"steps": []
}
try:
# Get run ID for workspace isolation (using shared mode for read-only analysis)
run_id = workflow.info().run_id
if __name__ == "__main__":
# For local testing
import asyncio
# Step 1: Download target from MinIO
workflow.logger.info("Step 1: Downloading target from MinIO")
target_path = await workflow.execute_activity(
"get_target",
args=[target_id, run_id, "shared"], # target_id, run_id, workspace_isolation
start_to_close_timeout=timedelta(minutes=5),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=1),
maximum_interval=timedelta(seconds=30),
maximum_attempts=3
)
)
results["steps"].append({
"step": "download_target",
"status": "success",
"target_path": target_path
})
workflow.logger.info(f"✓ Target downloaded to: {target_path}")
asyncio.run(main_flow(
target_path="/tmp/test",
scanner_config={"patterns": ["*.py"]},
analyzer_config={"check_secrets": True}
))
# Step 2: File scanning
workflow.logger.info("Step 2: Scanning files")
scan_results = await workflow.execute_activity(
"scan_files",
args=[target_path, scanner_config],
start_to_close_timeout=timedelta(minutes=10),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=2),
maximum_interval=timedelta(seconds=60),
maximum_attempts=2
)
)
results["steps"].append({
"step": "file_scanning",
"status": "success",
"files_scanned": scan_results.get("summary", {}).get("total_files", 0)
})
workflow.logger.info(
f"✓ File scanning completed: "
f"{scan_results.get('summary', {}).get('total_files', 0)} files"
)
# Step 3: Security analysis
workflow.logger.info("Step 3: Analyzing security vulnerabilities")
analysis_results = await workflow.execute_activity(
"analyze_security",
args=[target_path, analyzer_config],
start_to_close_timeout=timedelta(minutes=15),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=2),
maximum_interval=timedelta(seconds=60),
maximum_attempts=2
)
)
results["steps"].append({
"step": "security_analysis",
"status": "success",
"findings": analysis_results.get("summary", {}).get("total_findings", 0)
})
workflow.logger.info(
f"✓ Security analysis completed: "
f"{analysis_results.get('summary', {}).get('total_findings', 0)} findings"
)
# Step 4: Generate SARIF report
workflow.logger.info("Step 4: Generating SARIF report")
sarif_report = await workflow.execute_activity(
"generate_sarif_report",
args=[scan_results, analysis_results, reporter_config, target_path],
start_to_close_timeout=timedelta(minutes=5)
)
results["steps"].append({
"step": "report_generation",
"status": "success"
})
# Count total findings in SARIF
total_findings = 0
if sarif_report and "runs" in sarif_report:
total_findings = len(sarif_report["runs"][0].get("results", []))
workflow.logger.info(f"✓ SARIF report generated with {total_findings} findings")
# Step 5: Upload results to MinIO
workflow.logger.info("Step 5: Uploading results")
try:
results_url = await workflow.execute_activity(
"upload_results",
args=[workflow_id, sarif_report, "sarif"],
start_to_close_timeout=timedelta(minutes=2)
)
results["results_url"] = results_url
workflow.logger.info(f"✓ Results uploaded to: {results_url}")
except Exception as e:
workflow.logger.warning(f"Failed to upload results: {e}")
results["results_url"] = None
# Step 6: Cleanup cache
workflow.logger.info("Step 6: Cleaning up cache")
try:
await workflow.execute_activity(
"cleanup_cache",
args=[target_path, "shared"], # target_path, workspace_isolation
start_to_close_timeout=timedelta(minutes=1)
)
workflow.logger.info("✓ Cache cleaned up (skipped for shared mode)")
except Exception as e:
workflow.logger.warning(f"Cache cleanup failed: {e}")
# Mark workflow as successful
results["status"] = "success"
results["sarif"] = sarif_report
results["summary"] = {
"total_findings": total_findings,
"files_scanned": scan_results.get("summary", {}).get("total_files", 0)
}
workflow.logger.info(f"✓ Workflow completed successfully: {workflow_id}")
return results
except Exception as e:
workflow.logger.error(f"Workflow failed: {e}")
results["status"] = "error"
results["error"] = str(e)
results["steps"].append({
"step": "error",
"status": "failed",
"error": str(e)
})
raise
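Review note: each `RetryPolicy` in the workflow sets `initial_interval`, `maximum_interval`, and `maximum_attempts` but leaves the backoff coefficient unset, so Temporal's default of 2.0 applies. The sketch below works out the resulting wait times between attempts in plain Python. `retry_waits` is a hypothetical helper for illustration and does not call the Temporal SDK.

```python
from datetime import timedelta
from typing import List


def retry_waits(
    initial_interval: timedelta,
    maximum_interval: timedelta,
    maximum_attempts: int,
    backoff_coefficient: float = 2.0,  # Temporal's documented default
) -> List[timedelta]:
    """Return the wait before each retry (attempts 2..maximum_attempts)."""
    waits = []
    interval = initial_interval
    for _ in range(maximum_attempts - 1):
        # Each wait is capped at maximum_interval
        waits.append(min(interval, maximum_interval))
        interval = interval * backoff_coefficient
    return waits


# Step 1 (get_target): 3 attempts, starting at 1s, capped at 30s
print(retry_waits(timedelta(seconds=1), timedelta(seconds=30), 3))
# Steps 2 and 3 (scan_files, analyze_security): 2 attempts, starting at 2s
print(retry_waits(timedelta(seconds=2), timedelta(seconds=60), 2))
```

Under these assumptions the download is retried after 1 s and then 2 s, and the scan and analysis steps are each retried once after 2 s, so the `maximum_interval` caps never come into play. Steps 4 to 6 pass no `retry_policy`, which means they fall back to Temporal's default activity retry policy rather than disabling retries.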
+310 -926
File diff suppressed because it is too large
+67 -2
@@ -153,10 +153,10 @@ fuzzforge workflows parameters security_assessment --no-interactive
### Workflow Execution
#### `fuzzforge workflow <workflow> <target-path>`
Execute a security testing workflow.
Execute a security testing workflow with **automatic file upload**.
```bash
# Basic execution
# Basic execution - CLI automatically detects local files and uploads them
fuzzforge workflow security_assessment /path/to/code
# With parameters
@@ -172,6 +172,49 @@ fuzzforge workflow security_assessment /path/to/code \
fuzzforge workflow security_assessment /path/to/code --wait
```
**Automatic File Upload Behavior:**
The CLI handles the target differently depending on whether the path exists locally:
1. **Local file/directory exists** → **Automatic upload to MinIO**:
- CLI creates a compressed tarball (`.tar.gz`) for directories
- Uploads via HTTP to backend API
- Backend stores in MinIO with unique `target_id`
- Worker downloads from MinIO when ready to analyze
- **Works from any machine** (no shared filesystem needed)
2. **Path doesn't exist locally** → **Path-based submission** (legacy):
- Path is sent to backend as-is
- Backend expects target to be accessible on its filesystem
- ⚠️ Only works when CLI and backend share filesystem
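The two cases above can be sketched roughly as follows. This is a minimal illustration, not the CLI's actual code: the function names `prepare_target` and `make_tarball` are hypothetical, and the HTTP upload to the backend API is left out.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import Tuple


def make_tarball(directory: Path, dest_dir: Path) -> Path:
    """Pack a directory into <name>.tar.gz inside dest_dir."""
    archive = dest_dir / f"{directory.name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(directory, arcname=directory.name)
    return archive


def prepare_target(target: str, work_dir: Path) -> Tuple[str, Path]:
    """Decide how a target would be submitted (hypothetical helper).

    Returns ("upload", file_to_upload) when the path exists locally,
    otherwise ("path", target) for legacy path-based submission.
    """
    path = Path(target)
    if path.is_dir():
        # Directories are compressed before upload
        return "upload", make_tarball(path.resolve(), work_dir)
    if path.is_file():
        # Single files are uploaded as-is
        return "upload", path
    # Nothing local: send the path and let the backend resolve it
    return "path", path


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "my-project"
    project.mkdir()
    (project / "main.py").write_text("print('hi')\n")
    out_dir = Path(tmp) / "out"
    out_dir.mkdir()
    mode, artifact = prepare_target(str(project), out_dir)
    print(mode, artifact.name)
```

Writing the archive to a separate `work_dir` keeps the tarball out of the directory being packed, which avoids the archive trying to include itself.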
**Example workflow:**
```bash
$ ff workflow security_assessment ./my-project
🔧 Getting workflow information for: security_assessment
📦 Detected local directory: ./my-project (21 files)
🗜️ Creating compressed tarball...
📤 Uploading to backend (0.01 MB)...
✅ Upload complete! Target ID: 548193a1-f73f-4ec1-8068-19ec2660b8e4
🎯 Executing workflow:
Workflow: security_assessment
Target: my-project.tar.gz (uploaded)
Volume Mode: ro
Status: 🔄 RUNNING
✅ Workflow started successfully!
Execution ID: security_assessment-52781925
```
**Upload Details:**
- **Max file size**: 10 GB (configurable on backend)
- **Compression**: Automatic for directories (reduces upload time)
- **Storage**: Files stored in MinIO (S3-compatible)
- **Lifecycle**: Automatic cleanup after 7 days
- **Caching**: Workers cache downloaded targets for faster repeated workflows
**Options:**
- `--param, -p` - Parameter in key=value format (can be used multiple times)
- `--param-file, -f` - JSON file containing parameters
@@ -181,6 +224,22 @@ fuzzforge workflow security_assessment /path/to/code --wait
- `--wait, -w` - Wait for execution to complete
- `--live, -l` - Show live monitoring during execution
**Worker Lifecycle Options (v0.7.0):**
- `--auto-start/--no-auto-start` - Auto-start required worker (default: from config)
- `--auto-stop/--no-auto-stop` - Auto-stop worker after completion (default: from config)
**Examples:**
```bash
# Worker starts automatically (default behavior)
fuzzforge workflow ossfuzz_campaign . project_name=zlib
# Disable auto-start (worker must be running already)
fuzzforge workflow ossfuzz_campaign . --no-auto-start
# Auto-stop worker after completion
fuzzforge workflow ossfuzz_campaign . --wait --auto-stop
```
#### `fuzzforge workflow status [execution-id]`
Check the status of a workflow execution.
@@ -402,6 +461,12 @@ preferences:
show_progress_bars: true
table_style: "rich"
color_output: true
workers:
auto_start_workers: true # Auto-start workers when needed
auto_stop_workers: false # Auto-stop workers after completion
worker_startup_timeout: 60 # Worker startup timeout (seconds)
docker_compose_file: null # Custom docker-compose.yml path
```
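The `workers` settings above act as defaults for the `--auto-start/--no-auto-start` and `--auto-stop/--no-auto-stop` flags. A minimal sketch of that precedence, assuming an omitted flag is represented as `None`; `resolve_worker_options` and `WORKER_DEFAULTS` are hypothetical names, not the CLI's actual implementation.

```python
from typing import Dict, Optional

# Defaults as listed in the `workers:` section of the config
WORKER_DEFAULTS: Dict[str, bool] = {
    "auto_start_workers": True,
    "auto_stop_workers": False,
}


def resolve_worker_options(
    config: Dict[str, bool],
    auto_start: Optional[bool] = None,
    auto_stop: Optional[bool] = None,
) -> Dict[str, bool]:
    """Prefer an explicit CLI flag; otherwise fall back to the config value."""
    merged = {**WORKER_DEFAULTS, **config}
    return {
        "auto_start": merged["auto_start_workers"] if auto_start is None else auto_start,
        "auto_stop": merged["auto_stop_workers"] if auto_stop is None else auto_stop,
    }


# No flags given: config defaults apply
print(resolve_worker_options({}))
# --no-auto-start overrides the config
print(resolve_worker_options({}, auto_start=False))
# Config enables auto-stop, no flag given
print(resolve_worker_options({"auto_stop_workers": True}))
```

Comparing against `None` rather than testing truthiness matters here: `--no-auto-start` yields `False`, which must still win over a config value of `True`.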
## 🔧 Advanced Usage
+2 -2
@@ -207,7 +207,7 @@ def install_zsh_completion():
# Add fpath to .zshrc if not present
zshrc = Path.home() / ".zshrc"
fpath_line = f'fpath=(~/.zsh/completions $fpath)'
fpath_line = 'fpath=(~/.zsh/completions $fpath)'
autoload_line = 'autoload -U compinit && compinit'
if zshrc.exists():
@@ -222,7 +222,7 @@ def install_zsh_completion():
if lines_to_add:
with zshrc.open("a") as f:
f.write(f"\n# FuzzForge CLI completion\n")
f.write("\n# FuzzForge CLI completion\n")
for line in lines_to_add:
f.write(f"{line}\n")
print("✅ Added completion setup to ~/.zshrc")
-1
@@ -15,7 +15,6 @@ This module provides the main entry point for the FuzzForge CLI application.
# Additional attribution and requirements are provided in the NOTICE file.
import typer
from src.fuzzforge_cli.main import app
if __name__ == "__main__":
+2 -3
@@ -14,10 +14,10 @@ API response validation and graceful degradation utilities.
import logging
from typing import Any, Dict, List, Optional, Union
from typing import Any, Dict, List, Optional
from pydantic import BaseModel, ValidationError as PydanticValidationError
from .exceptions import ValidationError, APIConnectionError
from .exceptions import ValidationError
logger = logging.getLogger(__name__)
@@ -29,7 +29,6 @@ class WorkflowMetadata(BaseModel):
author: Optional[str] = None
description: Optional[str] = None
parameters: Dict[str, Any] = {}
supported_volume_modes: List[str] = ["ro", "rw"]
class RunStatus(BaseModel):
-4
@@ -15,15 +15,11 @@ from __future__ import annotations
import asyncio
import os
from datetime import datetime
from typing import Optional
import typer
from rich.console import Console
from rich.panel import Panel
from rich.table import Table
from ..config import ProjectConfigManager
console = Console()
app = typer.Typer(name="ai", help="Interact with the FuzzForge AI system")
+2 -5
@@ -18,13 +18,11 @@ from pathlib import Path
from rich.console import Console
from rich.table import Table
from rich.panel import Panel
from rich.prompt import Prompt, Confirm
from rich.prompt import Confirm
from rich import box
from typing import Optional
from ..config import (
get_project_config,
ensure_project_config,
get_global_config,
save_global_config,
FuzzForgeConfig
@@ -335,7 +333,6 @@ def edit_config(
"""
📝 Open configuration file in default editor
"""
import os
import subprocess
if global_config:
@@ -369,7 +366,7 @@ def edit_config(
try:
console.print(f"📝 Opening {config_type} configuration in {editor}...")
subprocess.run([editor, str(config_path)], check=True)
console.print(f"✅ Configuration file edited", style="green")
console.print("✅ Configuration file edited", style="green")
except subprocess.CalledProcessError as e:
console.print(f"❌ Failed to open editor: {e}", style="red")
+10 -11
@@ -21,18 +21,17 @@ from typing import Optional, Dict, Any, List
import typer
from rich.console import Console
from rich.table import Table, Column
from rich.table import Table
from rich.panel import Panel
from rich.syntax import Syntax
from rich.tree import Tree
from rich.text import Text
from rich import box
from ..config import get_project_config, FuzzForgeConfig
from ..database import get_project_db, ensure_project_db, FindingRecord
from ..exceptions import (
handle_error, retry_on_network_error, validate_run_id,
require_project, ValidationError, DatabaseError
retry_on_network_error, validate_run_id,
require_project, ValidationError
)
from fuzzforge_sdk import FuzzForgeClient
@@ -159,7 +158,7 @@ def display_findings_table(sarif_data: Dict[str, Any]):
driver = tool.get("driver", {})
# Tool information
console.print(f"\n🔍 [bold]Security Analysis Results[/bold]")
console.print("\n🔍 [bold]Security Analysis Results[/bold]")
if driver.get("name"):
console.print(f"Tool: {driver.get('name')} v{driver.get('version', 'unknown')}")
@@ -241,7 +240,7 @@ def display_findings_table(sarif_data: Dict[str, Any]):
location_text
)
console.print(f"\n📋 [bold]Detailed Results[/bold]")
console.print("\n📋 [bold]Detailed Results[/bold]")
if len(results) > 50:
console.print(f"Showing first 50 of {len(results)} results")
console.print()
@@ -297,7 +296,7 @@ def findings_history(
console.print(f"\n📚 [bold]Findings History ({len(findings)})[/bold]\n")
console.print(table)
console.print(f"\n💡 Use [bold cyan]fuzzforge finding <run-id>[/bold cyan] to view detailed findings")
console.print("\n💡 Use [bold cyan]fuzzforge finding <run-id>[/bold cyan] to view detailed findings")
except Exception as e:
console.print(f"❌ Failed to get findings history: {e}", style="red")
@@ -710,10 +709,10 @@ def all_findings(
if show_findings:
display_detailed_findings(findings, max_findings)
console.print(f"\n💡 Use filters to refine results: --workflow, --severity, --since")
console.print(f"💡 Show findings content: --show-findings")
console.print(f"💡 Export findings: --export json --output report.json")
console.print(f"💡 View specific findings: [bold cyan]fuzzforge finding <run-id>[/bold cyan]")
console.print("\n💡 Use filters to refine results: --workflow, --severity, --since")
console.print("💡 Show findings content: --show-findings")
console.print("💡 Export findings: --export json --output report.json")
console.print("💡 View specific findings: [bold cyan]fuzzforge finding <run-id>[/bold cyan]")
except Exception as e:
console.print(f"❌ Failed to get all findings: {e}", style="red")
+1 -1
@@ -164,7 +164,7 @@ fuzzforge finding <run-id>
console.print("📚 Created README.md")
console.print("\n✅ FuzzForge project initialized successfully!", style="green")
console.print(f"\n🎯 Next steps:")
console.print("\n🎯 Next steps:")
console.print(" • ff workflows - See available workflows")
console.print(" • ff status - Check API connectivity")
console.print(" • ff workflow <workflow> <path> - Start your first analysis")
+87 -95
@@ -13,23 +13,18 @@ Real-time monitoring and statistics commands.
# Additional attribution and requirements are provided in the NOTICE file.
import asyncio
import time
from datetime import datetime, timedelta
from typing import Optional
from datetime import datetime
import typer
from rich.console import Console
from rich.table import Table
from rich.panel import Panel
from rich.live import Live
from rich.layout import Layout
from rich.progress import Progress, BarColumn, TextColumn, SpinnerColumn
from rich.align import Align
from rich import box
from ..config import get_project_config, FuzzForgeConfig
from ..database import get_project_db, ensure_project_db, CrashRecord
from ..database import ensure_project_db, CrashRecord
from fuzzforge_sdk import FuzzForgeClient
console = Console()
@@ -93,9 +88,21 @@ def fuzzing_stats(
with Live(auto_refresh=False, console=console) as live:
while True:
try:
# Check workflow status
run_status = client.get_run_status(run_id)
stats = client.get_fuzzing_stats(run_id)
table = create_stats_table(stats)
live.update(table, refresh=True)
# Exit if workflow completed or failed
if getattr(run_status, 'is_completed', False) or getattr(run_status, 'is_failed', False):
final_status = getattr(run_status, 'status', 'Unknown')
if getattr(run_status, 'is_completed', False):
console.print("\n✅ [bold green]Workflow completed[/bold green]", style="green")
else:
console.print(f"\n⚠️ [bold yellow]Workflow ended[/bold yellow] | Status: {final_status}", style="yellow")
break
time.sleep(refresh)
except KeyboardInterrupt:
console.print("\n📊 Monitoring stopped", style="yellow")
@@ -124,8 +131,8 @@ def create_stats_table(stats) -> Panel:
stats_table.add_row("Total Crashes", format_number(stats.crashes))
stats_table.add_row("Unique Crashes", format_number(stats.unique_crashes))
if stats.coverage is not None:
stats_table.add_row("Code Coverage", f"{stats.coverage:.1f}%")
if stats.coverage is not None and stats.coverage > 0:
stats_table.add_row("Code Coverage", f"{stats.coverage} edges")
stats_table.add_row("Corpus Size", format_number(stats.corpus_size))
stats_table.add_row("Elapsed Time", format_duration(stats.elapsed_time))
@@ -206,7 +213,7 @@ def crash_reports(
console.print(
Panel.fit(
summary_table,
title=f"🐛 Crash Summary",
title="🐛 Crash Summary",
box=box.ROUNDED
)
)
@@ -246,7 +253,7 @@ def crash_reports(
input_display
)
console.print(f"\n🐛 [bold]Crash Details[/bold]")
console.print("\n🐛 [bold]Crash Details[/bold]")
if len(crashes) > limit:
console.print(f"Showing first {limit} of {len(crashes)} crashes")
console.print()
@@ -260,78 +267,70 @@ def crash_reports(
def _live_monitor(run_id: str, refresh: int):
"""Helper for live monitoring to allow for cleaner exit handling"""
"""Helper for live monitoring with inline real-time display"""
with get_client() as client:
start_time = time.time()
def render_layout(run_status, stats):
layout = Layout()
layout.split_column(
Layout(name="header", size=3),
Layout(name="main", ratio=1),
Layout(name="footer", size=3)
)
layout["main"].split_row(
Layout(name="stats", ratio=1),
Layout(name="progress", ratio=1)
)
header = Panel(
f"[bold]FuzzForge Live Monitor[/bold]\n"
f"Run: {run_id[:12]}... | Status: {run_status.status} | "
f"Uptime: {format_duration(int(time.time() - start_time))}",
box=box.ROUNDED,
style="cyan"
)
layout["header"].update(header)
layout["stats"].update(create_stats_table(stats))
def render_inline_stats(run_status, stats):
"""Render inline stats display (non-dashboard)"""
lines = []
progress_table = Table(show_header=False, box=box.SIMPLE)
progress_table.add_column("Metric", style="bold")
progress_table.add_column("Progress")
if stats.executions > 0:
exec_rate_percent = min(100, (stats.executions_per_sec / 1000) * 100)
progress_table.add_row("Exec Rate", create_progress_bar(exec_rate_percent, "green"))
crash_rate = (stats.crashes / stats.executions) * 100000
crash_rate_percent = min(100, crash_rate * 10)
progress_table.add_row("Crash Rate", create_progress_bar(crash_rate_percent, "red"))
if stats.coverage is not None:
progress_table.add_row("Coverage", create_progress_bar(stats.coverage, "blue"))
layout["progress"].update(Panel.fit(progress_table, title="📊 Progress Indicators", box=box.ROUNDED))
# Header line
workflow_name = getattr(stats, 'workflow', 'unknown')
status_emoji = "🔄" if not getattr(run_status, 'is_completed', False) else ""
status_color = "yellow" if not getattr(run_status, 'is_completed', False) else "green"
footer = Panel(
f"Last updated: {datetime.now().strftime('%H:%M:%S')} | "
f"Refresh interval: {refresh}s | Press Ctrl+C to exit",
box=box.ROUNDED,
style="dim"
)
layout["footer"].update(footer)
return layout
lines.append(f"\n[bold cyan]📊 Live Fuzzing Monitor[/bold cyan] - {workflow_name} (Run: {run_id[:12]}...)\n")
with Live(auto_refresh=False, console=console, screen=True) as live:
# Stats lines with emojis
lines.append(f" [bold]⚡ Executions[/bold] {format_number(stats.executions):>8} [dim]({stats.executions_per_sec:,.1f}/sec)[/dim]")
lines.append(f" [bold]💥 Crashes[/bold] {stats.crashes:>8} [dim](unique: {stats.unique_crashes})[/dim]")
lines.append(f" [bold]📦 Corpus[/bold] {stats.corpus_size:>8} inputs")
if stats.coverage is not None and stats.coverage > 0:
lines.append(f" [bold]📈 Coverage[/bold] {stats.coverage:>8} edges")
lines.append(f" [bold]⏱️ Elapsed[/bold] {format_duration(stats.elapsed_time):>8}")
# Last crash info
if stats.last_crash_time:
time_since = datetime.now() - stats.last_crash_time
crash_ago = format_duration(int(time_since.total_seconds()))
lines.append(f" [bold red]🐛 Last Crash[/bold red] {crash_ago:>8} ago")
# Status line
status_text = getattr(run_status, 'status', 'Unknown')
current_time = datetime.now().strftime('%H:%M:%S')
lines.append(f"\n[{status_color}]{status_emoji} Status: {status_text}[/{status_color}] | Last update: [dim]{current_time}[/dim] | Refresh: {refresh}s | [dim]Press Ctrl+C to stop[/dim]")
return "\n".join(lines)
# Fallback stats class
class FallbackStats:
def __init__(self, run_id):
self.run_id = run_id
self.workflow = "unknown"
self.executions = 0
self.executions_per_sec = 0.0
self.crashes = 0
self.unique_crashes = 0
self.coverage = None
self.corpus_size = 0
self.elapsed_time = 0
self.last_crash_time = None
with Live(auto_refresh=False, console=console) as live:
# Initial fetch
try:
run_status = client.get_run_status(run_id)
stats = client.get_fuzzing_stats(run_id)
except Exception:
# Minimal fallback stats
class FallbackStats:
def __init__(self, run_id):
self.run_id = run_id
self.workflow = "unknown"
self.executions = 0
self.executions_per_sec = 0.0
self.crashes = 0
self.unique_crashes = 0
self.coverage = None
self.corpus_size = 0
self.elapsed_time = 0
self.last_crash_time = None
stats = FallbackStats(run_id)
run_status = type("RS", (), {"status":"Unknown","is_completed":False,"is_failed":False})()
live.update(render_layout(run_status, stats), refresh=True)
live.update(render_inline_stats(run_status, stats), refresh=True)
# Simple polling approach that actually works
# Polling loop
consecutive_errors = 0
max_errors = 5
@@ -344,7 +343,7 @@ def _live_monitor(run_id: str, refresh: int):
except Exception as e:
consecutive_errors += 1
if consecutive_errors >= max_errors:
console.print(f"❌ Too many errors getting run status: {e}", style="red")
console.print(f"\n❌ Too many errors getting run status: {e}", style="red")
break
time.sleep(refresh)
continue
@@ -352,18 +351,14 @@ def _live_monitor(run_id: str, refresh: int):
# Try to get fuzzing stats
try:
stats = client.get_fuzzing_stats(run_id)
except Exception as e:
# Create fallback stats if not available
except Exception:
stats = FallbackStats(run_id)
# Update display
live.update(render_layout(run_status, stats), refresh=True)
live.update(render_inline_stats(run_status, stats), refresh=True)
# Check if completed
if getattr(run_status, 'is_completed', False) or getattr(run_status, 'is_failed', False):
# Show final state for a few seconds
console.print("\n🏁 Run completed. Showing final state for 10 seconds...")
time.sleep(10)
break
# Wait before next poll
@@ -372,17 +367,17 @@ def _live_monitor(run_id: str, refresh: int):
except KeyboardInterrupt:
raise
except Exception as e:
console.print(f"⚠️ Monitoring error: {e}", style="yellow")
console.print(f"\n⚠️ Monitoring error: {e}", style="yellow")
time.sleep(refresh)
# Completed status update
final_message = (
f"[bold]FuzzForge Live Monitor - COMPLETED[/bold]\n"
f"Run: {run_id[:12]}... | Status: {run_status.status} | "
f"Total runtime: {format_duration(int(time.time() - start_time))}"
)
style = "green" if getattr(run_status, 'is_completed', False) else "red"
live.update(Panel(final_message, box=box.ROUNDED, style=style), refresh=True)
# Final status
final_status = getattr(run_status, 'status', 'Unknown')
total_time = format_duration(int(time.time() - start_time))
if getattr(run_status, 'is_completed', False):
console.print(f"\n✅ [bold green]Run completed successfully[/bold green] | Total runtime: {total_time}")
else:
console.print(f"\n⚠️ [bold yellow]Run ended[/bold yellow] | Status: {final_status} | Total runtime: {total_time}")
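Review note: the polling loop in `_live_monitor` gives up after `max_errors` consecutive failures. A minimal standalone sketch of that pattern follows. `poll_until_done` and `make_fetch` are hypothetical names, and resetting the counter after a successful fetch is an assumption, since that part of the function is not visible in these hunks.

```python
import time
from typing import Any, Callable, Iterable, Optional


def poll_until_done(
    fetch: Callable[[], Any],
    is_done: Callable[[Any], bool],
    refresh: float = 0.0,
    max_errors: int = 5,
) -> Optional[Any]:
    """Poll fetch() until is_done(status); return None after too many errors in a row."""
    consecutive_errors = 0
    while True:
        try:
            status = fetch()
        except Exception:
            consecutive_errors += 1
            if consecutive_errors >= max_errors:
                return None  # give up, as the monitor does when it breaks out
            time.sleep(refresh)
            continue
        consecutive_errors = 0  # assumption: a good poll resets the error budget
        if is_done(status):
            return status
        time.sleep(refresh)


def make_fetch(outcomes: Iterable[Any]) -> Callable[[], Any]:
    """Hypothetical test double: yields statuses, raising any Exception items."""
    it = iter(outcomes)

    def fetch() -> Any:
        item = next(it)
        if isinstance(item, Exception):
            raise item
        return item

    return fetch


flaky = make_fetch([OSError("timeout"), "RUNNING", "COMPLETED"])
print(poll_until_done(flaky, lambda s: s == "COMPLETED"))  # COMPLETED
```

With `refresh=0.0` the sketch runs instantly; the real command sleeps for the `--refresh` interval between polls.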
@app.command("live")
@@ -390,21 +385,18 @@ def live_monitor(
run_id: str = typer.Argument(..., help="Run ID to monitor live"),
refresh: int = typer.Option(
2, "--refresh", "-r",
help="Refresh interval in seconds (fallback when streaming unavailable)"
help="Refresh interval in seconds"
)
):
"""
📺 Real-time monitoring dashboard with live updates (WebSocket/SSE with REST fallback)
📺 Real-time inline monitoring with live statistics updates
"""
console.print(f"📺 [bold]Live Monitoring Dashboard[/bold]")
console.print(f"Run: {run_id}")
console.print(f"Press Ctrl+C to stop monitoring\n")
try:
_live_monitor(run_id, refresh)
except KeyboardInterrupt:
console.print("\n📊 Monitoring stopped by user.", style="yellow")
console.print("\n\n📊 Monitoring stopped by user.", style="yellow")
except Exception as e:
console.print(f"❌ Failed to start live monitoring: {e}", style="red")
console.print(f"\n❌ Failed to start live monitoring: {e}", style="red")
raise typer.Exit(1)
@@ -426,11 +418,11 @@ def monitor_callback(ctx: typer.Context):
# Let the subcommand handle it
return
# Show not implemented message for default command
# Show help message for default command
from rich.console import Console
console = Console()
console.print("🚧 [yellow]Monitor command is not fully implemented yet.[/yellow]")
console.print("Please use specific subcommands:")
console.print("📊 [bold cyan]Monitor Command[/bold cyan]")
console.print("\nAvailable subcommands:")
console.print(" • [cyan]ff monitor stats <run-id>[/cyan] - Show execution statistics")
console.print(" • [cyan]ff monitor crashes <run-id>[/cyan] - Show crash reports")
console.print(" • [cyan]ff monitor live <run-id>[/cyan] - Live monitoring dashboard")
console.print(" • [cyan]ff monitor live <run-id>[/cyan] - Real-time inline monitoring")
+1 -1
@@ -115,7 +115,7 @@ def show_status():
api_table.add_column("Property", style="bold cyan")
api_table.add_column("Value")
api_table.add_row("Status", f"✅ Connected")
api_table.add_row("Status", "✅ Connected")
api_table.add_row("Service", f"{api_status.name} v{api_status.version}")
api_table.add_row("Workflows", str(len(workflows)))
+250 -59
@@ -24,27 +24,25 @@ import typer
from rich.console import Console
from rich.table import Table
from rich.panel import Panel
from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, TaskProgressColumn
from rich.prompt import Prompt, Confirm
from rich.live import Live
from rich import box
from ..config import get_project_config, FuzzForgeConfig
from ..database import get_project_db, ensure_project_db, RunRecord
from ..exceptions import (
handle_error, retry_on_network_error, safe_json_load, require_project,
APIConnectionError, ValidationError, DatabaseError, FileOperationError
ValidationError, DatabaseError
)
from ..validation import (
validate_run_id, validate_workflow_name, validate_target_path,
validate_volume_mode, validate_parameters, validate_timeout
validate_parameters, validate_timeout
)
from ..progress import progress_manager, spinner, step_progress
from ..completion import WorkflowNameComplete, TargetPathComplete, VolumeModetComplete
from ..progress import step_progress
from ..constants import (
STATUS_EMOJIS, MAX_RUN_ID_DISPLAY_LENGTH, DEFAULT_VOLUME_MODE,
PROGRESS_STEP_DELAYS, MAX_RETRIES, RETRY_DELAY, POLL_INTERVAL
)
from ..worker_manager import WorkerManager
from fuzzforge_sdk import FuzzForgeClient, WorkflowSubmission
console = Console()
@@ -63,6 +61,47 @@ def status_emoji(status: str) -> str:
return STATUS_EMOJIS.get(status.lower(), STATUS_EMOJIS["unknown"])
def should_fail_build(sarif_data: Dict[str, Any], fail_on: str) -> bool:
"""
Check if findings warrant build failure based on SARIF severity levels.
Args:
sarif_data: SARIF format findings data
fail_on: Comma-separated SARIF levels (error,warning,note,info,all,none)
Returns:
True if build should fail, False otherwise
"""
if fail_on == "none":
return False
# Parse fail_on parameter - accept SARIF levels
if fail_on == "all":
check_levels = {"error", "warning", "note", "info"}
else:
check_levels = {s.strip().lower() for s in fail_on.split(",")}
# Validate levels
valid_levels = {"error", "warning", "note", "info", "none"}
invalid = check_levels - valid_levels
if invalid:
console.print(f"⚠️ Invalid SARIF levels: {', '.join(invalid)}", style="yellow")
console.print("Valid levels: error, warning, note, info, all, none")
# Check SARIF results
runs = sarif_data.get("runs", [])
if not runs:
return False
results = runs[0].get("results", [])
for result in results:
level = result.get("level", "note") # Results without an explicit level are treated as "note" here
if level in check_levels:
return True
return False
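Review note: a quick way to sanity-check the gating logic is a standalone sketch. `has_blocking_results` below is a hypothetical, console-free restatement of `should_fail_build` from this hunk (same level parsing, same first-run-only lookup, same `"note"` fallback for results without a level). The sample SARIF dict is made up for illustration.

```python
from typing import Any, Dict

SARIF_LEVELS = {"error", "warning", "note", "info"}


def has_blocking_results(sarif_data: Dict[str, Any], fail_on: str) -> bool:
    """Hypothetical stand-in mirroring should_fail_build, minus console output."""
    if fail_on == "none":
        return False
    if fail_on == "all":
        check_levels = set(SARIF_LEVELS)
    else:
        check_levels = {s.strip().lower() for s in fail_on.split(",")}
    runs = sarif_data.get("runs", [])
    if not runs:
        return False
    # Only the first run is inspected, as in the hunk above.
    results = runs[0].get("results", [])
    return any(r.get("level", "note") in check_levels for r in results)


sample = {"runs": [{"results": [{"level": "warning"}, {}]}]}
print(has_blocking_results(sample, "error"))          # False
print(has_blocking_results(sample, "error,warning"))  # True
print(has_blocking_results(sample, "note"))           # True (the level-less result)
```

Note that results in any run after the first are ignored by this logic, which may matter for multi-tool SARIF files.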
def parse_inline_parameters(params: List[str]) -> Dict[str, Any]:
"""Parse inline key=value parameters using improved validation"""
return validate_parameters(params)
@@ -77,17 +116,15 @@ def execute_workflow_submission(
timeout: Optional[int],
interactive: bool
) -> Any:
"""Handle the workflow submission process"""
"""Handle the workflow submission process with file upload"""
# Get workflow metadata for parameter validation
console.print(f"🔧 Getting workflow information for: {workflow}")
workflow_meta = client.get_workflow_metadata(workflow)
param_response = client.get_workflow_parameters(workflow)
# Interactive parameter input
if interactive and workflow_meta.parameters.get("properties"):
properties = workflow_meta.parameters.get("properties", {})
required_params = set(workflow_meta.parameters.get("required", []))
defaults = param_response.defaults
missing_required = required_params - set(parameters.keys())
@@ -123,24 +160,10 @@ def execute_workflow_submission(
except ValueError as e:
console.print(f"❌ Invalid {param_type}: {e}", style="red")
# Validate volume mode
validate_volume_mode(volume_mode)
if volume_mode not in workflow_meta.supported_volume_modes:
raise ValidationError(
"volume mode", volume_mode,
f"one of: {', '.join(workflow_meta.supported_volume_modes)}"
)
# Create submission
submission = WorkflowSubmission(
target_path=target_path,
volume_mode=volume_mode,
parameters=parameters,
timeout=timeout
)
# Note: volume_mode is no longer used (Temporal uses MinIO storage)
# Show submission summary
console.print(f"\n🎯 [bold]Executing workflow:[/bold]")
console.print("\n🎯 [bold]Executing workflow:[/bold]")
console.print(f" Workflow: {workflow}")
console.print(f" Target: {target_path}")
console.print(f" Volume Mode: {volume_mode}")
@@ -149,6 +172,22 @@ def execute_workflow_submission(
if timeout:
console.print(f" Timeout: {timeout}s")
# Check if target path exists locally
target_path_obj = Path(target_path)
use_upload = target_path_obj.exists()
if use_upload:
# Show file/directory info
if target_path_obj.is_dir():
num_files = sum(1 for p in target_path_obj.rglob("*") if p.is_file())
console.print(f" Upload: Directory with {num_files} files")
else:
size_mb = target_path_obj.stat().st_size / (1024 * 1024)
console.print(f" Upload: File ({size_mb:.2f} MB)")
else:
console.print(" [yellow]⚠️ Warning: Target path does not exist locally[/yellow]")
console.print(" [yellow] Attempting to use path-based submission (backend must have access)[/yellow]")
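Review note: the upload preview above can be exercised without a backend. `describe_upload` is a hypothetical helper that returns the summary string instead of printing it; the temporary files exist only for illustration.

```python
import tempfile
from pathlib import Path


def describe_upload(target: Path) -> str:
    """Hypothetical helper: summarize what would be uploaded for a target path."""
    if not target.exists():
        return "missing"
    if target.is_dir():
        # Count regular files recursively; directories themselves are skipped.
        num_files = sum(1 for p in target.rglob("*") if p.is_file())
        return f"Directory with {num_files} files"
    size_mb = target.stat().st_size / (1024 * 1024)
    return f"File ({size_mb:.2f} MB)"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "main.rs").write_text("fn main() {}\n")
    (root / "Cargo.toml").write_text("[package]\n")
    print(describe_upload(root))                 # Directory with 2 files
    print(describe_upload(root / "Cargo.toml"))  # File (0.00 MB)
    print(describe_upload(root / "nope"))        # missing
```

For very large trees the recursive walk adds a noticeable delay before submission, since every entry is visited once just to produce the count.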
# Only ask for confirmation in interactive mode
if interactive:
if not Confirm.ask("\nExecute workflow?", default=True, console=console):
@@ -160,32 +199,74 @@ def execute_workflow_submission(
# Submit the workflow with enhanced progress
console.print(f"\n🚀 Executing workflow: [bold yellow]{workflow}[/bold yellow]")
steps = [
"Validating workflow configuration",
"Connecting to FuzzForge API",
"Uploading parameters and settings",
"Creating workflow deployment",
"Initializing execution environment"
]
if use_upload:
# Use new upload-based submission
steps = [
"Validating workflow configuration",
"Creating tarball (if directory)",
"Uploading target to backend",
"Starting workflow execution",
"Initializing execution environment"
]
with step_progress(steps, f"Executing {workflow}") as progress:
progress.next_step() # Validating
time.sleep(PROGRESS_STEP_DELAYS["validating"])
with step_progress(steps, f"Executing {workflow}") as progress:
progress.next_step() # Validating
time.sleep(PROGRESS_STEP_DELAYS["validating"])
progress.next_step() # Connecting
time.sleep(PROGRESS_STEP_DELAYS["connecting"])
progress.next_step() # Creating tarball
time.sleep(PROGRESS_STEP_DELAYS["connecting"])
progress.next_step() # Uploading
response = client.submit_workflow(workflow, submission)
time.sleep(PROGRESS_STEP_DELAYS["uploading"])
progress.next_step() # Uploading
# Use the new upload method
response = client.submit_workflow_with_upload(
workflow_name=workflow,
target_path=target_path,
parameters=parameters,
timeout=timeout
)
time.sleep(PROGRESS_STEP_DELAYS["uploading"])
progress.next_step() # Creating deployment
time.sleep(PROGRESS_STEP_DELAYS["creating"])
progress.next_step() # Starting
time.sleep(PROGRESS_STEP_DELAYS["creating"])
progress.next_step() # Initializing
time.sleep(PROGRESS_STEP_DELAYS["initializing"])
progress.next_step() # Initializing
time.sleep(PROGRESS_STEP_DELAYS["initializing"])
progress.complete(f"Workflow started successfully!")
progress.complete("Workflow started successfully!")
else:
# Fall back to path-based submission (for backward compatibility)
steps = [
"Validating workflow configuration",
"Connecting to FuzzForge API",
"Submitting workflow parameters",
"Creating workflow deployment",
"Initializing execution environment"
]
with step_progress(steps, f"Executing {workflow}") as progress:
progress.next_step() # Validating
time.sleep(PROGRESS_STEP_DELAYS["validating"])
progress.next_step() # Connecting
time.sleep(PROGRESS_STEP_DELAYS["connecting"])
progress.next_step() # Submitting
submission = WorkflowSubmission(
target_path=target_path,
volume_mode=volume_mode,
parameters=parameters,
timeout=timeout
)
response = client.submit_workflow(workflow, submission)
time.sleep(PROGRESS_STEP_DELAYS["uploading"])
progress.next_step() # Creating deployment
time.sleep(PROGRESS_STEP_DELAYS["creating"])
progress.next_step() # Initializing
time.sleep(PROGRESS_STEP_DELAYS["initializing"])
progress.complete("Workflow started successfully!")
return response
@@ -219,6 +300,22 @@ def execute_workflow(
live: bool = typer.Option(
False, "--live", "-l",
help="Start live monitoring after execution (useful for fuzzing workflows)"
),
auto_start: Optional[bool] = typer.Option(
None, "--auto-start/--no-auto-start",
help="Automatically start required worker if not running (default: from config)"
),
auto_stop: Optional[bool] = typer.Option(
None, "--auto-stop/--no-auto-stop",
help="Automatically stop worker after execution completes (default: from config)"
),
fail_on: Optional[str] = typer.Option(
None, "--fail-on",
help="Fail build if findings match SARIF level (error,warning,note,info,all,none). Use with --wait"
),
export_sarif: Optional[str] = typer.Option(
None, "--export-sarif",
help="Export SARIF results to file after completion. Use with --wait"
)
):
"""
@@ -226,6 +323,8 @@ def execute_workflow(
Use --live for fuzzing workflows to see real-time progress.
Use --wait to wait for completion without live dashboard.
Use --fail-on with --wait to fail CI builds based on finding severity.
Use --export-sarif with --wait to export SARIF findings to a file.
"""
try:
# Validate inputs
@@ -261,14 +360,60 @@ def execute_workflow(
except Exception as e:
handle_error(e, "parsing parameters")
# Get config for worker management settings
config = get_project_config() or FuzzForgeConfig()
should_auto_start = auto_start if auto_start is not None else config.workers.auto_start_workers
should_auto_stop = auto_stop if auto_stop is not None else config.workers.auto_stop_workers
worker_container = None # Track for cleanup
worker_mgr = None
wait_completed = False # Track if wait completed successfully
try:
with get_client() as client:
# Get worker information for this workflow
try:
console.print(f"🔍 Checking worker requirements for: {workflow}")
worker_info = client.get_workflow_worker_info(workflow)
# Initialize worker manager
compose_file = config.workers.docker_compose_file
worker_mgr = WorkerManager(
compose_file=Path(compose_file) if compose_file else None,
startup_timeout=config.workers.worker_startup_timeout
)
# Ensure worker is running
worker_container = worker_info["worker_container"]
if not worker_mgr.ensure_worker_running(worker_info, auto_start=should_auto_start):
console.print(
f"❌ Worker not available: {worker_info['vertical']}",
style="red"
)
console.print(
f"💡 Start the worker manually: docker-compose start {worker_container}"
)
raise typer.Exit(1)
except typer.Exit:
raise # Re-raise Exit to preserve exit code
except Exception as e:
# If we can't get worker info, warn but continue (might be old backend)
console.print(
f"⚠️ Could not check worker requirements: {e}",
style="yellow"
)
console.print(
" Continuing without worker management...",
style="yellow"
)
response = execute_workflow_submission(
client, workflow, target_path, parameters,
volume_mode, timeout, interactive
)
console.print(f"✅ Workflow execution started!", style="green")
console.print("✅ Workflow execution started!", style="green")
console.print(f" Execution ID: [bold cyan]{response.run_id}[/bold cyan]")
console.print(f" Status: {status_emoji(response.status)} {response.status}")
@@ -288,22 +433,22 @@ def execute_workflow(
# Don't fail the whole operation if database save fails
console.print(f"⚠️ Failed to save execution to database: {e}", style="yellow")
console.print(f"\n💡 Monitor progress: [bold cyan]fuzzforge monitor {response.run_id}[/bold cyan]")
console.print(f"\n💡 Monitor progress: [bold cyan]fuzzforge monitor stats {response.run_id}[/bold cyan]")
console.print(f"💡 Check status: [bold cyan]fuzzforge workflow status {response.run_id}[/bold cyan]")
# Suggest --live for fuzzing workflows
if not live and not wait and "fuzzing" in workflow.lower():
console.print(f"💡 Next time try: [bold cyan]fuzzforge workflow {workflow} {target_path} --live[/bold cyan] for real-time fuzzing dashboard", style="dim")
console.print(f"💡 Next time try: [bold cyan]fuzzforge workflow {workflow} {target_path} --live[/bold cyan] for real-time monitoring", style="dim")
# Start live monitoring if requested
if live:
# Check if this is a fuzzing workflow to show appropriate messaging
is_fuzzing = "fuzzing" in workflow.lower()
if is_fuzzing:
console.print(f"\n📺 Starting live fuzzing dashboard...")
console.print("\n📺 Starting live fuzzing monitor...")
console.print("💡 You'll see real-time crash discovery, execution stats, and coverage data.")
else:
console.print(f"\n📺 Starting live monitoring dashboard...")
console.print("\n📺 Starting live monitoring...")
console.print("Press Ctrl+C to stop monitoring (execution continues in background).\n")
@@ -312,14 +457,14 @@ def execute_workflow(
# Import monitor command and run it
live_monitor(response.run_id, refresh=3)
except KeyboardInterrupt:
console.print(f"\n⏹️ Live monitoring stopped (execution continues in background)", style="yellow")
console.print("\n⏹️ Live monitoring stopped (execution continues in background)", style="yellow")
except Exception as e:
console.print(f"⚠️ Failed to start live monitoring: {e}", style="yellow")
console.print(f"💡 You can still monitor manually: [bold cyan]fuzzforge monitor {response.run_id}[/bold cyan]")
# Wait for completion if requested
elif wait:
console.print(f"\n⏳ Waiting for execution to complete...")
console.print("\n⏳ Waiting for execution to complete...")
try:
final_status = client.wait_for_completion(response.run_id, poll_interval=POLL_INTERVAL)
@@ -334,17 +479,63 @@ def execute_workflow(
console.print(f"⚠️ Failed to update database: {e}", style="yellow")
console.print(f"🏁 Execution completed with status: {status_emoji(final_status.status)} {final_status.status}")
wait_completed = True # Mark wait as completed
if final_status.is_completed:
console.print(f"💡 View findings: [bold cyan]fuzzforge findings {response.run_id}[/bold cyan]")
# Export SARIF if requested
if export_sarif:
try:
console.print("\n📤 Exporting SARIF results...")
findings = client.get_run_findings(response.run_id)
output_path = Path(export_sarif)
with open(output_path, 'w') as f:
json.dump(findings.sarif, f, indent=2)
console.print(f"✅ SARIF exported to: [bold cyan]{output_path}[/bold cyan]")
except Exception as e:
console.print(f"⚠️ Failed to export SARIF: {e}", style="yellow")
# Check if build should fail based on findings
if fail_on:
try:
console.print(f"\n🔍 Checking findings against severity threshold: {fail_on}")
findings = client.get_run_findings(response.run_id)
if should_fail_build(findings.sarif, fail_on):
console.print("❌ [bold red]Build failed: Found blocking security issues[/bold red]")
console.print(f"💡 View details: [bold cyan]fuzzforge findings {response.run_id}[/bold cyan]")
raise typer.Exit(1)
else:
console.print("✅ [bold green]No blocking security issues found[/bold green]")
except typer.Exit:
raise # Re-raise Exit to preserve exit code
except Exception as e:
console.print(f"⚠️ Failed to check findings: {e}", style="yellow")
if not fail_on and not export_sarif:
console.print(f"💡 View findings: [bold cyan]fuzzforge findings {response.run_id}[/bold cyan]")
except KeyboardInterrupt:
console.print(f"\n⏹️ Monitoring cancelled (execution continues in background)", style="yellow")
console.print("\n⏹️ Monitoring cancelled (execution continues in background)", style="yellow")
except typer.Exit:
raise # Re-raise Exit to preserve exit code
except Exception as e:
handle_error(e, "waiting for completion")
except typer.Exit:
raise # Re-raise Exit to preserve exit code
except Exception as e:
handle_error(e, "executing workflow")
finally:
# Stop worker if auto-stop is enabled and wait completed
if should_auto_stop and worker_container and worker_mgr and wait_completed:
try:
console.print("\n🛑 Stopping worker (auto-stop enabled)...")
if worker_mgr.stop_worker(worker_container):
console.print(f"✅ Worker stopped: {worker_container}")
except Exception as e:
console.print(
f"⚠️ Failed to stop worker: {e}",
style="yellow"
)
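Review note: the repeated `except typer.Exit: raise` clauses matter because a broad `except Exception` placed after them would otherwise swallow the exit and lose the exit code. The sketch below shows the effect with `ExitSignal`, a hypothetical stand-in for `typer.Exit` (assumed here to derive from `Exception`, as Click's `Exit` does via `RuntimeError`).

```python
class ExitSignal(RuntimeError):
    """Hypothetical stand-in for typer.Exit, carrying an exit code."""

    def __init__(self, code: int = 0):
        super().__init__(code)
        self.exit_code = code


def guarded(action):
    """Re-raise the exit signal first, then handle everything else."""
    try:
        action()
    except ExitSignal:
        raise  # let the intended exit code through
    except Exception as exc:
        return f"handled: {exc}"
    return "ok"


def unguarded(action):
    """Without the dedicated clause, the exit signal is treated as an error."""
    try:
        action()
    except Exception as exc:  # also catches ExitSignal
        return f"handled: {exc}"
    return "ok"


def fail_build():
    raise ExitSignal(1)  # e.g. the --fail-on gate tripped


print(unguarded(fail_build))  # handled: 1
try:
    guarded(fail_build)
except ExitSignal as exc:
    print(exc.exit_code)  # 1
```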
@app.command("status")
@@ -409,7 +600,7 @@ def workflow_status(
console.print(
Panel.fit(
status_table,
title=f"📊 Status Information",
title="📊 Status Information",
box=box.ROUNDED
)
)
@@ -479,7 +670,7 @@ def workflow_history(
console.print()
console.print(table)
console.print(f"\n💡 Use [bold cyan]fuzzforge workflow status <execution-id>[/bold cyan] for detailed status")
console.print("\n💡 Use [bold cyan]fuzzforge workflow status <execution-id>[/bold cyan] for detailed status")
except Exception as e:
handle_error(e, "listing execution history")
@@ -527,7 +718,7 @@ def retry_workflow(
# Modify parameters if requested
if modify_params and parameters:
console.print(f"\n📝 [bold]Current parameters:[/bold]")
console.print("\n📝 [bold]Current parameters:[/bold]")
for key, value in parameters.items():
new_value = Prompt.ask(
f"{key}",
@@ -559,7 +750,7 @@ def retry_workflow(
response = client.submit_workflow(original_run.workflow, submission)
console.print(f"\n✅ Retry submitted successfully!", style="green")
console.print("\n✅ Retry submitted successfully!", style="green")
console.print(f" New Execution ID: [bold cyan]{response.run_id}[/bold cyan]")
console.print(f" Status: {status_emoji(response.status)} {response.status}")
@@ -578,7 +769,7 @@ def retry_workflow(
except Exception as e:
console.print(f"⚠️ Failed to save execution to database: {e}", style="yellow")
console.print(f"\n💡 Monitor progress: [bold cyan]fuzzforge monitor {response.run_id}[/bold cyan]")
console.print(f"\n💡 Monitor progress: [bold cyan]fuzzforge monitor stats {response.run_id}[/bold cyan]")
except Exception as e:
handle_error(e, "retrying workflow")
+4 -5
@@ -18,10 +18,10 @@ import typer
from rich.console import Console
from rich.table import Table
from rich.panel import Panel
from rich.prompt import Prompt, Confirm
from rich.prompt import Prompt
from rich.syntax import Syntax
from rich import box
from typing import Optional, Dict, Any
from typing import Optional
from ..config import get_project_config, FuzzForgeConfig
from ..fuzzy import enhanced_workflow_not_found_handler
@@ -68,7 +68,7 @@ def list_workflows():
console.print(f"\n🔧 [bold]Available Workflows ({len(workflows)})[/bold]\n")
console.print(table)
console.print(f"\n💡 Use [bold cyan]fuzzforge workflows info <name>[/bold cyan] for detailed information")
console.print("\n💡 Use [bold cyan]fuzzforge workflows info <name>[/bold cyan] for detailed information")
except Exception as e:
console.print(f"❌ Failed to fetch workflows: {e}", style="red")
@@ -100,7 +100,6 @@ def workflow_info(
info_table.add_row("Author", workflow.author)
if workflow.tags:
info_table.add_row("Tags", ", ".join(workflow.tags))
info_table.add_row("Volume Modes", ", ".join(workflow.supported_volume_modes))
info_table.add_row("Custom Docker", "✅ Yes" if workflow.has_custom_docker else "❌ No")
console.print(
@@ -193,7 +192,7 @@ def workflow_parameters(
parameters = {}
properties = workflow.parameters.get("properties", {})
required_params = set(workflow.parameters.get("required", []))
defaults = param_response.defaults
defaults = param_response.default_parameters
if interactive:
console.print("🔧 Enter parameter values (press Enter for default):\n")
+1 -1
@@ -16,7 +16,7 @@ Provides intelligent tab completion for commands, workflows, run IDs, and parame
import typer
from typing import List, Optional
from typing import List
from pathlib import Path
from .config import get_project_config, FuzzForgeConfig
+10
@@ -66,6 +66,15 @@ class PreferencesConfig(BaseModel):
color_output: bool = True
class WorkerConfig(BaseModel):
"""Worker lifecycle management configuration."""
auto_start_workers: bool = True
auto_stop_workers: bool = False
worker_startup_timeout: int = 60
docker_compose_file: Optional[str] = None
class CogneeConfig(BaseModel):
"""Cognee integration metadata."""
@@ -84,6 +93,7 @@ class FuzzForgeConfig(BaseModel):
project: ProjectConfig = Field(default_factory=ProjectConfig)
retention: RetentionConfig = Field(default_factory=RetentionConfig)
preferences: PreferencesConfig = Field(default_factory=PreferencesConfig)
workers: WorkerConfig = Field(default_factory=WorkerConfig)
cognee: CogneeConfig = Field(default_factory=CogneeConfig)
@classmethod
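Review note: the new `WorkerConfig` defaults interact with the tri-state `--auto-start/--no-auto-start` and `--auto-stop/--no-auto-stop` options, where `None` means the flag was not given. A minimal sketch of that precedence, using a plain dataclass as a hypothetical stand-in for the pydantic model (`WorkerSettings` and `resolve_flag` are illustrative names, not part of the codebase):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkerSettings:
    """Hypothetical dataclass mirroring WorkerConfig's fields and defaults."""

    auto_start_workers: bool = True
    auto_stop_workers: bool = False
    worker_startup_timeout: int = 60
    docker_compose_file: Optional[str] = None


def resolve_flag(cli_value: Optional[bool], config_value: bool) -> bool:
    """An explicit CLI flag wins; None means 'not given', so fall back to config."""
    return cli_value if cli_value is not None else config_value


settings = WorkerSettings()
print(resolve_flag(None, settings.auto_start_workers))   # True  (config default)
print(resolve_flag(False, settings.auto_start_workers))  # False (--no-auto-start)
print(resolve_flag(True, settings.auto_stop_workers))    # True  (--auto-stop)
```

Testing against `None` rather than truthiness is what lets `--no-auto-start` override a config value of `True`.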
+1 -1
@@ -163,7 +163,7 @@ class FuzzForgeDatabase:
"Database is corrupted. Use 'ff init --force' to reset."
) from e
raise
except Exception as e:
except Exception:
if conn:
try:
conn.rollback()
+6 -15
@@ -15,7 +15,7 @@ Enhanced exception handling and error utilities for FuzzForge CLI with rich cont
import time
import functools
from typing import Any, Callable, Optional, Type, Union, List
from typing import Any, Callable, Optional, Union, List
from pathlib import Path
import typer
@@ -24,20 +24,10 @@ from rich.console import Console
from rich.panel import Panel
from rich.text import Text
from rich.table import Table
from rich.columns import Columns
from rich.syntax import Syntax
from rich.markdown import Markdown
# Import SDK exceptions for rich handling
from fuzzforge_sdk.exceptions import (
FuzzForgeError as SDKFuzzForgeError,
FuzzForgeHTTPError,
DeploymentError,
WorkflowExecutionError,
ContainerError,
VolumeError,
ValidationError as SDKValidationError,
ConnectionError as SDKConnectionError
FuzzForgeError as SDKFuzzForgeError
)
console = Console()
@@ -335,7 +325,7 @@ def handle_error(error: Exception, context: str = "") -> None:
# Show error details for debugging
console.print(f"\n[dim yellow]Error type: {type(error).__name__}[/dim yellow]")
console.print(f"[dim yellow]Please report this issue if it persists[/dim yellow]")
console.print("[dim yellow]Please report this issue if it persists[/dim yellow]")
console.print()
raise typer.Exit(1)
@@ -430,8 +420,9 @@ def validate_run_id(run_id: str) -> str:
if not run_id or len(run_id) < 8:
raise ValidationError("run_id", run_id, "at least 8 characters")
if not run_id.replace('-', '').isalnum():
raise ValidationError("run_id", run_id, "alphanumeric characters and hyphens only")
# Allow alphanumeric characters, hyphens, and underscores
if not run_id.replace('-', '').replace('_', '').isalnum():
raise ValidationError("run_id", run_id, "alphanumeric characters, hyphens, and underscores only")
return run_id
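Review note: the relaxed check now admits underscores, presumably because Temporal-style run IDs can contain them. The sketch below restates the rule so it can be tried in isolation; `check_run_id` is a hypothetical stand-in that raises `ValueError` rather than the project's `ValidationError`, and the sample IDs are invented.

```python
def check_run_id(run_id: str) -> str:
    """Hypothetical stand-in for validate_run_id with the same two checks."""
    if not run_id or len(run_id) < 8:
        raise ValueError("run_id must be at least 8 characters")
    # Strip the permitted separators, then require the rest to be alphanumeric.
    if not run_id.replace("-", "").replace("_", "").isalnum():
        raise ValueError("run_id may contain only alphanumerics, hyphens, and underscores")
    return run_id


print(check_run_id("security_assessment-1a2b3c"))  # accepted, returned unchanged

for bad in ("short", "run id 1234", "--------"):
    try:
        check_run_id(bad)
    except ValueError as exc:
        print(f"{bad!r}: {exc}")
```

An ID made only of separators is rejected because `"".isalnum()` is `False`. `str.isalnum()` also accepts non-ASCII letters and digits, so this is not a strict ASCII whitelist.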

Some files were not shown because too many files have changed in this diff