CI/CD Integration with Ephemeral Deployment Model (#14)

* feat: Complete migration from Prefect to Temporal

BREAKING CHANGE: Replaces Prefect workflow orchestration with Temporal

## Major Changes
- Replace Prefect with Temporal for workflow orchestration
- Implement vertical worker architecture (rust, android)
- Replace Docker registry with MinIO for unified storage
- Refactor activities to be co-located with workflows
- Update all API endpoints for Temporal compatibility

## Infrastructure
- New: docker-compose.temporal.yaml (Temporal + MinIO + workers)
- New: workers/ directory with rust and android vertical workers
- New: backend/src/temporal/ (manager, discovery)
- New: backend/src/storage/ (S3-cached storage with MinIO)
- New: backend/toolbox/common/ (shared storage activities)
- Deleted: docker-compose.yaml (old Prefect setup)
- Deleted: backend/src/core/prefect_manager.py
- Deleted: backend/src/services/prefect_stats_monitor.py
- Deleted: Docker registry and insecure-registries requirement

## Workflows
- Migrated: security_assessment workflow to Temporal
- New: rust_test workflow (example/test workflow)
- Deleted: secret_detection_scan (Prefect-based, to be reimplemented)
- Activities now co-located with workflows for independent testing

## API Changes
- Updated: backend/src/api/workflows.py (Temporal submission)
- Updated: backend/src/api/runs.py (Temporal status/results)
- Updated: backend/src/main.py (727 lines, TemporalManager integration)
- Updated: All 16 MCP tools to use TemporalManager

## Testing
- All services healthy (Temporal, PostgreSQL, MinIO, workers, backend)
- All API endpoints functional
- End-to-end workflow test passed (72 findings from vulnerable_app)
- MinIO storage integration working (target upload/download, results)
- Worker activity discovery working (6 activities registered)
- Tarball extraction working
- SARIF report generation working

## Documentation
- ARCHITECTURE.md: Complete Temporal architecture documentation
- QUICKSTART_TEMPORAL.md: Getting started guide
- MIGRATION_DECISION.md: Why we chose Temporal over Prefect
- IMPLEMENTATION_STATUS.md: Migration progress tracking
- workers/README.md: Worker development guide

## Dependencies
- Added: temporalio>=1.6.0
- Added: boto3>=1.34.0 (MinIO S3 client)
- Removed: prefect>=3.4.18

* feat: Add Python fuzzing vertical with Atheris integration

This commit implements a complete Python fuzzing workflow using Atheris:

## Python Worker (workers/python/)
- Dockerfile with Python 3.11, Atheris, and build tools
- Generic worker.py for dynamic workflow discovery
- requirements.txt with temporalio, boto3, atheris dependencies
- Added to docker-compose.temporal.yaml with dedicated cache volume

## AtherisFuzzer Module (backend/toolbox/modules/fuzzer/)
- Reusable module extending BaseModule
- Auto-discovers fuzz targets (fuzz_*.py, *_fuzz.py, fuzz_target.py)
- Recursive search to find targets in nested directories
- Dynamically loads TestOneInput() function
- Configurable max_iterations and timeout
- Real-time stats callback support for live monitoring
- Returns findings as ModuleFinding objects
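
The auto-discovery rule above (three filename patterns, searched recursively) can be sketched as follows. This is a minimal illustration, not the AtherisFuzzer code itself: the function name `find_fuzz_targets` and the constant `FUZZ_TARGET_PATTERNS` are hypothetical, and only the three patterns come from this commit message.

```python
from fnmatch import fnmatchcase
from pathlib import Path
from typing import List

# Filename patterns listed in the commit message.
FUZZ_TARGET_PATTERNS = ("fuzz_*.py", "*_fuzz.py", "fuzz_target.py")


def find_fuzz_targets(root: Path) -> List[Path]:
    """Recursively collect Python files whose names match a fuzz-target pattern.

    Hypothetical helper for illustration; the real module may differ.
    """
    matches = [
        path
        for path in root.rglob("*.py")  # rglob walks nested directories
        if path.is_file()
        and any(fnmatchcase(path.name, pattern) for pattern in FUZZ_TARGET_PATTERNS)
    ]
    return sorted(matches)
```

Matching on `path.name` rather than the full path keeps a directory named `fuzz_cases/` from turning every file under it into a target.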

## Atheris Fuzzing Workflow (backend/toolbox/workflows/atheris_fuzzing/)
- Temporal workflow for orchestrating fuzzing
- Downloads user code from MinIO
- Executes AtherisFuzzer module
- Uploads results to MinIO
- Cleans up cache after execution
- metadata.yaml with vertical: python for routing

## Test Project (test_projects/python_fuzz_waterfall/)
- Demonstrates stateful waterfall vulnerability
- main.py with check_secret() that leaks progress
- fuzz_target.py with Atheris TestOneInput() harness
- Complete README with usage instructions

## Backend Fixes
- Fixed parameter merging in REST API endpoints (workflows.py)
- Changed workflow parameter passing from positional args to kwargs (manager.py)
- Default parameters now properly merged with user parameters
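
As a rough illustration of the merge described above, defaults from workflow metadata can be overlaid with user-supplied values so that the user's values win. The helper name `merge_parameters` is hypothetical; the deleted Prefect manager in this commit used the same dictionary-unpacking idiom, but the exact code in workflows.py is not shown here.

```python
from typing import Any, Dict, Optional


def merge_parameters(
    defaults: Dict[str, Any], user: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return defaults overlaid with user values; neither input is mutated."""
    # Later keys win, so user-supplied values override metadata defaults.
    return {**defaults, **(user or {})}
```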

## Testing
- Worker discovered AtherisFuzzingWorkflow
- Workflow executed end-to-end successfully
- Fuzz target auto-discovered in nested directories
- Atheris ran 100,000 iterations
- Results uploaded and cache cleaned

* chore: Complete Temporal migration with updated CLI/SDK/docs

This commit includes all remaining Temporal migration changes:

## CLI Updates (cli/)
- Updated workflow execution commands for Temporal
- Enhanced error handling and exceptions
- Updated dependencies in uv.lock

## SDK Updates (sdk/)
- Client methods updated for Temporal workflows
- Updated models for new workflow execution
- Updated dependencies in uv.lock

## Documentation Updates (docs/)
- Architecture documentation for Temporal
- Workflow concept documentation
- Resource management documentation (new)
- Debugging guide (new)
- Updated tutorials and how-to guides
- Troubleshooting updates

## README Updates
- Main README with Temporal instructions
- Backend README
- CLI README
- SDK README

## Other
- Updated IMPLEMENTATION_STATUS.md
- Removed old vulnerable_app.tar.gz

These changes complete the Temporal migration and ensure the
CLI/SDK work correctly with the new backend.

* fix: Use positional args instead of kwargs for Temporal workflows

The Temporal Python SDK's start_workflow() method doesn't accept
a 'kwargs' parameter. Workflows must receive parameters as positional
arguments via the 'args' parameter.

Changed to:
  args=workflow_args  # Positional arguments

This fixes the error:
  TypeError: Client.start_workflow() got an unexpected keyword argument 'kwargs'

Workflows now correctly receive parameters in order:
- security_assessment: [target_id, scanner_config, analyzer_config, reporter_config]
- atheris_fuzzing: [target_id, target_file, max_iterations, timeout_seconds]
- rust_test: [target_id, test_message]
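
One way to produce such an ordered list from a parameter dictionary is sketched below. The helper `to_positional_args` and the constant `ATHERIS_FUZZING_ORDER` are hypothetical names for illustration; only the parameter order itself comes from the list above.

```python
from typing import Any, List, Mapping, Sequence

# Parameter order for atheris_fuzzing, as listed in the commit message.
ATHERIS_FUZZING_ORDER = ("target_id", "target_file", "max_iterations", "timeout_seconds")


def to_positional_args(params: Mapping[str, Any], order: Sequence[str]) -> List[Any]:
    """Lay out named parameters in the order the workflow expects.

    Missing names become None so later arguments keep their positions.
    """
    return [params.get(name) for name in order]


workflow_args = to_positional_args(
    {"target_id": "abc123", "max_iterations": 100000}, ATHERIS_FUZZING_ORDER
)
# The list would then be passed through the SDK's `args` parameter, e.g.
# (workflow name, id, and task queue here are placeholders):
#   await client.start_workflow(
#       "AtherisFuzzingWorkflow", args=workflow_args, id="run-1", task_queue="python-queue"
#   )
```

Padding with `None` is a design choice of this sketch; it only works if the workflow's own signature tolerates `None` for omitted values.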

* fix: Filter metadata-only parameters from workflow arguments

SecurityAssessmentWorkflow was receiving 7 arguments instead of 2-5.
The issue was that target_path and volume_mode from default_parameters
were being passed to the workflow, when they should only be used by
the system for configuration.

Now filters out metadata-only parameters (target_path, volume_mode)
before passing arguments to workflow execution.
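
The filtering step can be sketched as a small dictionary comprehension. The names `METADATA_ONLY_PARAMETERS` and `strip_metadata_parameters` are hypothetical; the two filtered keys are the ones named above.

```python
from typing import Any, Dict, Mapping

# Keys used only for system configuration, per the commit message.
METADATA_ONLY_PARAMETERS = frozenset({"target_path", "volume_mode"})


def strip_metadata_parameters(params: Mapping[str, Any]) -> Dict[str, Any]:
    """Drop configuration-only keys so they never reach the workflow."""
    return {
        key: value
        for key, value in params.items()
        if key not in METADATA_ONLY_PARAMETERS
    }
```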

* refactor: Remove Prefect leftovers and volume mounting legacy

Complete cleanup of Prefect migration artifacts:

Backend:
- Delete registry.py and workflow_discovery.py (Prefect-specific files)
- Remove Docker validation from setup.py (no longer needed)
- Remove ResourceLimits and VolumeMount models
- Remove target_path and volume_mode from WorkflowSubmission
- Remove supported_volume_modes from API and discovery
- Clean up metadata.yaml files (remove volume/path fields)
- Simplify parameter filtering in manager.py

SDK:
- Remove volume_mode parameter from client methods
- Remove ResourceLimits and VolumeMount models
- Remove Prefect error patterns from docker_logs.py
- Clean up WorkflowSubmission and WorkflowMetadata models

CLI:
- Remove Volume Modes display from workflow info

All removed features are Prefect-specific or Docker volume mounting
artifacts. Temporal workflows use MinIO storage exclusively.

* feat: Add comprehensive test suite and benchmark infrastructure

- Add 68 unit tests for fuzzer, scanner, and analyzer modules
- Implement pytest-based test infrastructure with fixtures
- Add 6 performance benchmarks with category-specific thresholds
- Configure GitHub Actions for automated testing and benchmarking
- Add test and benchmark documentation

Test coverage:
- AtherisFuzzer: 8 tests
- CargoFuzzer: 14 tests
- FileScanner: 22 tests
- SecurityAnalyzer: 24 tests

All tests passing (68/68)
All benchmarks passing (6/6)

* fix: Resolve all ruff linting violations across codebase

Fixed 27 ruff violations in 12 files:
- Removed unused imports (Depends, Dict, Any, Optional, etc.)
- Fixed undefined workflow_info variable in workflows.py
- Removed dead code with undefined variables in atheris_fuzzer.py
- Changed f-string to regular string where no placeholders used

All files now pass ruff checks for CI/CD compliance.

* fix: Configure CI for unit tests only

- Renamed docker-compose.temporal.yaml → docker-compose.yml for CI compatibility
- Commented out integration-tests job (no integration tests yet)
- Updated test-summary to only depend on lint and unit-tests

CI will now run successfully with 68 unit tests. Integration tests can be added later.

* feat: Add CI/CD integration with ephemeral deployment model

Implements comprehensive CI/CD support for FuzzForge with on-demand worker management:

**Worker Management (v0.7.0)**
- Add WorkerManager for automatic worker lifecycle control
- Auto-start workers from stopped state when workflows execute
- Auto-stop workers after workflow completion
- Health checks and startup timeout handling (90s default)
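
A startup wait with a timeout might look roughly like the sketch below. It is not the WorkerManager implementation: the function `wait_until_healthy`, its parameters, and the 2-second polling interval are assumptions; only the 90-second default comes from this commit message.

```python
import time
from typing import Callable


def wait_until_healthy(
    is_healthy: Callable[[], bool],
    timeout_s: float = 90.0,   # default startup timeout from the commit message
    interval_s: float = 2.0,   # assumed polling interval
) -> bool:
    """Poll is_healthy until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_healthy():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval_s, remaining))
```

Using `time.monotonic()` keeps the timeout stable if the wall clock is adjusted while the CI job is running.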

**CI/CD Features**
- `--fail-on` flag: Fail builds based on SARIF severity levels (error/warning/note/info)
- `--export-sarif` flag: Export findings in SARIF 2.1.0 format
- `--auto-start`/`--auto-stop` flags: Control worker lifecycle
- Exit code propagation: Returns 1 on blocking findings, 0 on success
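
To illustrate how a severity gate can map findings to an exit code, here is a minimal sketch over a SARIF document loaded as a dictionary. The function name `exit_code_for_sarif` is hypothetical, and the sketch assumes `--fail-on` is an exact set of blocking levels; the commit message does not say whether the real flag works as a set or as a threshold.

```python
from typing import Any, Dict, Iterable


def exit_code_for_sarif(sarif: Dict[str, Any], fail_on: Iterable[str]) -> int:
    """Return 1 if any result has a blocking level, otherwise 0."""
    blocking = {level.lower() for level in fail_on}
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Assumption: a result without an explicit level counts as "warning".
            level = str(result.get("level", "warning")).lower()
            if level in blocking:
                return 1
    return 0
```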

**Exit Code Fix**
- Add `except typer.Exit: raise` handlers at 3 critical locations
- Move worker cleanup to finally block for guaranteed execution
- Exit codes now propagate correctly even when build fails
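
The pattern behind this fix can be sketched without Typer installed. The `Exit` class below is a local stand-in for `typer.Exit`, and `run_scan`, `scan`, and `stop_workers` are hypothetical names; the point is only the ordering of the handlers and the `finally` block.

```python
from typing import Callable


class Exit(RuntimeError):
    """Local stand-in for typer.Exit so the sketch runs without Typer."""

    def __init__(self, code: int = 0) -> None:
        super().__init__(code)
        self.exit_code = code


def run_scan(scan: Callable[[], None], stop_workers: Callable[[], None]) -> int:
    """Run scan(), always stop workers, and let Exit carry its code outward."""
    try:
        scan()
        return 0
    except Exit:
        # Re-raise before the generic handler below can swallow the exit request.
        raise
    except Exception as exc:
        print(f"scan failed: {exc}")
        return 2
    finally:
        # Runs on success, on Exit, and on unexpected errors.
        stop_workers()
```

Without the dedicated `except Exit: raise` clause, the broad `except Exception` would catch the exit request (it is an ordinary exception subclass) and the process would report the wrong status.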

**CI Scripts & Examples**
- ci-start.sh: Start FuzzForge services with health checks
- ci-stop.sh: Clean shutdown with volume preservation option
- GitHub Actions workflow example (security-scan.yml)
- GitLab CI pipeline example (.gitlab-ci.example.yml)
- docker-compose.ci.yml: CI-optimized compose file with profiles

**OSS-Fuzz Integration**
- New ossfuzz_campaign workflow for running OSS-Fuzz projects
- OSS-Fuzz worker with Docker-in-Docker support
- Configurable campaign duration and project selection

**Documentation**
- Comprehensive CI/CD integration guide (docs/how-to/cicd-integration.md)
- Updated architecture docs with worker lifecycle details
- Updated workspace isolation documentation
- CLI README with worker management examples

**SDK Enhancements**
- Add get_workflow_worker_info() endpoint
- Worker vertical metadata in workflow responses

**Testing**
- All workflows tested: security_assessment, atheris_fuzzing, secret_detection, cargo_fuzzing
- All monitoring commands tested: stats, crashes, status, finding
- Full CI pipeline simulation verified
- Exit codes verified for success/failure scenarios

Ephemeral CI/CD model: ~3-4GB RAM, ~60-90s startup, runs entirely in CI containers.

* fix: Resolve ruff linting violations in CI/CD code

- Remove unused variables (run_id, defaults, result)
- Remove unused imports
- Fix f-string without placeholders

All CI/CD integration files now pass ruff checks.
tduhamel42
2025-10-14 10:13:45 +02:00
committed by GitHub
parent 987c49569c
commit 60ca088ecf
167 changed files with 26101 additions and 5703 deletions
@@ -1,770 +0,0 @@
"""
Prefect Manager - Core orchestration for workflow deployment and execution
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import logging
import os
import platform
import re
from pathlib import Path
from typing import Dict, Optional, Any
from prefect import get_client
from prefect.docker import DockerImage
from prefect.client.schemas import FlowRun
from src.core.workflow_discovery import WorkflowDiscovery, WorkflowInfo
logger = logging.getLogger(__name__)

def get_registry_url(context: str = "default") -> str:
    """
    Get the container registry URL to use for a given operation context.

    Goals:
    - Work reliably across Linux and macOS Docker Desktop
    - Prefer in-network service discovery when running inside containers
    - Allow full override via env vars from docker-compose

    Env overrides:
    - FUZZFORGE_REGISTRY_PUSH_URL: used for image builds/pushes
    - FUZZFORGE_REGISTRY_PULL_URL: used for workers to pull images
    """
    # Normalize context
    ctx = (context or "default").lower()
    # Always honor explicit overrides first
    if ctx in ("push", "build"):
        push_url = os.getenv("FUZZFORGE_REGISTRY_PUSH_URL")
        if push_url:
            logger.debug("Using FUZZFORGE_REGISTRY_PUSH_URL: %s", push_url)
            return push_url
        # Default to host-published registry for Docker daemon operations
        return "localhost:5001"
    if ctx == "pull":
        pull_url = os.getenv("FUZZFORGE_REGISTRY_PULL_URL")
        if pull_url:
            logger.debug("Using FUZZFORGE_REGISTRY_PULL_URL: %s", pull_url)
            return pull_url
        # Prefect worker pulls via host Docker daemon as well
        return "localhost:5001"
    # Default/fallback
    return os.getenv("FUZZFORGE_REGISTRY_PULL_URL", os.getenv("FUZZFORGE_REGISTRY_PUSH_URL", "localhost:5001"))


def _compose_project_name(default: str = "fuzzforge") -> str:
    """Return the docker-compose project name used for network/volume naming.

    Always returns 'fuzzforge' regardless of environment variables.
    """
    return "fuzzforge"

class PrefectManager:
    """
    Manages Prefect deployments and flow runs for discovered workflows.

    This class handles:
    - Workflow discovery and registration
    - Docker image building through Prefect
    - Deployment creation and management
    - Flow run submission with volume mounting
    - Findings retrieval from completed runs
    """

    def __init__(self, workflows_dir: Path = None):
        """
        Initialize the Prefect manager.

        Args:
            workflows_dir: Path to the workflows directory (default: toolbox/workflows)
        """
        if workflows_dir is None:
            workflows_dir = Path("toolbox/workflows")
        self.discovery = WorkflowDiscovery(workflows_dir)
        self.workflows: Dict[str, WorkflowInfo] = {}
        self.deployments: Dict[str, str] = {}  # workflow_name -> deployment_id
        # Security: Define allowed and forbidden paths for host mounting
        self.allowed_base_paths = [
            "/tmp",
            "/home",
            "/Users",  # macOS users
            "/opt",
            "/var/tmp",
            "/workspace",  # Common container workspace
            "/app"  # Container application directory (for test projects)
        ]
        self.forbidden_paths = [
            "/etc",
            "/root",
            "/var/run",
            "/sys",
            "/proc",
            "/dev",
            "/boot",
            "/var/lib/docker",  # Critical Docker data
            "/var/log",  # System logs
            "/usr/bin",  # System binaries
            "/usr/sbin",
            "/sbin",
            "/bin"
        ]

    @staticmethod
    def _parse_memory_to_bytes(memory_str: str) -> int:
        """
        Parse memory string (like '512Mi', '1Gi') to bytes.

        Args:
            memory_str: Memory string with unit suffix

        Returns:
            Memory in bytes

        Raises:
            ValueError: If format is invalid
        """
        if not memory_str:
            return 0
        match = re.match(r'^(\d+(?:\.\d+)?)\s*([GMK]i?)$', memory_str.strip())
        if not match:
            raise ValueError(f"Invalid memory format: {memory_str}. Expected format like '512Mi', '1Gi'")
        value, unit = match.groups()
        value = float(value)
        # Convert to bytes based on unit (binary units: Ki, Mi, Gi)
        if unit in ['K', 'Ki']:
            multiplier = 1024
        elif unit in ['M', 'Mi']:
            multiplier = 1024 * 1024
        elif unit in ['G', 'Gi']:
            multiplier = 1024 * 1024 * 1024
        else:
            raise ValueError(f"Unsupported memory unit: {unit}")
        return int(value * multiplier)

    @staticmethod
    def _parse_cpu_to_millicores(cpu_str: str) -> int:
        """
        Parse CPU string (like '500m', '1', '2.5') to millicores.

        Args:
            cpu_str: CPU string

        Returns:
            CPU in millicores (1 core = 1000 millicores)

        Raises:
            ValueError: If format is invalid
        """
        if not cpu_str:
            return 0
        cpu_str = cpu_str.strip()
        # Handle millicores format (e.g., '500m')
        if cpu_str.endswith('m'):
            try:
                return int(cpu_str[:-1])
            except ValueError:
                raise ValueError(f"Invalid CPU format: {cpu_str}")
        # Handle core format (e.g., '1', '2.5')
        try:
            cores = float(cpu_str)
            return int(cores * 1000)  # Convert to millicores
        except ValueError:
            raise ValueError(f"Invalid CPU format: {cpu_str}")

    def _extract_resource_requirements(self, workflow_info: WorkflowInfo) -> Dict[str, str]:
        """
        Extract resource requirements from workflow metadata.

        Args:
            workflow_info: Workflow information with metadata

        Returns:
            Dictionary with resource requirements in Docker format
        """
        metadata = workflow_info.metadata
        requirements = metadata.get("requirements", {})
        resources = requirements.get("resources", {})
        resource_config = {}
        # Extract memory requirement
        memory = resources.get("memory")
        if memory:
            try:
                # Validate memory format and store original string for Docker
                self._parse_memory_to_bytes(memory)
                resource_config["memory"] = memory
            except ValueError as e:
                logger.warning(f"Invalid memory requirement in {workflow_info.name}: {e}")
        # Extract CPU requirement
        cpu = resources.get("cpu")
        if cpu:
            try:
                # Validate CPU format and store original string for Docker
                self._parse_cpu_to_millicores(cpu)
                resource_config["cpus"] = cpu
            except ValueError as e:
                logger.warning(f"Invalid CPU requirement in {workflow_info.name}: {e}")
        # Extract timeout
        timeout = resources.get("timeout")
        if timeout and isinstance(timeout, int):
            resource_config["timeout"] = str(timeout)
        return resource_config

    async def initialize(self):
        """
        Initialize the manager by discovering and deploying all workflows.

        This method:
        1. Discovers all valid workflows in the workflows directory
        2. Validates their metadata
        3. Deploys each workflow to Prefect with Docker images
        """
        try:
            # Discover workflows
            self.workflows = await self.discovery.discover_workflows()
            if not self.workflows:
                logger.warning("No workflows discovered")
                return
            logger.info(f"Discovered {len(self.workflows)} workflows: {list(self.workflows.keys())}")
            # Deploy each workflow
            for name, info in self.workflows.items():
                try:
                    await self._deploy_workflow(name, info)
                except Exception as e:
                    logger.error(f"Failed to deploy workflow '{name}': {e}")
        except Exception as e:
            logger.error(f"Failed to initialize Prefect manager: {e}")
            raise

    async def _deploy_workflow(self, name: str, info: WorkflowInfo):
        """
        Deploy a single workflow to Prefect with Docker image.

        Args:
            name: Workflow name
            info: Workflow information including metadata and paths
        """
        logger.info(f"Deploying workflow '{name}'...")
        # Get the flow function from registry
        flow_func = self.discovery.get_flow_function(name)
        if not flow_func:
            logger.error(
                f"Failed to get flow function for '{name}' from registry. "
                f"Ensure the workflow is properly registered in toolbox/workflows/registry.py"
            )
            return
        # Use the mandatory Dockerfile with absolute paths for Docker Compose
        # Get absolute paths for build context and dockerfile
        toolbox_path = info.path.parent.parent.resolve()
        dockerfile_abs_path = info.dockerfile.resolve()
        # Calculate relative dockerfile path from toolbox context
        try:
            dockerfile_rel_path = dockerfile_abs_path.relative_to(toolbox_path)
        except ValueError:
            # If relative path fails, use the workflow-specific path
            dockerfile_rel_path = Path("workflows") / name / "Dockerfile"
        # Determine deployment strategy based on Dockerfile presence
        base_image = "prefecthq/prefect:3-python3.11"
        has_custom_dockerfile = info.has_docker and info.dockerfile.exists()
        logger.info(f"=== DEPLOYMENT DEBUG for '{name}' ===")
        logger.info(f"info.has_docker: {info.has_docker}")
        logger.info(f"info.dockerfile: {info.dockerfile}")
        logger.info(f"info.dockerfile.exists(): {info.dockerfile.exists()}")
        logger.info(f"has_custom_dockerfile: {has_custom_dockerfile}")
        logger.info(f"toolbox_path: {toolbox_path}")
        logger.info(f"dockerfile_rel_path: {dockerfile_rel_path}")
        if has_custom_dockerfile:
            logger.info(f"Workflow '{name}' has custom Dockerfile - building custom image")
            # Decide whether to use registry or keep images local to host engine
            import os
            # Default to using the local registry; set FUZZFORGE_USE_REGISTRY=false to bypass (not recommended)
            use_registry = os.getenv("FUZZFORGE_USE_REGISTRY", "true").lower() == "true"
            if use_registry:
                registry_url = get_registry_url(context="push")
                image_spec = DockerImage(
                    name=f"{registry_url}/fuzzforge/{name}",
                    tag="latest",
                    dockerfile=str(dockerfile_rel_path),
                    context=str(toolbox_path)
                )
                deploy_image = f"{registry_url}/fuzzforge/{name}:latest"
                build_custom = True
                push_custom = True
                logger.info(f"Using registry: {registry_url} for '{name}'")
            else:
                # Single-host mode: build into host engine cache; no push required
                image_spec = DockerImage(
                    name=f"fuzzforge/{name}",
                    tag="latest",
                    dockerfile=str(dockerfile_rel_path),
                    context=str(toolbox_path)
                )
                deploy_image = f"fuzzforge/{name}:latest"
                build_custom = True
                push_custom = False
                logger.info("Using single-host image (no registry push): %s", deploy_image)
        else:
            logger.info(f"Workflow '{name}' using base image - no custom dependencies needed")
            deploy_image = base_image
            build_custom = False
            push_custom = False
        # Pre-validate registry connectivity when pushing
        if push_custom:
            try:
                from .setup import validate_registry_connectivity
                await validate_registry_connectivity(registry_url)
                logger.info(f"Registry connectivity validated for {registry_url}")
            except Exception as e:
                logger.error(f"Registry connectivity validation failed for {registry_url}: {e}")
                raise RuntimeError(f"Cannot deploy workflow '{name}': Registry {registry_url} is not accessible. {e}")
        # Deploy the workflow
        try:
            # Ensure any previous deployment is removed so job variables are updated
            try:
                async with get_client() as client:
                    existing = await client.read_deployment_by_name(
                        f"{name}/{name}-deployment"
                    )
                    if existing:
                        logger.info(f"Removing existing deployment for '{name}' to refresh settings...")
                        await client.delete_deployment(existing.id)
            except Exception:
                # If not found or deletion fails, continue with deployment
                pass
            # Extract resource requirements from metadata
            workflow_resource_requirements = self._extract_resource_requirements(info)
            logger.info(f"Workflow '{name}' resource requirements: {workflow_resource_requirements}")
            # Build job variables with resource requirements
            job_variables = {
                "image": deploy_image,  # Use the worker-accessible registry name
                "volumes": [],  # Populated at run submission with toolbox mount
                "env": {
                    "PYTHONPATH": "/opt/prefect/toolbox:/opt/prefect",
                    "WORKFLOW_NAME": name
                }
            }
            # Add resource requirements to job variables if present
            if workflow_resource_requirements:
                job_variables["resources"] = workflow_resource_requirements
            # Prepare deployment parameters
            deploy_params = {
                "name": f"{name}-deployment",
                "work_pool_name": "docker-pool",
                "image": image_spec if has_custom_dockerfile else deploy_image,
                "push": push_custom,
                "build": build_custom,
                "job_variables": job_variables
            }
            deployment = await flow_func.deploy(**deploy_params)
            self.deployments[name] = str(deployment.id) if hasattr(deployment, 'id') else name
            logger.info(f"Successfully deployed workflow '{name}'")
        except Exception as e:
            # Enhanced error reporting with more context
            import traceback
            logger.error(f"Failed to deploy workflow '{name}': {e}")
            logger.error(f"Deployment traceback: {traceback.format_exc()}")
            # Try to capture Docker-specific context
            error_context = {
                "workflow_name": name,
                "has_dockerfile": has_custom_dockerfile,
                "image_name": deploy_image if 'deploy_image' in locals() else "unknown",
                "registry_url": registry_url if 'registry_url' in locals() else "unknown",
                "error_type": type(e).__name__,
                "error_message": str(e)
            }
            # Check for specific error patterns with detailed categorization
            error_msg_lower = str(e).lower()
            if "registry" in error_msg_lower and ("no such host" in error_msg_lower or "connection" in error_msg_lower):
                error_context["category"] = "registry_connectivity_error"
                error_context["solution"] = f"Cannot reach registry at {error_context['registry_url']}. Check Docker network and registry service."
            elif "docker" in error_msg_lower:
                error_context["category"] = "docker_error"
                if "build" in error_msg_lower:
                    error_context["subcategory"] = "image_build_failed"
                    error_context["solution"] = "Check Dockerfile syntax and dependencies."
                elif "pull" in error_msg_lower:
                    error_context["subcategory"] = "image_pull_failed"
                    error_context["solution"] = "Check if image exists in registry and network connectivity."
                elif "push" in error_msg_lower:
                    error_context["subcategory"] = "image_push_failed"
                    error_context["solution"] = f"Check registry connectivity and push permissions to {error_context['registry_url']}."
            elif "registry" in error_msg_lower:
                error_context["category"] = "registry_error"
                error_context["solution"] = "Check registry configuration and accessibility."
            elif "prefect" in error_msg_lower:
                error_context["category"] = "prefect_error"
                error_context["solution"] = "Check Prefect server connectivity and deployment configuration."
            else:
                error_context["category"] = "unknown_deployment_error"
                error_context["solution"] = "Check logs for more specific error details."
            logger.error(f"Deployment error context: {error_context}")
            # Raise enhanced exception with context
            enhanced_error = Exception(f"Deployment failed for workflow '{name}': {str(e)} | Context: {error_context}")
            enhanced_error.original_error = e
            enhanced_error.context = error_context
            raise enhanced_error
async def submit_workflow(
self,
workflow_name: str,
target_path: str,
volume_mode: str = "ro",
parameters: Dict[str, Any] = None,
resource_limits: Dict[str, str] = None,
additional_volumes: list = None,
timeout: int = None
) -> FlowRun:
"""
Submit a workflow for execution with volume mounting.
Args:
workflow_name: Name of the workflow to execute
target_path: Host path to mount as volume
volume_mode: Volume mount mode ("ro" for read-only, "rw" for read-write)
parameters: Workflow-specific parameters
resource_limits: CPU/memory limits for container
additional_volumes: List of additional volume mounts
timeout: Timeout in seconds
Returns:
FlowRun object with run information
Raises:
ValueError: If workflow not found or volume mode not supported
"""
if workflow_name not in self.workflows:
raise ValueError(f"Unknown workflow: {workflow_name}")
# Validate volume mode
workflow_info = self.workflows[workflow_name]
supported_modes = workflow_info.metadata.get("supported_volume_modes", ["ro", "rw"])
if volume_mode not in supported_modes:
raise ValueError(
f"Workflow '{workflow_name}' doesn't support volume mode '{volume_mode}'. "
f"Supported modes: {supported_modes}"
)
# Validate target path with security checks
self._validate_target_path(target_path)
# Validate additional volumes if provided
if additional_volumes:
for volume in additional_volumes:
self._validate_target_path(volume.host_path)
async with get_client() as client:
# Get the deployment, auto-redeploy once if missing
try:
deployment = await client.read_deployment_by_name(
f"{workflow_name}/{workflow_name}-deployment"
)
except Exception as e:
import traceback
logger.error(f"Failed to find deployment for workflow '{workflow_name}': {e}")
logger.error(f"Deployment lookup traceback: {traceback.format_exc()}")
# Attempt a one-time auto-deploy to recover from startup races
try:
logger.info(f"Auto-deploying missing workflow '{workflow_name}' and retrying...")
await self._deploy_workflow(workflow_name, workflow_info)
deployment = await client.read_deployment_by_name(
f"{workflow_name}/{workflow_name}-deployment"
)
except Exception as redeploy_exc:
# Enhanced error with context
error_context = {
"workflow_name": workflow_name,
"error_type": type(e).__name__,
"error_message": str(e),
"redeploy_error": str(redeploy_exc),
"available_deployments": list(self.deployments.keys()),
}
enhanced_error = ValueError(
f"Deployment not found and redeploy failed for workflow '{workflow_name}': {e} | Context: {error_context}"
)
enhanced_error.context = error_context
raise enhanced_error
# Determine the Docker Compose network name and volume names
# Hardcoded to 'fuzzforge' to avoid directory name dependencies
import os
compose_project = "fuzzforge"
docker_network = "fuzzforge_default"
# Build volume mounts
# Add toolbox volume mount for workflow code access
backend_toolbox_path = "/app/toolbox" # Path in backend container
# Hardcoded volume names
prefect_storage_volume = "fuzzforge_prefect_storage"
toolbox_code_volume = "fuzzforge_toolbox_code"
volumes = [
f"{target_path}:/workspace:{volume_mode}",
f"{prefect_storage_volume}:/prefect-storage", # Shared storage for results
f"{toolbox_code_volume}:/opt/prefect/toolbox:ro" # Mount workflow code
]
# Add additional volumes if provided
if additional_volumes:
for volume in additional_volumes:
volume_spec = f"{volume.host_path}:{volume.container_path}:{volume.mode}"
volumes.append(volume_spec)
# Build environment variables
env_vars = {
"PREFECT_API_URL": "http://prefect-server:4200/api", # Use internal network hostname
"PREFECT_LOGGING_LEVEL": "INFO",
"PREFECT_LOCAL_STORAGE_PATH": "/prefect-storage", # Use shared storage
"PREFECT_RESULTS_PERSIST_BY_DEFAULT": "true", # Enable result persistence
"PREFECT_DEFAULT_RESULT_STORAGE_BLOCK": "local-file-system/fuzzforge-results", # Use our storage block
"WORKSPACE_PATH": "/workspace",
"VOLUME_MODE": volume_mode,
"WORKFLOW_NAME": workflow_name
}
# Add additional volume paths to environment for easy access
if additional_volumes:
for i, volume in enumerate(additional_volumes):
env_vars[f"ADDITIONAL_VOLUME_{i}_PATH"] = volume.container_path
# Determine which image to use based on workflow configuration
workflow_info = self.workflows[workflow_name]
has_custom_dockerfile = workflow_info.has_docker and workflow_info.dockerfile.exists()
# Use pull context for worker to pull from registry
registry_url = get_registry_url(context="pull")
workflow_image = f"{registry_url}/fuzzforge/{workflow_name}:latest" if has_custom_dockerfile else "prefecthq/prefect:3-python3.11"
logger.debug(f"Worker will pull image: {workflow_image} (Registry: {registry_url})")
# Configure job variables with volume mounting and network access
job_variables = {
# Use custom image if available, otherwise base Prefect image
"image": workflow_image,
"volumes": volumes,
"networks": [docker_network], # Connect to Docker Compose network
"env": {
**env_vars,
"PYTHONPATH": "/opt/prefect/toolbox:/opt/prefect/toolbox/workflows",
"WORKFLOW_NAME": workflow_name
}
}
# Apply resource requirements from workflow metadata and user overrides
workflow_resource_requirements = self._extract_resource_requirements(workflow_info)
final_resource_config = {}
# Start with workflow requirements as base
if workflow_resource_requirements:
final_resource_config.update(workflow_resource_requirements)
# Apply user-provided resource limits (overrides workflow defaults)
if resource_limits:
user_resource_config = {}
if resource_limits.get("cpu_limit"):
user_resource_config["cpus"] = resource_limits["cpu_limit"]
if resource_limits.get("memory_limit"):
user_resource_config["memory"] = resource_limits["memory_limit"]
# Note: cpu_request and memory_request are not directly supported by Docker
# but could be used for Kubernetes in the future
# User overrides take precedence
final_resource_config.update(user_resource_config)
# Apply final resource configuration
if final_resource_config:
job_variables["resources"] = final_resource_config
logger.info(f"Applied resource limits: {final_resource_config}")
# Merge parameters with defaults from metadata
default_params = workflow_info.metadata.get("default_parameters", {})
final_params = {**default_params, **(parameters or {})}
# Set flow parameters that match the flow signature
final_params["target_path"] = "/workspace" # Container path where volume is mounted
final_params["volume_mode"] = volume_mode
# Create and submit the flow run
# Pass job_variables to ensure network, volumes, and environment are configured
logger.info(f"Submitting flow with job_variables: {job_variables}")
logger.info(f"Submitting flow with parameters: {final_params}")
# Prepare flow run creation parameters
flow_run_params = {
"deployment_id": deployment.id,
"parameters": final_params,
"job_variables": job_variables
}
# Note: Timeout is handled through workflow-level configuration
# Additional timeout configuration can be added to deployment metadata if needed
flow_run = await client.create_flow_run_from_deployment(**flow_run_params)
logger.info(
f"Submitted workflow '{workflow_name}' with run_id: {flow_run.id}, "
f"target: {target_path}, mode: {volume_mode}"
)
return flow_run
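The override order used in the removed submission code (workflow requirements first, then user-supplied limits on top) can be sketched in isolation. This is a minimal illustration, not the project's code: `merge_resource_config` is a hypothetical helper, and the key names simply mirror the ones in the snippet above.

```python
from typing import Any, Dict, Optional


def merge_resource_config(
    workflow_requirements: Optional[Dict[str, Any]],
    resource_limits: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Hypothetical helper: workflow defaults first, user limits override."""
    final: Dict[str, Any] = dict(workflow_requirements or {})
    limits = resource_limits or {}
    # Map the user-facing limit names onto the Docker-style keys.
    if limits.get("cpu_limit"):
        final["cpus"] = limits["cpu_limit"]
    if limits.get("memory_limit"):
        final["memory"] = limits["memory_limit"]
    return final
```

A user who sets only `cpu_limit` keeps the workflow's memory default, which matches the "user overrides take precedence" comment in the original.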
async def get_flow_run_findings(self, run_id: str) -> Dict[str, Any]:
"""
Retrieve findings from a completed flow run.
Args:
run_id: The flow run ID
Returns:
Dictionary containing SARIF-formatted findings
Raises:
ValueError: If run not completed or not found
"""
async with get_client() as client:
flow_run = await client.read_flow_run(run_id)
if not flow_run.state.is_completed():
raise ValueError(
f"Flow run {run_id} not completed. Current status: {flow_run.state.name}"
)
# Get the findings from the flow run result
try:
findings = await flow_run.state.result()
return findings
except Exception as e:
logger.error(f"Failed to retrieve findings for run {run_id}: {e}")
raise ValueError(f"Failed to retrieve findings: {e}")
async def get_flow_run_status(self, run_id: str) -> Dict[str, Any]:
"""
Get the current status of a flow run.
Args:
run_id: The flow run ID
Returns:
Dictionary with status information
"""
async with get_client() as client:
flow_run = await client.read_flow_run(run_id)
return {
"run_id": str(flow_run.id),
"workflow": flow_run.deployment_id,
"status": flow_run.state.name,
"is_completed": flow_run.state.is_completed(),
"is_failed": flow_run.state.is_failed(),
"is_running": flow_run.state.is_running(),
"created_at": flow_run.created,
"updated_at": flow_run.updated
}
def _validate_target_path(self, target_path: str) -> None:
"""
Validate target path for security before mounting as volume.
Args:
target_path: Host path to validate
Raises:
ValueError: If path is not allowed for security reasons
"""
target = Path(target_path)
# Path must be absolute
if not target.is_absolute():
raise ValueError(f"Target path must be absolute: {target_path}")
# Resolve path to handle symlinks and relative components
try:
resolved_path = target.resolve()
except (OSError, RuntimeError) as e:
raise ValueError(f"Cannot resolve target path: {target_path} - {e}")
resolved_str = str(resolved_path)
# Check against forbidden paths first (more restrictive)
for forbidden in self.forbidden_paths:
if resolved_str.startswith(forbidden):
raise ValueError(
f"Access denied: Path '{target_path}' resolves to forbidden directory '{forbidden}'. "
f"This path contains sensitive system files and cannot be mounted."
)
# Check if path starts with any allowed base path
path_allowed = False
for allowed in self.allowed_base_paths:
if resolved_str.startswith(allowed):
path_allowed = True
break
if not path_allowed:
allowed_list = ", ".join(self.allowed_base_paths)
raise ValueError(
f"Access denied: Path '{target_path}' is not in allowed directories. "
f"Allowed base paths: {allowed_list}"
)
# Additional security checks
if resolved_str == "/":
raise ValueError("Cannot mount root filesystem")
# Warn if path doesn't exist (but don't block - it might be created later)
if not resolved_path.exists():
logger.warning(f"Target path does not exist: {target_path}")
logger.info(f"Path validation passed for: {target_path} -> {resolved_str}")
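A note on the prefix checks in `_validate_target_path`: a plain `str.startswith` also matches sibling directories, so `/home/user2` passes a check against `/home/user`. Comparing whole path components avoids that. The sketch below is illustrative only; `is_within` is a hypothetical helper that is not part of the removed module, and it assumes both paths are already absolute and resolved.

```python
from pathlib import PurePosixPath


def is_within(path: str, base: str) -> bool:
    """Hypothetical helper: True if `path` equals `base` or lies beneath it."""
    path_parts = PurePosixPath(path).parts
    base_parts = PurePosixPath(base).parts
    # Compare whole path components, so "/home/user2" is not inside "/home/user".
    return path_parts[: len(base_parts)] == base_parts
```

The same helper could serve both the forbidden-path and the allowed-path loops, since each only asks whether the resolved path sits under a given base.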
+10 -367
@@ -1,5 +1,5 @@
"""
Setup utilities for Prefect infrastructure
Setup utilities for FuzzForge infrastructure
"""
# Copyright (c) 2025 FuzzingLabs
@@ -14,364 +14,21 @@ Setup utilities for Prefect infrastructure
# Additional attribution and requirements are provided in the NOTICE file.
import logging
from prefect import get_client
from prefect.client.schemas.actions import WorkPoolCreate
from prefect.client.schemas.objects import WorkPool
from .prefect_manager import get_registry_url
logger = logging.getLogger(__name__)
async def setup_docker_pool():
"""
Create or update the Docker work pool for container execution.
This work pool is configured to:
- Connect to the local Docker daemon
- Support volume mounting at runtime
- Clean up containers after execution
- Use bridge networking by default
"""
import os
async with get_client() as client:
pool_name = "docker-pool"
# Add force recreation flag for debugging fresh install issues
force_recreate = os.getenv('FORCE_RECREATE_WORK_POOL', 'false').lower() == 'true'
debug_setup = os.getenv('DEBUG_WORK_POOL_SETUP', 'false').lower() == 'true'
if force_recreate:
logger.warning("FORCE_RECREATE_WORK_POOL=true - Will recreate work pool regardless of existing configuration")
if debug_setup:
logger.warning("DEBUG_WORK_POOL_SETUP=true - Enhanced logging enabled")
# Temporarily set logging level to DEBUG for this function
original_level = logger.level
logger.setLevel(logging.DEBUG)
try:
# Check if pool already exists and supports custom images
existing_pools = await client.read_work_pools()
existing_pool = None
for pool in existing_pools:
if pool.name == pool_name:
existing_pool = pool
break
if existing_pool and not force_recreate:
logger.info(f"Found existing work pool '{pool_name}' - validating configuration...")
# Check if the existing pool has the correct configuration
base_template = existing_pool.base_job_template or {}
logger.debug(f"Base template keys: {list(base_template.keys())}")
job_config = base_template.get("job_configuration", {})
logger.debug(f"Job config keys: {list(job_config.keys())}")
image_config = job_config.get("image", "")
has_image_variable = "{{ image }}" in str(image_config)
logger.debug(f"Image config: '{image_config}' -> has_image_variable: {has_image_variable}")
# Check if volume defaults include toolbox mount
variables = base_template.get("variables", {})
properties = variables.get("properties", {})
volume_config = properties.get("volumes", {})
volume_defaults = volume_config.get("default", [])
has_toolbox_volume = any("toolbox_code" in str(vol) for vol in volume_defaults) if volume_defaults else False
logger.debug(f"Volume defaults: {volume_defaults}")
logger.debug(f"Has toolbox volume: {has_toolbox_volume}")
# Check if environment defaults include required settings
env_config = properties.get("env", {})
env_defaults = env_config.get("default", {})
has_api_url = "PREFECT_API_URL" in env_defaults
has_storage_path = "PREFECT_LOCAL_STORAGE_PATH" in env_defaults
has_results_persist = "PREFECT_RESULTS_PERSIST_BY_DEFAULT" in env_defaults
has_required_env = has_api_url and has_storage_path and has_results_persist
logger.debug(f"Environment defaults: {env_defaults}")
logger.debug(f"Has API URL: {has_api_url}, Has storage path: {has_storage_path}, Has results persist: {has_results_persist}")
logger.debug(f"Has required env: {has_required_env}")
# Log the full validation result
logger.info(f"Work pool validation - Image: {has_image_variable}, Toolbox: {has_toolbox_volume}, Environment: {has_required_env}")
if has_image_variable and has_toolbox_volume and has_required_env:
logger.info(f"Docker work pool '{pool_name}' already exists with correct configuration")
return
else:
reasons = []
if not has_image_variable:
reasons.append("missing image template")
if not has_toolbox_volume:
reasons.append("missing toolbox volume mount")
if not has_required_env:
if not has_api_url:
reasons.append("missing PREFECT_API_URL")
if not has_storage_path:
reasons.append("missing PREFECT_LOCAL_STORAGE_PATH")
if not has_results_persist:
reasons.append("missing PREFECT_RESULTS_PERSIST_BY_DEFAULT")
logger.warning(f"Docker work pool '{pool_name}' exists but lacks: {', '.join(reasons)}. Recreating...")
# Delete the old pool and recreate it
try:
await client.delete_work_pool(pool_name)
logger.info(f"Deleted old work pool '{pool_name}'")
except Exception as e:
logger.warning(f"Failed to delete old work pool: {e}")
elif force_recreate and existing_pool:
logger.warning(f"Force recreation enabled - deleting existing work pool '{pool_name}'")
try:
await client.delete_work_pool(pool_name)
logger.info("Deleted existing work pool for force recreation")
except Exception as e:
logger.warning(f"Failed to delete work pool for force recreation: {e}")
logger.info(f"Creating Docker work pool '{pool_name}' with custom image support...")
# Create the work pool with proper Docker configuration
work_pool = WorkPoolCreate(
name=pool_name,
type="docker",
description="Docker work pool for FuzzForge workflows with custom image support",
base_job_template={
"job_configuration": {
"image": "{{ image }}", # Template variable for custom images
"volumes": "{{ volumes }}", # List of volume mounts
"env": "{{ env }}", # Environment variables
"networks": "{{ networks }}", # Docker networks
"stream_output": True,
"auto_remove": True,
"privileged": False,
"network_mode": None, # Use networks instead
"labels": {},
"command": None # Let the image's CMD/ENTRYPOINT run
},
"variables": {
"type": "object",
"properties": {
"image": {
"type": "string",
"title": "Docker Image",
"default": "prefecthq/prefect:3-python3.11",
"description": "Docker image for the flow run"
},
"volumes": {
"type": "array",
"title": "Volume Mounts",
"default": [
"fuzzforge_prefect_storage:/prefect-storage",
"fuzzforge_toolbox_code:/opt/prefect/toolbox:ro"
],
"description": "Volume mounts in format 'host:container:mode'",
"items": {
"type": "string"
}
},
"networks": {
"type": "array",
"title": "Docker Networks",
"default": ["fuzzforge_default"],
"description": "Docker networks to connect container to",
"items": {
"type": "string"
}
},
"env": {
"type": "object",
"title": "Environment Variables",
"default": {
"PREFECT_API_URL": "http://prefect-server:4200/api",
"PREFECT_LOCAL_STORAGE_PATH": "/prefect-storage",
"PREFECT_RESULTS_PERSIST_BY_DEFAULT": "true"
},
"description": "Environment variables for the container",
"additionalProperties": {
"type": "string"
}
}
}
}
}
)
await client.create_work_pool(work_pool)
logger.info(f"Created Docker work pool '{pool_name}'")
except Exception as e:
logger.error(f"Failed to setup Docker work pool: {e}")
raise
finally:
# Restore original logging level if debug mode was enabled
if debug_setup and 'original_level' in locals():
logger.setLevel(original_level)
def get_actual_compose_project_name():
"""
Return the hardcoded compose project name for FuzzForge.
Always returns 'fuzzforge' as per system requirements.
"""
logger.info("Using hardcoded compose project name: fuzzforge")
return "fuzzforge"
async def setup_result_storage():
"""
Create or update Prefect result storage block for findings persistence.
Setup result storage (MinIO).
This sets up a LocalFileSystem storage block pointing to the shared
/prefect-storage volume for result persistence.
MinIO is used for both target upload and result storage.
This is a placeholder for any MinIO-specific setup if needed.
"""
from prefect.filesystems import LocalFileSystem
storage_name = "fuzzforge-results"
try:
# Create the storage block, overwrite if it exists
logger.info(f"Setting up storage block '{storage_name}'...")
storage = LocalFileSystem(basepath="/prefect-storage")
block_doc_id = await storage.save(name=storage_name, overwrite=True)
logger.info(f"Storage block '{storage_name}' configured successfully")
return str(block_doc_id)
except Exception as e:
logger.error(f"Failed to setup result storage: {e}")
# Don't raise the exception - continue without storage block
logger.warning("Continuing without result storage block - findings may not persist")
return None
async def validate_docker_connection():
"""
Validate that Docker is accessible and running.
Note: In containerized deployments with Docker socket proxy,
the backend doesn't need direct Docker access.
Raises:
RuntimeError: If Docker is not accessible
"""
import os
# Skip Docker validation if running in container without socket access
if os.path.exists("/.dockerenv") and not os.path.exists("/var/run/docker.sock"):
logger.info("Running in container without Docker socket - skipping Docker validation")
return
try:
import docker
client = docker.from_env()
client.ping()
logger.info("Docker connection validated")
except Exception as e:
logger.error(f"Docker is not accessible: {e}")
raise RuntimeError(
"Docker is not running or not accessible. "
"Please ensure Docker is installed and running."
)
async def validate_registry_connectivity(registry_url: str = None):
"""
Validate that the Docker registry is accessible.
Args:
registry_url: URL of the Docker registry to validate (auto-detected if None)
Raises:
RuntimeError: If registry is not accessible
"""
# Resolve a reachable test URL from within this process
if registry_url is None:
# If not specified, prefer internal service name in containers, host port on host
import os
if os.path.exists('/.dockerenv'):
registry_url = "registry:5000"
else:
registry_url = "localhost:5001"
# If we're running inside a container and asked to probe localhost:PORT,
# the probe would hit the container, not the host. Use host.docker.internal instead.
import os
try:
host_part, port_part = registry_url.split(":", 1)
except ValueError:
host_part, port_part = registry_url, "80"
if os.path.exists('/.dockerenv') and host_part in ("localhost", "127.0.0.1"):
test_host = "host.docker.internal"
else:
test_host = host_part
test_url = f"http://{test_host}:{port_part}/v2/"
import aiohttp
import asyncio
logger.info(f"Validating registry connectivity to {registry_url}...")
try:
async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=10)) as session:
async with session.get(test_url) as response:
if response.status == 200:
logger.info(f"Registry at {registry_url} is accessible (tested via {test_host})")
return
else:
raise RuntimeError(f"Registry returned status {response.status}")
except asyncio.TimeoutError:
raise RuntimeError(f"Registry at {registry_url} is not responding (timeout)")
except aiohttp.ClientError as e:
raise RuntimeError(f"Registry at {registry_url} is not accessible: {e}")
except Exception as e:
raise RuntimeError(f"Failed to validate registry connectivity: {e}")
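The host-rewriting step in `validate_registry_connectivity` is easier to follow as a pure function. A minimal sketch, assuming the same defaults as the removed code; `build_probe_url` and its `in_container` flag are hypothetical names introduced for illustration (the original detects a container by checking for `/.dockerenv`).

```python
def build_probe_url(registry_url: str, in_container: bool) -> str:
    """Hypothetical helper: derive the /v2/ URL to probe for a registry."""
    host, sep, port = registry_url.partition(":")
    if not sep:
        port = "80"  # no explicit port given
    # Inside a container, localhost is the container itself, not the host.
    if in_container and host in ("localhost", "127.0.0.1"):
        host = "host.docker.internal"
    return f"http://{host}:{port}/v2/"
```

Passing the container flag in as an argument keeps the URL logic testable without touching the filesystem.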
async def validate_docker_network(network_name: str):
"""
Validate that the specified Docker network exists.
Args:
network_name: Name of the Docker network to validate
Raises:
RuntimeError: If network doesn't exist
"""
import os
# Skip network validation if running in container without Docker socket
if os.path.exists("/.dockerenv") and not os.path.exists("/var/run/docker.sock"):
logger.info("Running in container without Docker socket - skipping network validation")
return
try:
import docker
client = docker.from_env()
# List all networks
networks = client.networks.list(names=[network_name])
if not networks:
# Try to find networks with similar names
all_networks = client.networks.list()
similar_networks = [n.name for n in all_networks if "fuzzforge" in n.name.lower()]
error_msg = f"Docker network '{network_name}' not found."
if similar_networks:
error_msg += f" Available networks: {similar_networks}"
else:
error_msg += " Please ensure Docker Compose is running."
raise RuntimeError(error_msg)
logger.info(f"Docker network '{network_name}' validated")
except Exception as e:
if isinstance(e, RuntimeError):
raise
logger.error(f"Network validation failed: {e}")
raise RuntimeError(f"Failed to validate Docker network: {e}")
logger.info("Result storage (MinIO) configured")
# MinIO is configured via environment variables in docker-compose
# No additional setup needed here
return True
async def validate_infrastructure():
@@ -382,21 +39,7 @@ async def validate_infrastructure():
"""
logger.info("Validating infrastructure...")
# Validate Docker connection
await validate_docker_connection()
# Validate registry connectivity for custom image building
await validate_registry_connectivity()
# Validate network (hardcoded to avoid directory name dependencies)
import os
compose_project = "fuzzforge"
docker_network = "fuzzforge_default"
try:
await validate_docker_network(docker_network)
except RuntimeError as e:
logger.warning(f"Network validation failed: {e}")
logger.warning("Workflows may not be able to connect to Prefect services")
# Setup storage (MinIO)
await setup_result_storage()
logger.info("Infrastructure validation completed")
-459
@@ -1,459 +0,0 @@
"""
Workflow Discovery - Registry-based discovery and loading of workflows
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import logging
import yaml
from pathlib import Path
from typing import Dict, Optional, Any, Callable
from pydantic import BaseModel, Field, ConfigDict
logger = logging.getLogger(__name__)
class WorkflowInfo(BaseModel):
"""Information about a discovered workflow"""
name: str = Field(..., description="Workflow name")
path: Path = Field(..., description="Path to workflow directory")
workflow_file: Path = Field(..., description="Path to workflow.py file")
dockerfile: Path = Field(..., description="Path to Dockerfile")
has_docker: bool = Field(..., description="Whether workflow has custom Dockerfile")
metadata: Dict[str, Any] = Field(..., description="Workflow metadata from YAML")
flow_function_name: str = Field(default="main_flow", description="Name of the flow function")
model_config = ConfigDict(arbitrary_types_allowed=True)
class WorkflowDiscovery:
"""
Discovers workflows from the filesystem and validates them against the registry.
This system:
1. Scans for workflows with metadata.yaml files
2. Cross-references them with the manual registry
3. Provides registry-based flow functions for deployment
Workflows must have:
- workflow.py: Contains the Prefect flow
- metadata.yaml: Mandatory metadata file
- Entry in toolbox/workflows/registry.py: Manual registration
- Dockerfile: Mandatory container definition (enforced in _load_workflow)
- requirements.txt (optional): Python dependencies
"""
def __init__(self, workflows_dir: Path):
"""
Initialize workflow discovery.
Args:
workflows_dir: Path to the workflows directory
"""
self.workflows_dir = workflows_dir
if not self.workflows_dir.exists():
self.workflows_dir.mkdir(parents=True, exist_ok=True)
logger.info(f"Created workflows directory: {self.workflows_dir}")
# Import registry - this validates it on import
try:
from toolbox.workflows.registry import WORKFLOW_REGISTRY, list_registered_workflows
self.registry = WORKFLOW_REGISTRY
logger.info(f"Loaded workflow registry with {len(self.registry)} registered workflows")
except ImportError as e:
logger.error(f"Failed to import workflow registry: {e}")
self.registry = {}
except Exception as e:
logger.error(f"Registry validation failed: {e}")
self.registry = {}
# Cache for discovered workflows
self._workflow_cache: Optional[Dict[str, WorkflowInfo]] = None
self._cache_timestamp: Optional[float] = None
self._cache_ttl = 60.0 # Cache TTL in seconds
async def discover_workflows(self) -> Dict[str, WorkflowInfo]:
"""
Discover workflows by cross-referencing filesystem with registry.
Uses caching to avoid frequent filesystem scans.
Returns:
Dictionary mapping workflow names to their information
"""
# Check cache validity
import time
current_time = time.time()
if (self._workflow_cache is not None and
self._cache_timestamp is not None and
(current_time - self._cache_timestamp) < self._cache_ttl):
# Return cached results
logger.debug(f"Returning cached workflow discovery ({len(self._workflow_cache)} workflows)")
return self._workflow_cache
workflows = {}
discovered_dirs = set()
registry_names = set(self.registry.keys())
if not self.workflows_dir.exists():
logger.warning(f"Workflows directory does not exist: {self.workflows_dir}")
return workflows
# Recursively scan all directories and subdirectories
await self._scan_directory_recursive(self.workflows_dir, workflows, discovered_dirs)
# Check for registry entries without corresponding directories
missing_dirs = registry_names - discovered_dirs
if missing_dirs:
logger.warning(
f"Registry contains workflows without filesystem directories: {missing_dirs}. "
f"These workflows cannot be deployed."
)
logger.info(
f"Discovery complete: {len(workflows)} workflows ready for deployment, "
f"{len(missing_dirs)} registry entries missing directories, "
f"{len(discovered_dirs - registry_names)} filesystem workflows not registered"
)
# Update cache
self._workflow_cache = workflows
self._cache_timestamp = current_time
return workflows
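The time-based caching in `discover_workflows` follows a common pattern that can be shown on its own. This is a sketch under stated assumptions, not the removed implementation: `TTLCache` is a hypothetical class, and it uses `time.monotonic` with an injectable clock where the original calls `time.time()` directly.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Hypothetical helper: cache one computed value for `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._value: Optional[Any] = None
        self._stamp: Optional[float] = None

    def get(self, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        fresh = self._stamp is not None and (now - self._stamp) < self._ttl
        if self._value is None or not fresh:
            # Cache miss or expired entry: recompute and restamp.
            self._value = compute()
            self._stamp = now
        return self._value

    def invalidate(self) -> None:
        # Mirrors invalidate_cache(): drop both the value and its timestamp.
        self._value = None
        self._stamp = None
```

A monotonic clock is unaffected by wall-clock adjustments, which is the reason for the swap; the injectable `clock` only exists so expiry can be tested without sleeping.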
async def _scan_directory_recursive(self, directory: Path, workflows: Dict[str, WorkflowInfo], discovered_dirs: set):
"""
Recursively scan directory for workflows.
Args:
directory: Directory to scan
workflows: Dictionary to populate with discovered workflows
discovered_dirs: Set to track discovered workflow names
"""
for item in directory.iterdir():
if not item.is_dir():
continue
if item.name.startswith('_') or item.name.startswith('.'):
continue # Skip hidden or private directories
# Check if this directory contains workflow files (workflow.py and metadata.yaml)
workflow_file = item / "workflow.py"
metadata_file = item / "metadata.yaml"
if workflow_file.exists() and metadata_file.exists():
# This is a workflow directory
workflow_name = item.name
discovered_dirs.add(workflow_name)
# Only process workflows that are in the registry
if workflow_name not in self.registry:
logger.warning(
f"Workflow '{workflow_name}' found in filesystem but not in registry. "
f"Add it to toolbox/workflows/registry.py to enable deployment."
)
continue
try:
workflow_info = await self._load_workflow(item)
if workflow_info:
workflows[workflow_info.name] = workflow_info
logger.info(f"Discovered and registered workflow: {workflow_info.name}")
except Exception as e:
logger.error(f"Failed to load workflow from {item}: {e}")
else:
# This is a category directory, recurse into it
await self._scan_directory_recursive(item, workflows, discovered_dirs)
async def _load_workflow(self, workflow_dir: Path) -> Optional[WorkflowInfo]:
"""
Load and validate a single workflow.
Args:
workflow_dir: Path to the workflow directory
Returns:
WorkflowInfo if valid, None otherwise
"""
workflow_name = workflow_dir.name
# Check for mandatory files
workflow_file = workflow_dir / "workflow.py"
metadata_file = workflow_dir / "metadata.yaml"
if not workflow_file.exists():
logger.warning(f"Workflow {workflow_name} missing workflow.py")
return None
if not metadata_file.exists():
logger.error(f"Workflow {workflow_name} missing mandatory metadata.yaml")
return None
# Load and validate metadata
try:
metadata = self._load_metadata(metadata_file)
if not self._validate_metadata(metadata, workflow_name):
return None
except Exception as e:
logger.error(f"Failed to load metadata for {workflow_name}: {e}")
return None
# Check for mandatory Dockerfile
dockerfile = workflow_dir / "Dockerfile"
if not dockerfile.exists():
logger.error(f"Workflow {workflow_name} missing mandatory Dockerfile")
return None
has_docker = True # Always True since Dockerfile is mandatory
# Get flow function name from metadata or use default
flow_function_name = metadata.get("flow_function", "main_flow")
return WorkflowInfo(
name=workflow_name,
path=workflow_dir,
workflow_file=workflow_file,
dockerfile=dockerfile,
has_docker=has_docker,
metadata=metadata,
flow_function_name=flow_function_name
)
def _load_metadata(self, metadata_file: Path) -> Dict[str, Any]:
"""
Load metadata from YAML file.
Args:
metadata_file: Path to metadata.yaml
Returns:
Dictionary containing metadata
"""
with open(metadata_file, 'r') as f:
metadata = yaml.safe_load(f)
if metadata is None:
raise ValueError("Empty metadata file")
return metadata
def _validate_metadata(self, metadata: Dict[str, Any], workflow_name: str) -> bool:
"""
Validate that metadata contains all required fields.
Args:
metadata: Metadata dictionary
workflow_name: Name of the workflow for logging
Returns:
True if valid, False otherwise
"""
required_fields = ["name", "version", "description", "author", "category", "parameters", "requirements"]
missing_fields = []
for field in required_fields:
if field not in metadata:
missing_fields.append(field)
if missing_fields:
logger.error(
f"Workflow {workflow_name} metadata missing required fields: {missing_fields}"
)
return False
# Validate version format (semantic versioning)
version = metadata.get("version", "")
if not self._is_valid_version(version):
logger.error(f"Workflow {workflow_name} has invalid version format: {version}")
return False
# Validate parameters structure
parameters = metadata.get("parameters", {})
if not isinstance(parameters, dict):
logger.error(f"Workflow {workflow_name} parameters must be a dictionary")
return False
return True
def _is_valid_version(self, version: str) -> bool:
"""
Check if version follows semantic versioning (x.y.z).
Args:
version: Version string
Returns:
True if valid semantic version
"""
try:
parts = version.split('.')
if len(parts) != 3:
return False
for part in parts:
int(part) # Check if each part is a number
return True
except (ValueError, AttributeError):
return False
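One caveat with the `int(part)` test in `_is_valid_version`: `int()` also accepts strings such as `"+1"`, `" 1"`, and `"1_0"`, so it is looser than the `^\d+\.\d+\.\d+$` pattern declared in the metadata schema. A stricter check could look like the sketch below; `is_strict_version` is a hypothetical name, and restricting the match to ASCII digits is an assumption about the intended format.

```python
import re

# Plain x.y.z with ASCII digits only; no signs, whitespace, or underscores.
_VERSION_RE = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+")


def is_strict_version(version: object) -> bool:
    """Hypothetical helper: accept only strings of the form x.y.z."""
    return isinstance(version, str) and _VERSION_RE.fullmatch(version) is not None
```

The `isinstance` guard also covers YAML values such as `version: 1.0`, which load as floats and not strings.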
def invalidate_cache(self) -> None:
"""
Invalidate the workflow discovery cache.
Useful when workflows are added or modified.
"""
self._workflow_cache = None
self._cache_timestamp = None
logger.debug("Workflow discovery cache invalidated")
def get_flow_function(self, workflow_name: str) -> Optional[Callable]:
"""
Get the flow function from the registry.
Args:
workflow_name: Name of the workflow
Returns:
The flow function if found in registry, None otherwise
"""
if workflow_name not in self.registry:
logger.error(
f"Workflow '{workflow_name}' not found in registry. "
f"Available workflows: {list(self.registry.keys())}"
)
return None
try:
from toolbox.workflows.registry import get_workflow_flow
flow_func = get_workflow_flow(workflow_name)
logger.debug(f"Retrieved flow function for '{workflow_name}' from registry")
return flow_func
except Exception as e:
logger.error(f"Failed to get flow function for '{workflow_name}': {e}")
return None
def get_registry_info(self, workflow_name: str) -> Optional[Dict[str, Any]]:
"""
Get registry information for a workflow.
Args:
workflow_name: Name of the workflow
Returns:
Registry information if found, None otherwise
"""
if workflow_name not in self.registry:
return None
try:
from toolbox.workflows.registry import get_workflow_info
return get_workflow_info(workflow_name)
except Exception as e:
logger.error(f"Failed to get registry info for '{workflow_name}': {e}")
return None
@staticmethod
def get_metadata_schema() -> Dict[str, Any]:
"""
Get the JSON schema for workflow metadata.
Returns:
JSON schema dictionary
"""
return {
"type": "object",
"required": ["name", "version", "description", "author", "category", "parameters", "requirements"],
"properties": {
"name": {
"type": "string",
"description": "Workflow name"
},
"version": {
"type": "string",
"pattern": "^\\d+\\.\\d+\\.\\d+$",
"description": "Semantic version (x.y.z)"
},
"description": {
"type": "string",
"description": "Workflow description"
},
"author": {
"type": "string",
"description": "Workflow author"
},
"category": {
"type": "string",
"enum": ["comprehensive", "specialized", "fuzzing", "focused"],
"description": "Workflow category"
},
"tags": {
"type": "array",
"items": {"type": "string"},
"description": "Workflow tags for categorization"
},
"requirements": {
"type": "object",
"required": ["tools", "resources"],
"properties": {
"tools": {
"type": "array",
"items": {"type": "string"},
"description": "Required security tools"
},
"resources": {
"type": "object",
"required": ["memory", "cpu", "timeout"],
"properties": {
"memory": {
"type": "string",
"pattern": "^\\d+[GMK]i$",
"description": "Memory limit (e.g., 1Gi, 512Mi)"
},
"cpu": {
"type": "string",
"pattern": "^\\d+m?$",
"description": "CPU limit (e.g., 1000m, 2)"
},
"timeout": {
"type": "integer",
"minimum": 60,
"maximum": 7200,
"description": "Workflow timeout in seconds"
}
}
}
}
},
"parameters": {
"type": "object",
"description": "Workflow parameters schema"
},
"default_parameters": {
"type": "object",
"description": "Default parameter values"
},
"required_modules": {
"type": "array",
"items": {"type": "string"},
"description": "Required module names"
},
"supported_volume_modes": {
"type": "array",
"items": {"enum": ["ro", "rw"]},
"default": ["ro", "rw"],
"description": "Supported volume mount modes"
},
"flow_function": {
"type": "string",
"default": "main_flow",
"description": "Name of the flow function in workflow.py"
}
}
}
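The `resources` constraints in the schema above can be exercised with the standard library alone. A minimal sketch: `check_resources` is a hypothetical helper, and it covers only the three `resources` fields, not the full metadata schema.

```python
import re
from typing import Any, Dict, List

MEMORY_RE = re.compile(r"^\d+[GMK]i$")  # pattern copied from the schema
CPU_RE = re.compile(r"^\d+m?$")         # pattern copied from the schema


def check_resources(resources: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of problems (empty if valid)."""
    problems: List[str] = []
    for key in ("memory", "cpu", "timeout"):
        if key not in resources:
            problems.append(f"missing {key}")
    memory = resources.get("memory")
    if memory is not None and not (isinstance(memory, str) and MEMORY_RE.fullmatch(memory)):
        problems.append(f"invalid memory: {memory!r}")
    cpu = resources.get("cpu")
    if cpu is not None and not (isinstance(cpu, str) and CPU_RE.fullmatch(cpu)):
        problems.append(f"invalid cpu: {cpu!r}")
    timeout = resources.get("timeout")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if timeout is not None and not (
        isinstance(timeout, int) and not isinstance(timeout, bool) and 60 <= timeout <= 7200
    ):
        problems.append(f"invalid timeout: {timeout!r}")
    return problems
```

For example, `{"memory": "512MB", "cpu": "2", "timeout": 30}` fails on two counts: the memory suffix must be `Ki`, `Mi`, or `Gi`, and the timeout is below the 60-second minimum.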