mirror of
https://github.com/FuzzingLabs/fuzzforge_ai.git
synced 2026-05-21 10:16:48 +02:00
60ca088ecf
* feat: Complete migration from Prefect to Temporal

  BREAKING CHANGE: Replaces Prefect workflow orchestration with Temporal

  ## Major Changes
  - Replace Prefect with Temporal for workflow orchestration
  - Implement vertical worker architecture (rust, android)
  - Replace Docker registry with MinIO for unified storage
  - Refactor activities to be co-located with workflows
  - Update all API endpoints for Temporal compatibility

  ## Infrastructure
  - New: docker-compose.temporal.yaml (Temporal + MinIO + workers)
  - New: workers/ directory with rust and android vertical workers
  - New: backend/src/temporal/ (manager, discovery)
  - New: backend/src/storage/ (S3-cached storage with MinIO)
  - New: backend/toolbox/common/ (shared storage activities)
  - Deleted: docker-compose.yaml (old Prefect setup)
  - Deleted: backend/src/core/prefect_manager.py
  - Deleted: backend/src/services/prefect_stats_monitor.py
  - Deleted: Docker registry and insecure-registries requirement

  ## Workflows
  - Migrated: security_assessment workflow to Temporal
  - New: rust_test workflow (example/test workflow)
  - Deleted: secret_detection_scan (Prefect-based, to be reimplemented)
  - Activities now co-located with workflows for independent testing

  ## API Changes
  - Updated: backend/src/api/workflows.py (Temporal submission)
  - Updated: backend/src/api/runs.py (Temporal status/results)
  - Updated: backend/src/main.py (727 lines, TemporalManager integration)
  - Updated: All 16 MCP tools to use TemporalManager

  ## Testing
  - ✅ All services healthy (Temporal, PostgreSQL, MinIO, workers, backend)
  - ✅ All API endpoints functional
  - ✅ End-to-end workflow test passed (72 findings from vulnerable_app)
  - ✅ MinIO storage integration working (target upload/download, results)
  - ✅ Worker activity discovery working (6 activities registered)
  - ✅ Tarball extraction working
  - ✅ SARIF report generation working

  ## Documentation
  - ARCHITECTURE.md: Complete Temporal architecture documentation
  - QUICKSTART_TEMPORAL.md: Getting started guide
  - MIGRATION_DECISION.md: Why we chose Temporal over Prefect
  - IMPLEMENTATION_STATUS.md: Migration progress tracking
  - workers/README.md: Worker development guide

  ## Dependencies
  - Added: temporalio>=1.6.0
  - Added: boto3>=1.34.0 (MinIO S3 client)
  - Removed: prefect>=3.4.18

* feat: Add Python fuzzing vertical with Atheris integration

  This commit implements a complete Python fuzzing workflow using Atheris:

  ## Python Worker (workers/python/)
  - Dockerfile with Python 3.11, Atheris, and build tools
  - Generic worker.py for dynamic workflow discovery
  - requirements.txt with temporalio, boto3, atheris dependencies
  - Added to docker-compose.temporal.yaml with dedicated cache volume

  ## AtherisFuzzer Module (backend/toolbox/modules/fuzzer/)
  - Reusable module extending BaseModule
  - Auto-discovers fuzz targets (fuzz_*.py, *_fuzz.py, fuzz_target.py)
  - Recursive search to find targets in nested directories
  - Dynamically loads TestOneInput() function
  - Configurable max_iterations and timeout
  - Real-time stats callback support for live monitoring
  - Returns findings as ModuleFinding objects

  ## Atheris Fuzzing Workflow (backend/toolbox/workflows/atheris_fuzzing/)
  - Temporal workflow for orchestrating fuzzing
  - Downloads user code from MinIO
  - Executes AtherisFuzzer module
  - Uploads results to MinIO
  - Cleans up cache after execution
  - metadata.yaml with vertical: python for routing

  ## Test Project (test_projects/python_fuzz_waterfall/)
  - Demonstrates stateful waterfall vulnerability
  - main.py with check_secret() that leaks progress
  - fuzz_target.py with Atheris TestOneInput() harness
  - Complete README with usage instructions

  ## Backend Fixes
  - Fixed parameter merging in REST API endpoints (workflows.py)
  - Changed workflow parameter passing from positional args to kwargs (manager.py)
  - Default parameters now properly merged with user parameters

  ## Testing
  ✅ Worker discovered AtherisFuzzingWorkflow
  ✅ Workflow executed end-to-end successfully
  ✅ Fuzz target auto-discovered in nested directories
  ✅ Atheris ran 100,000 iterations
  ✅ Results uploaded and cache cleaned

* chore: Complete Temporal migration with updated CLI/SDK/docs

  This commit includes all remaining Temporal migration changes:

  ## CLI Updates (cli/)
  - Updated workflow execution commands for Temporal
  - Enhanced error handling and exceptions
  - Updated dependencies in uv.lock

  ## SDK Updates (sdk/)
  - Client methods updated for Temporal workflows
  - Updated models for new workflow execution
  - Updated dependencies in uv.lock

  ## Documentation Updates (docs/)
  - Architecture documentation for Temporal
  - Workflow concept documentation
  - Resource management documentation (new)
  - Debugging guide (new)
  - Updated tutorials and how-to guides
  - Troubleshooting updates

  ## README Updates
  - Main README with Temporal instructions
  - Backend README
  - CLI README
  - SDK README

  ## Other
  - Updated IMPLEMENTATION_STATUS.md
  - Removed old vulnerable_app.tar.gz

  These changes complete the Temporal migration and ensure the CLI/SDK work correctly with the new backend.

* fix: Use positional args instead of kwargs for Temporal workflows

  The Temporal Python SDK's start_workflow() method doesn't accept a 'kwargs' parameter. Workflows must receive parameters as positional arguments via the 'args' parameter.

  Changed from: args=workflow_args # Positional arguments

  This fixes the error:
  TypeError: Client.start_workflow() got an unexpected keyword argument 'kwargs'

  Workflows now correctly receive parameters in order:
  - security_assessment: [target_id, scanner_config, analyzer_config, reporter_config]
  - atheris_fuzzing: [target_id, target_file, max_iterations, timeout_seconds]
  - rust_test: [target_id, test_message]

* fix: Filter metadata-only parameters from workflow arguments

  SecurityAssessmentWorkflow was receiving 7 arguments instead of 2-5. The issue was that target_path and volume_mode from default_parameters were being passed to the workflow, when they should only be used by the system for configuration.

  Now filters out metadata-only parameters (target_path, volume_mode) before passing arguments to workflow execution.

* refactor: Remove Prefect leftovers and volume mounting legacy

  Complete cleanup of Prefect migration artifacts:

  Backend:
  - Delete registry.py and workflow_discovery.py (Prefect-specific files)
  - Remove Docker validation from setup.py (no longer needed)
  - Remove ResourceLimits and VolumeMount models
  - Remove target_path and volume_mode from WorkflowSubmission
  - Remove supported_volume_modes from API and discovery
  - Clean up metadata.yaml files (remove volume/path fields)
  - Simplify parameter filtering in manager.py

  SDK:
  - Remove volume_mode parameter from client methods
  - Remove ResourceLimits and VolumeMount models
  - Remove Prefect error patterns from docker_logs.py
  - Clean up WorkflowSubmission and WorkflowMetadata models

  CLI:
  - Remove Volume Modes display from workflow info

  All removed features are Prefect-specific or Docker volume mounting artifacts. Temporal workflows use MinIO storage exclusively.

* feat: Add comprehensive test suite and benchmark infrastructure

  - Add 68 unit tests for fuzzer, scanner, and analyzer modules
  - Implement pytest-based test infrastructure with fixtures
  - Add 6 performance benchmarks with category-specific thresholds
  - Configure GitHub Actions for automated testing and benchmarking
  - Add test and benchmark documentation

  Test coverage:
  - AtherisFuzzer: 8 tests
  - CargoFuzzer: 14 tests
  - FileScanner: 22 tests
  - SecurityAnalyzer: 24 tests

  All tests passing (68/68)
  All benchmarks passing (6/6)

* fix: Resolve all ruff linting violations across codebase

  Fixed 27 ruff violations in 12 files:
  - Removed unused imports (Depends, Dict, Any, Optional, etc.)
  - Fixed undefined workflow_info variable in workflows.py
  - Removed dead code with undefined variables in atheris_fuzzer.py
  - Changed f-string to regular string where no placeholders used

  All files now pass ruff checks for CI/CD compliance.

* fix: Configure CI for unit tests only

  - Renamed docker-compose.temporal.yaml → docker-compose.yml for CI compatibility
  - Commented out integration-tests job (no integration tests yet)
  - Updated test-summary to only depend on lint and unit-tests

  CI will now run successfully with 68 unit tests. Integration tests can be added later.

* feat: Add CI/CD integration with ephemeral deployment model

  Implements comprehensive CI/CD support for FuzzForge with on-demand worker management:

  **Worker Management (v0.7.0)**
  - Add WorkerManager for automatic worker lifecycle control
  - Auto-start workers from stopped state when workflows execute
  - Auto-stop workers after workflow completion
  - Health checks and startup timeout handling (90s default)

  **CI/CD Features**
  - `--fail-on` flag: Fail builds based on SARIF severity levels (error/warning/note/info)
  - `--export-sarif` flag: Export findings in SARIF 2.1.0 format
  - `--auto-start`/`--auto-stop` flags: Control worker lifecycle
  - Exit code propagation: Returns 1 on blocking findings, 0 on success

  **Exit Code Fix**
  - Add `except typer.Exit: raise` handlers at 3 critical locations
  - Move worker cleanup to finally block for guaranteed execution
  - Exit codes now propagate correctly even when build fails

  **CI Scripts & Examples**
  - ci-start.sh: Start FuzzForge services with health checks
  - ci-stop.sh: Clean shutdown with volume preservation option
  - GitHub Actions workflow example (security-scan.yml)
  - GitLab CI pipeline example (.gitlab-ci.example.yml)
  - docker-compose.ci.yml: CI-optimized compose file with profiles

  **OSS-Fuzz Integration**
  - New ossfuzz_campaign workflow for running OSS-Fuzz projects
  - OSS-Fuzz worker with Docker-in-Docker support
  - Configurable campaign duration and project selection

  **Documentation**
  - Comprehensive CI/CD integration guide (docs/how-to/cicd-integration.md)
  - Updated architecture docs with worker lifecycle details
  - Updated workspace isolation documentation
  - CLI README with worker management examples

  **SDK Enhancements**
  - Add get_workflow_worker_info() endpoint
  - Worker vertical metadata in workflow responses

  **Testing**
  - All workflows tested: security_assessment, atheris_fuzzing, secret_detection, cargo_fuzzing
  - All monitoring commands tested: stats, crashes, status, finding
  - Full CI pipeline simulation verified
  - Exit codes verified for success/failure scenarios

  Ephemeral CI/CD model: ~3-4GB RAM, ~60-90s startup, runs entirely in CI containers.

* fix: Resolve ruff linting violations in CI/CD code

  - Remove unused variables (run_id, defaults, result)
  - Remove unused imports
  - Fix f-string without placeholders

  All CI/CD integration files now pass ruff checks.
634 lines
22 KiB
Python
"""
API endpoints for workflow management with enhanced error handling
"""

# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

import logging
import tempfile
import traceback
from pathlib import Path
from typing import Any, Dict, List, Optional

from fastapi import APIRouter, Depends, File, Form, HTTPException, UploadFile

from src.models.findings import (
    WorkflowSubmission,
    WorkflowMetadata,
    WorkflowListItem,
    RunSubmissionResponse
)
from src.temporal.discovery import WorkflowDiscovery

logger = logging.getLogger(__name__)

# Configuration for file uploads
MAX_UPLOAD_SIZE = 10 * 1024 * 1024 * 1024  # 10 GB

# Not enforced anywhere in this module; uploads are currently only checked
# against MAX_UPLOAD_SIZE.
ALLOWED_CONTENT_TYPES = [
    "application/gzip",
    "application/x-gzip",
    "application/x-tar",
    "application/x-compressed-tar",
    "application/octet-stream",  # Generic binary
]

router = APIRouter(prefix="/workflows", tags=["workflows"])


def create_structured_error_response(
    error_type: str,
    message: str,
    workflow_name: Optional[str] = None,
    run_id: Optional[str] = None,
    container_info: Optional[Dict[str, Any]] = None,
    deployment_info: Optional[Dict[str, Any]] = None,
    suggestions: Optional[List[str]] = None
) -> Dict[str, Any]:
    """Create a structured error response with rich context."""
    # datetime.utcnow() is deprecated since Python 3.12; build an aware UTC
    # timestamp and keep the same "...Z" output format.
    from datetime import datetime, timezone

    timestamp = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")

    error_response: Dict[str, Any] = {
        "error": {
            "type": error_type,
            "message": message,
            "timestamp": timestamp
        }
    }

    # Optional context is attached only when provided.
    if workflow_name:
        error_response["error"]["workflow_name"] = workflow_name

    if run_id:
        error_response["error"]["run_id"] = run_id

    if container_info:
        error_response["error"]["container"] = container_info

    if deployment_info:
        error_response["error"]["deployment"] = deployment_info

    if suggestions:
        error_response["error"]["suggestions"] = suggestions

    return error_response


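A caller that receives one of these payloads might render it along the following lines. This is a minimal sketch and not part of the module: `format_structured_error` is a hypothetical helper, and the sample payload is illustrative, assuming only the keys that `create_structured_error_response` sets.

```python
from typing import Any, Dict


def format_structured_error(payload: Dict[str, Any]) -> str:
    """Render a structured error payload as a short multi-line message.

    Hypothetical helper: it relies only on the "error" keys built by
    create_structured_error_response (type, message, optional suggestions).
    """
    error = payload.get("error", {})
    lines = [f"{error.get('type', 'UnknownError')}: {error.get('message', '')}"]
    for suggestion in error.get("suggestions", []):
        lines.append(f"  - {suggestion}")
    return "\n".join(lines)


# Illustrative payload in the shape produced by the helper above.
sample = {
    "error": {
        "type": "WorkflowNotFound",
        "message": "Workflow 'demo' not found",
        "timestamp": "2025-01-01T00:00:00Z",
        "workflow_name": "demo",
        "suggestions": ["Use GET /workflows/ to see all available workflows"],
    }
}
print(format_structured_error(sample))
```

Because the endpoints pass the payload as the `detail` argument of `HTTPException`, FastAPI serializes it under a top-level `detail` key, so an HTTP client would call the helper on `response.json()["detail"]`.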
def get_temporal_manager():
    """Dependency to get the Temporal manager instance"""
    from src.main import temporal_mgr
    return temporal_mgr


@router.get("/", response_model=List[WorkflowListItem])
async def list_workflows(
    temporal_mgr=Depends(get_temporal_manager)
) -> List[WorkflowListItem]:
    """
    List all discovered workflows with their metadata.

    Returns a summary of each workflow including name, version, description,
    author, and tags.
    """
    workflows = []
    for name, info in temporal_mgr.workflows.items():
        workflows.append(WorkflowListItem(
            name=name,
            version=info.metadata.get("version", "0.6.0"),
            description=info.metadata.get("description", ""),
            author=info.metadata.get("author"),
            tags=info.metadata.get("tags", [])
        ))

    return workflows


@router.get("/metadata/schema")
async def get_metadata_schema() -> Dict[str, Any]:
    """
    Get the JSON schema for workflow metadata files.

    This schema defines the structure and requirements for metadata.yaml files
    that must accompany each workflow.
    """
    return WorkflowDiscovery.get_metadata_schema()


@router.get("/{workflow_name}/metadata", response_model=WorkflowMetadata)
async def get_workflow_metadata(
    workflow_name: str,
    temporal_mgr=Depends(get_temporal_manager)
) -> WorkflowMetadata:
    """
    Get complete metadata for a specific workflow.

    Args:
        workflow_name: Name of the workflow

    Returns:
        Complete metadata including the parameters schema, default parameters,
        and required modules.

    Raises:
        HTTPException: 404 if workflow not found
    """
    if workflow_name not in temporal_mgr.workflows:
        available_workflows = list(temporal_mgr.workflows.keys())
        error_response = create_structured_error_response(
            error_type="WorkflowNotFound",
            message=f"Workflow '{workflow_name}' not found",
            workflow_name=workflow_name,
            suggestions=[
                f"Available workflows: {', '.join(available_workflows)}",
                "Use GET /workflows/ to see all available workflows",
                "Check workflow name spelling and case sensitivity"
            ]
        )
        raise HTTPException(
            status_code=404,
            detail=error_response
        )

    info = temporal_mgr.workflows[workflow_name]
    metadata = info.metadata

    return WorkflowMetadata(
        name=workflow_name,
        version=metadata.get("version", "0.6.0"),
        description=metadata.get("description", ""),
        author=metadata.get("author"),
        tags=metadata.get("tags", []),
        parameters=metadata.get("parameters", {}),
        default_parameters=metadata.get("default_parameters", {}),
        required_modules=metadata.get("required_modules", [])
    )


@router.post("/{workflow_name}/submit", response_model=RunSubmissionResponse)
async def submit_workflow(
    workflow_name: str,
    submission: WorkflowSubmission,
    temporal_mgr=Depends(get_temporal_manager)
) -> RunSubmissionResponse:
    """
    Submit a workflow for execution.

    Args:
        workflow_name: Name of the workflow to execute
        submission: Submission parameters including target path and parameters

    Returns:
        Run submission response with run_id and initial status

    Raises:
        HTTPException: 404 if workflow not found, 400 for invalid parameters,
            500 for any other submission failure
    """
    if workflow_name not in temporal_mgr.workflows:
        available_workflows = list(temporal_mgr.workflows.keys())
        error_response = create_structured_error_response(
            error_type="WorkflowNotFound",
            message=f"Workflow '{workflow_name}' not found",
            workflow_name=workflow_name,
            suggestions=[
                f"Available workflows: {', '.join(available_workflows)}",
                "Use GET /workflows/ to see all available workflows",
                "Check workflow name spelling and case sensitivity"
            ]
        )
        raise HTTPException(
            status_code=404,
            detail=error_response
        )

    try:
        # Upload target file to MinIO and get target_id
        target_path = Path(submission.target_path)
        if not target_path.exists():
            raise ValueError(f"Target path does not exist: {submission.target_path}")

        # Upload target (using anonymous user for now)
        target_id = await temporal_mgr.upload_target(
            file_path=target_path,
            user_id="api-user",
            metadata={"workflow": workflow_name}
        )

        # Merge default parameters with user parameters (user values win)
        workflow_info = temporal_mgr.workflows[workflow_name]
        metadata = workflow_info.metadata or {}
        defaults = metadata.get("default_parameters", {})
        user_params = submission.parameters or {}
        workflow_params = {**defaults, **user_params}

        # Start workflow execution
        handle = await temporal_mgr.run_workflow(
            workflow_name=workflow_name,
            target_id=target_id,
            workflow_params=workflow_params
        )

        run_id = handle.id

        # Initialize fuzzing tracking if this looks like a fuzzing workflow
        workflow_tags = metadata.get("tags", [])
        if "fuzzing" in workflow_tags or "fuzz" in workflow_name.lower():
            from src.api.fuzzing import initialize_fuzzing_tracking
            initialize_fuzzing_tracking(run_id, workflow_name)

        return RunSubmissionResponse(
            run_id=run_id,
            status="RUNNING",
            workflow=workflow_name,
            message=f"Workflow '{workflow_name}' submitted successfully"
        )

    except ValueError as e:
        # Parameter validation errors
        error_response = create_structured_error_response(
            error_type="ValidationError",
            message=str(e),
            workflow_name=workflow_name,
            suggestions=[
                "Check parameter types and values",
                "Use GET /workflows/{workflow_name}/parameters for schema",
                "Ensure all required parameters are provided"
            ]
        )
        raise HTTPException(status_code=400, detail=error_response) from e

    except Exception as e:
        logger.error(f"Failed to submit workflow '{workflow_name}': {e}")
        logger.error(f"Traceback: {traceback.format_exc()}")

        # Try to get more context about the error
        container_info = None
        deployment_info = None
        suggestions = []

        error_message = str(e)
        error_type = "WorkflowSubmissionError"

        # Detect specific error patterns
        if "workflow" in error_message.lower() and "not found" in error_message.lower():
            error_type = "WorkflowError"
            suggestions.extend([
                "Check if Temporal server is running and accessible",
                "Verify workflow workers are running",
                "Check if workflow is registered with correct vertical",
                "Ensure Docker is running and has sufficient resources"
            ])

        elif "volume" in error_message.lower() or "mount" in error_message.lower():
            error_type = "VolumeError"
            suggestions.extend([
                "Check if the target path exists and is accessible",
                "Verify file permissions (Docker needs read access)",
                "Ensure the path is not in use by another process",
                "Try using an absolute path instead of relative path"
            ])

        elif "memory" in error_message.lower() or "resource" in error_message.lower():
            error_type = "ResourceError"
            suggestions.extend([
                "Check system memory and CPU availability",
                "Consider reducing resource limits or dataset size",
                "Monitor Docker resource usage",
                "Increase Docker memory limits if needed"
            ])

        elif "image" in error_message.lower():
            error_type = "ImageError"
            suggestions.extend([
                "Check if the workflow image exists",
                "Verify Docker registry access",
                "Try rebuilding the workflow image",
                "Check network connectivity to registries"
            ])

        else:
            suggestions.extend([
                "Check FuzzForge backend logs for details",
                "Verify all services are running (docker-compose up -d)",
                "Try restarting the workflow deployment",
                "Contact support if the issue persists"
            ])

        error_response = create_structured_error_response(
            error_type=error_type,
            message=f"Failed to submit workflow: {error_message}",
            workflow_name=workflow_name,
            container_info=container_info,
            deployment_info=deployment_info,
            suggestions=suggestions
        )

        raise HTTPException(
            status_code=500,
            detail=error_response
        ) from e


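The default/user parameter merge used by the submit endpoint is a shallow dict merge, which can be illustrated in isolation. The sketch below is not part of the module: `merge_parameters` is a hypothetical stand-in for the inline `{**defaults, **user_params}` expression, and the parameter values (including the keys inside `scanner_config`) are made up for illustration.

```python
from typing import Any, Dict, Optional


def merge_parameters(
    defaults: Optional[Dict[str, Any]],
    user_params: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Shallow-merge two parameter dicts; user values win on key conflicts."""
    return {**(defaults or {}), **(user_params or {})}


# Flat parameters: only the overridden key changes.
defaults = {"max_iterations": 100000, "timeout_seconds": 300}
user_params = {"timeout_seconds": 60}
print(merge_parameters(defaults, user_params))
# {'max_iterations': 100000, 'timeout_seconds': 60}

# Nested parameters: the merge is shallow, so a user-supplied nested dict
# replaces the default one wholesale rather than being merged key by key.
nested_defaults = {"scanner_config": {"depth": 3, "follow_symlinks": False}}
nested_user = {"scanner_config": {"depth": 5}}
print(merge_parameters(nested_defaults, nested_user))
# {'scanner_config': {'depth': 5}}
```

The second case is worth keeping in mind when a workflow's defaults contain nested configuration objects: a partial override drops the sibling keys from the default.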
@router.post("/{workflow_name}/upload-and-submit", response_model=RunSubmissionResponse)
async def upload_and_submit_workflow(
    workflow_name: str,
    file: UploadFile = File(..., description="Target file or tarball to analyze"),
    parameters: Optional[str] = Form(None, description="JSON-encoded workflow parameters"),
    timeout: Optional[int] = Form(None, description="Timeout in seconds"),
    temporal_mgr=Depends(get_temporal_manager)
) -> RunSubmissionResponse:
    """
    Upload a target file/tarball and submit workflow for execution.

    This endpoint accepts multipart/form-data uploads and is the recommended
    way to submit workflows from remote CLI clients.

    Args:
        workflow_name: Name of the workflow to execute
        file: Target file or tarball (compressed directory)
        parameters: JSON string of workflow parameters (optional)
        timeout: Execution timeout in seconds (optional; accepted but not
            currently forwarded to the workflow)

    Returns:
        Run submission response with run_id and initial status

    Raises:
        HTTPException: 404 if workflow not found, 400 for invalid parameters,
            413 if file too large, 500 for any other failure
    """
    if workflow_name not in temporal_mgr.workflows:
        available_workflows = list(temporal_mgr.workflows.keys())
        error_response = create_structured_error_response(
            error_type="WorkflowNotFound",
            message=f"Workflow '{workflow_name}' not found",
            workflow_name=workflow_name,
            suggestions=[
                f"Available workflows: {', '.join(available_workflows)}",
                "Use GET /workflows/ to see all available workflows"
            ]
        )
        raise HTTPException(status_code=404, detail=error_response)

    temp_file_path = None

    try:
        # Validate file size while streaming
        file_size = 0
        chunk_size = 1024 * 1024  # 1 MB chunks

        # Create temporary file
        temp_fd, temp_file_path = tempfile.mkstemp(suffix=".tar.gz")

        logger.info(f"Receiving file upload for workflow '{workflow_name}': {file.filename}")

        # Stream file to disk (opening the descriptor also closes it on exit)
        with open(temp_fd, 'wb') as temp_file:
            while True:
                chunk = await file.read(chunk_size)
                if not chunk:
                    break

                file_size += len(chunk)

                # Check size limit before writing the chunk
                if file_size > MAX_UPLOAD_SIZE:
                    raise HTTPException(
                        status_code=413,
                        detail=create_structured_error_response(
                            error_type="FileTooLarge",
                            message=f"File size exceeds maximum allowed size of {MAX_UPLOAD_SIZE / (1024**3):.1f} GB",
                            workflow_name=workflow_name,
                            suggestions=[
                                "Reduce the size of your target directory",
                                "Exclude unnecessary files (build artifacts, dependencies, etc.)",
                                "Consider splitting into smaller analysis targets"
                            ]
                        )
                    )

                temp_file.write(chunk)

        logger.info(f"Received file: {file_size / (1024**2):.2f} MB")

        # Parse parameters
        workflow_params = {}
        if parameters:
            import json

            try:
                workflow_params = json.loads(parameters)
                if not isinstance(workflow_params, dict):
                    raise ValueError("Parameters must be a JSON object")
            except (json.JSONDecodeError, ValueError) as e:
                raise HTTPException(
                    status_code=400,
                    detail=create_structured_error_response(
                        error_type="InvalidParameters",
                        message=f"Invalid parameters JSON: {e}",
                        workflow_name=workflow_name,
                        suggestions=["Ensure parameters is a valid JSON object"]
                    )
                ) from e

        # Upload to MinIO
        target_id = await temporal_mgr.upload_target(
            file_path=Path(temp_file_path),
            user_id="api-user",
            metadata={
                "workflow": workflow_name,
                "original_filename": file.filename,
                "upload_method": "multipart"
            }
        )

        logger.info(f"Uploaded to MinIO with target_id: {target_id}")

        # Merge default parameters with user parameters (user values win)
        workflow_info = temporal_mgr.workflows[workflow_name]
        metadata = workflow_info.metadata or {}
        defaults = metadata.get("default_parameters", {})
        workflow_params = {**defaults, **workflow_params}

        # Start workflow execution
        handle = await temporal_mgr.run_workflow(
            workflow_name=workflow_name,
            target_id=target_id,
            workflow_params=workflow_params
        )

        run_id = handle.id

        # Initialize fuzzing tracking if needed
        workflow_tags = metadata.get("tags", [])
        if "fuzzing" in workflow_tags or "fuzz" in workflow_name.lower():
            from src.api.fuzzing import initialize_fuzzing_tracking
            initialize_fuzzing_tracking(run_id, workflow_name)

        return RunSubmissionResponse(
            run_id=run_id,
            status="RUNNING",
            workflow=workflow_name,
            message=f"Workflow '{workflow_name}' submitted successfully with uploaded target"
        )

    except HTTPException:
        # Already structured (413 / 400); pass through unchanged
        raise

    except Exception as e:
        logger.error(f"Failed to upload and submit workflow '{workflow_name}': {e}")
        logger.error(f"Traceback: {traceback.format_exc()}")

        error_response = create_structured_error_response(
            error_type="WorkflowSubmissionError",
            message=f"Failed to process upload and submit workflow: {str(e)}",
            workflow_name=workflow_name,
            suggestions=[
                "Check if the uploaded file is a valid tarball",
                "Verify MinIO storage is accessible",
                "Check backend logs for detailed error information",
                "Ensure Temporal workers are running"
            ]
        )

        raise HTTPException(status_code=500, detail=error_response) from e

    finally:
        # Cleanup temporary file
        if temp_file_path and Path(temp_file_path).exists():
            try:
                Path(temp_file_path).unlink()
                logger.debug(f"Cleaned up temp file: {temp_file_path}")
            except Exception as e:
                logger.warning(f"Failed to cleanup temp file {temp_file_path}: {e}")


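The chunked copy with a running size check used by the upload endpoint can be exercised without FastAPI. The following is a minimal synchronous sketch under stated assumptions: `copy_with_limit` and `UploadTooLarge` are hypothetical names, and in-memory `io.BytesIO` streams stand in for the uploaded file and the temporary file.

```python
import io
from typing import BinaryIO


class UploadTooLarge(Exception):
    """Raised when a stream exceeds the configured byte limit (hypothetical)."""


def copy_with_limit(
    src: BinaryIO, dst: BinaryIO, max_bytes: int, chunk_size: int = 1024
) -> int:
    """Copy src to dst in chunks, aborting once more than max_bytes were read.

    Returns the number of bytes copied. As in the endpoint, the chunk that
    crosses the limit is never written to dst.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise UploadTooLarge(f"stream exceeds {max_bytes} bytes")
        dst.write(chunk)
    return total


out = io.BytesIO()
print(copy_with_limit(io.BytesIO(b"x" * 3000), out, max_bytes=4096))  # 3000

try:
    copy_with_limit(io.BytesIO(b"x" * 5000), io.BytesIO(), max_bytes=4096)
except UploadTooLarge as exc:
    print(f"rejected: {exc}")  # rejected: stream exceeds 4096 bytes
```

Checking the running total per chunk means an oversized upload is rejected after reading at most one chunk past the limit, rather than after buffering the whole body.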
@router.get("/{workflow_name}/worker-info")
async def get_workflow_worker_info(
    workflow_name: str,
    temporal_mgr=Depends(get_temporal_manager)
) -> Dict[str, Any]:
    """
    Get worker information for a workflow.

    Returns details about which worker is required to execute this workflow,
    including container name, task queue, and vertical.

    Args:
        workflow_name: Name of the workflow

    Returns:
        Worker information including vertical, container name, and task queue

    Raises:
        HTTPException: 404 if workflow not found, 500 if the workflow
            metadata does not specify a vertical
    """
    if workflow_name not in temporal_mgr.workflows:
        available_workflows = list(temporal_mgr.workflows.keys())
        error_response = create_structured_error_response(
            error_type="WorkflowNotFound",
            message=f"Workflow '{workflow_name}' not found",
            workflow_name=workflow_name,
            suggestions=[
                f"Available workflows: {', '.join(available_workflows)}",
                "Use GET /workflows/ to see all available workflows"
            ]
        )
        raise HTTPException(
            status_code=404,
            detail=error_response
        )

    info = temporal_mgr.workflows[workflow_name]
    metadata = info.metadata

    # Extract vertical from metadata
    vertical = metadata.get("vertical")

    if not vertical:
        error_response = create_structured_error_response(
            error_type="MissingVertical",
            message=f"Workflow '{workflow_name}' does not specify a vertical in metadata",
            workflow_name=workflow_name,
            suggestions=[
                "Check workflow metadata.yaml for 'vertical' field",
                "Contact workflow author for support"
            ]
        )
        raise HTTPException(
            status_code=500,
            detail=error_response
        )

    return {
        "workflow": workflow_name,
        "vertical": vertical,
        "worker_container": f"fuzzforge-worker-{vertical}",
        "task_queue": f"{vertical}-queue",
        "required": True
    }


@router.get("/{workflow_name}/parameters")
async def get_workflow_parameters(
    workflow_name: str,
    temporal_mgr=Depends(get_temporal_manager)
) -> Dict[str, Any]:
    """
    Get the parameters schema for a workflow.

    Args:
        workflow_name: Name of the workflow

    Returns:
        Parameters schema with types, descriptions, and defaults

    Raises:
        HTTPException: 404 if workflow not found
    """
    if workflow_name not in temporal_mgr.workflows:
        available_workflows = list(temporal_mgr.workflows.keys())
        error_response = create_structured_error_response(
            error_type="WorkflowNotFound",
            message=f"Workflow '{workflow_name}' not found",
            workflow_name=workflow_name,
            suggestions=[
                f"Available workflows: {', '.join(available_workflows)}",
                "Use GET /workflows/ to see all available workflows"
            ]
        )
        raise HTTPException(
            status_code=404,
            detail=error_response
        )

    info = temporal_mgr.workflows[workflow_name]
    metadata = info.metadata

    # Return parameters with enhanced schema information
    parameters_schema = metadata.get("parameters", {})

    # Extract the actual parameter definitions from JSON schema structure
    if "properties" in parameters_schema:
        param_definitions = parameters_schema["properties"]
    else:
        param_definitions = parameters_schema

    # Add default values to the schema.
    # Note: this writes "default" into the manager's cached metadata in place.
    default_params = metadata.get("default_parameters", {})
    for param_name, param_schema in param_definitions.items():
        if isinstance(param_schema, dict) and param_name in default_params:
            param_schema["default"] = default_params[param_name]

    return {
        "workflow": workflow_name,
        "parameters": param_definitions,
        "default_parameters": default_params,
        # A parameter counts as required when its own definition carries a
        # truthy "required" flag.
        "required_parameters": [
            name for name, schema in param_definitions.items()
            if isinstance(schema, dict) and schema.get("required", False)
        ]
    }
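
The schema handling in the parameters endpoint can be tried on a plain dictionary. The sketch below is not part of the module: `summarize_parameters` is a hypothetical stand-in for the endpoint body, and the sample metadata is illustrative. Unlike the endpoint, it copies each definition, so the input metadata is left untouched.

```python
from typing import Any, Dict


def summarize_parameters(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a workflow's parameter schema and attach its defaults."""
    schema = metadata.get("parameters", {})
    # Accept either a JSON-schema-style wrapper or a bare name -> definition map.
    raw = schema["properties"] if "properties" in schema else schema
    defaults = metadata.get("default_parameters", {})

    definitions: Dict[str, Any] = {}
    for name, definition in raw.items():
        if isinstance(definition, dict):
            definition = dict(definition)  # copy so the input is not mutated
            if name in defaults:
                definition["default"] = defaults[name]
        definitions[name] = definition

    # A parameter is reported as required when its definition has a truthy
    # per-parameter "required" flag, mirroring the endpoint's check.
    required = [
        name for name, definition in definitions.items()
        if isinstance(definition, dict) and definition.get("required", False)
    ]
    return {"parameters": definitions, "required_parameters": required}


# Illustrative metadata; the schema contents are made up for this example.
metadata = {
    "parameters": {
        "type": "object",
        "properties": {
            "target_file": {"type": "string", "required": True},
            "max_iterations": {"type": "integer"},
        },
    },
    "default_parameters": {"max_iterations": 100000},
}
summary = summarize_parameters(metadata)
print(summary["required_parameters"])  # ['target_file']
print(summary["parameters"]["max_iterations"])  # {'type': 'integer', 'default': 100000}
```

Note that the `required` check reads a boolean on each parameter definition; a schema that lists required names in a top-level `required` array would not be picked up by this logic.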