Mirror of https://github.com/FuzzingLabs/fuzzforge_ai.git (synced 2026-02-12 22:32:45 +00:00)
CI/CD Integration with Ephemeral Deployment Model (#14)
* feat: Complete migration from Prefect to Temporal

BREAKING CHANGE: Replaces Prefect workflow orchestration with Temporal

## Major Changes
- Replace Prefect with Temporal for workflow orchestration
- Implement vertical worker architecture (rust, android)
- Replace Docker registry with MinIO for unified storage
- Refactor activities to be co-located with workflows
- Update all API endpoints for Temporal compatibility

## Infrastructure
- New: docker-compose.temporal.yaml (Temporal + MinIO + workers)
- New: workers/ directory with rust and android vertical workers
- New: backend/src/temporal/ (manager, discovery)
- New: backend/src/storage/ (S3-cached storage with MinIO)
- New: backend/toolbox/common/ (shared storage activities)
- Deleted: docker-compose.yaml (old Prefect setup)
- Deleted: backend/src/core/prefect_manager.py
- Deleted: backend/src/services/prefect_stats_monitor.py
- Deleted: Docker registry and insecure-registries requirement

## Workflows
- Migrated: security_assessment workflow to Temporal
- New: rust_test workflow (example/test workflow)
- Deleted: secret_detection_scan (Prefect-based, to be reimplemented)
- Activities now co-located with workflows for independent testing

## API Changes
- Updated: backend/src/api/workflows.py (Temporal submission)
- Updated: backend/src/api/runs.py (Temporal status/results)
- Updated: backend/src/main.py (727 lines, TemporalManager integration)
- Updated: All 16 MCP tools to use TemporalManager

## Testing
- ✅ All services healthy (Temporal, PostgreSQL, MinIO, workers, backend)
- ✅ All API endpoints functional
- ✅ End-to-end workflow test passed (72 findings from vulnerable_app)
- ✅ MinIO storage integration working (target upload/download, results)
- ✅ Worker activity discovery working (6 activities registered)
- ✅ Tarball extraction working
- ✅ SARIF report generation working

## Documentation
- ARCHITECTURE.md: Complete Temporal architecture documentation
- QUICKSTART_TEMPORAL.md: Getting started guide
- MIGRATION_DECISION.md: Why we chose Temporal over Prefect
- IMPLEMENTATION_STATUS.md: Migration progress tracking
- workers/README.md: Worker development guide

## Dependencies
- Added: temporalio>=1.6.0
- Added: boto3>=1.34.0 (MinIO S3 client)
- Removed: prefect>=3.4.18

* feat: Add Python fuzzing vertical with Atheris integration

This commit implements a complete Python fuzzing workflow using Atheris:

## Python Worker (workers/python/)
- Dockerfile with Python 3.11, Atheris, and build tools
- Generic worker.py for dynamic workflow discovery
- requirements.txt with temporalio, boto3, atheris dependencies
- Added to docker-compose.temporal.yaml with dedicated cache volume

## AtherisFuzzer Module (backend/toolbox/modules/fuzzer/)
- Reusable module extending BaseModule
- Auto-discovers fuzz targets (fuzz_*.py, *_fuzz.py, fuzz_target.py)
- Recursive search to find targets in nested directories
- Dynamically loads TestOneInput() function
- Configurable max_iterations and timeout
- Real-time stats callback support for live monitoring
- Returns findings as ModuleFinding objects

## Atheris Fuzzing Workflow (backend/toolbox/workflows/atheris_fuzzing/)
- Temporal workflow for orchestrating fuzzing
- Downloads user code from MinIO
- Executes AtherisFuzzer module
- Uploads results to MinIO
- Cleans up cache after execution
- metadata.yaml with vertical: python for routing

## Test Project (test_projects/python_fuzz_waterfall/)
- Demonstrates stateful waterfall vulnerability
- main.py with check_secret() that leaks progress
- fuzz_target.py with Atheris TestOneInput() harness
- Complete README with usage instructions

## Backend Fixes
- Fixed parameter merging in REST API endpoints (workflows.py)
- Changed workflow parameter passing from positional args to kwargs (manager.py)
- Default parameters now properly merged with user parameters

## Testing
✅ Worker discovered AtherisFuzzingWorkflow
✅ Workflow executed end-to-end successfully
✅ Fuzz target auto-discovered in nested directories
✅ Atheris ran 100,000 iterations
✅ Results uploaded and cache cleaned

* chore: Complete Temporal migration with updated CLI/SDK/docs

This commit includes all remaining Temporal migration changes:

## CLI Updates (cli/)
- Updated workflow execution commands for Temporal
- Enhanced error handling and exceptions
- Updated dependencies in uv.lock

## SDK Updates (sdk/)
- Client methods updated for Temporal workflows
- Updated models for new workflow execution
- Updated dependencies in uv.lock

## Documentation Updates (docs/)
- Architecture documentation for Temporal
- Workflow concept documentation
- Resource management documentation (new)
- Debugging guide (new)
- Updated tutorials and how-to guides
- Troubleshooting updates

## README Updates
- Main README with Temporal instructions
- Backend README
- CLI README
- SDK README

## Other
- Updated IMPLEMENTATION_STATUS.md
- Removed old vulnerable_app.tar.gz

These changes complete the Temporal migration and ensure the CLI/SDK work correctly with the new backend.

* fix: Use positional args instead of kwargs for Temporal workflows

The Temporal Python SDK's start_workflow() method doesn't accept a 'kwargs' parameter. Workflows must receive parameters as positional arguments via the 'args' parameter.

Changed to:
    args=workflow_args  # Positional arguments

This fixes the error:
    TypeError: Client.start_workflow() got an unexpected keyword argument 'kwargs'

Workflows now correctly receive parameters in order:
- security_assessment: [target_id, scanner_config, analyzer_config, reporter_config]
- atheris_fuzzing: [target_id, target_file, max_iterations, timeout_seconds]
- rust_test: [target_id, test_message]

* fix: Filter metadata-only parameters from workflow arguments

SecurityAssessmentWorkflow was receiving 7 arguments instead of 2-5. The issue was that target_path and volume_mode from default_parameters were being passed to the workflow, when they should only be used by the system for configuration.

Now filters out metadata-only parameters (target_path, volume_mode) before passing arguments to workflow execution.

* refactor: Remove Prefect leftovers and volume mounting legacy

Complete cleanup of Prefect migration artifacts:

Backend:
- Delete registry.py and workflow_discovery.py (Prefect-specific files)
- Remove Docker validation from setup.py (no longer needed)
- Remove ResourceLimits and VolumeMount models
- Remove target_path and volume_mode from WorkflowSubmission
- Remove supported_volume_modes from API and discovery
- Clean up metadata.yaml files (remove volume/path fields)
- Simplify parameter filtering in manager.py

SDK:
- Remove volume_mode parameter from client methods
- Remove ResourceLimits and VolumeMount models
- Remove Prefect error patterns from docker_logs.py
- Clean up WorkflowSubmission and WorkflowMetadata models

CLI:
- Remove Volume Modes display from workflow info

All removed features are Prefect-specific or Docker volume mounting artifacts. Temporal workflows use MinIO storage exclusively.
* feat: Add comprehensive test suite and benchmark infrastructure

- Add 68 unit tests for fuzzer, scanner, and analyzer modules
- Implement pytest-based test infrastructure with fixtures
- Add 6 performance benchmarks with category-specific thresholds
- Configure GitHub Actions for automated testing and benchmarking
- Add test and benchmark documentation

Test coverage:
- AtherisFuzzer: 8 tests
- CargoFuzzer: 14 tests
- FileScanner: 22 tests
- SecurityAnalyzer: 24 tests

All tests passing (68/68)
All benchmarks passing (6/6)

* fix: Resolve all ruff linting violations across codebase

Fixed 27 ruff violations in 12 files:
- Removed unused imports (Depends, Dict, Any, Optional, etc.)
- Fixed undefined workflow_info variable in workflows.py
- Removed dead code with undefined variables in atheris_fuzzer.py
- Changed f-string to regular string where no placeholders used

All files now pass ruff checks for CI/CD compliance.

* fix: Configure CI for unit tests only

- Renamed docker-compose.temporal.yaml → docker-compose.yml for CI compatibility
- Commented out integration-tests job (no integration tests yet)
- Updated test-summary to only depend on lint and unit-tests

CI will now run successfully with 68 unit tests. Integration tests can be added later.

* feat: Add CI/CD integration with ephemeral deployment model

Implements comprehensive CI/CD support for FuzzForge with on-demand worker management:

**Worker Management (v0.7.0)**
- Add WorkerManager for automatic worker lifecycle control
- Auto-start workers from stopped state when workflows execute
- Auto-stop workers after workflow completion
- Health checks and startup timeout handling (90s default)

**CI/CD Features**
- `--fail-on` flag: Fail builds based on SARIF severity levels (error/warning/note/info)
- `--export-sarif` flag: Export findings in SARIF 2.1.0 format
- `--auto-start`/`--auto-stop` flags: Control worker lifecycle
- Exit code propagation: Returns 1 on blocking findings, 0 on success

**Exit Code Fix**
- Add `except typer.Exit: raise` handlers at 3 critical locations
- Move worker cleanup to finally block for guaranteed execution
- Exit codes now propagate correctly even when build fails

**CI Scripts & Examples**
- ci-start.sh: Start FuzzForge services with health checks
- ci-stop.sh: Clean shutdown with volume preservation option
- GitHub Actions workflow example (security-scan.yml)
- GitLab CI pipeline example (.gitlab-ci.example.yml)
- docker-compose.ci.yml: CI-optimized compose file with profiles

**OSS-Fuzz Integration**
- New ossfuzz_campaign workflow for running OSS-Fuzz projects
- OSS-Fuzz worker with Docker-in-Docker support
- Configurable campaign duration and project selection

**Documentation**
- Comprehensive CI/CD integration guide (docs/how-to/cicd-integration.md)
- Updated architecture docs with worker lifecycle details
- Updated workspace isolation documentation
- CLI README with worker management examples

**SDK Enhancements**
- Add get_workflow_worker_info() endpoint
- Worker vertical metadata in workflow responses

**Testing**
- All workflows tested: security_assessment, atheris_fuzzing, secret_detection, cargo_fuzzing
- All monitoring commands tested: stats, crashes, status, finding
- Full CI pipeline simulation verified
- Exit codes verified for success/failure scenarios

Ephemeral CI/CD model: ~3-4GB RAM, ~60-90s startup, runs entirely in CI containers.
* fix: Resolve ruff linting violations in CI/CD code

- Remove unused variables (run_id, defaults, result)
- Remove unused imports
- Fix f-string without placeholders

All CI/CD integration files now pass ruff checks.
workers/README.md (new file, 353 lines)
@@ -0,0 +1,353 @@
# FuzzForge Vertical Workers

This directory contains vertical-specific worker implementations for the Temporal architecture.

## Architecture

Each vertical worker is a long-lived container pre-built with domain-specific security toolchains:

```
workers/
├── rust/         # Rust/Native security (AFL++, cargo-fuzz, gdb, valgrind)
├── android/      # Android security (apktool, Frida, jadx, MobSF)
├── web/          # Web security (OWASP ZAP, semgrep, eslint)
├── ios/          # iOS security (class-dump, Clutch, Frida)
├── blockchain/   # Smart contract security (mythril, slither, echidna)
└── go/           # Go security (go-fuzz, staticcheck, gosec)
```

## How It Works

1. **Worker Startup**: Worker discovers workflows from `/app/toolbox/workflows`
2. **Filtering**: Only loads workflows where `metadata.yaml` has `vertical: <name>` (see the example below)
3. **Dynamic Import**: Dynamically imports workflow Python modules
4. **Registration**: Registers discovered workflows with Temporal
5. **Processing**: Polls Temporal task queue for work
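
For example, a minimal `metadata.yaml` (using the fields shown in Step 6 below) that the rust worker's filtering step would match:

```yaml
# backend/toolbox/workflows/my_workflow/metadata.yaml (illustrative)
name: my_workflow
version: 1.0.0
vertical: rust   # must match the worker's WORKER_VERTICAL
```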

## Adding a New Vertical

### Step 1: Create Worker Directory

```bash
mkdir -p workers/my_vertical
cd workers/my_vertical
```

### Step 2: Create Dockerfile

```dockerfile
# workers/my_vertical/Dockerfile
FROM python:3.11-slim

# Install your vertical-specific tools
RUN apt-get update && apt-get install -y \
    tool1 \
    tool2 \
    tool3 \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt /tmp/
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Copy worker files
COPY worker.py /app/worker.py
COPY activities.py /app/activities.py

WORKDIR /app
ENV PYTHONPATH="/app:/app/toolbox:${PYTHONPATH}"
ENV PYTHONUNBUFFERED=1

CMD ["python", "worker.py"]
```

### Step 3: Copy Worker Files

```bash
# Copy from rust worker as template
cp workers/rust/worker.py workers/my_vertical/
cp workers/rust/activities.py workers/my_vertical/
cp workers/rust/requirements.txt workers/my_vertical/
```

**Note**: The worker.py and activities.py are generic and work for all verticals. You only need to customize the Dockerfile with your tools.

### Step 4: Add to docker-compose.yml

Add profiles to prevent auto-start:

```yaml
worker-my-vertical:
  build:
    context: ./workers/my_vertical
    dockerfile: Dockerfile
  container_name: fuzzforge-worker-my-vertical
  profiles:  # ← Prevents auto-start (saves RAM)
    - workers
    - my_vertical
  depends_on:
    temporal:
      condition: service_healthy
    minio:
      condition: service_healthy
  environment:
    TEMPORAL_ADDRESS: temporal:7233
    WORKER_VERTICAL: my_vertical  # ← Important: matches metadata.yaml
    WORKER_TASK_QUEUE: my-vertical-queue
    MAX_CONCURRENT_ACTIVITIES: 5
    # MinIO configuration (same for all workers)
    STORAGE_BACKEND: s3
    S3_ENDPOINT: http://minio:9000
    S3_ACCESS_KEY: fuzzforge
    S3_SECRET_KEY: fuzzforge123
    S3_BUCKET: targets
    CACHE_DIR: /cache
  volumes:
    - ./backend/toolbox:/app/toolbox:ro
    - worker_my_vertical_cache:/cache
  networks:
    - fuzzforge-network
  restart: "no"  # ← Don't auto-restart
```

**Why profiles?** Workers are pre-built but don't auto-start, saving ~1-2GB RAM per worker when idle.
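
A profiled worker can also be started by hand with standard Compose profile syntax (service and profile names here are the illustrative ones from the example above):

```bash
# Start only the my_vertical worker via its profile
docker-compose --profile my_vertical up -d worker-my-vertical
```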

### Step 5: Add Volume

```yaml
volumes:
  worker_my_vertical_cache:
    name: fuzzforge_worker_my_vertical_cache
```

### Step 6: Create Workflows for Your Vertical

```bash
mkdir -p backend/toolbox/workflows/my_workflow
```

**metadata.yaml:**
```yaml
name: my_workflow
version: 1.0.0
vertical: my_vertical  # ← Must match WORKER_VERTICAL
```

**workflow.py:**
```python
from temporalio import workflow
from datetime import timedelta

@workflow.defn
class MyWorkflow:
    @workflow.run
    async def run(self, target_id: str) -> dict:
        # Download target
        target_path = await workflow.execute_activity(
            "get_target",
            target_id,
            start_to_close_timeout=timedelta(minutes=5)
        )

        # Your analysis logic here
        results = {"status": "success"}

        # Cleanup
        await workflow.execute_activity(
            "cleanup_cache",
            target_path,
            start_to_close_timeout=timedelta(minutes=1)
        )

        return results
```

### Step 7: Test

```bash
# Start services
docker-compose -f docker-compose.temporal.yaml up -d

# Check worker logs
docker logs -f fuzzforge-worker-my-vertical

# You should see:
# "Discovered workflow: MyWorkflow from my_workflow (vertical: my_vertical)"
```

## Worker Components

### worker.py

Generic worker entrypoint. Handles:
- Workflow discovery from mounted `/app/toolbox`
- Dynamic import of workflow modules
- Connection to Temporal
- Task queue polling

**No customization needed** - works for all verticals.

### activities.py

Common activities available to all workflows (a usage sketch follows the list):

- `get_target(target_id: str) -> str`: Download target from MinIO
- `cleanup_cache(target_path: str) -> None`: Remove cached target
- `upload_results(workflow_id, results, format) -> str`: Upload results to MinIO
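
A minimal sketch of how a workflow typically chains these activities (activity names as registered above; the `"sarif"` format value is an assumption):

```python
from datetime import timedelta
from temporalio import workflow


@workflow.defn
class ExampleWorkflow:
    @workflow.run
    async def run(self, target_id: str) -> dict:
        # Fetch the uploaded target from MinIO into the local cache
        target_path = await workflow.execute_activity(
            "get_target", target_id,
            start_to_close_timeout=timedelta(minutes=5),
        )

        results = {"status": "success"}  # analysis logic would go here

        # Persist results, then free the cache entry
        results_uri = await workflow.execute_activity(
            "upload_results",
            args=[workflow.info().workflow_id, results, "sarif"],
            start_to_close_timeout=timedelta(minutes=5),
        )
        await workflow.execute_activity(
            "cleanup_cache", target_path,
            start_to_close_timeout=timedelta(minutes=1),
        )
        return {"results_uri": results_uri, **results}
```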

**Can be extended** with vertical-specific activities:

```python
# workers/my_vertical/activities.py

from temporalio import activity

@activity.defn(name="my_custom_activity")
async def my_custom_activity(input_data: str) -> str:
    # Your vertical-specific logic
    return "result"

# Add to worker.py activities list:
# activities=[..., my_custom_activity]
```

### Dockerfile

**Only component that needs customization** for each vertical. Install your tools here.

## Configuration

### Environment Variables

All workers support these environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `TEMPORAL_ADDRESS` | `localhost:7233` | Temporal server address |
| `TEMPORAL_NAMESPACE` | `default` | Temporal namespace |
| `WORKER_VERTICAL` | `rust` | Vertical name (must match metadata.yaml) |
| `WORKER_TASK_QUEUE` | `{vertical}-queue` | Task queue name |
| `MAX_CONCURRENT_ACTIVITIES` | `5` | Max concurrent activities per worker |
| `S3_ENDPOINT` | `http://minio:9000` | MinIO/S3 endpoint |
| `S3_ACCESS_KEY` | `fuzzforge` | S3 access key |
| `S3_SECRET_KEY` | `fuzzforge123` | S3 secret key |
| `S3_BUCKET` | `targets` | Bucket for uploaded targets |
| `CACHE_DIR` | `/cache` | Local cache directory |
| `CACHE_MAX_SIZE` | `10GB` | Max cache size (not enforced yet) |
| `LOG_LEVEL` | `INFO` | Logging level |
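
These can be overridden per worker in the compose file; for example, to turn up logging and concurrency for one worker (values are illustrative):

```yaml
  worker-rust:
    environment:
      LOG_LEVEL: DEBUG
      MAX_CONCURRENT_ACTIVITIES: 10
```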

## Scaling

### Vertical Scaling (More Work Per Worker)

Increase concurrent activities:

```yaml
environment:
  MAX_CONCURRENT_ACTIVITIES: 10  # Handle 10 tasks at once
```

### Horizontal Scaling (More Workers)

```bash
# Scale to 3 workers for rust vertical
docker-compose -f docker-compose.temporal.yaml up -d --scale worker-rust=3

# Each worker polls the same task queue
# Temporal automatically load balances
```

## Troubleshooting

### Worker Not Discovering Workflows

Check:
1. Volume mount is correct: `./backend/toolbox:/app/toolbox:ro` (verify as shown below)
2. Workflow has `metadata.yaml` with correct `vertical:` field
3. Workflow has `workflow.py` with `@workflow.defn` decorated class
4. Worker logs show discovery attempt
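
For example, you can confirm the mount and the expected files from inside a running worker container (adjust the container name to your setup):

```bash
# List what the worker actually sees under the mounted toolbox
docker exec fuzzforge-worker-rust ls /app/toolbox/workflows

# Confirm a specific workflow ships both required files
docker exec fuzzforge-worker-rust ls /app/toolbox/workflows/rust_test
```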

### Cannot Connect to Temporal

Check:
1. Temporal container is healthy: `docker ps`
2. Network connectivity: `docker exec worker-rust ping temporal`
3. `TEMPORAL_ADDRESS` environment variable is correct

### Cannot Download from MinIO

Check:
1. MinIO is healthy: `docker ps`
2. Buckets exist: `docker exec fuzzforge-minio mc ls fuzzforge/targets`
3. S3 credentials are correct
4. Target was uploaded: Check MinIO console at http://localhost:9001

### Activity Timeouts

Increase timeout in workflow:

```python
await workflow.execute_activity(
    "my_activity",
    args,
    start_to_close_timeout=timedelta(hours=2)  # Increase from default
)
```

## Best Practices

1. **Keep Dockerfiles lean**: Only install necessary tools
2. **Use multi-stage builds**: Reduce final image size
3. **Pin tool versions**: Ensure reproducibility
4. **Log liberally**: Helps debugging workflow issues
5. **Handle errors gracefully**: Don't fail workflow for non-critical issues
6. **Test locally first**: Use docker-compose before deploying

## On-Demand Worker Management

Workers use Docker Compose profiles and CLI-managed lifecycle for resource optimization.

### How It Works

1. **Build Time**: `docker-compose build` creates all worker images
2. **Startup**: Workers DON'T auto-start with `docker-compose up -d`
3. **On Demand**: CLI starts workers automatically when workflows need them
4. **Shutdown**: Optional auto-stop after workflow completion

### Manual Control

```bash
# Start specific worker
docker start fuzzforge-worker-ossfuzz

# Stop specific worker
docker stop fuzzforge-worker-ossfuzz

# Check worker status
docker ps --filter "name=fuzzforge-worker"
```

### CLI Auto-Management

```bash
# Auto-start enabled by default
ff workflow run ossfuzz_campaign . project_name=zlib

# Disable auto-start
ff workflow run ossfuzz_campaign . project_name=zlib --no-auto-start

# Auto-stop after completion
ff workflow run ossfuzz_campaign . project_name=zlib --wait --auto-stop
```

### Resource Savings

- **Before**: All workers running = ~8GB RAM
- **After**: Only core services running = ~1.2GB RAM
- **Savings**: ~6-7GB RAM when idle

## Examples

See existing verticals for examples:
- `workers/rust/` - Complete working example
- `backend/toolbox/workflows/rust_test/` - Simple test workflow
workers/android/Dockerfile (new file, 94 lines)
@@ -0,0 +1,94 @@
# FuzzForge Vertical Worker: Android Security
#
# Pre-installed tools for Android security analysis:
# - Android SDK (adb, aapt)
# - apktool (APK decompilation)
# - jadx (Dex to Java decompiler)
# - Frida (dynamic instrumentation)
# - androguard (Python APK analysis)
# - MobSF dependencies

FROM python:3.11-slim-bookworm

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    # Build essentials
    build-essential \
    git \
    curl \
    wget \
    unzip \
    # Java (required for Android tools)
    openjdk-17-jdk \
    # Android tools dependencies
    lib32stdc++6 \
    lib32z1 \
    # Frida dependencies
    libc6-dev \
    # XML/Binary analysis
    libxml2-dev \
    libxslt-dev \
    # Network tools
    netcat-openbsd \
    tcpdump \
    # Cleanup
    && rm -rf /var/lib/apt/lists/*

# Install Android SDK Command Line Tools
ENV ANDROID_HOME=/opt/android-sdk
ENV PATH="${ANDROID_HOME}/cmdline-tools/latest/bin:${ANDROID_HOME}/platform-tools:${PATH}"

RUN mkdir -p ${ANDROID_HOME}/cmdline-tools && \
    cd ${ANDROID_HOME}/cmdline-tools && \
    wget -q https://dl.google.com/android/repository/commandlinetools-linux-9477386_latest.zip && \
    unzip -q commandlinetools-linux-9477386_latest.zip && \
    mv cmdline-tools latest && \
    rm commandlinetools-linux-9477386_latest.zip && \
    # Accept licenses
    yes | ${ANDROID_HOME}/cmdline-tools/latest/bin/sdkmanager --licenses && \
    # Install platform tools (adb, fastboot)
    ${ANDROID_HOME}/cmdline-tools/latest/bin/sdkmanager "platform-tools" "build-tools;33.0.0"

# Install apktool
RUN wget -q https://raw.githubusercontent.com/iBotPeaches/Apktool/master/scripts/linux/apktool -O /usr/local/bin/apktool && \
    wget -q https://bitbucket.org/iBotPeaches/apktool/downloads/apktool_2.9.3.jar -O /usr/local/bin/apktool.jar && \
    chmod +x /usr/local/bin/apktool

# Install jadx (Dex to Java decompiler)
RUN wget -q https://github.com/skylot/jadx/releases/download/v1.4.7/jadx-1.4.7.zip -O /tmp/jadx.zip && \
    unzip -q /tmp/jadx.zip -d /opt/jadx && \
    ln -s /opt/jadx/bin/jadx /usr/local/bin/jadx && \
    ln -s /opt/jadx/bin/jadx-gui /usr/local/bin/jadx-gui && \
    rm /tmp/jadx.zip

# Install Python dependencies for Android security tools
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install --no-cache-dir -r /tmp/requirements.txt && \
    rm /tmp/requirements.txt

# Install androguard (Python APK analysis framework)
RUN pip3 install --no-cache-dir androguard pyaxmlparser

# Install Frida
RUN pip3 install --no-cache-dir frida-tools frida

# Create cache directory
RUN mkdir -p /cache && chmod 755 /cache

# Copy worker entrypoint (generic, works for all verticals)
COPY worker.py /app/worker.py

# Add toolbox to Python path (mounted at runtime)
ENV PYTHONPATH="/app:/app/toolbox:${PYTHONPATH}"
ENV PYTHONUNBUFFERED=1
ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64

# Healthcheck
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
    CMD python3 -c "import sys; sys.exit(0)"

# Run worker
CMD ["python3", "/app/worker.py"]
workers/android/requirements.txt (new file, 19 lines)
@@ -0,0 +1,19 @@
# Temporal Python SDK
temporalio>=1.5.0

# S3/MinIO client
boto3>=1.34.0
botocore>=1.34.0

# Data validation
pydantic>=2.5.0

# YAML parsing
PyYAML>=6.0.1

# Utilities
python-dotenv>=1.0.0
aiofiles>=23.2.1

# Logging
structlog>=24.1.0
workers/android/worker.py (new file, 309 lines)
@@ -0,0 +1,309 @@
"""
FuzzForge Vertical Worker: Rust/Native Security

This worker:
1. Discovers workflows for the 'rust' vertical from mounted toolbox
2. Dynamically imports and registers workflow classes
3. Connects to Temporal and processes tasks
4. Handles activities for target download/upload from MinIO
"""

import asyncio
import importlib
import inspect
import logging
import os
import sys
from pathlib import Path
from typing import List, Any

import yaml
from temporalio.client import Client
from temporalio.worker import Worker

# Add toolbox to path for workflow and activity imports
sys.path.insert(0, '/app/toolbox')

# Import common storage activities
from toolbox.common.storage_activities import (
    get_target_activity,
    cleanup_cache_activity,
    upload_results_activity
)

# Configure logging
logging.basicConfig(
    level=os.getenv('LOG_LEVEL', 'INFO'),
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)


async def discover_workflows(vertical: str) -> List[Any]:
    """
    Discover workflows for this vertical from mounted toolbox.

    Args:
        vertical: The vertical name (e.g., 'rust', 'android', 'web')

    Returns:
        List of workflow classes decorated with @workflow.defn
    """
    workflows = []
    toolbox_path = Path("/app/toolbox/workflows")

    if not toolbox_path.exists():
        logger.warning(f"Toolbox path does not exist: {toolbox_path}")
        return workflows

    logger.info(f"Scanning for workflows in: {toolbox_path}")

    for workflow_dir in toolbox_path.iterdir():
        if not workflow_dir.is_dir():
            continue

        # Skip special directories
        if workflow_dir.name.startswith('.') or workflow_dir.name == '__pycache__':
            continue

        metadata_file = workflow_dir / "metadata.yaml"
        if not metadata_file.exists():
            logger.debug(f"No metadata.yaml in {workflow_dir.name}, skipping")
            continue

        try:
            # Parse metadata
            with open(metadata_file) as f:
                metadata = yaml.safe_load(f)

            # Check if workflow is for this vertical
            workflow_vertical = metadata.get("vertical")
            if workflow_vertical != vertical:
                logger.debug(
                    f"Workflow {workflow_dir.name} is for vertical '{workflow_vertical}', "
                    f"not '{vertical}', skipping"
                )
                continue

            # Check if workflow.py exists
            workflow_file = workflow_dir / "workflow.py"
            if not workflow_file.exists():
                logger.warning(
                    f"Workflow {workflow_dir.name} has metadata but no workflow.py, skipping"
                )
                continue

            # Dynamically import workflow module
            module_name = f"toolbox.workflows.{workflow_dir.name}.workflow"
            logger.info(f"Importing workflow module: {module_name}")

            try:
                module = importlib.import_module(module_name)
            except Exception as e:
                logger.error(
                    f"Failed to import workflow module {module_name}: {e}",
                    exc_info=True
                )
                continue

            # Find @workflow.defn decorated classes
            found_workflows = False
            for name, obj in inspect.getmembers(module, inspect.isclass):
                # Check if class has Temporal workflow definition
                if hasattr(obj, '__temporal_workflow_definition'):
                    workflows.append(obj)
                    found_workflows = True
                    logger.info(
                        f"✓ Discovered workflow: {name} from {workflow_dir.name} "
                        f"(vertical: {vertical})"
                    )

            if not found_workflows:
                logger.warning(
                    f"Workflow {workflow_dir.name} has no @workflow.defn decorated classes"
                )

        except Exception as e:
            logger.error(
                f"Error processing workflow {workflow_dir.name}: {e}",
                exc_info=True
            )
            continue

    logger.info(f"Discovered {len(workflows)} workflows for vertical '{vertical}'")
    return workflows


async def discover_activities(workflows_dir: Path) -> List[Any]:
    """
    Discover activities from workflow directories.

    Looks for activities.py files alongside workflow.py in each workflow directory.

    Args:
        workflows_dir: Path to workflows directory

    Returns:
        List of activity functions decorated with @activity.defn
    """
    activities = []

    if not workflows_dir.exists():
        logger.warning(f"Workflows directory does not exist: {workflows_dir}")
        return activities

    logger.info(f"Scanning for workflow activities in: {workflows_dir}")

    for workflow_dir in workflows_dir.iterdir():
        if not workflow_dir.is_dir():
            continue

        # Skip special directories
        if workflow_dir.name.startswith('.') or workflow_dir.name == '__pycache__':
            continue

        # Check if activities.py exists
        activities_file = workflow_dir / "activities.py"
        if not activities_file.exists():
            logger.debug(f"No activities.py in {workflow_dir.name}, skipping")
            continue

        try:
            # Dynamically import activities module
            module_name = f"toolbox.workflows.{workflow_dir.name}.activities"
            logger.info(f"Importing activities module: {module_name}")

            try:
                module = importlib.import_module(module_name)
            except Exception as e:
                logger.error(
                    f"Failed to import activities module {module_name}: {e}",
                    exc_info=True
                )
                continue

            # Find @activity.defn decorated functions
            found_activities = False
            for name, obj in inspect.getmembers(module, inspect.isfunction):
                # Check if function has Temporal activity definition
                if hasattr(obj, '__temporal_activity_definition'):
                    activities.append(obj)
                    found_activities = True
                    logger.info(
                        f"✓ Discovered activity: {name} from {workflow_dir.name}"
                    )

            if not found_activities:
                logger.warning(
                    f"Workflow {workflow_dir.name} has activities.py but no @activity.defn decorated functions"
                )

        except Exception as e:
            logger.error(
                f"Error processing activities from {workflow_dir.name}: {e}",
                exc_info=True
            )
            continue

    logger.info(f"Discovered {len(activities)} workflow-specific activities")
    return activities


async def main():
    """Main worker entry point"""
    # Get configuration from environment
    vertical = os.getenv("WORKER_VERTICAL", "rust")
    temporal_address = os.getenv("TEMPORAL_ADDRESS", "localhost:7233")
    temporal_namespace = os.getenv("TEMPORAL_NAMESPACE", "default")
    task_queue = os.getenv("WORKER_TASK_QUEUE", f"{vertical}-queue")
    max_concurrent_activities = int(os.getenv("MAX_CONCURRENT_ACTIVITIES", "5"))

    logger.info("=" * 60)
    logger.info(f"FuzzForge Vertical Worker: {vertical}")
    logger.info("=" * 60)
    logger.info(f"Temporal Address: {temporal_address}")
    logger.info(f"Temporal Namespace: {temporal_namespace}")
    logger.info(f"Task Queue: {task_queue}")
    logger.info(f"Max Concurrent Activities: {max_concurrent_activities}")
    logger.info("=" * 60)

    # Discover workflows for this vertical
    logger.info(f"Discovering workflows for vertical: {vertical}")
    workflows = await discover_workflows(vertical)

    if not workflows:
        logger.error(f"No workflows found for vertical: {vertical}")
        logger.error("Worker cannot start without workflows. Exiting...")
        sys.exit(1)

    # Discover activities from workflow directories
    logger.info("Discovering workflow-specific activities...")
    workflows_dir = Path("/app/toolbox/workflows")
    workflow_activities = await discover_activities(workflows_dir)

    # Combine common storage activities with workflow-specific activities
    activities = [
        get_target_activity,
        cleanup_cache_activity,
        upload_results_activity
    ] + workflow_activities

    logger.info(
        f"Total activities registered: {len(activities)} "
        f"(3 common + {len(workflow_activities)} workflow-specific)"
    )

    # Connect to Temporal
    logger.info(f"Connecting to Temporal at {temporal_address}...")
    try:
        client = await Client.connect(
            temporal_address,
            namespace=temporal_namespace
        )
        logger.info("✓ Connected to Temporal successfully")
    except Exception as e:
        logger.error(f"Failed to connect to Temporal: {e}", exc_info=True)
        sys.exit(1)

    # Create worker with discovered workflows and activities
    logger.info(f"Creating worker on task queue: {task_queue}")

    try:
        worker = Worker(
            client,
            task_queue=task_queue,
            workflows=workflows,
            activities=activities,
            max_concurrent_activities=max_concurrent_activities
        )
        logger.info("✓ Worker created successfully")
    except Exception as e:
        logger.error(f"Failed to create worker: {e}", exc_info=True)
        sys.exit(1)

    # Start worker
    logger.info("=" * 60)
    logger.info(f"🚀 Worker started for vertical '{vertical}'")
    logger.info(f"📦 Registered {len(workflows)} workflows")
    logger.info(f"⚙️  Registered {len(activities)} activities")
    logger.info(f"📨 Listening on task queue: {task_queue}")
    logger.info("=" * 60)
    logger.info("Worker is ready to process tasks...")

    try:
        await worker.run()
    except KeyboardInterrupt:
        logger.info("Shutting down worker (keyboard interrupt)...")
    except Exception as e:
        logger.error(f"Worker error: {e}", exc_info=True)
        raise


if __name__ == "__main__":
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        logger.info("Worker stopped")
    except Exception as e:
        logger.error(f"Fatal error: {e}", exc_info=True)
        sys.exit(1)
workers/ossfuzz/Dockerfile (new file, 45 lines)
@@ -0,0 +1,45 @@
# OSS-Fuzz Worker - Generic fuzzing using OSS-Fuzz infrastructure
FROM gcr.io/oss-fuzz-base/base-builder:latest

# Install Python, Docker CLI, and dependencies (use Python 3.8 from base image)
RUN apt-get update && apt-get install -y \
    python3-pip \
    python3-dev \
    git \
    docker.io \
    && rm -rf /var/lib/apt/lists/*

# Upgrade pip
RUN python3 -m pip install --upgrade pip

# Install Temporal Python SDK and dependencies
RUN pip3 install --no-cache-dir \
    temporalio==1.5.0 \
    boto3==1.34.50 \
    pyyaml==6.0.1 \
    psutil==5.9.8

# Create necessary directories
RUN mkdir -p /app /cache /corpus /output

# Set environment variables
ENV PYTHONPATH=/app
ENV WORKER_VERTICAL=ossfuzz
ENV MAX_CONCURRENT_ACTIVITIES=2
ENV CACHE_DIR=/cache
ENV CACHE_MAX_SIZE=50GB
ENV CACHE_TTL=30d

# Clone OSS-Fuzz repo (will be cached in /cache by worker)
# This is just to have helper scripts available
RUN git clone --depth=1 https://github.com/google/oss-fuzz.git /opt/oss-fuzz

# Copy worker code
COPY worker.py /app/
COPY activities.py /app/
COPY requirements.txt /app/

WORKDIR /app

# Run worker
CMD ["python3", "worker.py"]
workers/ossfuzz/activities.py (new file, 413 lines)
@@ -0,0 +1,413 @@
"""
OSS-Fuzz Campaign Activities

Activities for running OSS-Fuzz campaigns using Google's infrastructure.
"""

import logging
import os
import subprocess
import shutil
from pathlib import Path
from typing import Dict, Any, List, Optional
from datetime import datetime

import yaml
from temporalio import activity

logger = logging.getLogger(__name__)

# Paths
OSS_FUZZ_REPO = Path("/opt/oss-fuzz")
CACHE_DIR = Path(os.getenv("CACHE_DIR", "/cache"))


@activity.defn(name="load_ossfuzz_project")
async def load_ossfuzz_project_activity(project_name: str) -> Dict[str, Any]:
    """
    Load OSS-Fuzz project configuration from project.yaml.

    Args:
        project_name: Name of the OSS-Fuzz project (e.g., "curl", "sqlite3")

    Returns:
        Dictionary with project config, paths, and metadata
    """
    logger.info(f"Loading OSS-Fuzz project: {project_name}")

    # Update OSS-Fuzz repo if it exists, clone if not
    if OSS_FUZZ_REPO.exists():
        logger.info("Updating OSS-Fuzz repository...")
        subprocess.run(
            ["git", "-C", str(OSS_FUZZ_REPO), "pull", "--depth=1"],
            check=False  # Don't fail if already up to date
        )
    else:
        logger.info("Cloning OSS-Fuzz repository...")
        subprocess.run(
            [
                "git", "clone", "--depth=1",
                "https://github.com/google/oss-fuzz.git",
                str(OSS_FUZZ_REPO)
            ],
            check=True
        )

    # Find project directory
    project_path = OSS_FUZZ_REPO / "projects" / project_name
    if not project_path.exists():
        raise ValueError(
            f"Project '{project_name}' not found in OSS-Fuzz. "
            f"Available projects: https://github.com/google/oss-fuzz/tree/master/projects"
        )

    # Read project.yaml
    config_file = project_path / "project.yaml"
    if not config_file.exists():
        raise ValueError(f"No project.yaml found for project '{project_name}'")

    with open(config_file) as f:
        config = yaml.safe_load(f)

    # Add paths
    config["project_name"] = project_name
    config["project_path"] = str(project_path)
    config["dockerfile_path"] = str(project_path / "Dockerfile")
    config["build_script_path"] = str(project_path / "build.sh")

    # Validate required fields
    if not config.get("language"):
        logger.warning(f"No language specified in project.yaml for {project_name}")

    logger.info(
        f"✓ Loaded project {project_name}: "
        f"language={config.get('language', 'unknown')}, "
        f"engines={config.get('fuzzing_engines', [])}, "
        f"sanitizers={config.get('sanitizers', [])}"
    )

    return config


@activity.defn(name="build_ossfuzz_project")
async def build_ossfuzz_project_activity(
    project_name: str,
    project_config: Dict[str, Any],
    sanitizer: Optional[str] = None,
    engine: Optional[str] = None
) -> Dict[str, Any]:
    """
    Build OSS-Fuzz project directly using build.sh (no Docker-in-Docker).

    Args:
        project_name: Name of the project
        project_config: Configuration from project.yaml
        sanitizer: Override sanitizer (default: first from project.yaml)
        engine: Override engine (default: first from project.yaml)

    Returns:
        Dictionary with build results and discovered fuzz targets
    """
    logger.info(f"Building OSS-Fuzz project: {project_name}")

    # Determine sanitizer and engine
    sanitizers = project_config.get("sanitizers", ["address"])
    engines = project_config.get("fuzzing_engines", ["libfuzzer"])

    use_sanitizer = sanitizer if sanitizer else sanitizers[0]
    use_engine = engine if engine else engines[0]

    logger.info(f"Building with sanitizer={use_sanitizer}, engine={use_engine}")

    # Setup directories
    src_dir = Path("/src")
    out_dir = Path("/out")
    src_dir.mkdir(exist_ok=True)
    out_dir.mkdir(exist_ok=True)

    # Clean previous build artifacts
    for item in out_dir.glob("*"):
        if item.is_file():
            item.unlink()
        elif item.is_dir():
            shutil.rmtree(item)

    # Copy project files from OSS-Fuzz repo to /src
    project_path = Path(project_config["project_path"])
    build_script = project_path / "build.sh"

    if not build_script.exists():
        raise Exception(f"build.sh not found for project {project_name}")

    logger.info(f"Copying project files from {project_path} to {src_dir}")

    # Copy build.sh
    shutil.copy2(build_script, src_dir / "build.sh")
    os.chmod(src_dir / "build.sh", 0o755)

    # Copy any fuzzer source files (*.cc, *.c, *.cpp files)
    for pattern in ["*.cc", "*.c", "*.cpp", "*.h", "*.hh", "*.hpp"]:
        for src_file in project_path.glob(pattern):
            dest_file = src_dir / src_file.name
            shutil.copy2(src_file, dest_file)
            logger.info(f"Copied: {src_file.name}")

    # Clone project source code to subdirectory
    main_repo = project_config.get("main_repo")
    work_dir = src_dir

    if main_repo:
        logger.info(f"Cloning project source from {main_repo}")
        project_src_dir = src_dir / project_name

        # Remove existing directory if present
        if project_src_dir.exists():
            shutil.rmtree(project_src_dir)

        clone_cmd = ["git", "clone", "--depth=1", main_repo, str(project_src_dir)]
        result = subprocess.run(clone_cmd, capture_output=True, text=True, timeout=600)

        if result.returncode != 0:
            logger.warning(f"Failed to clone {main_repo}: {result.stderr}")
            logger.info("Continuing without cloning (build.sh may download source)")
        else:
            # Copy build.sh into the project source directory
            shutil.copy2(src_dir / "build.sh", project_src_dir / "build.sh")
            os.chmod(project_src_dir / "build.sh", 0o755)
            # build.sh should run from within the project directory
            work_dir = project_src_dir
            logger.info(f"Build will run from: {work_dir}")
    else:
        logger.info("No main_repo in project.yaml, build.sh will download source")

    # Set OSS-Fuzz environment variables
    build_env = os.environ.copy()
    build_env.update({
        "SRC": str(src_dir),
        "OUT": str(out_dir),
        "FUZZING_ENGINE": use_engine,
        "SANITIZER": use_sanitizer,
        "ARCHITECTURE": "x86_64",
        # Use clang's built-in libfuzzer instead of separate library
        "LIB_FUZZING_ENGINE": "-fsanitize=fuzzer",
    })

    # Set sanitizer flags
    if use_sanitizer == "address":
        build_env["CFLAGS"] = build_env.get("CFLAGS", "") + " -fsanitize=address"
        build_env["CXXFLAGS"] = build_env.get("CXXFLAGS", "") + " -fsanitize=address"
    elif use_sanitizer == "memory":
        build_env["CFLAGS"] = build_env.get("CFLAGS", "") + " -fsanitize=memory"
        build_env["CXXFLAGS"] = build_env.get("CXXFLAGS", "") + " -fsanitize=memory"
    elif use_sanitizer == "undefined":
        build_env["CFLAGS"] = build_env.get("CFLAGS", "") + " -fsanitize=undefined"
        build_env["CXXFLAGS"] = build_env.get("CXXFLAGS", "") + " -fsanitize=undefined"

    # Execute build.sh from the work directory
    logger.info(f"Executing build.sh in {work_dir}")
    build_cmd = ["bash", "./build.sh"]

    result = subprocess.run(
        build_cmd,
        cwd=str(work_dir),
        env=build_env,
        capture_output=True,
        text=True,
        timeout=1800  # 30 minutes max build time
    )

    if result.returncode != 0:
        logger.error(f"Build failed:\nSTDOUT:\n{result.stdout}\nSTDERR:\n{result.stderr}")
        raise Exception(f"Build failed for {project_name}: {result.stderr}")

    logger.info("✓ Build completed successfully")
    logger.info(f"Build output:\n{result.stdout[-2000:]}")  # Last 2000 chars

    # Discover fuzz targets in /out
    fuzz_targets = []
    for file in out_dir.glob("*"):
        if file.is_file() and os.access(file, os.X_OK):
            # Check if it's a fuzz target (executable, not .so/.a/.o)
            if file.suffix not in ['.so', '.a', '.o', '.zip']:
                fuzz_targets.append(str(file))
                logger.info(f"Found fuzz target: {file.name}")

    if not fuzz_targets:
        logger.warning(f"No fuzz targets found in {out_dir}")
        logger.info(f"Directory contents: {list(out_dir.glob('*'))}")

    return {
        "fuzz_targets": fuzz_targets,
        "build_log": result.stdout[-5000:],  # Last 5000 chars
        "sanitizer_used": use_sanitizer,
        "engine_used": use_engine,
        "out_dir": str(out_dir)
    }


@activity.defn(name="fuzz_target")
async def fuzz_target_activity(
    target_path: str,
    engine: str,
    duration_seconds: int,
    corpus_dir: Optional[str] = None,
    dict_file: Optional[str] = None
) -> Dict[str, Any]:
    """
    Run fuzzing on a target with specified engine.

    Args:
        target_path: Path to fuzz target executable
        engine: Fuzzing engine (libfuzzer, afl, honggfuzz)
        duration_seconds: How long to fuzz
        corpus_dir: Optional corpus directory
        dict_file: Optional dictionary file

    Returns:
        Dictionary with fuzzing stats and results
    """
    logger.info(f"Fuzzing {Path(target_path).name} with {engine} for {duration_seconds}s")

    # Prepare corpus directory
    if not corpus_dir:
        corpus_dir = str(CACHE_DIR / "corpus" / Path(target_path).stem)
    Path(corpus_dir).mkdir(parents=True, exist_ok=True)

    output_dir = CACHE_DIR / "output" / Path(target_path).stem
    output_dir.mkdir(parents=True, exist_ok=True)

    start_time = datetime.now()

    try:
        if engine == "libfuzzer":
            cmd = [
                target_path,
                corpus_dir,
                f"-max_total_time={duration_seconds}",
                "-print_final_stats=1",
                f"-artifact_prefix={output_dir}/"
            ]
            if dict_file:
                cmd.append(f"-dict={dict_file}")

        elif engine == "afl":
            cmd = [
                "afl-fuzz",
                "-i", corpus_dir if Path(corpus_dir).glob("*") else "-",  # Empty corpus OK
                "-o", str(output_dir),
                "-t", "1000",  # Timeout per execution
                "-m", "none",  # No memory limit
                "--", target_path, "@@"
            ]

        elif engine == "honggfuzz":
            cmd = [
                "honggfuzz",
                f"--run_time={duration_seconds}",
                "-i", corpus_dir,
                "-o", str(output_dir),
                "--", target_path
            ]

        else:
            raise ValueError(f"Unsupported fuzzing engine: {engine}")

        logger.info(f"Starting fuzzer: {' '.join(cmd[:5])}...")

        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=duration_seconds + 120  # Add 2 minute buffer
        )

        end_time = datetime.now()
        elapsed = (end_time - start_time).total_seconds()

        # Parse stats from output
        stats = parse_fuzzing_stats(result.stdout, result.stderr, engine)
        stats["elapsed_time"] = elapsed
        stats["target_name"] = Path(target_path).name
        stats["engine"] = engine

        # Find crashes
        crashes = find_crashes(output_dir)
        stats["crashes"] = len(crashes)
        stats["crash_files"] = crashes

        # Collect new corpus files
        new_corpus = collect_corpus(corpus_dir)
        stats["corpus_size"] = len(new_corpus)
        stats["corpus_files"] = new_corpus

        logger.info(
            f"✓ Fuzzing completed: {stats.get('total_executions', 0)} execs, "
            f"{len(crashes)} crashes"
        )

        return stats

    except subprocess.TimeoutExpired:
        logger.warning(f"Fuzzing timed out after {duration_seconds}s")
        return {
            "target_name": Path(target_path).name,
            "engine": engine,
            "status": "timeout",
            "elapsed_time": duration_seconds
        }


def parse_fuzzing_stats(stdout: str, stderr: str, engine: str) -> Dict[str, Any]:
    """Parse fuzzing statistics from output"""
    stats = {}

    if engine == "libfuzzer":
        # Parse libFuzzer stats
        for line in (stdout + stderr).split('\n'):
            if "#" in line and "NEW" in line:
                # Example: #8192 NEW cov: 1234 ft: 5678 corp: 89/10KB
                parts = line.split()
                for i, part in enumerate(parts):
                    if part.startswith("cov:"):
                        stats["coverage"] = int(parts[i+1])
                    elif part.startswith("corp:"):
                        stats["corpus_entries"] = int(parts[i+1].split('/')[0])
                    elif part.startswith("exec/s:"):
                        stats["executions_per_sec"] = float(parts[i+1])
                    elif part.startswith("#"):
                        stats["total_executions"] = int(part[1:])

    elif engine == "afl":
        # Parse AFL stats (would need to read fuzzer_stats file)
        pass

    elif engine == "honggfuzz":
        # Parse Honggfuzz stats
        pass

    return stats


def find_crashes(output_dir: Path) -> List[str]:
    """Find crash files in output directory"""
    crashes = []

    # libFuzzer crash files start with "crash-" or "leak-"
    for pattern in ["crash-*", "leak-*", "timeout-*"]:
        crashes.extend([str(f) for f in output_dir.glob(pattern)])

    # AFL crashes in crashes/ subdirectory
    crashes_dir = output_dir / "crashes"
    if crashes_dir.exists():
        crashes.extend([str(f) for f in crashes_dir.glob("*") if f.is_file()])

    return crashes


def collect_corpus(corpus_dir: str) -> List[str]:
    """Collect corpus files"""
    corpus_path = Path(corpus_dir)
    if not corpus_path.exists():
        return []

    return [str(f) for f in corpus_path.glob("*") if f.is_file()]
workers/ossfuzz/requirements.txt (new file, 4 lines)
@@ -0,0 +1,4 @@
temporalio==1.5.0
boto3==1.34.50
pyyaml==6.0.1
psutil==5.9.8
workers/ossfuzz/worker.py (new file, 319 lines)
@@ -0,0 +1,319 @@
"""
FuzzForge Vertical Worker: OSS-Fuzz Campaigns

This worker:
1. Discovers workflows for the 'ossfuzz' vertical from mounted toolbox
2. Dynamically imports and registers workflow classes
3. Connects to Temporal and processes tasks
4. Handles activities for OSS-Fuzz project building and fuzzing
"""

import asyncio
import importlib
import inspect
import logging
import os
import sys
from pathlib import Path
from typing import List, Any

import yaml
from temporalio.client import Client
from temporalio.worker import Worker

# Add toolbox to path for workflow and activity imports
sys.path.insert(0, '/app/toolbox')

# Import common storage activities
from toolbox.common.storage_activities import (
    get_target_activity,
    cleanup_cache_activity,
    upload_results_activity
)

# Import OSS-Fuzz specific activities
from activities import (
    load_ossfuzz_project_activity,
    build_ossfuzz_project_activity,
    fuzz_target_activity
)

# Configure logging
logging.basicConfig(
    level=os.getenv('LOG_LEVEL', 'INFO'),
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)


async def discover_workflows(vertical: str) -> List[Any]:
    """
    Discover workflows for this vertical from mounted toolbox.

    Args:
        vertical: The vertical name (e.g., 'ossfuzz')

    Returns:
        List of workflow classes decorated with @workflow.defn
    """
    workflows = []
    toolbox_path = Path("/app/toolbox/workflows")

    if not toolbox_path.exists():
        logger.warning(f"Toolbox path does not exist: {toolbox_path}")
        return workflows

    logger.info(f"Scanning for workflows in: {toolbox_path}")

    for workflow_dir in toolbox_path.iterdir():
        if not workflow_dir.is_dir():
            continue

        # Skip special directories
        if workflow_dir.name.startswith('.') or workflow_dir.name == '__pycache__':
            continue

        metadata_file = workflow_dir / "metadata.yaml"
        if not metadata_file.exists():
            logger.debug(f"No metadata.yaml in {workflow_dir.name}, skipping")
            continue

        try:
            # Parse metadata
            with open(metadata_file) as f:
                metadata = yaml.safe_load(f)

            # Check if workflow is for this vertical
            workflow_vertical = metadata.get("vertical")
            if workflow_vertical != vertical:
                logger.debug(
                    f"Workflow {workflow_dir.name} is for vertical '{workflow_vertical}', "
                    f"not '{vertical}', skipping"
                )
                continue

            # Check if workflow.py exists
            workflow_file = workflow_dir / "workflow.py"
            if not workflow_file.exists():
                logger.warning(
                    f"Workflow {workflow_dir.name} has metadata but no workflow.py, skipping"
                )
                continue

            # Dynamically import workflow module
            module_name = f"toolbox.workflows.{workflow_dir.name}.workflow"
            logger.info(f"Importing workflow module: {module_name}")

            try:
                module = importlib.import_module(module_name)
            except Exception as e:
                logger.error(
                    f"Failed to import workflow module {module_name}: {e}",
                    exc_info=True
                )
                continue

            # Find @workflow.defn decorated classes
            found_workflows = False
            for name, obj in inspect.getmembers(module, inspect.isclass):
                # Check if class has Temporal workflow definition
                if hasattr(obj, '__temporal_workflow_definition'):
                    workflows.append(obj)
                    found_workflows = True
                    logger.info(
                        f"✓ Discovered workflow: {name} from {workflow_dir.name} "
                        f"(vertical: {vertical})"
                    )

            if not found_workflows:
                logger.warning(
                    f"Workflow {workflow_dir.name} has no @workflow.defn decorated classes"
                )

        except Exception as e:
            logger.error(
                f"Error processing workflow {workflow_dir.name}: {e}",
                exc_info=True
            )
            continue

    logger.info(f"Discovered {len(workflows)} workflows for vertical '{vertical}'")
    return workflows


async def discover_activities(workflows_dir: Path) -> List[Any]:
    """
    Discover activities from workflow directories.

    Looks for activities.py files alongside workflow.py in each workflow directory.

    Args:
        workflows_dir: Path to workflows directory

    Returns:
        List of activity functions decorated with @activity.defn
    """
    activities = []

    if not workflows_dir.exists():
        logger.warning(f"Workflows directory does not exist: {workflows_dir}")
        return activities

    logger.info(f"Scanning for workflow activities in: {workflows_dir}")

    for workflow_dir in workflows_dir.iterdir():
        if not workflow_dir.is_dir():
            continue

        # Skip special directories
        if workflow_dir.name.startswith('.') or workflow_dir.name == '__pycache__':
            continue

        # Check if activities.py exists
        activities_file = workflow_dir / "activities.py"
        if not activities_file.exists():
            logger.debug(f"No activities.py in {workflow_dir.name}, skipping")
            continue

        try:
            # Dynamically import activities module
            module_name = f"toolbox.workflows.{workflow_dir.name}.activities"
            logger.info(f"Importing activities module: {module_name}")

            try:
                module = importlib.import_module(module_name)
            except Exception as e:
                logger.error(
                    f"Failed to import activities module {module_name}: {e}",
                    exc_info=True
                )
                continue

            # Find @activity.defn decorated functions
            found_activities = False
            for name, obj in inspect.getmembers(module, inspect.isfunction):
                # Check if function has Temporal activity definition
                if hasattr(obj, '__temporal_activity_definition'):
                    activities.append(obj)
                    found_activities = True
                    logger.info(
                        f"✓ Discovered activity: {name} from {workflow_dir.name}"
                    )

            if not found_activities:
                logger.warning(
                    f"Workflow {workflow_dir.name} has activities.py but no @activity.defn decorated functions"
                )

        except Exception as e:
            logger.error(
                f"Error processing activities from {workflow_dir.name}: {e}",
                exc_info=True
            )
            continue

    logger.info(f"Discovered {len(activities)} workflow-specific activities")
    return activities


async def main():
    """Main worker entry point"""
    # Get configuration from environment
    vertical = os.getenv("WORKER_VERTICAL", "ossfuzz")
|
||||
temporal_address = os.getenv("TEMPORAL_ADDRESS", "localhost:7233")
|
||||
temporal_namespace = os.getenv("TEMPORAL_NAMESPACE", "default")
|
||||
task_queue = os.getenv("WORKER_TASK_QUEUE", f"{vertical}-queue")
|
||||
max_concurrent_activities = int(os.getenv("MAX_CONCURRENT_ACTIVITIES", "2"))
|
||||
|
||||
logger.info("=" * 60)
|
||||
logger.info(f"FuzzForge Vertical Worker: {vertical}")
|
||||
logger.info("=" * 60)
|
||||
logger.info(f"Temporal Address: {temporal_address}")
|
||||
logger.info(f"Temporal Namespace: {temporal_namespace}")
|
||||
logger.info(f"Task Queue: {task_queue}")
|
||||
logger.info(f"Max Concurrent Activities: {max_concurrent_activities}")
|
||||
logger.info("=" * 60)
|
||||
|
||||
# Discover workflows for this vertical
|
||||
logger.info(f"Discovering workflows for vertical: {vertical}")
|
||||
workflows = await discover_workflows(vertical)
|
||||
|
||||
if not workflows:
|
||||
logger.error(f"No workflows found for vertical: {vertical}")
|
||||
logger.error("Worker cannot start without workflows. Exiting...")
|
||||
sys.exit(1)
|
||||
|
||||
# Discover activities from workflow directories
|
||||
logger.info("Discovering workflow-specific activities...")
|
||||
workflows_dir = Path("/app/toolbox/workflows")
|
||||
workflow_activities = await discover_activities(workflows_dir)
|
||||
|
||||
# Combine common storage activities, OSS-Fuzz activities, and workflow-specific activities
|
||||
activities = [
|
||||
get_target_activity,
|
||||
cleanup_cache_activity,
|
||||
upload_results_activity,
|
||||
load_ossfuzz_project_activity,
|
||||
build_ossfuzz_project_activity,
|
||||
fuzz_target_activity
|
||||
] + workflow_activities
|
||||
|
||||
logger.info(
|
||||
f"Total activities registered: {len(activities)} "
|
||||
f"(3 common + 3 ossfuzz + {len(workflow_activities)} workflow-specific)"
|
||||
)
|
||||
|
||||
# Connect to Temporal
|
||||
logger.info(f"Connecting to Temporal at {temporal_address}...")
|
||||
try:
|
||||
client = await Client.connect(
|
||||
temporal_address,
|
||||
namespace=temporal_namespace
|
||||
)
|
||||
logger.info("✓ Connected to Temporal successfully")
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to connect to Temporal: {e}", exc_info=True)
|
||||
sys.exit(1)
|
||||
|
||||
# Create worker with discovered workflows and activities
|
||||
logger.info(f"Creating worker on task queue: {task_queue}")
|
||||
|
||||
try:
|
||||
worker = Worker(
|
||||
client,
|
||||
task_queue=task_queue,
|
||||
workflows=workflows,
|
||||
activities=activities,
|
||||
max_concurrent_activities=max_concurrent_activities
|
||||
)
|
||||
logger.info("✓ Worker created successfully")
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to create worker: {e}", exc_info=True)
|
||||
sys.exit(1)
|
||||
|
||||
# Start worker
|
||||
logger.info("=" * 60)
|
||||
logger.info(f"🚀 Worker started for vertical '{vertical}'")
|
||||
logger.info(f"📦 Registered {len(workflows)} workflows")
|
||||
logger.info(f"⚙️ Registered {len(activities)} activities")
|
||||
logger.info(f"📨 Listening on task queue: {task_queue}")
|
||||
logger.info("=" * 60)
|
||||
logger.info("Worker is ready to process tasks...")
|
||||
|
||||
try:
|
||||
await worker.run()
|
||||
except KeyboardInterrupt:
|
||||
logger.info("Shutting down worker (keyboard interrupt)...")
|
||||
except Exception as e:
|
||||
logger.error(f"Worker error: {e}", exc_info=True)
|
||||
raise
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
try:
|
||||
asyncio.run(main())
|
||||
except KeyboardInterrupt:
|
||||
logger.info("Worker stopped")
|
||||
except Exception as e:
|
||||
logger.error(f"Fatal error: {e}", exc_info=True)
|
||||
sys.exit(1)
|
||||
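Note on the discovery convention above: discover_workflows() only registers classes carrying Temporal's @workflow.defn decorator, and only when the workflow directory ships a metadata.yaml whose vertical key matches this worker ('ossfuzz'). As a rough, hypothetical sketch of a toolbox/workflows/<name>/workflow.py that this loop would pick up (the class name, parameter, and timeout are illustrative assumptions, not files from this change):

# Hypothetical sketch only; names, parameters, and timeouts are assumptions.
# The directory's metadata.yaml would need `vertical: ossfuzz` to be loaded here.
from datetime import timedelta

from temporalio import workflow


@workflow.defn
class ExampleOssFuzzCampaignWorkflow:
    @workflow.run
    async def run(self, project_name: str) -> dict:
        # Activities are referenced by name, so they resolve against the
        # functions this worker registered (e.g. build_ossfuzz_project_activity).
        build_result = await workflow.execute_activity(
            "build_ossfuzz_project_activity",
            project_name,
            start_to_close_timeout=timedelta(minutes=30),
        )
        return {"project": project_name, "build": build_result}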
workers/python/Dockerfile (new file, 47 lines)
@@ -0,0 +1,47 @@
# FuzzForge Vertical Worker: Python Fuzzing
#
# Pre-installed tools for Python fuzzing and security analysis:
# - Python 3.11
# - Atheris (Python fuzzing)
# - Common Python security tools
# - Temporal worker

FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    # Build essentials for Atheris
    build-essential \
    clang \
    llvm \
    # Development tools
    git \
    curl \
    wget \
    # Cleanup
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies for Temporal worker
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install --no-cache-dir -r /tmp/requirements.txt && \
    rm /tmp/requirements.txt

# Create cache directory for downloaded targets
RUN mkdir -p /cache && chmod 755 /cache

# Copy worker entrypoint
COPY worker.py /app/worker.py

# Add toolbox to Python path (mounted at runtime)
ENV PYTHONPATH="/app:/app/toolbox:${PYTHONPATH}"
ENV PYTHONUNBUFFERED=1

# Healthcheck
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
    CMD python3 -c "import sys; sys.exit(0)"

# Run worker
CMD ["python3", "/app/worker.py"]
workers/python/requirements.txt (new file, 15 lines)
@@ -0,0 +1,15 @@
# Temporal worker dependencies
temporalio>=1.5.0
pydantic>=2.0.0

# Storage (MinIO/S3)
boto3>=1.34.0

# Configuration
pyyaml>=6.0.0

# HTTP Client (for real-time stats reporting)
httpx>=0.27.0

# Fuzzing
atheris>=2.3.0
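The atheris pin above is the engine the Python vertical drives at runtime. As a generic illustration of the harness shape Atheris consumes (a module exposing a TestOneInput(data) entrypoint handed to atheris.Setup), a minimal sketch follows; the code under test (json here) is an arbitrary stand-in and not a file from this repository:

# Generic Atheris harness sketch (illustrative only; not part of this change).
import sys

import atheris

with atheris.instrument_imports():
    import json  # the real code under test would be imported here


def TestOneInput(data: bytes) -> None:
    # Atheris calls this once per generated input.
    try:
        json.loads(data)
    except ValueError:
        pass


if __name__ == "__main__":
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()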
workers/python/worker.py (new file, 309 lines)
@@ -0,0 +1,309 @@
"""
FuzzForge Vertical Worker: Python Fuzzing

This worker:
1. Discovers workflows for the 'python' vertical from mounted toolbox
2. Dynamically imports and registers workflow classes
3. Connects to Temporal and processes tasks
4. Handles activities for target download/upload from MinIO
"""

import asyncio
import importlib
import inspect
import logging
import os
import sys
from pathlib import Path
from typing import List, Any

import yaml
from temporalio.client import Client
from temporalio.worker import Worker

# Add toolbox to path for workflow and activity imports
sys.path.insert(0, '/app/toolbox')

# Import common storage activities
from toolbox.common.storage_activities import (
    get_target_activity,
    cleanup_cache_activity,
    upload_results_activity
)

# Configure logging
logging.basicConfig(
    level=os.getenv('LOG_LEVEL', 'INFO'),
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)


async def discover_workflows(vertical: str) -> List[Any]:
    """
    Discover workflows for this vertical from mounted toolbox.

    Args:
        vertical: The vertical name (e.g., 'python', 'rust', 'android')

    Returns:
        List of workflow classes decorated with @workflow.defn
    """
    workflows = []
    toolbox_path = Path("/app/toolbox/workflows")

    if not toolbox_path.exists():
        logger.warning(f"Toolbox path does not exist: {toolbox_path}")
        return workflows

    logger.info(f"Scanning for workflows in: {toolbox_path}")

    for workflow_dir in toolbox_path.iterdir():
        if not workflow_dir.is_dir():
            continue

        # Skip special directories
        if workflow_dir.name.startswith('.') or workflow_dir.name == '__pycache__':
            continue

        metadata_file = workflow_dir / "metadata.yaml"
        if not metadata_file.exists():
            logger.debug(f"No metadata.yaml in {workflow_dir.name}, skipping")
            continue

        try:
            # Parse metadata
            with open(metadata_file) as f:
                metadata = yaml.safe_load(f)

            # Check if workflow is for this vertical
            workflow_vertical = metadata.get("vertical")
            if workflow_vertical != vertical:
                logger.debug(
                    f"Workflow {workflow_dir.name} is for vertical '{workflow_vertical}', "
                    f"not '{vertical}', skipping"
                )
                continue

            # Check if workflow.py exists
            workflow_file = workflow_dir / "workflow.py"
            if not workflow_file.exists():
                logger.warning(
                    f"Workflow {workflow_dir.name} has metadata but no workflow.py, skipping"
                )
                continue

            # Dynamically import workflow module
            module_name = f"toolbox.workflows.{workflow_dir.name}.workflow"
            logger.info(f"Importing workflow module: {module_name}")

            try:
                module = importlib.import_module(module_name)
            except Exception as e:
                logger.error(
                    f"Failed to import workflow module {module_name}: {e}",
                    exc_info=True
                )
                continue

            # Find @workflow.defn decorated classes
            found_workflows = False
            for name, obj in inspect.getmembers(module, inspect.isclass):
                # Check if class has Temporal workflow definition
                if hasattr(obj, '__temporal_workflow_definition'):
                    workflows.append(obj)
                    found_workflows = True
                    logger.info(
                        f"✓ Discovered workflow: {name} from {workflow_dir.name} "
                        f"(vertical: {vertical})"
                    )

            if not found_workflows:
                logger.warning(
                    f"Workflow {workflow_dir.name} has no @workflow.defn decorated classes"
                )

        except Exception as e:
            logger.error(
                f"Error processing workflow {workflow_dir.name}: {e}",
                exc_info=True
            )
            continue

    logger.info(f"Discovered {len(workflows)} workflows for vertical '{vertical}'")
    return workflows


async def discover_activities(workflows_dir: Path) -> List[Any]:
    """
    Discover activities from workflow directories.

    Looks for activities.py files alongside workflow.py in each workflow directory.

    Args:
        workflows_dir: Path to workflows directory

    Returns:
        List of activity functions decorated with @activity.defn
    """
    activities = []

    if not workflows_dir.exists():
        logger.warning(f"Workflows directory does not exist: {workflows_dir}")
        return activities

    logger.info(f"Scanning for workflow activities in: {workflows_dir}")

    for workflow_dir in workflows_dir.iterdir():
        if not workflow_dir.is_dir():
            continue

        # Skip special directories
        if workflow_dir.name.startswith('.') or workflow_dir.name == '__pycache__':
            continue

        # Check if activities.py exists
        activities_file = workflow_dir / "activities.py"
        if not activities_file.exists():
            logger.debug(f"No activities.py in {workflow_dir.name}, skipping")
            continue

        try:
            # Dynamically import activities module
            module_name = f"toolbox.workflows.{workflow_dir.name}.activities"
            logger.info(f"Importing activities module: {module_name}")

            try:
                module = importlib.import_module(module_name)
            except Exception as e:
                logger.error(
                    f"Failed to import activities module {module_name}: {e}",
                    exc_info=True
                )
                continue

            # Find @activity.defn decorated functions
            found_activities = False
            for name, obj in inspect.getmembers(module, inspect.isfunction):
                # Check if function has Temporal activity definition
                if hasattr(obj, '__temporal_activity_definition'):
                    activities.append(obj)
                    found_activities = True
                    logger.info(
                        f"✓ Discovered activity: {name} from {workflow_dir.name}"
                    )

            if not found_activities:
                logger.warning(
                    f"Workflow {workflow_dir.name} has activities.py but no @activity.defn decorated functions"
                )

        except Exception as e:
            logger.error(
                f"Error processing activities from {workflow_dir.name}: {e}",
                exc_info=True
            )
            continue

    logger.info(f"Discovered {len(activities)} workflow-specific activities")
    return activities


async def main():
    """Main worker entry point"""
    # Get configuration from environment
    vertical = os.getenv("WORKER_VERTICAL", "python")
    temporal_address = os.getenv("TEMPORAL_ADDRESS", "localhost:7233")
    temporal_namespace = os.getenv("TEMPORAL_NAMESPACE", "default")
    task_queue = os.getenv("WORKER_TASK_QUEUE", f"{vertical}-queue")
    max_concurrent_activities = int(os.getenv("MAX_CONCURRENT_ACTIVITIES", "5"))

    logger.info("=" * 60)
    logger.info(f"FuzzForge Vertical Worker: {vertical}")
    logger.info("=" * 60)
    logger.info(f"Temporal Address: {temporal_address}")
    logger.info(f"Temporal Namespace: {temporal_namespace}")
    logger.info(f"Task Queue: {task_queue}")
    logger.info(f"Max Concurrent Activities: {max_concurrent_activities}")
    logger.info("=" * 60)

    # Discover workflows for this vertical
    logger.info(f"Discovering workflows for vertical: {vertical}")
    workflows = await discover_workflows(vertical)

    if not workflows:
        logger.error(f"No workflows found for vertical: {vertical}")
        logger.error("Worker cannot start without workflows. Exiting...")
        sys.exit(1)

    # Discover activities from workflow directories
    logger.info("Discovering workflow-specific activities...")
    workflows_dir = Path("/app/toolbox/workflows")
    workflow_activities = await discover_activities(workflows_dir)

    # Combine common storage activities with workflow-specific activities
    activities = [
        get_target_activity,
        cleanup_cache_activity,
        upload_results_activity
    ] + workflow_activities

    logger.info(
        f"Total activities registered: {len(activities)} "
        f"(3 common + {len(workflow_activities)} workflow-specific)"
    )

    # Connect to Temporal
    logger.info(f"Connecting to Temporal at {temporal_address}...")
    try:
        client = await Client.connect(
            temporal_address,
            namespace=temporal_namespace
        )
        logger.info("✓ Connected to Temporal successfully")
    except Exception as e:
        logger.error(f"Failed to connect to Temporal: {e}", exc_info=True)
        sys.exit(1)

    # Create worker with discovered workflows and activities
    logger.info(f"Creating worker on task queue: {task_queue}")

    try:
        worker = Worker(
            client,
            task_queue=task_queue,
            workflows=workflows,
            activities=activities,
            max_concurrent_activities=max_concurrent_activities
        )
        logger.info("✓ Worker created successfully")
    except Exception as e:
        logger.error(f"Failed to create worker: {e}", exc_info=True)
        sys.exit(1)

    # Start worker
    logger.info("=" * 60)
    logger.info(f"🚀 Worker started for vertical '{vertical}'")
    logger.info(f"📦 Registered {len(workflows)} workflows")
    logger.info(f"⚙️ Registered {len(activities)} activities")
    logger.info(f"📨 Listening on task queue: {task_queue}")
    logger.info("=" * 60)
    logger.info("Worker is ready to process tasks...")

    try:
        await worker.run()
    except KeyboardInterrupt:
        logger.info("Shutting down worker (keyboard interrupt)...")
    except Exception as e:
        logger.error(f"Worker error: {e}", exc_info=True)
        raise


if __name__ == "__main__":
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        logger.info("Worker stopped")
    except Exception as e:
        logger.error(f"Fatal error: {e}", exc_info=True)
        sys.exit(1)
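discover_activities() applies the same convention to activities.py files: any module-level function carrying Temporal's @activity.defn decorator is registered alongside the three common storage activities. A minimal, hypothetical toolbox/workflows/<name>/activities.py illustrating the shape it looks for (the function name and payload below are assumptions for illustration, not code from this change):

# Hypothetical sketch of toolbox/workflows/<name>/activities.py.
# The function name and payload shape are illustrative assumptions.
from temporalio import activity


@activity.defn
async def run_python_fuzzer_activity(target_path: str, max_iterations: int) -> dict:
    # A real activity would invoke the fuzzing module here; this stub only
    # shows the decorated-function shape discover_activities() registers.
    activity.logger.info(f"Fuzzing {target_path} for {max_iterations} iterations")
    return {"target": target_path, "iterations": max_iterations, "findings": []}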
workers/rust/Dockerfile (new file, 87 lines)
@@ -0,0 +1,87 @@
# FuzzForge Vertical Worker: Rust/Native Security
#
# Pre-installed tools for Rust and native binary security analysis:
# - Rust toolchain (rustc, cargo)
# - AFL++ (fuzzing)
# - cargo-fuzz (Rust fuzzing)
# - gdb (debugging)
# - valgrind (memory analysis)
# - AddressSanitizer/MemorySanitizer support
# - Common reverse engineering tools

FROM rust:1.83-slim-bookworm

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    # Build essentials
    build-essential \
    cmake \
    git \
    curl \
    wget \
    pkg-config \
    libssl-dev \
    # AFL++ dependencies
    clang \
    llvm \
    # Debugging and analysis tools
    gdb \
    valgrind \
    strace \
    # Binary analysis (binutils includes objdump, readelf, etc.)
    binutils \
    # Network tools
    netcat-openbsd \
    tcpdump \
    # Python for Temporal worker
    python3 \
    python3-pip \
    python3-venv \
    # Cleanup
    && rm -rf /var/lib/apt/lists/*

# Install AFL++
RUN git clone https://github.com/AFLplusplus/AFLplusplus /tmp/aflplusplus && \
    cd /tmp/aflplusplus && \
    make all && \
    make install && \
    cd / && \
    rm -rf /tmp/aflplusplus

# Install Rust toolchain components (nightly required for cargo-fuzz)
RUN rustup install nightly && \
    rustup default nightly && \
    rustup component add rustfmt clippy && \
    rustup target add x86_64-unknown-linux-musl

# Install cargo-fuzz and other Rust security tools
RUN cargo install --locked \
    cargo-fuzz \
    cargo-audit \
    cargo-outdated \
    cargo-tree

# Install Python dependencies for Temporal worker
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install --break-system-packages --no-cache-dir -r /tmp/requirements.txt && \
    rm /tmp/requirements.txt

# Create cache directory for downloaded targets
RUN mkdir -p /cache && chmod 755 /cache

# Copy worker entrypoint
COPY worker.py /app/worker.py

# Add toolbox to Python path (mounted at runtime)
ENV PYTHONPATH="/app:/app/toolbox:${PYTHONPATH}"
ENV PYTHONUNBUFFERED=1

# Healthcheck
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
    CMD python3 -c "import sys; sys.exit(0)"

# Run worker
CMD ["python3", "/app/worker.py"]
workers/rust/requirements.txt (new file, 22 lines)
@@ -0,0 +1,22 @@
# Temporal Python SDK
temporalio>=1.5.0

# S3/MinIO client
boto3>=1.34.0
botocore>=1.34.0

# Data validation
pydantic>=2.5.0

# YAML parsing
PyYAML>=6.0.1

# Utilities
python-dotenv>=1.0.0
aiofiles>=23.2.1

# HTTP Client (for real-time stats reporting)
httpx>=0.27.0

# Logging
structlog>=24.1.0
workers/rust/worker.py (new file, 309 lines)
@@ -0,0 +1,309 @@
"""
FuzzForge Vertical Worker: Rust/Native Security

This worker:
1. Discovers workflows for the 'rust' vertical from mounted toolbox
2. Dynamically imports and registers workflow classes
3. Connects to Temporal and processes tasks
4. Handles activities for target download/upload from MinIO
"""

import asyncio
import importlib
import inspect
import logging
import os
import sys
from pathlib import Path
from typing import List, Any

import yaml
from temporalio.client import Client
from temporalio.worker import Worker

# Add toolbox to path for workflow and activity imports
sys.path.insert(0, '/app/toolbox')

# Import common storage activities
from toolbox.common.storage_activities import (
    get_target_activity,
    cleanup_cache_activity,
    upload_results_activity
)

# Configure logging
logging.basicConfig(
    level=os.getenv('LOG_LEVEL', 'INFO'),
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)


async def discover_workflows(vertical: str) -> List[Any]:
    """
    Discover workflows for this vertical from mounted toolbox.

    Args:
        vertical: The vertical name (e.g., 'rust', 'android', 'web')

    Returns:
        List of workflow classes decorated with @workflow.defn
    """
    workflows = []
    toolbox_path = Path("/app/toolbox/workflows")

    if not toolbox_path.exists():
        logger.warning(f"Toolbox path does not exist: {toolbox_path}")
        return workflows

    logger.info(f"Scanning for workflows in: {toolbox_path}")

    for workflow_dir in toolbox_path.iterdir():
        if not workflow_dir.is_dir():
            continue

        # Skip special directories
        if workflow_dir.name.startswith('.') or workflow_dir.name == '__pycache__':
            continue

        metadata_file = workflow_dir / "metadata.yaml"
        if not metadata_file.exists():
            logger.debug(f"No metadata.yaml in {workflow_dir.name}, skipping")
            continue

        try:
            # Parse metadata
            with open(metadata_file) as f:
                metadata = yaml.safe_load(f)

            # Check if workflow is for this vertical
            workflow_vertical = metadata.get("vertical")
            if workflow_vertical != vertical:
                logger.debug(
                    f"Workflow {workflow_dir.name} is for vertical '{workflow_vertical}', "
                    f"not '{vertical}', skipping"
                )
                continue

            # Check if workflow.py exists
            workflow_file = workflow_dir / "workflow.py"
            if not workflow_file.exists():
                logger.warning(
                    f"Workflow {workflow_dir.name} has metadata but no workflow.py, skipping"
                )
                continue

            # Dynamically import workflow module
            module_name = f"toolbox.workflows.{workflow_dir.name}.workflow"
            logger.info(f"Importing workflow module: {module_name}")

            try:
                module = importlib.import_module(module_name)
            except Exception as e:
                logger.error(
                    f"Failed to import workflow module {module_name}: {e}",
                    exc_info=True
                )
                continue

            # Find @workflow.defn decorated classes
            found_workflows = False
            for name, obj in inspect.getmembers(module, inspect.isclass):
                # Check if class has Temporal workflow definition
                if hasattr(obj, '__temporal_workflow_definition'):
                    workflows.append(obj)
                    found_workflows = True
                    logger.info(
                        f"✓ Discovered workflow: {name} from {workflow_dir.name} "
                        f"(vertical: {vertical})"
                    )

            if not found_workflows:
                logger.warning(
                    f"Workflow {workflow_dir.name} has no @workflow.defn decorated classes"
                )

        except Exception as e:
            logger.error(
                f"Error processing workflow {workflow_dir.name}: {e}",
                exc_info=True
            )
            continue

    logger.info(f"Discovered {len(workflows)} workflows for vertical '{vertical}'")
    return workflows


async def discover_activities(workflows_dir: Path) -> List[Any]:
    """
    Discover activities from workflow directories.

    Looks for activities.py files alongside workflow.py in each workflow directory.

    Args:
        workflows_dir: Path to workflows directory

    Returns:
        List of activity functions decorated with @activity.defn
    """
    activities = []

    if not workflows_dir.exists():
        logger.warning(f"Workflows directory does not exist: {workflows_dir}")
        return activities

    logger.info(f"Scanning for workflow activities in: {workflows_dir}")

    for workflow_dir in workflows_dir.iterdir():
        if not workflow_dir.is_dir():
            continue

        # Skip special directories
        if workflow_dir.name.startswith('.') or workflow_dir.name == '__pycache__':
            continue

        # Check if activities.py exists
        activities_file = workflow_dir / "activities.py"
        if not activities_file.exists():
            logger.debug(f"No activities.py in {workflow_dir.name}, skipping")
            continue

        try:
            # Dynamically import activities module
            module_name = f"toolbox.workflows.{workflow_dir.name}.activities"
            logger.info(f"Importing activities module: {module_name}")

            try:
                module = importlib.import_module(module_name)
            except Exception as e:
                logger.error(
                    f"Failed to import activities module {module_name}: {e}",
                    exc_info=True
                )
                continue

            # Find @activity.defn decorated functions
            found_activities = False
            for name, obj in inspect.getmembers(module, inspect.isfunction):
                # Check if function has Temporal activity definition
                if hasattr(obj, '__temporal_activity_definition'):
                    activities.append(obj)
                    found_activities = True
                    logger.info(
                        f"✓ Discovered activity: {name} from {workflow_dir.name}"
                    )

            if not found_activities:
                logger.warning(
                    f"Workflow {workflow_dir.name} has activities.py but no @activity.defn decorated functions"
                )

        except Exception as e:
            logger.error(
                f"Error processing activities from {workflow_dir.name}: {e}",
                exc_info=True
            )
            continue

    logger.info(f"Discovered {len(activities)} workflow-specific activities")
    return activities


async def main():
    """Main worker entry point"""
    # Get configuration from environment
    vertical = os.getenv("WORKER_VERTICAL", "rust")
    temporal_address = os.getenv("TEMPORAL_ADDRESS", "localhost:7233")
    temporal_namespace = os.getenv("TEMPORAL_NAMESPACE", "default")
    task_queue = os.getenv("WORKER_TASK_QUEUE", f"{vertical}-queue")
    max_concurrent_activities = int(os.getenv("MAX_CONCURRENT_ACTIVITIES", "5"))

    logger.info("=" * 60)
    logger.info(f"FuzzForge Vertical Worker: {vertical}")
    logger.info("=" * 60)
    logger.info(f"Temporal Address: {temporal_address}")
    logger.info(f"Temporal Namespace: {temporal_namespace}")
    logger.info(f"Task Queue: {task_queue}")
    logger.info(f"Max Concurrent Activities: {max_concurrent_activities}")
    logger.info("=" * 60)

    # Discover workflows for this vertical
    logger.info(f"Discovering workflows for vertical: {vertical}")
    workflows = await discover_workflows(vertical)

    if not workflows:
        logger.error(f"No workflows found for vertical: {vertical}")
        logger.error("Worker cannot start without workflows. Exiting...")
        sys.exit(1)

    # Discover activities from workflow directories
    logger.info("Discovering workflow-specific activities...")
    workflows_dir = Path("/app/toolbox/workflows")
    workflow_activities = await discover_activities(workflows_dir)

    # Combine common storage activities with workflow-specific activities
    activities = [
        get_target_activity,
        cleanup_cache_activity,
        upload_results_activity
    ] + workflow_activities

    logger.info(
        f"Total activities registered: {len(activities)} "
        f"(3 common + {len(workflow_activities)} workflow-specific)"
    )

    # Connect to Temporal
    logger.info(f"Connecting to Temporal at {temporal_address}...")
    try:
        client = await Client.connect(
            temporal_address,
            namespace=temporal_namespace
        )
        logger.info("✓ Connected to Temporal successfully")
    except Exception as e:
        logger.error(f"Failed to connect to Temporal: {e}", exc_info=True)
        sys.exit(1)

    # Create worker with discovered workflows and activities
    logger.info(f"Creating worker on task queue: {task_queue}")

    try:
        worker = Worker(
            client,
            task_queue=task_queue,
            workflows=workflows,
            activities=activities,
            max_concurrent_activities=max_concurrent_activities
        )
        logger.info("✓ Worker created successfully")
    except Exception as e:
        logger.error(f"Failed to create worker: {e}", exc_info=True)
        sys.exit(1)

    # Start worker
    logger.info("=" * 60)
    logger.info(f"🚀 Worker started for vertical '{vertical}'")
    logger.info(f"📦 Registered {len(workflows)} workflows")
    logger.info(f"⚙️ Registered {len(activities)} activities")
    logger.info(f"📨 Listening on task queue: {task_queue}")
    logger.info("=" * 60)
    logger.info("Worker is ready to process tasks...")

    try:
        await worker.run()
    except KeyboardInterrupt:
        logger.info("Shutting down worker (keyboard interrupt)...")
    except Exception as e:
        logger.error(f"Worker error: {e}", exc_info=True)
        raise


if __name__ == "__main__":
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        logger.info("Worker stopped")
    except Exception as e:
        logger.error(f"Fatal error: {e}", exc_info=True)
        sys.exit(1)