Compare commits

1 commit

tmarschutz · 5da3f1e071 · first commit · 2025-10-03 11:45:17 +02:00

10868 changed files with 1448668 additions and 14157 deletions

.github/ISSUE_TEMPLATE/bug_report.md (new file, 48 lines)

@@ -0,0 +1,48 @@
---
name: 🐛 Bug Report
about: Create a report to help us improve FuzzForge
title: "[BUG] "
labels: bug
assignees: ''
---
## Description
A clear and concise description of the bug you encountered.
## Environment
Please provide details about your environment:
- **OS**: (e.g., macOS 14.0, Ubuntu 22.04, Windows 11)
- **Python version**: (e.g., 3.9.7)
- **Docker version**: (e.g., 24.0.6)
- **FuzzForge version**: (e.g., 0.6.0)
## Steps to Reproduce
Clear steps to recreate the issue:
1. Go to '...'
2. Run command '...'
3. Click on '...'
4. See error
## Expected Behavior
A clear and concise description of what should happen.
## Actual Behavior
A clear and concise description of what actually happens.
## Logs
Please include relevant error messages and stack traces:
```
Paste logs here
```
## Screenshots
If applicable, add screenshots to help explain your problem.
## Additional Context
Add any other context about the problem here (workflow used, specific target, configuration, etc.).
---
💬 **Need help?** Join our [Discord Community](https://discord.com/invite/acqv9FVG) for real-time support.

.github/ISSUE_TEMPLATE/config.yml (new file, 8 lines)

@@ -0,0 +1,8 @@
blank_issues_enabled: false
contact_links:
  - name: 💬 Community Discord
    url: https://discord.com/invite/acqv9FVG
    about: Join our Discord to discuss ideas, workflows, and security research with the community.
  - name: 📖 Documentation
    url: https://github.com/FuzzingLabs/fuzzforge_ai/tree/main/docs
    about: Check our documentation for guides, tutorials, and API reference.


@@ -0,0 +1,38 @@
---
name: ✨ Feature Request
about: Suggest an idea for FuzzForge
title: "[FEATURE] "
labels: enhancement
assignees: ''
---
## Use Case
Why is this feature needed? Describe the problem you're trying to solve or the improvement you'd like to see.
## Proposed Solution
How should it work? Describe your ideal solution in detail.
## Alternatives
What other approaches have you considered? List any alternative solutions or features you've thought about.
## Implementation
**(Optional)** Do you have any technical considerations or implementation ideas?
## Category
What area of FuzzForge would this feature enhance?
- [ ] 🤖 AI Agents for Security
- [ ] 🛠 Workflow Automation
- [ ] 📈 Vulnerability Research
- [ ] 🔗 Fuzzer Integration
- [ ] 🌐 Community Marketplace
- [ ] 🔒 Enterprise Features
- [ ] 📚 Documentation
- [ ] 🎯 Other
## Additional Context
Add any other context, screenshots, references, or examples about the feature request here.
---
💬 **Want to discuss this idea?** Join our [Discord Community](https://discord.com/invite/acqv9FVG) to collaborate with other contributors!


@@ -0,0 +1,67 @@
---
name: 🔄 Workflow Submission
about: Contribute a security workflow or module to the FuzzForge community
title: "[WORKFLOW] "
labels: workflow, community
assignees: ''
---
## Workflow Name
Provide a short, descriptive name for your workflow.
## Description
Explain what this workflow does and what security problems it solves.
## Category
What type of security workflow is this?
- [ ] 🛡️ **Security Assessment** - Static analysis, vulnerability scanning
- [ ] 🔍 **Secret Detection** - Credential and secret scanning
- [ ] 🎯 **Fuzzing** - Dynamic testing and fuzz testing
- [ ] 🔄 **Reverse Engineering** - Binary analysis and decompilation
- [ ] 🌐 **Infrastructure Security** - Container, cloud, network security
- [ ] 🔒 **Penetration Testing** - Offensive security testing
- [ ] 📋 **Other** - Please describe
## Files
Please attach or provide links to your workflow files:
- [ ] `workflow.py` - Main Prefect flow implementation
- [ ] `Dockerfile` - Container definition
- [ ] `metadata.yaml` - Workflow metadata
- [ ] Test files or examples
- [ ] Documentation
## Testing
How did you test this workflow? Please describe:
- **Test targets used**: (e.g., vulnerable_app, custom test cases)
- **Expected outputs**: (e.g., SARIF format, specific vulnerabilities detected)
- **Validation results**: (e.g., X vulnerabilities found, Y false positives)
## SARIF Compliance
- [ ] My workflow outputs results in SARIF format
- [ ] Results include severity levels and descriptions
- [ ] Code flow information is provided where applicable
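For contributors new to SARIF, the checklist above boils down to a small JSON envelope. The sketch below assembles a minimal SARIF 2.1.0 document from generic findings. It is illustrative only — the tool name, rule IDs, and finding fields are placeholders, not a FuzzForge API; consult existing workflows for the exact conventions used in this repository.

```python
import json


def make_sarif_report(findings: list[dict]) -> dict:
    """Assemble a minimal SARIF 2.1.0 document.

    Each finding is a plain dict with 'rule_id', 'level', 'message',
    'file', and 'line' keys (hypothetical shape for this sketch).
    """
    return {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "my-workflow", "version": "1.0.0"}},
            "results": [
                {
                    "ruleId": f["rule_id"],
                    "level": f["level"],  # "error", "warning", or "note"
                    "message": {"text": f["message"]},
                    "locations": [{
                        "physicalLocation": {
                            "artifactLocation": {"uri": f["file"]},
                            "region": {"startLine": f["line"]},
                        }
                    }],
                }
                for f in findings
            ],
        }],
    }


report = make_sarif_report([
    {"rule_id": "HARDCODED-SECRET", "level": "error",
     "message": "Possible hardcoded credential", "file": "src/app.py", "line": 42},
])
sarif_text = json.dumps(report, indent=2)
```

Severity lives in `level` and human-readable text in `message.text`, which covers the second checklist item; optional `codeFlows` entries can be added alongside `locations` where applicable.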
## Security Guidelines
- [ ] This workflow focuses on **defensive security** purposes only
- [ ] I have not included any malicious tools or capabilities
- [ ] All secrets/credentials are parameterized (no hardcoded values)
- [ ] I have followed responsible disclosure practices
## Registry Integration
Have you updated the workflow registry?
- [ ] Added import statement to `backend/toolbox/workflows/registry.py`
- [ ] Added registry entry with proper metadata
- [ ] Tested workflow registration and deployment
## Additional Notes
Anything else the maintainers should know about this workflow?
---
🚀 **Thank you for contributing to FuzzForge!** Your workflow will help the security community automate and scale their testing efforts.
💬 **Questions?** Join our [Discord Community](https://discord.com/invite/acqv9FVG) to discuss your contribution!

.github/workflows/ci-python.yml (new file, 70 lines)

@@ -0,0 +1,70 @@
name: Python CI
# This is a simple CI to ensure that the Python client and backend build correctly.
# It could be optimized to run faster, building, testing, and linting only changed
# code, but for now it is good enough. It runs on every push and PR to any branch,
# and also on demand.
on:
  workflow_dispatch:
  push:
    paths:
      - "ai/**"
      - "backend/**"
      - "cli/**"
      - "sdk/**"
      - "src/**"
  pull_request:
    paths:
      - "ai/**"
      - "backend/**"
      - "cli/**"
      - "sdk/**"
      - "src/**"
jobs:
  ci:
    name: ci
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - name: Setup uv
        uses: astral-sh/setup-uv@v6
        with:
          enable-cache: true
      - name: Set up Python
        run: uv python install
      # Validate no obvious issues.
      # Quick hack because the CLI returns a non-zero exit code when no args are provided.
      - name: Run base command
        run: |
          set +e
          uv run ff
          rc=$?
          if [ "$rc" -ne 2 ]; then
            echo "Expected exit code 2 from 'uv run ff', got $rc"
            exit 1
          fi
      - name: Build fuzzforge_ai package
        run: uv build
      - name: Build ai package
        working-directory: ai
        run: uv build
      - name: Build cli package
        working-directory: cli
        run: uv build
      - name: Build sdk package
        working-directory: sdk
        run: uv build
      - name: Build backend package
        working-directory: backend
        run: uv build


@@ -1,86 +0,0 @@
name: CI
on:
  push:
    branches: [main, dev, feature/*]
  pull_request:
    branches: [main, dev]
  workflow_dispatch:
jobs:
  lint-and-typecheck:
    name: Lint & Type Check
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          version: "latest"
      - name: Set up Python
        run: uv python install 3.14
      - name: Install dependencies
        run: uv sync
      - name: Ruff check (fuzzforge-cli)
        run: |
          cd fuzzforge-cli
          uv run --extra lints ruff check src/
      - name: Ruff check (fuzzforge-mcp)
        run: |
          cd fuzzforge-mcp
          uv run --extra lints ruff check src/
      - name: Ruff check (fuzzforge-common)
        run: |
          cd fuzzforge-common
          uv run --extra lints ruff check src/
      - name: Mypy type check (fuzzforge-cli)
        run: |
          cd fuzzforge-cli
          uv run --extra lints mypy src/
      - name: Mypy type check (fuzzforge-mcp)
        run: |
          cd fuzzforge-mcp
          uv run --extra lints mypy src/
      # NOTE: Mypy check for fuzzforge-common temporarily disabled
      # due to 37 pre-existing type errors in legacy code.
      # TODO: Fix type errors and re-enable strict checking
      #- name: Mypy type check (fuzzforge-common)
      #  run: |
      #    cd fuzzforge-common
      #    uv run --extra lints mypy src/
  test:
    name: Tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          version: "latest"
      - name: Set up Python
        run: uv python install 3.14
      - name: Install dependencies
        run: uv sync --all-extras
      - name: Run MCP tests
        run: |
          cd fuzzforge-mcp
          uv run --extra tests pytest -v
      - name: Run common tests
        run: |
          cd fuzzforge-common
          uv run --extra tests pytest -v

.github/workflows/docs-deploy.yml (new file, 57 lines)

@@ -0,0 +1,57 @@
name: Deploy Docusaurus to GitHub Pages
on:
  workflow_dispatch:
  push:
    branches:
      - master
    paths:
      - "docs/**"
jobs:
  build:
    name: Build Docusaurus
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ./docs
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-node@v4
        with:
          node-version: 24
          cache: npm
          cache-dependency-path: "**/package-lock.json"
      - name: Install dependencies
        run: npm ci
      - name: Build website
        run: npm run build
      - name: Upload Build Artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: ./docs/build
  deploy:
    name: Deploy to GitHub Pages
    needs: build
    # Grant GITHUB_TOKEN the permissions required to make a Pages deployment
    permissions:
      pages: write      # to deploy to Pages
      id-token: write   # to verify the deployment originates from an appropriate source
    # Deploy to the github-pages environment
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4

.github/workflows/docs-test-deploy.yml (new file, 33 lines)

@@ -0,0 +1,33 @@
name: Docusaurus test deployment
on:
  workflow_dispatch:
  push:
    paths:
      - "docs/**"
  pull_request:
    paths:
      - "docs/**"
jobs:
  test-deploy:
    name: Test deployment
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ./docs
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-node@v4
        with:
          node-version: 24
          cache: npm
          cache-dependency-path: "**/package-lock.json"
      - name: Install dependencies
        run: npm ci
      - name: Test build website
        run: npm run build


@@ -1,49 +0,0 @@
name: MCP Server Smoke Test
on:
  push:
    branches: [main, dev]
  pull_request:
    branches: [main, dev]
  workflow_dispatch:
jobs:
  mcp-server:
    name: MCP Server Test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          version: "latest"
      - name: Set up Python
        run: uv python install 3.14
      - name: Install dependencies
        run: uv sync --all-extras
      - name: Start MCP server in background
        run: |
          cd fuzzforge-mcp
          nohup uv run python -m fuzzforge_mcp.server > server.log 2>&1 &
          echo $! > server.pid
          sleep 3
      - name: Run MCP tool tests
        run: |
          cd fuzzforge-mcp
          uv run --extra tests pytest tests/test_resources.py -v
      - name: Stop MCP server
        if: always()
        run: |
          if [ -f fuzzforge-mcp/server.pid ]; then
            kill $(cat fuzzforge-mcp/server.pid) || true
          fi
      - name: Show server logs
        if: failure()
        run: cat fuzzforge-mcp/server.log || true

.gitignore (298 lines changed)

@@ -1,15 +1,291 @@
*.egg-info
*.whl
# ========================================
# FuzzForge Platform .gitignore
# ========================================
# -------------------- Python --------------------
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Environments
.env
.mypy_cache
.pytest_cache
.ruff_cache
.venv
.vscode
__pycache__
env/
venv/
ENV/
env.bak/
venv.bak/
.python-version
# Podman/Docker container storage artifacts
~/.fuzzforge/
# UV package manager
uv.lock
# But allow uv.lock in CLI and SDK for reproducible builds
!cli/uv.lock
!sdk/uv.lock
!backend/uv.lock
# User-specific hub config (generated at runtime)
hub-config.json
# MyPy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# -------------------- IDE / Editor --------------------
# VSCode
.vscode/
*.code-workspace
# PyCharm
.idea/
# Vim
*.swp
*.swo
*~
# Emacs
*~
\#*\#
/.emacs.desktop
/.emacs.desktop.lock
*.elc
auto-save-list
tramp
.\#*
# Sublime Text
*.sublime-project
*.sublime-workspace
# -------------------- Operating System --------------------
# macOS
.DS_Store
.AppleDouble
.LSOverride
Icon
._*
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk
# Windows
Thumbs.db
Thumbs.db:encryptable
ehthumbs.db
ehthumbs_vista.db
*.stackdump
[Dd]esktop.ini
$RECYCLE.BIN/
*.cab
*.msi
*.msix
*.msm
*.msp
*.lnk
# Linux
*~
.fuse_hidden*
.directory
.Trash-*
.nfs*
# -------------------- Docker --------------------
# Docker volumes and data
docker-volumes/
.dockerignore.bak
# Docker Compose override files
docker-compose.override.yml
docker-compose.override.yaml
# -------------------- Database --------------------
# SQLite
*.sqlite
*.sqlite3
*.db
*.db-journal
*.db-shm
*.db-wal
# PostgreSQL
*.sql.backup
# -------------------- Logs --------------------
# General logs
*.log
logs/
*.log.*
# -------------------- FuzzForge Specific --------------------
# FuzzForge project directories (user projects should manage their own .gitignore)
.fuzzforge/
# Test project databases and configurations
test_projects/*/.fuzzforge/
test_projects/*/findings.db*
test_projects/*/config.yaml
test_projects/*/.gitignore
# Local development configurations
local_config.yaml
dev_config.yaml
.env.local
.env.development
# Generated reports and outputs
reports/
output/
findings/
*.sarif.json
*.html.report
security_report.*
# Temporary files
tmp/
temp/
*.tmp
*.temp
# Backup files
*.bak
*.backup
*~
# -------------------- Node.js (for any JS tooling) --------------------
node_modules/
npm-debug.log*
yarn-debug.log*
yarn-error.log*
.npm
# -------------------- Security --------------------
# Never commit these files
*.pem
*.key
*.p12
*.pfx
secret*
secrets/
credentials*
api_keys*
.env.production
.env.staging
# AWS credentials
.aws/
# -------------------- Build Artifacts --------------------
# Python builds
build/
dist/
*.wheel
# Documentation builds
docs/_build/
site/
# -------------------- Miscellaneous --------------------
# Jupyter Notebook checkpoints
.ipynb_checkpoints
# IPython history
.ipython/
# Rope project settings
.ropeproject
# spyderproject
.spyderproject
.spyproject
# mkdocs documentation
/site
# Local Netlify folder
.netlify
# -------------------- Project Specific Overrides --------------------
# Allow specific test project files that should be tracked
!test_projects/*/src/
!test_projects/*/scripts/
!test_projects/*/config/
!test_projects/*/data/
!test_projects/*/README.md
!test_projects/*/*.py
!test_projects/*/*.js
!test_projects/*/*.php
!test_projects/*/*.java
# But exclude their sensitive content
test_projects/*/.env
test_projects/*/private_key.pem
test_projects/*/wallet.json
test_projects/*/.npmrc
test_projects/*/.git-credentials
test_projects/*/credentials.*
test_projects/*/api_keys.*


@@ -1 +0,0 @@
3.14.2


@@ -1,21 +1,17 @@
# Contributing to FuzzForge AI
# Contributing to FuzzForge 🤝
Thank you for your interest in contributing to FuzzForge AI! We welcome contributions from the community and are excited to collaborate with you.
Thank you for your interest in contributing to FuzzForge! We welcome contributions from the community and are excited to collaborate with you.
**Our Vision**: FuzzForge aims to be a **universal platform for security research** across all cybersecurity domains. Through our modular architecture, any security tool—from fuzzing engines to cloud scanners, from mobile app analyzers to IoT security tools—can be integrated as a containerized module and controlled via AI agents.
## 🌟 Ways to Contribute
## Ways to Contribute
- 🐛 **Bug Reports** - Help us identify and fix issues
- 💡 **Feature Requests** - Suggest new capabilities and improvements
- 🔧 **Code Contributions** - Submit bug fixes, features, and enhancements
- 📚 **Documentation** - Improve guides, tutorials, and API documentation
- 🧪 **Testing** - Help test new features and report issues
- 🛡️ **Security Workflows** - Contribute new security analysis workflows
- **Security Modules** - Create modules for any cybersecurity domain (AppSec, NetSec, Cloud, IoT, etc.)
- **Bug Reports** - Help us identify and fix issues
- **Feature Requests** - Suggest new capabilities and improvements
- **Core Features** - Contribute to the MCP server, runner, or CLI
- **Documentation** - Improve guides, tutorials, and module documentation
- **Testing** - Help test new features and report issues
- **AI Integration** - Improve MCP tools and AI agent interactions
- **Tool Integrations** - Wrap existing security tools as FuzzForge modules
## Contribution Guidelines
## 📋 Contribution Guidelines
### Code Style
@@ -48,10 +44,9 @@ We use conventional commits for clear history:
**Examples:**
```
feat(modules): add cloud security scanner module
fix(mcp): resolve module listing timeout
docs(sdk): update module development guide
test(runner): add container execution tests
feat(workflows): add new static analysis workflow for Go
fix(api): resolve authentication timeout issue
docs(readme): update installation instructions
```
### Pull Request Process
@@ -70,14 +65,9 @@ test(runner): add container execution tests
3. **Test Your Changes**
```bash
# Test modules
FUZZFORGE_MODULES_PATH=./fuzzforge-modules uv run fuzzforge modules list
# Run a module
uv run fuzzforge modules run your-module --assets ./test-assets
# Test MCP integration (if applicable)
uv run fuzzforge mcp status
# Test workflows
cd test_projects/vulnerable_app/
ff workflow security_assessment .
```
4. **Submit Pull Request**
@@ -86,353 +76,64 @@ test(runner): add container execution tests
- Link related issues using `Fixes #123` or `Closes #123`
- Ensure all CI checks pass
## Module Development
## 🛡️ Security Workflow Development
FuzzForge uses a modular architecture where security tools run as isolated containers. The `fuzzforge-modules-sdk` provides everything you need to create new modules.
### Creating New Workflows
**Documentation:**
- [Module SDK Documentation](fuzzforge-modules/fuzzforge-modules-sdk/README.md) - Complete SDK reference
- [Module Template](fuzzforge-modules/fuzzforge-module-template/) - Starting point for new modules
- [USAGE Guide](USAGE.md) - Setup and installation instructions
### Creating a New Module
1. **Use the Module Template**
```bash
# Generate a new module from template
cd fuzzforge-modules/
cp -r fuzzforge-module-template my-new-module
cd my-new-module
1. **Workflow Structure**
```
backend/toolbox/workflows/your_workflow/
├── __init__.py
├── workflow.py # Main Prefect flow
├── metadata.yaml # Workflow metadata
└── Dockerfile # Container definition
```
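For reference, a metadata.yaml for a new workflow might look roughly like the fragment below. The field names here are illustrative, not a documented schema — check the existing workflows under `backend/toolbox/workflows/` for the authoritative format:

```
name: your_workflow
version: "1.0.0"
description: Static analysis workflow for Go projects
author: Your Name
tags:
  - static-analysis
  - go
```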
2. **Module Structure**
```
my-new-module/
├── Dockerfile # Container definition
├── Makefile # Build commands
├── README.md # Module documentation
├── pyproject.toml # Python dependencies
├── mypy.ini # Type checking config
├── ruff.toml # Linting config
└── src/
└── module/
├── __init__.py
├── __main__.py # Entry point
├── mod.py # Main module logic
├── models.py # Pydantic models
└── settings.py # Configuration
```
3. **Implement Your Module**
Edit `src/module/mod.py`:
2. **Register Your Workflow**
Add your workflow to `backend/toolbox/workflows/registry.py`:
```python
from fuzzforge_modules_sdk.api.modules import BaseModule
from fuzzforge_modules_sdk.api.models import ModuleResult
from .models import MyModuleConfig, MyModuleOutput
class MyModule(BaseModule[MyModuleConfig, MyModuleOutput]):
"""Your module description."""
def execute(self) -> ModuleResult[MyModuleOutput]:
"""Main execution logic."""
# Access input assets
assets = self.input_path
# Your security tool logic here
results = self.run_analysis(assets)
# Return structured results
return ModuleResult(
success=True,
output=MyModuleOutput(
findings=results,
summary="Analysis complete"
)
)
# Import your workflow
from .your_workflow.workflow import main_flow as your_workflow_flow
# Add to registry
WORKFLOW_REGISTRY["your_workflow"] = {
"flow": your_workflow_flow,
"module_path": "toolbox.workflows.your_workflow.workflow",
"function_name": "main_flow",
"description": "Description of your workflow",
"version": "1.0.0",
"author": "Your Name",
"tags": ["tag1", "tag2"]
}
```
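For intuition about how a registry entry like the one above gets used, the toy sketch below mirrors its shape and dispatches a run by name. This is illustrative only — the actual dispatch code lives in `backend/toolbox/workflows/registry.py` and its callers, and `demo_flow` is a stand-in for a real Prefect flow:

```python
# Hypothetical stand-in for a Prefect flow function.
def demo_flow(target: str) -> dict:
    return {"target": target, "findings": []}


# Same shape as the registry entry shown above.
WORKFLOW_REGISTRY = {
    "your_workflow": {
        "flow": demo_flow,
        "module_path": "toolbox.workflows.your_workflow.workflow",
        "function_name": "main_flow",
        "description": "Description of your workflow",
        "version": "1.0.0",
    }
}


def run_workflow(name: str, target: str) -> dict:
    """Look up a workflow by name and invoke its flow on a target."""
    entry = WORKFLOW_REGISTRY.get(name)
    if entry is None:
        raise KeyError(f"Unknown workflow: {name!r}")
    return entry["flow"](target)


result = run_workflow("your_workflow", "./test_projects/vulnerable_app")
```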
4. **Define Configuration Models**
Edit `src/module/models.py`:
```python
from pydantic import BaseModel, Field
from fuzzforge_modules_sdk.api.models import BaseModuleConfig, BaseModuleOutput
class MyModuleConfig(BaseModuleConfig):
"""Configuration for your module."""
timeout: int = Field(default=300, description="Timeout in seconds")
max_iterations: int = Field(default=1000, description="Max iterations")
class MyModuleOutput(BaseModuleOutput):
"""Output from your module."""
findings: list[dict] = Field(default_factory=list)
coverage: float = Field(default=0.0)
```
5. **Build Your Module**
```bash
# Build the SDK first (if not already done)
cd ../fuzzforge-modules-sdk
uv build
mkdir -p .wheels
cp ../../dist/fuzzforge_modules_sdk-*.whl .wheels/
cd ../..
docker build -t localhost/fuzzforge-modules-sdk:0.1.0 fuzzforge-modules/fuzzforge-modules-sdk/
# Build your module
cd fuzzforge-modules/my-new-module
docker build -t fuzzforge-my-new-module:0.1.0 .
```
6. **Test Your Module**
```bash
# Run with test assets
uv run fuzzforge modules run my-new-module --assets ./test-assets
# Check module info
uv run fuzzforge modules info my-new-module
```
### Module Development Guidelines
**Important Conventions:**
- **Input/Output**: Use `/fuzzforge/input` for assets and `/fuzzforge/output` for results
- **Configuration**: Support JSON configuration via stdin or file
- **Logging**: Use structured logging (structlog is pre-configured)
- **Error Handling**: Return proper exit codes and error messages
- **Security**: Run as non-root user when possible
- **Documentation**: Include clear README with usage examples
- **Dependencies**: Minimize container size, use multi-stage builds
**See also:**
- [Module SDK API Reference](fuzzforge-modules/fuzzforge-modules-sdk/src/fuzzforge_modules_sdk/api/)
- [Dockerfile Best Practices](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/)
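The input/output and configuration conventions above can be sketched as a minimal module entry point. Everything here is a sketch under stated assumptions — the function name and finding shape are invented, and temporary directories stand in for the conventional `/fuzzforge/input` and `/fuzzforge/output` container paths so the example runs anywhere:

```python
import json
import tempfile
from pathlib import Path


def run_module(input_dir: Path, output_dir: Path, config: dict) -> dict:
    """Scan input assets and write a JSON result to the output directory.

    Trivial demo check: flag empty files. A real module would run its
    security tool here instead.
    """
    findings = []
    for path in input_dir.rglob("*"):
        if path.is_file() and path.stat().st_size == 0:
            findings.append({"file": str(path), "issue": "empty file"})
    result = {
        "success": True,
        "findings": findings,
        "timeout": config.get("timeout", 300),  # config arrives as JSON
    }
    output_dir.mkdir(parents=True, exist_ok=True)
    (output_dir / "results.json").write_text(json.dumps(result, indent=2))
    return result


# Demo with temp dirs standing in for /fuzzforge/input and /fuzzforge/output.
with tempfile.TemporaryDirectory() as tmp:
    inp, out = Path(tmp, "input"), Path(tmp, "output")
    inp.mkdir()
    (inp / "empty.bin").touch()                  # flagged
    (inp / "app.py").write_text("print('hi')")   # not flagged
    result = run_module(inp, out, {"timeout": 60})
    written = json.loads((out / "results.json").read_text())
```

Keeping the tool logic behind a function that takes explicit paths also makes the module easy to unit-test outside the container.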
### Module Types
FuzzForge is designed to support modules across **all cybersecurity domains**. The modular architecture allows any security tool to be containerized and integrated. Here are the main categories:
**Application Security**
- Fuzzing engines (coverage-guided, grammar-based, mutation-based)
- Static analysis (SAST, code quality, dependency scanning)
- Dynamic analysis (DAST, runtime analysis, instrumentation)
- Test validation and coverage analysis
- Crash analysis and exploit detection
**Network & Infrastructure Security**
- Network scanning and service enumeration
- Protocol analysis and fuzzing
- Firewall and configuration testing
- Cloud security (AWS/Azure/GCP misconfiguration detection, IAM analysis)
- Container security (image scanning, Kubernetes security)
**Web & API Security**
- Web vulnerability scanners (XSS, SQL injection, CSRF)
- Authentication and session testing
- API security (REST/GraphQL/gRPC testing, fuzzing)
- SSL/TLS analysis
**Binary & Reverse Engineering**
- Binary analysis and disassembly
- Malware sandboxing and behavior analysis
- Exploit development tools
- Firmware extraction and analysis
**Mobile & IoT Security**
- Mobile app analysis (Android/iOS static/dynamic analysis)
- IoT device security and firmware analysis
- SCADA/ICS and industrial protocol testing
- Automotive security (CAN bus, ECU testing)
**Data & Compliance**
- Database security testing
- Encryption and cryptography analysis
- Secrets and credential detection
- Privacy tools (PII detection, GDPR compliance)
- Compliance checkers (PCI-DSS, HIPAA, SOC2, ISO27001)
**Threat Intelligence & Risk**
- OSINT and reconnaissance tools
- Threat hunting and IOC correlation
- Risk assessment and attack surface mapping
- Security audit and policy validation
**Emerging Technologies**
- AI/ML security (model poisoning, adversarial testing)
- Blockchain and smart contract analysis
- Quantum-safe cryptography testing
**Custom & Integration**
- Domain-specific security tools
- Bridges to existing security tools
- Multi-tool orchestration and result aggregation
### Example: Simple Security Scanner Module
```python
# src/module/mod.py
from pathlib import Path
from fuzzforge_modules_sdk.api.modules import BaseModule
from fuzzforge_modules_sdk.api.models import ModuleResult
from .models import ScannerConfig, ScannerOutput
class SecurityScanner(BaseModule[ScannerConfig, ScannerOutput]):
"""Scans for common security issues in code."""
def execute(self) -> ModuleResult[ScannerOutput]:
findings = []
# Scan all source files
for file_path in self.input_path.rglob("*"):
if file_path.is_file():
findings.extend(self.scan_file(file_path))
return ModuleResult(
success=True,
output=ScannerOutput(
findings=findings,
files_scanned=len(list(self.input_path.rglob("*")))
)
)
def scan_file(self, path: Path) -> list[dict]:
"""Scan a single file for security issues."""
# Your scanning logic here
return []
```
### Testing Modules
Create tests in `tests/`:
```python
import pytest
from module.mod import MyModule
from module.models import MyModuleConfig
def test_module_execution():
config = MyModuleConfig(timeout=60)
module = MyModule(config=config, input_path=Path("test_assets"))
result = module.execute()
assert result.success
assert len(result.output.findings) >= 0
```
Run tests:
```bash
uv run pytest
```
3. **Testing Workflows**
- Create test cases in `test_projects/vulnerable_app/`
- Ensure SARIF output format compliance
- Test with various input scenarios
### Security Guidelines
**Critical Requirements:**
- Never commit secrets, API keys, or credentials
- Focus on **defensive security** tools and analysis
- Do not create tools for malicious purposes
- Test modules thoroughly before submission
- Follow responsible disclosure for security issues
- Use minimal, secure base images for containers
- Avoid running containers as root when possible
- 🔐 Never commit secrets, API keys, or credentials
- 🛡️ Focus on **defensive security** tools and analysis
- ⚠️ Do not create tools for malicious purposes
- 🧪 Test workflows thoroughly before submission
- 📋 Follow responsible disclosure for security issues
**Security Resources:**
- [OWASP Container Security](https://cheatsheetseries.owasp.org/cheatsheets/Docker_Security_Cheat_Sheet.html)
- [CIS Docker Benchmarks](https://www.cisecurity.org/benchmark/docker)
## Contributing to Core Features
Beyond modules, you can contribute to FuzzForge's core components.
**Useful Resources:**
- [Project Structure](README.md) - Overview of the codebase
- [USAGE Guide](USAGE.md) - Installation and setup
- Python best practices: [PEP 8](https://pep8.org/)
### Core Components
- **fuzzforge-mcp** - MCP server for AI agent integration
- **fuzzforge-runner** - Module execution engine
- **fuzzforge-cli** - Command-line interface
- **fuzzforge-common** - Shared utilities and sandbox engines
- **fuzzforge-types** - Type definitions and schemas
### Development Setup
1. **Clone and Install**
```bash
git clone https://github.com/FuzzingLabs/fuzzforge_ai.git
cd fuzzforge_ai
uv sync --all-extras
```
2. **Run Tests**
```bash
# Run all tests
make test
# Run specific package tests
cd fuzzforge-mcp
uv run pytest
```
3. **Type Checking**
```bash
# Type check all packages
make typecheck
# Type check specific package
cd fuzzforge-runner
uv run mypy .
```
4. **Linting and Formatting**
```bash
# Format code
make format
# Lint code
make lint
```
## Bug Reports
## 🐛 Bug Reports
When reporting bugs, please include:
- **Environment**: OS, Python version, Docker version, uv version
- **FuzzForge Version**: Output of `uv run fuzzforge --version`
- **Module**: Which module or component is affected
- **Environment**: OS, Python version, Docker version
- **Steps to Reproduce**: Clear steps to recreate the issue
- **Expected Behavior**: What should happen
- **Actual Behavior**: What actually happens
- **Logs**: Relevant error messages and stack traces
- **Container Logs**: For module issues, include Docker/Podman logs
- **Screenshots**: If applicable
**Example:**
```markdown
**Environment:**
- OS: Ubuntu 22.04
- Python: 3.14.2
- Docker: 24.0.7
- uv: 0.5.13
Use our [Bug Report Template](.github/ISSUE_TEMPLATE/bug_report.md).
**Module:** my-custom-scanner
**Steps to Reproduce:**
1. Run `uv run fuzzforge modules run my-scanner --assets ./test-target`
2. Module fails with timeout error
**Expected:** Module completes analysis
**Actual:** Times out after 30 seconds
**Logs:**
```
ERROR: Module execution timeout
...
```
```
## Feature Requests
## 💡 Feature Requests
For new features, please provide:
@@ -440,124 +141,33 @@ For new features, please provide:
- **Proposed Solution**: How should it work?
- **Alternatives**: Other approaches considered
- **Implementation**: Technical considerations (optional)
- **Module vs Core**: Should this be a module or core feature?
**Example Feature Requests:**
- New module for cloud security posture management (CSPM)
- Module for analyzing smart contract vulnerabilities
- MCP tool for orchestrating multi-module workflows
- CLI command for batch module execution across multiple targets
- Support for distributed fuzzing campaigns
- Integration with CI/CD pipelines
- Module marketplace/registry features
Use our [Feature Request Template](.github/ISSUE_TEMPLATE/feature_request.md).
## Documentation
## 📚 Documentation
Help improve our documentation:
- **Module Documentation**: Document your modules in their README.md
- **API Documentation**: Update docstrings and type hints
- **User Guides**: Improve USAGE.md and tutorial content
- **Module SDK Guides**: Help document the SDK for module developers
- **MCP Integration**: Document AI agent integration patterns
- **Examples**: Add practical usage examples and workflows
- **User Guides**: Create tutorials and how-to guides
- **Workflow Documentation**: Document new security workflows
- **Examples**: Add practical usage examples
### Documentation Standards
- Use clear, concise language
- Include code examples
- Add command-line examples with expected output
- Document all configuration options
- Explain error messages and troubleshooting
### Module README Template
```markdown
# Module Name
Brief description of what this module does.
## Features
- Feature 1
- Feature 2
## Configuration
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| timeout | int | 300 | Timeout in seconds |
## Usage
\`\`\`bash
uv run fuzzforge modules run module-name --assets ./path/to/assets
\`\`\`
## Output
Describes the output structure and format.
## Examples
Practical usage examples.
```
## Recognition
## 🙏 Recognition
Contributors will be:
- Listed in our [Contributors](CONTRIBUTORS.md) file
- Mentioned in release notes for significant contributions
- Credited in module documentation (for module authors)
- Invited to join our [Discord community](https://discord.gg/8XEX33UUwZ)
- Invited to join our Discord community
- Eligible for FuzzingLabs Academy courses and swag
## Module Submission Checklist
## 📜 License
Before submitting a new module:
- [ ] Module follows SDK structure and conventions
- [ ] Dockerfile builds successfully
- [ ] Module executes without errors
- [ ] Configuration options are documented
- [ ] README.md is complete with examples
- [ ] Tests are included (pytest)
- [ ] Type hints are used throughout
- [ ] Linting passes (ruff)
- [ ] Security best practices followed
- [ ] No secrets or credentials in code
- [ ] License headers included
## Review Process
1. **Initial Review** - Maintainers review for completeness
2. **Technical Review** - Code quality and security assessment
3. **Testing** - Module tested in isolated environment
4. **Documentation Review** - Ensure docs are clear and complete
5. **Approval** - Module merged and included in next release
## 📜 License

By contributing to FuzzForge, you agree that your contributions will be licensed under the same [Business Source License 1.1](LICENSE) as the project.

For module contributions:
- Modules you create remain under the project license
- You retain credit as the module author
- Your module may be used by others under the project license terms
---
## Getting Help

Need help contributing?
- Join our [Discord](https://discord.gg/8XEX33UUwZ)
- Read the [Module SDK Documentation](fuzzforge-modules/fuzzforge-modules-sdk/README.md)
- Check the module template for examples
- Contact: contact@fuzzinglabs.com

---

**Thank you for making FuzzForge better! 🚀**

Every contribution, no matter how small, helps build a stronger security research platform. Whether you're creating a module for web security, cloud scanning, mobile analysis, or any other cybersecurity domain, your work makes FuzzForge more powerful and versatile for the entire security community!


@@ -1,78 +0,0 @@
.PHONY: help install sync format lint typecheck test build-hub-images clean

SHELL := /bin/bash

# Default target
help:
	@echo "FuzzForge AI Development Commands"
	@echo ""
	@echo "  make install          - Install all dependencies"
	@echo "  make sync             - Sync shared packages from upstream"
	@echo "  make format           - Format code with ruff"
	@echo "  make lint             - Lint code with ruff"
	@echo "  make typecheck        - Type check with mypy"
	@echo "  make test             - Run all tests"
	@echo "  make build-hub-images - Build all mcp-security-hub images"
	@echo "  make clean            - Clean build artifacts"
	@echo ""

# Install all dependencies
install:
	uv sync

# Sync shared packages from upstream fuzzforge-core
sync:
	@if [ -z "$(UPSTREAM)" ]; then \
		echo "Usage: make sync UPSTREAM=/path/to/fuzzforge-core"; \
		exit 1; \
	fi
	./scripts/sync-upstream.sh $(UPSTREAM)

# Format all packages
format:
	@for pkg in packages/fuzzforge-*/; do \
		if [ -f "$$pkg/pyproject.toml" ]; then \
			echo "Formatting $$pkg..."; \
			cd "$$pkg" && uv run ruff format . && cd -; \
		fi \
	done

# Lint all packages
lint:
	@for pkg in packages/fuzzforge-*/; do \
		if [ -f "$$pkg/pyproject.toml" ]; then \
			echo "Linting $$pkg..."; \
			cd "$$pkg" && uv run ruff check . && cd -; \
		fi \
	done

# Type check all packages
typecheck:
	@for pkg in packages/fuzzforge-*/; do \
		if [ -f "$$pkg/pyproject.toml" ] && [ -f "$$pkg/mypy.ini" ]; then \
			echo "Type checking $$pkg..."; \
			cd "$$pkg" && uv run mypy . && cd -; \
		fi \
	done

# Run all tests
test:
	@for pkg in packages/fuzzforge-*/; do \
		if [ -f "$$pkg/pytest.ini" ]; then \
			echo "Testing $$pkg..."; \
			cd "$$pkg" && uv run pytest && cd -; \
		fi \
	done

# Build all mcp-security-hub images for the firmware analysis pipeline
build-hub-images:
	@bash scripts/build-hub-images.sh

# Clean build artifacts
clean:
	find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
	find . -type d -name ".pytest_cache" -exec rm -rf {} + 2>/dev/null || true
	find . -type d -name ".mypy_cache" -exec rm -rf {} + 2>/dev/null || true
	find . -type d -name ".ruff_cache" -exec rm -rf {} + 2>/dev/null || true
	find . -type d -name "*.egg-info" -exec rm -rf {} + 2>/dev/null || true
	find . -type f -name "*.pyc" -delete 2>/dev/null || true

README.md

@@ -1,266 +1,215 @@
<h1 align="center">FuzzForge AI</h1>
<h3 align="center">AI-Powered Security Research Orchestration via MCP</h3>
<p align="center">
  <img src="docs/static/img/fuzzforge_banner_github.png" alt="FuzzForge Banner" width="100%">
</p>
<p align="center"><strong>AI-powered workflow automation and AI Agents for AppSec, Fuzzing & Offensive Security</strong></p>
<p align="center">
  <a href="https://discord.com/invite/acqv9FVG"><img src="https://img.shields.io/discord/1420767905255133267?logo=discord&label=Discord" alt="Discord"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-BSL%20%2B%20Apache-orange" alt="License: BSL + Apache"></a>
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.12%2B-blue" alt="Python 3.12+"/></a>
  <a href="https://modelcontextprotocol.io"><img src="https://img.shields.io/badge/MCP-compatible-green" alt="MCP Compatible"/></a>
  <a href="https://fuzzforge.ai"><img src="https://img.shields.io/badge/Website-fuzzforge.ai-blue" alt="Website"/></a>
  <img src="https://img.shields.io/badge/version-0.6.0-green" alt="Version">
  <a href="https://github.com/FuzzingLabs/fuzzforge_ai/stargazers"><img src="https://img.shields.io/github/stars/FuzzingLabs/fuzzforge_ai?style=social" alt="GitHub Stars"></a>
</p>
<p align="center">
  <sub>
    <a href="#-overview"><b>Overview</b></a> •
    <a href="#-features"><b>Features</b></a> •
    <a href="#-mcp-security-hub"><b>Security Hub</b></a> •
    <a href="#-installation"><b>Installation</b></a> •
    <a href="#-quickstart"><b>Quickstart</b></a> •
    <a href="USAGE.md"><b>Usage Guide</b></a> •
    <a href="#-contributing"><b>Contributing</b></a> •
    <a href="#%EF%B8%8F-roadmap"><b>Roadmap</b></a>
  </sub>
</p>
---
> 🚧 **FuzzForge AI is under active development.** Expect breaking changes and new features!
---
## 🚀 Overview
**FuzzForge AI** is an open-source MCP server that enables AI agents (GitHub Copilot, Claude, etc.) to orchestrate security research workflows through the **Model Context Protocol (MCP)**.

FuzzForge connects your AI assistant to **MCP tool hubs** — collections of containerized security tools that the agent can discover, chain, and execute autonomously. Instead of manually running security tools, describe what you want and let your AI assistant handle it:

- Orchestrate static & dynamic analysis
- Automate vulnerability research
- Scale AppSec testing with AI agents
- Build, share & reuse workflows across teams

### The Core: Hub Architecture

FuzzForge acts as a **meta-MCP server** — a single MCP endpoint that gives your AI agent access to tools from multiple MCP hub servers. Each hub server is a containerized security tool (Binwalk, YARA, Radare2, Nmap, etc.) that the agent can discover at runtime.

- **🔍 Discovery**: The agent lists available hub servers and discovers their tools
- **🤖 AI-Native**: Hub tools provide agent context — usage tips, workflow guidance, and domain knowledge
- **🔗 Composable**: Chain tools from different hubs into automated pipelines
- **📦 Extensible**: Add your own MCP servers to the hub registry
### 🎬 Use Case: Firmware Vulnerability Research
> **Scenario**: Analyze a firmware image to find security vulnerabilities — fully automated by an AI agent.
```
User: "Search for vulnerabilities in firmware.bin"
Agent → Binwalk: Extract filesystem from firmware image
Agent → YARA: Scan extracted files for vulnerability patterns
Agent → Radare2: Trace dangerous function calls in prioritized binaries
Agent → Report: 8 vulnerabilities found (2 critical, 4 high, 2 medium)
```
### 🎬 Use Case: Rust Fuzzing Pipeline
> **Scenario**: Fuzz a Rust crate to discover vulnerabilities using AI-assisted harness generation and parallel fuzzing.
```
User: "Fuzz the blurhash crate for vulnerabilities"
Agent → Rust Analyzer: Identify fuzzable functions and attack surface
Agent → Harness Gen: Generate and validate fuzzing harnesses
Agent → Cargo Fuzzer: Run parallel coverage-guided fuzzing sessions
Agent → Crash Analysis: Deduplicate and triage discovered crashes
```
---
## ⭐ Support the Project
If you find FuzzForge useful, please **star the repo** to support development! 🚀
<a href="https://github.com/FuzzingLabs/fuzzforge_ai/stargazers">
<img src="https://img.shields.io/github/stars/FuzzingLabs/fuzzforge_ai?style=social" alt="GitHub Stars">
</a>
---
## ✨ Features
| Feature | Description |
|---------|-------------|
| 🤖 **AI-Native** | Built for MCP — works with GitHub Copilot, Claude, and any MCP-compatible agent |
| 🔌 **Hub System** | Connect to MCP tool hubs — each hub brings dozens of containerized security tools |
| 🔍 **Tool Discovery** | Agents discover available tools at runtime with built-in usage guidance |
| 🔗 **Pipelines** | Chain tools from different hubs into automated multi-step workflows |
| 🔄 **Persistent Sessions** | Long-running tools (Radare2, fuzzers) with stateful container sessions |
| 🏠 **Local First** | All execution happens on your machine — no cloud required |
| 🔒 **Sandboxed** | Every tool runs in an isolated container via Docker or Podman |
---
## 🏗️ Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ AI Agent (Copilot/Claude) │
└───────────────────────────┬─────────────────────────────────────┘
│ MCP Protocol (stdio)
┌─────────────────────────────────────────────────────────────────┐
│ FuzzForge MCP Server │
│ │
│ Projects Hub Discovery Hub Execution │
│ ┌──────────────┐ ┌──────────────────┐ ┌───────────────────┐ │
│ │init_project │ │list_hub_servers │ │execute_hub_tool │ │
│ │set_assets │ │discover_hub_tools│ │start_hub_server │ │
│ │list_results │ │get_tool_schema │ │stop_hub_server │ │
│ └──────────────┘ └──────────────────┘ └───────────────────┘ │
└───────────────────────────┬─────────────────────────────────────┘
│ Docker/Podman
┌─────────────────────────────────────────────────────────────────┐
│ MCP Hub Servers │
│ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ Binwalk │ │ YARA │ │ Radare2 │ │ Nmap │ │
│ │ 6 tools │ │ 5 tools │ │ 32 tools │ │ 8 tools │ │
│ └───────────┘ └───────────┘ └───────────┘ └───────────┘ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ Nuclei │ │ SQLMap │ │ Trivy │ │ ... │ │
│ │ 7 tools │ │ 8 tools │ │ 7 tools │ │ 36 hubs │ │
│ └───────────┘ └───────────┘ └───────────┘ └───────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
---
## 🔧 MCP Security Hub
FuzzForge ships with built-in support for the **[MCP Security Hub](https://github.com/FuzzingLabs/mcp-security-hub)** — a collection of 36 production-ready, Dockerized MCP servers covering offensive security:
| Category | Servers | Examples |
|----------|---------|----------|
| 🔍 **Reconnaissance** | 8 | Nmap, Masscan, Shodan, WhatWeb |
| 🌐 **Web Security** | 6 | Nuclei, SQLMap, ffuf, Nikto |
| 🔬 **Binary Analysis** | 6 | Radare2, Binwalk, YARA, Capa, Ghidra |
| ⛓️ **Blockchain** | 3 | Medusa, Solazy, DAML Viewer |
| ☁️ **Cloud Security** | 3 | Trivy, Prowler, RoadRecon |
| 💻 **Code Security** | 1 | Semgrep |
| 🔑 **Secrets Detection** | 1 | Gitleaks |
| 💥 **Exploitation** | 1 | SearchSploit |
| 🎯 **Fuzzing** | 2 | Boofuzz, Dharma |
| 🕵️ **OSINT** | 2 | Maigret, DNSTwist |
| 🛡️ **Threat Intel** | 2 | VirusTotal, AlienVault OTX |
| 🏰 **Active Directory** | 1 | BloodHound |
> 185+ individual tools accessible through a single MCP connection.
The hub is open source and can be extended with your own MCP servers. See the [mcp-security-hub repository](https://github.com/FuzzingLabs/mcp-security-hub) for details.
Beyond the hub tools themselves, FuzzForge's broader feature set includes:

- 🤖 **AI Agents for Security**: Specialized agents for AppSec, reversing, and fuzzing
- 🛠 **Workflow Automation**: Define & execute AppSec workflows as code
- 📈 **Vulnerability Research at Scale**: Rediscover 1-days & find 0-days with automation
- 🔗 **Fuzzer Integration**: AFL, Honggfuzz, AFLnet, StateAFL & more
- 🌐 **Community Marketplace**: Share workflows, corpora, PoCs, and modules
- 🔒 **Enterprise Ready**: Team/Corp cloud tiers for scaling offensive security
---
## 📦 Installation
### Prerequisites

- **Python 3.12+**
- **[uv](https://docs.astral.sh/uv/)** package manager
- **Docker** ([Install Docker](https://docs.docker.com/get-docker/)) or Podman

Install uv:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
#### Configure Docker Daemon
Before running `docker compose up`, configure Docker to allow insecure registries (required for the local registry).
Add the following to your Docker daemon configuration:
```json
{
"insecure-registries": [
"localhost:5000",
"host.docker.internal:5001",
"registry:5000"
]
}
```
**macOS (Docker Desktop):**
1. Open Docker Desktop
2. Go to Settings → Docker Engine
3. Add the `insecure-registries` configuration to the JSON
4. Click "Apply & Restart"
**Linux:**
1. Edit `/etc/docker/daemon.json` (create if it doesn't exist):
```bash
sudo nano /etc/docker/daemon.json
```
2. Add the configuration above
3. Restart Docker:
```bash
sudo systemctl restart docker
```
### CLI Installation
After installing the requirements, install the FuzzForge CLI:
```bash
# Clone the repository
git clone https://github.com/FuzzingLabs/fuzzforge_ai.git
cd fuzzforge_ai
# Install dependencies
uv sync
# Install CLI with uv (from the root directory)
uv tool install --python python3.12 .
```
### Link the Security Hub
```bash
# Clone the MCP Security Hub
git clone https://github.com/FuzzingLabs/mcp-security-hub.git ~/.fuzzforge/hubs/mcp-security-hub
# Build the Docker images for the hub tools
./scripts/build-hub-images.sh
```
Or use the terminal UI (`uv run fuzzforge ui`) to link hubs interactively.
### Configure MCP for Your AI Agent
```bash
# For GitHub Copilot
uv run fuzzforge mcp install copilot
# For Claude Code (CLI)
uv run fuzzforge mcp install claude-code
# For Claude Desktop (standalone app)
uv run fuzzforge mcp install claude-desktop
# Verify installation
uv run fuzzforge mcp status
```
**Restart your editor** and your AI agent will have access to FuzzForge tools!
---
## 🧑‍💻 Usage

Once installed, just talk to your AI agent:

```
"What security tools are available?"
"Scan this firmware image for vulnerabilities"
"Analyze this binary with radare2"
"Run nuclei against https://example.com"
```

The agent will use FuzzForge to discover the right hub tools, chain them into a pipeline, and return results — all without you touching a terminal.

See the [Usage Guide](USAGE.md) for detailed setup and advanced workflows.

---

## ⚡ Quickstart

Run your first workflow:

```bash
# 1. Clone the repo
git clone https://github.com/FuzzingLabs/fuzzforge_ai.git
cd fuzzforge_ai

# 2. Build & run with Docker
# Set the registry host for your OS (the local registry is mandatory)
# macOS/Windows (Docker Desktop):
export REGISTRY_HOST=host.docker.internal
# Linux (default):
# export REGISTRY_HOST=localhost

docker compose up -d
```

> The first launch can take 5-10 minutes due to Docker image building - a good time for a coffee break ☕

```bash
# 3. Run your first workflow
cd test_projects/vulnerable_app/        # Go into the test directory
fuzzforge init                          # Init a FuzzForge project
ff workflow run security_assessment .   # Start a workflow (ff is an alias for fuzzforge)
```

### Manual Workflow Setup

![Manual Workflow Demo](docs/static/videos/manual_workflow.gif)

_Setting up and running security workflows through the interface_

👉 More installation options in the [Documentation](https://docs.fuzzforge.ai).
---
## 📁 Project Structure

```
fuzzforge_ai/
├── fuzzforge-mcp/       # MCP server — the core of FuzzForge
├── fuzzforge-cli/       # Command-line interface & terminal UI
├── fuzzforge-common/    # Shared abstractions (containers, storage)
├── fuzzforge-runner/    # Container execution engine (Docker/Podman)
├── fuzzforge-tests/     # Integration tests
├── mcp-security-hub/    # Default hub: 36 offensive security MCP servers
└── scripts/             # Hub image build scripts
```

---

## AI-Powered Workflow Execution

![LLM Workflow Demo](docs/static/videos/llm_workflow.gif)

_AI agents automatically analyzing code and providing security insights_
## 📚 Resources
- 🌐 [Website](https://fuzzforge.ai)
- 📖 [Documentation](https://docs.fuzzforge.ai)
- 💬 [Community Discord](https://discord.com/invite/acqv9FVG)
- 🎓 [FuzzingLabs Academy](https://academy.fuzzinglabs.com/?coupon=GITHUB_FUZZFORGE)
---
## 🤝 Contributing
We welcome contributions from the community! There are many ways to help:

- 🐛 Report bugs via [GitHub Issues](../../issues)
- 💡 Suggest features or improvements
- 🔧 Submit pull requests with fixes or enhancements
- 🔌 Add new MCP servers to the [Security Hub](https://github.com/FuzzingLabs/mcp-security-hub)
- 🤝 Share workflows, corpora, or modules with the community

See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
---
## 🗺️ Roadmap

Planned features and improvements:

- 📦 Public workflow & module marketplace
- 🤖 New specialized AI agents (Rust, Go, Android, Automotive)
- 🔗 Expanded fuzzer integrations (LibFuzzer, Jazzer, more network fuzzers)
- ☁️ Multi-tenant SaaS platform with team collaboration
- 📊 Advanced reporting & analytics

👉 Follow updates in the [GitHub issues](../../issues) and [Discord](https://discord.com/invite/acqv9FVG).

---

## 📜 License

FuzzForge is released under the **Business Source License (BSL) 1.1**, with an automatic fallback to **Apache 2.0** after 4 years.

See [LICENSE](LICENSE) and [LICENSE-APACHE](LICENSE-APACHE) for details.

---

<p align="center">
  <strong>Maintained by <a href="https://fuzzinglabs.com">FuzzingLabs</a></strong>
</p>


@@ -1,125 +0,0 @@
# FuzzForge AI Roadmap
This document outlines the planned features and development direction for FuzzForge AI.
---
## 🎯 Upcoming Features
### 1. MCP Security Hub Integration
**Status:** 🔄 Planned
Integrate [mcp-security-hub](https://github.com/FuzzingLabs/mcp-security-hub) tools into FuzzForge, giving AI agents access to 28 MCP servers and 163+ security tools through a unified interface.
#### How It Works
Unlike native FuzzForge modules (built with the SDK), mcp-security-hub tools are **standalone MCP servers**. The integration will bridge these tools so they can be:
- Discovered via `list_modules` alongside native modules
- Executed through FuzzForge's orchestration layer
- Chained with native modules in workflows
| Aspect | Native Modules | MCP Hub Tools |
|--------|----------------|---------------|
| **Runtime** | FuzzForge SDK container | Standalone MCP server container |
| **Protocol** | Direct execution | MCP-to-MCP bridge |
| **Configuration** | Module config | Tool-specific args |
| **Output** | FuzzForge results format | Tool-native format (normalized) |
#### Goals
- Unified discovery of all available tools (native + hub)
- Orchestrate hub tools through FuzzForge's workflow engine
- Normalize outputs for consistent result handling
- No modification required to mcp-security-hub tools
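The output-normalization goal can be pictured as a thin adapter per tool. A minimal sketch, assuming hypothetical field names for illustration (this is not the actual FuzzForge results schema):

```python
def normalize_finding(tool: str, raw: dict) -> dict:
    """Map a tool-native finding onto a common result shape.

    Illustrative only: field names are assumptions, and the real bridge
    would ship one adapter per hub tool.
    """
    # Different tools name severity differently; fold them into one key.
    severity = raw.get("severity") or raw.get("level") or raw.get("risk") or "info"
    return {
        "tool": tool,
        "title": raw.get("title") or raw.get("name") or raw.get("id", "finding"),
        "severity": str(severity).lower(),
        "location": raw.get("path") or raw.get("url") or raw.get("host"),
        "raw": raw,  # keep the original payload for drill-down
    }
```

With adapters like this in place, findings from nuclei, semgrep, or yara can be sorted, filtered, and reported through one code path.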
#### Planned Tool Categories
| Category | Tools | Example Use Cases |
|----------|-------|-------------------|
| **Reconnaissance** | nmap, masscan, whatweb, shodan | Network scanning, service discovery |
| **Web Security** | nuclei, sqlmap, ffuf, nikto | Vulnerability scanning, fuzzing |
| **Binary Analysis** | radare2, binwalk, yara, capa, ghidra | Reverse engineering, malware analysis |
| **Cloud Security** | trivy, prowler | Container scanning, cloud auditing |
| **Secrets Detection** | gitleaks | Credential scanning |
| **OSINT** | maigret, dnstwist | Username tracking, typosquatting |
| **Threat Intel** | virustotal, otx | Malware analysis, IOC lookup |
#### Example Workflow
```
You: "Scan example.com for vulnerabilities and analyze any suspicious binaries"
AI Agent:
1. Uses nmap module for port discovery
2. Uses nuclei module for vulnerability scanning
3. Uses binwalk module to extract firmware
4. Uses yara module for malware detection
5. Generates consolidated report
```
---
### 2. User Interface
**Status:** 🔄 Planned
A graphical interface to manage FuzzForge without the command line.
#### Goals
- Provide an alternative to CLI for users who prefer visual tools
- Make configuration and monitoring more accessible
- Complement (not replace) the CLI experience
#### Planned Capabilities
| Capability | Description |
|------------|-------------|
| **Configuration** | Change MCP server settings, engine options, paths |
| **Module Management** | Browse, configure, and launch modules |
| **Execution Monitoring** | View running tasks, logs, progress, metrics |
| **Project Overview** | Manage projects and browse execution results |
| **Workflow Management** | Create and run multi-module workflows |
---
## 📋 Backlog
Features under consideration for future releases:
| Feature | Description |
|---------|-------------|
| **Module Marketplace** | Browse and install community modules |
| **Scheduled Executions** | Run modules on a schedule (cron-style) |
| **Team Collaboration** | Share projects, results, and workflows |
| **Reporting Engine** | Generate PDF/HTML security reports |
| **Notifications** | Slack, Discord, email alerts for findings |
---
## ✅ Completed
| Feature | Version | Date |
|---------|---------|------|
| Docker as default engine | 0.1.0 | Jan 2026 |
| MCP server for AI agents | 0.1.0 | Jan 2026 |
| CLI for project management | 0.1.0 | Jan 2026 |
| Continuous execution mode | 0.1.0 | Jan 2026 |
| Workflow orchestration | 0.1.0 | Jan 2026 |
---
## 💬 Feedback
Have suggestions for the roadmap?
- Open an issue on [GitHub](https://github.com/FuzzingLabs/fuzzforge_ai/issues)
- Join our [Discord](https://discord.gg/8XEX33UUwZ)
---
<p align="center">
<strong>Built with ❤️ by <a href="https://fuzzinglabs.com">FuzzingLabs</a></strong>
</p>

USAGE.md

@@ -1,517 +0,0 @@
# FuzzForge AI Usage Guide
This guide covers everything you need to know to get started with FuzzForge AI — from installation to linking your first MCP hub and running security research workflows with AI.
> **FuzzForge is designed to be used with AI agents** (GitHub Copilot, Claude, etc.) via MCP.
> A terminal UI (`fuzzforge ui`) is provided for managing agents and hubs.
> The CLI is available for advanced users but the primary experience is through natural language interaction with your AI assistant.
---
## Table of Contents
- [Quick Start](#quick-start)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Terminal UI](#terminal-ui)
- [Launching the UI](#launching-the-ui)
- [Dashboard](#dashboard)
- [Agent Setup](#agent-setup)
- [Hub Manager](#hub-manager)
- [MCP Hub System](#mcp-hub-system)
- [What is an MCP Hub?](#what-is-an-mcp-hub)
- [FuzzingLabs Security Hub](#fuzzinglabs-security-hub)
- [Linking a Custom Hub](#linking-a-custom-hub)
- [Building Hub Images](#building-hub-images)
- [MCP Server Configuration (CLI)](#mcp-server-configuration-cli)
- [GitHub Copilot](#github-copilot)
- [Claude Code (CLI)](#claude-code-cli)
- [Claude Desktop](#claude-desktop)
- [Using FuzzForge with AI](#using-fuzzforge-with-ai)
- [CLI Reference](#cli-reference)
- [Environment Variables](#environment-variables)
- [Troubleshooting](#troubleshooting)
---
## Quick Start
> **Prerequisites:** You need [uv](https://docs.astral.sh/uv/) and [Docker](https://docs.docker.com/get-docker/) installed.
> See the [Prerequisites](#prerequisites) section for details.
```bash
# 1. Clone and install
git clone https://github.com/FuzzingLabs/fuzzforge_ai.git
cd fuzzforge_ai
uv sync
# 2. Launch the terminal UI
uv run fuzzforge ui
# 3. Press 'h' → "FuzzingLabs Hub" to clone & link the default security hub
# 4. Select an agent row and press Enter to install the MCP server for your agent
# 5. Build the Docker images for the hub tools (required before tools can run)
./scripts/build-hub-images.sh
# 6. Restart your AI agent and start talking:
# "What security tools are available?"
# "Scan this binary with binwalk and yara"
# "Analyze this Rust crate for fuzzable functions"
```
Or do it entirely from the command line:
```bash
# Install MCP for your AI agent
uv run fuzzforge mcp install copilot # For VS Code + GitHub Copilot
# OR
uv run fuzzforge mcp install claude-code # For Claude Code CLI
# Clone and link the default security hub
git clone git@github.com:FuzzingLabs/mcp-security-hub.git ~/.fuzzforge/hubs/mcp-security-hub
# Build hub tool images (required — tools only run once their image is built)
./scripts/build-hub-images.sh
# Restart your AI agent — done!
```
> **Note:** FuzzForge uses Docker by default. Podman is also supported via `--engine podman`.
---
## Prerequisites
Before installing FuzzForge AI, ensure you have:
- **Python 3.12+** — [Download Python](https://www.python.org/downloads/)
- **uv** package manager — [Install uv](https://docs.astral.sh/uv/)
- **Docker** — Container runtime ([Install Docker](https://docs.docker.com/get-docker/))
- **Git** — For cloning hub repositories
### Installing uv
```bash
# Linux/macOS
curl -LsSf https://astral.sh/uv/install.sh | sh
# Or with pip
pip install uv
```
### Installing Docker
```bash
# Linux (Ubuntu/Debian)
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Log out and back in for group changes to take effect
# macOS/Windows
# Install Docker Desktop from https://docs.docker.com/get-docker/
```
> **Note:** Podman is also supported. Use `--engine podman` with CLI commands
> or set `FUZZFORGE_ENGINE=podman` environment variable.
---
## Installation
### 1. Clone the Repository
```bash
git clone https://github.com/FuzzingLabs/fuzzforge_ai.git
cd fuzzforge_ai
```
### 2. Install Dependencies
```bash
uv sync
```
This installs all FuzzForge components in a virtual environment.
### 3. Verify Installation
```bash
uv run fuzzforge --help
```
---
## Terminal UI
FuzzForge ships with a terminal user interface (TUI) built on [Textual](https://textual.textualize.io/) for managing AI agents and MCP hub servers from a single dashboard.
### Launching the UI
```bash
uv run fuzzforge ui
```
### Dashboard
The main screen is split into two panels:
| Panel | Content |
|-------|---------|
| **AI Agents** (left) | Shows GitHub Copilot, Claude Desktop, and Claude Code with live link status and config file path |
| **Hub Servers** (right) | Shows all configured MCP hub tools with Docker image name, source hub, and build status (✓ Ready / ✗ Not built) |
### Keyboard Shortcuts
| Key | Action |
|-----|--------|
| `Enter` | **Select** — Act on the selected row (setup/unlink an agent) |
| `h` | **Hub Manager** — Open the hub management screen |
| `r` | **Refresh** — Re-check all agent and hub statuses |
| `q` | **Quit** |
### Agent Setup
Select an agent row in the AI Agents table and press `Enter`:
- **If the agent is not linked** → a setup dialog opens asking for your container engine (Docker or Podman), then installs the FuzzForge MCP configuration
- **If the agent is already linked** → a confirmation dialog offers to unlink it (removes the `fuzzforge` entry without touching other MCP servers)
The setup auto-detects:
- FuzzForge installation root
- Docker/Podman socket path
- Hub configuration from `hub-config.json`
### Hub Manager
Press `h` to open the hub manager. This is where you manage your MCP hub repositories:
| Button | Action |
|--------|--------|
| **FuzzingLabs Hub** | One-click clone of the official [mcp-security-hub](https://github.com/FuzzingLabs/mcp-security-hub) repository — clones to `~/.fuzzforge/hubs/mcp-security-hub`, scans for tools, and registers them in `hub-config.json` |
| **Link Path** | Link any local directory as a hub — enter a name and path, FuzzForge scans it for `category/tool-name/Dockerfile` patterns |
| **Clone URL** | Clone any git repository and link it as a hub |
| **Remove** | Unlink the selected hub and remove its servers from the configuration |
The hub table shows:
- **Name** — Hub name (★ prefix for the default hub)
- **Path** — Local directory path
- **Servers** — Number of MCP tools discovered
- **Source** — Git URL or "local"
---
## MCP Hub System
### What is an MCP Hub?
An MCP hub is a directory containing one or more containerized MCP tools, organized by category:
```
my-hub/
├── category-a/
│ ├── tool-1/
│ │ └── Dockerfile
│ └── tool-2/
│ └── Dockerfile
├── category-b/
│ └── tool-3/
│ └── Dockerfile
└── ...
```
FuzzForge scans for the pattern `category/tool-name/Dockerfile` and auto-generates server configuration entries for each discovered tool.
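The scan itself is easy to picture. A minimal sketch (hypothetical helper, not the actual FuzzForge scanner, which records additional fields such as image name and build status):

```python
from pathlib import Path


def discover_hub_tools(hub_root: str) -> list[dict]:
    """Scan a hub directory for category/tool-name/Dockerfile entries."""
    servers = []
    for dockerfile in sorted(Path(hub_root).glob("*/*/Dockerfile")):
        tool_dir = dockerfile.parent
        servers.append(
            {
                "name": tool_dir.name,             # e.g. "semgrep-mcp"
                "category": tool_dir.parent.name,  # e.g. "code-security"
                "path": str(tool_dir),
            }
        )
    return servers
```

Linking a hub then amounts to running this kind of scan and registering the resulting entries in `hub-config.json`.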
### FuzzingLabs Security Hub
The default MCP hub is [mcp-security-hub](https://github.com/FuzzingLabs/mcp-security-hub), maintained by FuzzingLabs. It includes **40+ security tools** across categories:
| Category | Tools |
|----------|-------|
| **Reconnaissance** | nmap, masscan, shodan, zoomeye, whatweb, pd-tools, externalattacker, networksdb |
| **Binary Analysis** | binwalk, yara, capa, radare2, ghidra, ida |
| **Code Security** | semgrep, rust-analyzer, harness-tester, cargo-fuzzer, crash-analyzer |
| **Web Security** | nuclei, nikto, sqlmap, ffuf, burp, waybackurls |
| **Fuzzing** | boofuzz, dharma |
| **Exploitation** | searchsploit |
| **Secrets** | gitleaks |
| **Cloud Security** | trivy, prowler, roadrecon |
| **OSINT** | maigret, dnstwist |
| **Threat Intel** | virustotal, otx |
| **Password Cracking** | hashcat |
| **Blockchain** | medusa, solazy, daml-viewer |
**Clone it via the UI:**
1. `uv run fuzzforge ui`
2. Press `h` → click **FuzzingLabs Hub**
3. Wait for the clone to finish — servers are auto-registered
**Or clone manually:**
```bash
git clone git@github.com:FuzzingLabs/mcp-security-hub.git ~/.fuzzforge/hubs/mcp-security-hub
```
### Linking a Custom Hub
You can link any directory that follows the `category/tool-name/Dockerfile` layout:
**Via the UI:**
1. Press `h` → **Link Path**
2. Enter a name and the directory path
**Via the CLI (planned):** Not yet available — use the UI.
### Building Hub Images
After linking a hub, you need to build the Docker images before the tools can be used:
```bash
# Build all images from the default security hub
./scripts/build-hub-images.sh
# Or build a single tool image
docker build -t semgrep-mcp:latest mcp-security-hub/code-security/semgrep-mcp/
```
The dashboard hub table shows ✓ Ready for built images and ✗ Not built for missing ones.
---
## MCP Server Configuration (CLI)
If you prefer the command line over the TUI, you can configure agents directly:
### GitHub Copilot
```bash
uv run fuzzforge mcp install copilot
```
The command auto-detects:
- **FuzzForge root** — Where FuzzForge is installed
- **Docker socket** — Auto-detects `/var/run/docker.sock`
**Optional overrides:**
```bash
uv run fuzzforge mcp install copilot --engine podman
```
**After installation:** Restart VS Code. FuzzForge tools appear in GitHub Copilot Chat.
### Claude Code (CLI)
```bash
uv run fuzzforge mcp install claude-code
```
Installs to `~/.claude.json`. FuzzForge tools are available from any directory after restarting Claude.
### Claude Desktop
```bash
uv run fuzzforge mcp install claude-desktop
```
**After installation:** Restart Claude Desktop.
### Check Status
```bash
uv run fuzzforge mcp status
```
### Remove Configuration
```bash
uv run fuzzforge mcp uninstall copilot
uv run fuzzforge mcp uninstall claude-code
uv run fuzzforge mcp uninstall claude-desktop
```
---
## Using FuzzForge with AI
Once MCP is configured and hub images are built, interact with FuzzForge through natural language with your AI assistant.
### Example Conversations
**Discover available tools:**
```
You: "What security tools are available in FuzzForge?"
AI: Queries hub tools → "I found 15 tools across categories: nmap for
port scanning, binwalk for firmware analysis, semgrep for code
scanning, cargo-fuzzer for Rust fuzzing..."
```
**Analyze a binary:**
```
You: "Extract and analyze this firmware image"
AI: Uses binwalk to extract → yara for pattern matching → capa for
capability detection → "Found 3 embedded filesystems, 2 YARA
matches for known vulnerabilities..."
```
**Fuzz Rust code:**
```
You: "Analyze this Rust crate for functions I should fuzz"
AI: Uses rust-analyzer → "Found 3 fuzzable entry points..."
You: "Start fuzzing parse_input for 10 minutes"
AI: Uses cargo-fuzzer → "Fuzzing session started. 2 crashes found..."
```
**Scan for vulnerabilities:**
```
You: "Scan this codebase with semgrep for security issues"
AI: Uses semgrep-mcp → "Found 5 findings: 2 high severity SQL injection
patterns, 3 medium severity hardcoded secrets..."
```
---
## CLI Reference
### UI Command
```bash
uv run fuzzforge ui # Launch the terminal dashboard
```
### MCP Commands
```bash
uv run fuzzforge mcp status # Check agent configuration status
uv run fuzzforge mcp install <agent> # Install MCP config (copilot|claude-code|claude-desktop)
uv run fuzzforge mcp uninstall <agent> # Remove MCP config
uv run fuzzforge mcp generate <agent> # Preview config without installing
```
### Project Commands
```bash
uv run fuzzforge project init # Initialize a project
uv run fuzzforge project info # Show project info
uv run fuzzforge project executions # List executions
uv run fuzzforge project results <id> # Get execution results
```
---
## Environment Variables
Configure FuzzForge using environment variables:
```bash
# Override the FuzzForge installation root (auto-detected from cwd by default)
export FUZZFORGE_ROOT=/path/to/fuzzforge_ai
# Override the user-global data directory (default: ~/.fuzzforge)
# Useful for isolated testing without touching your real installation
export FUZZFORGE_USER_DIR=/tmp/my-fuzzforge-test
# Storage path for projects and execution results (default: <workspace>/.fuzzforge/storage)
export FUZZFORGE_STORAGE__PATH=/path/to/storage
# Container engine (Docker is default)
export FUZZFORGE_ENGINE__TYPE=docker # or podman
# Podman-specific container storage paths
export FUZZFORGE_ENGINE__GRAPHROOT=~/.fuzzforge/containers/storage
export FUZZFORGE_ENGINE__RUNROOT=~/.fuzzforge/containers/run
```
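The double-underscore variables above suggest nested settings (a common pydantic-settings convention). As an illustration only (FuzzForge's actual loader may differ), a minimal parser that groups `FUZZFORGE_*` variables into a nested structure:

```python
import os

def nested_env(prefix: str = "FUZZFORGE_", delimiter: str = "__") -> dict:
    """Group prefix-scoped environment variables into a nested dict.

    FUZZFORGE_ENGINE__TYPE=docker becomes {"engine": {"type": "docker"}}.
    """
    settings: dict = {}
    for key, value in os.environ.items():
        if not key.startswith(prefix):
            continue
        parts = key[len(prefix):].lower().split(delimiter)
        node = settings
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return settings

# Demo values matching the examples above
os.environ["FUZZFORGE_ENGINE__TYPE"] = "docker"
os.environ["FUZZFORGE_STORAGE__PATH"] = "/tmp/ff-storage"
```

With the demo values set, `nested_env()` yields `{"engine": {"type": "docker"}, "storage": {"path": "/tmp/ff-storage"}}` (plus any other `FUZZFORGE_*` variables present in your shell).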
---
## Troubleshooting
### Docker Not Running
```
Error: Cannot connect to Docker daemon
```
**Solution:**
```bash
# Linux: Start Docker service
sudo systemctl start docker
# macOS/Windows: Start Docker Desktop application
# Verify Docker is running
docker run --rm hello-world
```
### Permission Denied on Docker Socket
```
Error: Permission denied connecting to Docker socket
```
**Solution:**
```bash
sudo usermod -aG docker $USER
# Log out and back in, then verify:
docker run --rm hello-world
```
### Hub Images Not Built
If the dashboard shows ✗ Not built for a tool, build its image:
```bash
# Build all hub images
./scripts/build-hub-images.sh
# Or build a single tool
docker build -t <tool-name>:latest mcp-security-hub/<category>/<tool-name>/
```
### MCP Server Not Starting
```bash
# Check agent configuration
uv run fuzzforge mcp status
# Verify the config file path exists and contains valid JSON
cat ~/.config/Code/User/mcp.json # Copilot
cat ~/.claude.json # Claude Code
```
### Using Podman Instead of Docker
```bash
# Install with Podman engine
uv run fuzzforge mcp install copilot --engine podman
# Or set environment variable
export FUZZFORGE_ENGINE__TYPE=podman
```
### Hub Registry
FuzzForge stores linked hub information in `~/.fuzzforge/hubs.json`. If something goes wrong:
```bash
# View registry
cat ~/.fuzzforge/hubs.json
# Reset registry
rm ~/.fuzzforge/hubs.json
```
---
## Next Steps
- 🖥️ Launch `uv run fuzzforge ui` and explore the dashboard
- 🔒 Clone the [mcp-security-hub](https://github.com/FuzzingLabs/mcp-security-hub) for 40+ security tools
- 💬 Join our [Discord](https://discord.gg/8XEX33UUwZ) for support
---
<p align="center">
<strong>Built with ❤️ by <a href="https://fuzzinglabs.com">FuzzingLabs</a></strong>
</p>
ai/.gitignore vendored Normal file
@@ -0,0 +1,6 @@
.env
__pycache__/
*.pyc
fuzzforge_sessions.db
agentops.log
*.log
ai/README.md Normal file
@@ -0,0 +1,110 @@
# FuzzForge AI Module
FuzzForge AI is the multi-agent layer that lets you operate the FuzzForge security platform through natural language. It orchestrates local tooling, registered Agent-to-Agent (A2A) peers, and the Prefect-powered backend while keeping long-running context in memory and project knowledge graphs.
## Quick Start
1. **Initialise a project**
```bash
cd /path/to/project
fuzzforge init
```
2. **Review environment settings** — copy `.fuzzforge/.env.template` to `.fuzzforge/.env`, then edit the values to match your provider. The template ships with commented defaults for OpenAI-style usage and placeholders for Cognee keys.
```env
LLM_PROVIDER=openai
LITELLM_MODEL=gpt-5-mini
OPENAI_API_KEY=sk-your-key
FUZZFORGE_MCP_URL=http://localhost:8010/mcp
SESSION_PERSISTENCE=sqlite
```
Optional flags you may want to enable early:
```env
MEMORY_SERVICE=inmemory
AGENTOPS_API_KEY=sk-your-agentops-key # Enable hosted tracing
LOG_LEVEL=INFO # CLI / server log level
```
3. **Populate the knowledge graph**
```bash
fuzzforge ingest --path . --recursive
# alias: fuzzforge rag ingest --path . --recursive
```
4. **Launch the agent shell**
```bash
fuzzforge ai agent
```
Keep the backend running (Prefect API at `FUZZFORGE_MCP_URL`) so workflow commands succeed.
## Everyday Workflow
- Run `fuzzforge ai agent` and start with `list available fuzzforge workflows` or `/memory status` to confirm everything is wired.
- Use natural prompts for automation (`run fuzzforge workflow …`, `search project knowledge for …`) and fall back to slash commands for precision (`/recall`, `/sendfile`).
- Keep `/memory datasets` handy to see which Cognee datasets are available after each ingest.
- Start the HTTP surface with `python -m fuzzforge_ai` when external agents need access to artifacts or graph queries. The CLI stays usable at the same time.
- Refresh the knowledge graph regularly: `fuzzforge ingest --path . --recursive --force` keeps responses aligned with recent code changes.
## What the Agent Can Do
- **Route requests** — automatically selects the right local tool or remote agent using the A2A capability registry.
- **Run security workflows** — list, submit, and monitor FuzzForge workflows via MCP wrappers.
- **Manage artifacts** — create downloadable files for reports, code edits, and shared attachments.
- **Maintain context** — stores session history, semantic recall, and Cognee project graphs.
- **Serve over HTTP** — expose the same agent as an A2A server using `python -m fuzzforge_ai`.
## Essential Commands
Inside `fuzzforge ai agent` you can mix slash commands and free-form prompts:
```text
/list # Show registered A2A agents
/register http://localhost:10201  # Add a remote agent
/artifacts # List generated files
/sendfile SecurityAgent src/report.md "Please review"
You> route_to SecurityAnalyzer: scan ./backend for secrets
You> run fuzzforge workflow static_analysis_scan on ./test_projects/demo
You> search project knowledge for "prefect status" using INSIGHTS
```
Artifacts created during the conversation are served from `.fuzzforge/artifacts/` and exposed through the A2A HTTP API.
## Memory & Knowledge
The module layers three storage systems:
- **Session persistence** (SQLite or in-memory) for chat transcripts.
- **Semantic recall** via the ADK memory service for fuzzy search.
- **Cognee graphs** for project-wide knowledge built from ingestion runs.
Re-run ingestion after major code changes to keep graph answers relevant. If Cognee variables are not set, graph-specific tools automatically respond with a polite "not configured" message.
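The fallback behaviour can be pictured as a simple guard. This sketch is illustrative (the function names and message text are assumptions, not the module's actual code); the variable names come from the Cognee section of the LLM configuration guide:

```python
import os

# Variables the Cognee integration expects (see ai/llm.txt)
REQUIRED_COGNEE_VARS = ("LLM_COGNEE_PROVIDER", "LLM_COGNEE_MODEL", "LLM_COGNEE_API_KEY")

def cognee_configured() -> bool:
    """True when every Cognee variable is present and non-empty."""
    return all(os.getenv(var) for var in REQUIRED_COGNEE_VARS)

def search_project_graph(query: str) -> str:
    """Hypothetical graph tool: degrade politely instead of raising."""
    if not cognee_configured():
        return ("Cognee is not configured; set LLM_COGNEE_* in "
                ".fuzzforge/.env to enable graph search.")
    # A real implementation would query the Cognee dataset here.
    return f"graph results for {query!r}"
```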
## Sample Prompts
Use these to validate the setup once the agent shell is running:
- `list available fuzzforge workflows`
- `run fuzzforge workflow static_analysis_scan on ./backend with target_branch=main`
- `show findings for that run once it finishes`
- `refresh the project knowledge graph for ./backend`
- `search project knowledge for "prefect readiness" using INSIGHTS`
- `/recall terraform secrets`
- `/memory status`
- `ROUTE_TO SecurityAnalyzer: audit infrastructure_vulnerable`
## Need More Detail?
Dive into the dedicated guides under `ai/docs/advanced/`:
- [Architecture](https://docs.fuzzforge.ai/docs/ai/intro) — High-level architecture with diagrams and component breakdowns.
- [Ingestion](https://docs.fuzzforge.ai/docs/ai/ingestion.md) — Command options, Cognee persistence, and prompt examples.
- [Configuration](https://docs.fuzzforge.ai/docs/ai/configuration.md) — LLM provider matrix, local model setup, and tracing options.
- [Prompts](https://docs.fuzzforge.ai/docs/ai/prompts.md) — Slash commands, workflow prompts, and routing tips.
- [A2A Services](https://docs.fuzzforge.ai/docs/ai/a2a-services.md) — HTTP endpoints, agent card, and collaboration flow.
- [Memory Persistence](https://docs.fuzzforge.ai/docs/ai/architecture.md#memory--persistence) — Deep dive on memory storage, datasets, and how `/memory status` inspects them.
## Development Notes
- Entry point for the CLI: `ai/src/fuzzforge_ai/cli.py`
- A2A HTTP server: `ai/src/fuzzforge_ai/a2a_server.py`
- Tool routing & workflow glue: `ai/src/fuzzforge_ai/agent_executor.py`
- Ingestion helpers: `ai/src/fuzzforge_ai/ingest_utils.py`
Install the module in editable mode (`pip install -e ai`) while iterating so CLI changes are picked up immediately.
ai/llm.txt Normal file
@@ -0,0 +1,93 @@
FuzzForge AI LLM Configuration Guide
===================================
This note summarises the environment variables and libraries that drive LiteLLM (via the Google ADK runtime) inside the FuzzForge AI module. For complete matrices and advanced examples, read `docs/advanced/configuration.md`.
Core Libraries
--------------
- `google-adk` — hosts the agent runtime, memory services, and LiteLLM bridge.
- `litellm` — provider-agnostic LLM client used by ADK and the executor.
- Provider SDKs — install the SDK that matches your target backend (`openai`, `anthropic`, `google-cloud-aiplatform`, `groq`, etc.).
- Optional extras: `agentops` for tracing, `cognee[all]` for knowledge-graph ingestion, `ollama` CLI for running local models.
Quick install foundation::
```
pip install google-adk litellm openai
```
Add any provider-specific SDKs (for example `pip install anthropic groq`) on top of that base.
Baseline Setup
--------------
Copy `.fuzzforge/.env.template` to `.fuzzforge/.env` and set the core fields:
```
LLM_PROVIDER=openai
LITELLM_MODEL=gpt-5-mini
OPENAI_API_KEY=sk-your-key
FUZZFORGE_MCP_URL=http://localhost:8010/mcp
SESSION_PERSISTENCE=sqlite
MEMORY_SERVICE=inmemory
```
LiteLLM Provider Examples
-------------------------
OpenAI-compatible (Azure, etc.)::
```
LLM_PROVIDER=azure_openai
LITELLM_MODEL=gpt-4o-mini
LLM_API_KEY=sk-your-azure-key
LLM_ENDPOINT=https://your-resource.openai.azure.com
```
Anthropic::
```
LLM_PROVIDER=anthropic
LITELLM_MODEL=claude-3-haiku-20240307
ANTHROPIC_API_KEY=sk-your-key
```
Ollama (local)::
```
LLM_PROVIDER=ollama_chat
LITELLM_MODEL=codellama:latest
OLLAMA_API_BASE=http://localhost:11434
```
Run `ollama pull codellama:latest` so the adapter can respond immediately.
Vertex AI::
```
LLM_PROVIDER=vertex_ai
LITELLM_MODEL=gemini-1.5-pro
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
```
Provider Checklist
------------------
- **OpenAI / Azure OpenAI**: `LLM_PROVIDER`, `LITELLM_MODEL`, API key, optional endpoint + API version (Azure).
- **Anthropic**: `LLM_PROVIDER=anthropic`, `LITELLM_MODEL`, `ANTHROPIC_API_KEY`.
- **Google Vertex AI**: `LLM_PROVIDER=vertex_ai`, `LITELLM_MODEL`, `GOOGLE_APPLICATION_CREDENTIALS`, `GOOGLE_CLOUD_PROJECT`.
- **Groq**: `LLM_PROVIDER=groq`, `LITELLM_MODEL`, `GROQ_API_KEY`.
- **Ollama / Local**: `LLM_PROVIDER=ollama_chat`, `LITELLM_MODEL`, `OLLAMA_API_BASE`, and the model pulled locally (`ollama pull <model>`).
Knowledge Graph Add-ons
-----------------------
Set these only if you plan to use Cognee project graphs:
```
LLM_COGNEE_PROVIDER=openai
LLM_COGNEE_MODEL=gpt-5-mini
LLM_COGNEE_API_KEY=sk-your-key
```
Tracing & Debugging
-------------------
- Provide `AGENTOPS_API_KEY` to enable hosted traces for every conversation.
- Set `FUZZFORGE_DEBUG=1` (and optionally `LOG_LEVEL=DEBUG`) for verbose executor output.
- Restart the agent after changing environment variables; LiteLLM loads configuration on boot.
Further Reading
---------------
`docs/advanced/configuration.md` — provider comparison, debugging flags, and referenced modules.
ai/pyproject.toml Normal file
@@ -0,0 +1,44 @@
[project]
name = "fuzzforge-ai"
version = "0.6.0"
description = "FuzzForge AI orchestration module"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
"google-adk",
"a2a-sdk",
"litellm",
"python-dotenv",
"httpx",
"uvicorn",
"rich",
"agentops",
"fastmcp",
"mcp",
"typing-extensions",
"cognee>=0.3.0",
]
[project.optional-dependencies]
dev = [
"pytest",
"pytest-asyncio",
"black",
"ruff",
]
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.hatch.build.targets.wheel]
packages = ["src/fuzzforge_ai"]
[tool.hatch.metadata]
allow-direct-references = true
[tool.uv]
dev-dependencies = [
"pytest",
"pytest-asyncio",
]
@@ -0,0 +1,24 @@
"""
FuzzForge AI Module - Agent-to-Agent orchestration system
This module integrates the fuzzforge_ai components into FuzzForge,
providing intelligent AI agent capabilities for security analysis.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
__version__ = "0.6.0"
from .agent import FuzzForgeAgent
from .config_manager import ConfigManager
__all__ = ['FuzzForgeAgent', 'ConfigManager']
@@ -0,0 +1,109 @@
"""
FuzzForge A2A Server
Run this to expose FuzzForge as an A2A-compatible agent
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import os
import warnings
import logging
from dotenv import load_dotenv
from fuzzforge_ai.config_bridge import ProjectConfigManager
# Suppress warnings
warnings.filterwarnings("ignore")
logging.getLogger("google.adk").setLevel(logging.ERROR)
logging.getLogger("google.adk.tools.base_authenticated_tool").setLevel(logging.ERROR)
# Load .env from .fuzzforge directory first, then fallback
from pathlib import Path
# Ensure Cognee logs stay inside the project workspace
project_root = Path.cwd()
default_log_dir = project_root / ".fuzzforge" / "logs"
default_log_dir.mkdir(parents=True, exist_ok=True)
log_path = default_log_dir / "cognee.log"
os.environ.setdefault("COGNEE_LOG_PATH", str(log_path))
fuzzforge_env = Path.cwd() / ".fuzzforge" / ".env"
if fuzzforge_env.exists():
load_dotenv(fuzzforge_env, override=True)
else:
load_dotenv(override=True)
# Ensure Cognee uses the project-specific storage paths when available
try:
project_config = ProjectConfigManager()
project_config.setup_cognee_environment()
except Exception:
# Project may not be initialized; fall through with default settings
pass
# Check configuration
if not os.getenv('LITELLM_MODEL'):
print("[ERROR] LITELLM_MODEL not set in .env file")
print("Please set LITELLM_MODEL to your desired model (e.g., gpt-4o-mini)")
exit(1)
from .agent import get_fuzzforge_agent
from .a2a_server import create_a2a_app as create_custom_a2a_app
def create_a2a_app():
"""Create the A2A application"""
# Get configuration
port = int(os.getenv('FUZZFORGE_PORT', 10100))
# Get the FuzzForge agent
fuzzforge = get_fuzzforge_agent()
# Print ASCII banner
print("\033[95m") # Purple color
print(" ███████╗██╗ ██╗███████╗███████╗███████╗ ██████╗ ██████╗ ██████╗ ███████╗ █████╗ ██╗")
print(" ██╔════╝██║ ██║╚══███╔╝╚══███╔╝██╔════╝██╔═══██╗██╔══██╗██╔════╝ ██╔════╝ ██╔══██╗██║")
print(" █████╗ ██║ ██║ ███╔╝ ███╔╝ █████╗ ██║ ██║██████╔╝██║ ███╗█████╗ ███████║██║")
print(" ██╔══╝ ██║ ██║ ███╔╝ ███╔╝ ██╔══╝ ██║ ██║██╔══██╗██║ ██║██╔══╝ ██╔══██║██║")
print(" ██║ ╚██████╔╝███████╗███████╗██║ ╚██████╔╝██║ ██║╚██████╔╝███████╗ ██║ ██║██║")
print(" ╚═╝ ╚═════╝ ╚══════╝╚══════╝╚═╝ ╚═════╝ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝ ╚═╝ ╚═╝╚═╝")
print("\033[0m") # Reset color
# Create A2A app
print(f"🚀 Starting FuzzForge A2A Server")
print(f" Model: {fuzzforge.model}")
if fuzzforge.cognee_url:
print(f" Memory: Cognee at {fuzzforge.cognee_url}")
print(f" Port: {port}")
app = create_custom_a2a_app(fuzzforge.adk_agent, port=port, executor=fuzzforge.executor)
print(f"\n✅ FuzzForge A2A Server ready!")
print(f" Agent card: http://localhost:{port}/.well-known/agent-card.json")
print(f" A2A endpoint: http://localhost:{port}/")
print(f"\n📡 Other agents can register FuzzForge at: http://localhost:{port}")
return app
def main():
"""Start the A2A server using uvicorn."""
import uvicorn
app = create_a2a_app()
port = int(os.getenv('FUZZFORGE_PORT', 10100))
print(f"\n🎯 Starting server with uvicorn...")
uvicorn.run(app, host="127.0.0.1", port=port)
if __name__ == "__main__":
main()
@@ -0,0 +1,230 @@
"""Custom A2A wiring so we can access task store and queue manager."""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
from __future__ import annotations
import logging
from typing import Optional, Union
from starlette.applications import Starlette
from starlette.responses import Response, FileResponse
from starlette.routing import Route
from google.adk.a2a.executor.a2a_agent_executor import A2aAgentExecutor
from google.adk.a2a.utils.agent_card_builder import AgentCardBuilder
from google.adk.a2a.experimental import a2a_experimental
from google.adk.agents.base_agent import BaseAgent
from google.adk.artifacts.in_memory_artifact_service import InMemoryArtifactService
from google.adk.auth.credential_service.in_memory_credential_service import InMemoryCredentialService
from google.adk.cli.utils.logs import setup_adk_logger
from google.adk.memory.in_memory_memory_service import InMemoryMemoryService
from google.adk.runners import Runner
from google.adk.sessions.in_memory_session_service import InMemorySessionService
from a2a.server.apps import A2AStarletteApplication
from a2a.server.request_handlers.default_request_handler import DefaultRequestHandler
from a2a.server.tasks.inmemory_task_store import InMemoryTaskStore
from a2a.server.events.in_memory_queue_manager import InMemoryQueueManager
from a2a.types import AgentCard
from .agent_executor import FuzzForgeExecutor
import json
async def serve_artifact(request):
"""Serve artifact files via HTTP for A2A agents"""
artifact_id = request.path_params["artifact_id"]
# Try to get the executor instance to access artifact cache
# We'll store a reference to it during app creation
executor = getattr(serve_artifact, '_executor', None)
if not executor:
return Response("Artifact service not available", status_code=503)
try:
# Look in the artifact cache directory
artifact_cache_dir = executor._artifact_cache_dir
artifact_dir = artifact_cache_dir / artifact_id
if not artifact_dir.exists():
return Response("Artifact not found", status_code=404)
# Find the artifact file (should be only one file in the directory)
artifact_files = list(artifact_dir.glob("*"))
if not artifact_files:
return Response("Artifact file not found", status_code=404)
artifact_file = artifact_files[0] # Take the first (and should be only) file
# Determine mime type from file extension or default to octet-stream
import mimetypes
mime_type, _ = mimetypes.guess_type(str(artifact_file))
if not mime_type:
mime_type = 'application/octet-stream'
return FileResponse(
path=str(artifact_file),
media_type=mime_type,
filename=artifact_file.name
)
except Exception as e:
return Response(f"Error serving artifact: {str(e)}", status_code=500)
async def knowledge_query(request):
"""Expose knowledge graph search over HTTP for external agents."""
executor = getattr(knowledge_query, '_executor', None)
if not executor:
return Response("Knowledge service not available", status_code=503)
try:
payload = await request.json()
except Exception:
return Response("Invalid JSON body", status_code=400)
query = payload.get("query")
if not query:
return Response("'query' is required", status_code=400)
search_type = payload.get("search_type", "INSIGHTS")
dataset = payload.get("dataset")
result = await executor.query_project_knowledge_api(
query=query,
search_type=search_type,
dataset=dataset,
)
status = 200 if not isinstance(result, dict) or "error" not in result else 400
return Response(
json.dumps(result, default=str),
status_code=status,
media_type="application/json",
)
async def create_file_artifact(request):
"""Create an artifact from a project file via HTTP."""
executor = getattr(create_file_artifact, '_executor', None)
if not executor:
return Response("File service not available", status_code=503)
try:
payload = await request.json()
except Exception:
return Response("Invalid JSON body", status_code=400)
path = payload.get("path")
if not path:
return Response("'path' is required", status_code=400)
result = await executor.create_project_file_artifact_api(path)
status = 200 if not isinstance(result, dict) or "error" not in result else 400
return Response(
json.dumps(result, default=str),
status_code=status,
media_type="application/json",
)
def _load_agent_card(agent_card: Optional[Union[AgentCard, str]]) -> Optional[AgentCard]:
if agent_card is None:
return None
if isinstance(agent_card, AgentCard):
return agent_card
import json
from pathlib import Path
path = Path(agent_card)
with path.open('r', encoding='utf-8') as handle:
data = json.load(handle)
return AgentCard(**data)
@a2a_experimental
def create_a2a_app(
agent: BaseAgent,
*,
host: str = "localhost",
port: int = 8000,
protocol: str = "http",
agent_card: Optional[Union[AgentCard, str]] = None,
executor=None, # Accept executor reference
) -> Starlette:
"""Variant of google.adk.a2a.utils.to_a2a that exposes task-store handles."""
setup_adk_logger(logging.INFO)
async def create_runner() -> Runner:
return Runner(
agent=agent,
app_name=agent.name or "fuzzforge",
artifact_service=InMemoryArtifactService(),
session_service=InMemorySessionService(),
memory_service=InMemoryMemoryService(),
credential_service=InMemoryCredentialService(),
)
task_store = InMemoryTaskStore()
queue_manager = InMemoryQueueManager()
agent_executor = A2aAgentExecutor(runner=create_runner)
request_handler = DefaultRequestHandler(
agent_executor=agent_executor,
task_store=task_store,
queue_manager=queue_manager,
)
rpc_url = f"{protocol}://{host}:{port}/"
provided_card = _load_agent_card(agent_card)
card_builder = AgentCardBuilder(agent=agent, rpc_url=rpc_url)
app = Starlette()
async def setup() -> None:
if provided_card is not None:
final_card = provided_card
else:
final_card = await card_builder.build()
a2a_app = A2AStarletteApplication(
agent_card=final_card,
http_handler=request_handler,
)
a2a_app.add_routes_to_app(app)
# Add artifact serving route
app.router.add_route("/artifacts/{artifact_id}", serve_artifact, methods=["GET"])
app.router.add_route("/graph/query", knowledge_query, methods=["POST"])
app.router.add_route("/project/files", create_file_artifact, methods=["POST"])
app.add_event_handler("startup", setup)
# Expose handles so the executor can emit task updates later
FuzzForgeExecutor.task_store = task_store
FuzzForgeExecutor.queue_manager = queue_manager
# Store reference to executor for artifact serving
serve_artifact._executor = executor
knowledge_query._executor = executor
create_file_artifact._executor = executor
return app
__all__ = ["create_a2a_app"]
@@ -0,0 +1,133 @@
"""
FuzzForge Agent Definition
The core agent that combines all components
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import os
from pathlib import Path
from typing import Dict, Any, List
from google.adk import Agent
from google.adk.models.lite_llm import LiteLlm
from .agent_card import get_fuzzforge_agent_card
from .agent_executor import FuzzForgeExecutor
from .memory_service import FuzzForgeMemoryService, HybridMemoryManager
# Load environment variables from the AI module's .env file
try:
from dotenv import load_dotenv
_ai_dir = Path(__file__).parent
_env_file = _ai_dir / ".env"
if _env_file.exists():
load_dotenv(_env_file, override=False) # Don't override existing env vars
except ImportError:
# dotenv not available, skip loading
pass
class FuzzForgeAgent:
"""The main FuzzForge agent that combines card, executor, and ADK agent"""
def __init__(
self,
model: str = None,
cognee_url: str = None,
port: int = 10100,
):
"""Initialize FuzzForge agent with configuration"""
self.model = model or os.getenv('LITELLM_MODEL', 'gpt-4o-mini')
self.cognee_url = cognee_url or os.getenv('COGNEE_MCP_URL')
self.port = port
# Initialize ADK Memory Service for conversational memory
memory_type = os.getenv('MEMORY_SERVICE', 'inmemory')
self.memory_service = FuzzForgeMemoryService(memory_type=memory_type)
# Create the executor (the brain) with memory and session services
self.executor = FuzzForgeExecutor(
model=self.model,
cognee_url=self.cognee_url,
debug=os.getenv('FUZZFORGE_DEBUG', '0') == '1',
memory_service=self.memory_service,
session_persistence=os.getenv('SESSION_PERSISTENCE', 'inmemory'),
fuzzforge_mcp_url=os.getenv('FUZZFORGE_MCP_URL'),
)
# Create Hybrid Memory Manager (ADK + Cognee direct integration)
# MCP tools removed - using direct Cognee integration only
self.memory_manager = HybridMemoryManager(
memory_service=self.memory_service,
cognee_tools=None # No MCP tools, direct integration used instead
)
# Get the agent card (the identity)
self.agent_card = get_fuzzforge_agent_card(f"http://localhost:{self.port}")
# Create the ADK agent (for A2A server mode)
self.adk_agent = self._create_adk_agent()
def _create_adk_agent(self) -> Agent:
"""Create the ADK agent for A2A server mode"""
# Build instruction
instruction = f"""You are {self.agent_card.name}, {self.agent_card.description}
Your capabilities include:
"""
for skill in self.agent_card.skills:
instruction += f"\n- {skill.name}: {skill.description}"
instruction += """
When responding to requests:
1. Use your registered agents when appropriate
2. Use Cognee memory tools when available
3. Provide helpful, concise responses
4. Maintain context across conversations
"""
# Create ADK agent
return Agent(
model=LiteLlm(model=self.model),
name=self.agent_card.name,
description=self.agent_card.description,
instruction=instruction,
tools=self.executor.agent.tools if hasattr(self.executor.agent, 'tools') else []
)
async def process_message(self, message: str, context_id: str = None) -> str:
"""Process a message using the executor"""
result = await self.executor.execute(message, context_id or "default")
return result.get("response", "No response generated")
async def register_agent(self, url: str) -> Dict[str, Any]:
"""Register a new agent"""
return await self.executor.register_agent(url)
def list_agents(self) -> List[Dict[str, Any]]:
"""List registered agents"""
return self.executor.list_agents()
async def cleanup(self):
"""Clean up resources"""
await self.executor.cleanup()
# Create a singleton instance for import
_instance = None
def get_fuzzforge_agent() -> FuzzForgeAgent:
"""Get the singleton FuzzForge agent instance"""
global _instance
if _instance is None:
_instance = FuzzForgeAgent()
return _instance
@@ -0,0 +1,183 @@
"""
FuzzForge Agent Card and Skills Definition
Defines what FuzzForge can do and how others can discover it
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
from dataclasses import dataclass
from typing import List, Optional, Dict, Any
@dataclass
class AgentSkill:
"""Represents a specific capability of the agent"""
id: str
name: str
description: str
tags: List[str]
examples: List[str]
input_modes: List[str] = None
output_modes: List[str] = None
def to_dict(self) -> Dict[str, Any]:
"""Convert to dictionary for JSON serialization"""
return {
"id": self.id,
"name": self.name,
"description": self.description,
"tags": self.tags,
"examples": self.examples,
"inputModes": self.input_modes or ["text/plain"],
"outputModes": self.output_modes or ["text/plain"]
}
@dataclass
class AgentCapabilities:
"""Defines agent capabilities for A2A protocol"""
streaming: bool = False
push_notifications: bool = False
multi_turn: bool = True
context_retention: bool = True
def to_dict(self) -> Dict[str, Any]:
return {
"streaming": self.streaming,
"pushNotifications": self.push_notifications,
"multiTurn": self.multi_turn,
"contextRetention": self.context_retention
}
@dataclass
class AgentCard:
"""The agent's business card - tells others what this agent can do"""
name: str
description: str
version: str
url: str
skills: List[AgentSkill]
capabilities: AgentCapabilities
default_input_modes: List[str] = None
default_output_modes: List[str] = None
preferred_transport: str = "JSONRPC"
protocol_version: str = "0.3.0"
def to_dict(self) -> Dict[str, Any]:
"""Convert to A2A-compliant agent card JSON"""
return {
"name": self.name,
"description": self.description,
"version": self.version,
"url": self.url,
"protocolVersion": self.protocol_version,
"preferredTransport": self.preferred_transport,
"defaultInputModes": self.default_input_modes or ["text/plain"],
"defaultOutputModes": self.default_output_modes or ["text/plain"],
"capabilities": self.capabilities.to_dict(),
"skills": [skill.to_dict() for skill in self.skills]
}
# Define FuzzForge's skills
orchestration_skill = AgentSkill(
id="orchestration",
name="Agent Orchestration",
description="Route requests to appropriate registered agents based on their capabilities",
tags=["orchestration", "routing", "coordination"],
examples=[
"Route this to the calculator",
"Send this to the appropriate agent",
"Which agent should handle this?"
]
)
memory_skill = AgentSkill(
id="memory",
name="Memory Management",
description="Store and retrieve information using Cognee knowledge graph",
tags=["memory", "knowledge", "storage", "cognee"],
examples=[
"Remember that my favorite color is blue",
"What do you remember about me?",
"Search your memory for project details"
]
)
conversation_skill = AgentSkill(
id="conversation",
name="General Conversation",
description="Engage in general conversation and answer questions using LLM",
tags=["chat", "conversation", "qa", "llm"],
examples=[
"What is the meaning of life?",
"Explain quantum computing",
"Help me understand this concept"
]
)
workflow_automation_skill = AgentSkill(
id="workflow_automation",
name="Workflow Automation",
description="Operate project workflows via MCP, monitor runs, and share results",
tags=["workflow", "automation", "mcp", "orchestration"],
examples=[
"Submit the security assessment workflow",
"Kick off the infrastructure scan and monitor it",
"Summarise findings for run abc123"
]
)
agent_management_skill = AgentSkill(
id="agent_management",
name="Agent Registry Management",
description="Register, list, and manage connections to other A2A agents",
tags=["registry", "management", "discovery"],
examples=[
"Register agent at http://localhost:10201",
"List all registered agents",
"Show agent capabilities"
]
)
# Define FuzzForge's capabilities
fuzzforge_capabilities = AgentCapabilities(
streaming=False,
push_notifications=True,
multi_turn=True, # We support multi-turn conversations
context_retention=True # We maintain context across turns
)
# Create the public agent card
def get_fuzzforge_agent_card(url: str = "http://localhost:10100") -> AgentCard:
"""Get FuzzForge's agent card with current configuration"""
return AgentCard(
name="ProjectOrchestrator",
description=(
"An A2A-capable project agent that can launch and monitor FuzzForge workflows, "
"consult the project knowledge graph, and coordinate with speciality agents."
),
version="project-agent",
url=url,
skills=[
orchestration_skill,
memory_skill,
conversation_skill,
workflow_automation_skill,
agent_management_skill
],
capabilities=fuzzforge_capabilities,
default_input_modes=["text/plain", "application/json"],
default_output_modes=["text/plain", "application/json"],
preferred_transport="JSONRPC",
protocol_version="0.3.0"
)
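For reference, the camelCase JSON that `to_dict()` emits is what an A2A client actually consumes. The following is a minimal, self-contained sketch of the same serialization pattern; the field values are copied from the `orchestration_skill` definition above, and the trimmed-down dataclass is a stand-in for the real `AgentSkill`, not the module itself:

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Minimal stand-in for AgentSkill, reproducing the snake_case -> camelCase
# mapping and the ["text/plain"] fallback used in to_dict() above.
@dataclass
class AgentSkill:
    id: str
    name: str
    description: str
    tags: List[str]
    examples: List[str]
    input_modes: Optional[List[str]] = None
    output_modes: Optional[List[str]] = None

    def to_dict(self) -> Dict[str, Any]:
        return {
            "id": self.id,
            "name": self.name,
            "description": self.description,
            "tags": self.tags,
            "examples": self.examples,
            "inputModes": self.input_modes or ["text/plain"],
            "outputModes": self.output_modes or ["text/plain"],
        }

skill = AgentSkill(
    id="orchestration",
    name="Agent Orchestration",
    description="Route requests to appropriate registered agents",
    tags=["orchestration", "routing", "coordination"],
    examples=["Route this to the calculator"],
)
print(json.dumps(skill.to_dict(), indent=2))
```

Because `input_modes` and `output_modes` default to `None`, the `or ["text/plain"]` fallback in `to_dict()` is what guarantees every published skill advertises at least one MIME type.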

File diff suppressed because it is too large

ai/src/fuzzforge_ai/cli.py Executable file

@@ -0,0 +1,977 @@
#!/usr/bin/env python3
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
"""
FuzzForge CLI - Clean modular version
Uses the separated agent components
"""
import asyncio
import shlex
import os
import sys
import signal
import warnings
import logging
import random
from datetime import datetime
from contextlib import contextmanager
from pathlib import Path
from typing import Any
from dotenv import load_dotenv
# Ensure Cognee writes logs inside the project workspace
project_root = Path.cwd()
default_log_dir = project_root / ".fuzzforge" / "logs"
default_log_dir.mkdir(parents=True, exist_ok=True)
log_path = default_log_dir / "cognee.log"
os.environ.setdefault("COGNEE_LOG_PATH", str(log_path))
# Suppress warnings
warnings.filterwarnings("ignore")
logging.basicConfig(level=logging.ERROR)
# Load .env file with explicit path handling
# 1. First check current working directory for .fuzzforge/.env
fuzzforge_env = Path.cwd() / ".fuzzforge" / ".env"
if fuzzforge_env.exists():
load_dotenv(fuzzforge_env, override=True)
else:
# 2. Then check parent directories for .fuzzforge projects
current_path = Path.cwd()
for parent in [current_path] + list(current_path.parents):
fuzzforge_dir = parent / ".fuzzforge"
if fuzzforge_dir.exists():
project_env = fuzzforge_dir / ".env"
if project_env.exists():
load_dotenv(project_env, override=True)
break
else:
# 3. Fallback to generic load_dotenv
load_dotenv(override=True)
# Enhanced readline configuration for Rich Console input compatibility
try:
import readline
# Enable Rich-compatible input features
readline.parse_and_bind("tab: complete")
readline.parse_and_bind("set editing-mode emacs")
readline.parse_and_bind("set show-all-if-ambiguous on")
readline.parse_and_bind("set completion-ignore-case on")
readline.parse_and_bind("set colored-completion-prefix on")
readline.parse_and_bind("set enable-bracketed-paste on") # Better paste support
# Navigation bindings for better editing
readline.parse_and_bind("Control-a: beginning-of-line")
readline.parse_and_bind("Control-e: end-of-line")
readline.parse_and_bind("Control-u: unix-line-discard")
readline.parse_and_bind("Control-k: kill-line")
readline.parse_and_bind("Control-w: unix-word-rubout")
readline.parse_and_bind("Meta-Backspace: backward-kill-word")
# History and completion
readline.set_history_length(2000)
readline.set_startup_hook(None)
# Enable multiline editing hints
readline.parse_and_bind("set horizontal-scroll-mode off")
readline.parse_and_bind("set mark-symlinked-directories on")
READLINE_AVAILABLE = True
except ImportError:
READLINE_AVAILABLE = False
from rich.console import Console
from rich.table import Table
from rich.panel import Panel
from rich.prompt import Prompt
from rich import box
from google.adk.events.event import Event
from google.adk.events.event_actions import EventActions
from google.genai import types as gen_types
from .agent import FuzzForgeAgent
from .agent_card import get_fuzzforge_agent_card
from .config_manager import ConfigManager
from .config_bridge import ProjectConfigManager
from .remote_agent import RemoteAgentConnection
console = Console()
# Global shutdown flag
shutdown_requested = False
# Dynamic status messages for better UX
THINKING_MESSAGES = [
"Thinking", "Processing", "Computing", "Analyzing", "Working",
"Pondering", "Deliberating", "Calculating", "Reasoning", "Evaluating"
]
WORKING_MESSAGES = [
"Working", "Processing", "Handling", "Executing", "Running",
"Operating", "Performing", "Conducting", "Managing", "Coordinating"
]
SEARCH_MESSAGES = [
"Searching", "Scanning", "Exploring", "Investigating", "Hunting",
"Seeking", "Probing", "Examining", "Inspecting", "Browsing"
]
# Cool prompt symbols
PROMPT_STYLES = [
"", "", "", "", "»", "", "", "", "", ""
]
def get_dynamic_status(action_type="thinking"):
"""Get a random status message based on action type"""
if action_type == "thinking":
return f"{random.choice(THINKING_MESSAGES)}..."
elif action_type == "working":
return f"{random.choice(WORKING_MESSAGES)}..."
elif action_type == "searching":
return f"{random.choice(SEARCH_MESSAGES)}..."
else:
return f"{random.choice(THINKING_MESSAGES)}..."
def get_prompt_symbol():
"""Get prompt symbol indicating where to write"""
return ">>"
def signal_handler(signum, frame):
"""Handle Ctrl+C gracefully"""
global shutdown_requested
shutdown_requested = True
console.print("\n\n[yellow]Shutting down gracefully...[/yellow]")
sys.exit(0)
signal.signal(signal.SIGINT, signal_handler)
@contextmanager
def safe_status(message: str):
"""Safe status context manager"""
status = console.status(message, spinner="dots")
try:
status.start()
yield
finally:
status.stop()
class FuzzForgeCLI:
"""Command-line interface for FuzzForge"""
def __init__(self):
"""Initialize the CLI"""
# Ensure .env is loaded from .fuzzforge directory
fuzzforge_env = Path.cwd() / ".fuzzforge" / ".env"
if fuzzforge_env.exists():
load_dotenv(fuzzforge_env, override=True)
# Load configuration for agent registry
self.config_manager = ConfigManager()
# Check environment configuration
if not os.getenv('LITELLM_MODEL'):
console.print("[red]ERROR: LITELLM_MODEL not set in .env file[/red]")
console.print("Please set LITELLM_MODEL to your desired model")
sys.exit(1)
# Create the agent (uses env vars directly)
self.agent = FuzzForgeAgent()
# Create a consistent context ID for this CLI session
self.context_id = f"cli_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
# Track registered agents for config persistence
self.agents_modified = False
# Command handlers
self.commands = {
"/help": self.cmd_help,
"/register": self.cmd_register,
"/unregister": self.cmd_unregister,
"/list": self.cmd_list,
"/memory": self.cmd_memory,
"/recall": self.cmd_recall,
"/artifacts": self.cmd_artifacts,
"/tasks": self.cmd_tasks,
"/skills": self.cmd_skills,
"/sessions": self.cmd_sessions,
"/clear": self.cmd_clear,
"/sendfile": self.cmd_sendfile,
"/quit": self.cmd_quit,
"/exit": self.cmd_quit,
}
self.background_tasks: set[asyncio.Task] = set()
def print_banner(self):
"""Print welcome banner"""
card = self.agent.agent_card
# Print ASCII banner
console.print("[medium_purple3] ███████╗██╗ ██╗███████╗███████╗███████╗ ██████╗ ██████╗ ██████╗ ███████╗ █████╗ ██╗[/medium_purple3]")
console.print("[medium_purple3] ██╔════╝██║ ██║╚══███╔╝╚══███╔╝██╔════╝██╔═══██╗██╔══██╗██╔════╝ ██╔════╝ ██╔══██╗██║[/medium_purple3]")
console.print("[medium_purple3] █████╗ ██║ ██║ ███╔╝ ███╔╝ █████╗ ██║ ██║██████╔╝██║ ███╗█████╗ ███████║██║[/medium_purple3]")
console.print("[medium_purple3] ██╔══╝ ██║ ██║ ███╔╝ ███╔╝ ██╔══╝ ██║ ██║██╔══██╗██║ ██║██╔══╝ ██╔══██║██║[/medium_purple3]")
console.print("[medium_purple3] ██║ ╚██████╔╝███████╗███████╗██║ ╚██████╔╝██║ ██║╚██████╔╝███████╗ ██║ ██║██║[/medium_purple3]")
console.print("[medium_purple3] ╚═╝ ╚═════╝ ╚══════╝╚══════╝╚═╝ ╚═════╝ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝ ╚═╝ ╚═╝╚═╝[/medium_purple3]")
console.print(f"\n[dim]{card.description}[/dim]\n")
provider = (
os.getenv("LLM_PROVIDER")
or os.getenv("LLM_COGNEE_PROVIDER")
or os.getenv("COGNEE_LLM_PROVIDER")
or "unknown"
)
console.print(
"LLM Provider: [medium_purple1]{provider}[/medium_purple1]".format(
provider=provider
)
)
console.print(
"LLM Model: [medium_purple1]{model}[/medium_purple1]".format(
model=self.agent.model
)
)
if self.agent.executor.agentops_trace:
console.print("Tracking: [medium_purple1]AgentOps active[/medium_purple1]")
# Show skills
console.print("\nSkills:")
for skill in card.skills:
console.print(
f" • [deep_sky_blue1]{skill.name}[/deep_sky_blue1] {skill.description}"
)
console.print("\nType /help for commands or just chat\n")
async def cmd_help(self, args: str = "") -> None:
"""Show help"""
help_text = """
[bold]Commands:[/bold]
/register <url> - Register an A2A agent (saves to config)
/unregister <name> - Remove agent from registry and config
/list - List registered agents
[bold]Memory Systems:[/bold]
/recall <query> - Search past conversations (ADK Memory)
/memory - Show memory status and knowledge graph info (Cognee)
/memory datasets - List knowledge graph datasets
/memory search <query> - Search memory (alias for /recall)
[bold]Other:[/bold]
/artifacts - List created artifacts
/artifacts <id> - Show artifact content
/tasks [id] - Show task list or details
/skills - Show FuzzForge skills
/sessions - List active sessions
/sendfile <agent> <path> [message] - Attach file as artifact and route to agent
/clear - Clear screen
/help - Show this help
/quit - Exit
[bold]Sample prompts:[/bold]
run fuzzforge workflow security_assessment on /absolute/path --volume-mode ro
list fuzzforge runs limit=5
get fuzzforge summary <run_id>
query project knowledge about "unsafe Rust" using GRAPH_COMPLETION
export project file src/lib.rs as artifact
/memory search "recent findings"
[bold]Input Editing:[/bold]
Arrow keys - Move cursor
Ctrl+A/E - Start/end of line
Up/Down - Command history
"""
console.print(help_text)
async def cmd_register(self, args: str) -> None:
"""Register an agent"""
if not args:
console.print("Usage: /register <url>")
return
with safe_status(f"{get_dynamic_status('working')} Registering {args}"):
result = await self.agent.register_agent(args.strip())
if result["success"]:
console.print(f"✅ Registered: [bold]{result['name']}[/bold]")
console.print(f" Capabilities: {result['capabilities']} skills")
# Get description from the agent's card
agents = self.agent.list_agents()
description = ""
for agent in agents:
if agent['name'] == result['name']:
description = agent.get('description', '')
break
# Add to config for persistence
self.config_manager.add_registered_agent(
name=result['name'],
url=args.strip(),
description=description
)
console.print(f" [dim]Saved to config for auto-registration[/dim]")
else:
console.print(f"[red]Failed: {result['error']}[/red]")
async def cmd_unregister(self, args: str) -> None:
"""Unregister an agent and remove from config"""
if not args:
console.print("Usage: /unregister <name or url>")
return
# Try to find the agent
agents = self.agent.list_agents()
agent_to_remove = None
for agent in agents:
if agent['name'].lower() == args.lower() or agent['url'] == args:
agent_to_remove = agent
break
if not agent_to_remove:
console.print(f"[yellow]Agent '{args}' not found[/yellow]")
return
# Remove from config
if self.config_manager.remove_registered_agent(name=agent_to_remove['name'], url=agent_to_remove['url']):
console.print(f"✅ Unregistered: [bold]{agent_to_remove['name']}[/bold]")
console.print(f" [dim]Removed from config (won't auto-register next time)[/dim]")
else:
console.print(f"[yellow]Agent unregistered from session but not found in config[/yellow]")
async def cmd_list(self, args: str = "") -> None:
"""List registered agents"""
agents = self.agent.list_agents()
if not agents:
console.print("No agents registered. Use /register <url>")
return
table = Table(title="Registered Agents", box=box.ROUNDED)
table.add_column("Name", style="medium_purple3")
table.add_column("URL", style="deep_sky_blue3")
table.add_column("Skills", style="plum3")
table.add_column("Description", style="dim")
for agent in agents:
desc = agent['description']
if len(desc) > 40:
desc = desc[:37] + "..."
table.add_row(
agent['name'],
agent['url'],
str(agent['skills']),
desc
)
console.print(table)
async def cmd_recall(self, args: str = "") -> None:
"""Search conversational memory (past conversations)"""
if not args:
console.print("Usage: /recall <query>")
return
await self._sync_conversational_memory()
# First try MemoryService (for ingested memories)
with safe_status(get_dynamic_status('searching')):
results = await self.agent.memory_manager.search_conversational_memory(args)
if results and results.memories:
console.print(f"[bold]Found {len(results.memories)} memories:[/bold]\n")
for i, memory in enumerate(results.memories, 1):
# MemoryEntry has 'text' field, not 'content'
text = getattr(memory, 'text', str(memory))
if len(text) > 200:
text = text[:200] + "..."
console.print(f"{i}. {text}")
else:
# If MemoryService is empty, search SQLite directly
console.print("[yellow]No memories in MemoryService, searching SQLite sessions...[/yellow]")
# Check if using DatabaseSessionService
if hasattr(self.agent.executor, 'session_service'):
service_type = type(self.agent.executor.session_service).__name__
if service_type == 'DatabaseSessionService':
# Search SQLite database directly
import sqlite3
db_path = os.getenv('SESSION_DB_PATH', './fuzzforge_sessions.db')
if os.path.exists(db_path):
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
# Search in events table
query = f"%{args}%"
cursor.execute(
"SELECT content FROM events WHERE content LIKE ? LIMIT 10",
(query,)
)
rows = cursor.fetchall()
conn.close()
if rows:
console.print(f"[green]Found {len(rows)} matches in SQLite sessions:[/green]\n")
import json
for i, (content,) in enumerate(rows, 1):
# Parse the JSON-encoded event content
try:
data = json.loads(content)
if 'parts' in data and data['parts']:
text = data['parts'][0].get('text', '')[:150]
role = data.get('role', 'unknown')
console.print(f"{i}. [{role}]: {text}...")
except Exception:
console.print(f"{i}. {content[:150]}...")
else:
console.print("[yellow]No matches found in SQLite either[/yellow]")
else:
console.print("[yellow]SQLite database not found[/yellow]")
else:
console.print(f"[dim]Using {service_type} (not searchable)[/dim]")
else:
console.print("[yellow]No session history available[/yellow]")
async def cmd_memory(self, args: str = "") -> None:
"""Inspect conversational memory and knowledge graph state."""
raw_args = (args or "").strip()
lower_args = raw_args.lower()
if not raw_args or lower_args in {"status", "info"}:
await self._show_memory_status()
return
if lower_args == "datasets":
await self._show_dataset_summary()
return
if lower_args.startswith("search ") or lower_args.startswith("recall "):
query = raw_args.split(" ", 1)[1].strip() if " " in raw_args else ""
if not query:
console.print("Usage: /memory search <query>")
return
await self.cmd_recall(query)
return
console.print("Usage: /memory [status|datasets|search <query>]")
console.print("[dim]/memory search <query> is an alias for /recall <query>[/dim]")
async def _sync_conversational_memory(self) -> None:
"""Ensure the ADK memory service ingests any completed sessions."""
memory_service = getattr(self.agent.memory_manager, "memory_service", None)
executor_sessions = getattr(self.agent.executor, "sessions", {})
metadata_map = getattr(self.agent.executor, "session_metadata", {})
if not memory_service or not executor_sessions:
return
for context_id, session in list(executor_sessions.items()):
meta = metadata_map.get(context_id, {})
if meta.get('memory_synced'):
continue
add_session = getattr(memory_service, "add_session_to_memory", None)
if not callable(add_session):
return
try:
await add_session(session)
meta['memory_synced'] = True
metadata_map[context_id] = meta
except Exception as exc: # pragma: no cover - defensive logging
if os.getenv('FUZZFORGE_DEBUG', '0') == '1':
console.print(f"[yellow]Memory sync failed:[/yellow] {exc}")
async def _show_memory_status(self) -> None:
"""Render conversational memory, session store, and knowledge graph status."""
await self._sync_conversational_memory()
status = self.agent.memory_manager.get_status()
conversational = status.get("conversational_memory", {})
conv_type = conversational.get("type", "unknown")
conv_active = "yes" if conversational.get("active") else "no"
conv_details = conversational.get("details", "")
session_service = getattr(self.agent.executor, "session_service", None)
session_service_name = type(session_service).__name__ if session_service else "Unavailable"
session_lines = [
f"[bold]Service:[/bold] {session_service_name}"
]
session_count = None
event_count = None
db_path_display = None
if session_service_name == "DatabaseSessionService":
import sqlite3
db_path = os.getenv('SESSION_DB_PATH', './fuzzforge_sessions.db')
session_path = Path(db_path).expanduser().resolve()
db_path_display = str(session_path)
if session_path.exists():
try:
with sqlite3.connect(session_path) as conn:
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM sessions")
session_count = cursor.fetchone()[0]
cursor.execute("SELECT COUNT(*) FROM events")
event_count = cursor.fetchone()[0]
except Exception as exc:
session_lines.append(f"[yellow]Warning:[/yellow] Unable to read session database ({exc})")
else:
session_lines.append("[yellow]SQLite session database not found yet[/yellow]")
elif session_service_name == "InMemorySessionService":
session_lines.append("[dim]Session data persists for the current process only[/dim]")
if db_path_display:
session_lines.append(f"[bold]Database:[/bold] {db_path_display}")
if session_count is not None:
session_lines.append(f"[bold]Sessions Recorded:[/bold] {session_count}")
if event_count is not None:
session_lines.append(f"[bold]Events Logged:[/bold] {event_count}")
conv_lines = [
f"[bold]Type:[/bold] {conv_type}",
f"[bold]Active:[/bold] {conv_active}"
]
if conv_details:
conv_lines.append(f"[bold]Details:[/bold] {conv_details}")
console.print(Panel("\n".join(conv_lines), title="Conversation Memory", border_style="medium_purple3"))
console.print(Panel("\n".join(session_lines), title="Session Store", border_style="deep_sky_blue3"))
# Knowledge graph section
knowledge = status.get("knowledge_graph", {})
kg_active = knowledge.get("active", False)
kg_lines = [
f"[bold]Active:[/bold] {'yes' if kg_active else 'no'}",
f"[bold]Purpose:[/bold] {knowledge.get('purpose', 'N/A')}"
]
cognee_data = None
cognee_error = None
try:
project_config = ProjectConfigManager()
cognee_data = project_config.get_cognee_config()
except Exception as exc: # pragma: no cover - defensive
cognee_error = str(exc)
if cognee_data:
data_dir = cognee_data.get('data_directory')
system_dir = cognee_data.get('system_directory')
if data_dir:
kg_lines.append(f"[bold]Data dir:[/bold] {data_dir}")
if system_dir:
kg_lines.append(f"[bold]System dir:[/bold] {system_dir}")
elif cognee_error:
kg_lines.append(f"[yellow]Config unavailable:[/yellow] {cognee_error}")
dataset_summary = None
if kg_active:
try:
integration = await self.agent.executor._get_knowledge_integration()
if integration:
dataset_summary = await integration.list_datasets()
except Exception as exc: # pragma: no cover - defensive
kg_lines.append(f"[yellow]Dataset listing failed:[/yellow] {exc}")
if dataset_summary:
if dataset_summary.get("error"):
kg_lines.append(f"[yellow]Dataset listing failed:[/yellow] {dataset_summary['error']}")
else:
datasets = dataset_summary.get("datasets", [])
total = dataset_summary.get("total_datasets")
if total is not None:
kg_lines.append(f"[bold]Datasets:[/bold] {total}")
if datasets:
preview = ", ".join(sorted(datasets)[:5])
if len(datasets) > 5:
preview += ", …"
kg_lines.append(f"[bold]Samples:[/bold] {preview}")
else:
kg_lines.append("[dim]Run `fuzzforge ingest` to populate the knowledge graph[/dim]")
console.print(Panel("\n".join(kg_lines), title="Knowledge Graph", border_style="spring_green4"))
console.print("\n[dim]Subcommands: /memory datasets | /memory search <query>[/dim]")
async def _show_dataset_summary(self) -> None:
"""List datasets available in the Cognee knowledge graph."""
try:
integration = await self.agent.executor._get_knowledge_integration()
except Exception as exc:
console.print(f"[yellow]Knowledge graph unavailable:[/yellow] {exc}")
return
if not integration:
console.print("[yellow]Knowledge graph is not initialised yet.[/yellow]")
console.print("[dim]Run `fuzzforge ingest --path . --recursive` to create the project dataset.[/dim]")
return
with safe_status(get_dynamic_status('searching')):
dataset_info = await integration.list_datasets()
if dataset_info.get("error"):
console.print(f"[red]{dataset_info['error']}[/red]")
return
datasets = dataset_info.get("datasets", [])
if not datasets:
console.print("[yellow]No datasets found.[/yellow]")
console.print("[dim]Run `fuzzforge ingest` to populate the knowledge graph.[/dim]")
return
table = Table(title="Cognee Datasets", box=box.ROUNDED)
table.add_column("Dataset", style="medium_purple3")
table.add_column("Notes", style="dim")
for name in sorted(datasets):
note = ""
if name.endswith("_codebase"):
note = "primary project dataset"
table.add_row(name, note)
console.print(table)
console.print(
"[dim]Use knowledge graph prompts (e.g. `search project knowledge for \"topic\" using INSIGHTS`) to query these datasets.[/dim]"
)
async def cmd_artifacts(self, args: str = "") -> None:
"""List or show artifacts"""
if args:
# Show specific artifact
artifacts = await self.agent.executor.get_artifacts(self.context_id)
for artifact in artifacts:
if artifact['id'] == args or args in artifact['id']:
console.print(Panel(
f"[bold]{artifact['title']}[/bold]\n"
f"Type: {artifact['type']} | Created: {artifact['created_at'][:19]}\n\n"
f"[code]{artifact['content']}[/code]",
title=f"Artifact: {artifact['id']}",
border_style="medium_purple3"
))
return
console.print(f"[yellow]Artifact {args} not found[/yellow]")
return
# List all artifacts
artifacts = await self.agent.executor.get_artifacts(self.context_id)
if not artifacts:
console.print("No artifacts created yet")
console.print("[dim]Artifacts are created when generating code, configs, or documents[/dim]")
return
table = Table(title="Artifacts", box=box.ROUNDED)
table.add_column("ID", style="medium_purple3")
table.add_column("Type", style="deep_sky_blue3")
table.add_column("Title", style="plum3")
table.add_column("Size", style="dim")
table.add_column("Created", style="dim")
for artifact in artifacts:
size = f"{len(artifact['content'])} chars"
created = artifact['created_at'][:19] # Just date and time
table.add_row(
artifact['id'],
artifact['type'],
artifact['title'][:40] + "..." if len(artifact['title']) > 40 else artifact['title'],
size,
created
)
console.print(table)
console.print(f"\n[dim]Use /artifacts <id> to view artifact content[/dim]")
async def cmd_tasks(self, args: str = "") -> None:
"""List tasks or show details for a specific task."""
store = getattr(self.agent.executor, "task_store", None)
if not store or not hasattr(store, "tasks"):
console.print("Task store not available")
return
task_id = args.strip()
async with store.lock:
tasks = dict(store.tasks)
if not tasks:
console.print("No tasks recorded yet")
return
if task_id:
task = tasks.get(task_id)
if not task:
console.print(f"Task '{task_id}' not found")
return
state_str = task.status.state.value if hasattr(task.status.state, "value") else str(task.status.state)
console.print(f"\n[bold]Task {task.id}[/bold]")
console.print(f"Context: {task.context_id}")
console.print(f"State: {state_str}")
console.print(f"Timestamp: {task.status.timestamp}")
if task.metadata:
console.print("Metadata:")
for key, value in task.metadata.items():
console.print(f"{key}: {value}")
if task.history:
console.print("History:")
for entry in task.history[-5:]:
text = getattr(entry, "text", None)
if not text and hasattr(entry, "parts"):
text = " ".join(
getattr(part, "text", "") for part in getattr(entry, "parts", [])
)
console.print(f" - {text}")
return
table = Table(title="FuzzForge Tasks", box=box.ROUNDED)
table.add_column("ID", style="medium_purple3")
table.add_column("State", style="white")
table.add_column("Workflow", style="deep_sky_blue3")
table.add_column("Updated", style="green")
for task in tasks.values():
state_value = task.status.state.value if hasattr(task.status.state, "value") else str(task.status.state)
workflow = ""
if task.metadata:
workflow = task.metadata.get("workflow") or task.metadata.get("workflow_name") or ""
timestamp = task.status.timestamp if task.status else ""
table.add_row(task.id, state_value, workflow, timestamp)
console.print(table)
console.print("\n[dim]Use /tasks <id> to view task details[/dim]")
async def cmd_sessions(self, args: str = "") -> None:
"""List active sessions"""
sessions = self.agent.executor.sessions
if not sessions:
console.print("No active sessions")
return
table = Table(title="Active Sessions", box=box.ROUNDED)
table.add_column("Context ID", style="medium_purple3")
table.add_column("Session ID", style="deep_sky_blue3")
table.add_column("User ID", style="plum3")
table.add_column("State", style="dim")
for context_id, session in sessions.items():
# Get session info
session_id = getattr(session, 'id', 'N/A')
user_id = getattr(session, 'user_id', 'N/A')
state = getattr(session, 'state', {})
# Format state info
agents_count = len(state.get('registered_agents', []))
state_info = f"{agents_count} agents registered"
table.add_row(
context_id[:20] + "..." if len(context_id) > 20 else context_id,
session_id[:20] + "..." if len(str(session_id)) > 20 else str(session_id),
user_id,
state_info
)
console.print(table)
console.print(f"\n[dim]Current session: {self.context_id}[/dim]")
async def cmd_skills(self, args: str = "") -> None:
"""Show FuzzForge skills"""
card = self.agent.agent_card
table = Table(title=f"{card.name} Skills", box=box.ROUNDED)
table.add_column("Skill", style="medium_purple3")
table.add_column("Description", style="white")
table.add_column("Tags", style="deep_sky_blue3")
for skill in card.skills:
table.add_row(
skill.name,
skill.description,
", ".join(skill.tags[:3])
)
console.print(table)
async def cmd_clear(self, args: str = "") -> None:
"""Clear screen"""
console.clear()
self.print_banner()
async def cmd_sendfile(self, args: str) -> None:
"""Encode a local file as an artifact and route it to a registered agent."""
tokens = shlex.split(args)
if len(tokens) < 2:
console.print("Usage: /sendfile <agent_name> <path> [message]")
return
agent_name = tokens[0]
file_arg = tokens[1]
note = " ".join(tokens[2:]).strip()
file_path = Path(file_arg).expanduser()
if not file_path.exists():
console.print(f"[red]File not found:[/red] {file_path}")
return
session = self.agent.executor.sessions.get(self.context_id)
if not session:
console.print("[red]No active session available. Try sending a prompt first.[/red]")
return
console.print(f"[dim]Delegating {file_path.name} to {agent_name}...[/dim]")
async def _delegate() -> None:
try:
response = await self.agent.executor.delegate_file_to_agent(
agent_name,
str(file_path),
note,
session=session,
context_id=self.context_id,
)
console.print(f"[{agent_name}]: {response}")
except Exception as exc:
console.print(f"[red]Failed to delegate file:[/red] {exc}")
finally:
self.background_tasks.discard(asyncio.current_task())
task = asyncio.create_task(_delegate())
self.background_tasks.add(task)
console.print("[dim]Delegation in progress… you can continue working.[/dim]")
async def cmd_quit(self, args: str = "") -> None:
"""Exit the CLI"""
console.print("\n[green]Shutting down...[/green]")
await self.agent.cleanup()
if self.background_tasks:
for task in list(self.background_tasks):
task.cancel()
await asyncio.gather(*self.background_tasks, return_exceptions=True)
console.print("Goodbye!\n")
sys.exit(0)
async def process_command(self, text: str) -> bool:
"""Process slash commands"""
if not text.startswith('/'):
return False
parts = text.split(maxsplit=1)
cmd = parts[0].lower()
args = parts[1] if len(parts) > 1 else ""
if cmd in self.commands:
await self.commands[cmd](args)
return True
console.print(f"Unknown command: {cmd}")
return True
async def auto_register_agents(self):
"""Auto-register agents from config on startup"""
agents_to_register = self.config_manager.get_registered_agents()
if agents_to_register:
console.print(f"\n[dim]Auto-registering {len(agents_to_register)} agents from config...[/dim]")
for agent_config in agents_to_register:
url = agent_config.get('url')
name = agent_config.get('name', 'Unknown')
if url:
try:
with safe_status(f"Registering {name}..."):
result = await self.agent.register_agent(url)
if result["success"]:
console.print(f"{name}: [green]Connected[/green]")
else:
console.print(f" ⚠️ {name}: [yellow]Failed - {result.get('error', 'Unknown error')}[/yellow]")
except Exception as e:
console.print(f" ⚠️ {name}: [yellow]Failed - {e}[/yellow]")
console.print("") # Empty line for spacing
async def run(self):
"""Main CLI loop"""
self.print_banner()
# Auto-register agents from config
await self.auto_register_agents()
while not shutdown_requested:
try:
# Use standard input with non-deletable colored prompt
prompt_symbol = get_prompt_symbol()
# Print colored prompt then use input() for non-deletable behavior
console.print(f"[medium_purple3]{prompt_symbol}[/medium_purple3] ", end="")
user_input = input().strip()
if not user_input:
continue
# Check for commands
if await self.process_command(user_input):
continue
# Process message
with safe_status(get_dynamic_status('thinking')):
response = await self.agent.process_message(user_input, self.context_id)
# Display response
console.print(f"\n{response}\n")
except KeyboardInterrupt:
await self.cmd_quit()
except EOFError:
await self.cmd_quit()
except Exception as e:
console.print(f"[red]Error: {e}[/red]")
if os.getenv('FUZZFORGE_DEBUG') == '1':
console.print_exception()
console.print("")
await self.agent.cleanup()
def main():
"""Main entry point"""
try:
cli = FuzzForgeCLI()
asyncio.run(cli.run())
except KeyboardInterrupt:
console.print("\n[yellow]Interrupted[/yellow]")
sys.exit(0)
except Exception as e:
console.print(f"[red]Fatal error: {e}[/red]")
if os.getenv('FUZZFORGE_DEBUG') == '1':
console.print_exception()
sys.exit(1)
if __name__ == "__main__":
main()
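The slash-command dispatch in `process_command` can be exercised on its own; a minimal sketch with a hypothetical handler table (the handler names here are illustrative, not the CLI's real command set):

```python
import asyncio

class MiniCLI:
    def __init__(self):
        # Hypothetical command table; the real CLI wires many more handlers
        self.commands = {"/help": self.cmd_help}
        self.last_output = None

    async def cmd_help(self, args: str):
        self.last_output = f"help({args!r})"

    async def process_command(self, text: str) -> bool:
        if not text.startswith("/"):
            return False  # not a command: caller falls through to chat handling
        parts = text.split(maxsplit=1)
        cmd = parts[0].lower()
        args = parts[1] if len(parts) > 1 else ""
        if cmd in self.commands:
            await self.commands[cmd](args)
        else:
            self.last_output = f"Unknown command: {cmd}"
        return True  # consumed either way

cli = MiniCLI()
assert asyncio.run(cli.process_command("hello")) is False
assert asyncio.run(cli.process_command("/HELP me")) is True
```

Returning `True` even for unrecognized commands mirrors the loop above: anything starting with `/` is swallowed by the command layer rather than forwarded to the agent.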


@@ -0,0 +1,435 @@
"""
Cognee Integration Module for FuzzForge
Provides standardized access to project-specific knowledge graphs
Can be reused by external agents and other components
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import os
import asyncio
import json
from typing import Dict, List, Any, Optional, Union
from pathlib import Path
class CogneeProjectIntegration:
"""
Standardized Cognee integration that can be reused across agents
Automatically detects project context and provides knowledge graph access
"""
def __init__(self, project_dir: Optional[str] = None):
"""
Initialize with project directory (defaults to current working directory)
Args:
project_dir: Path to project directory (optional, defaults to cwd)
"""
self.project_dir = Path(project_dir) if project_dir else Path.cwd()
self.config_file = self.project_dir / ".fuzzforge" / "config.yaml"
self.project_context = None
self._cognee = None
self._initialized = False
async def initialize(self) -> bool:
"""
Initialize Cognee with project context
Returns:
bool: True if initialization successful
"""
try:
# Import Cognee
import cognee
self._cognee = cognee
# Load project context
if not self._load_project_context():
return False
# Configure Cognee for this project
await self._setup_cognee_config()
self._initialized = True
return True
except ImportError:
print("Cognee not installed. Install with: pip install cognee")
return False
except Exception as e:
print(f"Failed to initialize Cognee: {e}")
return False
def _load_project_context(self) -> bool:
"""Load project context from FuzzForge config"""
try:
if not self.config_file.exists():
print(f"No FuzzForge config found at {self.config_file}")
return False
import yaml
with open(self.config_file, 'r') as f:
config = yaml.safe_load(f)
self.project_context = {
"project_name": config.get("project", {}).get("name", "default"),
"project_id": config.get("project", {}).get("id", "default"),
"tenant_id": config.get("cognee", {}).get("tenant", "default")
}
return True
except Exception as e:
print(f"Error loading project context: {e}")
return False
async def _setup_cognee_config(self):
"""Configure Cognee for project-specific access"""
# Set API key and model
api_key = os.getenv('OPENAI_API_KEY')
model = os.getenv('LITELLM_MODEL', 'gpt-4o-mini')
if not api_key:
raise ValueError("OPENAI_API_KEY required for Cognee operations")
# Configure Cognee
self._cognee.config.set_llm_api_key(api_key)
self._cognee.config.set_llm_model(model)
self._cognee.config.set_llm_provider("openai")
# Set project-specific directories
project_cognee_dir = self.project_dir / ".fuzzforge" / "cognee" / f"project_{self.project_context['project_id']}"
self._cognee.config.data_root_directory(str(project_cognee_dir / "data"))
self._cognee.config.system_root_directory(str(project_cognee_dir / "system"))
# Ensure directories exist
project_cognee_dir.mkdir(parents=True, exist_ok=True)
(project_cognee_dir / "data").mkdir(exist_ok=True)
(project_cognee_dir / "system").mkdir(exist_ok=True)
async def search_knowledge_graph(self, query: str, search_type: str = "GRAPH_COMPLETION", dataset: str = None) -> Dict[str, Any]:
"""
Search the project's knowledge graph
Args:
query: Search query
search_type: Type of search ("GRAPH_COMPLETION", "INSIGHTS", "CHUNKS", etc.)
dataset: Specific dataset to search (optional)
Returns:
Dict containing search results
"""
if not self._initialized:
await self.initialize()
if not self._initialized:
return {"error": "Cognee not initialized"}
try:
from cognee.modules.search.types import SearchType
# Resolve search type dynamically; fallback to GRAPH_COMPLETION
try:
search_type_enum = getattr(SearchType, search_type.upper())
except AttributeError:
search_type_enum = SearchType.GRAPH_COMPLETION
search_type = "GRAPH_COMPLETION"
# Prepare search kwargs
search_kwargs = {
"query_type": search_type_enum,
"query_text": query
}
# Add dataset filter if specified
if dataset:
search_kwargs["datasets"] = [dataset]
results = await self._cognee.search(**search_kwargs)
return {
"query": query,
"search_type": search_type,
"dataset": dataset,
"results": results,
"project": self.project_context["project_name"]
}
except Exception as e:
return {"error": f"Search failed: {e}"}
async def list_knowledge_data(self) -> Dict[str, Any]:
"""
List available data in the knowledge graph
Returns:
Dict containing available data
"""
if not self._initialized:
await self.initialize()
if not self._initialized:
return {"error": "Cognee not initialized"}
try:
data = await self._cognee.list_data()
return {
"project": self.project_context["project_name"],
"available_data": data
}
except Exception as e:
return {"error": f"Failed to list data: {e}"}
async def ingest_text_to_dataset(self, text: str, dataset: str = None) -> Dict[str, Any]:
"""
Ingest text content into a specific dataset
Args:
text: Text to ingest
dataset: Dataset name (defaults to project_name_codebase)
Returns:
Dict containing ingest results
"""
if not self._initialized:
await self.initialize()
if not self._initialized:
return {"error": "Cognee not initialized"}
if not dataset:
dataset = f"{self.project_context['project_name']}_codebase"
try:
# Add text to dataset
await self._cognee.add([text], dataset_name=dataset)
# Process (cognify) the dataset
await self._cognee.cognify([dataset])
return {
"text_length": len(text),
"dataset": dataset,
"project": self.project_context["project_name"],
"status": "success"
}
except Exception as e:
return {"error": f"Ingest failed: {e}"}
async def ingest_files_to_dataset(self, file_paths: list, dataset: str = None) -> Dict[str, Any]:
"""
Ingest multiple files into a specific dataset
Args:
file_paths: List of file paths to ingest
dataset: Dataset name (defaults to project_name_codebase)
Returns:
Dict containing ingest results
"""
if not self._initialized:
await self.initialize()
if not self._initialized:
return {"error": "Cognee not initialized"}
if not dataset:
dataset = f"{self.project_context['project_name']}_codebase"
try:
# Validate and filter readable files
valid_files = []
for file_path in file_paths:
try:
path = Path(file_path)
if path.exists() and path.is_file():
# Test if file is readable
with open(path, 'r', encoding='utf-8') as f:
f.read(1)
valid_files.append(str(path))
except (UnicodeDecodeError, PermissionError, OSError):
continue
if not valid_files:
return {"error": "No valid files found to ingest"}
# Add files to dataset
await self._cognee.add(valid_files, dataset_name=dataset)
# Process (cognify) the dataset
await self._cognee.cognify([dataset])
return {
"files_processed": len(valid_files),
"total_files_requested": len(file_paths),
"dataset": dataset,
"project": self.project_context["project_name"],
"status": "success"
}
except Exception as e:
return {"error": f"Ingest failed: {e}"}
async def list_datasets(self) -> Dict[str, Any]:
"""
List all datasets available in the project
Returns:
Dict containing available datasets
"""
if not self._initialized:
await self.initialize()
if not self._initialized:
return {"error": "Cognee not initialized"}
try:
# Get available datasets by searching for data
data = await self._cognee.list_data()
# Extract unique dataset names from the data
datasets = set()
if isinstance(data, list):
for item in data:
if isinstance(item, dict) and 'dataset_name' in item:
datasets.add(item['dataset_name'])
return {
"project": self.project_context["project_name"],
"datasets": list(datasets),
"total_datasets": len(datasets)
}
except Exception as e:
return {"error": f"Failed to list datasets: {e}"}
async def create_dataset(self, dataset: str) -> Dict[str, Any]:
"""
Create a new dataset (dataset is created automatically when data is added)
Args:
dataset: Dataset name to create
Returns:
Dict containing creation result
"""
if not self._initialized:
await self.initialize()
if not self._initialized:
return {"error": "Cognee not initialized"}
try:
# In Cognee, datasets are created implicitly when data is added,
# so we add a small placeholder entry to materialize the dataset
await self._cognee.add([f"Dataset {dataset} initialized for project {self.project_context['project_name']}"],
dataset_name=dataset)
return {
"dataset": dataset,
"project": self.project_context["project_name"],
"status": "created"
}
except Exception as e:
return {"error": f"Failed to create dataset: {e}"}
def get_project_context(self) -> Optional[Dict[str, str]]:
"""Get current project context"""
return self.project_context
def is_initialized(self) -> bool:
"""Check if Cognee is initialized"""
return self._initialized
# Convenience functions for easy integration
async def search_project_codebase(query: str, project_dir: Optional[str] = None, dataset: str = None, search_type: str = "GRAPH_COMPLETION") -> str:
"""
Convenience function to search project codebase
Args:
query: Search query
project_dir: Project directory (optional, defaults to cwd)
dataset: Specific dataset to search (optional)
search_type: Type of search ("GRAPH_COMPLETION", "INSIGHTS", "CHUNKS")
Returns:
Formatted search results as string
"""
cognee_integration = CogneeProjectIntegration(project_dir)
result = await cognee_integration.search_knowledge_graph(query, search_type, dataset)
if "error" in result:
return f"Error searching codebase: {result['error']}"
project_name = result.get("project", "Unknown")
results = result.get("results", [])
if not results:
return f"No results found for '{query}' in project {project_name}"
output = f"Search results for '{query}' in project {project_name}:\n\n"
# Format results
if isinstance(results, list):
for i, item in enumerate(results, 1):
if isinstance(item, dict):
# Handle structured results
output += f"{i}. "
if "search_result" in item:
output += f"Dataset: {item.get('dataset_name', 'Unknown')}\n"
for result_item in item["search_result"]:
if isinstance(result_item, dict):
if "name" in result_item:
output += f" - {result_item['name']}: {result_item.get('description', '')}\n"
elif "text" in result_item:
text = result_item["text"][:200] + "..." if len(result_item["text"]) > 200 else result_item["text"]
output += f" - {text}\n"
else:
output += f" - {str(result_item)[:200]}...\n"
else:
output += f"{str(item)[:200]}...\n"
output += "\n"
else:
output += f"{i}. {str(item)[:200]}...\n\n"
else:
output += f"{str(results)[:500]}..."
return output
async def list_project_knowledge(project_dir: Optional[str] = None) -> str:
"""
Convenience function to list project knowledge
Args:
project_dir: Project directory (optional, defaults to cwd)
Returns:
Formatted list of available data
"""
cognee_integration = CogneeProjectIntegration(project_dir)
result = await cognee_integration.list_knowledge_data()
if "error" in result:
return f"Error listing knowledge: {result['error']}"
project_name = result.get("project", "Unknown")
data = result.get("available_data", [])
output = f"Available knowledge in project {project_name}:\n\n"
if not data:
output += "No data available in knowledge graph"
else:
for i, item in enumerate(data, 1):
output += f"{i}. {item}\n"
return output
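The formatters above clip every result entry to roughly 200 characters before display; that rule can be factored into a small helper (a sketch, not part of the module):

```python
def truncate(text: str, limit: int = 200) -> str:
    # Same clipping rule used by search_project_codebase's formatter
    return text[:limit] + "..." if len(text) > limit else text

print(truncate("short"))         # short
print(len(truncate("x" * 250)))  # 203 (200 chars + "...")
```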


@@ -0,0 +1,416 @@
"""
Cognee Service for FuzzForge
Provides integrated Cognee functionality for codebase analysis and knowledge graphs
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import os
import asyncio
import logging
from pathlib import Path
from typing import Dict, List, Any, Optional
from datetime import datetime
logger = logging.getLogger(__name__)
class CogneeService:
"""
Service for managing Cognee integration with FuzzForge
Handles multi-tenant isolation and project-specific knowledge graphs
"""
def __init__(self, config):
"""Initialize with FuzzForge config"""
self.config = config
self.cognee_config = config.get_cognee_config()
self.project_context = config.get_project_context()
self._cognee = None
self._user = None
self._initialized = False
async def initialize(self):
"""Initialize Cognee with project-specific configuration"""
try:
# Ensure environment variables for Cognee are set before import
self.config.setup_cognee_environment()
logger.debug(
"Cognee environment configured",
extra={
"data": self.cognee_config.get("data_directory"),
"system": self.cognee_config.get("system_directory"),
},
)
import cognee
self._cognee = cognee
# Configure LLM with API key BEFORE any other cognee operations
provider = os.getenv("LLM_PROVIDER", "openai")
model = os.getenv("LLM_MODEL") or os.getenv("LITELLM_MODEL", "gpt-4o-mini")
api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY")
endpoint = os.getenv("LLM_ENDPOINT")
api_version = os.getenv("LLM_API_VERSION")
max_tokens = os.getenv("LLM_MAX_TOKENS")
if provider.lower() in {"openai", "azure_openai", "custom"} and not api_key:
raise ValueError(
"OpenAI-compatible API key is required for Cognee LLM operations. "
"Set OPENAI_API_KEY, LLM_API_KEY, or COGNEE_LLM_API_KEY in your .env"
)
# Expose environment variables for downstream libraries
os.environ["LLM_PROVIDER"] = provider
os.environ["LITELLM_MODEL"] = model
os.environ["LLM_MODEL"] = model
if api_key:
os.environ["LLM_API_KEY"] = api_key
# Maintain compatibility with components still expecting OPENAI_API_KEY
if provider.lower() in {"openai", "azure_openai", "custom"}:
os.environ.setdefault("OPENAI_API_KEY", api_key)
if endpoint:
os.environ["LLM_ENDPOINT"] = endpoint
if api_version:
os.environ["LLM_API_VERSION"] = api_version
if max_tokens:
os.environ["LLM_MAX_TOKENS"] = str(max_tokens)
# Configure Cognee's runtime using its configuration helpers when available
if hasattr(cognee.config, "set_llm_provider"):
cognee.config.set_llm_provider(provider)
if hasattr(cognee.config, "set_llm_model"):
cognee.config.set_llm_model(model)
if api_key and hasattr(cognee.config, "set_llm_api_key"):
cognee.config.set_llm_api_key(api_key)
if endpoint and hasattr(cognee.config, "set_llm_endpoint"):
cognee.config.set_llm_endpoint(endpoint)
if api_version and hasattr(cognee.config, "set_llm_api_version"):
cognee.config.set_llm_api_version(api_version)
if max_tokens and hasattr(cognee.config, "set_llm_max_tokens"):
cognee.config.set_llm_max_tokens(int(max_tokens))
# Configure graph database
cognee.config.set_graph_db_config({
"graph_database_provider": self.cognee_config.get("graph_database_provider", "kuzu"),
})
# Set data directories
data_dir = self.cognee_config.get("data_directory")
system_dir = self.cognee_config.get("system_directory")
if data_dir:
logger.debug("Setting cognee data root", extra={"path": data_dir})
cognee.config.data_root_directory(data_dir)
if system_dir:
logger.debug("Setting cognee system root", extra={"path": system_dir})
cognee.config.system_root_directory(system_dir)
# Setup multi-tenant user context
await self._setup_user_context()
self._initialized = True
logger.info(f"Cognee initialized for project {self.project_context['project_name']} "
f"with Kuzu at {system_dir}")
except ImportError:
logger.error("Cognee not installed. Install with: pip install cognee")
raise
except Exception as e:
logger.error(f"Failed to initialize Cognee: {e}")
raise
async def create_dataset(self):
"""Create dataset for this project if it doesn't exist"""
if not self._initialized:
await self.initialize()
try:
# Dataset creation is handled automatically by Cognee when adding files
# We just ensure we have the right context set up
dataset_name = f"{self.project_context['project_name']}_codebase"
logger.info(f"Dataset {dataset_name} ready for project {self.project_context['project_name']}")
return dataset_name
except Exception as e:
logger.error(f"Failed to create dataset: {e}")
raise
async def _setup_user_context(self):
"""Setup user context for multi-tenant isolation"""
try:
from cognee.modules.users.methods import create_user, get_user
# Always try fallback email first to avoid validation issues
fallback_email = f"project_{self.project_context['project_id']}@fuzzforge.example"
user_tenant = self.project_context['tenant_id']
# Try to get existing fallback user first
try:
self._user = await get_user(fallback_email)
logger.info(f"Using existing user: {fallback_email}")
return
except Exception:
# User doesn't exist, try to create fallback
pass
# Create fallback user
try:
self._user = await create_user(fallback_email, user_tenant)
logger.info(f"Created fallback user: {fallback_email} for tenant: {user_tenant}")
return
except Exception as fallback_error:
logger.warning(f"Fallback user creation failed: {fallback_error}")
self._user = None
return
except Exception as e:
logger.warning(f"Could not setup multi-tenant user context: {e}")
logger.info("Proceeding with default context")
self._user = None
def get_project_dataset_name(self, dataset_suffix: str = "codebase") -> str:
"""Get project-specific dataset name"""
return f"{self.project_context['project_name']}_{dataset_suffix}"
async def ingest_text(self, content: str, dataset: str = "fuzzforge") -> bool:
"""Ingest text content into knowledge graph"""
if not self._initialized:
await self.initialize()
try:
await self._cognee.add([content], dataset)
await self._cognee.cognify([dataset])
return True
except Exception as e:
logger.error(f"Failed to ingest text: {e}")
return False
async def ingest_files(self, file_paths: List[Path], dataset: str = "fuzzforge") -> Dict[str, Any]:
"""Ingest multiple files into knowledge graph"""
if not self._initialized:
await self.initialize()
results = {
"success": 0,
"failed": 0,
"errors": []
}
try:
ingest_paths: List[str] = []
for file_path in file_paths:
try:
with open(file_path, 'r', encoding='utf-8'):
ingest_paths.append(str(file_path))
results["success"] += 1
except (UnicodeDecodeError, PermissionError) as exc:
results["failed"] += 1
results["errors"].append(f"{file_path}: {exc}")
logger.warning("Skipping %s: %s", file_path, exc)
if ingest_paths:
await self._cognee.add(ingest_paths, dataset_name=dataset)
await self._cognee.cognify([dataset])
except Exception as e:
logger.error(f"Failed to ingest files: {e}")
results["errors"].append(f"Cognify error: {str(e)}")
return results
async def search_insights(self, query: str, dataset: str = None) -> List[str]:
"""Search for insights in the knowledge graph"""
if not self._initialized:
await self.initialize()
try:
from cognee.modules.search.types import SearchType
kwargs = {
"query_type": SearchType.INSIGHTS,
"query_text": query
}
if dataset:
kwargs["datasets"] = [dataset]
results = await self._cognee.search(**kwargs)
return results if isinstance(results, list) else []
except Exception as e:
logger.error(f"Failed to search insights: {e}")
return []
async def search_chunks(self, query: str, dataset: str = None) -> List[str]:
"""Search for relevant text chunks"""
if not self._initialized:
await self.initialize()
try:
from cognee.modules.search.types import SearchType
kwargs = {
"query_type": SearchType.CHUNKS,
"query_text": query
}
if dataset:
kwargs["datasets"] = [dataset]
results = await self._cognee.search(**kwargs)
return results if isinstance(results, list) else []
except Exception as e:
logger.error(f"Failed to search chunks: {e}")
return []
async def search_graph_completion(self, query: str) -> List[str]:
"""Search for graph completion (relationships)"""
if not self._initialized:
await self.initialize()
try:
from cognee.modules.search.types import SearchType
results = await self._cognee.search(
query_type=SearchType.GRAPH_COMPLETION,
query_text=query
)
return results if isinstance(results, list) else []
except Exception as e:
logger.error(f"Failed to search graph completion: {e}")
return []
async def get_status(self) -> Dict[str, Any]:
"""Get service status and statistics"""
status = {
"initialized": self._initialized,
"enabled": self.cognee_config.get("enabled", True),
"provider": self.cognee_config.get("graph_database_provider", "kuzu"),
"data_directory": self.cognee_config.get("data_directory"),
"system_directory": self.cognee_config.get("system_directory"),
}
if self._initialized:
try:
# Check if directories exist and get sizes
data_dir = Path(status["data_directory"])
system_dir = Path(status["system_directory"])
status.update({
"data_dir_exists": data_dir.exists(),
"system_dir_exists": system_dir.exists(),
"kuzu_db_exists": (system_dir / "kuzu_db").exists(),
"lancedb_exists": (system_dir / "lancedb").exists(),
})
except Exception as e:
status["status_error"] = str(e)
return status
async def clear_data(self, confirm: bool = False):
"""Clear all ingested data (dangerous!)"""
if not confirm:
raise ValueError("Must confirm data clearing with confirm=True")
if not self._initialized:
await self.initialize()
try:
await self._cognee.prune.prune_data()
await self._cognee.prune.prune_system(metadata=True)
logger.info("Cognee data cleared")
except Exception as e:
logger.error(f"Failed to clear data: {e}")
raise
class FuzzForgeCogneeIntegration:
"""
Main integration class for FuzzForge + Cognee
Provides high-level operations for security analysis
"""
def __init__(self, config):
self.service = CogneeService(config)
async def analyze_codebase(self, path: Path, recursive: bool = True) -> Dict[str, Any]:
"""
Analyze a codebase and extract security-relevant insights
"""
# Collect code files
from fuzzforge_ai.ingest_utils import collect_ingest_files
files = collect_ingest_files(path, recursive, None, [])
if not files:
return {"error": "No files found to analyze"}
# Ingest files
results = await self.service.ingest_files(files, "security_analysis")
if results["success"] == 0:
return {"error": "Failed to ingest any files", "details": results}
# Extract security insights
security_queries = [
"vulnerabilities security risks",
"authentication authorization",
"input validation sanitization",
"encryption cryptography",
"error handling exceptions",
"logging sensitive data"
]
insights = {}
for query in security_queries:
insight_results = await self.service.search_insights(query, "security_analysis")
if insight_results:
insights[query.replace(" ", "_")] = insight_results
return {
"files_processed": results["success"],
"files_failed": results["failed"],
"errors": results["errors"],
"security_insights": insights
}
async def query_codebase(self, query: str, search_type: str = "insights") -> List[str]:
"""Query the ingested codebase"""
if search_type == "insights":
return await self.service.search_insights(query)
elif search_type == "chunks":
return await self.service.search_chunks(query)
elif search_type == "graph":
return await self.service.search_graph_completion(query)
else:
raise ValueError(f"Unknown search type: {search_type}")
async def get_project_summary(self) -> Dict[str, Any]:
"""Get a summary of the analyzed project"""
# Search for general project insights
summary_queries = [
"project structure components",
"main functionality features",
"programming languages frameworks",
"dependencies libraries"
]
summary = {}
for query in summary_queries:
results = await self.service.search_insights(query)
if results:
summary[query.replace(" ", "_")] = results[:3] # Top 3 results
return summary
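The if/elif chain in `query_codebase` can equally be written as a dispatch table; a sketch against a stub service (the stub is hypothetical — only the three search types above are real):

```python
import asyncio

class StubService:
    # Stand-in for CogneeService; returns canned results per search type
    async def search_insights(self, q): return [f"insight:{q}"]
    async def search_chunks(self, q): return [f"chunk:{q}"]
    async def search_graph_completion(self, q): return [f"graph:{q}"]

async def query_codebase(service, query: str, search_type: str = "insights"):
    dispatch = {
        "insights": service.search_insights,
        "chunks": service.search_chunks,
        "graph": service.search_graph_completion,
    }
    if search_type not in dispatch:
        raise ValueError(f"Unknown search type: {search_type}")
    return await dispatch[search_type](query)

print(asyncio.run(query_codebase(StubService(), "input validation", "graph")))
# ['graph:input validation']
```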


@@ -0,0 +1,9 @@
# FuzzForge Registered Agents
# These agents will be automatically registered on startup
registered_agents:
# Example entries:
# - name: Calculator
# url: http://localhost:10201
# description: Mathematical calculations agent


@@ -0,0 +1,31 @@
"""Bridge module providing access to the host CLI configuration manager."""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
try:
from fuzzforge_cli.config import ProjectConfigManager as _ProjectConfigManager
except ImportError as exc: # pragma: no cover - used when CLI not available
class _ProjectConfigManager: # type: ignore[no-redef]
"""Fallback implementation that raises a helpful error."""
def __init__(self, *args, **kwargs):
raise ImportError(
"ProjectConfigManager is unavailable. Install the FuzzForge CLI "
"package or supply a compatible configuration object."
) from exc
def __getattr__(name): # pragma: no cover - defensive
raise ImportError("ProjectConfigManager unavailable") from exc
ProjectConfigManager = _ProjectConfigManager
__all__ = ["ProjectConfigManager"]
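The fallback above pairs a stub class with a module-level `__getattr__` (PEP 562) so any access fails with a helpful message instead of a bare `AttributeError`. A self-contained sketch of that hook, using a synthetic module (`bridge_demo` is a made-up name for illustration):

```python
import sys
import types

# Build a throwaway module and give it a PEP 562 module-level __getattr__
mod = types.ModuleType("bridge_demo")

def _module_getattr(name):
    raise ImportError(f"{name} unavailable: install the FuzzForge CLI package")

mod.__getattr__ = _module_getattr  # honored on any attribute miss
sys.modules["bridge_demo"] = mod

import bridge_demo
try:
    bridge_demo.ProjectConfigManager
except ImportError as exc:
    print(exc)  # ProjectConfigManager unavailable: install the FuzzForge CLI package
```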


@@ -0,0 +1,134 @@
"""
Configuration manager for FuzzForge
Handles loading and saving registered agents
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import os
import yaml
from typing import Dict, Any, List
class ConfigManager:
"""Manages FuzzForge agent registry configuration"""
def __init__(self, config_path: str = None):
"""Initialize config manager"""
if config_path:
self.config_path = config_path
else:
# Check for local .fuzzforge/agents.yaml first, then fall back to global
local_config = os.path.join(os.getcwd(), '.fuzzforge', 'agents.yaml')
global_config = os.path.join(os.path.dirname(__file__), 'config.yaml')
if os.path.exists(local_config):
self.config_path = local_config
if os.getenv("FUZZFORGE_DEBUG", "0") == "1":
print(f"[CONFIG] Using local config: {local_config}")
else:
self.config_path = global_config
if os.getenv("FUZZFORGE_DEBUG", "0") == "1":
print(f"[CONFIG] Using global config: {global_config}")
self.config = self.load_config()
def load_config(self) -> Dict[str, Any]:
"""Load configuration from YAML file"""
if not os.path.exists(self.config_path):
# Create default config if it doesn't exist
return {'registered_agents': []}
try:
with open(self.config_path, 'r') as f:
config = yaml.safe_load(f) or {}
# Ensure registered_agents is a list
if 'registered_agents' not in config or config['registered_agents'] is None:
config['registered_agents'] = []
return config
except Exception as e:
print(f"[WARNING] Failed to load config: {e}")
return {'registered_agents': []}
def save_config(self):
"""Save current configuration to file"""
try:
# Create a clean config with comments
config_content = """# FuzzForge Registered Agents
# These agents will be automatically registered on startup
"""
# Add the agents list
if self.config.get('registered_agents'):
config_content += yaml.dump({'registered_agents': self.config['registered_agents']},
default_flow_style=False, sort_keys=False)
else:
config_content += "registered_agents: []\n"
config_content += """
# Example entries:
# - name: Calculator
# url: http://localhost:10201
# description: Mathematical calculations agent
"""
with open(self.config_path, 'w') as f:
f.write(config_content)
return True
except Exception as e:
print(f"[ERROR] Failed to save config: {e}")
return False
def get_registered_agents(self) -> List[Dict[str, Any]]:
"""Get list of registered agents from config"""
return self.config.get('registered_agents', [])
def add_registered_agent(self, name: str, url: str, description: str = "") -> bool:
"""Add a new registered agent to config"""
if 'registered_agents' not in self.config:
self.config['registered_agents'] = []
# Check if agent already exists
for agent in self.config['registered_agents']:
if agent.get('url') == url:
# Update existing agent
agent['name'] = name
agent['description'] = description
return self.save_config()
# Add new agent
self.config['registered_agents'].append({
'name': name,
'url': url,
'description': description
})
return self.save_config()
def remove_registered_agent(self, name: str = None, url: str = None) -> bool:
"""Remove a registered agent from config"""
if 'registered_agents' not in self.config:
return False
original_count = len(self.config['registered_agents'])
# Filter out the agent
self.config['registered_agents'] = [
agent for agent in self.config['registered_agents']
if not ((name and agent.get('name') == name) or
(url and agent.get('url') == url))
]
if len(self.config['registered_agents']) < original_count:
return self.save_config()
return False
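`add_registered_agent` above treats the agent URL as the identity key: re-adding the same URL updates the entry in place rather than appending a duplicate. A distilled sketch of that rule:

```python
def add_agent(config: dict, name: str, url: str, description: str = "") -> None:
    agents = config.setdefault("registered_agents", [])
    for agent in agents:
        if agent.get("url") == url:
            # Same URL: update in place instead of appending a duplicate
            agent["name"] = name
            agent["description"] = description
            return
    agents.append({"name": name, "url": url, "description": description})

cfg = {}
add_agent(cfg, "Calculator", "http://localhost:10201")
add_agent(cfg, "Calculator v2", "http://localhost:10201")  # updates, no duplicate
print(len(cfg["registered_agents"]))  # 1
```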


@@ -0,0 +1,104 @@
"""Utilities for collecting files to ingest into Cognee."""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
from __future__ import annotations
import fnmatch
from pathlib import Path
from typing import Iterable, List, Optional
_DEFAULT_FILE_TYPES = [
".py",
".js",
".ts",
".java",
".cpp",
".c",
".h",
".rs",
".go",
".rb",
".php",
".cs",
".swift",
".kt",
".scala",
".clj",
".hs",
".md",
".txt",
".yaml",
".yml",
".json",
".toml",
".cfg",
".ini",
]
_DEFAULT_EXCLUDE = [
"*.pyc",
"__pycache__",
".git",
".svn",
".hg",
"node_modules",
".venv",
"venv",
".env",
"dist",
"build",
".pytest_cache",
".mypy_cache",
".tox",
"coverage",
"*.log",
"*.tmp",
]
def collect_ingest_files(
path: Path,
recursive: bool = True,
file_types: Optional[Iterable[str]] = None,
exclude: Optional[Iterable[str]] = None,
) -> List[Path]:
"""Return a list of files eligible for ingestion."""
path = path.resolve()
files: List[Path] = []
extensions = list(file_types) if file_types else list(_DEFAULT_FILE_TYPES)
exclusions = list(exclude) if exclude else []
exclusions.extend(_DEFAULT_EXCLUDE)
def should_exclude(file_path: Path) -> bool:
file_str = str(file_path)
for pattern in exclusions:
if fnmatch.fnmatch(file_str, f"*{pattern}*") or fnmatch.fnmatch(file_path.name, pattern):
return True
return False
if path.is_file():
if not should_exclude(path) and any(str(path).endswith(ext) for ext in extensions):
files.append(path)
return files
pattern = "**/*" if recursive else "*"
for file_path in path.glob(pattern):
if file_path.is_file() and not should_exclude(file_path):
if any(str(file_path).endswith(ext) for ext in extensions):
files.append(file_path)
return files
__all__ = ["collect_ingest_files"]
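The exclusion check wraps each pattern in `*…*` so a bare directory name such as `node_modules` matches anywhere in the path, while the second `fnmatch` against the file name keeps glob patterns like `*.pyc` working as written; the predicate in isolation:

```python
import fnmatch
from pathlib import Path

EXCLUSIONS = ["node_modules", "*.pyc", ".git"]

def should_exclude(file_path: Path) -> bool:
    file_str = str(file_path)
    return any(
        fnmatch.fnmatch(file_str, f"*{pattern}*")
        or fnmatch.fnmatch(file_path.name, pattern)
        for pattern in EXCLUSIONS
    )

print(should_exclude(Path("web/node_modules/lib.js")))  # True
print(should_exclude(Path("src/main.py")))              # False
```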


@@ -0,0 +1,247 @@
"""
FuzzForge Memory Service
Implements ADK MemoryService pattern for conversational memory
Separate from Cognee, which is used for RAG/codebase analysis
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import os
import json
from typing import Dict, List, Any, Optional
from datetime import datetime
import logging
# ADK Memory imports
from google.adk.memory import InMemoryMemoryService, BaseMemoryService
from google.adk.memory.base_memory_service import SearchMemoryResponse
from google.adk.memory.memory_entry import MemoryEntry
# Optional VertexAI Memory Bank
try:
from google.adk.memory import VertexAiMemoryBankService
VERTEX_AVAILABLE = True
except ImportError:
VERTEX_AVAILABLE = False
logger = logging.getLogger(__name__)
class FuzzForgeMemoryService:
"""
Manages conversational memory using ADK patterns
This is separate from Cognee which will handle RAG/codebase
"""
def __init__(self, memory_type: str = "inmemory", **kwargs):
"""
Initialize memory service
Args:
memory_type: "inmemory" or "vertexai"
**kwargs: Additional args for specific memory service
For vertexai: project, location, agent_engine_id
"""
self.memory_type = memory_type
self.service = self._create_service(memory_type, **kwargs)
def _create_service(self, memory_type: str, **kwargs) -> BaseMemoryService:
"""Create the appropriate memory service"""
if memory_type == "inmemory":
# Use ADK's InMemoryMemoryService for local development
logger.info("Using InMemory MemoryService for conversational memory")
return InMemoryMemoryService()
elif memory_type == "vertexai" and VERTEX_AVAILABLE:
# Use VertexAI Memory Bank for production
project = kwargs.get('project') or os.getenv('GOOGLE_CLOUD_PROJECT')
location = kwargs.get('location') or os.getenv('GOOGLE_CLOUD_LOCATION', 'us-central1')
agent_engine_id = kwargs.get('agent_engine_id') or os.getenv('AGENT_ENGINE_ID')
if not all([project, location, agent_engine_id]):
logger.warning("VertexAI config missing, falling back to InMemory")
return InMemoryMemoryService()
logger.info(f"Using VertexAI MemoryBank: {agent_engine_id}")
return VertexAiMemoryBankService(
project=project,
location=location,
agent_engine_id=agent_engine_id
)
else:
# Default to in-memory
logger.info("Defaulting to InMemory MemoryService")
return InMemoryMemoryService()
async def add_session_to_memory(self, session: Any) -> None:
"""
Add a completed session to long-term memory
This extracts meaningful information from the conversation
Args:
session: The session object to process
"""
try:
# Let the underlying service handle the ingestion
# It will extract relevant information based on the implementation
await self.service.add_session_to_memory(session)
logger.debug(f"Added session {session.id} to {self.memory_type} memory")
except Exception as e:
logger.error(f"Failed to add session to memory: {e}")
async def search_memory(self,
query: str,
app_name: str = "fuzzforge",
                          user_id: Optional[str] = None,
max_results: int = 10) -> SearchMemoryResponse:
"""
Search long-term memory for relevant information
Args:
query: The search query
app_name: Application name for filtering
user_id: User ID for filtering (optional)
max_results: Maximum number of results
Returns:
SearchMemoryResponse with relevant memories
"""
try:
# Search the memory service
results = await self.service.search_memory(
app_name=app_name,
user_id=user_id,
query=query
)
logger.debug(f"Memory search for '{query}' returned {len(results.memories)} results")
return results
except Exception as e:
logger.error(f"Memory search failed: {e}")
# Return empty results on error
return SearchMemoryResponse(memories=[])
async def ingest_completed_sessions(self, session_service) -> int:
"""
Batch ingest all completed sessions into memory
Useful for initial memory population
Args:
session_service: The session service containing sessions
Returns:
Number of sessions ingested
"""
ingested = 0
try:
# Get all sessions from the session service
sessions = await session_service.list_sessions(app_name="fuzzforge")
for session_info in sessions:
# Load full session
session = await session_service.load_session(
app_name="fuzzforge",
user_id=session_info.get('user_id'),
session_id=session_info.get('id')
)
if session and len(session.get_events()) > 0:
await self.add_session_to_memory(session)
ingested += 1
logger.info(f"Ingested {ingested} sessions into {self.memory_type} memory")
except Exception as e:
logger.error(f"Failed to batch ingest sessions: {e}")
return ingested
def get_status(self) -> Dict[str, Any]:
"""Get memory service status"""
return {
"type": self.memory_type,
"active": self.service is not None,
"vertex_available": VERTEX_AVAILABLE,
"details": {
"inmemory": "Non-persistent, keyword search",
"vertexai": "Persistent, semantic search with LLM extraction"
}.get(self.memory_type, "Unknown")
}
class HybridMemoryManager:
"""
Manages both ADK MemoryService (conversational) and Cognee (RAG/codebase)
Provides unified interface for both memory systems
"""
def __init__(self,
memory_service: FuzzForgeMemoryService = None,
cognee_tools = None):
"""
Initialize with both memory systems
Args:
memory_service: ADK-pattern memory for conversations
cognee_tools: Cognee MCP tools for RAG/codebase
"""
# ADK memory for conversations
self.memory_service = memory_service or FuzzForgeMemoryService()
# Cognee for knowledge graphs and RAG (future)
self.cognee_tools = cognee_tools
async def search_conversational_memory(self, query: str) -> SearchMemoryResponse:
"""Search past conversations using ADK memory"""
return await self.memory_service.search_memory(query)
async def search_knowledge_graph(self, query: str, search_type: str = "GRAPH_COMPLETION"):
"""Search Cognee knowledge graph (for RAG/codebase in future)"""
if not self.cognee_tools:
return None
try:
# Use Cognee's graph search
return await self.cognee_tools.search(
query=query,
search_type=search_type
)
except Exception as e:
logger.debug(f"Cognee search failed: {e}")
return None
async def store_in_graph(self, content: str):
"""Store in Cognee knowledge graph (for codebase analysis later)"""
if not self.cognee_tools:
return None
try:
# Use cognify to create graph structures
return await self.cognee_tools.cognify(content)
except Exception as e:
logger.debug(f"Cognee store failed: {e}")
return None
def get_status(self) -> Dict[str, Any]:
"""Get status of both memory systems"""
return {
"conversational_memory": self.memory_service.get_status(),
"knowledge_graph": {
"active": self.cognee_tools is not None,
"purpose": "RAG/codebase analysis (future)"
}
}


@@ -0,0 +1,148 @@
"""
Remote Agent Connection Handler
Handles A2A protocol communication with remote agents
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import httpx
import uuid
from typing import Dict, Any, Optional, List
class RemoteAgentConnection:
"""Handles A2A protocol communication with remote agents"""
def __init__(self, url: str):
"""Initialize connection to a remote agent"""
self.url = url.rstrip('/')
self.agent_card = None
self.client = httpx.AsyncClient(timeout=120.0)
self.context_id = None
async def get_agent_card(self) -> Optional[Dict[str, Any]]:
"""Get the agent card from the remote agent"""
try:
# Try new path first (A2A 0.3.0+)
response = await self.client.get(f"{self.url}/.well-known/agent-card.json")
response.raise_for_status()
self.agent_card = response.json()
return self.agent_card
        except Exception:
            # Fall back to the legacy path for compatibility
try:
response = await self.client.get(f"{self.url}/.well-known/agent.json")
response.raise_for_status()
self.agent_card = response.json()
return self.agent_card
except Exception as e:
print(f"Failed to get agent card from {self.url}: {e}")
return None
async def send_message(self, message: str | Dict[str, Any] | List[Dict[str, Any]]) -> str:
"""Send a message to the remote agent using A2A protocol"""
try:
parts: List[Dict[str, Any]]
metadata: Dict[str, Any] | None = None
if isinstance(message, dict):
metadata = message.get("metadata") if isinstance(message.get("metadata"), dict) else None
raw_parts = message.get("parts", [])
if not raw_parts:
text_value = message.get("text") or message.get("message")
if isinstance(text_value, str):
raw_parts = [{"type": "text", "text": text_value}]
parts = [raw_part for raw_part in raw_parts if isinstance(raw_part, dict)]
elif isinstance(message, list):
parts = [part for part in message if isinstance(part, dict)]
metadata = None
else:
parts = [{"type": "text", "text": message}]
metadata = None
if not parts:
parts = [{"type": "text", "text": ""}]
# Build JSON-RPC request per A2A spec
payload = {
"jsonrpc": "2.0",
"method": "message/send",
"params": {
"message": {
"messageId": str(uuid.uuid4()),
"role": "user",
"parts": parts,
}
},
"id": 1
}
if metadata:
payload["params"]["message"]["metadata"] = metadata
# Include context if we have one
if self.context_id:
payload["params"]["contextId"] = self.context_id
# Send to root endpoint per A2A protocol
response = await self.client.post(f"{self.url}/", json=payload)
response.raise_for_status()
result = response.json()
# Extract response based on A2A JSON-RPC format
if isinstance(result, dict):
# Update context for continuity
if "result" in result and isinstance(result["result"], dict):
if "contextId" in result["result"]:
self.context_id = result["result"]["contextId"]
# Extract text from artifacts
if "artifacts" in result["result"]:
texts = []
for artifact in result["result"]["artifacts"]:
if isinstance(artifact, dict) and "parts" in artifact:
for part in artifact["parts"]:
if isinstance(part, dict) and "text" in part:
texts.append(part["text"])
if texts:
return " ".join(texts)
# Extract from message format
if "message" in result["result"]:
msg = result["result"]["message"]
if isinstance(msg, dict) and "parts" in msg:
texts = []
for part in msg["parts"]:
if isinstance(part, dict) and "text" in part:
texts.append(part["text"])
return " ".join(texts) if texts else str(msg)
return str(msg)
return str(result["result"])
# Handle error response
elif "error" in result:
error = result["error"]
if isinstance(error, dict):
return f"Error: {error.get('message', str(error))}"
return f"Error: {error}"
# Fallback
return result.get("response", result.get("message", str(result)))
return str(result)
except Exception as e:
return f"Error communicating with agent: {e}"
async def close(self):
"""Close the connection properly"""
await self.client.aclose()

41
backend/Dockerfile Normal file

@@ -0,0 +1,41 @@
FROM python:3.11-slim
WORKDIR /app
# Install system dependencies including Docker client and rsync
RUN apt-get update && apt-get install -y \
curl \
ca-certificates \
gnupg \
lsb-release \
rsync \
&& curl -fsSL https://download.docker.com/linux/debian/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg \
&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null \
&& apt-get update \
&& apt-get install -y docker-ce-cli \
&& rm -rf /var/lib/apt/lists/*
# Docker client configuration removed - localhost:5001 doesn't require insecure registry config
# Install uv for faster package management
RUN pip install uv
# Copy project files
COPY pyproject.toml ./
COPY uv.lock ./
# Install dependencies
RUN uv sync --no-dev
# Copy source code
COPY . .
# Expose port
EXPOSE 8000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
# Start the application
CMD ["uv", "run", "uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]

257
backend/README.md Normal file

@@ -0,0 +1,257 @@
# FuzzForge Backend
A stateless API server for security testing workflow orchestration using Prefect. This system dynamically discovers workflows, executes them in isolated Docker containers with volume mounting, and returns findings in SARIF format.
## Architecture Overview
### Core Components
1. **Workflow Discovery System**: Automatically discovers workflows at startup
2. **Module System**: Reusable components (scanner, analyzer, reporter) with a common interface
3. **Prefect Integration**: Handles container orchestration, workflow execution, and monitoring
4. **Volume Mounting**: Secure file access with configurable permissions (ro/rw)
5. **SARIF Output**: Standardized security findings format
### Key Features
- **Stateless**: No persistent data, fully scalable
- **Generic**: No hardcoded workflows, automatic discovery
- **Isolated**: Each workflow runs in its own Docker container
- **Extensible**: Easy to add new workflows and modules
- **Secure**: Read-only volume mounts by default, path validation
- **Observable**: Comprehensive logging and status tracking
## Quick Start
### Prerequisites
- Docker and Docker Compose
### Installation
From the project root, start all services:
```bash
docker-compose up -d
```
This will start:
- Prefect server (API at http://localhost:4200/api)
- PostgreSQL database
- Redis cache
- Docker registry (port 5001)
- Prefect worker (for running workflows)
- FuzzForge backend API (port 8000)
- FuzzForge MCP server (port 8010)
**Note**: The Prefect UI at http://localhost:4200 is not currently accessible from the host due to the API being configured for inter-container communication. Use the REST API or MCP interface instead.
## API Endpoints
### Workflows
- `GET /workflows` - List all discovered workflows
- `GET /workflows/{name}/metadata` - Get workflow metadata and parameters
- `GET /workflows/{name}/parameters` - Get workflow parameter schema
- `GET /workflows/metadata/schema` - Get metadata.yaml schema
- `POST /workflows/{name}/submit` - Submit a workflow for execution
### Runs
- `GET /runs/{run_id}/status` - Get run status
- `GET /runs/{run_id}/findings` - Get SARIF findings from completed run
- `GET /runs/{workflow_name}/findings/{run_id}` - Alternative findings endpoint with workflow name
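
A typical client loops on the status endpoint until the run reaches a terminal state, then fetches findings. A minimal sketch using only the standard library (the `BASE_URL` and the polling interval are assumptions, not part of the API):

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed default backend address

def run_finished(status: dict) -> bool:
    """A run is terminal once it has either completed or failed."""
    return bool(status.get("is_completed") or status.get("is_failed"))

def fetch_json(url: str) -> dict:
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())

def wait_for_findings(run_id: str, poll_seconds: float = 5.0) -> dict:
    """Poll /runs/{run_id}/status until terminal, then return the findings."""
    while True:
        status = fetch_json(f"{BASE_URL}/runs/{run_id}/status")
        if run_finished(status):
            break
        time.sleep(poll_seconds)
    return fetch_json(f"{BASE_URL}/runs/{run_id}/findings")

if __name__ == "__main__":
    print(wait_for_findings("abc-123"))
```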
## Workflow Structure
Each workflow must have:
```
toolbox/workflows/{workflow_name}/
workflow.py # Prefect flow definition
metadata.yaml # Mandatory metadata (parameters, version, etc.)
Dockerfile # Optional custom container definition
requirements.txt # Optional Python dependencies
```
### Example metadata.yaml
```yaml
name: security_assessment
version: "1.0.0"
description: "Comprehensive security analysis workflow"
author: "FuzzForge Team"
category: "comprehensive"
tags:
- "security"
- "analysis"
- "comprehensive"
supported_volume_modes:
- "ro"
- "rw"
requirements:
tools:
- "file_scanner"
- "security_analyzer"
- "sarif_reporter"
resources:
memory: "512Mi"
cpu: "500m"
timeout: 1800
has_docker: true
parameters:
type: object
properties:
target_path:
type: string
default: "/workspace"
description: "Path to analyze"
volume_mode:
type: string
enum: ["ro", "rw"]
default: "ro"
description: "Volume mount mode"
scanner_config:
type: object
description: "Scanner configuration"
properties:
max_file_size:
type: integer
description: "Maximum file size to scan (bytes)"
output_schema:
type: object
properties:
sarif:
type: object
description: "SARIF-formatted security findings"
summary:
type: object
description: "Scan execution summary"
```
### Metadata Field Descriptions
- **name**: Workflow identifier (must match directory name)
- **version**: Semantic version (x.y.z format)
- **description**: Human-readable description of the workflow
- **author**: Workflow author/maintainer
- **category**: Workflow category (comprehensive, specialized, fuzzing, focused)
- **tags**: Array of descriptive tags for categorization
- **requirements.tools**: Required security tools that the workflow uses
- **requirements.resources**: Resource requirements enforced at runtime:
- `memory`: Memory limit (e.g., "512Mi", "1Gi")
- `cpu`: CPU limit (e.g., "500m" for 0.5 cores, "1" for 1 core)
- `timeout`: Maximum execution time in seconds
- **parameters**: JSON Schema object defining workflow parameters
- **output_schema**: Expected output format (typically SARIF)
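
The `parameters` block is a JSON Schema object, so defaults and type constraints can be applied to a submission before execution. A minimal, illustrative sketch (not the backend's actual validator) of how the `default` and `enum` keywords from the example above could be enforced:

```python
# Map JSON Schema type names to Python types for a basic check.
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}

def apply_parameter_defaults(schema: dict, submitted: dict) -> dict:
    """Merge submitted values over schema defaults; raise on violations."""
    merged = {}
    for name, spec in schema.get("properties", {}).items():
        if name in submitted:
            value = submitted[name]
        elif "default" in spec:
            value = spec["default"]
        else:
            continue
        expected = _JSON_TYPES.get(spec.get("type", ""))
        if expected and not isinstance(value, expected):
            raise TypeError(f"{name}: expected {spec['type']}, got {type(value).__name__}")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{name}: {value!r} not in {spec['enum']}")
        merged[name] = value
    return merged
```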
### Resource Requirements
Resource requirements defined in workflow metadata are automatically enforced. Users can override defaults when submitting workflows:
```bash
curl -X POST "http://localhost:8000/workflows/security_assessment/submit" \
-H "Content-Type: application/json" \
-d '{
"target_path": "/tmp/project",
"volume_mode": "ro",
"resource_limits": {
"memory_limit": "1Gi",
"cpu_limit": "1"
}
}'
```
Resource precedence: User limits > Workflow requirements > System defaults
## Module Development
Modules implement the `BaseModule` interface:
```python
from src.toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult
class MyModule(BaseModule):
def get_metadata(self) -> ModuleMetadata:
return ModuleMetadata(
name="my_module",
version="1.0.0",
description="Module description",
category="scanner",
...
)
async def execute(self, config: Dict, workspace: Path) -> ModuleResult:
# Module logic here
findings = [...]
return self.create_result(findings=findings)
def validate_config(self, config: Dict) -> bool:
# Validate configuration
return True
```
## Submitting a Workflow
```bash
curl -X POST "http://localhost:8000/workflows/security_assessment/submit" \
-H "Content-Type: application/json" \
-d '{
"target_path": "/home/user/project",
"volume_mode": "ro",
"parameters": {
"scanner_config": {"patterns": ["*.py"]},
"analyzer_config": {"check_secrets": true}
}
}'
```
## Getting Findings
```bash
curl "http://localhost:8000/runs/{run_id}/findings"
```
Returns SARIF-formatted findings:
```json
{
"workflow": "security_assessment",
"run_id": "abc-123",
"sarif": {
"version": "2.1.0",
"runs": [{
"tool": {...},
"results": [...]
}]
}
}
```
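
Since the `sarif` payload follows the SARIF 2.1.0 layout, results can be flattened into readable lines with a few dictionary lookups. A small sketch (field names follow the SARIF spec; the payload shape assumes the response format above):

```python
def summarize_sarif(findings: dict) -> list[str]:
    """Flatten a /runs/{run_id}/findings payload into one line per result."""
    lines = []
    for run in findings.get("sarif", {}).get("runs", []):
        for result in run.get("results", []):
            rule = result.get("ruleId", "unknown-rule")
            level = result.get("level", "note")
            text = result.get("message", {}).get("text", "")
            lines.append(f"[{level}] {rule}: {text}")
    return lines
```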
## Security Considerations
1. **Volume Mounting**: Only allowed directories can be mounted
2. **Read-Only Default**: Volumes mounted as read-only unless explicitly set
3. **Container Isolation**: Each workflow runs in an isolated container
4. **Resource Limits**: Can set CPU/memory limits via Prefect
5. **Network Isolation**: Containers use bridge networking
## Development
### Adding a New Workflow
1. Create directory: `toolbox/workflows/my_workflow/`
2. Add `workflow.py` with a Prefect flow
3. Add mandatory `metadata.yaml`
4. Restart backend: `docker-compose restart fuzzforge-backend`
### Adding a New Module
1. Create module in `toolbox/modules/{category}/`
2. Implement `BaseModule` interface
3. Use in workflows via import

122
backend/mcp-config.json Normal file

@@ -0,0 +1,122 @@
{
"name": "FuzzForge Security Testing Platform",
"description": "MCP server for FuzzForge security testing workflows via Docker Compose",
"version": "0.6.0",
"connection": {
"type": "http",
"host": "localhost",
"port": 8010,
"base_url": "http://localhost:8010",
"mcp_endpoint": "/mcp"
},
"docker_compose": {
"service": "fuzzforge-backend",
"command": "docker compose up -d",
"health_check": "http://localhost:8000/health"
},
"capabilities": {
"tools": [
{
"name": "submit_security_scan_mcp",
"description": "Submit a security scanning workflow for execution",
"parameters": {
"workflow_name": "string",
"target_path": "string",
"volume_mode": "string (ro|rw)",
"parameters": "object"
}
},
{
"name": "get_comprehensive_scan_summary",
"description": "Get a comprehensive summary of scan results with analysis",
"parameters": {
"run_id": "string"
}
}
],
"fastapi_routes": [
{
"method": "GET",
"path": "/",
"description": "Get API status and loaded workflows count"
},
{
"method": "GET",
"path": "/workflows/",
"description": "List all available security testing workflows"
},
{
"method": "POST",
"path": "/workflows/{workflow_name}/submit",
"description": "Submit a security scanning workflow for execution"
},
{
"method": "GET",
"path": "/runs/{run_id}/status",
"description": "Get the current status of a security scan run"
},
{
"method": "GET",
"path": "/runs/{run_id}/findings",
"description": "Get security findings from a completed scan"
},
{
"method": "GET",
"path": "/fuzzing/{run_id}/stats",
"description": "Get fuzzing statistics for a run"
}
]
},
"examples": {
"start_infrastructure_scan": {
"description": "Run infrastructure security scan on a project",
"steps": [
"1. Start Docker Compose: docker compose up -d",
"2. Submit scan via MCP tool: submit_security_scan_mcp",
"3. Monitor status and get results"
],
"workflow_name": "infrastructure_scan",
"target_path": "/Users/tduhamel/Documents/FuzzingLabs/fuzzforge_alpha/test_projects/infrastructure_vulnerable",
"parameters": {
"checkov_config": {
"severity": ["HIGH", "MEDIUM", "LOW"]
},
"hadolint_config": {
"severity": ["error", "warning", "info", "style"]
}
}
},
"static_analysis_scan": {
"description": "Run static analysis security scan",
"workflow_name": "static_analysis_scan",
"target_path": "/Users/tduhamel/Documents/FuzzingLabs/fuzzforge_alpha/test_projects/static_analysis_vulnerable",
"parameters": {
"bandit_config": {
"severity": ["HIGH", "MEDIUM", "LOW"]
},
"opengrep_config": {
"severity": ["HIGH", "MEDIUM", "LOW"]
}
}
},
"secret_detection_scan": {
"description": "Run secret detection scan",
"workflow_name": "secret_detection_scan",
"target_path": "/Users/tduhamel/Documents/FuzzingLabs/fuzzforge_alpha/test_projects/secret_detection_vulnerable",
"parameters": {
"trufflehog_config": {
"verified_only": false
},
"gitleaks_config": {
"no_git": true
}
}
}
},
"usage": {
"via_mcp": "Connect MCP client to http://localhost:8010/mcp after starting Docker Compose",
"via_api": "Use FastAPI endpoints directly at http://localhost:8000",
"start_system": "docker compose up -d",
"stop_system": "docker compose down"
}
}

25
backend/pyproject.toml Normal file

@@ -0,0 +1,25 @@
[project]
name = "backend"
version = "0.6.0"
description = "FuzzForge OSS backend"
authors = []
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
"fastapi>=0.116.1",
"prefect>=3.4.18",
"pydantic>=2.0.0",
"pyyaml>=6.0",
"docker>=7.0.0",
"aiofiles>=23.0.0",
"uvicorn>=0.30.0",
"aiohttp>=3.12.15",
"fastmcp",
]
[project.optional-dependencies]
dev = [
"pytest>=8.0.0",
"pytest-asyncio>=0.23.0",
"httpx>=0.27.0",
]

11
backend/src/__init__.py Normal file

@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

325
backend/src/api/fuzzing.py Normal file

@@ -0,0 +1,325 @@
"""
API endpoints for fuzzing workflow management and real-time monitoring
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import logging
from typing import List, Dict, Any
from fastapi import APIRouter, HTTPException, Depends, WebSocket, WebSocketDisconnect
from fastapi.responses import StreamingResponse
import asyncio
import json
from datetime import datetime
from src.models.findings import (
FuzzingStats,
CrashReport
)
from src.core.workflow_discovery import WorkflowDiscovery
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/fuzzing", tags=["fuzzing"])
# In-memory storage for real-time stats (in production, use Redis or similar)
fuzzing_stats: Dict[str, FuzzingStats] = {}
crash_reports: Dict[str, List[CrashReport]] = {}
active_connections: Dict[str, List[WebSocket]] = {}
def initialize_fuzzing_tracking(run_id: str, workflow_name: str):
"""
Initialize fuzzing tracking for a new run.
This function should be called when a workflow is submitted to enable
real-time monitoring and stats collection.
Args:
run_id: The run identifier
workflow_name: Name of the workflow
"""
fuzzing_stats[run_id] = FuzzingStats(
run_id=run_id,
workflow=workflow_name
)
crash_reports[run_id] = []
active_connections[run_id] = []
@router.get("/{run_id}/stats", response_model=FuzzingStats)
async def get_fuzzing_stats(run_id: str) -> FuzzingStats:
"""
Get current fuzzing statistics for a run.
Args:
run_id: The fuzzing run ID
Returns:
Current fuzzing statistics
Raises:
HTTPException: 404 if run not found
"""
if run_id not in fuzzing_stats:
raise HTTPException(
status_code=404,
detail=f"Fuzzing run not found: {run_id}"
)
return fuzzing_stats[run_id]
@router.get("/{run_id}/crashes", response_model=List[CrashReport])
async def get_crash_reports(run_id: str) -> List[CrashReport]:
"""
Get crash reports for a fuzzing run.
Args:
run_id: The fuzzing run ID
Returns:
List of crash reports
Raises:
HTTPException: 404 if run not found
"""
if run_id not in crash_reports:
raise HTTPException(
status_code=404,
detail=f"Fuzzing run not found: {run_id}"
)
return crash_reports[run_id]
@router.post("/{run_id}/stats")
async def update_fuzzing_stats(run_id: str, stats: FuzzingStats):
"""
Update fuzzing statistics (called by fuzzing workflows).
Args:
run_id: The fuzzing run ID
stats: Updated statistics
Raises:
HTTPException: 404 if run not found
"""
if run_id not in fuzzing_stats:
raise HTTPException(
status_code=404,
detail=f"Fuzzing run not found: {run_id}"
)
# Update stats
fuzzing_stats[run_id] = stats
# Debug: log reception for live instrumentation
try:
logger.info(
"Received fuzzing stats update: run_id=%s exec=%s eps=%.2f crashes=%s corpus=%s elapsed=%ss",
run_id,
stats.executions,
stats.executions_per_sec,
stats.crashes,
stats.corpus_size,
stats.elapsed_time,
)
except Exception:
pass
# Notify connected WebSocket clients
if run_id in active_connections:
message = {
"type": "stats_update",
"data": stats.model_dump()
}
for websocket in active_connections[run_id][:]: # Copy to avoid modification during iteration
try:
await websocket.send_text(json.dumps(message))
except Exception:
# Remove disconnected clients
active_connections[run_id].remove(websocket)
@router.post("/{run_id}/crash")
async def report_crash(run_id: str, crash: CrashReport):
"""
Report a new crash (called by fuzzing workflows).
Args:
run_id: The fuzzing run ID
crash: Crash report details
"""
if run_id not in crash_reports:
crash_reports[run_id] = []
# Add crash report
crash_reports[run_id].append(crash)
# Update stats
if run_id in fuzzing_stats:
fuzzing_stats[run_id].crashes += 1
fuzzing_stats[run_id].last_crash_time = crash.timestamp
# Notify connected WebSocket clients
if run_id in active_connections:
message = {
"type": "crash_report",
"data": crash.model_dump()
}
for websocket in active_connections[run_id][:]:
try:
await websocket.send_text(json.dumps(message))
except Exception:
active_connections[run_id].remove(websocket)
@router.websocket("/{run_id}/live")
async def websocket_endpoint(websocket: WebSocket, run_id: str):
"""
WebSocket endpoint for real-time fuzzing updates.
Args:
websocket: WebSocket connection
run_id: The fuzzing run ID to monitor
"""
await websocket.accept()
# Initialize connection tracking
if run_id not in active_connections:
active_connections[run_id] = []
active_connections[run_id].append(websocket)
try:
# Send current stats on connection
if run_id in fuzzing_stats:
current = fuzzing_stats[run_id]
if isinstance(current, dict):
payload = current
elif hasattr(current, "model_dump"):
payload = current.model_dump()
elif hasattr(current, "dict"):
payload = current.dict()
else:
payload = getattr(current, "__dict__", {"run_id": run_id})
message = {"type": "stats_update", "data": payload}
await websocket.send_text(json.dumps(message))
# Keep connection alive
while True:
try:
# Wait for ping or handle disconnect
data = await asyncio.wait_for(websocket.receive_text(), timeout=30.0)
# Echo back for ping-pong
if data == "ping":
await websocket.send_text("pong")
except asyncio.TimeoutError:
# Send periodic heartbeat
await websocket.send_text(json.dumps({"type": "heartbeat"}))
except WebSocketDisconnect:
# Clean up connection
if run_id in active_connections and websocket in active_connections[run_id]:
active_connections[run_id].remove(websocket)
except Exception as e:
logger.error(f"WebSocket error for run {run_id}: {e}")
if run_id in active_connections and websocket in active_connections[run_id]:
active_connections[run_id].remove(websocket)
@router.get("/{run_id}/stream")
async def stream_fuzzing_updates(run_id: str):
"""
Server-Sent Events endpoint for real-time fuzzing updates.
Args:
run_id: The fuzzing run ID to monitor
Returns:
Streaming response with real-time updates
"""
if run_id not in fuzzing_stats:
raise HTTPException(
status_code=404,
detail=f"Fuzzing run not found: {run_id}"
)
async def event_stream():
"""Generate server-sent events for fuzzing updates"""
last_stats_time = datetime.utcnow()
while True:
try:
# Send current stats
if run_id in fuzzing_stats:
current_stats = fuzzing_stats[run_id]
if isinstance(current_stats, dict):
stats_payload = current_stats
elif hasattr(current_stats, "model_dump"):
stats_payload = current_stats.model_dump()
elif hasattr(current_stats, "dict"):
stats_payload = current_stats.dict()
else:
stats_payload = getattr(current_stats, "__dict__", {"run_id": run_id})
event_data = f"data: {json.dumps({'type': 'stats', 'data': stats_payload})}\n\n"
yield event_data
# Send recent crashes
if run_id in crash_reports:
recent_crashes = [
crash for crash in crash_reports[run_id]
if crash.timestamp > last_stats_time
]
for crash in recent_crashes:
event_data = f"data: {json.dumps({'type': 'crash', 'data': crash.model_dump()})}\n\n"
yield event_data
last_stats_time = datetime.utcnow()
await asyncio.sleep(5) # Update every 5 seconds
except Exception as e:
logger.error(f"Error in event stream for run {run_id}: {e}")
break
return StreamingResponse(
event_stream(),
media_type="text/event-stream",
headers={
"Cache-Control": "no-cache",
"Connection": "keep-alive",
}
)
@router.delete("/{run_id}")
async def cleanup_fuzzing_run(run_id: str):
"""
Clean up fuzzing run data.
Args:
run_id: The fuzzing run ID to clean up
"""
# Clean up tracking data
fuzzing_stats.pop(run_id, None)
crash_reports.pop(run_id, None)
# Close any active WebSocket connections
if run_id in active_connections:
for websocket in active_connections[run_id]:
try:
await websocket.close()
except Exception:
pass
del active_connections[run_id]
return {"message": f"Cleaned up fuzzing run {run_id}"}

184
backend/src/api/runs.py Normal file

@@ -0,0 +1,184 @@
"""
API endpoints for workflow run management and findings retrieval
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import logging
from typing import Dict, Any
from fastapi import APIRouter, HTTPException, Depends
from src.models.findings import WorkflowFindings, WorkflowStatus
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/runs", tags=["runs"])
def get_prefect_manager():
"""Dependency to get the Prefect manager instance"""
from src.main import prefect_mgr
return prefect_mgr
@router.get("/{run_id}/status", response_model=WorkflowStatus)
async def get_run_status(
run_id: str,
prefect_mgr=Depends(get_prefect_manager)
) -> WorkflowStatus:
"""
Get the current status of a workflow run.
Args:
run_id: The flow run ID
Returns:
Status information including state, timestamps, and completion flags
Raises:
HTTPException: 404 if run not found
"""
try:
status = await prefect_mgr.get_flow_run_status(run_id)
# Find workflow name from deployment
workflow_name = "unknown"
workflow_deployment_id = status.get("workflow", "")
for name, deployment_id in prefect_mgr.deployments.items():
if str(deployment_id) == str(workflow_deployment_id):
workflow_name = name
break
return WorkflowStatus(
run_id=status["run_id"],
workflow=workflow_name,
status=status["status"],
is_completed=status["is_completed"],
is_failed=status["is_failed"],
is_running=status["is_running"],
created_at=status["created_at"],
updated_at=status["updated_at"]
)
except Exception as e:
logger.error(f"Failed to get status for run {run_id}: {e}")
raise HTTPException(
status_code=404,
detail=f"Run not found: {run_id}"
)
@router.get("/{run_id}/findings", response_model=WorkflowFindings)
async def get_run_findings(
run_id: str,
prefect_mgr=Depends(get_prefect_manager)
) -> WorkflowFindings:
"""
Get the findings from a completed workflow run.
Args:
run_id: The flow run ID
Returns:
SARIF-formatted findings from the workflow execution
Raises:
HTTPException: 404 if run not found, 400 if run not completed
"""
try:
# Get run status first
status = await prefect_mgr.get_flow_run_status(run_id)
if not status["is_completed"]:
if status["is_running"]:
raise HTTPException(
status_code=400,
detail=f"Run {run_id} is still running. Current status: {status['status']}"
)
elif status["is_failed"]:
raise HTTPException(
status_code=400,
detail=f"Run {run_id} failed. Status: {status['status']}"
)
else:
raise HTTPException(
status_code=400,
detail=f"Run {run_id} not completed. Status: {status['status']}"
)
# Get the findings
findings = await prefect_mgr.get_flow_run_findings(run_id)
# Find workflow name
workflow_name = "unknown"
workflow_deployment_id = status.get("workflow", "")
for name, deployment_id in prefect_mgr.deployments.items():
if str(deployment_id) == str(workflow_deployment_id):
workflow_name = name
break
# Get workflow version if available
metadata = {
"completion_time": status["updated_at"],
"workflow_version": "unknown"
}
if workflow_name in prefect_mgr.workflows:
workflow_info = prefect_mgr.workflows[workflow_name]
metadata["workflow_version"] = workflow_info.metadata.get("version", "unknown")
return WorkflowFindings(
workflow=workflow_name,
run_id=run_id,
sarif=findings,
metadata=metadata
)
except HTTPException:
raise
except Exception as e:
logger.error(f"Failed to get findings for run {run_id}: {e}")
raise HTTPException(
status_code=500,
detail=f"Failed to retrieve findings: {str(e)}"
)
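Because findings are only available once a run is terminal, clients typically poll the status endpoint first. A minimal, transport-agnostic poll loop sketch (the helper name is ours; it only assumes the `is_completed`/`is_failed` flags documented above):

```python
import time

def poll_until_terminal(fetch_status, interval: float = 2.0, max_attempts: int = 100) -> dict:
    """Poll a status-returning callable until the run reports a
    terminal state, mirroring the is_completed/is_failed flags
    returned by GET /runs/{run_id}/status."""
    for _ in range(max_attempts):
        status = fetch_status()
        if status.get("is_completed") or status.get("is_failed"):
            return status
        time.sleep(interval)
    raise TimeoutError("run did not reach a terminal state")
```

`fetch_status` would wrap the actual HTTP call; injecting it keeps the loop testable without a live server.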
@router.get("/{workflow_name}/findings/{run_id}", response_model=WorkflowFindings)
async def get_workflow_findings(
workflow_name: str,
run_id: str,
prefect_mgr=Depends(get_prefect_manager)
) -> WorkflowFindings:
"""
Get findings for a specific workflow run.
Alternative endpoint that includes workflow name in the path for clarity.
Args:
workflow_name: Name of the workflow
run_id: The flow run ID
Returns:
SARIF-formatted findings from the workflow execution
Raises:
HTTPException: 404 if workflow or run not found, 400 if run not completed
"""
if workflow_name not in prefect_mgr.workflows:
raise HTTPException(
status_code=404,
detail=f"Workflow not found: {workflow_name}"
)
# Delegate to the main findings endpoint
return await get_run_findings(run_id, prefect_mgr)


@@ -0,0 +1,386 @@
"""
API endpoints for workflow management with enhanced error handling
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import logging
import traceback
from typing import List, Dict, Any, Optional
from fastapi import APIRouter, HTTPException, Depends
from pathlib import Path
from src.models.findings import (
WorkflowSubmission,
WorkflowMetadata,
WorkflowListItem,
RunSubmissionResponse
)
from src.core.workflow_discovery import WorkflowDiscovery
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/workflows", tags=["workflows"])
def create_structured_error_response(
error_type: str,
message: str,
workflow_name: Optional[str] = None,
run_id: Optional[str] = None,
container_info: Optional[Dict[str, Any]] = None,
deployment_info: Optional[Dict[str, Any]] = None,
suggestions: Optional[List[str]] = None
) -> Dict[str, Any]:
"""Create a structured error response with rich context."""
error_response = {
"error": {
"type": error_type,
"message": message,
"timestamp": __import__("datetime").datetime.utcnow().isoformat() + "Z"
}
}
if workflow_name:
error_response["error"]["workflow_name"] = workflow_name
if run_id:
error_response["error"]["run_id"] = run_id
if container_info:
error_response["error"]["container"] = container_info
if deployment_info:
error_response["error"]["deployment"] = deployment_info
if suggestions:
error_response["error"]["suggestions"] = suggestions
return error_response
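The resulting envelope always nests under a single `error` key, with optional context fields present only when supplied. A trimmed re-sketch (omitting the timestamp) makes the shape easy to verify:

```python
def build_error_envelope(error_type: str, message: str, **context) -> dict:
    """Minimal re-sketch of create_structured_error_response: a
    single 'error' object carrying type/message plus any optional
    context keys whose values are not None."""
    envelope = {"error": {"type": error_type, "message": message}}
    for key, value in context.items():
        if value is not None:
            envelope["error"][key] = value
    return envelope
```

Clients can therefore branch on `response["error"]["type"]` and surface `suggestions` verbatim.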
def get_prefect_manager():
"""Dependency to get the Prefect manager instance"""
from src.main import prefect_mgr
return prefect_mgr
@router.get("/", response_model=List[WorkflowListItem])
async def list_workflows(
prefect_mgr=Depends(get_prefect_manager)
) -> List[WorkflowListItem]:
"""
List all discovered workflows with their metadata.
Returns a summary of each workflow including name, version, description,
author, and tags.
"""
workflows = []
for name, info in prefect_mgr.workflows.items():
workflows.append(WorkflowListItem(
name=name,
version=info.metadata.get("version", "0.6.0"),
description=info.metadata.get("description", ""),
author=info.metadata.get("author"),
tags=info.metadata.get("tags", [])
))
return workflows
@router.get("/metadata/schema")
async def get_metadata_schema() -> Dict[str, Any]:
"""
Get the JSON schema for workflow metadata files.
This schema defines the structure and requirements for metadata.yaml files
that must accompany each workflow.
"""
return WorkflowDiscovery.get_metadata_schema()
@router.get("/{workflow_name}/metadata", response_model=WorkflowMetadata)
async def get_workflow_metadata(
workflow_name: str,
prefect_mgr=Depends(get_prefect_manager)
) -> WorkflowMetadata:
"""
Get complete metadata for a specific workflow.
Args:
workflow_name: Name of the workflow
Returns:
Complete metadata including parameters schema, supported volume modes,
required modules, and more.
Raises:
HTTPException: 404 if workflow not found
"""
if workflow_name not in prefect_mgr.workflows:
available_workflows = list(prefect_mgr.workflows.keys())
error_response = create_structured_error_response(
error_type="WorkflowNotFound",
message=f"Workflow '{workflow_name}' not found",
workflow_name=workflow_name,
suggestions=[
f"Available workflows: {', '.join(available_workflows)}",
"Use GET /workflows/ to see all available workflows",
"Check workflow name spelling and case sensitivity"
]
)
raise HTTPException(
status_code=404,
detail=error_response
)
info = prefect_mgr.workflows[workflow_name]
metadata = info.metadata
return WorkflowMetadata(
name=workflow_name,
version=metadata.get("version", "0.6.0"),
description=metadata.get("description", ""),
author=metadata.get("author"),
tags=metadata.get("tags", []),
parameters=metadata.get("parameters", {}),
default_parameters=metadata.get("default_parameters", {}),
required_modules=metadata.get("required_modules", []),
supported_volume_modes=metadata.get("supported_volume_modes", ["ro", "rw"]),
has_custom_docker=info.has_docker
)
@router.post("/{workflow_name}/submit", response_model=RunSubmissionResponse)
async def submit_workflow(
workflow_name: str,
submission: WorkflowSubmission,
prefect_mgr=Depends(get_prefect_manager)
) -> RunSubmissionResponse:
"""
Submit a workflow for execution with volume mounting.
Args:
workflow_name: Name of the workflow to execute
submission: Submission parameters including target path and volume mode
Returns:
Run submission response with run_id and initial status
Raises:
HTTPException: 404 if workflow not found, 400 for invalid parameters
"""
if workflow_name not in prefect_mgr.workflows:
available_workflows = list(prefect_mgr.workflows.keys())
error_response = create_structured_error_response(
error_type="WorkflowNotFound",
message=f"Workflow '{workflow_name}' not found",
workflow_name=workflow_name,
suggestions=[
f"Available workflows: {', '.join(available_workflows)}",
"Use GET /workflows/ to see all available workflows",
"Check workflow name spelling and case sensitivity"
]
)
raise HTTPException(
status_code=404,
detail=error_response
)
try:
# Convert ResourceLimits to dict if provided
resource_limits_dict = None
if submission.resource_limits:
resource_limits_dict = {
"cpu_limit": submission.resource_limits.cpu_limit,
"memory_limit": submission.resource_limits.memory_limit,
"cpu_request": submission.resource_limits.cpu_request,
"memory_request": submission.resource_limits.memory_request
}
# Submit the workflow with enhanced parameters
flow_run = await prefect_mgr.submit_workflow(
workflow_name=workflow_name,
target_path=submission.target_path,
volume_mode=submission.volume_mode,
parameters=submission.parameters,
resource_limits=resource_limits_dict,
additional_volumes=submission.additional_volumes,
timeout=submission.timeout
)
run_id = str(flow_run.id)
# Initialize fuzzing tracking if this looks like a fuzzing workflow
workflow_info = prefect_mgr.workflows[workflow_name]
workflow_tags = workflow_info.metadata.get("tags", [])
if "fuzzing" in workflow_tags or "fuzz" in workflow_name.lower():
from src.api.fuzzing import initialize_fuzzing_tracking
initialize_fuzzing_tracking(run_id, workflow_name)
return RunSubmissionResponse(
run_id=run_id,
status=flow_run.state.name if flow_run.state else "PENDING",
workflow=workflow_name,
message=f"Workflow '{workflow_name}' submitted successfully"
)
except ValueError as e:
# Parameter validation errors
error_response = create_structured_error_response(
error_type="ValidationError",
message=str(e),
workflow_name=workflow_name,
suggestions=[
"Check parameter types and values",
"Use GET /workflows/{workflow_name}/parameters for schema",
"Ensure all required parameters are provided"
]
)
raise HTTPException(status_code=400, detail=error_response)
except Exception as e:
logger.error(f"Failed to submit workflow '{workflow_name}': {e}")
logger.error(f"Traceback: {traceback.format_exc()}")
# Try to get more context about the error
container_info = None
deployment_info = None
suggestions = []
error_message = str(e)
error_type = "WorkflowSubmissionError"
# Detect specific error patterns
if "deployment" in error_message.lower():
error_type = "DeploymentError"
deployment_info = {
"status": "failed",
"error": error_message
}
suggestions.extend([
"Check if Prefect server is running and accessible",
"Verify Docker is running and has sufficient resources",
"Check container image availability",
"Ensure volume paths exist and are accessible"
])
elif "volume" in error_message.lower() or "mount" in error_message.lower():
error_type = "VolumeError"
suggestions.extend([
"Check if the target path exists and is accessible",
"Verify file permissions (Docker needs read access)",
"Ensure the path is not in use by another process",
"Try using an absolute path instead of relative path"
])
elif "memory" in error_message.lower() or "resource" in error_message.lower():
error_type = "ResourceError"
suggestions.extend([
"Check system memory and CPU availability",
"Consider reducing resource limits or dataset size",
"Monitor Docker resource usage",
"Increase Docker memory limits if needed"
])
elif "image" in error_message.lower():
error_type = "ImageError"
suggestions.extend([
"Check if the workflow image exists",
"Verify Docker registry access",
"Try rebuilding the workflow image",
"Check network connectivity to registries"
])
else:
suggestions.extend([
"Check FuzzForge backend logs for details",
"Verify all services are running (docker-compose up -d)",
"Try restarting the workflow deployment",
"Contact support if the issue persists"
])
error_response = create_structured_error_response(
error_type=error_type,
message=f"Failed to submit workflow: {error_message}",
workflow_name=workflow_name,
container_info=container_info,
deployment_info=deployment_info,
suggestions=suggestions
)
raise HTTPException(
status_code=500,
detail=error_response
)
@router.get("/{workflow_name}/parameters")
async def get_workflow_parameters(
workflow_name: str,
prefect_mgr=Depends(get_prefect_manager)
) -> Dict[str, Any]:
"""
Get the parameters schema for a workflow.
Args:
workflow_name: Name of the workflow
Returns:
Parameters schema with types, descriptions, and defaults
Raises:
HTTPException: 404 if workflow not found
"""
if workflow_name not in prefect_mgr.workflows:
available_workflows = list(prefect_mgr.workflows.keys())
error_response = create_structured_error_response(
error_type="WorkflowNotFound",
message=f"Workflow '{workflow_name}' not found",
workflow_name=workflow_name,
suggestions=[
f"Available workflows: {', '.join(available_workflows)}",
"Use GET /workflows/ to see all available workflows"
]
)
raise HTTPException(
status_code=404,
detail=error_response
)
info = prefect_mgr.workflows[workflow_name]
metadata = info.metadata
# Return parameters with enhanced schema information
parameters_schema = metadata.get("parameters", {})
# Extract the actual parameter definitions from JSON schema structure
if "properties" in parameters_schema:
param_definitions = parameters_schema["properties"]
else:
param_definitions = parameters_schema
# Add default values to the schema
default_params = metadata.get("default_parameters", {})
for param_name, param_schema in param_definitions.items():
if isinstance(param_schema, dict) and param_name in default_params:
param_schema["default"] = default_params[param_name]
return {
"workflow": workflow_name,
"parameters": param_definitions,
"default_parameters": default_params,
"required_parameters": [
name for name, schema in param_definitions.items()
if isinstance(schema, dict) and schema.get("required", False)
]
}
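The schema-flattening logic above (unwrap JSON-schema `properties`, then fold `default_parameters` into each definition) can be sketched as a standalone function for illustration:

```python
def merge_parameter_schema(metadata: dict) -> dict:
    """Sketch of the parameter-schema flattening done by the
    /parameters endpoint: unwrap a JSON-schema 'properties' wrapper
    if present, then attach defaults to each definition."""
    schema = metadata.get("parameters", {})
    definitions = schema.get("properties", schema)
    defaults = metadata.get("default_parameters", {})
    for name, definition in definitions.items():
        if isinstance(definition, dict) and name in defaults:
            definition["default"] = defaults[name]
    return definitions
```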


@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


@@ -0,0 +1,770 @@
"""
Prefect Manager - Core orchestration for workflow deployment and execution
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import logging
import os
import platform
import re
from pathlib import Path
from typing import Dict, Optional, Any
from prefect import get_client
from prefect.docker import DockerImage
from prefect.client.schemas import FlowRun
from src.core.workflow_discovery import WorkflowDiscovery, WorkflowInfo
logger = logging.getLogger(__name__)
def get_registry_url(context: str = "default") -> str:
"""
Get the container registry URL to use for a given operation context.
Goals:
- Work reliably across Linux and macOS Docker Desktop
- Prefer in-network service discovery when running inside containers
- Allow full override via env vars from docker-compose
Env overrides:
- FUZZFORGE_REGISTRY_PUSH_URL: used for image builds/pushes
- FUZZFORGE_REGISTRY_PULL_URL: used for workers to pull images
"""
# Normalize context
ctx = (context or "default").lower()
# Always honor explicit overrides first
if ctx in ("push", "build"):
push_url = os.getenv("FUZZFORGE_REGISTRY_PUSH_URL")
if push_url:
logger.debug("Using FUZZFORGE_REGISTRY_PUSH_URL: %s", push_url)
return push_url
# Default to host-published registry for Docker daemon operations
return "localhost:5001"
if ctx == "pull":
pull_url = os.getenv("FUZZFORGE_REGISTRY_PULL_URL")
if pull_url:
logger.debug("Using FUZZFORGE_REGISTRY_PULL_URL: %s", pull_url)
return pull_url
# Prefect worker pulls via host Docker daemon as well
return "localhost:5001"
# Default/fallback
return os.getenv("FUZZFORGE_REGISTRY_PULL_URL", os.getenv("FUZZFORGE_REGISTRY_PUSH_URL", "localhost:5001"))
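The precedence described in the docstring (explicit per-context env override first, then the host-published `localhost:5001` default) can be condensed into a small sketch:

```python
import os

def resolve_registry_url(context: str = "default") -> str:
    """Condensed sketch of get_registry_url's precedence: honor the
    context-specific env override if set, otherwise fall back to the
    host-published registry at localhost:5001."""
    ctx = (context or "default").lower()
    push = os.getenv("FUZZFORGE_REGISTRY_PUSH_URL")
    pull = os.getenv("FUZZFORGE_REGISTRY_PULL_URL")
    if ctx in ("push", "build"):
        return push or "localhost:5001"
    if ctx == "pull":
        return pull or "localhost:5001"
    return pull or push or "localhost:5001"
```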
def _compose_project_name(default: str = "fuzzforge") -> str:
"""Return the docker-compose project name used for network/volume naming.
Always returns 'fuzzforge' regardless of environment variables.
"""
return "fuzzforge"
class PrefectManager:
"""
Manages Prefect deployments and flow runs for discovered workflows.
This class handles:
- Workflow discovery and registration
- Docker image building through Prefect
- Deployment creation and management
- Flow run submission with volume mounting
- Findings retrieval from completed runs
"""
def __init__(self, workflows_dir: Path = None):
"""
Initialize the Prefect manager.
Args:
workflows_dir: Path to the workflows directory (default: toolbox/workflows)
"""
if workflows_dir is None:
workflows_dir = Path("toolbox/workflows")
self.discovery = WorkflowDiscovery(workflows_dir)
self.workflows: Dict[str, WorkflowInfo] = {}
self.deployments: Dict[str, str] = {} # workflow_name -> deployment_id
# Security: Define allowed and forbidden paths for host mounting
self.allowed_base_paths = [
"/tmp",
"/home",
"/Users", # macOS users
"/opt",
"/var/tmp",
"/workspace", # Common container workspace
"/app" # Container application directory (for test projects)
]
self.forbidden_paths = [
"/etc",
"/root",
"/var/run",
"/sys",
"/proc",
"/dev",
"/boot",
"/var/lib/docker", # Critical Docker data
"/var/log", # System logs
"/usr/bin", # System binaries
"/usr/sbin",
"/sbin",
"/bin"
]
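The `_validate_target_path` implementation is not shown in this excerpt, but a hypothetical check in the spirit of the allow/deny lists above would reject any path under a forbidden prefix and then require an allowed base (function name and exact semantics are assumptions):

```python
from pathlib import PurePosixPath

ALLOWED_BASES = ["/tmp", "/home", "/Users", "/opt", "/var/tmp", "/workspace", "/app"]
FORBIDDEN = ["/etc", "/root", "/var/run", "/sys", "/proc", "/dev", "/boot",
             "/var/lib/docker", "/var/log", "/usr/bin", "/usr/sbin", "/sbin", "/bin"]

def is_mountable(path: str) -> bool:
    """Hypothetical sketch: deny-list prefixes win, then the path
    must fall under one of the allowed base directories."""
    p = PurePosixPath(path)
    if any(p == PurePosixPath(f) or PurePosixPath(f) in p.parents for f in FORBIDDEN):
        return False
    return any(p == PurePosixPath(a) or PurePosixPath(a) in p.parents for a in ALLOWED_BASES)
```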
@staticmethod
def _parse_memory_to_bytes(memory_str: str) -> int:
"""
Parse memory string (like '512Mi', '1Gi') to bytes.
Args:
memory_str: Memory string with unit suffix
Returns:
Memory in bytes
Raises:
ValueError: If format is invalid
"""
if not memory_str:
return 0
match = re.match(r'^(\d+(?:\.\d+)?)\s*([GMK]i?)$', memory_str.strip())
if not match:
raise ValueError(f"Invalid memory format: {memory_str}. Expected format like '512Mi', '1Gi'")
value, unit = match.groups()
value = float(value)
# Convert to bytes based on unit (binary units: Ki, Mi, Gi)
if unit in ['K', 'Ki']:
multiplier = 1024
elif unit in ['M', 'Mi']:
multiplier = 1024 * 1024
elif unit in ['G', 'Gi']:
multiplier = 1024 * 1024 * 1024
else:
raise ValueError(f"Unsupported memory unit: {unit}")
return int(value * multiplier)
@staticmethod
def _parse_cpu_to_millicores(cpu_str: str) -> int:
"""
Parse CPU string (like '500m', '1', '2.5') to millicores.
Args:
cpu_str: CPU string
Returns:
CPU in millicores (1 core = 1000 millicores)
Raises:
ValueError: If format is invalid
"""
if not cpu_str:
return 0
cpu_str = cpu_str.strip()
# Handle millicores format (e.g., '500m')
if cpu_str.endswith('m'):
try:
return int(cpu_str[:-1])
except ValueError:
raise ValueError(f"Invalid CPU format: {cpu_str}")
# Handle core format (e.g., '1', '2.5')
try:
cores = float(cpu_str)
return int(cores * 1000) # Convert to millicores
except ValueError:
raise ValueError(f"Invalid CPU format: {cpu_str}")
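Standalone copies of the two parsers above show the accepted formats and conversions (binary memory units Ki/Mi/Gi with K/M/G as aliases; CPU in millicores or fractional cores):

```python
import re

def parse_memory_to_bytes(memory_str: str) -> int:
    """Standalone copy of _parse_memory_to_bytes for illustration."""
    if not memory_str:
        return 0
    match = re.match(r'^(\d+(?:\.\d+)?)\s*([GMK]i?)$', memory_str.strip())
    if not match:
        raise ValueError(f"Invalid memory format: {memory_str}")
    value, unit = match.groups()
    multiplier = {"K": 1024, "Ki": 1024,
                  "M": 1024 ** 2, "Mi": 1024 ** 2,
                  "G": 1024 ** 3, "Gi": 1024 ** 3}[unit]
    return int(float(value) * multiplier)

def parse_cpu_to_millicores(cpu_str: str) -> int:
    """Standalone copy of _parse_cpu_to_millicores: '500m' is taken
    as millicores, bare core counts are multiplied by 1000."""
    if not cpu_str:
        return 0
    cpu_str = cpu_str.strip()
    if cpu_str.endswith('m'):
        return int(cpu_str[:-1])
    return int(float(cpu_str) * 1000)
```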
def _extract_resource_requirements(self, workflow_info: WorkflowInfo) -> Dict[str, str]:
"""
Extract resource requirements from workflow metadata.
Args:
workflow_info: Workflow information with metadata
Returns:
Dictionary with resource requirements in Docker format
"""
metadata = workflow_info.metadata
requirements = metadata.get("requirements", {})
resources = requirements.get("resources", {})
resource_config = {}
# Extract memory requirement
memory = resources.get("memory")
if memory:
try:
# Validate memory format and store original string for Docker
self._parse_memory_to_bytes(memory)
resource_config["memory"] = memory
except ValueError as e:
logger.warning(f"Invalid memory requirement in {workflow_info.name}: {e}")
# Extract CPU requirement
cpu = resources.get("cpu")
if cpu:
try:
# Validate CPU format and store original string for Docker
self._parse_cpu_to_millicores(cpu)
resource_config["cpus"] = cpu
except ValueError as e:
logger.warning(f"Invalid CPU requirement in {workflow_info.name}: {e}")
# Extract timeout
timeout = resources.get("timeout")
if timeout and isinstance(timeout, int):
resource_config["timeout"] = str(timeout)
return resource_config
async def initialize(self):
"""
Initialize the manager by discovering and deploying all workflows.
This method:
1. Discovers all valid workflows in the workflows directory
2. Validates their metadata
3. Deploys each workflow to Prefect with Docker images
"""
try:
# Discover workflows
self.workflows = await self.discovery.discover_workflows()
if not self.workflows:
logger.warning("No workflows discovered")
return
logger.info(f"Discovered {len(self.workflows)} workflows: {list(self.workflows.keys())}")
# Deploy each workflow
for name, info in self.workflows.items():
try:
await self._deploy_workflow(name, info)
except Exception as e:
logger.error(f"Failed to deploy workflow '{name}': {e}")
except Exception as e:
logger.error(f"Failed to initialize Prefect manager: {e}")
raise
async def _deploy_workflow(self, name: str, info: WorkflowInfo):
"""
Deploy a single workflow to Prefect with Docker image.
Args:
name: Workflow name
info: Workflow information including metadata and paths
"""
logger.info(f"Deploying workflow '{name}'...")
# Get the flow function from registry
flow_func = self.discovery.get_flow_function(name)
if not flow_func:
logger.error(
f"Failed to get flow function for '{name}' from registry. "
f"Ensure the workflow is properly registered in toolbox/workflows/registry.py"
)
return
# Use the mandatory Dockerfile with absolute paths for Docker Compose
# Get absolute paths for build context and dockerfile
toolbox_path = info.path.parent.parent.resolve()
dockerfile_abs_path = info.dockerfile.resolve()
# Calculate relative dockerfile path from toolbox context
try:
dockerfile_rel_path = dockerfile_abs_path.relative_to(toolbox_path)
except ValueError:
# If relative path fails, use the workflow-specific path
dockerfile_rel_path = Path("workflows") / name / "Dockerfile"
# Determine deployment strategy based on Dockerfile presence
base_image = "prefecthq/prefect:3-python3.11"
has_custom_dockerfile = info.has_docker and info.dockerfile.exists()
logger.info(f"=== DEPLOYMENT DEBUG for '{name}' ===")
logger.info(f"info.has_docker: {info.has_docker}")
logger.info(f"info.dockerfile: {info.dockerfile}")
logger.info(f"info.dockerfile.exists(): {info.dockerfile.exists()}")
logger.info(f"has_custom_dockerfile: {has_custom_dockerfile}")
logger.info(f"toolbox_path: {toolbox_path}")
logger.info(f"dockerfile_rel_path: {dockerfile_rel_path}")
if has_custom_dockerfile:
logger.info(f"Workflow '{name}' has custom Dockerfile - building custom image")
# Decide whether to use registry or keep images local to host engine
# Default to using the local registry; set FUZZFORGE_USE_REGISTRY=false to bypass (not recommended)
use_registry = os.getenv("FUZZFORGE_USE_REGISTRY", "true").lower() == "true"
if use_registry:
registry_url = get_registry_url(context="push")
image_spec = DockerImage(
name=f"{registry_url}/fuzzforge/{name}",
tag="latest",
dockerfile=str(dockerfile_rel_path),
context=str(toolbox_path)
)
deploy_image = f"{registry_url}/fuzzforge/{name}:latest"
build_custom = True
push_custom = True
logger.info(f"Using registry: {registry_url} for '{name}'")
else:
# Single-host mode: build into host engine cache; no push required
image_spec = DockerImage(
name=f"fuzzforge/{name}",
tag="latest",
dockerfile=str(dockerfile_rel_path),
context=str(toolbox_path)
)
deploy_image = f"fuzzforge/{name}:latest"
build_custom = True
push_custom = False
logger.info("Using single-host image (no registry push): %s", deploy_image)
else:
logger.info(f"Workflow '{name}' using base image - no custom dependencies needed")
deploy_image = base_image
build_custom = False
push_custom = False
# Pre-validate registry connectivity when pushing
if push_custom:
try:
from .setup import validate_registry_connectivity
await validate_registry_connectivity(registry_url)
logger.info(f"Registry connectivity validated for {registry_url}")
except Exception as e:
logger.error(f"Registry connectivity validation failed for {registry_url}: {e}")
raise RuntimeError(f"Cannot deploy workflow '{name}': Registry {registry_url} is not accessible. {e}")
# Deploy the workflow
try:
# Ensure any previous deployment is removed so job variables are updated
try:
async with get_client() as client:
existing = await client.read_deployment_by_name(
f"{name}/{name}-deployment"
)
if existing:
logger.info(f"Removing existing deployment for '{name}' to refresh settings...")
await client.delete_deployment(existing.id)
except Exception:
# If not found or deletion fails, continue with deployment
pass
# Extract resource requirements from metadata
workflow_resource_requirements = self._extract_resource_requirements(info)
logger.info(f"Workflow '{name}' resource requirements: {workflow_resource_requirements}")
# Build job variables with resource requirements
job_variables = {
"image": deploy_image, # Use the worker-accessible registry name
"volumes": [], # Populated at run submission with toolbox mount
"env": {
"PYTHONPATH": "/opt/prefect/toolbox:/opt/prefect",
"WORKFLOW_NAME": name
}
}
# Add resource requirements to job variables if present
if workflow_resource_requirements:
job_variables["resources"] = workflow_resource_requirements
# Prepare deployment parameters
deploy_params = {
"name": f"{name}-deployment",
"work_pool_name": "docker-pool",
"image": image_spec if has_custom_dockerfile else deploy_image,
"push": push_custom,
"build": build_custom,
"job_variables": job_variables
}
deployment = await flow_func.deploy(**deploy_params)
self.deployments[name] = str(deployment.id) if hasattr(deployment, 'id') else name
logger.info(f"Successfully deployed workflow '{name}'")
except Exception as e:
# Enhanced error reporting with more context
import traceback
logger.error(f"Failed to deploy workflow '{name}': {e}")
logger.error(f"Deployment traceback: {traceback.format_exc()}")
# Try to capture Docker-specific context
error_context = {
"workflow_name": name,
"has_dockerfile": has_custom_dockerfile,
"image_name": deploy_image if 'deploy_image' in locals() else "unknown",
"registry_url": registry_url if 'registry_url' in locals() else "unknown",
"error_type": type(e).__name__,
"error_message": str(e)
}
# Check for specific error patterns with detailed categorization
error_msg_lower = str(e).lower()
if "registry" in error_msg_lower and ("no such host" in error_msg_lower or "connection" in error_msg_lower):
error_context["category"] = "registry_connectivity_error"
error_context["solution"] = f"Cannot reach registry at {error_context['registry_url']}. Check Docker network and registry service."
elif "docker" in error_msg_lower:
error_context["category"] = "docker_error"
if "build" in error_msg_lower:
error_context["subcategory"] = "image_build_failed"
error_context["solution"] = "Check Dockerfile syntax and dependencies."
elif "pull" in error_msg_lower:
error_context["subcategory"] = "image_pull_failed"
error_context["solution"] = "Check if image exists in registry and network connectivity."
elif "push" in error_msg_lower:
error_context["subcategory"] = "image_push_failed"
error_context["solution"] = f"Check registry connectivity and push permissions to {error_context['registry_url']}."
elif "registry" in error_msg_lower:
error_context["category"] = "registry_error"
error_context["solution"] = "Check registry configuration and accessibility."
elif "prefect" in error_msg_lower:
error_context["category"] = "prefect_error"
error_context["solution"] = "Check Prefect server connectivity and deployment configuration."
else:
error_context["category"] = "unknown_deployment_error"
error_context["solution"] = "Check logs for more specific error details."
logger.error(f"Deployment error context: {error_context}")
# Raise enhanced exception with context
enhanced_error = Exception(f"Deployment failed for workflow '{name}': {str(e)} | Context: {error_context}")
enhanced_error.original_error = e
enhanced_error.context = error_context
raise enhanced_error
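The error-pattern matching used above to categorize deployment failures can be condensed into a sketch that returns just the category label (a simplification of the full context dict):

```python
def categorize_deploy_error(message: str) -> str:
    """Condensed version of the deployment-failure categorization:
    registry connectivity first, then docker/registry/prefect
    keywords, with an unknown fallback."""
    msg = message.lower()
    if "registry" in msg and ("no such host" in msg or "connection" in msg):
        return "registry_connectivity_error"
    if "docker" in msg:
        return "docker_error"
    if "registry" in msg:
        return "registry_error"
    if "prefect" in msg:
        return "prefect_error"
    return "unknown_deployment_error"
```

Ordering matters: the connectivity check must run before the generic `registry` match, mirroring the branch order in `_deploy_workflow`.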
async def submit_workflow(
self,
workflow_name: str,
target_path: str,
volume_mode: str = "ro",
parameters: Dict[str, Any] = None,
resource_limits: Dict[str, str] = None,
additional_volumes: list = None,
timeout: int = None
) -> FlowRun:
"""
Submit a workflow for execution with volume mounting.
Args:
workflow_name: Name of the workflow to execute
target_path: Host path to mount as volume
volume_mode: Volume mount mode ("ro" for read-only, "rw" for read-write)
parameters: Workflow-specific parameters
resource_limits: CPU/memory limits for container
additional_volumes: List of additional volume mounts
timeout: Timeout in seconds
Returns:
FlowRun object with run information
Raises:
ValueError: If workflow not found or volume mode not supported
"""
if workflow_name not in self.workflows:
raise ValueError(f"Unknown workflow: {workflow_name}")
# Validate volume mode
workflow_info = self.workflows[workflow_name]
supported_modes = workflow_info.metadata.get("supported_volume_modes", ["ro", "rw"])
if volume_mode not in supported_modes:
raise ValueError(
f"Workflow '{workflow_name}' doesn't support volume mode '{volume_mode}'. "
f"Supported modes: {supported_modes}"
)
# Validate target path with security checks
self._validate_target_path(target_path)
# Validate additional volumes if provided
if additional_volumes:
for volume in additional_volumes:
self._validate_target_path(volume.host_path)
async with get_client() as client:
# Get the deployment, auto-redeploy once if missing
try:
deployment = await client.read_deployment_by_name(
f"{workflow_name}/{workflow_name}-deployment"
)
except Exception as e:
import traceback
logger.error(f"Failed to find deployment for workflow '{workflow_name}': {e}")
logger.error(f"Deployment lookup traceback: {traceback.format_exc()}")
# Attempt a one-time auto-deploy to recover from startup races
try:
logger.info(f"Auto-deploying missing workflow '{workflow_name}' and retrying...")
await self._deploy_workflow(workflow_name, workflow_info)
deployment = await client.read_deployment_by_name(
f"{workflow_name}/{workflow_name}-deployment"
)
except Exception as redeploy_exc:
# Enhanced error with context
error_context = {
"workflow_name": workflow_name,
"error_type": type(e).__name__,
"error_message": str(e),
"redeploy_error": str(redeploy_exc),
"available_deployments": list(self.deployments.keys()),
}
enhanced_error = ValueError(
f"Deployment not found and redeploy failed for workflow '{workflow_name}': {e} | Context: {error_context}"
)
enhanced_error.context = error_context
raise enhanced_error
# Determine the Docker Compose network name
# Hardcoded to 'fuzzforge' to avoid directory name dependencies
docker_network = "fuzzforge_default"
# Build volume mounts
# Add toolbox volume mount for workflow code access
# Hardcoded volume names
prefect_storage_volume = "fuzzforge_prefect_storage"
toolbox_code_volume = "fuzzforge_toolbox_code"
volumes = [
f"{target_path}:/workspace:{volume_mode}",
f"{prefect_storage_volume}:/prefect-storage", # Shared storage for results
f"{toolbox_code_volume}:/opt/prefect/toolbox:ro" # Mount workflow code
]
# Add additional volumes if provided
if additional_volumes:
for volume in additional_volumes:
volume_spec = f"{volume.host_path}:{volume.container_path}:{volume.mode}"
volumes.append(volume_spec)
# Build environment variables
env_vars = {
"PREFECT_API_URL": "http://prefect-server:4200/api", # Use internal network hostname
"PREFECT_LOGGING_LEVEL": "INFO",
"PREFECT_LOCAL_STORAGE_PATH": "/prefect-storage", # Use shared storage
"PREFECT_RESULTS_PERSIST_BY_DEFAULT": "true", # Enable result persistence
"PREFECT_DEFAULT_RESULT_STORAGE_BLOCK": "local-file-system/fuzzforge-results", # Use our storage block
"WORKSPACE_PATH": "/workspace",
"VOLUME_MODE": volume_mode,
"WORKFLOW_NAME": workflow_name
}
# Add additional volume paths to environment for easy access
if additional_volumes:
for i, volume in enumerate(additional_volumes):
env_vars[f"ADDITIONAL_VOLUME_{i}_PATH"] = volume.container_path
# Determine which image to use based on workflow configuration
has_custom_dockerfile = workflow_info.has_docker and workflow_info.dockerfile.exists()
# Use pull context for worker to pull from registry
registry_url = get_registry_url(context="pull")
workflow_image = f"{registry_url}/fuzzforge/{workflow_name}:latest" if has_custom_dockerfile else "prefecthq/prefect:3-python3.11"
logger.debug(f"Worker will pull image: {workflow_image} (Registry: {registry_url})")
# Configure job variables with volume mounting and network access
job_variables = {
# Use custom image if available, otherwise base Prefect image
"image": workflow_image,
"volumes": volumes,
"networks": [docker_network], # Connect to Docker Compose network
"env": {
**env_vars,
"PYTHONPATH": "/opt/prefect/toolbox:/opt/prefect/toolbox/workflows",
"WORKFLOW_NAME": workflow_name
}
}
# Apply resource requirements from workflow metadata and user overrides
workflow_resource_requirements = self._extract_resource_requirements(workflow_info)
final_resource_config = {}
# Start with workflow requirements as base
if workflow_resource_requirements:
final_resource_config.update(workflow_resource_requirements)
# Apply user-provided resource limits (overrides workflow defaults)
if resource_limits:
user_resource_config = {}
if resource_limits.get("cpu_limit"):
user_resource_config["cpus"] = resource_limits["cpu_limit"]
if resource_limits.get("memory_limit"):
user_resource_config["memory"] = resource_limits["memory_limit"]
# Note: cpu_request and memory_request are not directly supported by Docker
# but could be used for Kubernetes in the future
# User overrides take precedence
final_resource_config.update(user_resource_config)
# Apply final resource configuration
if final_resource_config:
job_variables["resources"] = final_resource_config
logger.info(f"Applied resource limits: {final_resource_config}")
# Merge parameters with defaults from metadata
default_params = workflow_info.metadata.get("default_parameters", {})
final_params = {**default_params, **(parameters or {})}
# Set flow parameters that match the flow signature
final_params["target_path"] = "/workspace" # Container path where volume is mounted
final_params["volume_mode"] = volume_mode
# Create and submit the flow run
# Pass job_variables to ensure network, volumes, and environment are configured
logger.info(f"Submitting flow with job_variables: {job_variables}")
logger.info(f"Submitting flow with parameters: {final_params}")
# Prepare flow run creation parameters
flow_run_params = {
"deployment_id": deployment.id,
"parameters": final_params,
"job_variables": job_variables
}
# Note: Timeout is handled through workflow-level configuration
# Additional timeout configuration can be added to deployment metadata if needed
flow_run = await client.create_flow_run_from_deployment(**flow_run_params)
logger.info(
f"Submitted workflow '{workflow_name}' with run_id: {flow_run.id}, "
f"target: {target_path}, mode: {volume_mode}"
)
return flow_run
async def get_flow_run_findings(self, run_id: str) -> Dict[str, Any]:
"""
Retrieve findings from a completed flow run.
Args:
run_id: The flow run ID
Returns:
Dictionary containing SARIF-formatted findings
Raises:
ValueError: If run not completed or not found
"""
async with get_client() as client:
flow_run = await client.read_flow_run(run_id)
if not flow_run.state.is_completed():
raise ValueError(
f"Flow run {run_id} not completed. Current status: {flow_run.state.name}"
)
# Get the findings from the flow run result
try:
findings = await flow_run.state.result()
return findings
except Exception as e:
logger.error(f"Failed to retrieve findings for run {run_id}: {e}")
raise ValueError(f"Failed to retrieve findings: {e}")
async def get_flow_run_status(self, run_id: str) -> Dict[str, Any]:
"""
Get the current status of a flow run.
Args:
run_id: The flow run ID
Returns:
Dictionary with status information
"""
async with get_client() as client:
flow_run = await client.read_flow_run(run_id)
return {
"run_id": str(flow_run.id),
"workflow": flow_run.deployment_id,
"status": flow_run.state.name,
"is_completed": flow_run.state.is_completed(),
"is_failed": flow_run.state.is_failed(),
"is_running": flow_run.state.is_running(),
"created_at": flow_run.created,
"updated_at": flow_run.updated
}
def _validate_target_path(self, target_path: str) -> None:
"""
Validate target path for security before mounting as volume.
Args:
target_path: Host path to validate
Raises:
ValueError: If path is not allowed for security reasons
"""
target = Path(target_path)
# Path must be absolute
if not target.is_absolute():
raise ValueError(f"Target path must be absolute: {target_path}")
# Resolve path to handle symlinks and relative components
try:
resolved_path = target.resolve()
except (OSError, RuntimeError) as e:
raise ValueError(f"Cannot resolve target path: {target_path} - {e}")
resolved_str = str(resolved_path)
# Check against forbidden paths first (more restrictive)
for forbidden in self.forbidden_paths:
if resolved_str.startswith(forbidden):
raise ValueError(
f"Access denied: Path '{target_path}' resolves to forbidden directory '{forbidden}'. "
f"This path contains sensitive system files and cannot be mounted."
)
# Check if path starts with any allowed base path
path_allowed = False
for allowed in self.allowed_base_paths:
if resolved_str.startswith(allowed):
path_allowed = True
break
if not path_allowed:
allowed_list = ", ".join(self.allowed_base_paths)
raise ValueError(
f"Access denied: Path '{target_path}' is not in allowed directories. "
f"Allowed base paths: {allowed_list}"
)
# Additional security checks
if resolved_str == "/":
raise ValueError("Cannot mount root filesystem")
# Warn if path doesn't exist (but don't block - it might be created later)
if not resolved_path.exists():
logger.warning(f"Target path does not exist: {target_path}")
logger.info(f"Path validation passed for: {target_path} -> {resolved_str}")
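The prefix-based allow/forbid checks in `_validate_target_path` can be sketched as a standalone function. This is a hypothetical distillation: the `ALLOWED_BASE_PATHS` and `FORBIDDEN_PATHS` lists below are illustrative stand-ins for the instance's configured `self.allowed_base_paths` and `self.forbidden_paths`.

```python
from pathlib import Path

# Illustrative stand-ins for the configured allow/forbid lists
ALLOWED_BASE_PATHS = ["/home", "/data", "/workspace-src"]
FORBIDDEN_PATHS = ["/etc", "/root", "/var/run"]

def validate_target_path(target_path: str) -> str:
    target = Path(target_path)
    if not target.is_absolute():
        raise ValueError(f"Target path must be absolute: {target_path}")
    # resolve() follows symlinks and collapses ".." components,
    # so the checks below run against the real location
    resolved = str(target.resolve())
    if resolved == "/":
        raise ValueError("Cannot mount root filesystem")
    # Forbidden prefixes are checked first, so they win over allowed ones
    for forbidden in FORBIDDEN_PATHS:
        if resolved.startswith(forbidden):
            raise ValueError(f"Access denied: '{target_path}' is under '{forbidden}'")
    if not any(resolved.startswith(allowed) for allowed in ALLOWED_BASE_PATHS):
        raise ValueError(f"Access denied: '{target_path}' is outside allowed base paths")
    return resolved
```

Note that plain `startswith` prefix matching (used both here and in the original) treats `/dataX` as being under `/data`; comparing path components would be stricter.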

backend/src/core/setup.py Normal file

@@ -0,0 +1,402 @@
"""
Setup utilities for Prefect infrastructure
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import logging
from prefect import get_client
from prefect.client.schemas.actions import WorkPoolCreate
from prefect.client.schemas.objects import WorkPool
from .prefect_manager import get_registry_url
logger = logging.getLogger(__name__)
async def setup_docker_pool():
"""
Create or update the Docker work pool for container execution.
This work pool is configured to:
- Connect to the local Docker daemon
- Support volume mounting at runtime
- Clean up containers after execution
- Use bridge networking by default
"""
import os
async with get_client() as client:
pool_name = "docker-pool"
# Add force recreation flag for debugging fresh install issues
force_recreate = os.getenv('FORCE_RECREATE_WORK_POOL', 'false').lower() == 'true'
debug_setup = os.getenv('DEBUG_WORK_POOL_SETUP', 'false').lower() == 'true'
if force_recreate:
logger.warning(f"FORCE_RECREATE_WORK_POOL=true - Will recreate work pool regardless of existing configuration")
if debug_setup:
logger.warning(f"DEBUG_WORK_POOL_SETUP=true - Enhanced logging enabled")
# Temporarily set logging level to DEBUG for this function
original_level = logger.level
logger.setLevel(logging.DEBUG)
try:
# Check if pool already exists and supports custom images
existing_pools = await client.read_work_pools()
existing_pool = None
for pool in existing_pools:
if pool.name == pool_name:
existing_pool = pool
break
if existing_pool and not force_recreate:
logger.info(f"Found existing work pool '{pool_name}' - validating configuration...")
# Check if the existing pool has the correct configuration
base_template = existing_pool.base_job_template or {}
logger.debug(f"Base template keys: {list(base_template.keys())}")
job_config = base_template.get("job_configuration", {})
logger.debug(f"Job config keys: {list(job_config.keys())}")
image_config = job_config.get("image", "")
has_image_variable = "{{ image }}" in str(image_config)
logger.debug(f"Image config: '{image_config}' -> has_image_variable: {has_image_variable}")
# Check if volume defaults include toolbox mount
variables = base_template.get("variables", {})
properties = variables.get("properties", {})
volume_config = properties.get("volumes", {})
volume_defaults = volume_config.get("default", [])
has_toolbox_volume = any("toolbox_code" in str(vol) for vol in volume_defaults) if volume_defaults else False
logger.debug(f"Volume defaults: {volume_defaults}")
logger.debug(f"Has toolbox volume: {has_toolbox_volume}")
# Check if environment defaults include required settings
env_config = properties.get("env", {})
env_defaults = env_config.get("default", {})
has_api_url = "PREFECT_API_URL" in env_defaults
has_storage_path = "PREFECT_LOCAL_STORAGE_PATH" in env_defaults
has_results_persist = "PREFECT_RESULTS_PERSIST_BY_DEFAULT" in env_defaults
has_required_env = has_api_url and has_storage_path and has_results_persist
logger.debug(f"Environment defaults: {env_defaults}")
logger.debug(f"Has API URL: {has_api_url}, Has storage path: {has_storage_path}, Has results persist: {has_results_persist}")
logger.debug(f"Has required env: {has_required_env}")
# Log the full validation result
logger.info(f"Work pool validation - Image: {has_image_variable}, Toolbox: {has_toolbox_volume}, Environment: {has_required_env}")
if has_image_variable and has_toolbox_volume and has_required_env:
logger.info(f"Docker work pool '{pool_name}' already exists with correct configuration")
return
else:
reasons = []
if not has_image_variable:
reasons.append("missing image template")
if not has_toolbox_volume:
reasons.append("missing toolbox volume mount")
if not has_required_env:
if not has_api_url:
reasons.append("missing PREFECT_API_URL")
if not has_storage_path:
reasons.append("missing PREFECT_LOCAL_STORAGE_PATH")
if not has_results_persist:
reasons.append("missing PREFECT_RESULTS_PERSIST_BY_DEFAULT")
logger.warning(f"Docker work pool '{pool_name}' exists but lacks: {', '.join(reasons)}. Recreating...")
# Delete the old pool and recreate it
try:
await client.delete_work_pool(pool_name)
logger.info(f"Deleted old work pool '{pool_name}'")
except Exception as e:
logger.warning(f"Failed to delete old work pool: {e}")
elif force_recreate and existing_pool:
logger.warning(f"Force recreation enabled - deleting existing work pool '{pool_name}'")
try:
await client.delete_work_pool(pool_name)
logger.info(f"Deleted existing work pool for force recreation")
except Exception as e:
logger.warning(f"Failed to delete work pool for force recreation: {e}")
logger.info(f"Creating Docker work pool '{pool_name}' with custom image support...")
# Create the work pool with proper Docker configuration
work_pool = WorkPoolCreate(
name=pool_name,
type="docker",
description="Docker work pool for FuzzForge workflows with custom image support",
base_job_template={
"job_configuration": {
"image": "{{ image }}", # Template variable for custom images
"volumes": "{{ volumes }}", # List of volume mounts
"env": "{{ env }}", # Environment variables
"networks": "{{ networks }}", # Docker networks
"stream_output": True,
"auto_remove": True,
"privileged": False,
"network_mode": None, # Use networks instead
"labels": {},
"command": None # Let the image's CMD/ENTRYPOINT run
},
"variables": {
"type": "object",
"properties": {
"image": {
"type": "string",
"title": "Docker Image",
"default": "prefecthq/prefect:3-python3.11",
"description": "Docker image for the flow run"
},
"volumes": {
"type": "array",
"title": "Volume Mounts",
"default": [
"fuzzforge_prefect_storage:/prefect-storage",
"fuzzforge_toolbox_code:/opt/prefect/toolbox:ro"
],
"description": "Volume mounts in format 'host:container:mode'",
"items": {
"type": "string"
}
},
"networks": {
"type": "array",
"title": "Docker Networks",
"default": ["fuzzforge_default"],
"description": "Docker networks to connect container to",
"items": {
"type": "string"
}
},
"env": {
"type": "object",
"title": "Environment Variables",
"default": {
"PREFECT_API_URL": "http://prefect-server:4200/api",
"PREFECT_LOCAL_STORAGE_PATH": "/prefect-storage",
"PREFECT_RESULTS_PERSIST_BY_DEFAULT": "true"
},
"description": "Environment variables for the container",
"additionalProperties": {
"type": "string"
}
}
}
}
}
)
await client.create_work_pool(work_pool)
logger.info(f"Created Docker work pool '{pool_name}'")
except Exception as e:
logger.error(f"Failed to setup Docker work pool: {e}")
raise
finally:
# Restore original logging level if debug mode was enabled
if debug_setup and 'original_level' in locals():
logger.setLevel(original_level)
def get_actual_compose_project_name():
"""
Return the hardcoded compose project name for FuzzForge.
Always returns 'fuzzforge' as per system requirements.
"""
logger.info("Using hardcoded compose project name: fuzzforge")
return "fuzzforge"
async def setup_result_storage():
"""
Create or update Prefect result storage block for findings persistence.
This sets up a LocalFileSystem storage block pointing to the shared
/prefect-storage volume for result persistence.
"""
from prefect.filesystems import LocalFileSystem
storage_name = "fuzzforge-results"
try:
# Create the storage block, overwrite if it exists
logger.info(f"Setting up storage block '{storage_name}'...")
storage = LocalFileSystem(basepath="/prefect-storage")
block_doc_id = await storage.save(name=storage_name, overwrite=True)
logger.info(f"Storage block '{storage_name}' configured successfully")
return str(block_doc_id)
except Exception as e:
logger.error(f"Failed to setup result storage: {e}")
# Don't raise the exception - continue without storage block
logger.warning("Continuing without result storage block - findings may not persist")
return None
async def validate_docker_connection():
"""
Validate that Docker is accessible and running.
Note: In containerized deployments with Docker socket proxy,
the backend doesn't need direct Docker access.
Raises:
RuntimeError: If Docker is not accessible
"""
import os
# Skip Docker validation if running in container without socket access
if os.path.exists("/.dockerenv") and not os.path.exists("/var/run/docker.sock"):
logger.info("Running in container without Docker socket - skipping Docker validation")
return
try:
import docker
client = docker.from_env()
client.ping()
logger.info("Docker connection validated")
except Exception as e:
logger.error(f"Docker is not accessible: {e}")
raise RuntimeError(
"Docker is not running or not accessible. "
"Please ensure Docker is installed and running."
)
async def validate_registry_connectivity(registry_url: str = None):
"""
Validate that the Docker registry is accessible.
Args:
registry_url: URL of the Docker registry to validate (auto-detected if None)
Raises:
RuntimeError: If registry is not accessible
"""
# Resolve a reachable test URL from within this process
import os
if registry_url is None:
# Prefer the internal service name in containers, the host port on the host
registry_url = "registry:5000" if os.path.exists('/.dockerenv') else "localhost:5001"
# If we're running inside a container and asked to probe localhost:PORT,
# the probe would hit the container, not the host. Use host.docker.internal instead.
try:
host_part, port_part = registry_url.split(":", 1)
except ValueError:
host_part, port_part = registry_url, "80"
if os.path.exists('/.dockerenv') and host_part in ("localhost", "127.0.0.1"):
test_host = "host.docker.internal"
else:
test_host = host_part
test_url = f"http://{test_host}:{port_part}/v2/"
import aiohttp
import asyncio
logger.info(f"Validating registry connectivity to {registry_url}...")
try:
async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=10)) as session:
async with session.get(test_url) as response:
if response.status == 200:
logger.info(f"Registry at {registry_url} is accessible (tested via {test_host})")
return
else:
raise RuntimeError(f"Registry returned status {response.status}")
except asyncio.TimeoutError:
raise RuntimeError(f"Registry at {registry_url} is not responding (timeout)")
except aiohttp.ClientError as e:
raise RuntimeError(f"Registry at {registry_url} is not accessible: {e}")
except Exception as e:
raise RuntimeError(f"Failed to validate registry connectivity: {e}")
async def validate_docker_network(network_name: str):
"""
Validate that the specified Docker network exists.
Args:
network_name: Name of the Docker network to validate
Raises:
RuntimeError: If network doesn't exist
"""
import os
# Skip network validation if running in container without Docker socket
if os.path.exists("/.dockerenv") and not os.path.exists("/var/run/docker.sock"):
logger.info("Running in container without Docker socket - skipping network validation")
return
try:
import docker
client = docker.from_env()
# List all networks
networks = client.networks.list(names=[network_name])
if not networks:
# Try to find networks with similar names
all_networks = client.networks.list()
similar_networks = [n.name for n in all_networks if "fuzzforge" in n.name.lower()]
error_msg = f"Docker network '{network_name}' not found."
if similar_networks:
error_msg += f" Available networks: {similar_networks}"
else:
error_msg += " Please ensure Docker Compose is running."
raise RuntimeError(error_msg)
logger.info(f"Docker network '{network_name}' validated")
except Exception as e:
if isinstance(e, RuntimeError):
raise
logger.error(f"Network validation failed: {e}")
raise RuntimeError(f"Failed to validate Docker network: {e}")
async def validate_infrastructure():
"""
Validate all required infrastructure components.
This should be called during startup to ensure everything is ready.
"""
logger.info("Validating infrastructure...")
# Validate Docker connection
await validate_docker_connection()
# Validate registry connectivity for custom image building
await validate_registry_connectivity()
# Validate network (hardcoded to avoid directory name dependencies)
docker_network = "fuzzforge_default"
try:
await validate_docker_network(docker_network)
except RuntimeError as e:
logger.warning(f"Network validation failed: {e}")
logger.warning("Workflows may not be able to connect to Prefect services")
logger.info("Infrastructure validation completed")
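The host-rewrite logic in `validate_registry_connectivity` can be isolated into a small pure function. This is a hypothetical helper, not part of the module: when a `localhost` registry URL is probed from inside a container, it swaps in `host.docker.internal` so the request reaches the host rather than the container itself.

```python
def resolve_registry_probe_url(registry_url: str, in_container: bool) -> str:
    """Build the /v2/ probe URL, rewriting localhost for in-container probes."""
    try:
        host, port = registry_url.split(":", 1)
    except ValueError:
        # No explicit port in the URL; fall back to 80
        host, port = registry_url, "80"
    if in_container and host in ("localhost", "127.0.0.1"):
        # From inside a container, localhost is the container, not the host
        host = "host.docker.internal"
    return f"http://{host}:{port}/v2/"
```

Passing `in_container` explicitly (instead of checking `/.dockerenv` inside the function) keeps the logic deterministic and easy to unit-test.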


@@ -0,0 +1,459 @@
"""
Workflow Discovery - Registry-based discovery and loading of workflows
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import logging
import yaml
from pathlib import Path
from typing import Dict, Optional, Any, Callable
from pydantic import BaseModel, Field, ConfigDict
logger = logging.getLogger(__name__)
class WorkflowInfo(BaseModel):
"""Information about a discovered workflow"""
name: str = Field(..., description="Workflow name")
path: Path = Field(..., description="Path to workflow directory")
workflow_file: Path = Field(..., description="Path to workflow.py file")
dockerfile: Path = Field(..., description="Path to Dockerfile")
has_docker: bool = Field(..., description="Whether workflow has custom Dockerfile")
metadata: Dict[str, Any] = Field(..., description="Workflow metadata from YAML")
flow_function_name: str = Field(default="main_flow", description="Name of the flow function")
model_config = ConfigDict(arbitrary_types_allowed=True)
class WorkflowDiscovery:
"""
Discovers workflows from the filesystem and validates them against the registry.
This system:
1. Scans for workflows with metadata.yaml files
2. Cross-references them with the manual registry
3. Provides registry-based flow functions for deployment
Workflows must have:
- workflow.py: Contains the Prefect flow
- metadata.yaml: Mandatory metadata file
- Entry in toolbox/workflows/registry.py: Manual registration
- Dockerfile: Custom container definition (mandatory)
- requirements.txt (optional): Python dependencies
"""
def __init__(self, workflows_dir: Path):
"""
Initialize workflow discovery.
Args:
workflows_dir: Path to the workflows directory
"""
self.workflows_dir = workflows_dir
if not self.workflows_dir.exists():
self.workflows_dir.mkdir(parents=True, exist_ok=True)
logger.info(f"Created workflows directory: {self.workflows_dir}")
# Import registry - this validates it on import
try:
from toolbox.workflows.registry import WORKFLOW_REGISTRY, list_registered_workflows
self.registry = WORKFLOW_REGISTRY
logger.info(f"Loaded workflow registry with {len(self.registry)} registered workflows")
except ImportError as e:
logger.error(f"Failed to import workflow registry: {e}")
self.registry = {}
except Exception as e:
logger.error(f"Registry validation failed: {e}")
self.registry = {}
# Cache for discovered workflows
self._workflow_cache: Optional[Dict[str, WorkflowInfo]] = None
self._cache_timestamp: Optional[float] = None
self._cache_ttl = 60.0 # Cache TTL in seconds
async def discover_workflows(self) -> Dict[str, WorkflowInfo]:
"""
Discover workflows by cross-referencing filesystem with registry.
Uses caching to avoid frequent filesystem scans.
Returns:
Dictionary mapping workflow names to their information
"""
# Check cache validity
import time
current_time = time.time()
if (self._workflow_cache is not None and
self._cache_timestamp is not None and
(current_time - self._cache_timestamp) < self._cache_ttl):
# Return cached results
logger.debug(f"Returning cached workflow discovery ({len(self._workflow_cache)} workflows)")
return self._workflow_cache
workflows = {}
discovered_dirs = set()
registry_names = set(self.registry.keys())
if not self.workflows_dir.exists():
logger.warning(f"Workflows directory does not exist: {self.workflows_dir}")
return workflows
# Recursively scan all directories and subdirectories
await self._scan_directory_recursive(self.workflows_dir, workflows, discovered_dirs)
# Check for registry entries without corresponding directories
missing_dirs = registry_names - discovered_dirs
if missing_dirs:
logger.warning(
f"Registry contains workflows without filesystem directories: {missing_dirs}. "
f"These workflows cannot be deployed."
)
logger.info(
f"Discovery complete: {len(workflows)} workflows ready for deployment, "
f"{len(missing_dirs)} registry entries missing directories, "
f"{len(discovered_dirs - registry_names)} filesystem workflows not registered"
)
# Update cache
self._workflow_cache = workflows
self._cache_timestamp = current_time
return workflows
async def _scan_directory_recursive(self, directory: Path, workflows: Dict[str, WorkflowInfo], discovered_dirs: set):
"""
Recursively scan directory for workflows.
Args:
directory: Directory to scan
workflows: Dictionary to populate with discovered workflows
discovered_dirs: Set to track discovered workflow names
"""
for item in directory.iterdir():
if not item.is_dir():
continue
if item.name.startswith('_') or item.name.startswith('.'):
continue # Skip hidden or private directories
# Check if this directory contains workflow files (workflow.py and metadata.yaml)
workflow_file = item / "workflow.py"
metadata_file = item / "metadata.yaml"
if workflow_file.exists() and metadata_file.exists():
# This is a workflow directory
workflow_name = item.name
discovered_dirs.add(workflow_name)
# Only process workflows that are in the registry
if workflow_name not in self.registry:
logger.warning(
f"Workflow '{workflow_name}' found in filesystem but not in registry. "
f"Add it to toolbox/workflows/registry.py to enable deployment."
)
continue
try:
workflow_info = await self._load_workflow(item)
if workflow_info:
workflows[workflow_info.name] = workflow_info
logger.info(f"Discovered and registered workflow: {workflow_info.name}")
except Exception as e:
logger.error(f"Failed to load workflow from {item}: {e}")
else:
# This is a category directory, recurse into it
await self._scan_directory_recursive(item, workflows, discovered_dirs)
async def _load_workflow(self, workflow_dir: Path) -> Optional[WorkflowInfo]:
"""
Load and validate a single workflow.
Args:
workflow_dir: Path to the workflow directory
Returns:
WorkflowInfo if valid, None otherwise
"""
workflow_name = workflow_dir.name
# Check for mandatory files
workflow_file = workflow_dir / "workflow.py"
metadata_file = workflow_dir / "metadata.yaml"
if not workflow_file.exists():
logger.warning(f"Workflow {workflow_name} missing workflow.py")
return None
if not metadata_file.exists():
logger.error(f"Workflow {workflow_name} missing mandatory metadata.yaml")
return None
# Load and validate metadata
try:
metadata = self._load_metadata(metadata_file)
if not self._validate_metadata(metadata, workflow_name):
return None
except Exception as e:
logger.error(f"Failed to load metadata for {workflow_name}: {e}")
return None
# Check for mandatory Dockerfile
dockerfile = workflow_dir / "Dockerfile"
if not dockerfile.exists():
logger.error(f"Workflow {workflow_name} missing mandatory Dockerfile")
return None
has_docker = True # Always True since Dockerfile is mandatory
# Get flow function name from metadata or use default
flow_function_name = metadata.get("flow_function", "main_flow")
return WorkflowInfo(
name=workflow_name,
path=workflow_dir,
workflow_file=workflow_file,
dockerfile=dockerfile,
has_docker=has_docker,
metadata=metadata,
flow_function_name=flow_function_name
)
def _load_metadata(self, metadata_file: Path) -> Dict[str, Any]:
"""
Load metadata from YAML file.
Args:
metadata_file: Path to metadata.yaml
Returns:
Dictionary containing metadata
"""
with open(metadata_file, 'r') as f:
metadata = yaml.safe_load(f)
if metadata is None:
raise ValueError("Empty metadata file")
return metadata
def _validate_metadata(self, metadata: Dict[str, Any], workflow_name: str) -> bool:
"""
Validate that metadata contains all required fields.
Args:
metadata: Metadata dictionary
workflow_name: Name of the workflow for logging
Returns:
True if valid, False otherwise
"""
required_fields = ["name", "version", "description", "author", "category", "parameters", "requirements"]
missing_fields = []
for field in required_fields:
if field not in metadata:
missing_fields.append(field)
if missing_fields:
logger.error(
f"Workflow {workflow_name} metadata missing required fields: {missing_fields}"
)
return False
# Validate version format (semantic versioning)
version = metadata.get("version", "")
if not self._is_valid_version(version):
logger.error(f"Workflow {workflow_name} has invalid version format: {version}")
return False
# Validate parameters structure
parameters = metadata.get("parameters", {})
if not isinstance(parameters, dict):
logger.error(f"Workflow {workflow_name} parameters must be a dictionary")
return False
return True
def _is_valid_version(self, version: str) -> bool:
"""
Check if version follows semantic versioning (x.y.z).
Args:
version: Version string
Returns:
True if valid semantic version
"""
if not isinstance(version, str):
return False
parts = version.split('.')
# Require purely numeric components, matching the schema's
# ^\d+\.\d+\.\d+$ pattern (int() would also accept signs like "+1")
return len(parts) == 3 and all(part.isdigit() for part in parts)
def invalidate_cache(self) -> None:
"""
Invalidate the workflow discovery cache.
Useful when workflows are added or modified.
"""
self._workflow_cache = None
self._cache_timestamp = None
logger.debug("Workflow discovery cache invalidated")
def get_flow_function(self, workflow_name: str) -> Optional[Callable]:
"""
Get the flow function from the registry.
Args:
workflow_name: Name of the workflow
Returns:
The flow function if found in registry, None otherwise
"""
if workflow_name not in self.registry:
logger.error(
f"Workflow '{workflow_name}' not found in registry. "
f"Available workflows: {list(self.registry.keys())}"
)
return None
try:
from toolbox.workflows.registry import get_workflow_flow
flow_func = get_workflow_flow(workflow_name)
logger.debug(f"Retrieved flow function for '{workflow_name}' from registry")
return flow_func
except Exception as e:
logger.error(f"Failed to get flow function for '{workflow_name}': {e}")
return None
def get_registry_info(self, workflow_name: str) -> Optional[Dict[str, Any]]:
"""
Get registry information for a workflow.
Args:
workflow_name: Name of the workflow
Returns:
Registry information if found, None otherwise
"""
if workflow_name not in self.registry:
return None
try:
from toolbox.workflows.registry import get_workflow_info
return get_workflow_info(workflow_name)
except Exception as e:
logger.error(f"Failed to get registry info for '{workflow_name}': {e}")
return None
@staticmethod
def get_metadata_schema() -> Dict[str, Any]:
"""
Get the JSON schema for workflow metadata.
Returns:
JSON schema dictionary
"""
return {
"type": "object",
"required": ["name", "version", "description", "author", "category", "parameters", "requirements"],
"properties": {
"name": {
"type": "string",
"description": "Workflow name"
},
"version": {
"type": "string",
"pattern": "^\\d+\\.\\d+\\.\\d+$",
"description": "Semantic version (x.y.z)"
},
"description": {
"type": "string",
"description": "Workflow description"
},
"author": {
"type": "string",
"description": "Workflow author"
},
"category": {
"type": "string",
"enum": ["comprehensive", "specialized", "fuzzing", "focused"],
"description": "Workflow category"
},
"tags": {
"type": "array",
"items": {"type": "string"},
"description": "Workflow tags for categorization"
},
"requirements": {
"type": "object",
"required": ["tools", "resources"],
"properties": {
"tools": {
"type": "array",
"items": {"type": "string"},
"description": "Required security tools"
},
"resources": {
"type": "object",
"required": ["memory", "cpu", "timeout"],
"properties": {
"memory": {
"type": "string",
"pattern": "^\\d+[GMK]i$",
"description": "Memory limit (e.g., 1Gi, 512Mi)"
},
"cpu": {
"type": "string",
"pattern": "^\\d+m?$",
"description": "CPU limit (e.g., 1000m, 2)"
},
"timeout": {
"type": "integer",
"minimum": 60,
"maximum": 7200,
"description": "Workflow timeout in seconds"
}
}
}
}
},
"parameters": {
"type": "object",
"description": "Workflow parameters schema"
},
"default_parameters": {
"type": "object",
"description": "Default parameter values"
},
"required_modules": {
"type": "array",
"items": {"type": "string"},
"description": "Required module names"
},
"supported_volume_modes": {
"type": "array",
"items": {"enum": ["ro", "rw"]},
"default": ["ro", "rw"],
"description": "Supported volume mount modes"
},
"flow_function": {
"type": "string",
"default": "main_flow",
"description": "Name of the flow function in workflow.py"
}
}
}
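
The semver and resource-limit patterns in this schema are easy to get wrong in a hand-written `metadata.yaml`. The following is an illustrative, self-contained sketch of a minimal pre-flight check that mirrors two of the constraints above (it is not part of the backend, which would use a full JSON Schema validator instead):

```python
import re

# Hand-rolled check mirroring a few constraints from get_metadata_schema():
# required top-level keys, the semver pattern, and the resource-limit patterns.
VERSION_RE = re.compile(r"^\d+\.\d+\.\d+$")
MEMORY_RE = re.compile(r"^\d+[GMK]i$")
CPU_RE = re.compile(r"^\d+m?$")

def check_metadata(meta: dict) -> list[str]:
    """Return a list of human-readable violations (empty when valid)."""
    problems = []
    required = ["name", "version", "description", "author", "category",
                "parameters", "requirements"]
    for key in required:
        if key not in meta:
            problems.append(f"missing required field: {key}")
    if "version" in meta and not VERSION_RE.match(str(meta["version"])):
        problems.append(f"version not semver: {meta['version']}")
    resources = meta.get("requirements", {}).get("resources", {})
    if "memory" in resources and not MEMORY_RE.match(str(resources["memory"])):
        problems.append(f"bad memory limit: {resources['memory']}")
    if "cpu" in resources and not CPU_RE.match(str(resources["cpu"])):
        problems.append(f"bad cpu limit: {resources['cpu']}")
    return problems

good = {
    "name": "demo", "version": "1.0.0", "description": "d", "author": "a",
    "category": "fuzzing", "parameters": {},
    "requirements": {"tools": [],
                     "resources": {"memory": "1Gi", "cpu": "1000m", "timeout": 600}},
}
print(check_metadata(good))                        # valid -> []
print(check_metadata({**good, "version": "1.0"}))  # rejected by the semver pattern
```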

backend/src/main.py Normal file

@@ -0,0 +1,864 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import asyncio
import logging
import os
from uuid import UUID
from contextlib import AsyncExitStack, asynccontextmanager, suppress
from typing import Any, Dict, Optional, List
import uvicorn
from fastapi import FastAPI
from starlette.applications import Starlette
from starlette.routing import Mount
from fastmcp.server.http import create_sse_app
from src.core.prefect_manager import PrefectManager
from src.core.setup import setup_docker_pool, setup_result_storage, validate_infrastructure
from src.core.workflow_discovery import WorkflowDiscovery
from src.api import workflows, runs, fuzzing
from src.services.prefect_stats_monitor import prefect_stats_monitor
from fastmcp import FastMCP
from prefect.client.orchestration import get_client
from prefect.client.schemas.filters import (
FlowRunFilter,
FlowRunFilterDeploymentId,
FlowRunFilterState,
FlowRunFilterStateType,
)
from prefect.client.schemas.sorting import FlowRunSort
from prefect.states import StateType
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
prefect_mgr = PrefectManager()
class PrefectBootstrapState:
"""Tracks Prefect initialization progress for API and MCP consumers."""
    def __init__(self) -> None:
        self.ready: bool = False
        self.status: str = "not_started"
        self.last_error: Optional[str] = None
        self.task_running: bool = False
        # Declared up front so the shutdown path in combined_lifespan does not
        # create it as an ad-hoc attribute.
        self.next_retry_seconds: Optional[int] = None
    def as_dict(self) -> Dict[str, Any]:
        return {
            "ready": self.ready,
            "status": self.status,
            "last_error": self.last_error,
            "task_running": self.task_running,
            "next_retry_seconds": self.next_retry_seconds,
        }
prefect_bootstrap_state = PrefectBootstrapState()
# Configure retry strategy for bootstrapping Prefect + infrastructure
STARTUP_RETRY_SECONDS = max(1, int(os.getenv("FUZZFORGE_STARTUP_RETRY_SECONDS", "5")))
STARTUP_RETRY_MAX_SECONDS = max(
STARTUP_RETRY_SECONDS,
int(os.getenv("FUZZFORGE_STARTUP_RETRY_MAX_SECONDS", "60")),
)
prefect_bootstrap_task: Optional[asyncio.Task] = None
# ---------------------------------------------------------------------------
# FastAPI application (REST API remains unchanged)
# ---------------------------------------------------------------------------
app = FastAPI(
title="FuzzForge API",
description="Security testing workflow orchestration API with fuzzing support",
version="0.6.0",
)
app.include_router(workflows.router)
app.include_router(runs.router)
app.include_router(fuzzing.router)
def get_prefect_status() -> Dict[str, Any]:
"""Return a snapshot of Prefect bootstrap state for diagnostics."""
status = prefect_bootstrap_state.as_dict()
status["workflows_loaded"] = len(prefect_mgr.workflows)
status["deployments_tracked"] = len(prefect_mgr.deployments)
status["bootstrap_task_running"] = (
prefect_bootstrap_task is not None and not prefect_bootstrap_task.done()
)
return status
def _prefect_not_ready_status() -> Optional[Dict[str, Any]]:
"""Return status details if Prefect is not ready yet."""
status = get_prefect_status()
if status.get("ready"):
return None
return status
@app.get("/")
async def root() -> Dict[str, Any]:
status = get_prefect_status()
return {
"name": "FuzzForge API",
"version": "0.6.0",
"status": "ready" if status.get("ready") else "initializing",
"workflows_loaded": status.get("workflows_loaded", 0),
"prefect": status,
}
@app.get("/health")
async def health() -> Dict[str, str]:
status = get_prefect_status()
health_status = "healthy" if status.get("ready") else "initializing"
return {"status": health_status}
# Map FastAPI OpenAPI operationIds to readable MCP tool names
FASTAPI_MCP_NAME_OVERRIDES: Dict[str, str] = {
"list_workflows_workflows__get": "api_list_workflows",
"get_metadata_schema_workflows_metadata_schema_get": "api_get_metadata_schema",
"get_workflow_metadata_workflows__workflow_name__metadata_get": "api_get_workflow_metadata",
"submit_workflow_workflows__workflow_name__submit_post": "api_submit_workflow",
"get_workflow_parameters_workflows__workflow_name__parameters_get": "api_get_workflow_parameters",
"get_run_status_runs__run_id__status_get": "api_get_run_status",
"get_run_findings_runs__run_id__findings_get": "api_get_run_findings",
"get_workflow_findings_runs__workflow_name__findings__run_id__get": "api_get_workflow_findings",
"get_fuzzing_stats_fuzzing__run_id__stats_get": "api_get_fuzzing_stats",
"update_fuzzing_stats_fuzzing__run_id__stats_post": "api_update_fuzzing_stats",
"get_crash_reports_fuzzing__run_id__crashes_get": "api_get_crash_reports",
"report_crash_fuzzing__run_id__crash_post": "api_report_crash",
"stream_fuzzing_updates_fuzzing__run_id__stream_get": "api_stream_fuzzing_updates",
"cleanup_fuzzing_run_fuzzing__run_id__delete": "api_cleanup_fuzzing_run",
"root__get": "api_root",
"health_health_get": "api_health",
}
# Create an MCP adapter exposing all FastAPI endpoints via OpenAPI parsing
FASTAPI_MCP_ADAPTER = FastMCP.from_fastapi(
app,
name="FuzzForge FastAPI",
mcp_names=FASTAPI_MCP_NAME_OVERRIDES,
)
_fastapi_mcp_imported = False
# ---------------------------------------------------------------------------
# FastMCP server (runs on dedicated port outside FastAPI)
# ---------------------------------------------------------------------------
mcp = FastMCP(name="FuzzForge MCP")
async def _bootstrap_prefect_with_retries() -> None:
"""Initialize Prefect infrastructure with exponential backoff retries."""
attempt = 0
while True:
attempt += 1
prefect_bootstrap_state.task_running = True
prefect_bootstrap_state.status = "starting"
prefect_bootstrap_state.ready = False
prefect_bootstrap_state.last_error = None
try:
logger.info("Bootstrapping Prefect infrastructure...")
await validate_infrastructure()
await setup_docker_pool()
await setup_result_storage()
await prefect_mgr.initialize()
await prefect_stats_monitor.start_monitoring()
prefect_bootstrap_state.ready = True
prefect_bootstrap_state.status = "ready"
prefect_bootstrap_state.task_running = False
logger.info("Prefect infrastructure ready")
return
except asyncio.CancelledError:
prefect_bootstrap_state.status = "cancelled"
prefect_bootstrap_state.task_running = False
logger.info("Prefect bootstrap task cancelled")
raise
except Exception as exc: # pragma: no cover - defensive logging on infra startup
logger.exception("Prefect bootstrap failed")
prefect_bootstrap_state.ready = False
prefect_bootstrap_state.status = "error"
prefect_bootstrap_state.last_error = str(exc)
# Ensure partial initialization does not leave stale state behind
prefect_mgr.workflows.clear()
prefect_mgr.deployments.clear()
await prefect_stats_monitor.stop_monitoring()
wait_time = min(
STARTUP_RETRY_SECONDS * (2 ** (attempt - 1)),
STARTUP_RETRY_MAX_SECONDS,
)
logger.info("Retrying Prefect bootstrap in %s second(s)", wait_time)
try:
await asyncio.sleep(wait_time)
except asyncio.CancelledError:
prefect_bootstrap_state.status = "cancelled"
prefect_bootstrap_state.task_running = False
raise
def _lookup_workflow(workflow_name: str):
info = prefect_mgr.workflows.get(workflow_name)
if not info:
return None
metadata = info.metadata
defaults = metadata.get("default_parameters", {})
default_target_path = metadata.get("default_target_path") or defaults.get("target_path")
supported_modes = metadata.get("supported_volume_modes") or ["ro", "rw"]
if not isinstance(supported_modes, list) or not supported_modes:
supported_modes = ["ro", "rw"]
default_volume_mode = (
metadata.get("default_volume_mode")
or defaults.get("volume_mode")
or supported_modes[0]
)
return {
"name": workflow_name,
"version": metadata.get("version", "0.6.0"),
"description": metadata.get("description", ""),
"author": metadata.get("author"),
"tags": metadata.get("tags", []),
"parameters": metadata.get("parameters", {}),
"default_parameters": metadata.get("default_parameters", {}),
"required_modules": metadata.get("required_modules", []),
"supported_volume_modes": supported_modes,
"default_target_path": default_target_path,
"default_volume_mode": default_volume_mode,
"has_custom_docker": bool(info.has_docker),
}
@mcp.tool
async def list_workflows_mcp() -> Dict[str, Any]:
"""List all discovered workflows and their metadata summary."""
not_ready = _prefect_not_ready_status()
if not_ready:
return {
"workflows": [],
"prefect": not_ready,
"message": "Prefect infrastructure is still initializing",
}
workflows_summary = []
for name, info in prefect_mgr.workflows.items():
metadata = info.metadata
defaults = metadata.get("default_parameters", {})
workflows_summary.append({
"name": name,
"version": metadata.get("version", "0.6.0"),
"description": metadata.get("description", ""),
"author": metadata.get("author"),
"tags": metadata.get("tags", []),
"supported_volume_modes": metadata.get("supported_volume_modes", ["ro", "rw"]),
"default_volume_mode": metadata.get("default_volume_mode")
or defaults.get("volume_mode")
or "ro",
"default_target_path": metadata.get("default_target_path")
or defaults.get("target_path"),
"has_custom_docker": bool(info.has_docker),
})
return {"workflows": workflows_summary, "prefect": get_prefect_status()}
@mcp.tool
async def get_workflow_metadata_mcp(workflow_name: str) -> Dict[str, Any]:
"""Fetch detailed metadata for a workflow."""
not_ready = _prefect_not_ready_status()
if not_ready:
return {
"error": "Prefect infrastructure not ready",
"prefect": not_ready,
}
data = _lookup_workflow(workflow_name)
if not data:
return {"error": f"Workflow not found: {workflow_name}"}
return data
@mcp.tool
async def get_workflow_parameters_mcp(workflow_name: str) -> Dict[str, Any]:
"""Return the parameter schema and defaults for a workflow."""
not_ready = _prefect_not_ready_status()
if not_ready:
return {
"error": "Prefect infrastructure not ready",
"prefect": not_ready,
}
data = _lookup_workflow(workflow_name)
if not data:
return {"error": f"Workflow not found: {workflow_name}"}
return {
"parameters": data.get("parameters", {}),
"defaults": data.get("default_parameters", {}),
}
@mcp.tool
async def get_workflow_metadata_schema_mcp() -> Dict[str, Any]:
"""Return the JSON schema describing workflow metadata files."""
return WorkflowDiscovery.get_metadata_schema()
@mcp.tool
async def submit_security_scan_mcp(
workflow_name: str,
target_path: str | None = None,
volume_mode: str | None = None,
parameters: Dict[str, Any] | None = None,
) -> Dict[str, Any] | Dict[str, str]:
"""Submit a Prefect workflow via MCP."""
try:
not_ready = _prefect_not_ready_status()
if not_ready:
return {
"error": "Prefect infrastructure not ready",
"prefect": not_ready,
}
workflow_info = prefect_mgr.workflows.get(workflow_name)
if not workflow_info:
return {"error": f"Workflow '{workflow_name}' not found"}
metadata = workflow_info.metadata or {}
defaults = metadata.get("default_parameters", {})
resolved_target_path = target_path or metadata.get("default_target_path") or defaults.get("target_path")
if not resolved_target_path:
return {
"error": (
"target_path is required and no default_target_path is defined in metadata"
),
"metadata": {
"workflow": workflow_name,
"default_target_path": metadata.get("default_target_path"),
},
}
requested_volume_mode = volume_mode or metadata.get("default_volume_mode") or defaults.get("volume_mode")
if not requested_volume_mode:
requested_volume_mode = "ro"
normalised_volume_mode = (
str(requested_volume_mode).strip().lower().replace("-", "_")
)
if normalised_volume_mode in {"read_only", "readonly", "ro"}:
normalised_volume_mode = "ro"
elif normalised_volume_mode in {"read_write", "readwrite", "rw"}:
normalised_volume_mode = "rw"
        else:
            supported_modes = metadata.get("supported_volume_modes", ["ro", "rw"])
            if not (isinstance(supported_modes, list) and normalised_volume_mode in supported_modes):
                normalised_volume_mode = "ro"
parameters = parameters or {}
cleaned_parameters: Dict[str, Any] = {**defaults, **parameters}
# Ensure *_config structures default to dicts so Prefect validation passes.
for key, value in list(cleaned_parameters.items()):
if isinstance(key, str) and key.endswith("_config") and value is None:
cleaned_parameters[key] = {}
# Some workflows expect configuration dictionaries even when omitted.
parameter_definitions = (
metadata.get("parameters", {}).get("properties", {})
if isinstance(metadata.get("parameters"), dict)
else {}
)
for key, definition in parameter_definitions.items():
if not isinstance(key, str) or not key.endswith("_config"):
continue
if key not in cleaned_parameters:
default_value = definition.get("default") if isinstance(definition, dict) else None
cleaned_parameters[key] = default_value if default_value is not None else {}
elif cleaned_parameters[key] is None:
cleaned_parameters[key] = {}
flow_run = await prefect_mgr.submit_workflow(
workflow_name=workflow_name,
target_path=resolved_target_path,
volume_mode=normalised_volume_mode,
parameters=cleaned_parameters,
)
return {
"run_id": str(flow_run.id),
"status": flow_run.state.name if flow_run.state else "PENDING",
"workflow": workflow_name,
"message": f"Workflow '{workflow_name}' submitted successfully",
"target_path": resolved_target_path,
"volume_mode": normalised_volume_mode,
"parameters": cleaned_parameters,
"mcp_enabled": True,
}
except Exception as exc: # pragma: no cover - defensive logging
logger.exception("MCP submit failed")
return {"error": f"Failed to submit workflow: {exc}"}
@mcp.tool
async def get_comprehensive_scan_summary(run_id: str) -> Dict[str, Any] | Dict[str, str]:
"""Return a summary for the given flow run via MCP."""
try:
not_ready = _prefect_not_ready_status()
if not_ready:
return {
"error": "Prefect infrastructure not ready",
"prefect": not_ready,
}
status = await prefect_mgr.get_flow_run_status(run_id)
findings = await prefect_mgr.get_flow_run_findings(run_id)
workflow_name = "unknown"
deployment_id = status.get("workflow", "")
for name, deployment in prefect_mgr.deployments.items():
if str(deployment) == str(deployment_id):
workflow_name = name
break
        total_findings = 0
        # Severity counts are placeholders: only total_findings is extracted
        # from the SARIF payload here; the per-severity breakdown stays at zero.
        severity_summary = {"critical": 0, "high": 0, "medium": 0, "low": 0, "info": 0}
        if findings and "sarif" in findings:
            sarif = findings["sarif"]
            if isinstance(sarif, dict):
                total_findings = sarif.get("total_findings", 0)
return {
"run_id": run_id,
"workflow": workflow_name,
"status": status.get("status", "unknown"),
"is_completed": status.get("is_completed", False),
"total_findings": total_findings,
"severity_summary": severity_summary,
"scan_duration": status.get("updated_at", "")
if status.get("is_completed")
else "In progress",
"recommendations": (
[
"Review high and critical severity findings first",
"Implement security fixes based on finding recommendations",
"Re-run scan after applying fixes to verify remediation",
]
if total_findings > 0
else ["No security issues found"]
),
"mcp_analysis": True,
}
except Exception as exc: # pragma: no cover
logger.exception("MCP summary failed")
return {"error": f"Failed to summarize run: {exc}"}
@mcp.tool
async def get_run_status_mcp(run_id: str) -> Dict[str, Any]:
"""Return current status information for a Prefect run."""
try:
not_ready = _prefect_not_ready_status()
if not_ready:
return {
"error": "Prefect infrastructure not ready",
"prefect": not_ready,
}
status = await prefect_mgr.get_flow_run_status(run_id)
workflow_name = "unknown"
deployment_id = status.get("workflow", "")
for name, deployment in prefect_mgr.deployments.items():
if str(deployment) == str(deployment_id):
workflow_name = name
break
return {
"run_id": status["run_id"],
"workflow": workflow_name,
"status": status["status"],
"is_completed": status["is_completed"],
"is_failed": status["is_failed"],
"is_running": status["is_running"],
"created_at": status["created_at"],
"updated_at": status["updated_at"],
}
except Exception as exc:
logger.exception("MCP run status failed")
return {"error": f"Failed to get run status: {exc}"}
@mcp.tool
async def get_run_findings_mcp(run_id: str) -> Dict[str, Any]:
"""Return SARIF findings for a completed run."""
try:
not_ready = _prefect_not_ready_status()
if not_ready:
return {
"error": "Prefect infrastructure not ready",
"prefect": not_ready,
}
status = await prefect_mgr.get_flow_run_status(run_id)
if not status.get("is_completed"):
return {"error": f"Run {run_id} not completed. Status: {status.get('status')}"}
findings = await prefect_mgr.get_flow_run_findings(run_id)
workflow_name = "unknown"
deployment_id = status.get("workflow", "")
for name, deployment in prefect_mgr.deployments.items():
if str(deployment) == str(deployment_id):
workflow_name = name
break
metadata = {
"completion_time": status.get("updated_at"),
"workflow_version": "unknown",
}
info = prefect_mgr.workflows.get(workflow_name)
if info:
metadata["workflow_version"] = info.metadata.get("version", "unknown")
return {
"workflow": workflow_name,
"run_id": run_id,
"sarif": findings,
"metadata": metadata,
}
except Exception as exc:
logger.exception("MCP findings failed")
return {"error": f"Failed to retrieve findings: {exc}"}
@mcp.tool
async def list_recent_runs_mcp(
limit: int = 10,
workflow_name: str | None = None,
states: List[str] | None = None,
) -> Dict[str, Any]:
"""List recent Prefect runs with optional workflow/state filters."""
not_ready = _prefect_not_ready_status()
if not_ready:
return {
"runs": [],
"prefect": not_ready,
"message": "Prefect infrastructure is still initializing",
}
try:
limit_value = int(limit)
except (TypeError, ValueError):
limit_value = 10
limit_value = max(1, min(limit_value, 100))
deployment_map = {
str(deployment_id): workflow
for workflow, deployment_id in prefect_mgr.deployments.items()
}
deployment_filter_value = None
if workflow_name:
deployment_id = prefect_mgr.deployments.get(workflow_name)
if not deployment_id:
return {
"runs": [],
"prefect": get_prefect_status(),
"error": f"Workflow '{workflow_name}' has no registered deployment",
}
try:
deployment_filter_value = UUID(str(deployment_id))
except ValueError:
return {
"runs": [],
"prefect": get_prefect_status(),
"error": (
f"Deployment id '{deployment_id}' for workflow '{workflow_name}' is invalid"
),
}
    desired_state_types: List[StateType] = []
    include_all_states = False
    if states:
        for raw_state in states:
            if not raw_state:
                continue
            normalised = raw_state.strip().upper()
            if normalised == "ALL":
                # "ALL" disables state filtering entirely; skip the default
                # fallback below so the empty filter is preserved.
                include_all_states = True
                desired_state_types = []
                break
            try:
                desired_state_types.append(StateType[normalised])
            except KeyError:
                continue
    if not include_all_states and not desired_state_types:
        desired_state_types = [
            StateType.RUNNING,
            StateType.COMPLETED,
            StateType.FAILED,
            StateType.CANCELLED,
        ]
flow_filter = FlowRunFilter()
if desired_state_types:
flow_filter.state = FlowRunFilterState(
type=FlowRunFilterStateType(any_=desired_state_types)
)
if deployment_filter_value:
flow_filter.deployment_id = FlowRunFilterDeploymentId(
any_=[deployment_filter_value]
)
async with get_client() as client:
flow_runs = await client.read_flow_runs(
limit=limit_value,
flow_run_filter=flow_filter,
sort=FlowRunSort.START_TIME_DESC,
)
results: List[Dict[str, Any]] = []
for flow_run in flow_runs:
deployment_id = getattr(flow_run, "deployment_id", None)
workflow = deployment_map.get(str(deployment_id), "unknown")
state = getattr(flow_run, "state", None)
state_name = getattr(state, "name", None) if state else None
state_type = getattr(state, "type", None) if state else None
results.append(
{
"run_id": str(flow_run.id),
"workflow": workflow,
"deployment_id": str(deployment_id) if deployment_id else None,
"state": state_name or (state_type.name if state_type else None),
"state_type": state_type.name if state_type else None,
"is_completed": bool(getattr(state, "is_completed", lambda: False)()),
"is_running": bool(getattr(state, "is_running", lambda: False)()),
"is_failed": bool(getattr(state, "is_failed", lambda: False)()),
"created_at": getattr(flow_run, "created", None),
"updated_at": getattr(flow_run, "updated", None),
"expected_start_time": getattr(flow_run, "expected_start_time", None),
"start_time": getattr(flow_run, "start_time", None),
}
)
# Normalise datetimes to ISO 8601 strings for serialization
for entry in results:
for key in ("created_at", "updated_at", "expected_start_time", "start_time"):
value = entry.get(key)
if value is None:
continue
try:
entry[key] = value.isoformat()
except AttributeError:
entry[key] = str(value)
return {"runs": results, "prefect": get_prefect_status()}
@mcp.tool
async def get_fuzzing_stats_mcp(run_id: str) -> Dict[str, Any]:
"""Return fuzzing statistics for a run if available."""
not_ready = _prefect_not_ready_status()
if not_ready:
return {
"error": "Prefect infrastructure not ready",
"prefect": not_ready,
}
stats = fuzzing.fuzzing_stats.get(run_id)
if not stats:
return {"error": f"Fuzzing run not found: {run_id}"}
# Be resilient if a plain dict slipped into the cache
if isinstance(stats, dict):
return stats
if hasattr(stats, "model_dump"):
return stats.model_dump()
if hasattr(stats, "dict"):
return stats.dict()
# Last resort
return getattr(stats, "__dict__", {"run_id": run_id})
@mcp.tool
async def get_fuzzing_crash_reports_mcp(run_id: str) -> Dict[str, Any]:
"""Return crash reports collected for a fuzzing run."""
not_ready = _prefect_not_ready_status()
if not_ready:
return {
"error": "Prefect infrastructure not ready",
"prefect": not_ready,
}
reports = fuzzing.crash_reports.get(run_id)
if reports is None:
return {"error": f"Fuzzing run not found: {run_id}"}
return {"run_id": run_id, "crashes": [report.model_dump() for report in reports]}
@mcp.tool
async def get_backend_status_mcp() -> Dict[str, Any]:
"""Expose backend readiness, workflows, and registered MCP tools."""
status = get_prefect_status()
response: Dict[str, Any] = {"prefect": status}
if status.get("ready"):
response["workflows"] = list(prefect_mgr.workflows.keys())
try:
tools = await mcp._tool_manager.list_tools()
response["mcp_tools"] = sorted(tool.name for tool in tools)
except Exception as exc: # pragma: no cover - defensive logging
logger.debug("Failed to enumerate MCP tools: %s", exc)
return response
def create_mcp_transport_app() -> Starlette:
"""Build a Starlette app serving HTTP + SSE transports on one port."""
http_app = mcp.http_app(path="/", transport="streamable-http")
sse_app = create_sse_app(
server=mcp,
message_path="/messages",
sse_path="/",
auth=mcp.auth,
)
routes = [
Mount("/mcp", app=http_app),
Mount("/mcp/sse", app=sse_app),
]
@asynccontextmanager
async def lifespan(app: Starlette): # pragma: no cover - integration wiring
async with AsyncExitStack() as stack:
await stack.enter_async_context(
http_app.router.lifespan_context(http_app)
)
await stack.enter_async_context(
sse_app.router.lifespan_context(sse_app)
)
yield
combined_app = Starlette(routes=routes, lifespan=lifespan)
combined_app.state.fastmcp_server = mcp
combined_app.state.http_app = http_app
combined_app.state.sse_app = sse_app
return combined_app
# ---------------------------------------------------------------------------
# Combined lifespan: Prefect init + dedicated MCP transports
# ---------------------------------------------------------------------------
@asynccontextmanager
async def combined_lifespan(app: FastAPI):
global prefect_bootstrap_task, _fastapi_mcp_imported
logger.info("Starting FuzzForge backend...")
# Ensure FastAPI endpoints are exposed via MCP once
if not _fastapi_mcp_imported:
try:
await mcp.import_server(FASTAPI_MCP_ADAPTER)
_fastapi_mcp_imported = True
logger.info("Mounted FastAPI endpoints as MCP tools")
except Exception as exc:
logger.exception("Failed to import FastAPI endpoints into MCP", exc_info=exc)
# Kick off Prefect bootstrap in the background if needed
if prefect_bootstrap_task is None or prefect_bootstrap_task.done():
prefect_bootstrap_task = asyncio.create_task(_bootstrap_prefect_with_retries())
logger.info("Prefect bootstrap task started")
else:
logger.info("Prefect bootstrap task already running")
# Start MCP transports on shared port (HTTP + SSE)
mcp_app = create_mcp_transport_app()
mcp_config = uvicorn.Config(
app=mcp_app,
host="0.0.0.0",
port=8010,
log_level="info",
lifespan="on",
)
mcp_server = uvicorn.Server(mcp_config)
mcp_server.install_signal_handlers = lambda: None # type: ignore[assignment]
mcp_task = asyncio.create_task(mcp_server.serve())
async def _wait_for_uvicorn_startup() -> None:
started_attr = getattr(mcp_server, "started", None)
if hasattr(started_attr, "wait"):
await asyncio.wait_for(started_attr.wait(), timeout=10)
return
# Fallback for uvicorn versions where "started" is a bool
poll_interval = 0.1
checks = int(10 / poll_interval)
for _ in range(checks):
if getattr(mcp_server, "started", False):
return
await asyncio.sleep(poll_interval)
raise asyncio.TimeoutError
try:
await _wait_for_uvicorn_startup()
except asyncio.TimeoutError: # pragma: no cover - defensive logging
if mcp_task.done():
raise RuntimeError("MCP server failed to start") from mcp_task.exception()
logger.warning("Timed out waiting for MCP server startup; continuing anyway")
logger.info("MCP HTTP available at http://0.0.0.0:8010/mcp")
logger.info("MCP SSE available at http://0.0.0.0:8010/mcp/sse")
try:
yield
finally:
logger.info("Shutting down MCP transports...")
mcp_server.should_exit = True
mcp_server.force_exit = True
await asyncio.gather(mcp_task, return_exceptions=True)
if prefect_bootstrap_task and not prefect_bootstrap_task.done():
prefect_bootstrap_task.cancel()
with suppress(asyncio.CancelledError):
await prefect_bootstrap_task
prefect_bootstrap_state.task_running = False
if not prefect_bootstrap_state.ready:
prefect_bootstrap_state.status = "stopped"
prefect_bootstrap_state.next_retry_seconds = None
prefect_bootstrap_task = None
logger.info("Shutting down Prefect statistics monitor...")
await prefect_stats_monitor.stop_monitoring()
logger.info("Shutting down FuzzForge backend...")
app.router.lifespan_context = combined_lifespan
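
The retry loop in `_bootstrap_prefect_with_retries` doubles the wait on each failed attempt and clamps it at `STARTUP_RETRY_MAX_SECONDS`. A minimal sketch of that schedule, using the default base of 5 s and cap of 60 s:

```python
# Sketch of the bootstrap backoff schedule: base * 2**(attempt - 1), capped.
STARTUP_RETRY_SECONDS = 5
STARTUP_RETRY_MAX_SECONDS = 60

def backoff_schedule(attempts: int) -> list[int]:
    """Wait times (seconds) for the first `attempts` failed bootstrap tries."""
    return [
        min(STARTUP_RETRY_SECONDS * (2 ** (attempt - 1)), STARTUP_RETRY_MAX_SECONDS)
        for attempt in range(1, attempts + 1)
    ]

print(backoff_schedule(6))  # [5, 10, 20, 40, 60, 60]
```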


@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
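
The volume-mode handling in `submit_security_scan_mcp` above folds aliases such as `read-only` into the canonical `ro`/`rw` values and falls back to `ro` for anything the workflow's metadata does not list. An illustrative standalone sketch of those rules (not code from the repository):

```python
from typing import List, Optional

# Sketch of the volume-mode normalisation used when submitting a workflow:
# aliases collapse to "ro"/"rw"; other values are kept only if the workflow's
# metadata lists them in supported_volume_modes, otherwise "ro" wins.
def normalise_volume_mode(requested: Optional[str],
                          supported: Optional[List[str]] = None) -> str:
    supported = supported or ["ro", "rw"]
    mode = str(requested or "ro").strip().lower().replace("-", "_")
    if mode in {"read_only", "readonly", "ro"}:
        return "ro"
    if mode in {"read_write", "readwrite", "rw"}:
        return "rw"
    return mode if mode in supported else "ro"

print(normalise_volume_mode("Read-Only"))  # ro
print(normalise_volume_mode("readwrite"))  # rw
print(normalise_volume_mode("weird"))      # ro (not a supported mode)
```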


@@ -0,0 +1,182 @@
"""
Models for workflow findings and submissions
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
from pydantic import BaseModel, Field, field_validator
from typing import Dict, Any, Optional, Literal, List
from datetime import datetime
from pathlib import Path
class WorkflowFindings(BaseModel):
"""Findings from a workflow execution in SARIF format"""
workflow: str = Field(..., description="Workflow name")
run_id: str = Field(..., description="Unique run identifier")
sarif: Dict[str, Any] = Field(..., description="SARIF formatted findings")
metadata: Dict[str, Any] = Field(default_factory=dict, description="Additional metadata")
class ResourceLimits(BaseModel):
"""Resource limits for workflow execution"""
cpu_limit: Optional[str] = Field(None, description="CPU limit (e.g., '2' for 2 cores, '500m' for 0.5 cores)")
memory_limit: Optional[str] = Field(None, description="Memory limit (e.g., '1Gi', '512Mi')")
cpu_request: Optional[str] = Field(None, description="CPU request (guaranteed)")
memory_request: Optional[str] = Field(None, description="Memory request (guaranteed)")
class VolumeMount(BaseModel):
"""Volume mount specification"""
host_path: str = Field(..., description="Host path to mount")
container_path: str = Field(..., description="Container path for mount")
mode: Literal["ro", "rw"] = Field(default="ro", description="Mount mode")
@field_validator("host_path")
@classmethod
def validate_host_path(cls, v):
"""Validate that the host path is absolute (existence checked at runtime)"""
path = Path(v)
if not path.is_absolute():
raise ValueError(f"Host path must be absolute: {v}")
# Note: Path existence is validated at workflow runtime
# We can't validate existence here as this runs inside Docker container
return str(path)
@field_validator("container_path")
@classmethod
def validate_container_path(cls, v):
"""Validate that the container path is absolute"""
if not v.startswith('/'):
raise ValueError(f"Container path must be absolute: {v}")
return v
class WorkflowSubmission(BaseModel):
"""Submit a workflow with configurable settings"""
target_path: str = Field(..., description="Absolute path to analyze")
volume_mode: Literal["ro", "rw"] = Field(
default="ro",
description="Volume mount mode: read-only (ro) or read-write (rw)"
)
parameters: Dict[str, Any] = Field(
default_factory=dict,
description="Workflow-specific parameters"
)
timeout: Optional[int] = Field(
default=None, # Allow workflow-specific defaults
description="Timeout in seconds (None for workflow default)",
ge=1,
le=604800 # Max 7 days to support fuzzing campaigns
)
resource_limits: Optional[ResourceLimits] = Field(
None,
description="Resource limits for workflow container"
)
additional_volumes: List[VolumeMount] = Field(
default_factory=list,
description="Additional volume mounts (e.g., for corpus, output directories)"
)
@field_validator("target_path")
@classmethod
def validate_path(cls, v):
"""Validate that the target path is absolute (existence checked at runtime)"""
path = Path(v)
if not path.is_absolute():
raise ValueError(f"Path must be absolute: {v}")
# Note: Path existence is validated at workflow runtime when volumes are mounted
# We can't validate existence here as this runs inside Docker container
return str(path)
class WorkflowStatus(BaseModel):
"""Status of a workflow run"""
run_id: str = Field(..., description="Unique run identifier")
workflow: str = Field(..., description="Workflow name")
status: str = Field(..., description="Current status")
is_completed: bool = Field(..., description="Whether the run is completed")
is_failed: bool = Field(..., description="Whether the run failed")
is_running: bool = Field(..., description="Whether the run is currently running")
created_at: datetime = Field(..., description="Run creation time")
updated_at: datetime = Field(..., description="Last update time")
class WorkflowMetadata(BaseModel):
"""Complete metadata for a workflow"""
name: str = Field(..., description="Workflow name")
version: str = Field(..., description="Semantic version")
description: str = Field(..., description="Workflow description")
author: Optional[str] = Field(None, description="Workflow author")
tags: List[str] = Field(default_factory=list, description="Workflow tags")
parameters: Dict[str, Any] = Field(..., description="Parameters schema")
default_parameters: Dict[str, Any] = Field(
default_factory=dict,
description="Default parameter values"
)
required_modules: List[str] = Field(
default_factory=list,
description="Required module names"
)
supported_volume_modes: List[Literal["ro", "rw"]] = Field(
default=["ro", "rw"],
description="Supported volume mount modes"
)
has_custom_docker: bool = Field(
default=False,
description="Whether workflow has custom Dockerfile"
)
class WorkflowListItem(BaseModel):
"""Summary information for a workflow in list views"""
name: str = Field(..., description="Workflow name")
version: str = Field(..., description="Semantic version")
description: str = Field(..., description="Workflow description")
author: Optional[str] = Field(None, description="Workflow author")
tags: List[str] = Field(default_factory=list, description="Workflow tags")
class RunSubmissionResponse(BaseModel):
"""Response after submitting a workflow"""
run_id: str = Field(..., description="Unique run identifier")
status: str = Field(..., description="Initial status")
workflow: str = Field(..., description="Workflow name")
message: str = Field(default="Workflow submitted successfully")
class FuzzingStats(BaseModel):
"""Real-time fuzzing statistics"""
run_id: str = Field(..., description="Unique run identifier")
workflow: str = Field(..., description="Workflow name")
executions: int = Field(default=0, description="Total executions")
executions_per_sec: float = Field(default=0.0, description="Current execution rate")
crashes: int = Field(default=0, description="Total crashes found")
unique_crashes: int = Field(default=0, description="Unique crashes")
coverage: Optional[float] = Field(None, description="Code coverage percentage")
corpus_size: int = Field(default=0, description="Current corpus size")
elapsed_time: int = Field(default=0, description="Elapsed time in seconds")
last_crash_time: Optional[datetime] = Field(None, description="Time of last crash")
class CrashReport(BaseModel):
"""Individual crash report from fuzzing"""
run_id: str = Field(..., description="Run identifier")
crash_id: str = Field(..., description="Unique crash identifier")
timestamp: datetime = Field(default_factory=datetime.utcnow)
signal: Optional[str] = Field(None, description="Crash signal (SIGSEGV, etc.)")
crash_type: Optional[str] = Field(None, description="Type of crash")
stack_trace: Optional[str] = Field(None, description="Stack trace")
input_file: Optional[str] = Field(None, description="Path to crashing input")
reproducer: Optional[str] = Field(None, description="Minimized reproducer")
severity: str = Field(default="medium", description="Crash severity")
exploitability: Optional[str] = Field(None, description="Exploitability assessment")

@@ -0,0 +1,394 @@
"""
Generic Prefect Statistics Monitor Service
This service monitors ALL workflows for structured live data logging and
updates the appropriate statistics APIs. Works with any workflow that follows
the standard LIVE_STATS logging pattern.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import asyncio
import json
import logging
from datetime import datetime, timedelta, timezone
from typing import Dict, Any, Optional
from prefect.client.orchestration import get_client
from prefect.client.schemas.objects import FlowRun, TaskRun
from src.models.findings import FuzzingStats
from src.api.fuzzing import fuzzing_stats, initialize_fuzzing_tracking, active_connections
logger = logging.getLogger(__name__)
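The monitor only understands log lines that follow the LIVE_STATS convention described in the module docstring. A hedged sketch of what a workflow task might emit (the field names come from `FuzzingStats`; the logger name and helper are assumptions, not the real workflow code):

```python
import json
import logging

logger = logging.getLogger("workflow")


def emit_live_stats(executions: int, eps: float, crashes: int) -> None:
    """Log one stats snapshot in the form the monitor parses first:
    a JSON payload immediately after the LIVE_STATS marker."""
    payload = {
        "stats_type": "fuzzing_live_update",
        "executions": executions,
        "executions_per_sec": eps,
        "crashes": crashes,
    }
    logger.info("LIVE_STATS %s", json.dumps(payload))
```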
class PrefectStatsMonitor:
"""Monitors Prefect flows and tasks for live statistics from any workflow"""
def __init__(self):
self.monitoring = False
self.monitor_task = None
self.monitored_runs = set()
self.last_log_ts: Dict[str, datetime] = {}
self._client = None
self._client_refresh_time = None
self._client_refresh_interval = 300 # Refresh connection every 5 minutes
async def start_monitoring(self):
"""Start the Prefect statistics monitoring service"""
if self.monitoring:
logger.warning("Prefect stats monitor already running")
return
self.monitoring = True
self.monitor_task = asyncio.create_task(self._monitor_flows())
logger.info("Started Prefect statistics monitor")
async def stop_monitoring(self):
"""Stop the monitoring service"""
self.monitoring = False
if self.monitor_task:
self.monitor_task.cancel()
try:
await self.monitor_task
except asyncio.CancelledError:
pass
logger.info("Stopped Prefect statistics monitor")
async def _get_or_refresh_client(self):
"""Get or refresh Prefect client with connection pooling."""
now = datetime.now(timezone.utc)
if (self._client is None or
self._client_refresh_time is None or
(now - self._client_refresh_time).total_seconds() > self._client_refresh_interval):
if self._client:
try:
await self._client.aclose()
except Exception:
pass
self._client = get_client()
self._client_refresh_time = now
await self._client.__aenter__()
return self._client
async def _monitor_flows(self):
"""Main monitoring loop that watches Prefect flows"""
try:
while self.monitoring:
try:
# Use connection pooling for better performance
client = await self._get_or_refresh_client()
# Get recent flow runs (limit to reduce load)
flow_runs = await client.read_flow_runs(
limit=50,
sort="START_TIME_DESC",
)
# Only consider runs from the last 15 minutes
recent_cutoff = datetime.now(timezone.utc) - timedelta(minutes=15)
for flow_run in flow_runs:
created = getattr(flow_run, "created", None)
if created is None:
continue
try:
# Ensure timezone-aware comparison
if created.tzinfo is None:
created = created.replace(tzinfo=timezone.utc)
if created >= recent_cutoff:
await self._monitor_flow_run(client, flow_run)
except Exception:
# If comparison fails, attempt monitoring anyway
await self._monitor_flow_run(client, flow_run)
await asyncio.sleep(5) # Check every 5 seconds
except Exception as e:
logger.error(f"Error in Prefect monitoring: {e}")
await asyncio.sleep(10)
except asyncio.CancelledError:
logger.info("Prefect monitoring cancelled")
except Exception as e:
logger.error(f"Fatal error in Prefect monitoring: {e}")
finally:
# Clean up client on exit
if self._client:
try:
await self._client.__aexit__(None, None, None)
except Exception:
pass
self._client = None
async def _monitor_flow_run(self, client, flow_run: FlowRun):
"""Monitor a specific flow run for statistics"""
run_id = str(flow_run.id)
workflow_name = flow_run.name or "unknown"
try:
# Initialize tracking if not exists - only for workflows that might have live stats
if run_id not in fuzzing_stats:
initialize_fuzzing_tracking(run_id, workflow_name)
self.monitored_runs.add(run_id)
# Skip corrupted entries (should not happen after startup cleanup, but defensive)
elif not isinstance(fuzzing_stats[run_id], FuzzingStats):
logger.warning(f"Skipping corrupted stats entry for {run_id}, reinitializing")
initialize_fuzzing_tracking(run_id, workflow_name)
self.monitored_runs.add(run_id)
# Get task runs for this flow
task_runs = await client.read_task_runs(
flow_run_filter={"id": {"any_": [flow_run.id]}},
limit=25,
)
# Check all tasks for live statistics logging
for task_run in task_runs:
await self._extract_stats_from_task(client, run_id, task_run, workflow_name)
# Also scan flow-level logs as a fallback
await self._extract_stats_from_flow_logs(client, run_id, flow_run, workflow_name)
except Exception as e:
logger.warning(f"Error monitoring flow run {run_id}: {e}")
async def _extract_stats_from_task(self, client, run_id: str, task_run: TaskRun, workflow_name: str):
"""Extract statistics from any task that logs live stats"""
try:
# Get task run logs
logs = await client.read_logs(
log_filter={
"task_run_id": {"any_": [task_run.id]}
},
limit=100,
sort="TIMESTAMP_ASC"
)
# Parse logs for LIVE_STATS entries (generic pattern for any workflow)
latest_stats = None
for log in logs:
# Prefer structured extra field if present
extra_data = getattr(log, "extra", None) or getattr(log, "extra_fields", None)
if isinstance(extra_data, dict):
stat_type = extra_data.get("stats_type")
if stat_type in ["fuzzing_live_update", "scan_progress", "analysis_update", "live_stats"]:
latest_stats = extra_data
continue
# Fallback to parsing from message text
if ("FUZZ_STATS" in log.message or "LIVE_STATS" in log.message):
stats = self._parse_stats_from_log(log.message)
if stats:
latest_stats = stats
# Update statistics if we found any
if latest_stats:
# Calculate elapsed time from task start
elapsed_time = 0
if task_run.start_time:
# Ensure timezone-aware arithmetic
now = datetime.now(timezone.utc)
try:
elapsed_time = int((now - task_run.start_time).total_seconds())
except Exception:
# Fallback to naive UTC if types mismatch
elapsed_time = int((datetime.utcnow() - task_run.start_time.replace(tzinfo=None)).total_seconds())
updated_stats = FuzzingStats(
run_id=run_id,
workflow=workflow_name,
executions=latest_stats.get("executions", 0),
executions_per_sec=latest_stats.get("executions_per_sec", 0.0),
crashes=latest_stats.get("crashes", 0),
unique_crashes=latest_stats.get("unique_crashes", 0),
corpus_size=latest_stats.get("corpus_size", 0),
elapsed_time=elapsed_time
)
# Update the global stats
previous = fuzzing_stats.get(run_id)
fuzzing_stats[run_id] = updated_stats
# Broadcast to any active WebSocket clients for this run
if active_connections.get(run_id):
# Handle both Pydantic objects and plain dicts
if isinstance(updated_stats, dict):
stats_data = updated_stats
elif hasattr(updated_stats, 'model_dump'):
stats_data = updated_stats.model_dump()
elif hasattr(updated_stats, 'dict'):
stats_data = updated_stats.dict()
else:
stats_data = updated_stats.__dict__
message = {
"type": "stats_update",
"data": stats_data,
}
disconnected = []
for ws in active_connections[run_id]:
try:
await ws.send_text(json.dumps(message))
except Exception:
disconnected.append(ws)
# Clean up disconnected sockets
for ws in disconnected:
try:
active_connections[run_id].remove(ws)
except ValueError:
pass
logger.debug(f"Updated Prefect stats for {run_id}: {updated_stats.executions} execs")
except Exception as e:
logger.warning(f"Error extracting stats from task {task_run.id}: {e}")
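The broadcast loop above collects dead sockets during iteration and prunes them only afterwards, so the connection list is never mutated while being walked. The pattern in isolation (the fake socket class is illustrative):

```python
import asyncio
import json


class FlakySocket:
    """Stand-in for a WebSocket client that may have disconnected."""

    def __init__(self, alive: bool):
        self.alive = alive
        self.received = []

    async def send_text(self, text: str) -> None:
        if not self.alive:
            raise ConnectionError("client gone")
        self.received.append(text)


async def broadcast(connections: list, message: dict) -> None:
    payload = json.dumps(message)
    disconnected = []
    for ws in connections:
        try:
            await ws.send_text(payload)
        except Exception:
            disconnected.append(ws)
    # Prune only after the send loop, never during iteration.
    for ws in disconnected:
        connections.remove(ws)
```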
async def _extract_stats_from_flow_logs(self, client, run_id: str, flow_run: FlowRun, workflow_name: str):
"""Extract statistics by scanning flow-level logs for LIVE/FUZZ stats"""
try:
logs = await client.read_logs(
log_filter={
"flow_run_id": {"any_": [flow_run.id]}
},
limit=200,
sort="TIMESTAMP_ASC"
)
latest_stats = None
last_seen = self.last_log_ts.get(run_id)
max_ts = last_seen
for log in logs:
# Skip logs we've already processed
ts = getattr(log, "timestamp", None)
if last_seen and ts and ts <= last_seen:
continue
if ts and (max_ts is None or ts > max_ts):
max_ts = ts
# Prefer structured extra field if available
extra_data = getattr(log, "extra", None) or getattr(log, "extra_fields", None)
if isinstance(extra_data, dict):
stat_type = extra_data.get("stats_type")
if stat_type in ["fuzzing_live_update", "scan_progress", "analysis_update", "live_stats"]:
latest_stats = extra_data
continue
# Fallback to message parse
if ("FUZZ_STATS" in log.message or "LIVE_STATS" in log.message):
stats = self._parse_stats_from_log(log.message)
if stats:
latest_stats = stats
if max_ts:
self.last_log_ts[run_id] = max_ts
if latest_stats:
# Use flow_run timestamps for elapsed time if available
elapsed_time = 0
start_time = getattr(flow_run, "start_time", None)
if start_time:
now = datetime.now(timezone.utc)
try:
if start_time.tzinfo is None:
start_time = start_time.replace(tzinfo=timezone.utc)
elapsed_time = int((now - start_time).total_seconds())
except Exception:
elapsed_time = int((datetime.utcnow() - start_time.replace(tzinfo=None)).total_seconds())
updated_stats = FuzzingStats(
run_id=run_id,
workflow=workflow_name,
executions=latest_stats.get("executions", 0),
executions_per_sec=latest_stats.get("executions_per_sec", 0.0),
crashes=latest_stats.get("crashes", 0),
unique_crashes=latest_stats.get("unique_crashes", 0),
corpus_size=latest_stats.get("corpus_size", 0),
elapsed_time=elapsed_time
)
fuzzing_stats[run_id] = updated_stats
# Broadcast if listeners exist
if active_connections.get(run_id):
# Handle both Pydantic objects and plain dicts
if isinstance(updated_stats, dict):
stats_data = updated_stats
elif hasattr(updated_stats, 'model_dump'):
stats_data = updated_stats.model_dump()
elif hasattr(updated_stats, 'dict'):
stats_data = updated_stats.dict()
else:
stats_data = updated_stats.__dict__
message = {
"type": "stats_update",
"data": stats_data,
}
disconnected = []
for ws in active_connections[run_id]:
try:
await ws.send_text(json.dumps(message))
except Exception:
disconnected.append(ws)
for ws in disconnected:
try:
active_connections[run_id].remove(ws)
except ValueError:
pass
except Exception as e:
logger.warning(f"Error extracting stats from flow logs {run_id}: {e}")
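Both extraction paths normalise naive timestamps to UTC before subtracting, because mixing naive and aware datetimes raises `TypeError`. The guard, reduced to its essentials:

```python
from datetime import datetime, timezone


def elapsed_seconds(start_time: datetime) -> int:
    """Seconds since start_time, tolerating naive (tz-less) inputs."""
    if start_time.tzinfo is None:
        # Assume naive timestamps are UTC, matching how they are stored.
        start_time = start_time.replace(tzinfo=timezone.utc)
    return int((datetime.now(timezone.utc) - start_time).total_seconds())
```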
def _parse_stats_from_log(self, log_message: str) -> Optional[Dict[str, Any]]:
"""Parse statistics from a log message"""
try:
import re
# Prefer explicit JSON after marker tokens
m = re.search(r'(?:FUZZ_STATS|LIVE_STATS)\s+(\{.*\})', log_message)
if m:
try:
return json.loads(m.group(1))
except Exception:
pass
# Fallback: Extract the extra= dict and coerce to JSON
stats_match = re.search(r'extra=({.*?})', log_message)
if not stats_match:
return None
extra_str = stats_match.group(1)
# Best-effort coercion of a Python repr dict to JSON; values that
# contain quotes or the literal words None/True/False may still break.
extra_str = extra_str.replace("'", '"')
extra_str = extra_str.replace('None', 'null')
extra_str = extra_str.replace('True', 'true')
extra_str = extra_str.replace('False', 'false')
stats_data = json.loads(extra_str)
# Support multiple stat types for different workflows
stat_type = stats_data.get("stats_type")
if stat_type in ["fuzzing_live_update", "scan_progress", "analysis_update", "live_stats"]:
return stats_data
except Exception as e:
logger.debug(f"Error parsing log stats: {e}")
return None
# Global instance
prefect_stats_monitor = PrefectStatsMonitor()

19
backend/tests/conftest.py Normal file
@@ -0,0 +1,19 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import sys
from pathlib import Path
# Ensure project root is on sys.path so `src` is importable
ROOT = Path(__file__).resolve().parents[1]
if str(ROOT) not in sys.path:
sys.path.insert(0, str(ROOT))

@@ -0,0 +1,82 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import asyncio
from datetime import datetime, timezone, timedelta
from src.services.prefect_stats_monitor import PrefectStatsMonitor
from src.api import fuzzing
class FakeLog:
def __init__(self, message: str):
self.message = message
class FakeClient:
def __init__(self, logs):
self._logs = logs
async def read_logs(self, log_filter=None, limit=100, sort="TIMESTAMP_ASC"):
return self._logs
class FakeTaskRun:
def __init__(self):
self.id = "task-1"
self.start_time = datetime.now(timezone.utc) - timedelta(seconds=5)
def test_parse_stats_from_log_fuzzing():
mon = PrefectStatsMonitor()
msg = (
"INFO LIVE_STATS extra={'stats_type': 'fuzzing_live_update', "
"'executions': 42, 'executions_per_sec': 3.14, 'crashes': 1, 'unique_crashes': 1, 'corpus_size': 9}"
)
stats = mon._parse_stats_from_log(msg)
assert stats is not None
assert stats["stats_type"] == "fuzzing_live_update"
assert stats["executions"] == 42
def test_extract_stats_updates_and_broadcasts():
mon = PrefectStatsMonitor()
run_id = "run-123"
workflow = "wf"
fuzzing.initialize_fuzzing_tracking(run_id, workflow)
# Prepare a fake websocket to capture messages
sent = []
class FakeWS:
async def send_text(self, text: str):
sent.append(text)
fuzzing.active_connections[run_id] = [FakeWS()]
# Craft a log line the parser understands
msg = (
"INFO LIVE_STATS extra={'stats_type': 'fuzzing_live_update', "
"'executions': 10, 'executions_per_sec': 1.5, 'crashes': 0, 'unique_crashes': 0, 'corpus_size': 2}"
)
fake_client = FakeClient([FakeLog(msg)])
task_run = FakeTaskRun()
asyncio.run(mon._extract_stats_from_task(fake_client, run_id, task_run, workflow))
# Verify stats updated
stats = fuzzing.fuzzing_stats[run_id]
assert stats.executions == 10
assert stats.executions_per_sec == 1.5
# Verify a message was sent to WebSocket
assert sent, "Expected a stats_update message to be sent"

@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

@@ -0,0 +1,14 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
from .security_analyzer import SecurityAnalyzer
__all__ = ["SecurityAnalyzer"]

@@ -0,0 +1,368 @@
"""
Security Analyzer Module - Analyzes code for security vulnerabilities
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import logging
import re
from pathlib import Path
from typing import Dict, Any, List, Optional
try:
from toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
except ImportError:
try:
from modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
except ImportError:
from src.toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
logger = logging.getLogger(__name__)
class SecurityAnalyzer(BaseModule):
"""
Analyzes source code for common security vulnerabilities.
This module:
- Detects hardcoded secrets and credentials
- Identifies dangerous function calls
- Finds SQL injection vulnerabilities
- Detects insecure configurations
"""
def get_metadata(self) -> ModuleMetadata:
"""Get module metadata"""
return ModuleMetadata(
name="security_analyzer",
version="1.0.0",
description="Analyzes code for security vulnerabilities",
author="FuzzForge Team",
category="analyzer",
tags=["security", "vulnerabilities", "static-analysis"],
input_schema={
"file_extensions": {
"type": "array",
"items": {"type": "string"},
"description": "File extensions to analyze",
"default": [".py", ".js", ".java", ".php", ".rb", ".go"]
},
"check_secrets": {
"type": "boolean",
"description": "Check for hardcoded secrets",
"default": True
},
"check_sql": {
"type": "boolean",
"description": "Check for SQL injection risks",
"default": True
},
"check_dangerous_functions": {
"type": "boolean",
"description": "Check for dangerous function calls",
"default": True
}
},
output_schema={
"findings": {
"type": "array",
"description": "List of security findings"
}
},
requires_workspace=True
)
def validate_config(self, config: Dict[str, Any]) -> bool:
"""Validate module configuration"""
extensions = config.get("file_extensions", [])
if not isinstance(extensions, list):
raise ValueError("file_extensions must be a list")
return True
async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
"""
Execute the security analysis module.
Args:
config: Module configuration
workspace: Path to the workspace directory
Returns:
ModuleResult with security findings
"""
self.start_timer()
self.validate_workspace(workspace)
self.validate_config(config)
findings = []
files_analyzed = 0
# Get configuration
file_extensions = config.get("file_extensions", [".py", ".js", ".java", ".php", ".rb", ".go"])
check_secrets = config.get("check_secrets", True)
check_sql = config.get("check_sql", True)
check_dangerous = config.get("check_dangerous_functions", True)
logger.info(f"Analyzing files with extensions: {file_extensions}")
try:
# Analyze each file
for ext in file_extensions:
for file_path in workspace.rglob(f"*{ext}"):
if not file_path.is_file():
continue
files_analyzed += 1
relative_path = file_path.relative_to(workspace)
try:
content = file_path.read_text(encoding='utf-8', errors='ignore')
lines = content.splitlines()
# Check for secrets
if check_secrets:
secret_findings = self._check_hardcoded_secrets(
content, lines, relative_path
)
findings.extend(secret_findings)
# Check for SQL injection
if check_sql and ext in [".py", ".php", ".java", ".js"]:
sql_findings = self._check_sql_injection(
content, lines, relative_path
)
findings.extend(sql_findings)
# Check for dangerous functions
if check_dangerous:
dangerous_findings = self._check_dangerous_functions(
content, lines, relative_path, ext
)
findings.extend(dangerous_findings)
except Exception as e:
logger.error(f"Error analyzing file {relative_path}: {e}")
# Create summary
summary = {
"files_analyzed": files_analyzed,
"total_findings": len(findings),
"extensions_scanned": file_extensions
}
return self.create_result(
findings=findings,
status="success" if files_analyzed > 0 else "partial",
summary=summary,
metadata={
"workspace": str(workspace),
"config": config
}
)
except Exception as e:
logger.error(f"Security analyzer failed: {e}")
return self.create_result(
findings=findings,
status="failed",
error=str(e)
)
def _check_hardcoded_secrets(
self, content: str, lines: List[str], file_path: Path
) -> List[ModuleFinding]:
"""
Check for hardcoded secrets in code.
Args:
content: File content
lines: File lines
file_path: Relative file path
Returns:
List of findings
"""
findings = []
# Patterns for secrets
secret_patterns = [
(r'api[_-]?key\s*=\s*["\']([^"\']{20,})["\']', 'API Key'),
(r'api[_-]?secret\s*=\s*["\']([^"\']{20,})["\']', 'API Secret'),
(r'password\s*=\s*["\']([^"\']+)["\']', 'Hardcoded Password'),
(r'token\s*=\s*["\']([^"\']{20,})["\']', 'Authentication Token'),
(r'aws[_-]?access[_-]?key\s*=\s*["\']([^"\']+)["\']', 'AWS Access Key'),
(r'aws[_-]?secret[_-]?key\s*=\s*["\']([^"\']+)["\']', 'AWS Secret Key'),
(r'private[_-]?key\s*=\s*["\']([^"\']+)["\']', 'Private Key'),
(r'["\']([A-Za-z0-9]{32,})["\']', 'Potential Secret Hash'),
(r'Bearer\s+([A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+)', 'JWT Token'),
]
for pattern, secret_type in secret_patterns:
for match in re.finditer(pattern, content, re.IGNORECASE):
# Find line number
line_num = content[:match.start()].count('\n') + 1
line_content = lines[line_num - 1] if line_num <= len(lines) else ""
# Skip common false positives
if self._is_false_positive_secret(match.group(0)):
continue
findings.append(self.create_finding(
title=f"Hardcoded {secret_type} detected",
description=f"Found potential hardcoded {secret_type} in {file_path}",
severity="high" if "key" in secret_type.lower() else "medium",
category="hardcoded_secret",
file_path=str(file_path),
line_start=line_num,
code_snippet=line_content.strip()[:100],
recommendation=f"Remove hardcoded {secret_type} and use environment variables or secure vault",
metadata={"secret_type": secret_type}
))
return findings
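Each checker maps a regex match offset back to a 1-based line number by counting the newlines that precede the match. That idiom on its own:

```python
import re


def line_of_match(content: str, match: re.Match) -> int:
    """Return the 1-based line number where a regex match starts."""
    return content[: match.start()].count("\n") + 1


content = 'x = 1\npassword = "hunter2"\ny = 2\n'
match = re.search(r'password\s*=\s*["\']([^"\']+)["\']', content)
```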
def _check_sql_injection(
self, content: str, lines: List[str], file_path: Path
) -> List[ModuleFinding]:
"""
Check for potential SQL injection vulnerabilities.
Args:
content: File content
lines: File lines
file_path: Relative file path
Returns:
List of findings
"""
findings = []
# SQL injection patterns
sql_patterns = [
(r'(SELECT|INSERT|UPDATE|DELETE).*\+\s*[\'"]?\s*\+?\s*\w+', 'String concatenation in SQL'),
(r'(SELECT|INSERT|UPDATE|DELETE).*%\s*[\'"]?\s*%?\s*\w+', 'String formatting in SQL'),
(r'f[\'"].*?(SELECT|INSERT|UPDATE|DELETE).*?\{.*?\}', 'F-string in SQL query'),
(r'query\s*=.*?\+', 'Dynamic query building'),
(r'execute\s*\(.*?\+.*?\)', 'Dynamic execute statement'),
]
for pattern, vuln_type in sql_patterns:
for match in re.finditer(pattern, content, re.IGNORECASE):
line_num = content[:match.start()].count('\n') + 1
line_content = lines[line_num - 1] if line_num <= len(lines) else ""
findings.append(self.create_finding(
title=f"Potential SQL Injection: {vuln_type}",
description=f"Detected potential SQL injection vulnerability via {vuln_type}",
severity="high",
category="sql_injection",
file_path=str(file_path),
line_start=line_num,
code_snippet=line_content.strip()[:100],
recommendation="Use parameterized queries or prepared statements instead",
metadata={"vulnerability_type": vuln_type}
))
return findings
def _check_dangerous_functions(
self, content: str, lines: List[str], file_path: Path, ext: str
) -> List[ModuleFinding]:
"""
Check for dangerous function calls.
Args:
content: File content
lines: File lines
file_path: Relative file path
ext: File extension
Returns:
List of findings
"""
findings = []
# Language-specific dangerous functions
dangerous_functions = {
".py": [
(r'eval\s*\(', 'eval()', 'Arbitrary code execution'),
(r'exec\s*\(', 'exec()', 'Arbitrary code execution'),
(r'os\.system\s*\(', 'os.system()', 'Command injection risk'),
(r'subprocess\.call\s*\(.*shell=True', 'subprocess with shell=True', 'Command injection risk'),
(r'pickle\.loads?\s*\(', 'pickle.load()', 'Deserialization vulnerability'),
],
".js": [
(r'eval\s*\(', 'eval()', 'Arbitrary code execution'),
(r'new\s+Function\s*\(', 'new Function()', 'Arbitrary code execution'),
(r'innerHTML\s*=', 'innerHTML', 'XSS vulnerability'),
(r'document\.write\s*\(', 'document.write()', 'XSS vulnerability'),
],
".php": [
(r'eval\s*\(', 'eval()', 'Arbitrary code execution'),
(r'exec\s*\(', 'exec()', 'Command execution'),
(r'system\s*\(', 'system()', 'Command execution'),
(r'shell_exec\s*\(', 'shell_exec()', 'Command execution'),
(r'\$_GET\[', 'Direct $_GET usage', 'Input validation missing'),
(r'\$_POST\[', 'Direct $_POST usage', 'Input validation missing'),
]
}
if ext in dangerous_functions:
for pattern, func_name, risk_type in dangerous_functions[ext]:
for match in re.finditer(pattern, content):
line_num = content[:match.start()].count('\n') + 1
line_content = lines[line_num - 1] if line_num <= len(lines) else ""
findings.append(self.create_finding(
title=f"Dangerous function: {func_name}",
description=f"Use of potentially dangerous function {func_name}: {risk_type}",
severity="medium",
category="dangerous_function",
file_path=str(file_path),
line_start=line_num,
code_snippet=line_content.strip()[:100],
recommendation=f"Consider safer alternatives to {func_name}",
metadata={
"function": func_name,
"risk": risk_type
}
))
return findings
def _is_false_positive_secret(self, value: str) -> bool:
"""
Check if a potential secret is likely a false positive.
Args:
value: Potential secret value
Returns:
True if likely false positive
"""
false_positive_patterns = [
'example',
'test',
'demo',
'sample',
'dummy',
'placeholder',
'xxx',
'123',
'change',
'your',
'here'
]
value_lower = value.lower()
return any(pattern in value_lower for pattern in false_positive_patterns)
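The false-positive filter above is a plain substring blacklist: any candidate containing a marker such as `example` or `dummy` is dropped before a finding is emitted. Reduced to a free function:

```python
FALSE_POSITIVE_MARKERS = (
    "example", "test", "demo", "sample", "dummy",
    "placeholder", "xxx", "123", "change", "your", "here",
)


def is_false_positive_secret(value: str) -> bool:
    """True when a candidate secret is almost certainly a placeholder."""
    value_lower = value.lower()
    return any(marker in value_lower for marker in FALSE_POSITIVE_MARKERS)
```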

@@ -0,0 +1,25 @@
"""
Android Security Modules
This package contains modules for Android static code analysis and security testing.
Available modules:
- MobSF: Mobile Security Framework
- Jadx: Dex to Java decompiler
- OpenGrep: Open-source pattern-based static analysis tool
"""
from typing import List, Type
from ..base import BaseModule
# Module registry for automatic discovery
ANDROID_MODULES: List[Type[BaseModule]] = []
def register_module(module_class: Type[BaseModule]):
"""Register a android security module"""
ANDROID_MODULES.append(module_class)
return module_class
def get_available_modules() -> List[Type[BaseModule]]:
"""Get all available android modules"""
return ANDROID_MODULES.copy()
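Modules opt into discovery by decorating themselves with `register_module`. A minimal, self-contained sketch of the same registry pattern (the `JadxModule` class here is illustrative, not the real implementation):

```python
from typing import List, Type


class BaseModule:
    """Stand-in for toolbox.modules.base.BaseModule."""


ANDROID_MODULES: List[Type[BaseModule]] = []


def register_module(module_class: Type[BaseModule]) -> Type[BaseModule]:
    """Append the class to the registry and return it unchanged."""
    ANDROID_MODULES.append(module_class)
    return module_class


@register_module
class JadxModule(BaseModule):
    """Hypothetical decompiler module; registration is the point here."""


def get_available_modules() -> List[Type[BaseModule]]:
    # Return a copy so callers cannot mutate the registry itself.
    return ANDROID_MODULES.copy()
```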

@@ -0,0 +1,15 @@
rules:
- id: clipboard-sensitive-data
severity: WARNING
languages: [java]
message: "Sensitive data may be copied to the clipboard."
metadata:
authors:
- Guerric ELOI (FuzzingLabs)
category: security
area: clipboard
verification-level: [L1]
paths:
include:
- "**/*.java"
pattern: "$CLIPBOARD.setPrimaryClip($CLIP)"

@@ -0,0 +1,23 @@
rules:
- id: hardcoded-secrets
severity: WARNING
languages: [java]
message: "Possible hardcoded secret found in variable '$NAME'."
metadata:
authors:
- Guerric ELOI (FuzzingLabs)
owasp-mobile: M2
category: secrets
verification-level: [L1]
paths:
include:
- "**/*.java"
patterns:
- pattern-either:
- pattern: 'String $NAME = "$VAL";'
- pattern: 'final String $NAME = "$VAL";'
- pattern: 'private String $NAME = "$VAL";'
- pattern: 'public static String $NAME = "$VAL";'
- pattern: 'static final String $NAME = "$VAL";'
- pattern-regex: "$NAME =~ /(?i).*(api|key|token|secret|pass|auth|session|bearer|access|private).*/"

@@ -0,0 +1,18 @@
rules:
- id: insecure-data-storage
severity: WARNING
languages: [java]
message: "Potential insecure data storage (external storage)."
metadata:
authors:
- Guerric ELOI (FuzzingLabs)
owasp-mobile: M2
category: security
area: storage
verification-level: [L1]
paths:
include:
- "**/*.java"
pattern-either:
- pattern: "$CTX.openFileOutput($NAME, $MODE)"
- pattern: "Environment.getExternalStorageDirectory()"

@@ -0,0 +1,16 @@
rules:
- id: insecure-deeplink
severity: WARNING
languages: [xml]
message: "Potential insecure deeplink found in intent-filter."
metadata:
authors:
- Guerric ELOI (FuzzingLabs)
category: component
area: manifest
verification-level: [L1]
paths:
include:
- "**/AndroidManifest.xml"
pattern: |
<intent-filter>

@@ -0,0 +1,21 @@
rules:
- id: insecure-logging
severity: WARNING
languages: [java]
message: "Sensitive data logged via Android Log API."
metadata:
authors:
- Guerric ELOI (FuzzingLabs)
owasp-mobile: M2
category: logging
verification-level: [L1]
paths:
include:
- "**/*.java"
patterns:
- pattern-either:
- pattern: "Log.d($TAG, $MSG)"
- pattern: "Log.e($TAG, $MSG)"
- pattern: "System.out.println($MSG)"
      - metavariable-regex:
          metavariable: $MSG
          regex: (?i).*(password|token|secret|api|auth|session).*


@@ -0,0 +1,15 @@
rules:
- id: intent-redirection
severity: WARNING
languages: [java]
message: "Potential intent redirection: using getIntent().getExtras() without validation."
metadata:
authors:
- Guerric ELOI (FuzzingLabs)
category: intent
area: intercomponent
verification-level: [L1]
paths:
include:
- "**/*.java"
pattern: "$ACT.getIntent().getExtras()"


@@ -0,0 +1,18 @@
rules:
- id: sensitive-data-in-shared-preferences
severity: WARNING
languages: [java]
message: "Sensitive data may be stored in SharedPreferences. Please review the key '$KEY'."
metadata:
authors:
- Guerric ELOI (FuzzingLabs)
owasp-mobile: M2
category: security
area: storage
verification-level: [L1]
paths:
include:
- "**/*.java"
patterns:
- pattern: "$EDITOR.putString($KEY, $VAL);"
      - metavariable-regex:
          metavariable: $KEY
          regex: (?i).*(username|password|pass|token|auth_token|api_key|secret|sessionid|email).*


@@ -0,0 +1,21 @@
rules:
- id: sqlite-injection
severity: ERROR
languages: [java]
message: "Possible SQL injection: concatenated input in rawQuery or execSQL."
metadata:
authors:
- Guerric ELOI (FuzzingLabs)
owasp-mobile: M7
category: injection
area: database
verification-level: [L1]
paths:
include:
- "**/*.java"
patterns:
- pattern-either:
- pattern: "$DB.rawQuery($QUERY, ...)"
- pattern: "$DB.execSQL($QUERY)"
      - metavariable-regex:
          metavariable: $QUERY
          regex: '.*".*".*\+.*'
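The concatenation heuristic in the last clause (a double-quoted literal followed somewhere by `+`) can be checked against sample Java lines with plain Python (illustrative only; the actual matching happens in the rule engine):

```python
import re

# The rule's regex: a string literal in the query argument that is
# concatenated with something else.
CONCAT = re.compile(r'.*".*".*\+.*')

vulnerable = 'db.rawQuery("SELECT * FROM users WHERE name = \'" + user + "\'", null);'
parameterized = 'db.rawQuery("SELECT * FROM users WHERE id = ?", args);'

print(bool(CONCAT.search(vulnerable)))     # True
print(bool(CONCAT.search(parameterized)))  # False
```

The parameterized form is the fix the rule is nudging toward: bound arguments never reach the query string, so the heuristic stays silent.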


@@ -0,0 +1,16 @@
rules:
- id: vulnerable-activity
severity: WARNING
languages: [xml]
message: "Activity exported without permission."
metadata:
authors:
- Guerric ELOI (FuzzingLabs)
category: component
area: manifest
verification-level: [L1]
paths:
include:
- "**/AndroidManifest.xml"
pattern: |
<activity android:exported="true"


@@ -0,0 +1,16 @@
rules:
- id: vulnerable-content-provider
severity: WARNING
languages: [xml]
message: "ContentProvider exported without permission."
metadata:
authors:
- Guerric ELOI (FuzzingLabs)
category: component
area: manifest
verification-level: [L1]
paths:
include:
- "**/AndroidManifest.xml"
pattern: |
<provider android:exported="true"


@@ -0,0 +1,16 @@
rules:
- id: vulnerable-service
severity: WARNING
languages: [xml]
message: "Service exported without permission."
metadata:
authors:
- Guerric ELOI (FuzzingLabs)
category: component
area: manifest
verification-level: [L1]
paths:
include:
- "**/AndroidManifest.xml"
pattern: |
<service android:exported="true"


@@ -0,0 +1,16 @@
rules:
- id: webview-javascript-enabled
severity: ERROR
languages: [java]
message: "WebView with JavaScript enabled can be dangerous if loading untrusted content."
metadata:
authors:
- Guerric ELOI (FuzzingLabs)
owasp-mobile: M7
category: webview
area: ui
verification-level: [L1]
paths:
include:
- "**/*.java"
pattern: "$W.getSettings().setJavaScriptEnabled(true)"


@@ -0,0 +1,16 @@
rules:
- id: webview-load-arbitrary-url
severity: WARNING
languages: [java]
message: "Loading unvalidated URL in WebView may cause open redirect or XSS."
metadata:
authors:
- Guerric ELOI (FuzzingLabs)
owasp-mobile: M7
category: webview
area: ui
verification-level: [L1]
paths:
include:
- "**/*.java"
pattern: "$W.loadUrl($URL)"


@@ -0,0 +1,197 @@
"""Jadx APK Decompilation Module"""
import asyncio
import shutil
from pathlib import Path
from typing import Dict, Any
import logging
from ..base import BaseModule, ModuleMetadata, ModuleResult
from . import register_module
logger = logging.getLogger(__name__)
@register_module
class JadxModule(BaseModule):
"""Module responsible for decompiling APK files with Jadx"""
def get_metadata(self) -> ModuleMetadata:
return ModuleMetadata(
name="jadx",
version="1.5.0",
description="Android APK decompilation using Jadx",
author="FuzzForge Team",
category="android",
tags=["android", "jadx", "decompilation", "reverse"],
input_schema={
"type": "object",
"properties": {
"apk_path": {
"type": "string",
"description": "Path to the APK to decompile (absolute or relative to workspace)",
},
"output_dir": {
"type": "string",
"description": "Directory (relative to workspace) where Jadx output should be written",
"default": "jadx_output",
},
"overwrite": {
"type": "boolean",
"description": "Overwrite existing output directory if present",
"default": True,
},
"threads": {
"type": "integer",
"description": "Number of Jadx decompilation threads",
"default": 4,
},
"decompiler_args": {
"type": "array",
"items": {"type": "string"},
"description": "Additional arguments passed directly to Jadx",
},
},
"required": ["apk_path"],
},
output_schema={
"type": "object",
"properties": {
"output_dir": {"type": "string"},
"source_dir": {"type": "string"},
"resource_dir": {"type": "string"},
},
},
)
def validate_config(self, config: Dict[str, Any]) -> bool:
apk_path = config.get("apk_path")
if not apk_path:
raise ValueError("'apk_path' must be provided for Jadx decompilation")
threads = config.get("threads", 4)
if not isinstance(threads, int) or threads < 1 or threads > 32:
raise ValueError("threads must be between 1 and 32")
return True
async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
self.start_timer()
try:
self.validate_config(config)
workspace = workspace.resolve()
if not workspace.exists():
raise ValueError(f"Workspace does not exist: {workspace}")
apk_path = Path(config["apk_path"])
if not apk_path.is_absolute():
apk_path = (workspace / apk_path).resolve()
if not apk_path.exists():
raise ValueError(f"APK not found: {apk_path}")
if apk_path.is_dir():
raise ValueError(f"APK path must be a file, not a directory: {apk_path}")
output_dir = Path(config.get("output_dir", "jadx_output"))
if not output_dir.is_absolute():
output_dir = (workspace / output_dir).resolve()
if output_dir.exists():
if config.get("overwrite", True):
shutil.rmtree(output_dir)
else:
raise ValueError(
f"Output directory already exists: {output_dir}. Set overwrite=true to replace it."
)
output_dir.mkdir(parents=True, exist_ok=True)
threads = str(config.get("threads", 4))
extra_args = config.get("decompiler_args", []) or []
cmd = [
"jadx",
"--threads-count",
threads,
"--deobf",
"--output-dir",
str(output_dir),
]
cmd.extend(extra_args)
cmd.append(str(apk_path))
logger.info("Running Jadx decompilation: %s", " ".join(cmd))
process = await asyncio.create_subprocess_exec(
*cmd,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
cwd=str(workspace),
)
stdout, stderr = await process.communicate()
stdout_str = stdout.decode(errors="ignore") if stdout else ""
stderr_str = stderr.decode(errors="ignore") if stderr else ""
if stdout_str:
logger.debug("Jadx stdout: %s", stdout_str[:200])
if stderr_str:
logger.debug("Jadx stderr: %s", stderr_str[:200])
if process.returncode != 0:
error_output = stderr_str or stdout_str or "No error output"
raise RuntimeError(
f"Jadx failed with exit code {process.returncode}: {error_output[:500]}"
)
source_dir = output_dir / "sources"
resource_dir = output_dir / "resources"
if not source_dir.exists():
logger.warning("Jadx sources directory not found at expected path: %s", source_dir)
else:
sample_files = []
for idx, file_path in enumerate(source_dir.rglob("*.java")):
sample_files.append(str(file_path))
if idx >= 4:
break
logger.info("Sample Jadx Java files: %s", sample_files or "<none>")
java_files = 0
if source_dir.exists():
java_files = sum(1 for _ in source_dir.rglob("*.java"))
summary = {
"output_dir": str(output_dir),
"source_dir": str(source_dir if source_dir.exists() else output_dir),
"resource_dir": str(resource_dir if resource_dir.exists() else output_dir),
"java_files": java_files,
}
metadata = {
"apk_path": str(apk_path),
"output_dir": str(output_dir),
"source_dir": summary["source_dir"],
"resource_dir": summary["resource_dir"],
"threads": threads,
}
return self.create_result(
findings=[],
status="success",
summary=summary,
metadata=metadata,
)
except Exception as exc:
logger.error("Jadx module failed: %s", exc)
return self.create_result(
findings=[],
status="failed",
error=str(exc),
)
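For reference, the jadx invocation the module assembles can be sketched as a small pure function (a hypothetical `build_jadx_cmd` helper mirroring the flags used above, handy for unit-testing the command construction without running jadx):

```python
from pathlib import Path
from typing import List, Optional

def build_jadx_cmd(apk: Path, out_dir: Path, threads: int = 4,
                   extra: Optional[List[str]] = None) -> List[str]:
    """Assemble the jadx CLI call: thread count, deobfuscation, output dir."""
    cmd = [
        "jadx",
        "--threads-count", str(threads),
        "--deobf",
        "--output-dir", str(out_dir),
    ]
    cmd.extend(extra or [])  # pass-through for decompiler_args
    cmd.append(str(apk))     # target APK always comes last
    return cmd

print(build_jadx_cmd(Path("app.apk"), Path("jadx_output"), threads=8))
```

Keeping the command assembly pure makes the subprocess call itself the only untestable part of the module.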


@@ -0,0 +1,293 @@
from pathlib import Path
from typing import Dict, Any
from toolbox.modules.base import BaseModule, ModuleResult, ModuleMetadata, ModuleFinding
import requests
import os
import time
import json
from collections import Counter
"""
TODO:
* Configure workspace storage for apk and reports
* Think about mobsf repo implementation inside workflow
* Curl mobsf pdf report
* Save Json mobsf report
* Export Web server interface from the Workflow docker
"""
class MobSFModule(BaseModule):
def __init__(self):
self.mobsf_url = "http://localhost:8877"
self.file_path = ""
self.api_key = ""
self.scan_id = None
self.scan_hash = ""
self.report_file = ""
self._metadata = self.get_metadata()
        self.start_timer()
def upload_file(self):
"""
Upload file to MobSF VM
Returns scan hash if upload succeeded
"""
# Ensure file_path is set and valid
if not self.file_path or not os.path.isfile(self.file_path):
raise ValueError("Invalid or missing file_path for upload.")
# Don't set Content-Type manually - let requests handle it
# MobSF expects API key in X-Mobsf-Api-Key header
headers = {'X-Mobsf-Api-Key': self.api_key}
# Keep the file open during the entire request
with open(self.file_path, 'rb') as f:
f.seek(0)
# Extract just the filename from the full path
filename = os.path.basename(self.file_path)
files = {'file': (filename, f, 'application/vnd.android.package-archive')}
# Make the request while the file is still open
response = requests.post(f"{self.mobsf_url}/api/v1/upload", files=files, headers=headers)
if response.status_code == 200:
resp_json = response.json()
if resp_json.get('hash'):
print("[+] Upload succeeded, scan hash:", resp_json['hash'])
return resp_json['hash']
else:
raise Exception(f"File upload failed: {resp_json}")
else:
raise Exception(f"Failed to upload file: {response.text}")
    def start_scan(self, re_scan: int = 0, max_attempts: int = 10, delay: int = 3):
        """
        Scan a file that is already uploaded, retrying while the scan is not ready.
        Returns the scan result or raises an Exception after max_attempts.
        """
        print("[+] Starting scan for hash", self.scan_hash)
        data = {'hash': self.scan_hash}
        headers = {'X-Mobsf-Api-Key': self.api_key}
        for attempt in range(1, max_attempts + 1):
            response = requests.post(f"{self.mobsf_url}/api/v1/scan", data=data, headers=headers)
            if response.status_code == 200:
                try:
                    result = response.json()
                    if result:
                        print("[+] Scan succeeded for hash", self.scan_hash)
                        return result
                except ValueError as e:
                    print(f"Error parsing scan result: {e}")
            print(f"[-] Scan not ready (attempt {attempt}/{max_attempts}), retrying in {delay}s")
            time.sleep(delay)
        raise Exception(f"Scan did not complete after {max_attempts} attempts")
def get_json_results(self):
"""
Retrieve JSON results for the scanned file
"""
headers = {'X-Mobsf-Api-Key': self.api_key}
data = {'hash': self.scan_hash}
response = requests.post(f"{self.mobsf_url}/api/v1/report_json", data=data, headers=headers)
if response.status_code == 200:
            with open('dump.json', 'w') as f:
                f.write(json.dumps(response.json(), indent=2))
print("[+] Retrieved JSON results")
return response.json()
else:
raise Exception(f"Failed to retrieve JSON results: {response.text}")
def create_summary(self, findings):
"""
Summarize findings by severity.
Returns a dict like {'high': 3, 'info': 2, ...}
"""
severity_counter = Counter()
for finding in findings:
sev = getattr(finding, "severity", None)
if sev is None and isinstance(finding, dict):
sev = finding.get("severity")
if sev:
severity_counter[sev] += 1
res = dict(severity_counter)
print("Total Findings:", len(findings))
print("Severity counts:")
print(res)
return res
def parse_json_results(self):
        if self.report_file == "" or not os.path.isfile(self.report_file):
            raise ValueError("Invalid or missing report_file for parsing.")
        with open(self.report_file, 'r') as f:
            data = json.load(f)
findings = []
# Check specific sections
sections_to_parse = ['permissions', 'manifest_analysis', 'code_analysis', 'behaviour']
for section_name in sections_to_parse:
if section_name in data:
section = data[section_name]
#Permissions
if section_name == 'permissions':
for name, attrs in section.items():
findings.append(self.create_finding(
title=name,
description=attrs.get('description'),
severity=attrs.get('status'),
category="permission",
metadata={
'info': attrs.get('info'),
}
))
#Manifest Analysis
elif section_name == 'manifest_analysis':
findings_list = section.get('manifest_findings', [])
                    for attrs in findings_list:
findings.append(self.create_finding(
title=attrs.get('title') or attrs.get('name') or "unknown",
description=attrs.get('description', "No description"),
severity=attrs.get('severity', "unknown"),
category=section_name,
metadata={
'tag': attrs.get('rule')
}))
#Code Analysis
elif section_name == 'code_analysis':
                    findings_list = section.get('findings', {})
for name, attrs in findings_list.items():
metadata = attrs.get('metadata', {})
findings.append(self.create_finding(
title=name,
description=metadata.get('description'),
severity=metadata.get('severity'),
category="code_analysis",
metadata={
'cwe': metadata.get('cwe'),
'owasp': metadata.get('owasp'),
'files': attrs.get('file')
}))
#Behaviour
elif section_name == 'behaviour':
                    for key, value in section.items():
                        metadata = value.get('metadata', {})
                        findings.append(self.create_finding(
                            title="behaviour_" + (metadata.get('label') or ["unknown"])[0],
description=metadata.get('description'),
severity=metadata.get('severity'),
category="behaviour",
metadata={
'file': value.get('files', {})
}
))
return findings
    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        findings = []
        # Resolve server URL, target file and API key (config first, then env)
        self.mobsf_url = config.get("mobsf_url", self.mobsf_url)
        self.file_path = config.get("file_path", "")
        self.api_key = config.get("api_key", "") or os.environ.get("MOBSF_API_KEY", "")
        # Check that the file to scan exists
        file_path = self.file_path
        if not file_path or not os.path.isfile(file_path):
            raise ValueError(f"Invalid or missing file_path in configuration: {file_path}")
try:
self.scan_hash = self.upload_file()
except Exception as e:
raise Exception(f"Failed to upload file to MobSF: {e}")
        if self.scan_hash == "":
            raise Exception("scan_hash not returned after upload.")
try:
scan_result = self.start_scan()
except Exception as e:
raise Exception(f"Failed to scan file in MobSF: {e}")
        # Retrieve the full JSON report and convert it into findings
        try:
            json_data = self.get_json_results()
        except json.JSONDecodeError:
            return self.create_result(
                findings=[],
                status="failed",
                summary={"error": "Invalid JSON output from MobSF"},
                metadata={"engine": "mobsf", "file_scanned": file_path, "mobsf_url": self.mobsf_url}
            )
self.report_file = 'dump.json'
findings = self.parse_json_results()
"""
findings.append(ModuleFinding(
title="MobSF Finding",
description="Finding generated by the MobSF module",
severity="medium",
category="mobsf",
metadata={"scan_result": scan_result}
))
"""
tmp_summary = self.create_summary(findings)
summary = {
"total_findings": len(findings),
"dangerous_severity": tmp_summary.get('dangerous', 0),
"warning_severity": tmp_summary.get('warning', 0),
"high_severity": tmp_summary.get('high', 0),
"medium_severity": tmp_summary.get('medium', 0),
"low_severity": tmp_summary.get('low', 0),
"info_severity": tmp_summary.get('info', 0),
}
metadata={"engine": "mobsf", "file_scanned": file_path, "mobsf_url": self.mobsf_url}#Add: "json_report": str(json_output_path
return self.create_result(findings=findings, status="success",summary=summary, metadata=metadata)
return ModuleResult(findings=findings, status="success",summary=summary, metadata=metadata)
def get_metadata(self) -> ModuleMetadata:
return ModuleMetadata(
name="Mobile Security Framework (MobSF)",
version="1.0.0",
description="Integrates MobSF for mobile app security scanning",
author="FuzzForge Team",
category="scanner",
tags=["mobsf", "mobile", "sast", "scanner"]
)
def validate_config(self, config: Dict[str, Any]) -> bool:
"""
Config pattern:
**config
findings: []
"tool_name": "FuzzForge Hello World",
"tool_version": "1.0.0",
"mobsf_uri": "(default: http://localhost:8000)",
"file_path": "(path to the APK or IPA file to scan)"
"""
if "mobsf_url" in config and not isinstance(config["mobsf_url"], str):
return False
# Check that mobsf_url does not render 404 when curling /
if "file_path" in config and not isinstance(config["file_path"], str):
return False
return True
if __name__ == "__main__":
import asyncio
module = MobSFModule()
config = {
"mobsf_url": "http://localhost:8877",
"file_path": "./toolbox/modules/android/beetlebug.apk",
}
workspace = Path("./toolbox/modules/android/")
result = asyncio.run(module.execute(config, workspace))
print(result)
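The severity roll-up in `create_summary` boils down to one `Counter` pass; a minimal standalone sketch using plain dicts in place of finding objects:

```python
from collections import Counter

def summarize(findings):
    """Count findings per severity; accepts dicts or objects with .severity."""
    counts = Counter()
    for f in findings:
        sev = f.get("severity") if isinstance(f, dict) else getattr(f, "severity", None)
        if sev:
            counts[sev] += 1
    return dict(counts)

print(summarize([{"severity": "high"}, {"severity": "info"}, {"severity": "high"}]))
# {'high': 2, 'info': 1}
```

MobSF reports use non-standard levels such as `dangerous` and `warning` alongside `high`/`medium`/`low`/`info`, which is why the module's summary pulls each key with `tmp_summary.get(..., 0)` rather than assuming a fixed set.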


@@ -0,0 +1,411 @@
"""
OpenGrep Static Analysis Module
This module uses OpenGrep (open-source version of Semgrep) for pattern-based
static analysis across multiple programming languages.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import asyncio
import json
import tempfile
from pathlib import Path
from typing import Dict, Any, List
import subprocess
import logging
from ..base import BaseModule, ModuleMetadata, ModuleFinding, ModuleResult
from . import register_module
logger = logging.getLogger(__name__)
@register_module
class OpenGrepModule(BaseModule):
"""OpenGrep static analysis module"""
def get_metadata(self) -> ModuleMetadata:
"""Get module metadata"""
return ModuleMetadata(
name="opengrep",
version="1.45.0",
description="Open-source pattern-based static analysis tool for security vulnerabilities",
author="FuzzForge Team",
category="static_analysis",
tags=["sast", "pattern-matching", "multi-language", "security"],
input_schema={
"type": "object",
"properties": {
"config": {
"type": "string",
"enum": ["auto", "p/security-audit", "p/owasp-top-ten", "p/cwe-top-25"],
"default": "auto",
"description": "Rule configuration to use"
},
"custom_rules_path": {
"type": "string",
"description": "Path to a directory containing custom OpenGrep rules"
},
"languages": {
"type": "array",
"items": {"type": "string"},
"description": "Specific languages to analyze"
},
"include_patterns": {
"type": "array",
"items": {"type": "string"},
"description": "File patterns to include"
},
"exclude_patterns": {
"type": "array",
"items": {"type": "string"},
"description": "File patterns to exclude"
},
"max_target_bytes": {
"type": "integer",
"default": 1000000,
"description": "Maximum file size to analyze (bytes)"
},
"timeout": {
"type": "integer",
"default": 300,
"description": "Analysis timeout in seconds"
},
"severity": {
"type": "array",
"items": {"type": "string", "enum": ["ERROR", "WARNING", "INFO"]},
"default": ["ERROR", "WARNING", "INFO"],
"description": "Minimum severity levels to report"
},
"confidence": {
"type": "array",
"items": {"type": "string", "enum": ["HIGH", "MEDIUM", "LOW"]},
"default": ["HIGH", "MEDIUM", "LOW"],
"description": "Minimum confidence levels to report"
}
}
},
output_schema={
"type": "object",
"properties": {
"findings": {
"type": "array",
"items": {
"type": "object",
"properties": {
"rule_id": {"type": "string"},
"severity": {"type": "string"},
"confidence": {"type": "string"},
"file_path": {"type": "string"},
"line_number": {"type": "integer"}
}
}
}
}
}
)
def validate_config(self, config: Dict[str, Any]) -> bool:
"""Validate configuration"""
timeout = config.get("timeout", 300)
if not isinstance(timeout, int) or timeout < 30 or timeout > 3600:
raise ValueError("Timeout must be between 30 and 3600 seconds")
max_bytes = config.get("max_target_bytes", 1000000)
if not isinstance(max_bytes, int) or max_bytes < 1000 or max_bytes > 10000000:
raise ValueError("max_target_bytes must be between 1000 and 10000000")
custom_rules_path = config.get("custom_rules_path")
if custom_rules_path:
if not Path(custom_rules_path).is_dir():
raise ValueError(f"Custom rules path must be a valid directory: {custom_rules_path}")
return True
async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
"""Execute OpenGrep static analysis"""
self.start_timer()
try:
# Validate inputs
self.validate_config(config)
self.validate_workspace(workspace)
logger.info(f"Running OpenGrep analysis on {workspace}")
# Build opengrep command
cmd = ["opengrep", "scan", "--json"]
# Add configuration
custom_rules_path = config.get("custom_rules_path")
use_custom_rules = False
if custom_rules_path:
cmd.extend(["--config", custom_rules_path])
use_custom_rules = True
else:
config_type = config.get("config", "auto")
if config_type == "auto":
cmd.extend(["--config", "auto"])
else:
cmd.extend(["--config", config_type])
# Add timeout
cmd.extend(["--timeout", str(config.get("timeout", 300))])
# Add max target bytes
cmd.extend(["--max-target-bytes", str(config.get("max_target_bytes", 1000000))])
# Add languages if specified (but NOT when using custom rules, as rules define their own languages)
if config.get("languages") and not use_custom_rules:
langs = ",".join(config["languages"])
cmd.extend(["--lang", langs])
# Add include patterns
if config.get("include_patterns"):
for pattern in config["include_patterns"]:
cmd.extend(["--include", pattern])
# Add exclude patterns
if config.get("exclude_patterns"):
for pattern in config["exclude_patterns"]:
cmd.extend(["--exclude", pattern])
# Add severity filter only if a single level is requested.
severity_levels = config.get("severity", ["ERROR", "WARNING", "INFO"])
if severity_levels and len(severity_levels) == 1:
cmd.extend(["--severity", severity_levels[0]])
# Add confidence filter (if supported in this version)
confidence_levels = config.get("confidence", ["HIGH", "MEDIUM"])
if confidence_levels and len(confidence_levels) < 3: # Only if not all levels
# Note: confidence filtering might need to be done post-processing
pass
# Disable metrics collection
cmd.append("--disable-version-check")
cmd.append("--no-git-ignore")
# Add target directory
cmd.append(str(workspace))
logger.debug(f"Running command: {' '.join(cmd)}")
# Run OpenGrep
process = await asyncio.create_subprocess_exec(
*cmd,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
cwd=workspace
)
stdout, stderr = await process.communicate()
# Parse results
findings = []
if process.returncode in [0, 1]: # 0 = no findings, 1 = findings found
findings = self._parse_opengrep_output(stdout.decode(), workspace, config)
else:
error_msg = stderr.decode()
logger.error(f"OpenGrep failed: {error_msg}")
return self.create_result(
findings=[],
status="failed",
error=f"OpenGrep execution failed: {error_msg}"
)
# Create summary
summary = self._create_summary(findings)
logger.info(f"OpenGrep found {len(findings)} potential issues")
return self.create_result(
findings=findings,
status="success",
summary=summary
)
except Exception as e:
logger.error(f"OpenGrep module failed: {e}")
return self.create_result(
findings=[],
status="failed",
error=str(e)
)
def _parse_opengrep_output(self, output: str, workspace: Path, config: Dict[str, Any]) -> List[ModuleFinding]:
"""Parse OpenGrep JSON output into findings"""
findings = []
if not output.strip():
return findings
try:
data = json.loads(output)
results = data.get("results", [])
logger.debug(f"OpenGrep returned {len(results)} raw results")
# Get filtering criteria
allowed_severities = set(config.get("severity", ["ERROR", "WARNING", "INFO"]))
allowed_confidences = set(config.get("confidence", ["HIGH", "MEDIUM", "LOW"]))
for result in results:
# Extract basic info
rule_id = result.get("check_id", "unknown")
message = result.get("message", "")
extra = result.get("extra", {})
severity = extra.get("severity", "INFO").upper()
# File location info
path_info = result.get("path", "")
start_line = result.get("start", {}).get("line", 0)
end_line = result.get("end", {}).get("line", 0)
start_col = result.get("start", {}).get("col", 0)
end_col = result.get("end", {}).get("col", 0)
# Code snippet
lines = extra.get("lines", "")
# Metadata
rule_metadata = extra.get("metadata", {})
cwe = rule_metadata.get("cwe", [])
owasp = rule_metadata.get("owasp", [])
confidence = extra.get("confidence", rule_metadata.get("confidence", "MEDIUM")).upper()
# Apply severity filter
if severity not in allowed_severities:
continue
# Apply confidence filter
if confidence not in allowed_confidences:
continue
# Make file path relative to workspace
if path_info:
try:
rel_path = Path(path_info).relative_to(workspace)
path_info = str(rel_path)
except ValueError:
pass
# Map severity to our standard levels
finding_severity = self._map_severity(severity)
# Create finding
finding = self.create_finding(
title=f"Security issue: {rule_id}",
description=message or f"OpenGrep rule {rule_id} triggered",
severity=finding_severity,
category=self._get_category(rule_id, extra),
file_path=path_info if path_info else None,
line_start=start_line if start_line > 0 else None,
line_end=end_line if end_line > 0 and end_line != start_line else None,
code_snippet=lines.strip() if lines else None,
recommendation=self._get_recommendation(rule_id, extra),
metadata={
"rule_id": rule_id,
"opengrep_severity": severity,
"confidence": confidence,
"cwe": cwe,
"owasp": owasp,
"fix": extra.get("fix", ""),
"impact": extra.get("impact", ""),
"likelihood": extra.get("likelihood", ""),
"references": extra.get("references", [])
}
)
findings.append(finding)
except json.JSONDecodeError as e:
logger.warning(f"Failed to parse OpenGrep output: {e}. Output snippet: {output[:200]}...")
except Exception as e:
logger.warning(f"Error processing OpenGrep results: {e}")
return findings
def _map_severity(self, opengrep_severity: str) -> str:
"""Map OpenGrep severity to our standard severity levels"""
severity_map = {
"ERROR": "high",
"WARNING": "medium",
"INFO": "low"
}
return severity_map.get(opengrep_severity.upper(), "medium")
def _get_category(self, rule_id: str, extra: Dict[str, Any]) -> str:
"""Determine finding category based on rule and metadata"""
rule_metadata = extra.get("metadata", {})
cwe_list = rule_metadata.get("cwe", [])
owasp_list = rule_metadata.get("owasp", [])
        # Check for common security categories
        rid = rule_id.lower()
        if "injection" in rid:
            return "injection"
        elif "xss" in rid:
            return "xss"
        elif "csrf" in rid:
            return "csrf"
        elif "auth" in rid:
            return "authentication"
        elif "crypto" in rid:
            return "cryptography"
        elif cwe_list:
            return f"cwe-{cwe_list[0]}"
        elif owasp_list:
            return f"owasp-{owasp_list[0].replace(' ', '-').lower()}"
        else:
            return "security"
def _get_recommendation(self, rule_id: str, extra: Dict[str, Any]) -> str:
"""Generate recommendation based on rule and metadata"""
fix_suggestion = extra.get("fix", "")
if fix_suggestion:
return fix_suggestion
# Generic recommendations based on rule type
if "injection" in rule_id.lower():
return "Use parameterized queries or prepared statements to prevent injection attacks."
elif "xss" in rule_id.lower():
return "Properly encode/escape user input before displaying it in web pages."
elif "crypto" in rule_id.lower():
return "Use cryptographically secure algorithms and proper key management."
elif "hardcode" in rule_id.lower():
return "Remove hardcoded secrets and use secure configuration management."
else:
return "Review this security issue and apply appropriate fixes based on your security requirements."
def _create_summary(self, findings: List[ModuleFinding]) -> Dict[str, Any]:
"""Create analysis summary"""
        severity_counts = {"critical": 0, "high": 0, "medium": 0, "low": 0, "info": 0}
category_counts = {}
rule_counts = {}
for finding in findings:
# Count by severity
severity_counts[finding.severity] += 1
# Count by category
category = finding.category
category_counts[category] = category_counts.get(category, 0) + 1
# Count by rule
rule_id = finding.metadata.get("rule_id", "unknown")
rule_counts[rule_id] = rule_counts.get(rule_id, 0) + 1
return {
"total_findings": len(findings),
"severity_counts": severity_counts,
"category_counts": category_counts,
"top_rules": dict(sorted(rule_counts.items(), key=lambda x: x[1], reverse=True)[:10]),
"files_analyzed": len(set(f.file_path for f in findings if f.file_path))
}
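The core of `_parse_opengrep_output` is plain JSON plumbing; a trimmed-down sketch on a synthetic result (field names follow the Semgrep-style JSON shape consumed above, with workspace-relative paths and filtering omitted):

```python
import json

# A single synthetic OpenGrep result, serialized as the tool would emit it.
raw = json.dumps({
    "results": [{
        "check_id": "java.sqlite-injection",
        "message": "Possible SQL injection",
        "path": "src/Main.java",
        "start": {"line": 42, "col": 9},
        "end": {"line": 42, "col": 60},
        "extra": {"severity": "ERROR", "metadata": {"cwe": ["CWE-89"]}},
    }]
})

findings = []
for r in json.loads(raw).get("results", []):
    extra = r.get("extra", {})
    findings.append({
        "rule_id": r.get("check_id", "unknown"),
        "severity": extra.get("severity", "INFO"),
        "file": r.get("path", ""),
        "line": r.get("start", {}).get("line", 0),
        "cwe": extra.get("metadata", {}).get("cwe", []),
    })

print(findings[0]["rule_id"], findings[0]["line"])
```

Defensive `.get(...)` chains with defaults keep the parser from raising on partial results, matching the module's tolerant handling of malformed output.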


@@ -0,0 +1,272 @@
"""
Base module interface for all FuzzForge modules
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Any, List, Optional
from pydantic import BaseModel, Field
from datetime import datetime
import logging
logger = logging.getLogger(__name__)
class ModuleMetadata(BaseModel):
"""Metadata describing a module's capabilities and requirements"""
name: str = Field(..., description="Module name")
version: str = Field(..., description="Module version")
description: str = Field(..., description="Module description")
author: Optional[str] = Field(None, description="Module author")
category: str = Field(..., description="Module category (scanner, analyzer, reporter, etc.)")
tags: List[str] = Field(default_factory=list, description="Module tags")
input_schema: Dict[str, Any] = Field(default_factory=dict, description="Expected input schema")
output_schema: Dict[str, Any] = Field(default_factory=dict, description="Output schema")
requires_workspace: bool = Field(True, description="Whether module requires workspace access")
class ModuleFinding(BaseModel):
"""Individual finding from a module"""
id: str = Field(..., description="Unique finding ID")
title: str = Field(..., description="Finding title")
description: str = Field(..., description="Detailed description")
severity: str = Field(..., description="Severity level (info, low, medium, high, critical)")
category: str = Field(..., description="Finding category")
file_path: Optional[str] = Field(None, description="Affected file path relative to workspace")
line_start: Optional[int] = Field(None, description="Starting line number")
line_end: Optional[int] = Field(None, description="Ending line number")
code_snippet: Optional[str] = Field(None, description="Relevant code snippet")
recommendation: Optional[str] = Field(None, description="Remediation recommendation")
metadata: Dict[str, Any] = Field(default_factory=dict, description="Additional metadata")
class ModuleResult(BaseModel):
"""Standard result format from module execution"""
module: str = Field(..., description="Module name")
version: str = Field(..., description="Module version")
status: str = Field(default="success", description="Execution status (success, partial, failed)")
execution_time: float = Field(..., description="Execution time in seconds")
findings: List[ModuleFinding] = Field(default_factory=list, description="List of findings")
summary: Dict[str, Any] = Field(default_factory=dict, description="Summary statistics")
metadata: Dict[str, Any] = Field(default_factory=dict, description="Additional metadata")
error: Optional[str] = Field(None, description="Error message if failed")
sarif: Optional[Dict[str, Any]] = Field(None, description="SARIF report if generated by reporter module")
class BaseModule(ABC):
"""
Base interface for all security testing modules.
All modules must inherit from this class and implement the required methods.
Modules are designed to be stateless and reusable across different workflows.
"""
def __init__(self):
"""Initialize the module"""
self._metadata = self.get_metadata()
self._start_time = None
logger.info(f"Initialized module: {self._metadata.name} v{self._metadata.version}")
@abstractmethod
def get_metadata(self) -> ModuleMetadata:
"""
Get module metadata.
Returns:
ModuleMetadata object describing the module
"""
pass
@abstractmethod
async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
"""
Execute the module with given configuration and workspace.
Args:
config: Module-specific configuration parameters
workspace: Path to the mounted workspace directory
Returns:
ModuleResult containing findings and metadata
"""
pass
@abstractmethod
def validate_config(self, config: Dict[str, Any]) -> bool:
"""
Validate the provided configuration against module requirements.
Args:
config: Configuration to validate
Returns:
True if configuration is valid, False otherwise
Raises:
ValueError: If configuration is invalid with details
"""
pass
def validate_workspace(self, workspace: Path) -> bool:
"""
Validate that the workspace exists and is accessible.
Args:
workspace: Path to the workspace
Returns:
True if workspace is valid
Raises:
ValueError: If workspace is invalid
"""
if not workspace.exists():
raise ValueError(f"Workspace does not exist: {workspace}")
if not workspace.is_dir():
raise ValueError(f"Workspace is not a directory: {workspace}")
return True
def create_finding(
self,
title: str,
description: str,
severity: str,
category: str,
**kwargs
) -> ModuleFinding:
"""
Helper method to create a standardized finding.
Args:
title: Finding title
description: Detailed description
severity: Severity level
category: Finding category
**kwargs: Additional finding fields
Returns:
ModuleFinding object
"""
import uuid
finding_id = str(uuid.uuid4())
return ModuleFinding(
id=finding_id,
title=title,
description=description,
severity=severity,
category=category,
**kwargs
)
def start_timer(self):
"""Start the execution timer"""
from time import time
self._start_time = time()
def get_execution_time(self) -> float:
"""Get the execution time in seconds"""
from time import time
if self._start_time is None:
return 0.0
return time() - self._start_time
    def create_result(
        self,
        findings: List[ModuleFinding],
        status: str = "success",
        summary: Optional[Dict[str, Any]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        error: Optional[str] = None
    ) -> ModuleResult:
"""
Helper method to create a module result.
Args:
findings: List of findings
status: Execution status
summary: Summary statistics
metadata: Additional metadata
error: Error message if failed
Returns:
ModuleResult object
"""
return ModuleResult(
module=self._metadata.name,
version=self._metadata.version,
status=status,
execution_time=self.get_execution_time(),
findings=findings,
summary=summary or self._generate_summary(findings),
metadata=metadata or {},
error=error
)
def _generate_summary(self, findings: List[ModuleFinding]) -> Dict[str, Any]:
"""
Generate summary statistics from findings.
Args:
findings: List of findings
Returns:
Summary dictionary
"""
severity_counts = {
"info": 0,
"low": 0,
"medium": 0,
"high": 0,
"critical": 0
}
category_counts = {}
for finding in findings:
# Count by severity
if finding.severity in severity_counts:
severity_counts[finding.severity] += 1
# Count by category
if finding.category not in category_counts:
category_counts[finding.category] = 0
category_counts[finding.category] += 1
return {
"total_findings": len(findings),
"severity_counts": severity_counts,
"category_counts": category_counts,
"highest_severity": self._get_highest_severity(findings)
}
def _get_highest_severity(self, findings: List[ModuleFinding]) -> str:
"""
Get the highest severity from findings.
Args:
findings: List of findings
Returns:
Highest severity level
"""
severity_order = ["critical", "high", "medium", "low", "info"]
for severity in severity_order:
if any(f.severity == severity for f in findings):
return severity
return "none"
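The ranking in `_get_highest_severity` reduces to a single scan over a fixed ordering. A standalone sketch of that logic:

```python
SEVERITY_ORDER = ["critical", "high", "medium", "low", "info"]

def highest_severity(severities):
    # Walk levels from most to least severe; the first one present wins.
    for level in SEVERITY_ORDER:
        if level in severities:
            return level
    return "none"

print(highest_severity(["low", "high", "info"]))  # high
print(highest_severity([]))                       # none
```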


@@ -0,0 +1,14 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
from .sarif_reporter import SARIFReporter
__all__ = ["SARIFReporter"]


@@ -0,0 +1,401 @@
"""
SARIF Reporter Module - Generates SARIF-formatted security reports
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import logging
from pathlib import Path
from typing import Dict, Any, List
from datetime import datetime
import json
try:
from toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
except ImportError:
try:
from modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
except ImportError:
from src.toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
logger = logging.getLogger(__name__)
class SARIFReporter(BaseModule):
"""
Generates SARIF (Static Analysis Results Interchange Format) reports.
This module:
- Converts findings to SARIF format
- Aggregates results from multiple modules
- Adds metadata and context
- Provides actionable recommendations
"""
def get_metadata(self) -> ModuleMetadata:
"""Get module metadata"""
return ModuleMetadata(
name="sarif_reporter",
version="1.0.0",
description="Generates SARIF-formatted security reports",
author="FuzzForge Team",
category="reporter",
tags=["reporting", "sarif", "output"],
input_schema={
"findings": {
"type": "array",
"description": "List of findings to report",
"required": True
},
"tool_name": {
"type": "string",
"description": "Name of the tool",
"default": "FuzzForge Security Assessment"
},
"tool_version": {
"type": "string",
"description": "Tool version",
"default": "1.0.0"
},
"include_code_flows": {
"type": "boolean",
"description": "Include code flow information",
"default": False
}
},
output_schema={
"sarif": {
"type": "object",
"description": "SARIF 2.1.0 formatted report"
}
},
requires_workspace=False # Reporter doesn't need direct workspace access
)
def validate_config(self, config: Dict[str, Any]) -> bool:
"""Validate module configuration"""
if "findings" not in config and "modules_results" not in config:
raise ValueError("Either 'findings' or 'modules_results' must be provided")
return True
async def execute(self, config: Dict[str, Any], workspace: Path = None) -> ModuleResult:
"""
Execute the SARIF reporter module.
Args:
config: Module configuration with findings
workspace: Optional workspace path for context
Returns:
ModuleResult with SARIF report
"""
self.start_timer()
self.validate_config(config)
# Get configuration
tool_name = config.get("tool_name", "FuzzForge Security Assessment")
tool_version = config.get("tool_version", "1.0.0")
include_code_flows = config.get("include_code_flows", False)
# Collect findings from either direct findings or module results
all_findings = []
        if "findings" in config:
            # Direct findings provided; convert any plain-dict entries to ModuleFinding objects
            all_findings = config["findings"]
            if isinstance(all_findings, list):
                all_findings = [ModuleFinding(**f) if isinstance(f, dict) else f for f in all_findings]
elif "modules_results" in config:
# Aggregate from module results
for module_result in config["modules_results"]:
if isinstance(module_result, dict):
findings = module_result.get("findings", [])
all_findings.extend(findings)
elif hasattr(module_result, "findings"):
all_findings.extend(module_result.findings)
logger.info(f"Generating SARIF report for {len(all_findings)} findings")
try:
# Generate SARIF report
sarif_report = self._generate_sarif(
findings=all_findings,
tool_name=tool_name,
tool_version=tool_version,
include_code_flows=include_code_flows,
workspace_path=str(workspace) if workspace else None
)
# Create summary
summary = self._generate_report_summary(all_findings)
return ModuleResult(
module=self.get_metadata().name,
version=self.get_metadata().version,
status="success",
execution_time=self.get_execution_time(),
findings=[], # Reporter doesn't generate new findings
summary=summary,
metadata={
"tool_name": tool_name,
"tool_version": tool_version,
"report_format": "SARIF 2.1.0",
"total_findings": len(all_findings)
},
error=None,
sarif=sarif_report # Add SARIF as custom field
)
except Exception as e:
logger.error(f"SARIF reporter failed: {e}")
return self.create_result(
findings=[],
status="failed",
error=str(e)
)
def _generate_sarif(
self,
findings: List[ModuleFinding],
tool_name: str,
tool_version: str,
include_code_flows: bool,
workspace_path: str = None
) -> Dict[str, Any]:
"""
Generate SARIF 2.1.0 formatted report.
Args:
findings: List of findings to report
tool_name: Name of the tool
tool_version: Tool version
include_code_flows: Whether to include code flow information
workspace_path: Optional workspace path
Returns:
SARIF formatted dictionary
"""
# Create rules from unique finding types
rules = self._create_rules(findings)
# Create results from findings
results = self._create_results(findings, include_code_flows)
# Build SARIF structure
sarif = {
"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
"version": "2.1.0",
"runs": [
{
"tool": {
"driver": {
"name": tool_name,
"version": tool_version,
"informationUri": "https://fuzzforge.io",
"rules": rules
}
},
"results": results,
"invocations": [
{
"executionSuccessful": True,
"endTimeUtc": datetime.utcnow().isoformat() + "Z"
}
]
}
]
}
# Add workspace information if available
if workspace_path:
            sarif["runs"][0]["originalUriBaseIds"] = {
                "WORKSPACE": {
                    "uri": f"file://{workspace_path}/",
                    "description": {
                        "text": "The workspace root directory"
                    }
                }
            }
return sarif
def _create_rules(self, findings: List[ModuleFinding]) -> List[Dict[str, Any]]:
"""
Create SARIF rules from findings.
Args:
findings: List of findings
Returns:
List of SARIF rule objects
"""
rules_dict = {}
for finding in findings:
rule_id = f"{finding.category}_{finding.severity}"
if rule_id not in rules_dict:
rules_dict[rule_id] = {
"id": rule_id,
"name": finding.category.replace("_", " ").title(),
"shortDescription": {
"text": f"{finding.category} vulnerability"
},
"fullDescription": {
"text": f"Detection rule for {finding.category} vulnerabilities with {finding.severity} severity"
},
"defaultConfiguration": {
"level": self._severity_to_sarif_level(finding.severity)
},
"properties": {
"category": finding.category,
"severity": finding.severity,
"tags": ["security", finding.category, finding.severity]
}
}
return list(rules_dict.values())
def _create_results(
self, findings: List[ModuleFinding], include_code_flows: bool
) -> List[Dict[str, Any]]:
"""
Create SARIF results from findings.
Args:
findings: List of findings
include_code_flows: Whether to include code flows
Returns:
List of SARIF result objects
"""
results = []
for finding in findings:
result = {
"ruleId": f"{finding.category}_{finding.severity}",
"level": self._severity_to_sarif_level(finding.severity),
"message": {
"text": finding.description
},
"locations": []
}
# Add location information if available
if finding.file_path:
location = {
"physicalLocation": {
"artifactLocation": {
"uri": finding.file_path,
"uriBaseId": "WORKSPACE"
}
}
}
# Add line information if available
if finding.line_start:
location["physicalLocation"]["region"] = {
"startLine": finding.line_start
}
if finding.line_end:
location["physicalLocation"]["region"]["endLine"] = finding.line_end
# Add code snippet if available
if finding.code_snippet:
location["physicalLocation"]["region"]["snippet"] = {
"text": finding.code_snippet
}
result["locations"].append(location)
# Add fix suggestions if available
if finding.recommendation:
result["fixes"] = [
{
"description": {
"text": finding.recommendation
}
}
]
# Add properties
result["properties"] = {
"findingId": finding.id,
"title": finding.title,
"metadata": finding.metadata
}
results.append(result)
return results
def _severity_to_sarif_level(self, severity: str) -> str:
"""
Convert severity to SARIF level.
Args:
severity: Finding severity
Returns:
SARIF level string
"""
mapping = {
"critical": "error",
"high": "error",
"medium": "warning",
"low": "note",
"info": "none"
}
return mapping.get(severity.lower(), "warning")
def _generate_report_summary(self, findings: List[ModuleFinding]) -> Dict[str, Any]:
"""
Generate summary statistics for the report.
Args:
findings: List of findings
Returns:
Summary dictionary
"""
severity_counts = {
"critical": 0,
"high": 0,
"medium": 0,
"low": 0,
"info": 0
}
category_counts = {}
affected_files = set()
for finding in findings:
# Count by severity
if finding.severity in severity_counts:
severity_counts[finding.severity] += 1
# Count by category
if finding.category not in category_counts:
category_counts[finding.category] = 0
category_counts[finding.category] += 1
# Track affected files
if finding.file_path:
affected_files.add(finding.file_path)
return {
"total_findings": len(findings),
"severity_distribution": severity_counts,
"category_distribution": category_counts,
"affected_files": len(affected_files),
"report_format": "SARIF 2.1.0",
"generated_at": datetime.utcnow().isoformat()
}
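Stripped to its essentials, the envelope built by `_generate_sarif` is one run, one driver, and one result per finding. A minimal self-contained sketch (the sample finding is fabricated):

```python
import json

LEVEL_MAP = {"critical": "error", "high": "error", "medium": "warning", "low": "note", "info": "none"}

def minimal_sarif(findings, tool_name="FuzzForge Security Assessment", tool_version="1.0.0"):
    # One run, one driver, one result per finding, keyed by "<category>_<severity>".
    results = [
        {
            "ruleId": f"{f['category']}_{f['severity']}",
            "level": LEVEL_MAP.get(f["severity"], "warning"),
            "message": {"text": f["description"]},
        }
        for f in findings
    ]
    return {
        "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": tool_name, "version": tool_version}}, "results": results}],
    }

report = minimal_sarif([{"category": "secret_leak", "severity": "high", "description": "AWS key in repo"}])
print(json.dumps(report)[:60])
```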


@@ -0,0 +1,14 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
from .file_scanner import FileScanner
__all__ = ["FileScanner"]


@@ -0,0 +1,315 @@
"""
File Scanner Module - Scans and enumerates files in the workspace
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import logging
import mimetypes
from pathlib import Path
from typing import Dict, Any, List
import hashlib
try:
from toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
except ImportError:
try:
from modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
except ImportError:
from src.toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
logger = logging.getLogger(__name__)
class FileScanner(BaseModule):
"""
Scans files in the mounted workspace and collects information.
This module:
- Enumerates files based on patterns
- Detects file types
- Calculates file hashes
- Identifies potentially sensitive files
"""
def get_metadata(self) -> ModuleMetadata:
"""Get module metadata"""
return ModuleMetadata(
name="file_scanner",
version="1.0.0",
description="Scans and enumerates files in the workspace",
author="FuzzForge Team",
category="scanner",
tags=["files", "enumeration", "discovery"],
input_schema={
"patterns": {
"type": "array",
"items": {"type": "string"},
"description": "File patterns to scan (e.g., ['*.py', '*.js'])",
"default": ["*"]
},
"max_file_size": {
"type": "integer",
"description": "Maximum file size to scan in bytes",
"default": 10485760 # 10MB
},
"check_sensitive": {
"type": "boolean",
"description": "Check for sensitive file patterns",
"default": True
},
"calculate_hashes": {
"type": "boolean",
"description": "Calculate SHA256 hashes for files",
"default": False
}
},
output_schema={
"findings": {
"type": "array",
"description": "List of discovered files with metadata"
}
},
requires_workspace=True
)
def validate_config(self, config: Dict[str, Any]) -> bool:
"""Validate module configuration"""
patterns = config.get("patterns", ["*"])
if not isinstance(patterns, list):
raise ValueError("patterns must be a list")
max_size = config.get("max_file_size", 10485760)
if not isinstance(max_size, int) or max_size <= 0:
raise ValueError("max_file_size must be a positive integer")
return True
async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
"""
Execute the file scanning module.
Args:
config: Module configuration
workspace: Path to the workspace directory
Returns:
ModuleResult with file findings
"""
self.start_timer()
self.validate_workspace(workspace)
self.validate_config(config)
findings = []
file_count = 0
total_size = 0
file_types = {}
# Get configuration
patterns = config.get("patterns", ["*"])
max_file_size = config.get("max_file_size", 10485760)
check_sensitive = config.get("check_sensitive", True)
calculate_hashes = config.get("calculate_hashes", False)
logger.info(f"Scanning workspace with patterns: {patterns}")
try:
# Scan for each pattern
for pattern in patterns:
for file_path in workspace.rglob(pattern):
if not file_path.is_file():
continue
file_count += 1
relative_path = file_path.relative_to(workspace)
# Get file stats
try:
stats = file_path.stat()
file_size = stats.st_size
total_size += file_size
# Skip large files
if file_size > max_file_size:
logger.warning(f"Skipping large file: {relative_path} ({file_size} bytes)")
continue
# Detect file type
file_type = self._detect_file_type(file_path)
if file_type not in file_types:
file_types[file_type] = 0
file_types[file_type] += 1
# Check for sensitive files
if check_sensitive and self._is_sensitive_file(file_path):
findings.append(self.create_finding(
title=f"Potentially sensitive file: {relative_path.name}",
description=f"Found potentially sensitive file at {relative_path}",
severity="medium",
category="sensitive_file",
file_path=str(relative_path),
metadata={
"file_size": file_size,
"file_type": file_type
}
))
# Calculate hash if requested
file_hash = None
if calculate_hashes and file_size < 1048576: # Only hash files < 1MB
file_hash = self._calculate_hash(file_path)
# Create informational finding for each file
findings.append(self.create_finding(
title=f"File discovered: {relative_path.name}",
description=f"File: {relative_path}",
severity="info",
category="file_enumeration",
file_path=str(relative_path),
metadata={
"file_size": file_size,
"file_type": file_type,
"file_hash": file_hash
}
))
except Exception as e:
logger.error(f"Error processing file {relative_path}: {e}")
# Create summary
summary = {
"total_files": file_count,
"total_size_bytes": total_size,
"file_types": file_types,
"patterns_scanned": patterns
}
return self.create_result(
findings=findings,
status="success",
summary=summary,
metadata={
"workspace": str(workspace),
"config": config
}
)
except Exception as e:
logger.error(f"File scanner failed: {e}")
return self.create_result(
findings=findings,
status="failed",
error=str(e)
)
def _detect_file_type(self, file_path: Path) -> str:
"""
Detect the type of a file.
Args:
file_path: Path to the file
Returns:
File type string
"""
# Try to determine from extension
mime_type, _ = mimetypes.guess_type(str(file_path))
if mime_type:
return mime_type
# Check by extension
ext = file_path.suffix.lower()
type_map = {
'.py': 'text/x-python',
'.js': 'application/javascript',
'.java': 'text/x-java',
'.cpp': 'text/x-c++',
'.c': 'text/x-c',
'.go': 'text/x-go',
'.rs': 'text/x-rust',
'.rb': 'text/x-ruby',
'.php': 'text/x-php',
'.yaml': 'text/yaml',
'.yml': 'text/yaml',
'.json': 'application/json',
'.xml': 'text/xml',
'.md': 'text/markdown',
'.txt': 'text/plain',
'.sh': 'text/x-shellscript',
'.bat': 'text/x-batch',
'.ps1': 'text/x-powershell'
}
return type_map.get(ext, 'application/octet-stream')
def _is_sensitive_file(self, file_path: Path) -> bool:
"""
Check if a file might contain sensitive information.
Args:
file_path: Path to the file
Returns:
True if potentially sensitive
"""
sensitive_patterns = [
'.env',
'.env.local',
'.env.production',
'credentials',
'password',
'secret',
'private_key',
'id_rsa',
'id_dsa',
'.pem',
'.key',
'.pfx',
'.p12',
'wallet',
'.ssh',
'token',
'api_key',
'config.json',
'settings.json',
'.git-credentials',
'.npmrc',
'.pypirc',
'.docker/config.json'
]
file_name_lower = file_path.name.lower()
for pattern in sensitive_patterns:
if pattern in file_name_lower:
return True
return False
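The filename heuristic above can be exercised standalone (pattern list abbreviated here for brevity):

```python
SENSITIVE_PATTERNS = [".env", "credentials", "password", "secret", "id_rsa", ".pem", ".key", "token", "api_key"]

def is_sensitive(filename: str) -> bool:
    # Substring match against the lowercased filename, mirroring _is_sensitive_file.
    lowered = filename.lower()
    return any(pattern in lowered for pattern in SENSITIVE_PATTERNS)

print(is_sensitive(".env.production"))  # True
print(is_sensitive("README.md"))        # False
```

Substring matching is deliberately loose: it flags `id_rsa.pub` and `my_api_key.txt`, at the cost of some false positives.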
def _calculate_hash(self, file_path: Path) -> str:
"""
Calculate SHA256 hash of a file.
Args:
file_path: Path to the file
Returns:
            Hex string of the SHA-256 hash, or None if hashing failed
"""
try:
sha256_hash = hashlib.sha256()
with open(file_path, "rb") as f:
for byte_block in iter(lambda: f.read(4096), b""):
sha256_hash.update(byte_block)
return sha256_hash.hexdigest()
except Exception as e:
logger.error(f"Failed to calculate hash for {file_path}: {e}")
return None
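The chunked-read pattern in `_calculate_hash` keeps memory flat regardless of file size. A runnable sketch against a throwaway temp file:

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 4096) -> str:
    # Stream the file in fixed-size chunks so large files never load fully into memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    path = Path(tmp.name)

print(sha256_file(path))
```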


@@ -0,0 +1,36 @@
"""
Secret Detection Modules
This package contains modules for detecting secrets, credentials, and sensitive information
in codebases and repositories.
Available modules:
- TruffleHog: Comprehensive secret detection with verification
- Gitleaks: Git-specific secret scanning and leak detection
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
from typing import List, Type
from ..base import BaseModule
# Module registry for automatic discovery
SECRET_DETECTION_MODULES: List[Type[BaseModule]] = []
def register_module(module_class: Type[BaseModule]):
"""Register a secret detection module"""
SECRET_DETECTION_MODULES.append(module_class)
return module_class
def get_available_modules() -> List[Type[BaseModule]]:
"""Get all available secret detection modules"""
return SECRET_DETECTION_MODULES.copy()
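The registry above is the classic decorator-registration pattern: the decorator records the class as a side effect and returns it unchanged. A minimal self-contained version with a dummy class:

```python
from typing import List

MODULES: List[type] = []

def register_module(cls: type) -> type:
    # Record the class in the registry, then return it unchanged
    # so the decorated class is still bound to its name as usual.
    MODULES.append(cls)
    return cls

@register_module
class DummyScanner:
    pass

print([cls.__name__ for cls in MODULES])  # ['DummyScanner']
```

Because registration happens at import time, simply importing a module's file is enough to make it discoverable.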


@@ -0,0 +1,351 @@
"""
Gitleaks Secret Detection Module
This module uses Gitleaks to detect secrets and sensitive information in Git repositories
and file systems.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import asyncio
import json
from pathlib import Path
from typing import Dict, Any, List
import logging
from ..base import BaseModule, ModuleMetadata, ModuleFinding, ModuleResult
from . import register_module
logger = logging.getLogger(__name__)
@register_module
class GitleaksModule(BaseModule):
"""Gitleaks secret detection module"""
def get_metadata(self) -> ModuleMetadata:
"""Get module metadata"""
return ModuleMetadata(
name="gitleaks",
version="8.18.0",
description="Git-specific secret scanning and leak detection using Gitleaks",
author="FuzzForge Team",
category="secret_detection",
tags=["secrets", "git", "leak-detection", "credentials"],
input_schema={
"type": "object",
"properties": {
"scan_mode": {
"type": "string",
"enum": ["detect", "protect"],
"default": "detect",
"description": "Scan mode: detect (entire repo history) or protect (staged changes)"
},
"config_file": {
"type": "string",
"description": "Path to custom Gitleaks configuration file"
},
"baseline_file": {
"type": "string",
"description": "Path to baseline file to ignore known findings"
},
"max_target_megabytes": {
"type": "integer",
"default": 100,
"description": "Maximum size of files to scan (in MB)"
},
"redact": {
"type": "boolean",
"default": True,
"description": "Redact secrets in output"
},
"no_git": {
"type": "boolean",
"default": False,
"description": "Scan files without Git context"
}
}
},
output_schema={
"type": "object",
"properties": {
"findings": {
"type": "array",
"items": {
"type": "object",
"properties": {
"rule_id": {"type": "string"},
"category": {"type": "string"},
"file_path": {"type": "string"},
"line_number": {"type": "integer"},
"secret": {"type": "string"}
}
}
}
}
}
)
def validate_config(self, config: Dict[str, Any]) -> bool:
"""Validate configuration"""
scan_mode = config.get("scan_mode", "detect")
if scan_mode not in ["detect", "protect"]:
raise ValueError("scan_mode must be 'detect' or 'protect'")
max_size = config.get("max_target_megabytes", 100)
if not isinstance(max_size, int) or max_size < 1 or max_size > 1000:
raise ValueError("max_target_megabytes must be between 1 and 1000")
return True
async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
"""Execute Gitleaks secret detection"""
self.start_timer()
try:
# Validate inputs
self.validate_config(config)
self.validate_workspace(workspace)
logger.info(f"Running Gitleaks on {workspace}")
# Build Gitleaks command
scan_mode = config.get("scan_mode", "detect")
cmd = ["gitleaks", scan_mode]
# Add source path
cmd.extend(["--source", str(workspace)])
# Create temp file for JSON output
import tempfile
output_file = tempfile.NamedTemporaryFile(mode='w+', suffix='.json', delete=False)
output_path = output_file.name
output_file.close()
# Add report format and output file
cmd.extend(["--report-format", "json"])
cmd.extend(["--report-path", output_path])
# Add redact option
if config.get("redact", True):
cmd.append("--redact")
# Add max target size
max_size = config.get("max_target_megabytes", 100)
cmd.extend(["--max-target-megabytes", str(max_size)])
# Add config file if specified
if config.get("config_file"):
config_path = Path(config["config_file"])
if config_path.exists():
cmd.extend(["--config", str(config_path)])
# Add baseline file if specified
if config.get("baseline_file"):
baseline_path = Path(config["baseline_file"])
if baseline_path.exists():
cmd.extend(["--baseline-path", str(baseline_path)])
# Add no-git flag if specified
if config.get("no_git", False):
cmd.append("--no-git")
# Add verbose output
cmd.append("--verbose")
logger.debug(f"Running command: {' '.join(cmd)}")
# Run Gitleaks
process = await asyncio.create_subprocess_exec(
*cmd,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
cwd=workspace
)
stdout, stderr = await process.communicate()
# Parse results
findings = []
try:
# Read the JSON output from file
with open(output_path, 'r') as f:
output_content = f.read()
if process.returncode == 0:
# No secrets found
logger.info("No secrets detected by Gitleaks")
elif process.returncode == 1:
# Secrets found - parse from file content
findings = self._parse_gitleaks_output(output_content, workspace)
else:
# Error occurred
error_msg = stderr.decode()
logger.error(f"Gitleaks failed: {error_msg}")
return self.create_result(
findings=[],
status="failed",
error=f"Gitleaks execution failed: {error_msg}"
)
            finally:
                # Clean up temp file
                import os
                try:
                    os.unlink(output_path)
                except OSError:
                    pass
# Create summary
summary = {
"total_leaks": len(findings),
"unique_rules": len(set(f.metadata.get("rule_id", "") for f in findings)),
"files_with_leaks": len(set(f.file_path for f in findings if f.file_path)),
"scan_mode": scan_mode
}
logger.info(f"Gitleaks found {len(findings)} potential leaks")
return self.create_result(
findings=findings,
status="success",
summary=summary
)
except Exception as e:
logger.error(f"Gitleaks module failed: {e}")
return self.create_result(
findings=[],
status="failed",
error=str(e)
)
def _parse_gitleaks_output(self, output: str, workspace: Path) -> List[ModuleFinding]:
"""Parse Gitleaks JSON output into findings"""
findings = []
if not output.strip():
return findings
try:
# Gitleaks outputs JSON array
results = json.loads(output)
if not isinstance(results, list):
logger.warning("Unexpected Gitleaks output format")
return findings
for result in results:
# Extract information
rule_id = result.get("RuleID", "unknown")
description = result.get("Description", "")
file_path = result.get("File", "")
line_number = result.get("LineNumber", 0)
secret = result.get("Secret", "")
match_text = result.get("Match", "")
# Commit info (if available)
commit = result.get("Commit", "")
author = result.get("Author", "")
email = result.get("Email", "")
date = result.get("Date", "")
# Make file path relative to workspace
if file_path:
try:
rel_path = Path(file_path).relative_to(workspace)
file_path = str(rel_path)
except ValueError:
# If file is outside workspace, keep absolute path
pass
# Determine severity based on rule type
severity = self._get_leak_severity(rule_id, description)
# Create finding
finding = self.create_finding(
title=f"Secret leak detected: {rule_id}",
description=self._get_leak_description(rule_id, description, commit),
severity=severity,
category="secret_leak",
file_path=file_path if file_path else None,
line_start=line_number if line_number > 0 else None,
code_snippet=match_text if match_text else secret,
recommendation=self._get_leak_recommendation(rule_id),
metadata={
"rule_id": rule_id,
"secret_type": description,
"commit": commit,
"author": author,
"email": email,
"date": date,
"entropy": result.get("Entropy", 0),
"fingerprint": result.get("Fingerprint", "")
}
)
findings.append(finding)
except json.JSONDecodeError as e:
logger.warning(f"Failed to parse Gitleaks output: {e}")
except Exception as e:
logger.warning(f"Error processing Gitleaks results: {e}")
return findings
def _get_leak_severity(self, rule_id: str, description: str) -> str:
"""Determine severity based on secret type"""
critical_patterns = [
"aws", "amazon", "gcp", "google", "azure", "microsoft",
"private_key", "rsa", "ssh", "certificate", "database",
"password", "auth", "token", "secret", "key"
]
rule_lower = rule_id.lower()
desc_lower = description.lower()
# Check for critical patterns
for pattern in critical_patterns:
if pattern in rule_lower or pattern in desc_lower:
if any(x in rule_lower for x in ["aws", "gcp", "azure"]):
return "critical"
elif any(x in rule_lower for x in ["private", "key", "password"]):
return "high"
else:
return "medium"
return "low"
def _get_leak_description(self, rule_id: str, description: str, commit: str) -> str:
"""Get description for the leak finding"""
base_desc = f"Gitleaks detected a potential secret leak matching rule '{rule_id}'"
if description:
base_desc += f" ({description})"
if commit:
base_desc += f" in commit {commit[:8]}"
base_desc += ". This may indicate sensitive information has been committed to version control."
return base_desc
def _get_leak_recommendation(self, rule_id: str) -> str:
"""Get remediation recommendation"""
base_rec = "Remove the secret from the codebase and Git history. "
if any(pattern in rule_id.lower() for pattern in ["aws", "gcp", "azure"]):
base_rec += "Revoke the cloud credentials immediately and rotate them. "
base_rec += "Consider using Git history rewriting tools (git-filter-branch, BFG) " \
"to remove sensitive data from commit history. Implement pre-commit hooks " \
"to prevent future secret commits."
return base_rec


@@ -0,0 +1,294 @@
"""
TruffleHog Secret Detection Module
This module uses TruffleHog to detect secrets, credentials, and sensitive information
with verification capabilities.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import asyncio
import json
import tempfile
from pathlib import Path
from typing import Dict, Any, List
import subprocess
import logging
from ..base import BaseModule, ModuleMetadata, ModuleFinding, ModuleResult
from . import register_module
logger = logging.getLogger(__name__)
@register_module
class TruffleHogModule(BaseModule):
"""TruffleHog secret detection module"""
def get_metadata(self) -> ModuleMetadata:
"""Get module metadata"""
return ModuleMetadata(
name="trufflehog",
version="3.63.2",
description="Comprehensive secret detection with verification using TruffleHog",
author="FuzzForge Team",
category="secret_detection",
tags=["secrets", "credentials", "sensitive-data", "verification"],
input_schema={
"type": "object",
"properties": {
"verify": {
"type": "boolean",
"default": False,
"description": "Verify discovered secrets"
},
"include_detectors": {
"type": "array",
"items": {"type": "string"},
"description": "Specific detectors to include"
},
"exclude_detectors": {
"type": "array",
"items": {"type": "string"},
"description": "Specific detectors to exclude"
},
"max_depth": {
"type": "integer",
"default": 10,
"description": "Maximum directory depth to scan"
},
"concurrency": {
"type": "integer",
"default": 10,
"description": "Number of concurrent workers"
}
}
},
output_schema={
"type": "object",
"properties": {
"findings": {
"type": "array",
"items": {
"type": "object",
"properties": {
"detector": {"type": "string"},
"verified": {"type": "boolean"},
"file_path": {"type": "string"},
"line": {"type": "integer"},
"secret": {"type": "string"}
}
}
}
}
}
)
def validate_config(self, config: Dict[str, Any]) -> bool:
"""Validate configuration"""
# Check concurrency bounds
concurrency = config.get("concurrency", 10)
if not isinstance(concurrency, int) or concurrency < 1 or concurrency > 50:
raise ValueError("Concurrency must be between 1 and 50")
# Check max_depth bounds
max_depth = config.get("max_depth", 10)
if not isinstance(max_depth, int) or max_depth < 1 or max_depth > 20:
raise ValueError("Max depth must be between 1 and 20")
return True
async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
"""Execute TruffleHog secret detection"""
self.start_timer()
try:
# Validate inputs
self.validate_config(config)
self.validate_workspace(workspace)
logger.info(f"Running TruffleHog on {workspace}")
# Build TruffleHog command
cmd = ["trufflehog", "filesystem", str(workspace)]
# TruffleHog v3 verifies candidate secrets by default; pass
# --no-verification when verification has not been requested
if not config.get("verify", False):
cmd.append("--no-verification")
# Add JSON output
cmd.extend(["--json", "--no-update"])
# Add concurrency
cmd.extend(["--concurrency", str(config.get("concurrency", 10))])
# Add max depth
cmd.extend(["--max-depth", str(config.get("max_depth", 10))])
# Add include/exclude detectors
if config.get("include_detectors"):
cmd.extend(["--include-detectors", ",".join(config["include_detectors"])])
if config.get("exclude_detectors"):
cmd.extend(["--exclude-detectors", ",".join(config["exclude_detectors"])])
logger.debug(f"Running command: {' '.join(cmd)}")
# Run TruffleHog
process = await asyncio.create_subprocess_exec(
*cmd,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
cwd=workspace
)
stdout, stderr = await process.communicate()
# Parse results
findings = []
if process.returncode == 0 or process.returncode == 1: # 1 indicates secrets found
findings = self._parse_trufflehog_output(stdout.decode(), workspace)
else:
error_msg = stderr.decode()
logger.error(f"TruffleHog failed: {error_msg}")
return self.create_result(
findings=[],
status="failed",
error=f"TruffleHog execution failed: {error_msg}"
)
# Create summary
summary = {
"total_secrets": len(findings),
"verified_secrets": len([f for f in findings if f.metadata.get("verified", False)]),
"detectors_triggered": len(set(f.metadata.get("detector", "") for f in findings)),
"files_with_secrets": len(set(f.file_path for f in findings if f.file_path))
}
logger.info(f"TruffleHog found {len(findings)} secrets")
return self.create_result(
findings=findings,
status="success",
summary=summary
)
except Exception as e:
logger.error(f"TruffleHog module failed: {e}")
return self.create_result(
findings=[],
status="failed",
error=str(e)
)
def _parse_trufflehog_output(self, output: str, workspace: Path) -> List[ModuleFinding]:
"""Parse TruffleHog JSON output into findings"""
findings = []
for line in output.strip().split('\n'):
if not line.strip():
continue
try:
result = json.loads(line)
# Extract information
detector = result.get("DetectorName", "unknown")
verified = result.get("Verified", False)
raw_secret = result.get("Raw", "")
# Source info
source_metadata = result.get("SourceMetadata", {})
source_data = source_metadata.get("Data", {})
file_path = source_data.get("Filesystem", {}).get("file", "")
line_num = source_data.get("Filesystem", {}).get("line", 0)
# Make file path relative to workspace
if file_path:
try:
rel_path = Path(file_path).relative_to(workspace)
file_path = str(rel_path)
except ValueError:
# If file is outside workspace, keep absolute path
pass
# Determine severity based on verification and detector type
severity = self._get_secret_severity(detector, verified, raw_secret)
# Create finding
finding = self.create_finding(
title=f"{detector} secret detected",
description=self._get_secret_description(detector, verified),
severity=severity,
category="secret_detection",
file_path=file_path if file_path else None,
line_start=line_num if line_num > 0 else None,
code_snippet=self._truncate_secret(raw_secret),
recommendation=self._get_secret_recommendation(detector, verified),
metadata={
"detector": detector,
"verified": verified,
"detector_type": result.get("DetectorType", ""),
"decoder_type": result.get("DecoderType", ""),
"structured_data": result.get("StructuredData", {})
}
)
findings.append(finding)
except json.JSONDecodeError as e:
logger.warning(f"Failed to parse TruffleHog output line: {e}")
continue
except Exception as e:
logger.warning(f"Error processing TruffleHog result: {e}")
continue
return findings
def _get_secret_severity(self, detector: str, verified: bool, secret: str) -> str:
"""Determine severity based on secret type and verification status"""
if verified:
# Verified secrets are always high risk
critical_detectors = ["aws", "gcp", "azure", "github", "gitlab", "database"]
if any(crit in detector.lower() for crit in critical_detectors):
return "critical"
return "high"
# Unverified secrets
high_risk_detectors = ["private_key", "certificate", "password", "token"]
if any(high in detector.lower() for high in high_risk_detectors):
return "medium"
return "low"
def _get_secret_description(self, detector: str, verified: bool) -> str:
"""Get description for the secret finding"""
verification_status = "verified and active" if verified else "unverified"
return f"A {detector} secret was detected and is {verification_status}. " \
f"This may represent a security risk if the credential is valid."
def _get_secret_recommendation(self, detector: str, verified: bool) -> str:
"""Get remediation recommendation"""
if verified:
return f"IMMEDIATE ACTION REQUIRED: This {detector} secret is verified and active. " \
f"Revoke the credential immediately, remove it from the codebase, and " \
f"implement proper secret management practices."
else:
return f"Review this {detector} secret to determine if it's valid. " \
f"If real, revoke the credential and remove it from the codebase. " \
f"Consider implementing secret scanning in CI/CD pipelines."
def _truncate_secret(self, secret: str, max_length: int = 50) -> str:
"""Truncate secret for display purposes"""
if len(secret) <= max_length:
return secret
return secret[:max_length] + "..."


@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


@@ -0,0 +1,59 @@
FROM prefecthq/prefect:3-python3.11
WORKDIR /app
# Install system dependencies for MobSF and Jadx
RUN apt-get update && apt-get install -y \
git \
default-jdk \
wget \
unzip \
xfonts-75dpi \
xfonts-base \
&& rm -rf /var/lib/apt/lists/* \
&& wget https://github.com/wkhtmltopdf/packaging/releases/download/0.12.6.1-3/wkhtmltox_0.12.6.1-3.bookworm_amd64.deb \
&& apt-get update \
&& apt-get install -y ./wkhtmltox_0.12.6.1-3.bookworm_amd64.deb \
&& rm wkhtmltox_0.12.6.1-3.bookworm_amd64.deb \
&& rm -rf /var/lib/apt/lists/*
# Install Jadx
RUN wget https://github.com/skylot/jadx/releases/download/v1.5.0/jadx-1.5.0.zip -O /tmp/jadx.zip \
&& unzip /tmp/jadx.zip -d /opt/jadx \
&& rm /tmp/jadx.zip \
&& ln -s /opt/jadx/bin/jadx /usr/local/bin/jadx
# The upstream OpenGrep CLI is not yet published on PyPI. Use semgrep (the
# engine that OpenGrep builds upon) and expose it under the `opengrep` name so
# the workflow module can invoke it transparently.
RUN pip install --no-cache-dir semgrep==1.45.0 \
&& ln -sf /usr/local/bin/semgrep /usr/local/bin/opengrep
# Clone and setup MobSF
RUN git clone https://github.com/MobSF/Mobile-Security-Framework-MobSF.git /app/mobsf \
&& cd /app/mobsf \
&& git checkout v3.9.7 \
&& ./setup.sh
# Force rebuild after this point
ARG CACHEBUST=2
# Copy the entire toolbox directory structure
COPY . /app/toolbox
# Copy Android custom rules to a well-known location
COPY ./modules/android/custom_rules /app/custom_opengrep_rules
ENV PYTHONPATH=/app/toolbox:$PYTHONPATH
ENV MOBSF_PORT=8877
# Create startup script to launch MobSF in background and then Prefect
RUN echo '#!/bin/bash\n\
cd /app/mobsf && ./run.sh 127.0.0.1:8877 &\n\
echo "Waiting for MobSF to start..."\n\
sleep 10\n\
echo "Starting Prefect engine..."\n\
exec python -m prefect.engine\n\
' > /app/start.sh && chmod +x /app/start.sh
CMD ["/app/start.sh"]


@@ -0,0 +1,16 @@
# Use existing image with MobSF already installed
FROM localhost:5001/fuzzforge/android_static_analysis:latest
# Install unzip and Jadx
RUN apt-get update && apt-get install -y unzip && rm -rf /var/lib/apt/lists/* \
&& wget https://github.com/skylot/jadx/releases/download/v1.5.0/jadx-1.5.0.zip \
&& unzip -o jadx-1.5.0.zip -d /opt/jadx \
&& rm jadx-1.5.0.zip \
&& chmod +x /opt/jadx/bin/jadx \
&& ln -sf /opt/jadx/bin/jadx /usr/local/bin/jadx
# Copy updated toolbox files
COPY . /app/toolbox
# Copy Android custom rules
COPY ./modules/android/custom_rules /app/custom_opengrep_rules


@@ -0,0 +1,6 @@
"""
Android Static Analysis Security Testing (SAST) Workflow
This package contains the Android SAST workflow that combines
multiple static analysis tools optimized for Java code security.
"""


@@ -0,0 +1,135 @@
name: android_static_analysis
version: "1.0.0"
description: "Perform static analysis on Android applications using OpenGrep and MobSF."
author: "FuzzForge Team"
category: "specialized"
tags:
- "android"
- "static-analysis"
- "security"
- "opengrep"
- "semgrep"
- "mobsf"
supported_volume_modes:
- "ro"
- "rw"
default_volume_mode: "ro"
default_target_path: "/workspace/android_test"
requirements:
tools:
- "opengrep"
- "mobsf"
- "sarif_reporter"
resources:
memory: "2Gi"
cpu: "2000m"
timeout: 3600
environment:
python: "3.11"
has_docker: true
default_parameters:
target_path: "/workspace/android_test"
volume_mode: "ro"
apk_path: ""
opengrep_config: {}
custom_rules_path: "/app/custom_opengrep_rules"
reporter_config: {}
parameters:
type: object
properties:
target_path:
type: string
default: "/workspace/android_test"
description: "Path to the decompiled Android source code for OpenGrep analysis."
volume_mode:
type: string
enum: ["ro", "rw"]
default: "ro"
description: "Volume mount mode for the attached workspace."
apk_path:
type: string
default: ""
description: "Path to the APK file for MobSF analysis (relative to workspace parent or absolute). If empty, MobSF analysis will be skipped."
opengrep_config:
type: object
description: "Configuration object forwarded to the OpenGrep module."
properties:
config:
type: string
enum: ["auto", "p/security-audit", "p/owasp-top-ten", "p/cwe-top-25"]
description: "Preset OpenGrep ruleset to run."
custom_rules_path:
type: string
description: "Directory that contains custom OpenGrep rules."
languages:
type: array
items:
type: string
description: "Restrict analysis to specific languages."
include_patterns:
type: array
items:
type: string
description: "File patterns to include in the scan."
exclude_patterns:
type: array
items:
type: string
description: "File patterns to exclude from the scan."
max_target_bytes:
type: integer
description: "Maximum file size to analyze (bytes)."
timeout:
type: integer
description: "Analysis timeout in seconds."
severity:
type: array
items:
type: string
enum: ["ERROR", "WARNING", "INFO"]
description: "Severities to include in the results."
confidence:
type: array
items:
type: string
enum: ["HIGH", "MEDIUM", "LOW"]
description: "Confidence levels to include in the results."
custom_rules_path:
type:
- string
- "null"
default: "/app/custom_opengrep_rules"
description: "Optional in-container path pointing to custom OpenGrep rules."
reporter_config:
type: object
description: "Configuration overrides for the SARIF reporter."
properties:
include_code_flows:
type: boolean
description: "Include code flow information in the SARIF output."
logical_id:
type: string
description: "Custom identifier to attach to the generated SARIF report."
output_schema:
type: object
properties:
sarif:
type: object
description: "SARIF-formatted findings produced by the workflow."
summary:
type: object
description: "Summary information about the analysis execution."
properties:
total_findings:
type: integer
severity_counts:
type: object
tool_metadata:
type: object


@@ -0,0 +1,2 @@
requests
pydantic


@@ -0,0 +1,280 @@
"""
Android Static Analysis Workflow - Analyze APKs using Jadx, MobSF, and OpenGrep
"""
import sys
import os
import logging
import subprocess
import time
import signal
from pathlib import Path
from typing import Dict, Any
from prefect import flow, task
# Ensure /app is on the PYTHONPATH (Docker executions)
sys.path.insert(0, "/app")
# Internal module imports
from toolbox.modules.android.jadx import JadxModule
from toolbox.modules.android.opengrep import OpenGrepModule
from toolbox.modules.reporter import SARIFReporter
from toolbox.modules.android.mobsf import MobSFModule
# Logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# ---------------------- TASKS ---------------------- #
@task(name="jadx_decompilation")
async def run_jadx_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
print("Running Jadx APK decompilation")
print(f" APK file: {config.get('apk_path')}")
print(f" Output dir: {config.get('output_dir')}")
module = JadxModule()
result = await module.execute(config, workspace)
print(f"Jadx completed: {result.status}")
if result.error:
print(f"Jadx error: {result.error}")
if result.status == "success":
print(f"Jadx decompiled {result.summary.get('java_files', 0)} Java files")
print(f"Source dir: {result.summary.get('source_dir')}")
return result.dict()
@task(name="opengrep_analysis")
async def run_opengrep_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
print("Running OpenGrep static analysis")
print(f" Workspace: {workspace}")
print(f" Config: {config}")
module = OpenGrepModule()
result = await module.execute(config, workspace)
print(f"OpenGrep completed: {result.status}")
print(f"OpenGrep findings count: {len(result.findings)}")
print(f"OpenGrep summary: {result.summary}")
return result.dict()
@task(name="mobsf_analysis")
async def run_mobsf_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
print("Running MobSF static analysis")
print(f" APK file: {config.get('file_path')}")
print(f" MobSF URL: {config.get('mobsf_url')}")
module = MobSFModule()
result = await module.execute(config, workspace)
print(f"MobSF scan completed: {result.status}")
print(f"MobSF findings count: {len(result.findings)}")
return result.dict()
@task(name="android_report_generation")
async def generate_android_sarif_report(
opengrep_result: Dict[str, Any],
mobsf_result: Dict[str, Any],
config: Dict[str, Any],
workspace: Path
) -> Dict[str, Any]:
logger.info("Generating SARIF report for Android scan")
reporter = SARIFReporter()
all_findings = []
all_findings.extend(opengrep_result.get("findings", []))
# Add MobSF findings if available
if mobsf_result:
all_findings.extend(mobsf_result.get("findings", []))
reporter_config = {
**(config or {}),
"findings": all_findings,
"tool_name": "FuzzForge Android Static Analysis",
"tool_version": "1.0.0",
}
result = await reporter.execute(reporter_config, workspace)
# The reporter typically returns {"sarif": {...}} in result.dict()
return result.dict().get("sarif", {})
# ---------------------- FLOW ---------------------- #
@flow(name="android_static_analysis", log_prints=True)
async def main_flow(
target_path: str = os.getenv("FF_TARGET_PATH", "/workspace/android_test"),
volume_mode: str = "ro",
apk_path: str = "",
opengrep_config: Dict[str, Any] | None = None,
custom_rules_path: str | None = None,
reporter_config: Dict[str, Any] | None = None,
) -> Dict[str, Any]:
"""
Android static analysis workflow using OpenGrep and MobSF.
Args:
target_path: Path to decompiled source code (for OpenGrep analysis)
volume_mode: Volume mount mode (ro/rw)
apk_path: Path to APK file for MobSF analysis (relative to workspace or absolute)
opengrep_config: Configuration for OpenGrep module
custom_rules_path: Path to custom OpenGrep rules
reporter_config: Configuration for SARIF reporter
"""
print("📱 Starting Android Static Analysis Workflow")
print(f"Workspace: {target_path} (mode: {volume_mode})")
workspace = Path(target_path)
# Start MobSF server in background if APK analysis is needed
mobsf_process = None
if apk_path:
print("🚀 Starting MobSF server in background...")
try:
mobsf_process = subprocess.Popen(
["bash", "-c", "cd /app/mobsf && ./run.sh 127.0.0.1:8877"],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE
)
print("⏳ Waiting for MobSF to initialize (45 seconds)...")
time.sleep(45)
print("✅ MobSF should be ready now")
# Retrieve MobSF API key from secret file
print("🔑 Retrieving MobSF API key...")
try:
secret_file = Path("/root/.MobSF/secret")
if secret_file.exists():
secret = secret_file.read_text().strip()
if secret:
# API key is SHA256 hash of the secret file contents
import hashlib
api_key = hashlib.sha256(secret.encode()).hexdigest()
os.environ["MOBSF_API_KEY"] = api_key
print(f"✅ MobSF API key retrieved")
else:
print("⚠️ API key file is empty")
else:
print(f"⚠️ API key file not found at {secret_file}")
except Exception as e:
print(f"⚠️ Error retrieving API key: {e}")
except Exception as e:
print(f"⚠️ Failed to start MobSF: {e}")
mobsf_process = None
# Resolve APK path if provided
# Note: target_path gets mounted as /workspace/ in the execution container
# So all paths should be relative to /workspace/
apk_file_path = None
if apk_path:
apk_path_obj = Path(apk_path)
if apk_path_obj.is_absolute():
apk_file_path = str(apk_path_obj)
else:
# Relative paths are relative to /workspace/ (the mounted target directory)
apk_file_path = f"/workspace/{apk_path}"
print(f"APK path resolved to: {apk_file_path}")
print(f"Checking if APK exists in target: {(Path(target_path) / apk_path).exists()}")
# Set default Android-specific configuration if not provided
if not opengrep_config:
opengrep_config = {
"languages": ["java", "kotlin"], # Focus on Android languages
}
# Use custom Android rules if available, otherwise use custom_rules_path param
if custom_rules_path:
opengrep_config["custom_rules_path"] = custom_rules_path
elif "custom_rules_path" not in opengrep_config:
# Default to custom Android security rules
opengrep_config["custom_rules_path"] = "/app/custom_opengrep_rules"
try:
# --- Phase 1 : Jadx Decompilation ---
jadx_result = None
actual_workspace = workspace
if apk_file_path:
print(f"Phase 1: Jadx decompilation of APK: {apk_file_path}")
jadx_config = {
"apk_path": apk_file_path,
"output_dir": "jadx_output",
"overwrite": True,
"threads": 4,
}
jadx_result = await run_jadx_task(workspace, jadx_config)
if jadx_result.get("status") == "success":
# Use Jadx source output as workspace for OpenGrep
source_dir = jadx_result.get("summary", {}).get("source_dir")
if source_dir:
actual_workspace = Path(source_dir)
print(f"✅ Jadx decompiled {jadx_result.get('summary', {}).get('java_files', 0)} Java files")
print(f" OpenGrep will analyze: {source_dir}")
else:
print(f"⚠️ Jadx failed: {jadx_result.get('error', 'unknown error')}")
else:
print("Phase 1: Jadx decompilation skipped (no APK provided)")
# --- Phase 2 : OpenGrep ---
print("Phase 2: OpenGrep analysis on source code")
print(f"Using config: {opengrep_config}")
opengrep_result = await run_opengrep_task(actual_workspace, opengrep_config)
# --- Phase 3 : MobSF ---
mobsf_result = None
if apk_file_path:
print(f"Phase 3: MobSF analysis on APK: {apk_file_path}")
mobsf_config = {
"mobsf_url": "http://localhost:8877",
"file_path": apk_file_path,
"api_key": os.environ.get("MOBSF_API_KEY", "")
}
print(f"Using MobSF config (api_key={mobsf_config['api_key'][:10]}..., url={mobsf_config['mobsf_url']})")
mobsf_result = await run_mobsf_task(workspace, mobsf_config)
print(f"MobSF result: {mobsf_result}")
else:
print(f"Phase 3: MobSF analysis skipped (apk_path='{apk_path}' empty)")
# --- Phase 4: SARIF report generation ---
print("Phase 4: SARIF report generation")
sarif_report = await generate_android_sarif_report(
opengrep_result, mobsf_result, reporter_config or {}, workspace
)
findings = sarif_report.get("runs", [{}])[0].get("results", []) if sarif_report else []
print(f"✅ Workflow complete with {len(findings)} findings")
return sarif_report
except Exception as e:
logger.error(f"Workflow failed: {e}")
print(f"❌ Workflow failed: {e}")
# Return a minimal SARIF skeleton on failure
return {
"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
"version": "2.1.0",
"runs": [
{
"tool": {"driver": {"name": "FuzzForge Android Static Analysis"}},
"results": [],
"invocations": [
{
"executionSuccessful": False,
"exitCode": 1,
"exitCodeDescription": str(e),
}
],
}
],
}
finally:
# Cleanup: Stop MobSF if it was started
if mobsf_process:
print("🛑 Stopping MobSF server...")
try:
mobsf_process.terminate()
mobsf_process.wait(timeout=5)
print("✅ MobSF stopped")
except Exception as e:
print(f"⚠️ Error stopping MobSF: {e}")
try:
mobsf_process.kill()
except Exception:
pass


@@ -0,0 +1,12 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


@@ -0,0 +1,47 @@
# Secret Detection Workflow Dockerfile
FROM prefecthq/prefect:3-python3.11
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
wget \
git \
ca-certificates \
gnupg \
&& rm -rf /var/lib/apt/lists/*
# Install TruffleHog (use direct binary download to avoid install script issues)
RUN curl -sSfL "https://github.com/trufflesecurity/trufflehog/releases/download/v3.63.2/trufflehog_3.63.2_linux_amd64.tar.gz" -o trufflehog.tar.gz \
&& tar -xzf trufflehog.tar.gz \
&& mv trufflehog /usr/local/bin/ \
&& rm trufflehog.tar.gz
# Install Gitleaks (use specific version to avoid API rate limiting)
RUN wget https://github.com/gitleaks/gitleaks/releases/download/v8.18.2/gitleaks_8.18.2_linux_x64.tar.gz \
&& tar -xzf gitleaks_8.18.2_linux_x64.tar.gz \
&& mv gitleaks /usr/local/bin/ \
&& rm gitleaks_8.18.2_linux_x64.tar.gz
# Verify installations
RUN trufflehog --version && gitleaks version
# Set working directory
WORKDIR /opt/prefect
# Create toolbox directory structure
RUN mkdir -p /opt/prefect/toolbox
# Set environment variables
ENV PYTHONPATH=/opt/prefect/toolbox:/opt/prefect/toolbox/workflows
ENV WORKFLOW_NAME=secret_detection_scan
# The toolbox code will be mounted at runtime from the backend container
# This includes:
# - /opt/prefect/toolbox/modules/base.py
# - /opt/prefect/toolbox/modules/secret_detection/ (TruffleHog, Gitleaks modules)
# - /opt/prefect/toolbox/modules/reporter/ (SARIF reporter)
# - /opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan/
VOLUME /opt/prefect/toolbox
# Set working directory for execution
WORKDIR /opt/prefect


@@ -0,0 +1,58 @@
# Secret Detection Workflow Dockerfile - Self-Contained Version
# This version copies all required modules into the image for complete isolation
FROM prefecthq/prefect:3-python3.11
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
wget \
git \
ca-certificates \
gnupg \
&& rm -rf /var/lib/apt/lists/*
# Install TruffleHog
RUN curl -sSfL https://raw.githubusercontent.com/trufflesecurity/trufflehog/main/scripts/install.sh | sh -s -- -b /usr/local/bin
# Install Gitleaks
RUN wget https://github.com/gitleaks/gitleaks/releases/download/v8.18.2/gitleaks_8.18.2_linux_x64.tar.gz \
&& tar -xzf gitleaks_8.18.2_linux_x64.tar.gz \
&& mv gitleaks /usr/local/bin/ \
&& rm gitleaks_8.18.2_linux_x64.tar.gz
# Verify installations
RUN trufflehog --version && gitleaks version
# Set working directory
WORKDIR /opt/prefect
# Create directory structure
RUN mkdir -p /opt/prefect/toolbox/modules/secret_detection \
/opt/prefect/toolbox/modules/reporter \
/opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan
# Copy the base module and required modules
COPY toolbox/modules/base.py /opt/prefect/toolbox/modules/base.py
COPY toolbox/modules/__init__.py /opt/prefect/toolbox/modules/__init__.py
COPY toolbox/modules/secret_detection/ /opt/prefect/toolbox/modules/secret_detection/
COPY toolbox/modules/reporter/ /opt/prefect/toolbox/modules/reporter/
# Copy the workflow code
COPY toolbox/workflows/comprehensive/secret_detection_scan/ /opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan/
# Copy toolbox init files
COPY toolbox/__init__.py /opt/prefect/toolbox/__init__.py
COPY toolbox/workflows/__init__.py /opt/prefect/toolbox/workflows/__init__.py
COPY toolbox/workflows/comprehensive/__init__.py /opt/prefect/toolbox/workflows/comprehensive/__init__.py
# Install Python dependencies for the modules
RUN pip install --no-cache-dir \
pydantic \
asyncio
# Set environment variables
ENV PYTHONPATH=/opt/prefect/toolbox:/opt/prefect/toolbox/workflows
ENV WORKFLOW_NAME=secret_detection_scan
# Set default command (can be overridden)
CMD ["python", "-m", "toolbox.workflows.comprehensive.secret_detection_scan.workflow"]


@@ -0,0 +1,130 @@
# Secret Detection Scan Workflow
This workflow performs comprehensive secret detection using multiple industry-standard tools:
- **TruffleHog**: Comprehensive secret detection with verification capabilities
- **Gitleaks**: Git-specific secret scanning and leak detection
## Features
- **Parallel Execution**: Runs TruffleHog and Gitleaks concurrently for faster results
- **Deduplication**: Automatically removes duplicate findings across tools
- **SARIF Output**: Generates standardized SARIF reports for integration with security tools
- **Configurable**: Supports extensive configuration for both tools
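The deduplication step can be sketched as follows. This is a minimal illustration, not FuzzForge's exact implementation; the key fields (`file_path`, `line_start`, and the matched secret) mirror the finding shape both modules emit:

```python
import hashlib

def dedupe_findings(findings):
    """Drop findings that refer to the same secret at the same location.

    Two findings count as duplicates when they share file path, line
    number, and a hash of the matched secret, regardless of which tool
    reported them.
    """
    seen = set()
    unique = []
    for f in findings:
        secret = f.get("secret") or f.get("code_snippet") or ""
        key = (
            f.get("file_path"),
            f.get("line_start"),
            hashlib.sha256(secret.encode()).hexdigest(),
        )
        if key not in seen:
            seen.add(key)
            unique.append(f)
    return unique
```

Hashing the secret (rather than storing it) keeps the dedup key safe to log or persist.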
## Dependencies
### Required Modules
- `toolbox.modules.secret_detection.trufflehog`
- `toolbox.modules.secret_detection.gitleaks`
- `toolbox.modules.reporter` (SARIF reporter)
- `toolbox.modules.base` (Base module interface)
### External Tools
- TruffleHog v3.63.2+
- Gitleaks v8.18.0+
## Docker Deployment
This workflow provides two Docker deployment approaches:
### 1. Volume-Based Approach (Default: `Dockerfile`)
**Advantages:**
- Live code updates without rebuilding images
- Smaller image sizes
- Consistent module versions across workflows
- Faster development iteration
**How it works:**
- Docker image contains only external tools (TruffleHog, Gitleaks)
- Python modules are mounted at runtime from the backend container
- Backend manages code synchronization via shared volumes
### 2. Self-Contained Approach (`Dockerfile.self-contained`)
**Advantages:**
- Complete isolation and reproducibility
- No runtime dependencies on backend code
- Can run independently of FuzzForge platform
- Better for CI/CD integration
**How it works:**
- All required Python modules are copied into the Docker image
- Image is completely self-contained
- Larger image size but fully portable
## Configuration
### TruffleHog Configuration
```json
{
"trufflehog_config": {
"verify": true, // Verify discovered secrets
"concurrency": 10, // Number of concurrent workers
"max_depth": 10, // Maximum directory depth
"include_detectors": [], // Specific detectors to include
"exclude_detectors": [] // Specific detectors to exclude
}
}
```
### Gitleaks Configuration
```json
{
"gitleaks_config": {
"scan_mode": "detect", // "detect" or "protect"
"redact": true, // Redact secrets in output
"max_target_megabytes": 100, // Maximum file size (MB)
"no_git": false, // Scan without Git context
"config_file": "", // Custom Gitleaks config
"baseline_file": "" // Baseline file for known findings
}
}
```
## Usage Example
```bash
curl -X POST "http://localhost:8000/workflows/secret_detection_scan/submit" \
-H "Content-Type: application/json" \
-d '{
"target_path": "/path/to/scan",
"volume_mode": "ro",
"parameters": {
"trufflehog_config": {
"verify": true,
"concurrency": 15
},
"gitleaks_config": {
"scan_mode": "detect",
"max_target_megabytes": 200
}
}
}'
```
## Output Format
The workflow generates a SARIF report containing:
- All unique findings from both tools
- Severity levels mapped to standard scale
- File locations and line numbers
- Detailed descriptions and recommendations
- Tool-specific metadata
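Consumers only need the standard SARIF 2.1.0 layout (`runs` → `results` → `level`). The snippet below is a minimal sketch; the sample `results` are fabricated for illustration:

```python
import json
from collections import Counter

def summarize_sarif(sarif):
    """Count results per SARIF level across all runs."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            counts[result.get("level", "none")] += 1
    return dict(counts)

report = json.loads("""
{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {"driver": {"name": "FuzzForge Secret Detection"}},
      "results": [
        {"level": "error", "message": {"text": "AWS key detected"}},
        {"level": "warning", "message": {"text": "Generic token"}}
      ]
    }
  ]
}
""")
print(summarize_sarif(report))  # → {'error': 1, 'warning': 1}
```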
## Performance Considerations
- **TruffleHog**: CPU-intensive with verification enabled
- **Gitleaks**: Memory-intensive for large repositories
- **Recommended Resources**: 512Mi memory, 500m CPU
- **Typical Runtime**: 1-5 minutes for small repos, 10-30 minutes for large ones
## Security Notes
- Secrets are redacted in output by default
- Verified secrets are marked with higher severity
- Both tools support custom rules and exclusions
- Consider using baseline files for known false positives


@@ -0,0 +1,17 @@
"""
Secret Detection Scan Workflow
This package contains the comprehensive secret detection workflow that combines
multiple secret detection tools for thorough analysis.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


@@ -0,0 +1,113 @@
name: secret_detection_scan
version: "2.0.0"
description: "Comprehensive secret detection using TruffleHog and Gitleaks"
author: "FuzzForge Team"
category: "comprehensive"
tags:
- "secrets"
- "credentials"
- "detection"
- "trufflehog"
- "gitleaks"
- "comprehensive"
supported_volume_modes:
- "ro"
- "rw"
default_volume_mode: "ro"
default_target_path: "/workspace"
requirements:
tools:
- "trufflehog"
- "gitleaks"
resources:
memory: "512Mi"
cpu: "500m"
timeout: 1800
has_docker: true
default_parameters:
target_path: "/workspace"
volume_mode: "ro"
trufflehog_config: {}
gitleaks_config: {}
reporter_config: {}
parameters:
type: object
properties:
target_path:
type: string
default: "/workspace"
description: "Path to analyze"
volume_mode:
type: string
enum: ["ro", "rw"]
default: "ro"
description: "Volume mount mode"
trufflehog_config:
type: object
description: "TruffleHog configuration"
properties:
verify:
type: boolean
description: "Verify discovered secrets"
concurrency:
type: integer
description: "Number of concurrent workers"
max_depth:
type: integer
description: "Maximum directory depth to scan"
include_detectors:
type: array
items:
type: string
description: "Specific detectors to include"
exclude_detectors:
type: array
items:
type: string
description: "Specific detectors to exclude"
gitleaks_config:
type: object
description: "Gitleaks configuration"
properties:
scan_mode:
type: string
enum: ["detect", "protect"]
description: "Scan mode"
redact:
type: boolean
description: "Redact secrets in output"
max_target_megabytes:
type: integer
description: "Maximum file size to scan (MB)"
no_git:
type: boolean
description: "Scan files without Git context"
config_file:
type: string
description: "Path to custom configuration file"
baseline_file:
type: string
description: "Path to baseline file"
reporter_config:
type: object
description: "SARIF reporter configuration"
properties:
output_file:
type: string
description: "Output SARIF file name"
include_code_flows:
type: boolean
description: "Include code flow information"
output_schema:
type: object
properties:
sarif:
type: object
description: "SARIF-formatted security findings"


@@ -0,0 +1,290 @@
"""
Secret Detection Scan Workflow
This workflow performs comprehensive secret detection using multiple tools:
- TruffleHog: Comprehensive secret detection with verification
- Gitleaks: Git-specific secret scanning
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import sys
import logging
from pathlib import Path
from typing import Dict, Any, List, Optional
from prefect import flow, task
from prefect.artifacts import create_markdown_artifact, create_table_artifact
import asyncio
import json
# Add modules to path
sys.path.insert(0, '/app')
# Import modules
from toolbox.modules.secret_detection.trufflehog import TruffleHogModule
from toolbox.modules.secret_detection.gitleaks import GitleaksModule
from toolbox.modules.reporter import SARIFReporter
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@task(name="trufflehog_scan")
async def run_trufflehog_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
"""
Task to run TruffleHog secret detection.
Args:
workspace: Path to the workspace
config: TruffleHog configuration
Returns:
TruffleHog results
"""
logger.info("Running TruffleHog secret detection")
module = TruffleHogModule()
result = await module.execute(config, workspace)
logger.info(f"TruffleHog completed: {result.summary.get('total_secrets', 0)} secrets found")
return result.dict()
@task(name="gitleaks_scan")
async def run_gitleaks_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
"""
Task to run Gitleaks secret detection.
Args:
workspace: Path to the workspace
config: Gitleaks configuration
Returns:
Gitleaks results
"""
logger.info("Running Gitleaks secret detection")
module = GitleaksModule()
result = await module.execute(config, workspace)
logger.info(f"Gitleaks completed: {result.summary.get('total_leaks', 0)} leaks found")
return result.dict()
@task(name="aggregate_findings")
async def aggregate_findings_task(
trufflehog_results: Dict[str, Any],
gitleaks_results: Dict[str, Any],
config: Dict[str, Any],
workspace: Path
) -> Dict[str, Any]:
"""
Task to aggregate findings from all secret detection tools.
Args:
trufflehog_results: Results from TruffleHog
gitleaks_results: Results from Gitleaks
config: Reporter configuration
workspace: Path to workspace
Returns:
Aggregated SARIF report
"""
logger.info("Aggregating secret detection findings")
# Combine all findings
all_findings = []
# Add TruffleHog findings
trufflehog_findings = trufflehog_results.get("findings", [])
all_findings.extend(trufflehog_findings)
# Add Gitleaks findings
gitleaks_findings = gitleaks_results.get("findings", [])
all_findings.extend(gitleaks_findings)
# Deduplicate findings based on file path and line number
unique_findings = []
seen_signatures = set()
for finding in all_findings:
# Create signature for deduplication
signature = (
finding.get("file_path", ""),
finding.get("line_start", 0),
finding.get("title", "").lower()[:50] # First 50 chars of title
)
if signature not in seen_signatures:
seen_signatures.add(signature)
unique_findings.append(finding)
else:
logger.debug(f"Deduplicated finding: {signature}")
logger.info(f"Aggregated {len(unique_findings)} unique findings from {len(all_findings)} total")
# Generate SARIF report
reporter = SARIFReporter()
reporter_config = {
**config,
"findings": unique_findings,
"tool_name": "FuzzForge Secret Detection",
"tool_version": "1.0.0",
"tool_description": "Comprehensive secret detection using TruffleHog and Gitleaks"
}
result = await reporter.execute(reporter_config, workspace)
return result.dict().get("sarif", {})
@flow(name="secret_detection_scan", log_prints=True)
async def main_flow(
target_path: str = "/workspace",
volume_mode: str = "ro",
trufflehog_config: Optional[Dict[str, Any]] = None,
gitleaks_config: Optional[Dict[str, Any]] = None,
reporter_config: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
"""
Main secret detection workflow.
This workflow:
1. Runs TruffleHog for comprehensive secret detection
2. Runs Gitleaks for Git-specific secret detection
3. Aggregates and deduplicates findings
4. Generates a unified SARIF report
Args:
target_path: Path to the mounted workspace (default: /workspace)
volume_mode: Volume mount mode (ro/rw)
trufflehog_config: Configuration for TruffleHog
gitleaks_config: Configuration for Gitleaks
reporter_config: Configuration for SARIF reporter
Returns:
SARIF-formatted findings report
"""
logger.info("Starting comprehensive secret detection workflow")
logger.info(f"Workspace: {target_path}, Mode: {volume_mode}")
# Set workspace path
workspace = Path(target_path)
if not workspace.exists():
logger.error(f"Workspace does not exist: {workspace}")
return {
"error": f"Workspace not found: {workspace}",
"sarif": None
}
# Default configurations - merge with provided configs to ensure defaults are always applied
default_trufflehog_config = {
"verify": False,
"concurrency": 10,
"max_depth": 10,
"no_git": True # Add no_git for filesystem scanning
}
trufflehog_config = {**default_trufflehog_config, **(trufflehog_config or {})}
default_gitleaks_config = {
"scan_mode": "detect",
"redact": True,
"max_target_megabytes": 100,
"no_git": True # Critical for non-git directories
}
gitleaks_config = {**default_gitleaks_config, **(gitleaks_config or {})}
default_reporter_config = {
"include_code_flows": False
}
reporter_config = {**default_reporter_config, **(reporter_config or {})}
try:
# Run secret detection tools in parallel
logger.info("Phase 1: Running secret detection tools")
# Create tasks for parallel execution
trufflehog_task_result = run_trufflehog_task(workspace, trufflehog_config)
gitleaks_task_result = run_gitleaks_task(workspace, gitleaks_config)
# Wait for both to complete
trufflehog_results, gitleaks_results = await asyncio.gather(
trufflehog_task_result,
gitleaks_task_result,
return_exceptions=True
)
# Handle any exceptions
if isinstance(trufflehog_results, Exception):
logger.error(f"TruffleHog failed: {trufflehog_results}")
trufflehog_results = {"findings": [], "status": "failed"}
if isinstance(gitleaks_results, Exception):
logger.error(f"Gitleaks failed: {gitleaks_results}")
gitleaks_results = {"findings": [], "status": "failed"}
# Aggregate findings
logger.info("Phase 2: Aggregating findings")
sarif_report = await aggregate_findings_task(
trufflehog_results,
gitleaks_results,
reporter_config,
workspace
)
# Log summary
if sarif_report and "runs" in sarif_report:
results_count = len(sarif_report["runs"][0].get("results", []))
logger.info(f"Workflow completed successfully with {results_count} unique secret findings")
# Log tool-specific stats
trufflehog_count = len(trufflehog_results.get("findings", []))
gitleaks_count = len(gitleaks_results.get("findings", []))
logger.info(f"Tool results - TruffleHog: {trufflehog_count}, Gitleaks: {gitleaks_count}")
else:
logger.info("Workflow completed successfully with no findings")
return sarif_report
except Exception as e:
logger.error(f"Secret detection workflow failed: {e}")
# Return error in SARIF format
return {
"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
"version": "2.1.0",
"runs": [
{
"tool": {
"driver": {
"name": "FuzzForge Secret Detection",
"version": "1.0.0"
}
},
"results": [],
"invocations": [
{
"executionSuccessful": False,
"exitCode": 1,
"exitCodeDescription": str(e)
}
]
}
]
}
if __name__ == "__main__":
    # For local testing; asyncio is already imported at module level
    asyncio.run(main_flow(
        target_path="/tmp/test",
        trufflehog_config={"verify": True, "max_depth": 5},
        gitleaks_config={"scan_mode": "detect"}
    ))


@@ -0,0 +1,204 @@
"""
Manual Workflow Registry for Prefect Deployment
This file contains the manual registry of all workflows that can be deployed.
Developers MUST add their workflows here after creating them.
This approach is required because:
1. Prefect cannot deploy dynamically imported flows
2. Docker deployment needs static flow references
3. Explicit registration provides better control and visibility
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
from typing import Dict, Any, Callable
import logging
logger = logging.getLogger(__name__)
# Import only essential workflows
# Import each workflow individually to handle failures gracefully
security_assessment_flow = None
secret_detection_flow = None
android_static_analysis_flow = None
# Try to import each workflow individually
try:
from .security_assessment.workflow import main_flow as security_assessment_flow
except ImportError as e:
logger.warning(f"Failed to import security_assessment workflow: {e}")
try:
from .comprehensive.secret_detection_scan.workflow import main_flow as secret_detection_flow
except ImportError as e:
logger.warning(f"Failed to import secret_detection_scan workflow: {e}")
try:
from .android_static_analysis.workflow import main_flow as android_static_analysis_flow
except ImportError as e:
logger.warning(f"Failed to import android_static_analysis workflow: {e}")
# Manual registry - developers add workflows here after creation
# Only include workflows that were successfully imported
WORKFLOW_REGISTRY: Dict[str, Dict[str, Any]] = {}
# Add workflows that were successfully imported
if security_assessment_flow is not None:
WORKFLOW_REGISTRY["security_assessment"] = {
"flow": security_assessment_flow,
"module_path": "toolbox.workflows.security_assessment.workflow",
"function_name": "main_flow",
"description": "Comprehensive security assessment workflow that scans files, analyzes code for vulnerabilities, and generates SARIF reports",
"version": "1.0.0",
"author": "FuzzForge Team",
"tags": ["security", "scanner", "analyzer", "static-analysis", "sarif"]
}
if secret_detection_flow is not None:
WORKFLOW_REGISTRY["secret_detection_scan"] = {
"flow": secret_detection_flow,
"module_path": "toolbox.workflows.comprehensive.secret_detection_scan.workflow",
"function_name": "main_flow",
"description": "Comprehensive secret detection using TruffleHog and Gitleaks for thorough credential scanning",
"version": "1.0.0",
"author": "FuzzForge Team",
"tags": ["secrets", "credentials", "detection", "trufflehog", "gitleaks", "comprehensive"]
}
if android_static_analysis_flow is not None:
WORKFLOW_REGISTRY["android_static_analysis"] = {
"flow": android_static_analysis_flow,
"module_path": "toolbox.workflows.android_static_analysis.workflow",
"function_name": "main_flow",
"description": "Perform static analysis on Android applications using OpenGrep",
"version": "1.0.0",
"author": "FuzzForge Team",
"tags": ["android", "static-analysis", "security", "opengrep", "semgrep"]
}
#
# To add a new workflow, follow this pattern:
#
# "my_new_workflow": {
# "flow": my_new_flow_function, # Import the flow function above
# "module_path": "toolbox.workflows.my_new_workflow.workflow",
# "function_name": "my_new_flow_function",
# "description": "Description of what this workflow does",
# "version": "1.0.0",
# "author": "Developer Name",
# "tags": ["tag1", "tag2"]
# }
def get_workflow_flow(workflow_name: str) -> Callable:
"""
Get the flow function for a workflow.
Args:
workflow_name: Name of the workflow
Returns:
Flow function
Raises:
KeyError: If workflow not found in registry
"""
if workflow_name not in WORKFLOW_REGISTRY:
available = list(WORKFLOW_REGISTRY.keys())
raise KeyError(
f"Workflow '{workflow_name}' not found in registry. "
f"Available workflows: {available}. "
f"Please add the workflow to toolbox/workflows/registry.py"
)
return WORKFLOW_REGISTRY[workflow_name]["flow"]
def get_workflow_info(workflow_name: str) -> Dict[str, Any]:
"""
Get registry information for a workflow.
Args:
workflow_name: Name of the workflow
Returns:
Registry information dictionary
Raises:
KeyError: If workflow not found in registry
"""
if workflow_name not in WORKFLOW_REGISTRY:
available = list(WORKFLOW_REGISTRY.keys())
raise KeyError(
f"Workflow '{workflow_name}' not found in registry. "
f"Available workflows: {available}"
)
return WORKFLOW_REGISTRY[workflow_name]
def list_registered_workflows() -> Dict[str, Dict[str, Any]]:
"""
Get all registered workflows.
Returns:
Dictionary of all workflow registry entries
"""
return WORKFLOW_REGISTRY.copy()
def validate_registry() -> bool:
"""
Validate the workflow registry for consistency.
Returns:
True if valid, raises exceptions if not
Raises:
ValueError: If registry is invalid
"""
if not WORKFLOW_REGISTRY:
raise ValueError("Workflow registry is empty")
required_fields = ["flow", "module_path", "function_name", "description"]
for name, entry in WORKFLOW_REGISTRY.items():
# Check required fields
missing_fields = [field for field in required_fields if field not in entry]
if missing_fields:
raise ValueError(
f"Workflow '{name}' missing required fields: {missing_fields}"
)
# Check if flow is callable
if not callable(entry["flow"]):
raise ValueError(f"Workflow '{name}' flow is not callable")
# Check if flow has the required Prefect attributes
if not hasattr(entry["flow"], "deploy"):
raise ValueError(
f"Workflow '{name}' flow is not a Prefect flow (missing deploy method)"
)
logger.info(f"Registry validation passed. {len(WORKFLOW_REGISTRY)} workflows registered.")
return True
# Validate registry on import
try:
validate_registry()
logger.info(f"Workflow registry loaded successfully with {len(WORKFLOW_REGISTRY)} workflows")
except Exception as e:
logger.error(f"Workflow registry validation failed: {e}")
raise


@@ -0,0 +1,30 @@
FROM prefecthq/prefect:3-python3.11
WORKDIR /app
# Create toolbox directory structure to match expected import paths
RUN mkdir -p /app/toolbox/workflows /app/toolbox/modules
# Copy base module infrastructure
COPY modules/__init__.py /app/toolbox/modules/
COPY modules/base.py /app/toolbox/modules/
# Copy only required modules (manual selection)
COPY modules/scanner /app/toolbox/modules/scanner
COPY modules/analyzer /app/toolbox/modules/analyzer
COPY modules/reporter /app/toolbox/modules/reporter
# Copy this workflow
COPY workflows/security_assessment /app/toolbox/workflows/security_assessment
# Install workflow-specific requirements if they exist
RUN if [ -f /app/toolbox/workflows/security_assessment/requirements.txt ]; then pip install --no-cache-dir -r /app/toolbox/workflows/security_assessment/requirements.txt; fi
# Install common requirements
RUN pip install --no-cache-dir pyyaml
# Set Python path
ENV PYTHONPATH=/app:$PYTHONPATH
# Create workspace directory
RUN mkdir -p /workspace


@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

Some files were not shown because too many files have changed in this diff.