Mirror of https://github.com/FuzzingLabs/fuzzforge_ai.git (synced 2026-04-11 21:38:30 +02:00)

Comparing `feat/skill`...`feat/andro`, 1 commit (5da3f1e071)
.github/ISSUE_TEMPLATE/bug_report.md (vendored, new file, +48 lines)
@@ -0,0 +1,48 @@
---
name: 🐛 Bug Report
about: Create a report to help us improve FuzzForge
title: "[BUG] "
labels: bug
assignees: ''
---

## Description
A clear and concise description of the bug you encountered.

## Environment
Please provide details about your environment:
- **OS**: (e.g., macOS 14.0, Ubuntu 22.04, Windows 11)
- **Python version**: (e.g., 3.9.7)
- **Docker version**: (e.g., 24.0.6)
- **FuzzForge version**: (e.g., 0.6.0)

## Steps to Reproduce
Clear steps to recreate the issue:

1. Go to '...'
2. Run command '...'
3. Click on '...'
4. See error

## Expected Behavior
A clear and concise description of what should happen.

## Actual Behavior
A clear and concise description of what actually happens.

## Logs
Please include relevant error messages and stack traces:

```
Paste logs here
```

## Screenshots
If applicable, add screenshots to help explain your problem.

## Additional Context
Add any other context about the problem here (workflow used, specific target, configuration, etc.).

---

💬 **Need help?** Join our [Discord Community](https://discord.com/invite/acqv9FVG) for real-time support.
.github/ISSUE_TEMPLATE/config.yml (vendored, new file, +8 lines)
@@ -0,0 +1,8 @@
blank_issues_enabled: false
contact_links:
  - name: 💬 Community Discord
    url: https://discord.com/invite/acqv9FVG
    about: Join our Discord to discuss ideas, workflows, and security research with the community.
  - name: 📖 Documentation
    url: https://github.com/FuzzingLabs/fuzzforge_ai/tree/main/docs
    about: Check our documentation for guides, tutorials, and API reference.
.github/ISSUE_TEMPLATE/feature_request.md (vendored, new file, +38 lines)
@@ -0,0 +1,38 @@
---
name: ✨ Feature Request
about: Suggest an idea for FuzzForge
title: "[FEATURE] "
labels: enhancement
assignees: ''
---

## Use Case
Why is this feature needed? Describe the problem you're trying to solve or the improvement you'd like to see.

## Proposed Solution
How should it work? Describe your ideal solution in detail.

## Alternatives
What other approaches have you considered? List any alternative solutions or features you've thought about.

## Implementation
**(Optional)** Do you have any technical considerations or implementation ideas?

## Category
What area of FuzzForge would this feature enhance?

- [ ] 🤖 AI Agents for Security
- [ ] 🛠 Workflow Automation
- [ ] 📈 Vulnerability Research
- [ ] 🔗 Fuzzer Integration
- [ ] 🌐 Community Marketplace
- [ ] 🔒 Enterprise Features
- [ ] 📚 Documentation
- [ ] 🎯 Other

## Additional Context
Add any other context, screenshots, references, or examples about the feature request here.

---

💬 **Want to discuss this idea?** Join our [Discord Community](https://discord.com/invite/acqv9FVG) to collaborate with other contributors!
.github/ISSUE_TEMPLATE/workflow_submission.md (vendored, new file, +67 lines)
@@ -0,0 +1,67 @@
---
name: 🔄 Workflow Submission
about: Contribute a security workflow or module to the FuzzForge community
title: "[WORKFLOW] "
labels: workflow, community
assignees: ''
---

## Workflow Name
Provide a short, descriptive name for your workflow.

## Description
Explain what this workflow does and what security problems it solves.

## Category
What type of security workflow is this?

- [ ] 🛡️ **Security Assessment** - Static analysis, vulnerability scanning
- [ ] 🔍 **Secret Detection** - Credential and secret scanning
- [ ] 🎯 **Fuzzing** - Dynamic testing and fuzz testing
- [ ] 🔄 **Reverse Engineering** - Binary analysis and decompilation
- [ ] 🌐 **Infrastructure Security** - Container, cloud, network security
- [ ] 🔒 **Penetration Testing** - Offensive security testing
- [ ] 📋 **Other** - Please describe

## Files
Please attach or provide links to your workflow files:

- [ ] `workflow.py` - Main Prefect flow implementation
- [ ] `Dockerfile` - Container definition
- [ ] `metadata.yaml` - Workflow metadata
- [ ] Test files or examples
- [ ] Documentation

## Testing
How did you test this workflow? Please describe:

- **Test targets used**: (e.g., vulnerable_app, custom test cases)
- **Expected outputs**: (e.g., SARIF format, specific vulnerabilities detected)
- **Validation results**: (e.g., X vulnerabilities found, Y false positives)

## SARIF Compliance
- [ ] My workflow outputs results in SARIF format
- [ ] Results include severity levels and descriptions
- [ ] Code flow information is provided where applicable
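For reference, the SARIF checklist above maps onto a fairly small JSON skeleton. Below is a minimal sketch of a SARIF 2.1.0 producer; the tool name and the finding fields (`rule_id`, `level`, `message`, `file`, `line`) are illustrative assumptions, not a FuzzForge-specific schema:

```python
import json

def make_sarif(findings):
    """Build a minimal SARIF 2.1.0 document from a list of finding dicts.

    Each finding is assumed to carry: rule_id, level, message, file, line.
    """
    return {
        "version": "2.1.0",
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "runs": [{
            "tool": {"driver": {
                "name": "example-workflow",  # illustrative tool name
                "rules": [{"id": f["rule_id"]} for f in findings],
            }},
            "results": [{
                "ruleId": f["rule_id"],
                "level": f["level"],  # "error", "warning", or "note"
                "message": {"text": f["message"]},
                "locations": [{"physicalLocation": {
                    "artifactLocation": {"uri": f["file"]},
                    "region": {"startLine": f["line"]},
                }}],
            } for f in findings],
        }],
    }

if __name__ == "__main__":
    doc = make_sarif([{
        "rule_id": "HARDCODED_SECRET", "level": "error",
        "message": "Hardcoded credential found",
        "file": "src/app.py", "line": 42,
    }])
    print(json.dumps(doc, indent=2))
```

Most SARIF viewers only need `version`, `runs[].tool.driver.name`, and `runs[].results` to render findings; severity and code-flow data extend the same structure.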
## Security Guidelines
- [ ] This workflow focuses on **defensive security** purposes only
- [ ] I have not included any malicious tools or capabilities
- [ ] All secrets/credentials are parameterized (no hardcoded values)
- [ ] I have followed responsible disclosure practices

## Registry Integration
Have you updated the workflow registry?

- [ ] Added import statement to `backend/toolbox/workflows/registry.py`
- [ ] Added registry entry with proper metadata
- [ ] Tested workflow registration and deployment

## Additional Notes
Anything else the maintainers should know about this workflow?

---

🚀 **Thank you for contributing to FuzzForge!** Your workflow will help the security community automate and scale their testing efforts.

💬 **Questions?** Join our [Discord Community](https://discord.com/invite/acqv9FVG) to discuss your contribution!
.github/workflows/ci-python.yml (vendored, new file, +70 lines)
@@ -0,0 +1,70 @@
name: Python CI

# A simple CI to ensure that the Python client and backend build correctly.
# It could be optimized to run faster by building, testing, and linting only
# changed code, but for now it is good enough. It runs on every push and PR
# that touches the listed paths, and on demand.

on:
  workflow_dispatch:

  push:
    paths:
      - "ai/**"
      - "backend/**"
      - "cli/**"
      - "sdk/**"
      - "src/**"
  pull_request:
    paths:
      - "ai/**"
      - "backend/**"
      - "cli/**"
      - "sdk/**"
      - "src/**"

jobs:
  ci:
    name: ci
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v5

      - name: Setup uv
        uses: astral-sh/setup-uv@v6
        with:
          enable-cache: true

      - name: Set up Python
        run: uv python install

      # Validate no obvious issues.
      # Quick hack because the CLI returns a non-zero exit code when no args
      # are provided. The exit status must be captured immediately, otherwise
      # `$?` would report the status of the `[` test instead of `uv run ff`.
      - name: Run base command
        run: |
          set +e
          uv run ff
          rc=$?
          if [ "$rc" -ne 2 ]; then
            echo "Expected exit code 2 from 'uv run ff', got $rc"
            exit 1
          fi

      - name: Build fuzzforge_ai package
        run: uv build

      - name: Build ai package
        working-directory: ai
        run: uv build

      - name: Build cli package
        working-directory: cli
        run: uv build

      - name: Build sdk package
        working-directory: sdk
        run: uv build

      - name: Build backend package
        working-directory: backend
        run: uv build
.github/workflows/ci.yml (vendored, deleted, -86 lines)
@@ -1,86 +0,0 @@
name: CI

on:
  push:
    branches: [main, dev, feature/*]
  pull_request:
    branches: [main, dev]
  workflow_dispatch:

jobs:
  lint-and-typecheck:
    name: Lint & Type Check
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          version: "latest"

      - name: Set up Python
        run: uv python install 3.14

      - name: Install dependencies
        run: uv sync

      - name: Ruff check (fuzzforge-cli)
        run: |
          cd fuzzforge-cli
          uv run --extra lints ruff check src/

      - name: Ruff check (fuzzforge-mcp)
        run: |
          cd fuzzforge-mcp
          uv run --extra lints ruff check src/

      - name: Ruff check (fuzzforge-common)
        run: |
          cd fuzzforge-common
          uv run --extra lints ruff check src/

      - name: Mypy type check (fuzzforge-cli)
        run: |
          cd fuzzforge-cli
          uv run --extra lints mypy src/

      - name: Mypy type check (fuzzforge-mcp)
        run: |
          cd fuzzforge-mcp
          uv run --extra lints mypy src/

      # NOTE: Mypy check for fuzzforge-common temporarily disabled
      # due to 37 pre-existing type errors in legacy code.
      # TODO: Fix type errors and re-enable strict checking
      #- name: Mypy type check (fuzzforge-common)
      #  run: |
      #    cd fuzzforge-common
      #    uv run --extra lints mypy src/

  test:
    name: Tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          version: "latest"

      - name: Set up Python
        run: uv python install 3.14

      - name: Install dependencies
        run: uv sync --all-extras

      - name: Run MCP tests
        run: |
          cd fuzzforge-mcp
          uv run --extra tests pytest -v

      - name: Run common tests
        run: |
          cd fuzzforge-common
          uv run --extra tests pytest -v
.github/workflows/docs-deploy.yml (vendored, new file, +57 lines)
@@ -0,0 +1,57 @@
name: Deploy Docusaurus to GitHub Pages

on:
  workflow_dispatch:

  push:
    branches:
      - master
    paths:
      - "docs/**"

jobs:
  build:
    name: Build Docusaurus
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ./docs
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-node@v4
        with:
          node-version: 24
          cache: npm
          cache-dependency-path: "**/package-lock.json"

      - name: Install dependencies
        run: npm ci
      - name: Build website
        run: npm run build

      - name: Upload Build Artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: ./docs/build

  deploy:
    name: Deploy to GitHub Pages
    needs: build

    # Grant GITHUB_TOKEN the permissions required to make a Pages deployment
    permissions:
      pages: write      # to deploy to Pages
      id-token: write   # to verify the deployment originates from an appropriate source

    # Deploy to the github-pages environment
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}

    runs-on: ubuntu-latest
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
.github/workflows/docs-test-deploy.yml (vendored, new file, +33 lines)
@@ -0,0 +1,33 @@
name: Docusaurus test deployment

on:
  workflow_dispatch:

  push:
    paths:
      - "docs/**"
  pull_request:
    paths:
      - "docs/**"

jobs:
  test-deploy:
    name: Test deployment
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ./docs
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-node@v4
        with:
          node-version: 24
          cache: npm
          cache-dependency-path: "**/package-lock.json"

      - name: Install dependencies
        run: npm ci
      - name: Test build website
        run: npm run build
.github/workflows/mcp-server.yml (vendored, deleted, -49 lines)
@@ -1,49 +0,0 @@
name: MCP Server Smoke Test

on:
  push:
    branches: [main, dev]
  pull_request:
    branches: [main, dev]
  workflow_dispatch:

jobs:
  mcp-server:
    name: MCP Server Test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          version: "latest"

      - name: Set up Python
        run: uv python install 3.14

      - name: Install dependencies
        run: uv sync --all-extras

      - name: Start MCP server in background
        run: |
          cd fuzzforge-mcp
          nohup uv run python -m fuzzforge_mcp.server > server.log 2>&1 &
          echo $! > server.pid
          sleep 3

      - name: Run MCP tool tests
        run: |
          cd fuzzforge-mcp
          uv run --extra tests pytest tests/test_resources.py -v

      - name: Stop MCP server
        if: always()
        run: |
          if [ -f fuzzforge-mcp/server.pid ]; then
            kill $(cat fuzzforge-mcp/server.pid) || true
          fi

      - name: Show server logs
        if: failure()
        run: cat fuzzforge-mcp/server.log || true
.gitignore (vendored, 298 lines changed)
@@ -1,15 +1,291 @@
*.egg-info
*.whl
# ========================================
# FuzzForge Platform .gitignore
# ========================================

# -------------------- Python --------------------
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Environments
.env
.mypy_cache
.pytest_cache
.ruff_cache
.venv
.vscode
__pycache__
env/
venv/
ENV/
env.bak/
venv.bak/
.python-version

# Podman/Docker container storage artifacts
~/.fuzzforge/
# UV package manager
uv.lock
# But allow uv.lock in CLI and SDK for reproducible builds
!cli/uv.lock
!sdk/uv.lock
!backend/uv.lock

# User-specific hub config (generated at runtime)
hub-config.json
# MyPy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# -------------------- IDE / Editor --------------------
# VSCode
.vscode/
*.code-workspace

# PyCharm
.idea/

# Vim
*.swp
*.swo
*~

# Emacs
*~
\#*\#
/.emacs.desktop
/.emacs.desktop.lock
*.elc
auto-save-list
tramp
.\#*

# Sublime Text
*.sublime-project
*.sublime-workspace

# -------------------- Operating System --------------------
# macOS
.DS_Store
.AppleDouble
.LSOverride
Icon
._*
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk

# Windows
Thumbs.db
Thumbs.db:encryptable
ehthumbs.db
ehthumbs_vista.db
*.stackdump
[Dd]esktop.ini
$RECYCLE.BIN/
*.cab
*.msi
*.msix
*.msm
*.msp
*.lnk

# Linux
*~
.fuse_hidden*
.directory
.Trash-*
.nfs*

# -------------------- Docker --------------------
# Docker volumes and data
docker-volumes/
.dockerignore.bak

# Docker Compose override files
docker-compose.override.yml
docker-compose.override.yaml

# -------------------- Database --------------------
# SQLite
*.sqlite
*.sqlite3
*.db
*.db-journal
*.db-shm
*.db-wal

# PostgreSQL
*.sql.backup

# -------------------- Logs --------------------
# General logs
*.log
logs/
*.log.*

# -------------------- FuzzForge Specific --------------------
# FuzzForge project directories (user projects should manage their own .gitignore)
.fuzzforge/

# Test project databases and configurations
test_projects/*/.fuzzforge/
test_projects/*/findings.db*
test_projects/*/config.yaml
test_projects/*/.gitignore

# Local development configurations
local_config.yaml
dev_config.yaml
.env.local
.env.development

# Generated reports and outputs
reports/
output/
findings/
*.sarif.json
*.html.report
security_report.*

# Temporary files
tmp/
temp/
*.tmp
*.temp

# Backup files
*.bak
*.backup
*~

# -------------------- Node.js (for any JS tooling) --------------------
node_modules/
npm-debug.log*
yarn-debug.log*
yarn-error.log*
.npm

# -------------------- Security --------------------
# Never commit these files
*.pem
*.key
*.p12
*.pfx
secret*
secrets/
credentials*
api_keys*
.env.production
.env.staging

# AWS credentials
.aws/

# -------------------- Build Artifacts --------------------
# Python builds
build/
dist/
*.wheel

# Documentation builds
docs/_build/
site/

# -------------------- Miscellaneous --------------------
# Jupyter Notebook checkpoints
.ipynb_checkpoints

# IPython history
.ipython/

# Rope project settings
.ropeproject

# spyderproject
.spyderproject
.spyproject

# mkdocs documentation
/site

# Local Netlify folder
.netlify

# -------------------- Project Specific Overrides --------------------
# Allow specific test project files that should be tracked
!test_projects/*/src/
!test_projects/*/scripts/
!test_projects/*/config/
!test_projects/*/data/
!test_projects/*/README.md
!test_projects/*/*.py
!test_projects/*/*.js
!test_projects/*/*.php
!test_projects/*/*.java

# But exclude their sensitive content
test_projects/*/.env
test_projects/*/private_key.pem
test_projects/*/wallet.json
test_projects/*/.npmrc
test_projects/*/.git-credentials
test_projects/*/credentials.*
test_projects/*/api_keys.*
@@ -1 +0,0 @@
3.14.2
CONTRIBUTING.md (520 lines changed)
@@ -1,21 +1,17 @@
# Contributing to FuzzForge AI
# Contributing to FuzzForge 🤝

Thank you for your interest in contributing to FuzzForge AI! We welcome contributions from the community and are excited to collaborate with you.
Thank you for your interest in contributing to FuzzForge! We welcome contributions from the community and are excited to collaborate with you.

**Our Vision**: FuzzForge aims to be a **universal platform for security research** across all cybersecurity domains. Through our modular architecture, any security tool (from fuzzing engines to cloud scanners, from mobile app analyzers to IoT security tools) can be integrated as a containerized module and controlled via AI agents.

## 🌟 Ways to Contribute

## Ways to Contribute
- 🐛 **Bug Reports** - Help us identify and fix issues
- 💡 **Feature Requests** - Suggest new capabilities and improvements
- 🔧 **Code Contributions** - Submit bug fixes, features, and enhancements
- 📚 **Documentation** - Improve guides, tutorials, and API documentation
- 🧪 **Testing** - Help test new features and report issues
- 🛡️ **Security Workflows** - Contribute new security analysis workflows

- **Security Modules** - Create modules for any cybersecurity domain (AppSec, NetSec, Cloud, IoT, etc.)
- **Bug Reports** - Help us identify and fix issues
- **Feature Requests** - Suggest new capabilities and improvements
- **Core Features** - Contribute to the MCP server, runner, or CLI
- **Documentation** - Improve guides, tutorials, and module documentation
- **Testing** - Help test new features and report issues
- **AI Integration** - Improve MCP tools and AI agent interactions
- **Tool Integrations** - Wrap existing security tools as FuzzForge modules

## Contribution Guidelines
## 📋 Contribution Guidelines

### Code Style

@@ -48,10 +44,9 @@ We use conventional commits for clear history:

**Examples:**
```
feat(modules): add cloud security scanner module
fix(mcp): resolve module listing timeout
docs(sdk): update module development guide
test(runner): add container execution tests
feat(workflows): add new static analysis workflow for Go
fix(api): resolve authentication timeout issue
docs(readme): update installation instructions
```

### Pull Request Process
@@ -70,14 +65,9 @@ test(runner): add container execution tests

3. **Test Your Changes**
   ```bash
   # Test modules
   FUZZFORGE_MODULES_PATH=./fuzzforge-modules uv run fuzzforge modules list

   # Run a module
   uv run fuzzforge modules run your-module --assets ./test-assets

   # Test MCP integration (if applicable)
   uv run fuzzforge mcp status

   # Test workflows
   cd test_projects/vulnerable_app/
   ff workflow security_assessment .
   ```

4. **Submit Pull Request**
@@ -86,353 +76,64 @@ test(runner): add container execution tests
   - Link related issues using `Fixes #123` or `Closes #123`
   - Ensure all CI checks pass

## Module Development
## 🛡️ Security Workflow Development

FuzzForge uses a modular architecture where security tools run as isolated containers. The `fuzzforge-modules-sdk` provides everything you need to create new modules.

### Creating New Workflows

**Documentation:**
- [Module SDK Documentation](fuzzforge-modules/fuzzforge-modules-sdk/README.md) - Complete SDK reference
- [Module Template](fuzzforge-modules/fuzzforge-module-template/) - Starting point for new modules
- [USAGE Guide](USAGE.md) - Setup and installation instructions

### Creating a New Module

1. **Use the Module Template**
   ```bash
   # Generate a new module from template
   cd fuzzforge-modules/
   cp -r fuzzforge-module-template my-new-module
   cd my-new-module
   ```

1. **Workflow Structure**
   ```
   backend/toolbox/workflows/your_workflow/
   ├── __init__.py
   ├── workflow.py      # Main Prefect flow
   ├── metadata.yaml    # Workflow metadata
   └── Dockerfile       # Container definition
   ```
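The exact `metadata.yaml` schema is not shown in this guide, so mirror an existing workflow's file for the real field names. As a rough illustration only (every field below is an assumption):

```yaml
# Hypothetical metadata.yaml sketch. Field names are illustrative;
# copy an existing workflow's metadata.yaml for the actual schema.
name: your_workflow
version: "1.0.0"
description: One-line summary of what the workflow does
author: Your Name
tags:
  - static-analysis
parameters:
  timeout:
    type: integer
    default: 300
```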

2. **Module Structure**
   ```
   my-new-module/
   ├── Dockerfile          # Container definition
   ├── Makefile            # Build commands
   ├── README.md           # Module documentation
   ├── pyproject.toml      # Python dependencies
   ├── mypy.ini            # Type checking config
   ├── ruff.toml           # Linting config
   └── src/
       └── module/
           ├── __init__.py
           ├── __main__.py # Entry point
           ├── mod.py      # Main module logic
           ├── models.py   # Pydantic models
           └── settings.py # Configuration
   ```

3. **Implement Your Module**

   Edit `src/module/mod.py`:
   ```python
   from fuzzforge_modules_sdk.api.modules import BaseModule
   from fuzzforge_modules_sdk.api.models import ModuleResult
   from .models import MyModuleConfig, MyModuleOutput

   class MyModule(BaseModule[MyModuleConfig, MyModuleOutput]):
       """Your module description."""

       def execute(self) -> ModuleResult[MyModuleOutput]:
           """Main execution logic."""
           # Access input assets
           assets = self.input_path

           # Your security tool logic here
           results = self.run_analysis(assets)

           # Return structured results
           return ModuleResult(
               success=True,
               output=MyModuleOutput(
                   findings=results,
                   summary="Analysis complete"
               )
           )
   ```

2. **Register Your Workflow**

   Add your workflow to `backend/toolbox/workflows/registry.py`:
   ```python
   # Import your workflow
   from .your_workflow.workflow import main_flow as your_workflow_flow

   # Add to registry
   WORKFLOW_REGISTRY["your_workflow"] = {
       "flow": your_workflow_flow,
       "module_path": "toolbox.workflows.your_workflow.workflow",
       "function_name": "main_flow",
       "description": "Description of your workflow",
       "version": "1.0.0",
       "author": "Your Name",
       "tags": ["tag1", "tag2"]
   }
   ```
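Registration makes the flow discoverable by name. Conceptually, a name-keyed registry like this is consumed by looking up the entry and calling its flow; the sketch below is a simplified stand-alone illustration, not the actual FuzzForge backend code:

```python
# Simplified sketch of how a name-keyed workflow registry can be
# consumed. Illustrative only; not the FuzzForge backend implementation.
WORKFLOW_REGISTRY: dict = {}

def register(name, flow, **meta):
    """Store a flow plus its metadata under a unique name."""
    WORKFLOW_REGISTRY[name] = {"flow": flow, **meta}

def run_workflow(name, *args, **kwargs):
    """Look up a registered flow by name and invoke it."""
    entry = WORKFLOW_REGISTRY.get(name)
    if entry is None:
        raise KeyError(f"unknown workflow: {name!r}")
    return entry["flow"](*args, **kwargs)

# Hypothetical usage with a dummy flow:
register("your_workflow", lambda target: f"scanned {target}",
         description="demo", version="1.0.0")
```

The lookup-then-call pattern is why the registry entry must carry both the flow object and enough metadata (`module_path`, `function_name`) for deployment tooling to re-import it.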

4. **Define Configuration Models**

   Edit `src/module/models.py`:
   ```python
   from pydantic import Field
   from fuzzforge_modules_sdk.api.models import BaseModuleConfig, BaseModuleOutput

   class MyModuleConfig(BaseModuleConfig):
       """Configuration for your module."""
       timeout: int = Field(default=300, description="Timeout in seconds")
       max_iterations: int = Field(default=1000, description="Max iterations")

   class MyModuleOutput(BaseModuleOutput):
       """Output from your module."""
       findings: list[dict] = Field(default_factory=list)
       coverage: float = Field(default=0.0)
   ```

5. **Build Your Module**
   ```bash
   # Build the SDK first (if not already done)
   cd ../fuzzforge-modules-sdk
   uv build
   mkdir -p .wheels
   cp ../../dist/fuzzforge_modules_sdk-*.whl .wheels/
   cd ../..
   docker build -t localhost/fuzzforge-modules-sdk:0.1.0 fuzzforge-modules/fuzzforge-modules-sdk/

   # Build your module
   cd fuzzforge-modules/my-new-module
   docker build -t fuzzforge-my-new-module:0.1.0 .
   ```

6. **Test Your Module**
   ```bash
   # Run with test assets
   uv run fuzzforge modules run my-new-module --assets ./test-assets

   # Check module info
   uv run fuzzforge modules info my-new-module
   ```

### Module Development Guidelines

**Important Conventions:**
- **Input/Output**: Use `/fuzzforge/input` for assets and `/fuzzforge/output` for results
- **Configuration**: Support JSON configuration via stdin or file
- **Logging**: Use structured logging (structlog is pre-configured)
- **Error Handling**: Return proper exit codes and error messages
- **Security**: Run as non-root user when possible
- **Documentation**: Include a clear README with usage examples
- **Dependencies**: Minimize container size, use multi-stage builds
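The conventions above can be sketched in plain Python, independent of the SDK. This is an illustration of the contract only (the directory defaults come from the conventions; the result fields are made up for the example): read assets from the input directory, write a JSON result to the output directory, and signal success through the exit code.

```python
import json
import sys
from pathlib import Path

def run_module(input_dir="/fuzzforge/input",
               output_dir="/fuzzforge/output",
               config=None):
    """Illustrative module skeleton following the I/O conventions.

    Returns a process exit code: 0 on success, non-zero on failure.
    The result fields written here are hypothetical.
    """
    in_path, out_path = Path(input_dir), Path(output_dir)
    if not in_path.is_dir():
        print(f"input directory missing: {in_path}", file=sys.stderr)
        return 1

    cfg = config or {}
    files = [p for p in in_path.rglob("*") if p.is_file()]

    result = {
        "success": True,
        "files_scanned": len(files),
        "timeout": cfg.get("timeout", 300),
    }
    out_path.mkdir(parents=True, exist_ok=True)
    (out_path / "result.json").write_text(json.dumps(result, indent=2))
    return 0

if __name__ == "__main__":
    sys.exit(run_module())
```

Keeping the I/O paths as parameters makes the same logic testable outside the container while defaulting to the conventional mount points inside it.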

**See also:**
- [Module SDK API Reference](fuzzforge-modules/fuzzforge-modules-sdk/src/fuzzforge_modules_sdk/api/)
- [Dockerfile Best Practices](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/)

### Module Types

FuzzForge is designed to support modules across **all cybersecurity domains**. The modular architecture allows any security tool to be containerized and integrated. Here are the main categories:

**Application Security**
- Fuzzing engines (coverage-guided, grammar-based, mutation-based)
- Static analysis (SAST, code quality, dependency scanning)
- Dynamic analysis (DAST, runtime analysis, instrumentation)
- Test validation and coverage analysis
- Crash analysis and exploit detection

**Network & Infrastructure Security**
- Network scanning and service enumeration
- Protocol analysis and fuzzing
- Firewall and configuration testing
- Cloud security (AWS/Azure/GCP misconfiguration detection, IAM analysis)
- Container security (image scanning, Kubernetes security)

**Web & API Security**
- Web vulnerability scanners (XSS, SQL injection, CSRF)
- Authentication and session testing
- API security (REST/GraphQL/gRPC testing, fuzzing)
- SSL/TLS analysis

**Binary & Reverse Engineering**
- Binary analysis and disassembly
- Malware sandboxing and behavior analysis
- Exploit development tools
- Firmware extraction and analysis

**Mobile & IoT Security**
- Mobile app analysis (Android/iOS static/dynamic analysis)
- IoT device security and firmware analysis
- SCADA/ICS and industrial protocol testing
- Automotive security (CAN bus, ECU testing)

**Data & Compliance**
- Database security testing
- Encryption and cryptography analysis
- Secrets and credential detection
- Privacy tools (PII detection, GDPR compliance)
- Compliance checkers (PCI-DSS, HIPAA, SOC2, ISO27001)

**Threat Intelligence & Risk**
- OSINT and reconnaissance tools
- Threat hunting and IOC correlation
- Risk assessment and attack surface mapping
- Security audit and policy validation

**Emerging Technologies**
- AI/ML security (model poisoning, adversarial testing)
- Blockchain and smart contract analysis
- Quantum-safe cryptography testing

**Custom & Integration**
- Domain-specific security tools
- Bridges to existing security tools
- Multi-tool orchestration and result aggregation

### Example: Simple Security Scanner Module

```python
# src/module/mod.py
from pathlib import Path
from fuzzforge_modules_sdk.api.modules import BaseModule
from fuzzforge_modules_sdk.api.models import ModuleResult
from .models import ScannerConfig, ScannerOutput

class SecurityScanner(BaseModule[ScannerConfig, ScannerOutput]):
    """Scans for common security issues in code."""

    def execute(self) -> ModuleResult[ScannerOutput]:
        findings = []

        # Scan all source files
        for file_path in self.input_path.rglob("*"):
            if file_path.is_file():
                findings.extend(self.scan_file(file_path))

        return ModuleResult(
            success=True,
            output=ScannerOutput(
                findings=findings,
                files_scanned=len(list(self.input_path.rglob("*")))
            )
        )

    def scan_file(self, path: Path) -> list[dict]:
        """Scan a single file for security issues."""
        # Your scanning logic here
        return []
```
|
||||
|
||||
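The `scan_file` stub is where the real detection logic goes. As an illustrative sketch (not part of the SDK, and the patterns here are placeholders), a minimal regex-based secret scanner could look like this:

```python
import re
from pathlib import Path

# Hypothetical detection patterns -- tune these for your own module.
PATTERNS = {
    "hardcoded-password": re.compile(r"password\s*=\s*['\"].+['\"]", re.IGNORECASE),
    "aws-access-key": re.compile(r"AKIA[0-9A-Z]{16}"),
}


def scan_file(path: Path) -> list[dict]:
    """Return one finding dict per pattern match, with file/line context."""
    findings = []
    try:
        text = path.read_text(errors="ignore")
    except OSError:
        return findings
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule_id, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append({
                    "rule": rule_id,
                    "file": str(path),
                    "line": lineno,
                    "snippet": line.strip(),
                })
    return findings
```

Each finding is a plain dict here; in a real module you would shape these to match your `ScannerOutput` model.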
### Testing Modules

Create tests in `tests/`:

```python
from pathlib import Path

import pytest

from module.mod import MyModule
from module.models import MyModuleConfig


def test_module_execution():
    config = MyModuleConfig(timeout=60)
    module = MyModule(config=config, input_path=Path("test_assets"))
    result = module.execute()

    assert result.success
    assert len(result.output.findings) >= 0
```

Run tests:

```bash
uv run pytest
```

3. **Testing Workflows**
   - Create test cases in `test_projects/vulnerable_app/`
   - Ensure SARIF output format compliance
   - Test with various input scenarios

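To make the SARIF compliance requirement concrete, here is a minimal SARIF 2.1.0 document skeleton; the tool name, rule id, and file location are placeholder values, not FuzzForge conventions:

```python
import json

# Minimal SARIF 2.1.0 document -- tool/rule names are placeholders.
sarif = {
    "version": "2.1.0",
    "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
    "runs": [
        {
            "tool": {"driver": {"name": "my-scanner", "rules": [{"id": "DEMO001"}]}},
            "results": [
                {
                    "ruleId": "DEMO001",
                    "level": "error",
                    "message": {"text": "Hardcoded credential found"},
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {"uri": "src/config.py"},
                                "region": {"startLine": 12},
                            }
                        }
                    ],
                }
            ],
        }
    ],
}

print(json.dumps(sarif, indent=2))
```

A workflow that emits this shape (one `run` per tool, one `result` per finding) will validate against the published SARIF 2.1.0 schema.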
### Security Guidelines

**Critical Requirements:**
- Never commit secrets, API keys, or credentials
- Focus on **defensive security** tools and analysis
- Do not create tools for malicious purposes
- Test modules and workflows thoroughly before submission
- Follow responsible disclosure for security issues
- Use minimal, secure base images for containers
- Avoid running containers as root when possible

**Security Resources:**
- [OWASP Container Security](https://cheatsheetseries.owasp.org/cheatsheets/Docker_Security_Cheat_Sheet.html)
- [CIS Docker Benchmarks](https://www.cisecurity.org/benchmark/docker)

## Contributing to Core Features

Beyond modules, you can contribute to FuzzForge's core components.

**Useful Resources:**
- [Project Structure](README.md) - Overview of the codebase
- [USAGE Guide](USAGE.md) - Installation and setup
- Python best practices: [PEP 8](https://pep8.org/)

### Core Components

- **fuzzforge-mcp** - MCP server for AI agent integration
- **fuzzforge-runner** - Module execution engine
- **fuzzforge-cli** - Command-line interface
- **fuzzforge-common** - Shared utilities and sandbox engines
- **fuzzforge-types** - Type definitions and schemas

### Development Setup

1. **Clone and Install**

```bash
git clone https://github.com/FuzzingLabs/fuzzforge_ai.git
cd fuzzforge_ai
uv sync --all-extras
```

2. **Run Tests**

```bash
# Run all tests
make test

# Run specific package tests
cd fuzzforge-mcp
uv run pytest
```

3. **Type Checking**

```bash
# Type check all packages
make typecheck

# Type check specific package
cd fuzzforge-runner
uv run mypy .
```

4. **Linting and Formatting**

```bash
# Format code
make format

# Lint code
make lint
```

## 🐛 Bug Reports

When reporting bugs, please include:

- **Environment**: OS, Python version, Docker version, uv version
- **FuzzForge Version**: Output of `uv run fuzzforge --version`
- **Module**: Which module or component is affected
- **Steps to Reproduce**: Clear steps to recreate the issue
- **Expected Behavior**: What should happen
- **Actual Behavior**: What actually happens
- **Logs**: Relevant error messages and stack traces
- **Container Logs**: For module issues, include Docker/Podman logs
- **Screenshots**: If applicable

Use our [Bug Report Template](.github/ISSUE_TEMPLATE/bug_report.md).

**Example:**
```markdown
**Environment:**
- OS: Ubuntu 22.04
- Python: 3.14.2
- Docker: 24.0.7
- uv: 0.5.13

**Module:** my-custom-scanner

**Steps to Reproduce:**
1. Run `uv run fuzzforge modules run my-scanner --assets ./test-target`
2. Module fails with timeout error

**Expected:** Module completes analysis
**Actual:** Times out after 30 seconds

**Logs:**
ERROR: Module execution timeout
...
```

## 💡 Feature Requests

For new features, please provide:

- **Proposed Solution**: How should it work?
- **Alternatives**: Other approaches considered
- **Implementation**: Technical considerations (optional)
- **Module vs Core**: Should this be a module or core feature?

**Example Feature Requests:**
- New module for cloud security posture management (CSPM)
- Module for analyzing smart contract vulnerabilities
- MCP tool for orchestrating multi-module workflows
- CLI command for batch module execution across multiple targets
- Support for distributed fuzzing campaigns
- Integration with CI/CD pipelines
- Module marketplace/registry features

Use our [Feature Request Template](.github/ISSUE_TEMPLATE/feature_request.md).

## 📚 Documentation

Help improve our documentation:

- **Module Documentation**: Document your modules in their README.md
- **API Documentation**: Update docstrings and type hints
- **User Guides**: Improve USAGE.md, tutorials, and how-to guides
- **Module SDK Guides**: Help document the SDK for module developers
- **MCP Integration**: Document AI agent integration patterns
- **Workflow Documentation**: Document new security workflows
- **Examples**: Add practical usage examples and workflows

### Documentation Standards

- Use clear, concise language
- Include code examples
- Add command-line examples with expected output
- Document all configuration options
- Explain error messages and troubleshooting

### Module README Template

```markdown
# Module Name

Brief description of what this module does.

## Features

- Feature 1
- Feature 2

## Configuration

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| timeout | int | 300 | Timeout in seconds |

## Usage

\`\`\`bash
uv run fuzzforge modules run module-name --assets ./path/to/assets
\`\`\`

## Output

Describes the output structure and format.

## Examples

Practical usage examples.
```

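The configuration table in the template maps naturally onto a typed config model. As a sketch only (using a plain dataclass; the real SDK likely provides its own config base class), the documented `timeout` parameter could be modeled as:

```python
from dataclasses import dataclass


@dataclass
class ModuleConfig:
    """Mirror of the README configuration table (illustrative)."""
    timeout: int = 300  # Timeout in seconds, matching the documented default

    def __post_init__(self) -> None:
        # Reject nonsensical values early, before the module runs.
        if self.timeout <= 0:
            raise ValueError("timeout must be a positive number of seconds")
```

Keeping the model and the README table in sync means users can trust the documented defaults.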
## 🙏 Recognition

Contributors will be:

- Listed in our [Contributors](CONTRIBUTORS.md) file
- Mentioned in release notes for significant contributions
- Credited in module documentation (for module authors)
- Invited to join our [Discord community](https://discord.gg/8XEX33UUwZ)
- Eligible for FuzzingLabs Academy courses and swag

## Module Submission Checklist

Before submitting a new module:

- [ ] Module follows SDK structure and conventions
- [ ] Dockerfile builds successfully
- [ ] Module executes without errors
- [ ] Configuration options are documented
- [ ] README.md is complete with examples
- [ ] Tests are included (pytest)
- [ ] Type hints are used throughout
- [ ] Linting passes (ruff)
- [ ] Security best practices followed
- [ ] No secrets or credentials in code
- [ ] License headers included

## Review Process

1. **Initial Review** - Maintainers review for completeness
2. **Technical Review** - Code quality and security assessment
3. **Testing** - Module tested in isolated environment
4. **Documentation Review** - Ensure docs are clear and complete
5. **Approval** - Module merged and included in next release

## 📜 License

By contributing to FuzzForge, you agree that your contributions will be licensed under the same [Business Source License 1.1](LICENSE) as the project.

For module contributions:
- Modules you create remain under the project license
- You retain credit as the module author
- Your module may be used by others under the project license terms

---

## Getting Help

Need help contributing?

- Join our [Discord](https://discord.gg/8XEX33UUwZ)
- Read the [Module SDK Documentation](fuzzforge-modules/fuzzforge-modules-sdk/README.md)
- Check the module template for examples
- Contact: contact@fuzzinglabs.com

---

**Thank you for making FuzzForge better! 🚀**

Every contribution, no matter how small, helps build a stronger security research platform. Whether you're creating a module for web security, cloud scanning, mobile analysis, or any other cybersecurity domain, your work makes FuzzForge more powerful and versatile for the entire security community!

---

**Makefile** (78 lines, removed in this commit)

```make
.PHONY: help install sync format lint typecheck test build-hub-images clean

SHELL := /bin/bash

# Default target
help:
	@echo "FuzzForge AI Development Commands"
	@echo ""
	@echo "  make install          - Install all dependencies"
	@echo "  make sync             - Sync shared packages from upstream"
	@echo "  make format           - Format code with ruff"
	@echo "  make lint             - Lint code with ruff"
	@echo "  make typecheck        - Type check with mypy"
	@echo "  make test             - Run all tests"
	@echo "  make build-hub-images - Build all mcp-security-hub images"
	@echo "  make clean            - Clean build artifacts"
	@echo ""

# Install all dependencies
install:
	uv sync

# Sync shared packages from upstream fuzzforge-core
sync:
	@if [ -z "$(UPSTREAM)" ]; then \
		echo "Usage: make sync UPSTREAM=/path/to/fuzzforge-core"; \
		exit 1; \
	fi
	./scripts/sync-upstream.sh $(UPSTREAM)

# Format all packages
format:
	@for pkg in packages/fuzzforge-*/; do \
		if [ -f "$$pkg/pyproject.toml" ]; then \
			echo "Formatting $$pkg..."; \
			cd "$$pkg" && uv run ruff format . && cd -; \
		fi \
	done

# Lint all packages
lint:
	@for pkg in packages/fuzzforge-*/; do \
		if [ -f "$$pkg/pyproject.toml" ]; then \
			echo "Linting $$pkg..."; \
			cd "$$pkg" && uv run ruff check . && cd -; \
		fi \
	done

# Type check all packages
typecheck:
	@for pkg in packages/fuzzforge-*/; do \
		if [ -f "$$pkg/pyproject.toml" ] && [ -f "$$pkg/mypy.ini" ]; then \
			echo "Type checking $$pkg..."; \
			cd "$$pkg" && uv run mypy . && cd -; \
		fi \
	done

# Run all tests
test:
	@for pkg in packages/fuzzforge-*/; do \
		if [ -f "$$pkg/pytest.ini" ]; then \
			echo "Testing $$pkg..."; \
			cd "$$pkg" && uv run pytest && cd -; \
		fi \
	done

# Build all mcp-security-hub images for the firmware analysis pipeline
build-hub-images:
	@bash scripts/build-hub-images.sh

# Clean build artifacts
clean:
	find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
	find . -type d -name ".pytest_cache" -exec rm -rf {} + 2>/dev/null || true
	find . -type d -name ".mypy_cache" -exec rm -rf {} + 2>/dev/null || true
	find . -type d -name ".ruff_cache" -exec rm -rf {} + 2>/dev/null || true
	find . -type d -name "*.egg-info" -exec rm -rf {} + 2>/dev/null || true
	find . -type f -name "*.pyc" -delete 2>/dev/null || true
```

---

**README.md** (341 lines changed)

<h1 align="center"> FuzzForge AI</h1>
<h3 align="center">AI-Powered Security Research Orchestration via MCP</h3>

<p align="center">
  <a href="https://discord.gg/8XEX33UUwZ"><img src="https://img.shields.io/discord/1420767905255133267?logo=discord&label=Discord" alt="Discord"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-BSL%201.1-blue" alt="License: BSL 1.1"></a>
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.12%2B-blue" alt="Python 3.12+"/></a>
  <a href="https://modelcontextprotocol.io"><img src="https://img.shields.io/badge/MCP-compatible-green" alt="MCP Compatible"/></a>
  <a href="https://fuzzforge.ai"><img src="https://img.shields.io/badge/Website-fuzzforge.ai-purple" alt="Website"/></a>
  <img src="docs/static/img/fuzzforge_banner_github.png" alt="FuzzForge Banner" width="100%">
</p>

<p align="center"><strong>AI-powered workflow automation and AI Agents for AppSec, Fuzzing & Offensive Security</strong></p>

<p align="center">
  <strong>Let AI agents orchestrate your security research workflows locally</strong><br>
  <a href="https://discord.com/invite/acqv9FVG"><img src="https://img.shields.io/discord/1420767905255133267?logo=discord&label=Discord" alt="Discord"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-BSL%20%2B%20Apache-orange" alt="License: BSL + Apache"></a>
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.11%2B-blue" alt="Python 3.11+"/></a>
  <a href="https://fuzzforge.ai"><img src="https://img.shields.io/badge/Website-fuzzforge.ai-blue" alt="Website"/></a>
  <img src="https://img.shields.io/badge/version-0.6.0-green" alt="Version">
  <a href="https://github.com/FuzzingLabs/fuzzforge_ai/stargazers"><img src="https://img.shields.io/github/stars/FuzzingLabs/fuzzforge_ai?style=social" alt="GitHub Stars"></a>
</p>

<p align="center">
  <sub>
    <a href="#-overview"><b>Overview</b></a> •
    <a href="#-features"><b>Features</b></a> •
    <a href="#-mcp-security-hub"><b>Security Hub</b></a> •
    <a href="#-installation"><b>Installation</b></a> •
    <a href="#-quickstart"><b>Quickstart</b></a> •
    <a href="USAGE.md"><b>Usage Guide</b></a> •
    <a href="#ai-powered-workflow-execution"><b>AI Demo</b></a> •
    <a href="#-contributing"><b>Contributing</b></a> •
    <a href="#%EF%B8%8F-roadmap"><b>Roadmap</b></a>
  </sub>
</p>

---

> 🚧 **FuzzForge AI is under active development.** Expect breaking changes and new features!

---

## 🚀 Overview

**FuzzForge AI** is an open-source MCP server that enables AI agents (GitHub Copilot, Claude, etc.) to orchestrate security research workflows through the **Model Context Protocol (MCP)**.

**FuzzForge** helps security researchers and engineers automate **application security** and **offensive security** workflows with the power of AI and fuzzing frameworks:

- Orchestrate static & dynamic analysis
- Automate vulnerability research
- Scale AppSec testing with AI agents
- Build, share & reuse workflows across teams

FuzzForge connects your AI assistant to **MCP tool hubs** — collections of containerized security tools that the agent can discover, chain, and execute autonomously. Instead of manually running security tools, describe what you want and let your AI assistant handle it.

FuzzForge is **open source**, built to empower security teams, researchers, and the community.

### The Core: Hub Architecture

FuzzForge acts as a **meta-MCP server** — a single MCP endpoint that gives your AI agent access to tools from multiple MCP hub servers. Each hub server is a containerized security tool (Binwalk, YARA, Radare2, Nmap, etc.) that the agent can discover at runtime.

- **🔍 Discovery**: The agent lists available hub servers and discovers their tools
- **🤖 AI-Native**: Hub tools provide agent context — usage tips, workflow guidance, and domain knowledge
- **🔗 Composable**: Chain tools from different hubs into automated pipelines
- **📦 Extensible**: Add your own MCP servers to the hub registry

### 🎬 Use Case: Firmware Vulnerability Research

> **Scenario**: Analyze a firmware image to find security vulnerabilities — fully automated by an AI agent.

```
User: "Search for vulnerabilities in firmware.bin"

Agent → Binwalk: Extract filesystem from firmware image
Agent → YARA: Scan extracted files for vulnerability patterns
Agent → Radare2: Trace dangerous function calls in prioritized binaries
Agent → Report: 8 vulnerabilities found (2 critical, 4 high, 2 medium)
```

### 🎬 Use Case: Rust Fuzzing Pipeline

> **Scenario**: Fuzz a Rust crate to discover vulnerabilities using AI-assisted harness generation and parallel fuzzing.

```
User: "Fuzz the blurhash crate for vulnerabilities"

Agent → Rust Analyzer: Identify fuzzable functions and attack surface
Agent → Harness Gen: Generate and validate fuzzing harnesses
Agent → Cargo Fuzzer: Run parallel coverage-guided fuzzing sessions
Agent → Crash Analysis: Deduplicate and triage discovered crashes
```

---

## ⭐ Support the Project

If you find FuzzForge useful, please **star the repo** to support development! 🚀

<a href="https://github.com/FuzzingLabs/fuzzforge_ai/stargazers">
  <img src="https://img.shields.io/github/stars/FuzzingLabs/fuzzforge_ai?style=social" alt="GitHub Stars">
</a>

---

## ✨ Features

| Feature | Description |
|---------|-------------|
| 🤖 **AI-Native** | Built for MCP — works with GitHub Copilot, Claude, and any MCP-compatible agent |
| 🔌 **Hub System** | Connect to MCP tool hubs — each hub brings dozens of containerized security tools |
| 🔍 **Tool Discovery** | Agents discover available tools at runtime with built-in usage guidance |
| 🔗 **Pipelines** | Chain tools from different hubs into automated multi-step workflows |
| 🔄 **Persistent Sessions** | Long-running tools (Radare2, fuzzers) with stateful container sessions |
| 🏠 **Local First** | All execution happens on your machine — no cloud required |
| 🔒 **Sandboxed** | Every tool runs in an isolated container via Docker or Podman |

---

## 🏗️ Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                    AI Agent (Copilot/Claude)                    │
└───────────────────────────┬─────────────────────────────────────┘
                            │ MCP Protocol (stdio)
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                      FuzzForge MCP Server                       │
│                                                                 │
│  Projects          Hub Discovery           Hub Execution        │
│  ┌──────────────┐  ┌──────────────────┐  ┌───────────────────┐  │
│  │init_project  │  │list_hub_servers  │  │execute_hub_tool   │  │
│  │set_assets    │  │discover_hub_tools│  │start_hub_server   │  │
│  │list_results  │  │get_tool_schema   │  │stop_hub_server    │  │
│  └──────────────┘  └──────────────────┘  └───────────────────┘  │
└───────────────────────────┬─────────────────────────────────────┘
                            │ Docker/Podman
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                       MCP Hub Servers                           │
│                                                                 │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐     │
│  │  Binwalk  │  │   YARA    │  │  Radare2  │  │   Nmap    │     │
│  │  6 tools  │  │  5 tools  │  │ 32 tools  │  │  8 tools  │     │
│  └───────────┘  └───────────┘  └───────────┘  └───────────┘     │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐     │
│  │  Nuclei   │  │  SQLMap   │  │   Trivy   │  │    ...    │     │
│  │  7 tools  │  │  8 tools  │  │  7 tools  │  │  36 hubs  │     │
│  └───────────┘  └───────────┘  └───────────┘  └───────────┘     │
└─────────────────────────────────────────────────────────────────┘
```

---

## 🔧 MCP Security Hub

FuzzForge ships with built-in support for the **[MCP Security Hub](https://github.com/FuzzingLabs/mcp-security-hub)** — a collection of 36 production-ready, Dockerized MCP servers covering offensive security:

| Category | Servers | Examples |
|----------|---------|----------|
| 🔍 **Reconnaissance** | 8 | Nmap, Masscan, Shodan, WhatWeb |
| 🌐 **Web Security** | 6 | Nuclei, SQLMap, ffuf, Nikto |
| 🔬 **Binary Analysis** | 6 | Radare2, Binwalk, YARA, Capa, Ghidra |
| ⛓️ **Blockchain** | 3 | Medusa, Solazy, DAML Viewer |
| ☁️ **Cloud Security** | 3 | Trivy, Prowler, RoadRecon |
| 💻 **Code Security** | 1 | Semgrep |
| 🔑 **Secrets Detection** | 1 | Gitleaks |
| 💥 **Exploitation** | 1 | SearchSploit |
| 🎯 **Fuzzing** | 2 | Boofuzz, Dharma |
| 🕵️ **OSINT** | 2 | Maigret, DNSTwist |
| 🛡️ **Threat Intel** | 2 | VirusTotal, AlienVault OTX |
| 🏰 **Active Directory** | 1 | BloodHound |

> 185+ individual tools accessible through a single MCP connection.

The hub is open source and can be extended with your own MCP servers. See the [mcp-security-hub repository](https://github.com/FuzzingLabs/mcp-security-hub) for details.

## ✨ Key Features

- 🤖 **AI Agents for Security** – Specialized agents for AppSec, reversing, and fuzzing
- 🛠 **Workflow Automation** – Define & execute AppSec workflows as code
- 📈 **Vulnerability Research at Scale** – Rediscover 1-days & find 0-days with automation
- 🔗 **Fuzzer Integration** – AFL, Honggfuzz, AFLnet, StateAFL & more
- 🌐 **Community Marketplace** – Share workflows, corpora, PoCs, and modules
- 🔒 **Enterprise Ready** – Team/Corp cloud tiers for scaling offensive security

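To make the tool names in the architecture diagram concrete: an MCP client invokes a server tool with a JSON-RPC `tools/call` request, per the MCP specification. The sketch below shows that generic request shape targeting `execute_hub_tool`; the keys inside `arguments` are illustrative placeholders, not FuzzForge's documented schema:

```python
import json

# Generic MCP tools/call request (JSON-RPC 2.0). The "arguments" keys
# below are invented for illustration, not FuzzForge's actual schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "execute_hub_tool",
        "arguments": {
            "server": "binwalk",
            "tool": "extract_filesystem",
            "target": "firmware.bin",
        },
    },
}

print(json.dumps(request, indent=2))
```

In practice your AI agent builds and sends these requests for you; you only see the natural-language conversation.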
---

## 📦 Installation

### Prerequisites

- **Python 3.12+**
- **[uv](https://docs.astral.sh/uv/)** package manager
- **Docker** ([Install Docker](https://docs.docker.com/get-docker/)) or Podman

**uv Package Manager**

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

**Docker**

For containerized workflows, see the [Docker Installation Guide](https://docs.docker.com/get-docker/).

#### Configure Docker Daemon

Before running `docker compose up`, configure Docker to allow insecure registries (required for the local registry).

Add the following to your Docker daemon configuration:

```json
{
  "insecure-registries": [
    "localhost:5000",
    "host.docker.internal:5001",
    "registry:5000"
  ]
}
```

**macOS (Docker Desktop):**
1. Open Docker Desktop
2. Go to Settings → Docker Engine
3. Add the `insecure-registries` configuration to the JSON
4. Click "Apply & Restart"

**Linux:**
1. Edit `/etc/docker/daemon.json` (create it if it doesn't exist):
   ```bash
   sudo nano /etc/docker/daemon.json
   ```
2. Add the configuration above
3. Restart Docker:
   ```bash
   sudo systemctl restart docker
   ```

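If you want to sanity-check your `daemon.json` edit before restarting Docker, a small validation script (a hypothetical helper, not shipped with FuzzForge) can verify that all three required registries are present:

```python
import json
from pathlib import Path

# The three registries the README asks for -- adjust if your setup differs.
REQUIRED = {"localhost:5000", "host.docker.internal:5001", "registry:5000"}


def check_daemon_config(path: str) -> set[str]:
    """Return the set of required registries missing from daemon.json."""
    config = json.loads(Path(path).read_text())
    configured = set(config.get("insecure-registries", []))
    return REQUIRED - configured
```

An empty return set means the config is complete; anything else lists what still needs to be added.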
### CLI Installation

After installing the requirements, install the FuzzForge CLI:

```bash
# Clone the repository
git clone https://github.com/FuzzingLabs/fuzzforge_ai.git
cd fuzzforge_ai

# Install dependencies
uv sync

# Install CLI with uv (from the root directory)
uv tool install --python python3.12 .
```

### Link the Security Hub

```bash
# Clone the MCP Security Hub
git clone https://github.com/FuzzingLabs/mcp-security-hub.git ~/.fuzzforge/hubs/mcp-security-hub

# Build the Docker images for the hub tools
./scripts/build-hub-images.sh
```

Or use the terminal UI (`uv run fuzzforge ui`) to link hubs interactively.

### Configure MCP for Your AI Agent

```bash
# For GitHub Copilot
uv run fuzzforge mcp install copilot

# For Claude Code (CLI)
uv run fuzzforge mcp install claude-code

# For Claude Desktop (standalone app)
uv run fuzzforge mcp install claude-desktop

# Verify installation
uv run fuzzforge mcp status
```

**Restart your editor** and your AI agent will have access to FuzzForge tools!

---

## 🧑‍💻 Usage

Once installed, just talk to your AI agent:

```
"What security tools are available?"
"Scan this firmware image for vulnerabilities"
"Analyze this binary with radare2"
"Run nuclei against https://example.com"
```

The agent will use FuzzForge to discover the right hub tools, chain them into a pipeline, and return results — all without you touching a terminal. See the [Usage Guide](USAGE.md) for detailed setup and advanced workflows.

## ⚡ Quickstart

Run your first workflow:

```bash
# 1. Clone the repo
git clone https://github.com/FuzzingLabs/fuzzforge_ai.git
cd fuzzforge_ai

# 2. Build & run with Docker
# Set registry host for your OS (local registry is mandatory)
# macOS/Windows (Docker Desktop):
export REGISTRY_HOST=host.docker.internal
# Linux (default):
# export REGISTRY_HOST=localhost
docker compose up -d
```

> The first launch can take 5-10 minutes due to Docker image building - a good time for a coffee break ☕

```bash
# 3. Run your first workflow
cd test_projects/vulnerable_app/        # Go into the test directory
fuzzforge init                          # Init a fuzzforge project
ff workflow run security_assessment .   # Start a workflow (you can also use the ff command)
```

### Manual Workflow Setup



_Setting up and running security workflows through the interface_

👉 More installation options in the [Documentation](https://docs.fuzzforge.ai).

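The `REGISTRY_HOST` choice above can be automated with a small shell snippet (illustrative; the OS-name patterns may need adjusting for your environment):

```shell
# Pick the registry host based on OS: Docker Desktop (macOS/Windows)
# reaches the host via host.docker.internal; native Linux uses localhost.
case "$(uname -s)" in
  Darwin|MINGW*|MSYS*|CYGWIN*) REGISTRY_HOST=host.docker.internal ;;
  *)                           REGISTRY_HOST=localhost ;;
esac
export REGISTRY_HOST
echo "REGISTRY_HOST=$REGISTRY_HOST"
```

Run it before `docker compose up -d` so the compose file sees the right value.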
---

## 📁 Project Structure

```
fuzzforge_ai/
├── fuzzforge-mcp/        # MCP server — the core of FuzzForge
├── fuzzforge-cli/        # Command-line interface & terminal UI
├── fuzzforge-common/     # Shared abstractions (containers, storage)
├── fuzzforge-runner/     # Container execution engine (Docker/Podman)
├── fuzzforge-tests/      # Integration tests
├── mcp-security-hub/     # Default hub: 36 offensive security MCP servers
└── scripts/              # Hub image build scripts
```

## AI-Powered Workflow Execution



_AI agents automatically analyzing code and providing security insights_

## 📚 Resources

- 🌐 [Website](https://fuzzforge.ai)
- 📖 [Documentation](https://docs.fuzzforge.ai)
- 💬 [Community Discord](https://discord.com/invite/acqv9FVG)
- 🎓 [FuzzingLabs Academy](https://academy.fuzzinglabs.com/?coupon=GITHUB_FUZZFORGE)

---

## 🤝 Contributing

We welcome contributions from the community! There are many ways to help:

- 🐛 Report bugs via [GitHub Issues](../../issues)
- 💡 Suggest features or improvements
- 🔧 Submit pull requests with fixes or enhancements
- 🔌 Add new MCP servers to the [Security Hub](https://github.com/FuzzingLabs/mcp-security-hub)
- 📦 Share workflows, corpora, or modules with the community

See our [Contributing Guide](CONTRIBUTING.md) for details.

---

## 📄 License
|
||||
## 🗺️ Roadmap
|
||||
|
||||
BSL 1.1 - See [LICENSE](LICENSE) for details.
|
||||
Planned features and improvements:
|
||||
|
||||
- 📦 Public workflow & module marketplace
|
||||
- 🤖 New specialized AI agents (Rust, Go, Android, Automotive)
|
||||
- 🔗 Expanded fuzzer integrations (LibFuzzer, Jazzer, more network fuzzers)
|
||||
- ☁️ Multi-tenant SaaS platform with team collaboration
|
||||
- 📊 Advanced reporting & analytics
|
||||
|
||||
👉 Follow updates in the [GitHub issues](../../issues) and [Discord](https://discord.com/invite/acqv9FVG).
|
||||
|
||||
---
|
||||
|
||||
<p align="center">
|
||||
<strong>Maintained by <a href="https://fuzzinglabs.com">FuzzingLabs</a></strong>
|
||||
<br>
|
||||
</p>

## 📜 License

FuzzForge is released under the **Business Source License (BSL) 1.1**, with an automatic fallback to **Apache 2.0** after 4 years.
See [LICENSE](LICENSE) and [LICENSE-APACHE](LICENSE-APACHE) for details.

---

**ROADMAP.md** (removed in this commit)

# FuzzForge AI Roadmap

This document outlines the planned features and development direction for FuzzForge AI.

---

## 🎯 Upcoming Features

### 1. MCP Security Hub Integration

**Status:** 🔄 Planned

Integrate [mcp-security-hub](https://github.com/FuzzingLabs/mcp-security-hub) tools into FuzzForge, giving AI agents access to 28 MCP servers and 163+ security tools through a unified interface.

#### How It Works

Unlike native FuzzForge modules (built with the SDK), mcp-security-hub tools are **standalone MCP servers**. The integration will bridge these tools so they can be:

- Discovered via `list_modules` alongside native modules
- Executed through FuzzForge's orchestration layer
- Chained with native modules in workflows

| Aspect | Native Modules | MCP Hub Tools |
|--------|----------------|---------------|
| **Runtime** | FuzzForge SDK container | Standalone MCP server container |
| **Protocol** | Direct execution | MCP-to-MCP bridge |
| **Configuration** | Module config | Tool-specific args |
| **Output** | FuzzForge results format | Tool-native format (normalized) |

#### Goals

- Unified discovery of all available tools (native + hub)
- Orchestrate hub tools through FuzzForge's workflow engine
- Normalize outputs for consistent result handling
- No modification required to mcp-security-hub tools

#### Planned Tool Categories

| Category | Tools | Example Use Cases |
|----------|-------|-------------------|
| **Reconnaissance** | nmap, masscan, whatweb, shodan | Network scanning, service discovery |
| **Web Security** | nuclei, sqlmap, ffuf, nikto | Vulnerability scanning, fuzzing |
| **Binary Analysis** | radare2, binwalk, yara, capa, ghidra | Reverse engineering, malware analysis |
| **Cloud Security** | trivy, prowler | Container scanning, cloud auditing |
| **Secrets Detection** | gitleaks | Credential scanning |
| **OSINT** | maigret, dnstwist | Username tracking, typosquatting |
| **Threat Intel** | virustotal, otx | Malware analysis, IOC lookup |

#### Example Workflow

```
You: "Scan example.com for vulnerabilities and analyze any suspicious binaries"

AI Agent:
1. Uses nmap module for port discovery
2. Uses nuclei module for vulnerability scanning
3. Uses binwalk module to extract firmware
4. Uses yara module for malware detection
5. Generates consolidated report
```

---

### 2. User Interface

**Status:** 🔄 Planned

A graphical interface to manage FuzzForge without the command line.

#### Goals

- Provide an alternative to the CLI for users who prefer visual tools
- Make configuration and monitoring more accessible
- Complement (not replace) the CLI experience

#### Planned Capabilities

| Capability | Description |
|------------|-------------|
| **Configuration** | Change MCP server settings, engine options, paths |
| **Module Management** | Browse, configure, and launch modules |
| **Execution Monitoring** | View running tasks, logs, progress, metrics |
| **Project Overview** | Manage projects and browse execution results |
| **Workflow Management** | Create and run multi-module workflows |

---

## 📋 Backlog

Features under consideration for future releases:

| Feature | Description |
|---------|-------------|
| **Module Marketplace** | Browse and install community modules |
| **Scheduled Executions** | Run modules on a schedule (cron-style) |
| **Team Collaboration** | Share projects, results, and workflows |
| **Reporting Engine** | Generate PDF/HTML security reports |
| **Notifications** | Slack, Discord, email alerts for findings |

---

## ✅ Completed

| Feature | Version | Date |
|---------|---------|------|
| Docker as default engine | 0.1.0 | Jan 2026 |
| MCP server for AI agents | 0.1.0 | Jan 2026 |
| CLI for project management | 0.1.0 | Jan 2026 |
| Continuous execution mode | 0.1.0 | Jan 2026 |
| Workflow orchestration | 0.1.0 | Jan 2026 |

---

## 💬 Feedback

Have suggestions for the roadmap?

- Open an issue on [GitHub](https://github.com/FuzzingLabs/fuzzforge_ai/issues)
- Join our [Discord](https://discord.gg/8XEX33UUwZ)

---

<p align="center">
  <strong>Built with ❤️ by <a href="https://fuzzinglabs.com">FuzzingLabs</a></strong>
</p>

---

**USAGE.md** (removed in this commit)

# FuzzForge AI Usage Guide

This guide covers everything you need to know to get started with FuzzForge AI — from installation to linking your first MCP hub and running security research workflows with AI.

> **FuzzForge is designed to be used with AI agents** (GitHub Copilot, Claude, etc.) via MCP.
> A terminal UI (`fuzzforge ui`) is provided for managing agents and hubs.
> The CLI is available for advanced users, but the primary experience is natural language interaction with your AI assistant.

---

## Table of Contents

- [Quick Start](#quick-start)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Terminal UI](#terminal-ui)
  - [Launching the UI](#launching-the-ui)
  - [Dashboard](#dashboard)
  - [Agent Setup](#agent-setup)
  - [Hub Manager](#hub-manager)
- [MCP Hub System](#mcp-hub-system)
  - [What is an MCP Hub?](#what-is-an-mcp-hub)
  - [FuzzingLabs Security Hub](#fuzzinglabs-security-hub)
  - [Linking a Custom Hub](#linking-a-custom-hub)
  - [Building Hub Images](#building-hub-images)
- [MCP Server Configuration (CLI)](#mcp-server-configuration-cli)
  - [GitHub Copilot](#github-copilot)
  - [Claude Code (CLI)](#claude-code-cli)
  - [Claude Desktop](#claude-desktop)
- [Using FuzzForge with AI](#using-fuzzforge-with-ai)
- [CLI Reference](#cli-reference)
- [Environment Variables](#environment-variables)
- [Troubleshooting](#troubleshooting)

---

## Quick Start

> **Prerequisites:** You need [uv](https://docs.astral.sh/uv/) and [Docker](https://docs.docker.com/get-docker/) installed.
> See the [Prerequisites](#prerequisites) section for details.

```bash
# 1. Clone and install
git clone https://github.com/FuzzingLabs/fuzzforge_ai.git
cd fuzzforge_ai
uv sync

# 2. Launch the terminal UI
uv run fuzzforge ui

# 3. Press 'h' → "FuzzingLabs Hub" to clone & link the default security hub
# 4. Select an agent row and press Enter to install the MCP server for your agent

# 5. Build the Docker images for the hub tools (required before tools can run)
./scripts/build-hub-images.sh

# 6. Restart your AI agent and start talking:
#    "What security tools are available?"
#    "Scan this binary with binwalk and yara"
#    "Analyze this Rust crate for fuzzable functions"
```

Or do it entirely from the command line:

```bash
# Install MCP for your AI agent
uv run fuzzforge mcp install copilot      # For VS Code + GitHub Copilot
# OR
uv run fuzzforge mcp install claude-code  # For Claude Code CLI

# Clone and link the default security hub
git clone git@github.com:FuzzingLabs/mcp-security-hub.git ~/.fuzzforge/hubs/mcp-security-hub

# Build hub tool images (required — tools only run once their image is built)
./scripts/build-hub-images.sh

# Restart your AI agent — done!
```

> **Note:** FuzzForge uses Docker by default. Podman is also supported via `--engine podman`.

---

## Prerequisites

Before installing FuzzForge AI, ensure you have:

- **Python 3.12+** — [Download Python](https://www.python.org/downloads/)
- **uv** package manager — [Install uv](https://docs.astral.sh/uv/)
- **Docker** — Container runtime ([Install Docker](https://docs.docker.com/get-docker/))
- **Git** — For cloning hub repositories

### Installing uv

```bash
# Linux/macOS
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or with pip
pip install uv
```

### Installing Docker

```bash
# Linux (Ubuntu/Debian)
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Log out and back in for group changes to take effect

# macOS/Windows
# Install Docker Desktop from https://docs.docker.com/get-docker/
```

> **Note:** Podman is also supported. Use `--engine podman` with CLI commands
> or set the `FUZZFORGE_ENGINE=podman` environment variable.

---

## Installation

### 1. Clone the Repository

```bash
git clone https://github.com/FuzzingLabs/fuzzforge_ai.git
cd fuzzforge_ai
```

### 2. Install Dependencies

```bash
uv sync
```

This installs all FuzzForge components in a virtual environment.

### 3. Verify Installation

```bash
uv run fuzzforge --help
```

---

## Terminal UI

FuzzForge ships with a terminal user interface (TUI) built on [Textual](https://textual.textualize.io/) for managing AI agents and MCP hub servers from a single dashboard.

### Launching the UI

```bash
uv run fuzzforge ui
```

### Dashboard

The main screen is split into two panels:

| Panel | Content |
|-------|---------|
| **AI Agents** (left) | Shows GitHub Copilot, Claude Desktop, and Claude Code with live link status and config file path |
| **Hub Servers** (right) | Shows all configured MCP hub tools with Docker image name, source hub, and build status (✓ Ready / ✗ Not built) |

### Keyboard Shortcuts

| Key | Action |
|-----|--------|
| `Enter` | **Select** — Act on the selected row (setup/unlink an agent) |
| `h` | **Hub Manager** — Open the hub management screen |
| `r` | **Refresh** — Re-check all agent and hub statuses |
| `q` | **Quit** |

### Agent Setup

Select an agent row in the AI Agents table and press `Enter`:

- **If the agent is not linked** → a setup dialog opens asking for your container engine (Docker or Podman), then installs the FuzzForge MCP configuration
- **If the agent is already linked** → a confirmation dialog offers to unlink it (removes the `fuzzforge` entry without touching other MCP servers)

The setup auto-detects:

- FuzzForge installation root
- Docker/Podman socket path
- Hub configuration from `hub-config.json`

### Hub Manager

Press `h` to open the hub manager. This is where you manage your MCP hub repositories:

| Button | Action |
|--------|--------|
| **FuzzingLabs Hub** | One-click clone of the official [mcp-security-hub](https://github.com/FuzzingLabs/mcp-security-hub) repository — clones to `~/.fuzzforge/hubs/mcp-security-hub`, scans for tools, and registers them in `hub-config.json` |
| **Link Path** | Link any local directory as a hub — enter a name and path, and FuzzForge scans it for `category/tool-name/Dockerfile` patterns |
| **Clone URL** | Clone any git repository and link it as a hub |
| **Remove** | Unlink the selected hub and remove its servers from the configuration |

The hub table shows:

- **Name** — Hub name (★ prefix for the default hub)
- **Path** — Local directory path
- **Servers** — Number of MCP tools discovered
- **Source** — Git URL or "local"

---

## MCP Hub System

### What is an MCP Hub?

An MCP hub is a directory containing one or more containerized MCP tools, organized by category:

```
my-hub/
├── category-a/
│   ├── tool-1/
│   │   └── Dockerfile
│   └── tool-2/
│       └── Dockerfile
├── category-b/
│   └── tool-3/
│       └── Dockerfile
└── ...
```

FuzzForge scans for the pattern `category/tool-name/Dockerfile` and auto-generates server configuration entries for each discovered tool.
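The discovery step is easy to reproduce by hand. As a rough sketch (the `scan_hub` helper name is ours, not a FuzzForge command), this lists every `category/tool-name` pair under a hub directory that carries a Dockerfile:

```bash
# Sketch: list category/tool-name pairs that contain a Dockerfile,
# mirroring the hub layout shown above. scan_hub is a hypothetical
# helper for illustration only.
scan_hub() {
  find "$1" -mindepth 3 -maxdepth 3 -type f -name Dockerfile |
    while read -r dockerfile; do
      tool_dir=$(dirname "$dockerfile")
      echo "$(basename "$(dirname "$tool_dir")")/$(basename "$tool_dir")"
    done
}

# Example: scan_hub ~/.fuzzforge/hubs/mcp-security-hub
```

Comparing its output against the dashboard's Hub Servers panel is a quick way to check that a custom hub follows the expected layout.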

### FuzzingLabs Security Hub

The default MCP hub is [mcp-security-hub](https://github.com/FuzzingLabs/mcp-security-hub), maintained by FuzzingLabs. It includes **40+ security tools** across categories:

| Category | Tools |
|----------|-------|
| **Reconnaissance** | nmap, masscan, shodan, zoomeye, whatweb, pd-tools, externalattacker, networksdb |
| **Binary Analysis** | binwalk, yara, capa, radare2, ghidra, ida |
| **Code Security** | semgrep, rust-analyzer, harness-tester, cargo-fuzzer, crash-analyzer |
| **Web Security** | nuclei, nikto, sqlmap, ffuf, burp, waybackurls |
| **Fuzzing** | boofuzz, dharma |
| **Exploitation** | searchsploit |
| **Secrets** | gitleaks |
| **Cloud Security** | trivy, prowler, roadrecon |
| **OSINT** | maigret, dnstwist |
| **Threat Intel** | virustotal, otx |
| **Password Cracking** | hashcat |
| **Blockchain** | medusa, solazy, daml-viewer |

**Clone it via the UI:**

1. `uv run fuzzforge ui`
2. Press `h` → click **FuzzingLabs Hub**
3. Wait for the clone to finish — servers are auto-registered

**Or clone manually:**

```bash
git clone git@github.com:FuzzingLabs/mcp-security-hub.git ~/.fuzzforge/hubs/mcp-security-hub
```

### Linking a Custom Hub

You can link any directory that follows the `category/tool-name/Dockerfile` layout:

**Via the UI:**

1. Press `h` → **Link Path**
2. Enter a name and the directory path

**Via the CLI (planned):** Not yet available — use the UI.

### Building Hub Images

After linking a hub, you need to build the Docker images before the tools can be used:

```bash
# Build all images from the default security hub
./scripts/build-hub-images.sh

# Or build a single tool image
docker build -t semgrep-mcp:latest mcp-security-hub/code-security/semgrep-mcp/
```

The dashboard hub table shows ✓ Ready for built images and ✗ Not built for missing ones.
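To preview what a full hub build will run before running it, a small dry-run loop along these lines (our own sketch, assuming the `<tool-name>:latest` tag convention used above; `print_build_commands` is a hypothetical helper) prints one `docker build` command per discovered tool:

```bash
# Sketch: print the docker build command for every tool in a hub directory.
# print_build_commands is a hypothetical helper, not a FuzzForge script;
# pipe its output to `sh` (or drop the echo) to actually build the images.
print_build_commands() {
  find "$1" -mindepth 3 -maxdepth 3 -type f -name Dockerfile |
    while read -r dockerfile; do
      tool_dir=$(dirname "$dockerfile")
      echo "docker build -t $(basename "$tool_dir"):latest $tool_dir/"
    done
}

# Example: print_build_commands ~/.fuzzforge/hubs/mcp-security-hub
```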

---

## MCP Server Configuration (CLI)

If you prefer the command line over the TUI, you can configure agents directly:

### GitHub Copilot

```bash
uv run fuzzforge mcp install copilot
```

The command auto-detects:

- **FuzzForge root** — Where FuzzForge is installed
- **Docker socket** — Auto-detects `/var/run/docker.sock`

**Optional overrides:**

```bash
uv run fuzzforge mcp install copilot --engine podman
```

**After installation:** Restart VS Code. FuzzForge tools appear in GitHub Copilot Chat.

### Claude Code (CLI)

```bash
uv run fuzzforge mcp install claude-code
```

Installs to `~/.claude.json`. FuzzForge tools are available from any directory after restarting Claude.

### Claude Desktop

```bash
uv run fuzzforge mcp install claude-desktop
```

**After installation:** Restart Claude Desktop.

### Check Status

```bash
uv run fuzzforge mcp status
```

### Remove Configuration

```bash
uv run fuzzforge mcp uninstall copilot
uv run fuzzforge mcp uninstall claude-code
uv run fuzzforge mcp uninstall claude-desktop
```

---

## Using FuzzForge with AI

Once MCP is configured and hub images are built, interact with FuzzForge through natural language with your AI assistant.

### Example Conversations

**Discover available tools:**

```
You: "What security tools are available in FuzzForge?"
AI: Queries hub tools → "I found 15 tools across categories: nmap for
    port scanning, binwalk for firmware analysis, semgrep for code
    scanning, cargo-fuzzer for Rust fuzzing..."
```

**Analyze a binary:**

```
You: "Extract and analyze this firmware image"
AI: Uses binwalk to extract → yara for pattern matching → capa for
    capability detection → "Found 3 embedded filesystems, 2 YARA
    matches for known vulnerabilities..."
```

**Fuzz Rust code:**

```
You: "Analyze this Rust crate for functions I should fuzz"
AI: Uses rust-analyzer → "Found 3 fuzzable entry points..."

You: "Start fuzzing parse_input for 10 minutes"
AI: Uses cargo-fuzzer → "Fuzzing session started. 2 crashes found..."
```

**Scan for vulnerabilities:**

```
You: "Scan this codebase with semgrep for security issues"
AI: Uses semgrep-mcp → "Found 5 findings: 2 high severity SQL injection
    patterns, 3 medium severity hardcoded secrets..."
```

---

## CLI Reference

### UI Command

```bash
uv run fuzzforge ui   # Launch the terminal dashboard
```

### MCP Commands

```bash
uv run fuzzforge mcp status              # Check agent configuration status
uv run fuzzforge mcp install <agent>     # Install MCP config (copilot|claude-code|claude-desktop)
uv run fuzzforge mcp uninstall <agent>   # Remove MCP config
uv run fuzzforge mcp generate <agent>    # Preview config without installing
```

### Project Commands

```bash
uv run fuzzforge project init            # Initialize a project
uv run fuzzforge project info            # Show project info
uv run fuzzforge project executions      # List executions
uv run fuzzforge project results <id>    # Get execution results
```

---

## Environment Variables

Configure FuzzForge using environment variables:

```bash
# Override the FuzzForge installation root (auto-detected from cwd by default)
export FUZZFORGE_ROOT=/path/to/fuzzforge_ai

# Override the user-global data directory (default: ~/.fuzzforge)
# Useful for isolated testing without touching your real installation
export FUZZFORGE_USER_DIR=/tmp/my-fuzzforge-test

# Storage path for projects and execution results (default: <workspace>/.fuzzforge/storage)
export FUZZFORGE_STORAGE__PATH=/path/to/storage

# Container engine (Docker is default)
export FUZZFORGE_ENGINE__TYPE=docker   # or podman

# Podman-specific container storage paths
export FUZZFORGE_ENGINE__GRAPHROOT=~/.fuzzforge/containers/storage
export FUZZFORGE_ENGINE__RUNROOT=~/.fuzzforge/containers/run
```

---

## Troubleshooting

### Docker Not Running

```
Error: Cannot connect to Docker daemon
```

**Solution:**

```bash
# Linux: Start Docker service
sudo systemctl start docker

# macOS/Windows: Start Docker Desktop application

# Verify Docker is running
docker run --rm hello-world
```

### Permission Denied on Docker Socket

```
Error: Permission denied connecting to Docker socket
```

**Solution:**

```bash
sudo usermod -aG docker $USER
# Log out and back in, then verify:
docker run --rm hello-world
```

### Hub Images Not Built

The dashboard shows ✗ Not built for tools:

```bash
# Build all hub images
./scripts/build-hub-images.sh

# Or build a single tool
docker build -t <tool-name>:latest mcp-security-hub/<category>/<tool-name>/
```

### MCP Server Not Starting

```bash
# Check agent configuration
uv run fuzzforge mcp status

# Verify the config file path exists and contains valid JSON
cat ~/.config/Code/User/mcp.json   # Copilot
cat ~/.claude.json                 # Claude Code
```

### Using Podman Instead of Docker

```bash
# Install with Podman engine
uv run fuzzforge mcp install copilot --engine podman

# Or set environment variable
export FUZZFORGE_ENGINE=podman
```

### Hub Registry

FuzzForge stores linked hub information in `~/.fuzzforge/hubs.json`. If something goes wrong:

```bash
# View registry
cat ~/.fuzzforge/hubs.json

# Reset registry
rm ~/.fuzzforge/hubs.json
```
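Rather than deleting the registry outright, you may prefer to keep a timestamped backup first. A minimal sketch (the `reset_registry` helper is ours, not a FuzzForge command; the default path follows the `FUZZFORGE_USER_DIR` convention from the Environment Variables section):

```bash
# Sketch: reset the hub registry, keeping a timestamped backup alongside it.
# reset_registry is a hypothetical helper for illustration only.
reset_registry() {
  registry="${1:-${FUZZFORGE_USER_DIR:-$HOME/.fuzzforge}/hubs.json}"
  [ -f "$registry" ] || { echo "no registry at $registry"; return 0; }
  cp "$registry" "$registry.bak.$(date +%Y%m%d%H%M%S)"
  rm "$registry"
  echo "reset $registry (backup kept alongside)"
}
```

Re-linking your hubs afterwards (via the Hub Manager) regenerates the registry from scratch.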

---

## Next Steps

- 🖥️ Launch `uv run fuzzforge ui` and explore the dashboard
- 🔒 Clone the [mcp-security-hub](https://github.com/FuzzingLabs/mcp-security-hub) for 40+ security tools
- 💬 Join our [Discord](https://discord.gg/8XEX33UUwZ) for support

---

<p align="center">
  <strong>Built with ❤️ by <a href="https://fuzzinglabs.com">FuzzingLabs</a></strong>
</p>

---

**ai/.gitignore** (new file, vendored)

.env
__pycache__/
*.pyc
fuzzforge_sessions.db
agentops.log
*.log

---

**ai/README.md** (new file)

# FuzzForge AI Module

FuzzForge AI is the multi-agent layer that lets you operate the FuzzForge security platform through natural language. It orchestrates local tooling, registered Agent-to-Agent (A2A) peers, and the Prefect-powered backend while keeping long-running context in memory and project knowledge graphs.

## Quick Start

1. **Initialise a project**
   ```bash
   cd /path/to/project
   fuzzforge init
   ```
2. **Review environment settings** – copy `.fuzzforge/.env.template` to `.fuzzforge/.env`, then edit the values to match your provider. The template ships with commented defaults for OpenAI-style usage and placeholders for Cognee keys.
   ```env
   LLM_PROVIDER=openai
   LITELLM_MODEL=gpt-5-mini
   OPENAI_API_KEY=sk-your-key
   FUZZFORGE_MCP_URL=http://localhost:8010/mcp
   SESSION_PERSISTENCE=sqlite
   ```
   Optional flags you may want to enable early:
   ```env
   MEMORY_SERVICE=inmemory
   AGENTOPS_API_KEY=sk-your-agentops-key  # Enable hosted tracing
   LOG_LEVEL=INFO                         # CLI / server log level
   ```
3. **Populate the knowledge graph**
   ```bash
   fuzzforge ingest --path . --recursive
   # alias: fuzzforge rag ingest --path . --recursive
   ```
4. **Launch the agent shell**
   ```bash
   fuzzforge ai agent
   ```
   Keep the backend running (Prefect API at `FUZZFORGE_MCP_URL`) so workflow commands succeed.

## Everyday Workflow

- Run `fuzzforge ai agent` and start with `list available fuzzforge workflows` or `/memory status` to confirm everything is wired.
- Use natural prompts for automation (`run fuzzforge workflow …`, `search project knowledge for …`) and fall back to slash commands for precision (`/recall`, `/sendfile`).
- Keep `/memory datasets` handy to see which Cognee datasets are available after each ingest.
- Start the HTTP surface with `python -m fuzzforge_ai` when external agents need access to artifacts or graph queries. The CLI stays usable at the same time.
- Refresh the knowledge graph regularly: `fuzzforge ingest --path . --recursive --force` keeps responses aligned with recent code changes.

## What the Agent Can Do

- **Route requests** – automatically selects the right local tool or remote agent using the A2A capability registry.
- **Run security workflows** – list, submit, and monitor FuzzForge workflows via MCP wrappers.
- **Manage artifacts** – create downloadable files for reports, code edits, and shared attachments.
- **Maintain context** – stores session history, semantic recall, and Cognee project graphs.
- **Serve over HTTP** – expose the same agent as an A2A server using `python -m fuzzforge_ai`.

## Essential Commands

Inside `fuzzforge ai agent` you can mix slash commands and free-form prompts:

```text
/list                     # Show registered A2A agents
/register http://:10201   # Add a remote agent
/artifacts                # List generated files
/sendfile SecurityAgent src/report.md "Please review"
You> route_to SecurityAnalyzer: scan ./backend for secrets
You> run fuzzforge workflow static_analysis_scan on ./test_projects/demo
You> search project knowledge for "prefect status" using INSIGHTS
```

Artifacts created during the conversation are served from `.fuzzforge/artifacts/` and exposed through the A2A HTTP API.

## Memory & Knowledge

The module layers three storage systems:

- **Session persistence** (SQLite or in-memory) for chat transcripts.
- **Semantic recall** via the ADK memory service for fuzzy search.
- **Cognee graphs** for project-wide knowledge built from ingestion runs.

Re-run ingestion after major code changes to keep graph answers relevant. If Cognee variables are not set, graph-specific tools automatically respond with a polite "not configured" message.

## Sample Prompts

Use these to validate the setup once the agent shell is running:

- `list available fuzzforge workflows`
- `run fuzzforge workflow static_analysis_scan on ./backend with target_branch=main`
- `show findings for that run once it finishes`
- `refresh the project knowledge graph for ./backend`
- `search project knowledge for "prefect readiness" using INSIGHTS`
- `/recall terraform secrets`
- `/memory status`
- `ROUTE_TO SecurityAnalyzer: audit infrastructure_vulnerable`

## Need More Detail?

Dive into the dedicated guides under `ai/docs/advanced/`:

- [Architecture](https://docs.fuzzforge.ai/docs/ai/intro) – High-level architecture with diagrams and component breakdowns.
- [Ingestion](https://docs.fuzzforge.ai/docs/ai/ingestion.md) – Command options, Cognee persistence, and prompt examples.
- [Configuration](https://docs.fuzzforge.ai/docs/ai/configuration.md) – LLM provider matrix, local model setup, and tracing options.
- [Prompts](https://docs.fuzzforge.ai/docs/ai/prompts.md) – Slash commands, workflow prompts, and routing tips.
- [A2A Services](https://docs.fuzzforge.ai/docs/ai/a2a-services.md) – HTTP endpoints, agent card, and collaboration flow.
- [Memory Persistence](https://docs.fuzzforge.ai/docs/ai/architecture.md#memory--persistence) – Deep dive on memory storage, datasets, and how `/memory status` inspects them.

## Development Notes

- Entry point for the CLI: `ai/src/fuzzforge_ai/cli.py`
- A2A HTTP server: `ai/src/fuzzforge_ai/a2a_server.py`
- Tool routing & workflow glue: `ai/src/fuzzforge_ai/agent_executor.py`
- Ingestion helpers: `ai/src/fuzzforge_ai/ingest_utils.py`

Install the module in editable mode (`pip install -e ai`) while iterating so CLI changes are picked up immediately.

---

**ai/llm.txt** (new file)

FuzzForge AI LLM Configuration Guide
|
||||
===================================
|
||||
|
||||
This note summarises the environment variables and libraries that drive LiteLLM (via the Google ADK runtime) inside the FuzzForge AI module. For complete matrices and advanced examples, read `docs/advanced/configuration.md`.
|
||||
|
||||
Core Libraries
|
||||
--------------
|
||||
- `google-adk` – hosts the agent runtime, memory services, and LiteLLM bridge.
|
||||
- `litellm` – provider-agnostic LLM client used by ADK and the executor.
|
||||
- Provider SDKs – install the SDK that matches your target backend (`openai`, `anthropic`, `google-cloud-aiplatform`, `groq`, etc.).
|
||||
- Optional extras: `agentops` for tracing, `cognee[all]` for knowledge-graph ingestion, `ollama` CLI for running local models.
|
||||
|
||||
Quick install foundation::
|
||||
|
||||
```
|
||||
pip install google-adk litellm openai
|
||||
```
|
||||
|
||||
Add any provider-specific SDKs (for example `pip install anthropic groq`) on top of that base.
|
||||
|
||||
Baseline Setup
|
||||
--------------
|
||||
Copy `.fuzzforge/.env.template` to `.fuzzforge/.env` and set the core fields:
|
||||
|
||||
```
|
||||
LLM_PROVIDER=openai
|
||||
LITELLM_MODEL=gpt-5-mini
|
||||
OPENAI_API_KEY=sk-your-key
|
||||
FUZZFORGE_MCP_URL=http://localhost:8010/mcp
|
||||
SESSION_PERSISTENCE=sqlite
|
||||
MEMORY_SERVICE=inmemory
|
||||
```
|
||||
|
||||
LiteLLM Provider Examples
|
||||
-------------------------
|
||||
|
||||
OpenAI-compatible (Azure, etc.)::
|
||||
```
|
||||
LLM_PROVIDER=azure_openai
|
||||
LITELLM_MODEL=gpt-4o-mini
|
||||
LLM_API_KEY=sk-your-azure-key
|
||||
LLM_ENDPOINT=https://your-resource.openai.azure.com
|
||||
```
|
||||
|
||||
Anthropic::
|
||||
```
|
||||
LLM_PROVIDER=anthropic
|
||||
LITELLM_MODEL=claude-3-haiku-20240307
|
||||
ANTHROPIC_API_KEY=sk-your-key
|
||||
```
|
||||
|
||||
Ollama (local)::
|
||||
```
|
||||
LLM_PROVIDER=ollama_chat
|
||||
LITELLM_MODEL=codellama:latest
|
||||
OLLAMA_API_BASE=http://localhost:11434
|
||||
```
|
||||
Run `ollama pull codellama:latest` so the adapter can respond immediately.
|
||||
|
||||
Vertex AI::
|
||||
```
|
||||
LLM_PROVIDER=vertex_ai
|
||||
LITELLM_MODEL=gemini-1.5-pro
|
||||
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
|
||||
```
Provider Checklist
------------------
- **OpenAI / Azure OpenAI**: `LLM_PROVIDER`, `LITELLM_MODEL`, API key, optional endpoint + API version (Azure).
- **Anthropic**: `LLM_PROVIDER=anthropic`, `LITELLM_MODEL`, `ANTHROPIC_API_KEY`.
- **Google Vertex AI**: `LLM_PROVIDER=vertex_ai`, `LITELLM_MODEL`, `GOOGLE_APPLICATION_CREDENTIALS`, `GOOGLE_CLOUD_PROJECT`.
- **Groq**: `LLM_PROVIDER=groq`, `LITELLM_MODEL`, `GROQ_API_KEY`.
- **Ollama / Local**: `LLM_PROVIDER=ollama_chat`, `LITELLM_MODEL`, `OLLAMA_API_BASE`, and the model pulled locally (`ollama pull <model>`).
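The checklist above translates naturally into a small preflight check. This sketch (the variable table is transcribed from the checklist; the function itself is hypothetical, not part of FuzzForge) reports which required variables are still unset before you start the agent:

```python
# Required environment variables per provider, per the checklist above.
REQUIRED_VARS = {
    "openai": ["LITELLM_MODEL", "OPENAI_API_KEY"],
    "anthropic": ["LITELLM_MODEL", "ANTHROPIC_API_KEY"],
    "vertex_ai": ["LITELLM_MODEL", "GOOGLE_APPLICATION_CREDENTIALS", "GOOGLE_CLOUD_PROJECT"],
    "groq": ["LITELLM_MODEL", "GROQ_API_KEY"],
    "ollama_chat": ["LITELLM_MODEL", "OLLAMA_API_BASE"],
}


def missing_vars(provider: str, env: dict[str, str]) -> list[str]:
    """Return checklist variables that are absent or empty in `env`."""
    return [v for v in REQUIRED_VARS.get(provider, []) if not env.get(v)]


env = {"LLM_PROVIDER": "groq", "LITELLM_MODEL": "llama-3.1-8b-instant"}
print(missing_vars("groq", env))  # ['GROQ_API_KEY']
```

In practice you would pass `dict(os.environ)` instead of a hand-built dict.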
Knowledge Graph Add-ons
-----------------------
Set these only if you plan to use Cognee project graphs:

```
LLM_COGNEE_PROVIDER=openai
LLM_COGNEE_MODEL=gpt-5-mini
LLM_COGNEE_API_KEY=sk-your-key
```
Tracing & Debugging
-------------------
- Provide `AGENTOPS_API_KEY` to enable hosted traces for every conversation.
- Set `FUZZFORGE_DEBUG=1` (and optionally `LOG_LEVEL=DEBUG`) for verbose executor output.
- Restart the agent after changing environment variables; LiteLLM loads its configuration at startup.
Further Reading
---------------
`docs/advanced/configuration.md` – provider comparison, debugging flags, and referenced modules.
44
ai/pyproject.toml
Normal file
@@ -0,0 +1,44 @@
[project]
name = "fuzzforge-ai"
version = "0.6.0"
description = "FuzzForge AI orchestration module"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "google-adk",
    "a2a-sdk",
    "litellm",
    "python-dotenv",
    "httpx",
    "uvicorn",
    "rich",
    "agentops",
    "fastmcp",
    "mcp",
    "typing-extensions",
    "cognee>=0.3.0",
]

[project.optional-dependencies]
dev = [
    "pytest",
    "pytest-asyncio",
    "black",
    "ruff",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src/fuzzforge_ai"]

[tool.hatch.metadata]
allow-direct-references = true

[tool.uv]
dev-dependencies = [
    "pytest",
    "pytest-asyncio",
]
24
ai/src/fuzzforge_ai/__init__.py
Normal file
@@ -0,0 +1,24 @@
"""
FuzzForge AI Module - Agent-to-Agent orchestration system

This module integrates the fuzzforge_ai components into FuzzForge,
providing intelligent AI agent capabilities for security analysis.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


__version__ = "0.6.0"

from .agent import FuzzForgeAgent
from .config_manager import ConfigManager

__all__ = ['FuzzForgeAgent', 'ConfigManager']
109
ai/src/fuzzforge_ai/__main__.py
Normal file
@@ -0,0 +1,109 @@
"""
FuzzForge A2A Server
Run this to expose FuzzForge as an A2A-compatible agent
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import os
import warnings
import logging
from dotenv import load_dotenv

from fuzzforge_ai.config_bridge import ProjectConfigManager

# Suppress warnings
warnings.filterwarnings("ignore")
logging.getLogger("google.adk").setLevel(logging.ERROR)
logging.getLogger("google.adk.tools.base_authenticated_tool").setLevel(logging.ERROR)

# Load .env from .fuzzforge directory first, then fallback
from pathlib import Path

# Ensure Cognee logs stay inside the project workspace
project_root = Path.cwd()
default_log_dir = project_root / ".fuzzforge" / "logs"
default_log_dir.mkdir(parents=True, exist_ok=True)
log_path = default_log_dir / "cognee.log"
os.environ.setdefault("COGNEE_LOG_PATH", str(log_path))

fuzzforge_env = Path.cwd() / ".fuzzforge" / ".env"
if fuzzforge_env.exists():
    load_dotenv(fuzzforge_env, override=True)
else:
    load_dotenv(override=True)

# Ensure Cognee uses the project-specific storage paths when available
try:
    project_config = ProjectConfigManager()
    project_config.setup_cognee_environment()
except Exception:
    # Project may not be initialized; fall through with default settings
    pass

# Check configuration
if not os.getenv('LITELLM_MODEL'):
    print("[ERROR] LITELLM_MODEL not set in .env file")
    print("Please set LITELLM_MODEL to your desired model (e.g., gpt-4o-mini)")
    exit(1)

from .agent import get_fuzzforge_agent
from .a2a_server import create_a2a_app as create_custom_a2a_app


def create_a2a_app():
    """Create the A2A application"""
    # Get configuration
    port = int(os.getenv('FUZZFORGE_PORT', 10100))

    # Get the FuzzForge agent
    fuzzforge = get_fuzzforge_agent()

    # Print ASCII banner
    print("\033[95m")  # Purple color
    print(" ███████╗██╗ ██╗███████╗███████╗███████╗ ██████╗ ██████╗ ██████╗ ███████╗ █████╗ ██╗")
    print(" ██╔════╝██║ ██║╚══███╔╝╚══███╔╝██╔════╝██╔═══██╗██╔══██╗██╔════╝ ██╔════╝ ██╔══██╗██║")
    print(" █████╗ ██║ ██║ ███╔╝ ███╔╝ █████╗ ██║ ██║██████╔╝██║ ███╗█████╗ ███████║██║")
    print(" ██╔══╝ ██║ ██║ ███╔╝ ███╔╝ ██╔══╝ ██║ ██║██╔══██╗██║ ██║██╔══╝ ██╔══██║██║")
    print(" ██║ ╚██████╔╝███████╗███████╗██║ ╚██████╔╝██║ ██║╚██████╔╝███████╗ ██║ ██║██║")
    print(" ╚═╝ ╚═════╝ ╚══════╝╚══════╝╚═╝ ╚═════╝ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝ ╚═╝ ╚═╝╚═╝")
    print("\033[0m")  # Reset color

    # Create A2A app
    print(f"🚀 Starting FuzzForge A2A Server")
    print(f" Model: {fuzzforge.model}")
    if fuzzforge.cognee_url:
        print(f" Memory: Cognee at {fuzzforge.cognee_url}")
    print(f" Port: {port}")

    app = create_custom_a2a_app(fuzzforge.adk_agent, port=port, executor=fuzzforge.executor)

    print(f"\n✅ FuzzForge A2A Server ready!")
    print(f" Agent card: http://localhost:{port}/.well-known/agent-card.json")
    print(f" A2A endpoint: http://localhost:{port}/")
    print(f"\n📡 Other agents can register FuzzForge at: http://localhost:{port}")

    return app


def main():
    """Start the A2A server using uvicorn."""
    import uvicorn

    app = create_a2a_app()
    port = int(os.getenv('FUZZFORGE_PORT', 10100))

    print(f"\n🎯 Starting server with uvicorn...")
    uvicorn.run(app, host="127.0.0.1", port=port)


if __name__ == "__main__":
    main()
230
ai/src/fuzzforge_ai/a2a_server.py
Normal file
@@ -0,0 +1,230 @@
"""Custom A2A wiring so we can access task store and queue manager."""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


from __future__ import annotations

import logging
from typing import Optional, Union

from starlette.applications import Starlette
from starlette.responses import Response, FileResponse
from starlette.routing import Route

from google.adk.a2a.executor.a2a_agent_executor import A2aAgentExecutor
from google.adk.a2a.utils.agent_card_builder import AgentCardBuilder
from google.adk.a2a.experimental import a2a_experimental
from google.adk.agents.base_agent import BaseAgent
from google.adk.artifacts.in_memory_artifact_service import InMemoryArtifactService
from google.adk.auth.credential_service.in_memory_credential_service import InMemoryCredentialService
from google.adk.cli.utils.logs import setup_adk_logger
from google.adk.memory.in_memory_memory_service import InMemoryMemoryService
from google.adk.runners import Runner
from google.adk.sessions.in_memory_session_service import InMemorySessionService

from a2a.server.apps import A2AStarletteApplication
from a2a.server.request_handlers.default_request_handler import DefaultRequestHandler
from a2a.server.tasks.inmemory_task_store import InMemoryTaskStore
from a2a.server.events.in_memory_queue_manager import InMemoryQueueManager
from a2a.types import AgentCard

from .agent_executor import FuzzForgeExecutor


import json


async def serve_artifact(request):
    """Serve artifact files via HTTP for A2A agents"""
    artifact_id = request.path_params["artifact_id"]

    # Try to get the executor instance to access artifact cache
    # We'll store a reference to it during app creation
    executor = getattr(serve_artifact, '_executor', None)
    if not executor:
        return Response("Artifact service not available", status_code=503)

    try:
        # Look in the artifact cache directory
        artifact_cache_dir = executor._artifact_cache_dir
        artifact_dir = artifact_cache_dir / artifact_id

        if not artifact_dir.exists():
            return Response("Artifact not found", status_code=404)

        # Find the artifact file (should be only one file in the directory)
        artifact_files = list(artifact_dir.glob("*"))
        if not artifact_files:
            return Response("Artifact file not found", status_code=404)

        artifact_file = artifact_files[0]  # Take the first (and should be only) file

        # Determine mime type from file extension or default to octet-stream
        import mimetypes
        mime_type, _ = mimetypes.guess_type(str(artifact_file))
        if not mime_type:
            mime_type = 'application/octet-stream'

        return FileResponse(
            path=str(artifact_file),
            media_type=mime_type,
            filename=artifact_file.name
        )

    except Exception as e:
        return Response(f"Error serving artifact: {str(e)}", status_code=500)


async def knowledge_query(request):
    """Expose knowledge graph search over HTTP for external agents."""
    executor = getattr(knowledge_query, '_executor', None)
    if not executor:
        return Response("Knowledge service not available", status_code=503)

    try:
        payload = await request.json()
    except Exception:
        return Response("Invalid JSON body", status_code=400)

    query = payload.get("query")
    if not query:
        return Response("'query' is required", status_code=400)

    search_type = payload.get("search_type", "INSIGHTS")
    dataset = payload.get("dataset")

    result = await executor.query_project_knowledge_api(
        query=query,
        search_type=search_type,
        dataset=dataset,
    )

    status = 200 if not isinstance(result, dict) or "error" not in result else 400
    return Response(
        json.dumps(result, default=str),
        status_code=status,
        media_type="application/json",
    )


async def create_file_artifact(request):
    """Create an artifact from a project file via HTTP."""
    executor = getattr(create_file_artifact, '_executor', None)
    if not executor:
        return Response("File service not available", status_code=503)

    try:
        payload = await request.json()
    except Exception:
        return Response("Invalid JSON body", status_code=400)

    path = payload.get("path")
    if not path:
        return Response("'path' is required", status_code=400)

    result = await executor.create_project_file_artifact_api(path)
    status = 200 if not isinstance(result, dict) or "error" not in result else 400
    return Response(
        json.dumps(result, default=str),
        status_code=status,
        media_type="application/json",
    )


def _load_agent_card(agent_card: Optional[Union[AgentCard, str]]) -> Optional[AgentCard]:
    if agent_card is None:
        return None
    if isinstance(agent_card, AgentCard):
        return agent_card

    import json
    from pathlib import Path

    path = Path(agent_card)
    with path.open('r', encoding='utf-8') as handle:
        data = json.load(handle)
    return AgentCard(**data)


@a2a_experimental
def create_a2a_app(
    agent: BaseAgent,
    *,
    host: str = "localhost",
    port: int = 8000,
    protocol: str = "http",
    agent_card: Optional[Union[AgentCard, str]] = None,
    executor=None,  # Accept executor reference
) -> Starlette:
    """Variant of google.adk.a2a.utils.to_a2a that exposes task-store handles."""

    setup_adk_logger(logging.INFO)

    async def create_runner() -> Runner:
        return Runner(
            agent=agent,
            app_name=agent.name or "fuzzforge",
            artifact_service=InMemoryArtifactService(),
            session_service=InMemorySessionService(),
            memory_service=InMemoryMemoryService(),
            credential_service=InMemoryCredentialService(),
        )

    task_store = InMemoryTaskStore()
    queue_manager = InMemoryQueueManager()

    agent_executor = A2aAgentExecutor(runner=create_runner)
    request_handler = DefaultRequestHandler(
        agent_executor=agent_executor,
        task_store=task_store,
        queue_manager=queue_manager,
    )

    rpc_url = f"{protocol}://{host}:{port}/"
    provided_card = _load_agent_card(agent_card)

    card_builder = AgentCardBuilder(agent=agent, rpc_url=rpc_url)

    app = Starlette()

    async def setup() -> None:
        if provided_card is not None:
            final_card = provided_card
        else:
            final_card = await card_builder.build()

        a2a_app = A2AStarletteApplication(
            agent_card=final_card,
            http_handler=request_handler,
        )
        a2a_app.add_routes_to_app(app)

    # Add artifact serving route
    app.router.add_route("/artifacts/{artifact_id}", serve_artifact, methods=["GET"])
    app.router.add_route("/graph/query", knowledge_query, methods=["POST"])
    app.router.add_route("/project/files", create_file_artifact, methods=["POST"])

    app.add_event_handler("startup", setup)

    # Expose handles so the executor can emit task updates later
    FuzzForgeExecutor.task_store = task_store
    FuzzForgeExecutor.queue_manager = queue_manager

    # Store reference to executor for artifact serving
    serve_artifact._executor = executor
    knowledge_query._executor = executor
    create_file_artifact._executor = executor

    return app


__all__ = ["create_a2a_app"]
133
ai/src/fuzzforge_ai/agent.py
Normal file
@@ -0,0 +1,133 @@
"""
FuzzForge Agent Definition
The core agent that combines all components
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import os
from pathlib import Path
from typing import Dict, Any, List
from google.adk import Agent
from google.adk.models.lite_llm import LiteLlm
from .agent_card import get_fuzzforge_agent_card
from .agent_executor import FuzzForgeExecutor
from .memory_service import FuzzForgeMemoryService, HybridMemoryManager

# Load environment variables from the AI module's .env file
try:
    from dotenv import load_dotenv
    _ai_dir = Path(__file__).parent
    _env_file = _ai_dir / ".env"
    if _env_file.exists():
        load_dotenv(_env_file, override=False)  # Don't override existing env vars
except ImportError:
    # dotenv not available, skip loading
    pass


class FuzzForgeAgent:
    """The main FuzzForge agent that combines card, executor, and ADK agent"""

    def __init__(
        self,
        model: str = None,
        cognee_url: str = None,
        port: int = 10100,
    ):
        """Initialize FuzzForge agent with configuration"""
        self.model = model or os.getenv('LITELLM_MODEL', 'gpt-4o-mini')
        self.cognee_url = cognee_url or os.getenv('COGNEE_MCP_URL')
        self.port = port

        # Initialize ADK Memory Service for conversational memory
        memory_type = os.getenv('MEMORY_SERVICE', 'inmemory')
        self.memory_service = FuzzForgeMemoryService(memory_type=memory_type)

        # Create the executor (the brain) with memory and session services
        self.executor = FuzzForgeExecutor(
            model=self.model,
            cognee_url=self.cognee_url,
            debug=os.getenv('FUZZFORGE_DEBUG', '0') == '1',
            memory_service=self.memory_service,
            session_persistence=os.getenv('SESSION_PERSISTENCE', 'inmemory'),
            fuzzforge_mcp_url=os.getenv('FUZZFORGE_MCP_URL'),
        )

        # Create Hybrid Memory Manager (ADK + Cognee direct integration)
        # MCP tools removed - using direct Cognee integration only
        self.memory_manager = HybridMemoryManager(
            memory_service=self.memory_service,
            cognee_tools=None  # No MCP tools, direct integration used instead
        )

        # Get the agent card (the identity)
        self.agent_card = get_fuzzforge_agent_card(f"http://localhost:{self.port}")

        # Create the ADK agent (for A2A server mode)
        self.adk_agent = self._create_adk_agent()

    def _create_adk_agent(self) -> Agent:
        """Create the ADK agent for A2A server mode"""
        # Build instruction
        instruction = f"""You are {self.agent_card.name}, {self.agent_card.description}

Your capabilities include:
"""
        for skill in self.agent_card.skills:
            instruction += f"\n- {skill.name}: {skill.description}"

        instruction += """

When responding to requests:
1. Use your registered agents when appropriate
2. Use Cognee memory tools when available
3. Provide helpful, concise responses
4. Maintain context across conversations
"""

        # Create ADK agent
        return Agent(
            model=LiteLlm(model=self.model),
            name=self.agent_card.name,
            description=self.agent_card.description,
            instruction=instruction,
            tools=self.executor.agent.tools if hasattr(self.executor.agent, 'tools') else []
        )

    async def process_message(self, message: str, context_id: str = None) -> str:
        """Process a message using the executor"""
        result = await self.executor.execute(message, context_id or "default")
        return result.get("response", "No response generated")

    async def register_agent(self, url: str) -> Dict[str, Any]:
        """Register a new agent"""
        return await self.executor.register_agent(url)

    def list_agents(self) -> List[Dict[str, Any]]:
        """List registered agents"""
        return self.executor.list_agents()

    async def cleanup(self):
        """Clean up resources"""
        await self.executor.cleanup()


# Create a singleton instance for import
_instance = None

def get_fuzzforge_agent() -> FuzzForgeAgent:
    """Get the singleton FuzzForge agent instance"""
    global _instance
    if _instance is None:
        _instance = FuzzForgeAgent()
    return _instance
183
ai/src/fuzzforge_ai/agent_card.py
Normal file
@@ -0,0 +1,183 @@
"""
FuzzForge Agent Card and Skills Definition
Defines what FuzzForge can do and how others can discover it
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


from dataclasses import dataclass
from typing import List, Optional, Dict, Any

@dataclass
class AgentSkill:
    """Represents a specific capability of the agent"""
    id: str
    name: str
    description: str
    tags: List[str]
    examples: List[str]
    input_modes: List[str] = None
    output_modes: List[str] = None

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary for JSON serialization"""
        return {
            "id": self.id,
            "name": self.name,
            "description": self.description,
            "tags": self.tags,
            "examples": self.examples,
            "inputModes": self.input_modes or ["text/plain"],
            "outputModes": self.output_modes or ["text/plain"]
        }


@dataclass
class AgentCapabilities:
    """Defines agent capabilities for A2A protocol"""
    streaming: bool = False
    push_notifications: bool = False
    multi_turn: bool = True
    context_retention: bool = True

    def to_dict(self) -> Dict[str, Any]:
        return {
            "streaming": self.streaming,
            "pushNotifications": self.push_notifications,
            "multiTurn": self.multi_turn,
            "contextRetention": self.context_retention
        }


@dataclass
class AgentCard:
    """The agent's business card - tells others what this agent can do"""
    name: str
    description: str
    version: str
    url: str
    skills: List[AgentSkill]
    capabilities: AgentCapabilities
    default_input_modes: List[str] = None
    default_output_modes: List[str] = None
    preferred_transport: str = "JSONRPC"
    protocol_version: str = "0.3.0"

    def to_dict(self) -> Dict[str, Any]:
        """Convert to A2A-compliant agent card JSON"""
        return {
            "name": self.name,
            "description": self.description,
            "version": self.version,
            "url": self.url,
            "protocolVersion": self.protocol_version,
            "preferredTransport": self.preferred_transport,
            "defaultInputModes": self.default_input_modes or ["text/plain"],
            "defaultOutputModes": self.default_output_modes or ["text/plain"],
            "capabilities": self.capabilities.to_dict(),
            "skills": [skill.to_dict() for skill in self.skills]
        }


# Define FuzzForge's skills
orchestration_skill = AgentSkill(
    id="orchestration",
    name="Agent Orchestration",
    description="Route requests to appropriate registered agents based on their capabilities",
    tags=["orchestration", "routing", "coordination"],
    examples=[
        "Route this to the calculator",
        "Send this to the appropriate agent",
        "Which agent should handle this?"
    ]
)

memory_skill = AgentSkill(
    id="memory",
    name="Memory Management",
    description="Store and retrieve information using Cognee knowledge graph",
    tags=["memory", "knowledge", "storage", "cognee"],
    examples=[
        "Remember that my favorite color is blue",
        "What do you remember about me?",
        "Search your memory for project details"
    ]
)

conversation_skill = AgentSkill(
    id="conversation",
    name="General Conversation",
    description="Engage in general conversation and answer questions using LLM",
    tags=["chat", "conversation", "qa", "llm"],
    examples=[
        "What is the meaning of life?",
        "Explain quantum computing",
        "Help me understand this concept"
    ]
)

workflow_automation_skill = AgentSkill(
    id="workflow_automation",
    name="Workflow Automation",
    description="Operate project workflows via MCP, monitor runs, and share results",
    tags=["workflow", "automation", "mcp", "orchestration"],
    examples=[
        "Submit the security assessment workflow",
        "Kick off the infrastructure scan and monitor it",
        "Summarise findings for run abc123"
    ]
)

agent_management_skill = AgentSkill(
    id="agent_management",
    name="Agent Registry Management",
    description="Register, list, and manage connections to other A2A agents",
    tags=["registry", "management", "discovery"],
    examples=[
        "Register agent at http://localhost:10201",
        "List all registered agents",
        "Show agent capabilities"
    ]
)

# Define FuzzForge's capabilities
fuzzforge_capabilities = AgentCapabilities(
    streaming=False,
    push_notifications=True,
    multi_turn=True,  # We support multi-turn conversations
    context_retention=True  # We maintain context across turns
)

# Create the public agent card
def get_fuzzforge_agent_card(url: str = "http://localhost:10100") -> AgentCard:
    """Get FuzzForge's agent card with current configuration"""
    return AgentCard(
        name="ProjectOrchestrator",
        description=(
            "An A2A-capable project agent that can launch and monitor FuzzForge workflows, "
            "consult the project knowledge graph, and coordinate with speciality agents."
        ),
        version="project-agent",
        url=url,
        skills=[
            orchestration_skill,
            memory_skill,
            conversation_skill,
            workflow_automation_skill,
            agent_management_skill
        ],
        capabilities=fuzzforge_capabilities,
        default_input_modes=["text/plain", "application/json"],
        default_output_modes=["text/plain", "application/json"],
        preferred_transport="JSONRPC",
        protocol_version="0.3.0"
    )
2319
ai/src/fuzzforge_ai/agent_executor.py
Normal file
File diff suppressed because it is too large
977
ai/src/fuzzforge_ai/cli.py
Executable file
@@ -0,0 +1,977 @@
|
||||
#!/usr/bin/env python3
|
||||
# Copyright (c) 2025 FuzzingLabs
|
||||
#
|
||||
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
|
||||
# at the root of this repository for details.
|
||||
#
|
||||
# After the Change Date (four years from publication), this version of the
|
||||
# Licensed Work will be made available under the Apache License, Version 2.0.
|
||||
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Additional attribution and requirements are provided in the NOTICE file.
|
||||
|
||||
"""
|
||||
FuzzForge CLI - Clean modular version
|
||||
Uses the separated agent components
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import shlex
|
||||
import os
|
||||
import sys
|
||||
import signal
|
||||
import warnings
|
||||
import logging
|
||||
import random
|
||||
from datetime import datetime
|
||||
from contextlib import contextmanager
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from dotenv import load_dotenv
|
||||
|
||||
# Ensure Cognee writes logs inside the project workspace
|
||||
project_root = Path.cwd()
|
||||
default_log_dir = project_root / ".fuzzforge" / "logs"
|
||||
default_log_dir.mkdir(parents=True, exist_ok=True)
|
||||
log_path = default_log_dir / "cognee.log"
|
||||
os.environ.setdefault("COGNEE_LOG_PATH", str(log_path))
|
||||
|
||||
# Suppress warnings
|
||||
warnings.filterwarnings("ignore")
|
||||
logging.basicConfig(level=logging.ERROR)
|
||||
|
||||
# Load .env file with explicit path handling
|
||||
# 1. First check current working directory for .fuzzforge/.env
|
||||
fuzzforge_env = Path.cwd() / ".fuzzforge" / ".env"
|
||||
if fuzzforge_env.exists():
|
||||
load_dotenv(fuzzforge_env, override=True)
|
||||
else:
|
||||
# 2. Then check parent directories for .fuzzforge projects
|
||||
current_path = Path.cwd()
|
||||
for parent in [current_path] + list(current_path.parents):
|
||||
fuzzforge_dir = parent / ".fuzzforge"
|
||||
if fuzzforge_dir.exists():
|
||||
project_env = fuzzforge_dir / ".env"
|
||||
if project_env.exists():
|
||||
load_dotenv(project_env, override=True)
|
||||
break
|
||||
else:
|
||||
# 3. Fallback to generic load_dotenv
|
||||
load_dotenv(override=True)
|
||||
|
||||
# Enhanced readline configuration for Rich Console input compatibility
try:
    import readline
    # Enable Rich-compatible input features
    readline.parse_and_bind("tab: complete")
    readline.parse_and_bind("set editing-mode emacs")
    readline.parse_and_bind("set show-all-if-ambiguous on")
    readline.parse_and_bind("set completion-ignore-case on")
    readline.parse_and_bind("set colored-completion-prefix on")
    readline.parse_and_bind("set enable-bracketed-paste on")  # Better paste support
    # Navigation bindings for better editing
    readline.parse_and_bind("Control-a: beginning-of-line")
    readline.parse_and_bind("Control-e: end-of-line")
    readline.parse_and_bind("Control-u: unix-line-discard")
    readline.parse_and_bind("Control-k: kill-line")
    readline.parse_and_bind("Control-w: unix-word-rubout")
    readline.parse_and_bind("Meta-Backspace: backward-kill-word")
    # History and completion
    readline.set_history_length(2000)
    readline.set_startup_hook(None)
    # Enable multiline editing hints
    readline.parse_and_bind("set horizontal-scroll-mode off")
    readline.parse_and_bind("set mark-symlinked-directories on")
    READLINE_AVAILABLE = True
except ImportError:
    READLINE_AVAILABLE = False

from rich.console import Console
from rich.table import Table
from rich.panel import Panel
from rich.prompt import Prompt
from rich import box

from google.adk.events.event import Event
from google.adk.events.event_actions import EventActions
from google.genai import types as gen_types

from .agent import FuzzForgeAgent
from .agent_card import get_fuzzforge_agent_card
from .config_manager import ConfigManager
from .config_bridge import ProjectConfigManager
from .remote_agent import RemoteAgentConnection

console = Console()

# Global shutdown flag
shutdown_requested = False

# Dynamic status messages for better UX
THINKING_MESSAGES = [
    "Thinking", "Processing", "Computing", "Analyzing", "Working",
    "Pondering", "Deliberating", "Calculating", "Reasoning", "Evaluating"
]

WORKING_MESSAGES = [
    "Working", "Processing", "Handling", "Executing", "Running",
    "Operating", "Performing", "Conducting", "Managing", "Coordinating"
]

SEARCH_MESSAGES = [
    "Searching", "Scanning", "Exploring", "Investigating", "Hunting",
    "Seeking", "Probing", "Examining", "Inspecting", "Browsing"
]

# Cool prompt symbols
PROMPT_STYLES = [
    "▶", "❯", "➤", "→", "»", "⟩", "▷", "⇨", "⟶", "◆"
]

def get_dynamic_status(action_type="thinking"):
    """Get a random status message based on action type"""
    if action_type == "thinking":
        return f"{random.choice(THINKING_MESSAGES)}..."
    elif action_type == "working":
        return f"{random.choice(WORKING_MESSAGES)}..."
    elif action_type == "searching":
        return f"{random.choice(SEARCH_MESSAGES)}..."
    else:
        return f"{random.choice(THINKING_MESSAGES)}..."

def get_prompt_symbol():
    """Get prompt symbol indicating where to write"""
    return ">>"

def signal_handler(signum, frame):
    """Handle Ctrl+C gracefully"""
    global shutdown_requested
    shutdown_requested = True
    console.print("\n\n[yellow]Shutting down gracefully...[/yellow]")
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)

@contextmanager
def safe_status(message: str):
    """Safe status context manager"""
    status = console.status(message, spinner="dots")
    try:
        status.start()
        yield
    finally:
        status.stop()


class FuzzForgeCLI:
    """Command-line interface for FuzzForge"""

    def __init__(self):
        """Initialize the CLI"""
        # Ensure .env is loaded from .fuzzforge directory
        fuzzforge_env = Path.cwd() / ".fuzzforge" / ".env"
        if fuzzforge_env.exists():
            load_dotenv(fuzzforge_env, override=True)

        # Load configuration for agent registry
        self.config_manager = ConfigManager()

        # Check environment configuration
        if not os.getenv('LITELLM_MODEL'):
            console.print("[red]ERROR: LITELLM_MODEL not set in .env file[/red]")
            console.print("Please set LITELLM_MODEL to your desired model")
            sys.exit(1)

        # Create the agent (uses env vars directly)
        self.agent = FuzzForgeAgent()

        # Create a consistent context ID for this CLI session
        self.context_id = f"cli_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

        # Track registered agents for config persistence
        self.agents_modified = False

        # Command handlers
        self.commands = {
            "/help": self.cmd_help,
            "/register": self.cmd_register,
            "/unregister": self.cmd_unregister,
            "/list": self.cmd_list,
            "/memory": self.cmd_memory,
            "/recall": self.cmd_recall,
            "/artifacts": self.cmd_artifacts,
            "/tasks": self.cmd_tasks,
            "/skills": self.cmd_skills,
            "/sessions": self.cmd_sessions,
            "/clear": self.cmd_clear,
            "/sendfile": self.cmd_sendfile,
            "/quit": self.cmd_quit,
            "/exit": self.cmd_quit,
        }

        self.background_tasks: set[asyncio.Task] = set()

    def print_banner(self):
        """Print welcome banner"""
        card = self.agent.agent_card

        # Print ASCII banner
        console.print("[medium_purple3] ███████╗██╗ ██╗███████╗███████╗███████╗ ██████╗ ██████╗ ██████╗ ███████╗ █████╗ ██╗[/medium_purple3]")
        console.print("[medium_purple3] ██╔════╝██║ ██║╚══███╔╝╚══███╔╝██╔════╝██╔═══██╗██╔══██╗██╔════╝ ██╔════╝ ██╔══██╗██║[/medium_purple3]")
        console.print("[medium_purple3] █████╗ ██║ ██║ ███╔╝ ███╔╝ █████╗ ██║ ██║██████╔╝██║ ███╗█████╗ ███████║██║[/medium_purple3]")
        console.print("[medium_purple3] ██╔══╝ ██║ ██║ ███╔╝ ███╔╝ ██╔══╝ ██║ ██║██╔══██╗██║ ██║██╔══╝ ██╔══██║██║[/medium_purple3]")
        console.print("[medium_purple3] ██║ ╚██████╔╝███████╗███████╗██║ ╚██████╔╝██║ ██║╚██████╔╝███████╗ ██║ ██║██║[/medium_purple3]")
        console.print("[medium_purple3] ╚═╝ ╚═════╝ ╚══════╝╚══════╝╚═╝ ╚═════╝ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝ ╚═╝ ╚═╝╚═╝[/medium_purple3]")
        console.print(f"\n[dim]{card.description}[/dim]\n")

        provider = (
            os.getenv("LLM_PROVIDER")
            or os.getenv("LLM_COGNEE_PROVIDER")
            or os.getenv("COGNEE_LLM_PROVIDER")
            or "unknown"
        )

        console.print(f"LLM Provider: [medium_purple1]{provider}[/medium_purple1]")
        console.print(f"LLM Model: [medium_purple1]{self.agent.model}[/medium_purple1]")
        if self.agent.executor.agentops_trace:
            console.print("Tracking: [medium_purple1]AgentOps active[/medium_purple1]")

        # Show skills
        console.print("\nSkills:")
        for skill in card.skills:
            console.print(
                f" • [deep_sky_blue1]{skill.name}[/deep_sky_blue1] – {skill.description}"
            )
        console.print("\nType /help for commands or just chat\n")

    async def cmd_help(self, args: str = "") -> None:
        """Show help"""
        help_text = """
[bold]Commands:[/bold]
  /register <url>      - Register an A2A agent (saves to config)
  /unregister <name>   - Remove agent from registry and config
  /list                - List registered agents

[bold]Memory Systems:[/bold]
  /recall <query>      - Search past conversations (ADK Memory)
  /memory              - Show knowledge graph (Cognee)
  /memory save         - Save to knowledge graph
  /memory search       - Search knowledge graph

[bold]Other:[/bold]
  /artifacts           - List created artifacts
  /artifacts <id>      - Show artifact content
  /tasks [id]          - Show task list or details
  /skills              - Show FuzzForge skills
  /sessions            - List active sessions
  /sendfile <agent> <path> [message] - Attach file as artifact and route to agent
  /clear               - Clear screen
  /help                - Show this help
  /quit                - Exit

[bold]Sample prompts:[/bold]
  run fuzzforge workflow security_assessment on /absolute/path --volume-mode ro
  list fuzzforge runs limit=5
  get fuzzforge summary <run_id>
  query project knowledge about "unsafe Rust" using GRAPH_COMPLETION
  export project file src/lib.rs as artifact
  /memory search "recent findings"

[bold]Input Editing:[/bold]
  Arrow keys           - Move cursor
  Ctrl+A/E             - Start/end of line
  Up/Down              - Command history
"""
        console.print(help_text)

    async def cmd_register(self, args: str) -> None:
        """Register an agent"""
        if not args:
            console.print("Usage: /register <url>")
            return

        with safe_status(f"{get_dynamic_status('working')} Registering {args}"):
            result = await self.agent.register_agent(args.strip())

        if result["success"]:
            console.print(f"✅ Registered: [bold]{result['name']}[/bold]")
            console.print(f"   Capabilities: {result['capabilities']} skills")

            # Get description from the agent's card
            agents = self.agent.list_agents()
            description = ""
            for agent in agents:
                if agent['name'] == result['name']:
                    description = agent.get('description', '')
                    break

            # Add to config for persistence
            self.config_manager.add_registered_agent(
                name=result['name'],
                url=args.strip(),
                description=description
            )
            console.print("   [dim]Saved to config for auto-registration[/dim]")
        else:
            console.print(f"[red]Failed: {result['error']}[/red]")

    async def cmd_unregister(self, args: str) -> None:
        """Unregister an agent and remove from config"""
        if not args:
            console.print("Usage: /unregister <name or url>")
            return

        # Try to find the agent
        agents = self.agent.list_agents()
        agent_to_remove = None

        for agent in agents:
            if agent['name'].lower() == args.lower() or agent['url'] == args:
                agent_to_remove = agent
                break

        if not agent_to_remove:
            console.print(f"[yellow]Agent '{args}' not found[/yellow]")
            return

        # Remove from config
        if self.config_manager.remove_registered_agent(name=agent_to_remove['name'], url=agent_to_remove['url']):
            console.print(f"✅ Unregistered: [bold]{agent_to_remove['name']}[/bold]")
            console.print("   [dim]Removed from config (won't auto-register next time)[/dim]")
        else:
            console.print("[yellow]Agent unregistered from session but not found in config[/yellow]")

    async def cmd_list(self, args: str = "") -> None:
        """List registered agents"""
        agents = self.agent.list_agents()

        if not agents:
            console.print("No agents registered. Use /register <url>")
            return

        table = Table(title="Registered Agents", box=box.ROUNDED)
        table.add_column("Name", style="medium_purple3")
        table.add_column("URL", style="deep_sky_blue3")
        table.add_column("Skills", style="plum3")
        table.add_column("Description", style="dim")

        for agent in agents:
            desc = agent['description']
            if len(desc) > 40:
                desc = desc[:37] + "..."
            table.add_row(
                agent['name'],
                agent['url'],
                str(agent['skills']),
                desc
            )

        console.print(table)

    async def cmd_recall(self, args: str = "") -> None:
        """Search conversational memory (past conversations)"""
        if not args:
            console.print("Usage: /recall <query>")
            return

        await self._sync_conversational_memory()

        # First try MemoryService (for ingested memories)
        with safe_status(get_dynamic_status('searching')):
            results = await self.agent.memory_manager.search_conversational_memory(args)

        if results and results.memories:
            console.print(f"[bold]Found {len(results.memories)} memories:[/bold]\n")
            for i, memory in enumerate(results.memories, 1):
                # MemoryEntry has 'text' field, not 'content'
                text = getattr(memory, 'text', str(memory))
                if len(text) > 200:
                    text = text[:200] + "..."
                console.print(f"{i}. {text}")
        else:
            # If MemoryService is empty, search SQLite directly
            console.print("[yellow]No memories in MemoryService, searching SQLite sessions...[/yellow]")

            # Check if using DatabaseSessionService
            if hasattr(self.agent.executor, 'session_service'):
                service_type = type(self.agent.executor.session_service).__name__
                if service_type == 'DatabaseSessionService':
                    # Search SQLite database directly
                    import sqlite3
                    db_path = os.getenv('SESSION_DB_PATH', './fuzzforge_sessions.db')

                    if os.path.exists(db_path):
                        conn = sqlite3.connect(db_path)
                        cursor = conn.cursor()

                        # Search in events table
                        query = f"%{args}%"
                        cursor.execute(
                            "SELECT content FROM events WHERE content LIKE ? LIMIT 10",
                            (query,)
                        )

                        rows = cursor.fetchall()
                        conn.close()

                        if rows:
                            console.print(f"[green]Found {len(rows)} matches in SQLite sessions:[/green]\n")
                            import json
                            for i, (content,) in enumerate(rows, 1):
                                # Parse JSON content
                                try:
                                    data = json.loads(content)
                                    if 'parts' in data and data['parts']:
                                        text = data['parts'][0].get('text', '')[:150]
                                        role = data.get('role', 'unknown')
                                        console.print(f"{i}. [{role}]: {text}...")
                                except Exception:
                                    console.print(f"{i}. {content[:150]}...")
                        else:
                            console.print("[yellow]No matches found in SQLite either[/yellow]")
                    else:
                        console.print("[yellow]SQLite database not found[/yellow]")
                else:
                    console.print(f"[dim]Using {service_type} (not searchable)[/dim]")
            else:
                console.print("[yellow]No session history available[/yellow]")

    async def cmd_memory(self, args: str = "") -> None:
        """Inspect conversational memory and knowledge graph state."""
        raw_args = (args or "").strip()
        lower_args = raw_args.lower()

        if not raw_args or lower_args in {"status", "info"}:
            await self._show_memory_status()
            return

        if lower_args == "datasets":
            await self._show_dataset_summary()
            return

        if lower_args.startswith("search ") or lower_args.startswith("recall "):
            query = raw_args.split(" ", 1)[1].strip() if " " in raw_args else ""
            if not query:
                console.print("Usage: /memory search <query>")
                return
            await self.cmd_recall(query)
            return

        console.print("Usage: /memory [status|datasets|search <query>]")
        console.print("[dim]/memory search <query> is an alias for /recall <query>[/dim]")

    async def _sync_conversational_memory(self) -> None:
        """Ensure the ADK memory service ingests any completed sessions."""
        memory_service = getattr(self.agent.memory_manager, "memory_service", None)
        executor_sessions = getattr(self.agent.executor, "sessions", {})
        metadata_map = getattr(self.agent.executor, "session_metadata", {})

        if not memory_service or not executor_sessions:
            return

        for context_id, session in list(executor_sessions.items()):
            meta = metadata_map.get(context_id, {})
            if meta.get('memory_synced'):
                continue

            add_session = getattr(memory_service, "add_session_to_memory", None)
            if not callable(add_session):
                return

            try:
                await add_session(session)
                meta['memory_synced'] = True
                metadata_map[context_id] = meta
            except Exception as exc:  # pragma: no cover - defensive logging
                if os.getenv('FUZZFORGE_DEBUG', '0') == '1':
                    console.print(f"[yellow]Memory sync failed:[/yellow] {exc}")

    async def _show_memory_status(self) -> None:
        """Render conversational memory, session store, and knowledge graph status."""
        await self._sync_conversational_memory()

        status = self.agent.memory_manager.get_status()

        conversational = status.get("conversational_memory", {})
        conv_type = conversational.get("type", "unknown")
        conv_active = "yes" if conversational.get("active") else "no"
        conv_details = conversational.get("details", "")

        session_service = getattr(self.agent.executor, "session_service", None)
        session_service_name = type(session_service).__name__ if session_service else "Unavailable"

        session_lines = [
            f"[bold]Service:[/bold] {session_service_name}"
        ]

        session_count = None
        event_count = None
        db_path_display = None

        if session_service_name == "DatabaseSessionService":
            import sqlite3

            db_path = os.getenv('SESSION_DB_PATH', './fuzzforge_sessions.db')
            session_path = Path(db_path).expanduser().resolve()
            db_path_display = str(session_path)

            if session_path.exists():
                try:
                    with sqlite3.connect(session_path) as conn:
                        cursor = conn.cursor()
                        cursor.execute("SELECT COUNT(*) FROM sessions")
                        session_count = cursor.fetchone()[0]
                        cursor.execute("SELECT COUNT(*) FROM events")
                        event_count = cursor.fetchone()[0]
                except Exception as exc:
                    session_lines.append(f"[yellow]Warning:[/yellow] Unable to read session database ({exc})")
            else:
                session_lines.append("[yellow]SQLite session database not found yet[/yellow]")

        elif session_service_name == "InMemorySessionService":
            session_lines.append("[dim]Session data persists for the current process only[/dim]")

        if db_path_display:
            session_lines.append(f"[bold]Database:[/bold] {db_path_display}")
        if session_count is not None:
            session_lines.append(f"[bold]Sessions Recorded:[/bold] {session_count}")
        if event_count is not None:
            session_lines.append(f"[bold]Events Logged:[/bold] {event_count}")

        conv_lines = [
            f"[bold]Type:[/bold] {conv_type}",
            f"[bold]Active:[/bold] {conv_active}"
        ]
        if conv_details:
            conv_lines.append(f"[bold]Details:[/bold] {conv_details}")

        console.print(Panel("\n".join(conv_lines), title="Conversation Memory", border_style="medium_purple3"))
        console.print(Panel("\n".join(session_lines), title="Session Store", border_style="deep_sky_blue3"))

        # Knowledge graph section
        knowledge = status.get("knowledge_graph", {})
        kg_active = knowledge.get("active", False)
        kg_lines = [
            f"[bold]Active:[/bold] {'yes' if kg_active else 'no'}",
            f"[bold]Purpose:[/bold] {knowledge.get('purpose', 'N/A')}"
        ]

        cognee_data = None
        cognee_error = None
        try:
            project_config = ProjectConfigManager()
            cognee_data = project_config.get_cognee_config()
        except Exception as exc:  # pragma: no cover - defensive
            cognee_error = str(exc)

        if cognee_data:
            data_dir = cognee_data.get('data_directory')
            system_dir = cognee_data.get('system_directory')
            if data_dir:
                kg_lines.append(f"[bold]Data dir:[/bold] {data_dir}")
            if system_dir:
                kg_lines.append(f"[bold]System dir:[/bold] {system_dir}")
        elif cognee_error:
            kg_lines.append(f"[yellow]Config unavailable:[/yellow] {cognee_error}")

        dataset_summary = None
        if kg_active:
            try:
                integration = await self.agent.executor._get_knowledge_integration()
                if integration:
                    dataset_summary = await integration.list_datasets()
            except Exception as exc:  # pragma: no cover - defensive
                kg_lines.append(f"[yellow]Dataset listing failed:[/yellow] {exc}")

        if dataset_summary:
            if dataset_summary.get("error"):
                kg_lines.append(f"[yellow]Dataset listing failed:[/yellow] {dataset_summary['error']}")
            else:
                datasets = dataset_summary.get("datasets", [])
                total = dataset_summary.get("total_datasets")
                if total is not None:
                    kg_lines.append(f"[bold]Datasets:[/bold] {total}")
                if datasets:
                    preview = ", ".join(sorted(datasets)[:5])
                    if len(datasets) > 5:
                        preview += ", …"
                    kg_lines.append(f"[bold]Samples:[/bold] {preview}")
        else:
            kg_lines.append("[dim]Run `fuzzforge ingest` to populate the knowledge graph[/dim]")

        console.print(Panel("\n".join(kg_lines), title="Knowledge Graph", border_style="spring_green4"))
        console.print("\n[dim]Subcommands: /memory datasets | /memory search <query>[/dim]")

    async def _show_dataset_summary(self) -> None:
        """List datasets available in the Cognee knowledge graph."""
        try:
            integration = await self.agent.executor._get_knowledge_integration()
        except Exception as exc:
            console.print(f"[yellow]Knowledge graph unavailable:[/yellow] {exc}")
            return

        if not integration:
            console.print("[yellow]Knowledge graph is not initialised yet.[/yellow]")
            console.print("[dim]Run `fuzzforge ingest --path . --recursive` to create the project dataset.[/dim]")
            return

        with safe_status(get_dynamic_status('searching')):
            dataset_info = await integration.list_datasets()

        if dataset_info.get("error"):
            console.print(f"[red]{dataset_info['error']}[/red]")
            return

        datasets = dataset_info.get("datasets", [])
        if not datasets:
            console.print("[yellow]No datasets found.[/yellow]")
            console.print("[dim]Run `fuzzforge ingest` to populate the knowledge graph.[/dim]")
            return

        table = Table(title="Cognee Datasets", box=box.ROUNDED)
        table.add_column("Dataset", style="medium_purple3")
        table.add_column("Notes", style="dim")

        for name in sorted(datasets):
            note = ""
            if name.endswith("_codebase"):
                note = "primary project dataset"
            table.add_row(name, note)

        console.print(table)
        console.print(
            "[dim]Use knowledge graph prompts (e.g. `search project knowledge for \"topic\" using INSIGHTS`) to query these datasets.[/dim]"
        )

    async def cmd_artifacts(self, args: str = "") -> None:
        """List or show artifacts"""
        if args:
            # Show specific artifact
            artifacts = await self.agent.executor.get_artifacts(self.context_id)
            for artifact in artifacts:
                if artifact['id'] == args or args in artifact['id']:
                    console.print(Panel(
                        f"[bold]{artifact['title']}[/bold]\n"
                        f"Type: {artifact['type']} | Created: {artifact['created_at'][:19]}\n\n"
                        f"[code]{artifact['content']}[/code]",
                        title=f"Artifact: {artifact['id']}",
                        border_style="medium_purple3"
                    ))
                    return
            console.print(f"[yellow]Artifact {args} not found[/yellow]")
            return

        # List all artifacts
        artifacts = await self.agent.executor.get_artifacts(self.context_id)

        if not artifacts:
            console.print("No artifacts created yet")
            console.print("[dim]Artifacts are created when generating code, configs, or documents[/dim]")
            return

        table = Table(title="Artifacts", box=box.ROUNDED)
        table.add_column("ID", style="medium_purple3")
        table.add_column("Type", style="deep_sky_blue3")
        table.add_column("Title", style="plum3")
        table.add_column("Size", style="dim")
        table.add_column("Created", style="dim")

        for artifact in artifacts:
            size = f"{len(artifact['content'])} chars"
            created = artifact['created_at'][:19]  # Just date and time

            table.add_row(
                artifact['id'],
                artifact['type'],
                artifact['title'][:40] + "..." if len(artifact['title']) > 40 else artifact['title'],
                size,
                created
            )

        console.print(table)
        console.print("\n[dim]Use /artifacts <id> to view artifact content[/dim]")

    async def cmd_tasks(self, args: str = "") -> None:
        """List tasks or show details for a specific task."""
        store = getattr(self.agent.executor, "task_store", None)
        if not store or not hasattr(store, "tasks"):
            console.print("Task store not available")
            return

        task_id = args.strip()

        async with store.lock:
            tasks = dict(store.tasks)

        if not tasks:
            console.print("No tasks recorded yet")
            return

        if task_id:
            task = tasks.get(task_id)
            if not task:
                console.print(f"Task '{task_id}' not found")
                return

            state_str = task.status.state.value if hasattr(task.status.state, "value") else str(task.status.state)
            console.print(f"\n[bold]Task {task.id}[/bold]")
            console.print(f"Context: {task.context_id}")
            console.print(f"State: {state_str}")
            console.print(f"Timestamp: {task.status.timestamp}")
            if task.metadata:
                console.print("Metadata:")
                for key, value in task.metadata.items():
                    console.print(f"  • {key}: {value}")
            if task.history:
                console.print("History:")
                for entry in task.history[-5:]:
                    text = getattr(entry, "text", None)
                    if not text and hasattr(entry, "parts"):
                        text = " ".join(
                            getattr(part, "text", "") for part in getattr(entry, "parts", [])
                        )
                    console.print(f"  - {text}")
            return

        table = Table(title="FuzzForge Tasks", box=box.ROUNDED)
        table.add_column("ID", style="medium_purple3")
        table.add_column("State", style="white")
        table.add_column("Workflow", style="deep_sky_blue3")
        table.add_column("Updated", style="green")

        for task in tasks.values():
            state_value = task.status.state.value if hasattr(task.status.state, "value") else str(task.status.state)
            workflow = ""
            if task.metadata:
                workflow = task.metadata.get("workflow") or task.metadata.get("workflow_name") or ""
            timestamp = task.status.timestamp if task.status else ""
            table.add_row(task.id, state_value, workflow, timestamp)

        console.print(table)
        console.print("\n[dim]Use /tasks <id> to view task details[/dim]")

    async def cmd_sessions(self, args: str = "") -> None:
        """List active sessions"""
        sessions = self.agent.executor.sessions

        if not sessions:
            console.print("No active sessions")
            return

        table = Table(title="Active Sessions", box=box.ROUNDED)
        table.add_column("Context ID", style="medium_purple3")
        table.add_column("Session ID", style="deep_sky_blue3")
        table.add_column("User ID", style="plum3")
        table.add_column("State", style="dim")

        for context_id, session in sessions.items():
            # Get session info
            session_id = getattr(session, 'id', 'N/A')
            user_id = getattr(session, 'user_id', 'N/A')
            state = getattr(session, 'state', {})

            # Format state info
            agents_count = len(state.get('registered_agents', []))
            state_info = f"{agents_count} agents registered"

            table.add_row(
                context_id[:20] + "..." if len(context_id) > 20 else context_id,
                session_id[:20] + "..." if len(str(session_id)) > 20 else str(session_id),
                user_id,
                state_info
            )

        console.print(table)
        console.print(f"\n[dim]Current session: {self.context_id}[/dim]")

    async def cmd_skills(self, args: str = "") -> None:
        """Show FuzzForge skills"""
        card = self.agent.agent_card

        table = Table(title=f"{card.name} Skills", box=box.ROUNDED)
        table.add_column("Skill", style="medium_purple3")
        table.add_column("Description", style="white")
        table.add_column("Tags", style="deep_sky_blue3")

        for skill in card.skills:
            table.add_row(
                skill.name,
                skill.description,
                ", ".join(skill.tags[:3])
            )

        console.print(table)

    async def cmd_clear(self, args: str = "") -> None:
        """Clear screen"""
        console.clear()
        self.print_banner()

    async def cmd_sendfile(self, args: str) -> None:
        """Encode a local file as an artifact and route it to a registered agent."""
        tokens = shlex.split(args)
        if len(tokens) < 2:
            console.print("Usage: /sendfile <agent_name> <path> [message]")
            return

        agent_name = tokens[0]
        file_arg = tokens[1]
        note = " ".join(tokens[2:]).strip()

        file_path = Path(file_arg).expanduser()
        if not file_path.exists():
            console.print(f"[red]File not found:[/red] {file_path}")
            return

        session = self.agent.executor.sessions.get(self.context_id)
        if not session:
            console.print("[red]No active session available. Try sending a prompt first.[/red]")
            return

        console.print(f"[dim]Delegating {file_path.name} to {agent_name}...[/dim]")

        async def _delegate() -> None:
            try:
                response = await self.agent.executor.delegate_file_to_agent(
                    agent_name,
                    str(file_path),
                    note,
                    session=session,
                    context_id=self.context_id,
                )
                console.print(f"[{agent_name}]: {response}")
            except Exception as exc:
                console.print(f"[red]Failed to delegate file:[/red] {exc}")
            finally:
                self.background_tasks.discard(asyncio.current_task())

        task = asyncio.create_task(_delegate())
        self.background_tasks.add(task)
        console.print("[dim]Delegation in progress… you can continue working.[/dim]")

    async def cmd_quit(self, args: str = "") -> None:
        """Exit the CLI"""
        console.print("\n[green]Shutting down...[/green]")
        await self.agent.cleanup()
        if self.background_tasks:
            for task in list(self.background_tasks):
                task.cancel()
            await asyncio.gather(*self.background_tasks, return_exceptions=True)
        console.print("Goodbye!\n")
        sys.exit(0)

    async def process_command(self, text: str) -> bool:
        """Process slash commands"""
        if not text.startswith('/'):
            return False

        parts = text.split(maxsplit=1)
        cmd = parts[0].lower()
        args = parts[1] if len(parts) > 1 else ""

        if cmd in self.commands:
            await self.commands[cmd](args)
            return True

        console.print(f"Unknown command: {cmd}")
        return True

    async def auto_register_agents(self):
        """Auto-register agents from config on startup"""
        agents_to_register = self.config_manager.get_registered_agents()

        if agents_to_register:
            console.print(f"\n[dim]Auto-registering {len(agents_to_register)} agents from config...[/dim]")

            for agent_config in agents_to_register:
                url = agent_config.get('url')
                name = agent_config.get('name', 'Unknown')

                if url:
                    try:
                        with safe_status(f"Registering {name}..."):
                            result = await self.agent.register_agent(url)

                        if result["success"]:
                            console.print(f"  ✅ {name}: [green]Connected[/green]")
                        else:
                            console.print(f"  ⚠️ {name}: [yellow]Failed - {result.get('error', 'Unknown error')}[/yellow]")
                    except Exception as e:
                        console.print(f"  ⚠️ {name}: [yellow]Failed - {e}[/yellow]")

            console.print("")  # Empty line for spacing

    async def run(self):
        """Main CLI loop"""
        self.print_banner()

        # Auto-register agents from config
        await self.auto_register_agents()

        while not shutdown_requested:
            try:
                # Use standard input with non-deletable colored prompt
                prompt_symbol = get_prompt_symbol()
                try:
                    # Print colored prompt then use input() for non-deletable behavior
                    console.print(f"[medium_purple3]{prompt_symbol}[/medium_purple3] ", end="")
                    user_input = input().strip()
                except (EOFError, KeyboardInterrupt):
                    raise

                if not user_input:
                    continue

                # Check for commands
                if await self.process_command(user_input):
                    continue

                # Process message
                with safe_status(get_dynamic_status('thinking')):
                    response = await self.agent.process_message(user_input, self.context_id)

                # Display response
                console.print(f"\n{response}\n")

            except KeyboardInterrupt:
                await self.cmd_quit()

            except EOFError:
                await self.cmd_quit()

            except Exception as e:
                console.print(f"[red]Error: {e}[/red]")
                if os.getenv('FUZZFORGE_DEBUG') == '1':
                    console.print_exception()
                console.print("")

        await self.agent.cleanup()


def main():
|
||||
"""Main entry point"""
|
||||
try:
|
||||
cli = FuzzForgeCLI()
|
||||
asyncio.run(cli.run())
|
||||
except KeyboardInterrupt:
|
||||
console.print("\n[yellow]Interrupted[/yellow]")
|
||||
sys.exit(0)
|
||||
except Exception as e:
|
||||
console.print(f"[red]Fatal error: {e}[/red]")
|
||||
if os.getenv('FUZZFORGE_DEBUG') == '1':
|
||||
console.print_exception()
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
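The command handling above dispatches on the first token of the user's input: registered commands live in a dict of async callables, and anything that is not a command falls through to the chat path. A minimal, self-contained sketch of that dispatch pattern (`MiniCLI`, `cmd_hello`, and the `/hello` command are illustrative names, not part of the real CLI):

```python
import asyncio

class MiniCLI:
    """Toy version of the dispatch in process_command/run: commands are
    an async-callable registry keyed by the first input token."""
    def __init__(self):
        self.log = []
        self.commands = {"/hello": self.cmd_hello}

    async def cmd_hello(self, args):
        self.log.append(f"hello {args}".strip())

    async def process_command(self, user_input):
        parts = user_input.split(maxsplit=1)
        cmd, args = parts[0], parts[1] if len(parts) > 1 else ""
        if cmd in self.commands:
            await self.commands[cmd](args)
            return True
        if cmd.startswith("/"):
            self.log.append(f"Unknown command: {cmd}")
            return True
        return False  # not a command; caller treats it as a chat message

cli = MiniCLI()
asyncio.run(cli.process_command("/hello world"))
asyncio.run(cli.process_command("/nope"))
print(cli.log)  # → ['hello world', 'Unknown command: /nope']
```

Returning `True` for unknown `/`-prefixed input matches the real loop, which prints "Unknown command" instead of sending the text to the agent.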
435
ai/src/fuzzforge_ai/cognee_integration.py
Normal file
@@ -0,0 +1,435 @@
"""
Cognee Integration Module for FuzzForge
Provides standardized access to project-specific knowledge graphs.
Can be reused by external agents and other components.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import os
import asyncio
import json
from typing import Dict, List, Any, Optional, Union
from pathlib import Path


class CogneeProjectIntegration:
    """
    Standardized Cognee integration that can be reused across agents.
    Automatically detects project context and provides knowledge graph access.
    """

    def __init__(self, project_dir: Optional[str] = None):
        """
        Initialize with a project directory (defaults to the current working directory).

        Args:
            project_dir: Path to the project directory (optional, defaults to cwd)
        """
        self.project_dir = Path(project_dir) if project_dir else Path.cwd()
        self.config_file = self.project_dir / ".fuzzforge" / "config.yaml"
        self.project_context = None
        self._cognee = None
        self._initialized = False

    async def initialize(self) -> bool:
        """
        Initialize Cognee with project context.

        Returns:
            bool: True if initialization succeeded
        """
        try:
            # Import Cognee
            import cognee
            self._cognee = cognee

            # Load project context
            if not self._load_project_context():
                return False

            # Configure Cognee for this project
            await self._setup_cognee_config()

            self._initialized = True
            return True

        except ImportError:
            print("Cognee not installed. Install with: pip install cognee")
            return False
        except Exception as e:
            print(f"Failed to initialize Cognee: {e}")
            return False

    def _load_project_context(self) -> bool:
        """Load project context from the FuzzForge config"""
        try:
            if not self.config_file.exists():
                print(f"No FuzzForge config found at {self.config_file}")
                return False

            import yaml
            with open(self.config_file, 'r') as f:
                config = yaml.safe_load(f)

            self.project_context = {
                "project_name": config.get("project", {}).get("name", "default"),
                "project_id": config.get("project", {}).get("id", "default"),
                "tenant_id": config.get("cognee", {}).get("tenant", "default")
            }
            return True

        except Exception as e:
            print(f"Error loading project context: {e}")
            return False

    async def _setup_cognee_config(self):
        """Configure Cognee for project-specific access"""
        # Set API key and model
        api_key = os.getenv('OPENAI_API_KEY')
        model = os.getenv('LITELLM_MODEL', 'gpt-4o-mini')

        if not api_key:
            raise ValueError("OPENAI_API_KEY required for Cognee operations")

        # Configure Cognee
        self._cognee.config.set_llm_api_key(api_key)
        self._cognee.config.set_llm_model(model)
        self._cognee.config.set_llm_provider("openai")

        # Set project-specific directories
        project_cognee_dir = self.project_dir / ".fuzzforge" / "cognee" / f"project_{self.project_context['project_id']}"

        self._cognee.config.data_root_directory(str(project_cognee_dir / "data"))
        self._cognee.config.system_root_directory(str(project_cognee_dir / "system"))

        # Ensure directories exist
        project_cognee_dir.mkdir(parents=True, exist_ok=True)
        (project_cognee_dir / "data").mkdir(exist_ok=True)
        (project_cognee_dir / "system").mkdir(exist_ok=True)

    async def search_knowledge_graph(self, query: str, search_type: str = "GRAPH_COMPLETION", dataset: str = None) -> Dict[str, Any]:
        """
        Search the project's knowledge graph.

        Args:
            query: Search query
            search_type: Type of search ("GRAPH_COMPLETION", "INSIGHTS", "CHUNKS", etc.)
            dataset: Specific dataset to search (optional)

        Returns:
            Dict containing search results
        """
        if not self._initialized:
            await self.initialize()

        if not self._initialized:
            return {"error": "Cognee not initialized"}

        try:
            from cognee.modules.search.types import SearchType

            # Resolve the search type dynamically; fall back to GRAPH_COMPLETION
            try:
                search_type_enum = getattr(SearchType, search_type.upper())
            except AttributeError:
                search_type_enum = SearchType.GRAPH_COMPLETION
                search_type = "GRAPH_COMPLETION"

            # Prepare search kwargs
            search_kwargs = {
                "query_type": search_type_enum,
                "query_text": query
            }

            # Add dataset filter if specified
            if dataset:
                search_kwargs["datasets"] = [dataset]

            results = await self._cognee.search(**search_kwargs)

            return {
                "query": query,
                "search_type": search_type,
                "dataset": dataset,
                "results": results,
                "project": self.project_context["project_name"]
            }
        except Exception as e:
            return {"error": f"Search failed: {e}"}

    async def list_knowledge_data(self) -> Dict[str, Any]:
        """
        List available data in the knowledge graph.

        Returns:
            Dict containing available data
        """
        if not self._initialized:
            await self.initialize()

        if not self._initialized:
            return {"error": "Cognee not initialized"}

        try:
            data = await self._cognee.list_data()
            return {
                "project": self.project_context["project_name"],
                "available_data": data
            }
        except Exception as e:
            return {"error": f"Failed to list data: {e}"}

    async def ingest_text_to_dataset(self, text: str, dataset: str = None) -> Dict[str, Any]:
        """
        Ingest text content into a specific dataset.

        Args:
            text: Text to ingest
            dataset: Dataset name (defaults to <project_name>_codebase)

        Returns:
            Dict containing ingest results
        """
        if not self._initialized:
            await self.initialize()

        if not self._initialized:
            return {"error": "Cognee not initialized"}

        if not dataset:
            dataset = f"{self.project_context['project_name']}_codebase"

        try:
            # Add text to the dataset
            await self._cognee.add([text], dataset_name=dataset)

            # Process (cognify) the dataset
            await self._cognee.cognify([dataset])

            return {
                "text_length": len(text),
                "dataset": dataset,
                "project": self.project_context["project_name"],
                "status": "success"
            }
        except Exception as e:
            return {"error": f"Ingest failed: {e}"}

    async def ingest_files_to_dataset(self, file_paths: list, dataset: str = None) -> Dict[str, Any]:
        """
        Ingest multiple files into a specific dataset.

        Args:
            file_paths: List of file paths to ingest
            dataset: Dataset name (defaults to <project_name>_codebase)

        Returns:
            Dict containing ingest results
        """
        if not self._initialized:
            await self.initialize()

        if not self._initialized:
            return {"error": "Cognee not initialized"}

        if not dataset:
            dataset = f"{self.project_context['project_name']}_codebase"

        try:
            # Validate and keep only readable text files
            valid_files = []
            for file_path in file_paths:
                try:
                    path = Path(file_path)
                    if path.exists() and path.is_file():
                        # Test whether the file is readable as UTF-8 text
                        with open(path, 'r', encoding='utf-8') as f:
                            f.read(1)
                        valid_files.append(str(path))
                except (UnicodeDecodeError, PermissionError, OSError):
                    continue

            if not valid_files:
                return {"error": "No valid files found to ingest"}

            # Add files to the dataset
            await self._cognee.add(valid_files, dataset_name=dataset)

            # Process (cognify) the dataset
            await self._cognee.cognify([dataset])

            return {
                "files_processed": len(valid_files),
                "total_files_requested": len(file_paths),
                "dataset": dataset,
                "project": self.project_context["project_name"],
                "status": "success"
            }
        except Exception as e:
            return {"error": f"Ingest failed: {e}"}

    async def list_datasets(self) -> Dict[str, Any]:
        """
        List all datasets available in the project.

        Returns:
            Dict containing available datasets
        """
        if not self._initialized:
            await self.initialize()

        if not self._initialized:
            return {"error": "Cognee not initialized"}

        try:
            # Get available datasets by listing ingested data
            data = await self._cognee.list_data()

            # Extract unique dataset names from the data
            datasets = set()
            if isinstance(data, list):
                for item in data:
                    if isinstance(item, dict) and 'dataset_name' in item:
                        datasets.add(item['dataset_name'])

            return {
                "project": self.project_context["project_name"],
                "datasets": list(datasets),
                "total_datasets": len(datasets)
            }
        except Exception as e:
            return {"error": f"Failed to list datasets: {e}"}

    async def create_dataset(self, dataset: str) -> Dict[str, Any]:
        """
        Create a new dataset (datasets are created automatically when data is added).

        Args:
            dataset: Dataset name to create

        Returns:
            Dict containing the creation result
        """
        if not self._initialized:
            await self.initialize()

        if not self._initialized:
            return {"error": "Cognee not initialized"}

        try:
            # In Cognee, datasets are created implicitly when data is added,
            # so add a seed document to create the dataset
            await self._cognee.add([f"Dataset {dataset} initialized for project {self.project_context['project_name']}"],
                                   dataset_name=dataset)

            return {
                "dataset": dataset,
                "project": self.project_context["project_name"],
                "status": "created"
            }
        except Exception as e:
            return {"error": f"Failed to create dataset: {e}"}

    def get_project_context(self) -> Optional[Dict[str, str]]:
        """Get the current project context"""
        return self.project_context

    def is_initialized(self) -> bool:
        """Check whether Cognee is initialized"""
        return self._initialized


# Convenience functions for easy integration
async def search_project_codebase(query: str, project_dir: Optional[str] = None, dataset: str = None, search_type: str = "GRAPH_COMPLETION") -> str:
    """
    Convenience function to search a project codebase.

    Args:
        query: Search query
        project_dir: Project directory (optional, defaults to cwd)
        dataset: Specific dataset to search (optional)
        search_type: Type of search ("GRAPH_COMPLETION", "INSIGHTS", "CHUNKS")

    Returns:
        Formatted search results as a string
    """
    cognee_integration = CogneeProjectIntegration(project_dir)
    result = await cognee_integration.search_knowledge_graph(query, search_type, dataset)

    if "error" in result:
        return f"Error searching codebase: {result['error']}"

    project_name = result.get("project", "Unknown")
    results = result.get("results", [])

    if not results:
        return f"No results found for '{query}' in project {project_name}"

    output = f"Search results for '{query}' in project {project_name}:\n\n"

    # Format results
    if isinstance(results, list):
        for i, item in enumerate(results, 1):
            if isinstance(item, dict):
                # Handle structured results
                output += f"{i}. "
                if "search_result" in item:
                    output += f"Dataset: {item.get('dataset_name', 'Unknown')}\n"
                    for result_item in item["search_result"]:
                        if isinstance(result_item, dict):
                            if "name" in result_item:
                                output += f"  - {result_item['name']}: {result_item.get('description', '')}\n"
                            elif "text" in result_item:
                                text = result_item["text"][:200] + "..." if len(result_item["text"]) > 200 else result_item["text"]
                                output += f"  - {text}\n"
                            else:
                                output += f"  - {str(result_item)[:200]}...\n"
                else:
                    output += f"{str(item)[:200]}...\n"
                output += "\n"
            else:
                output += f"{i}. {str(item)[:200]}...\n\n"
    else:
        output += f"{str(results)[:500]}..."

    return output


async def list_project_knowledge(project_dir: Optional[str] = None) -> str:
    """
    Convenience function to list project knowledge.

    Args:
        project_dir: Project directory (optional, defaults to cwd)

    Returns:
        Formatted list of available data
    """
    cognee_integration = CogneeProjectIntegration(project_dir)
    result = await cognee_integration.list_knowledge_data()

    if "error" in result:
        return f"Error listing knowledge: {result['error']}"

    project_name = result.get("project", "Unknown")
    data = result.get("available_data", [])

    output = f"Available knowledge in project {project_name}:\n\n"

    if not data:
        output += "No data available in knowledge graph"
    else:
        for i, item in enumerate(data, 1):
            output += f"{i}. {item}\n"

    return output
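The readable-file filter inside `ingest_files_to_dataset` above (read one character as UTF-8 to weed out binaries, skip unreadable paths) can be exercised in isolation. `probe_text_files` is a hypothetical standalone helper mirroring that logic, not part of the module:

```python
import tempfile
from pathlib import Path

def probe_text_files(file_paths):
    """Keep only paths that exist and decode as UTF-8 text,
    mirroring the validation in ingest_files_to_dataset."""
    valid = []
    for file_path in file_paths:
        try:
            path = Path(file_path)
            if path.exists() and path.is_file():
                # Reading one character forces a decode of the first chunk,
                # raising UnicodeDecodeError for binary content
                with open(path, 'r', encoding='utf-8') as f:
                    f.read(1)
                valid.append(str(path))
        except (UnicodeDecodeError, PermissionError, OSError):
            continue
    return valid

with tempfile.TemporaryDirectory() as tmp:
    text_file = Path(tmp) / "ok.py"
    text_file.write_text("print('hi')\n", encoding="utf-8")
    binary_file = Path(tmp) / "blob.bin"
    binary_file.write_bytes(b"\xff\xfe\x00\x01")
    missing = Path(tmp) / "nope.txt"
    valid = probe_text_files([text_file, binary_file, missing])
    print(valid)  # only the UTF-8 text file survives
```

Note this probe only checks the first decoded chunk; a file that turns binary later would still pass, which matches the original's trade-off of speed over completeness.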
416
ai/src/fuzzforge_ai/cognee_service.py
Normal file
@@ -0,0 +1,416 @@
"""
Cognee Service for FuzzForge
Provides integrated Cognee functionality for codebase analysis and knowledge graphs.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import os
import asyncio
import logging
from pathlib import Path
from typing import Dict, List, Any, Optional
from datetime import datetime

logger = logging.getLogger(__name__)


class CogneeService:
    """
    Service for managing the Cognee integration with FuzzForge.
    Handles multi-tenant isolation and project-specific knowledge graphs.
    """

    def __init__(self, config):
        """Initialize with a FuzzForge config"""
        self.config = config
        self.cognee_config = config.get_cognee_config()
        self.project_context = config.get_project_context()
        self._cognee = None
        self._user = None
        self._initialized = False

    async def initialize(self):
        """Initialize Cognee with project-specific configuration"""
        try:
            # Ensure environment variables for Cognee are set before import
            self.config.setup_cognee_environment()
            logger.debug(
                "Cognee environment configured",
                extra={
                    "data": self.cognee_config.get("data_directory"),
                    "system": self.cognee_config.get("system_directory"),
                },
            )

            import cognee
            self._cognee = cognee

            # Configure the LLM with an API key BEFORE any other cognee operations
            provider = os.getenv("LLM_PROVIDER", "openai")
            model = os.getenv("LLM_MODEL") or os.getenv("LITELLM_MODEL", "gpt-4o-mini")
            api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY")
            endpoint = os.getenv("LLM_ENDPOINT")
            api_version = os.getenv("LLM_API_VERSION")
            max_tokens = os.getenv("LLM_MAX_TOKENS")

            if provider.lower() in {"openai", "azure_openai", "custom"} and not api_key:
                raise ValueError(
                    "OpenAI-compatible API key is required for Cognee LLM operations. "
                    "Set OPENAI_API_KEY, LLM_API_KEY, or COGNEE_LLM_API_KEY in your .env"
                )

            # Expose environment variables for downstream libraries
            os.environ["LLM_PROVIDER"] = provider
            os.environ["LITELLM_MODEL"] = model
            os.environ["LLM_MODEL"] = model
            if api_key:
                os.environ["LLM_API_KEY"] = api_key
                # Maintain compatibility with components still expecting OPENAI_API_KEY
                if provider.lower() in {"openai", "azure_openai", "custom"}:
                    os.environ.setdefault("OPENAI_API_KEY", api_key)
            if endpoint:
                os.environ["LLM_ENDPOINT"] = endpoint
            if api_version:
                os.environ["LLM_API_VERSION"] = api_version
            if max_tokens:
                os.environ["LLM_MAX_TOKENS"] = str(max_tokens)

            # Configure Cognee's runtime using its configuration helpers when available
            if hasattr(cognee.config, "set_llm_provider"):
                cognee.config.set_llm_provider(provider)
            if hasattr(cognee.config, "set_llm_model"):
                cognee.config.set_llm_model(model)
            if api_key and hasattr(cognee.config, "set_llm_api_key"):
                cognee.config.set_llm_api_key(api_key)
            if endpoint and hasattr(cognee.config, "set_llm_endpoint"):
                cognee.config.set_llm_endpoint(endpoint)
            if api_version and hasattr(cognee.config, "set_llm_api_version"):
                cognee.config.set_llm_api_version(api_version)
            if max_tokens and hasattr(cognee.config, "set_llm_max_tokens"):
                cognee.config.set_llm_max_tokens(int(max_tokens))

            # Configure the graph database
            cognee.config.set_graph_db_config({
                "graph_database_provider": self.cognee_config.get("graph_database_provider", "kuzu"),
            })

            # Set data directories
            data_dir = self.cognee_config.get("data_directory")
            system_dir = self.cognee_config.get("system_directory")

            if data_dir:
                logger.debug("Setting cognee data root", extra={"path": data_dir})
                cognee.config.data_root_directory(data_dir)
            if system_dir:
                logger.debug("Setting cognee system root", extra={"path": system_dir})
                cognee.config.system_root_directory(system_dir)

            # Set up the multi-tenant user context
            await self._setup_user_context()

            self._initialized = True
            logger.info(f"Cognee initialized for project {self.project_context['project_name']} "
                        f"with Kuzu at {system_dir}")

        except ImportError:
            logger.error("Cognee not installed. Install with: pip install cognee")
            raise
        except Exception as e:
            logger.error(f"Failed to initialize Cognee: {e}")
            raise

    async def create_dataset(self):
        """Create the dataset for this project if it doesn't exist"""
        if not self._initialized:
            await self.initialize()

        try:
            # Dataset creation is handled automatically by Cognee when adding files;
            # we just ensure the right context is set up
            dataset_name = f"{self.project_context['project_name']}_codebase"
            logger.info(f"Dataset {dataset_name} ready for project {self.project_context['project_name']}")
            return dataset_name
        except Exception as e:
            logger.error(f"Failed to create dataset: {e}")
            raise

    async def _setup_user_context(self):
        """Set up the user context for multi-tenant isolation"""
        try:
            from cognee.modules.users.methods import create_user, get_user

            # Always try the fallback email first to avoid validation issues
            fallback_email = f"project_{self.project_context['project_id']}@fuzzforge.example"
            user_tenant = self.project_context['tenant_id']

            # Try to get an existing fallback user first
            try:
                self._user = await get_user(fallback_email)
                logger.info(f"Using existing user: {fallback_email}")
                return
            except Exception:
                # User doesn't exist; try to create the fallback user
                pass

            # Create the fallback user
            try:
                self._user = await create_user(fallback_email, user_tenant)
                logger.info(f"Created fallback user: {fallback_email} for tenant: {user_tenant}")
                return
            except Exception as fallback_error:
                logger.warning(f"Fallback user creation failed: {fallback_error}")
                self._user = None
                return

        except Exception as e:
            logger.warning(f"Could not setup multi-tenant user context: {e}")
            logger.info("Proceeding with default context")
            self._user = None

    def get_project_dataset_name(self, dataset_suffix: str = "codebase") -> str:
        """Get the project-specific dataset name"""
        return f"{self.project_context['project_name']}_{dataset_suffix}"

    async def ingest_text(self, content: str, dataset: str = "fuzzforge") -> bool:
        """Ingest text content into the knowledge graph"""
        if not self._initialized:
            await self.initialize()

        try:
            await self._cognee.add([content], dataset)
            await self._cognee.cognify([dataset])
            return True
        except Exception as e:
            logger.error(f"Failed to ingest text: {e}")
            return False

    async def ingest_files(self, file_paths: List[Path], dataset: str = "fuzzforge") -> Dict[str, Any]:
        """Ingest multiple files into the knowledge graph"""
        if not self._initialized:
            await self.initialize()

        results = {
            "success": 0,
            "failed": 0,
            "errors": []
        }

        try:
            ingest_paths: List[str] = []
            for file_path in file_paths:
                try:
                    with open(file_path, 'r', encoding='utf-8'):
                        ingest_paths.append(str(file_path))
                    results["success"] += 1
                except (UnicodeDecodeError, PermissionError) as exc:
                    results["failed"] += 1
                    results["errors"].append(f"{file_path}: {exc}")
                    logger.warning("Skipping %s: %s", file_path, exc)

            if ingest_paths:
                await self._cognee.add(ingest_paths, dataset_name=dataset)
                await self._cognee.cognify([dataset])

        except Exception as e:
            logger.error(f"Failed to ingest files: {e}")
            results["errors"].append(f"Cognify error: {str(e)}")

        return results

    async def search_insights(self, query: str, dataset: str = None) -> List[str]:
        """Search for insights in the knowledge graph"""
        if not self._initialized:
            await self.initialize()

        try:
            from cognee.modules.search.types import SearchType

            kwargs = {
                "query_type": SearchType.INSIGHTS,
                "query_text": query
            }

            if dataset:
                kwargs["datasets"] = [dataset]

            results = await self._cognee.search(**kwargs)
            return results if isinstance(results, list) else []

        except Exception as e:
            logger.error(f"Failed to search insights: {e}")
            return []

    async def search_chunks(self, query: str, dataset: str = None) -> List[str]:
        """Search for relevant text chunks"""
        if not self._initialized:
            await self.initialize()

        try:
            from cognee.modules.search.types import SearchType

            kwargs = {
                "query_type": SearchType.CHUNKS,
                "query_text": query
            }

            if dataset:
                kwargs["datasets"] = [dataset]

            results = await self._cognee.search(**kwargs)
            return results if isinstance(results, list) else []

        except Exception as e:
            logger.error(f"Failed to search chunks: {e}")
            return []

    async def search_graph_completion(self, query: str) -> List[str]:
        """Search for graph completion (relationships)"""
        if not self._initialized:
            await self.initialize()

        try:
            from cognee.modules.search.types import SearchType

            results = await self._cognee.search(
                query_type=SearchType.GRAPH_COMPLETION,
                query_text=query
            )
            return results if isinstance(results, list) else []

        except Exception as e:
            logger.error(f"Failed to search graph completion: {e}")
            return []

    async def get_status(self) -> Dict[str, Any]:
        """Get service status and statistics"""
        status = {
            "initialized": self._initialized,
            "enabled": self.cognee_config.get("enabled", True),
            "provider": self.cognee_config.get("graph_database_provider", "kuzu"),
            "data_directory": self.cognee_config.get("data_directory"),
            "system_directory": self.cognee_config.get("system_directory"),
        }

        if self._initialized:
            try:
                # Check whether the directories and database files exist
                data_dir = Path(status["data_directory"])
                system_dir = Path(status["system_directory"])

                status.update({
                    "data_dir_exists": data_dir.exists(),
                    "system_dir_exists": system_dir.exists(),
                    "kuzu_db_exists": (system_dir / "kuzu_db").exists(),
                    "lancedb_exists": (system_dir / "lancedb").exists(),
                })

            except Exception as e:
                status["status_error"] = str(e)

        return status

    async def clear_data(self, confirm: bool = False):
        """Clear all ingested data (dangerous!)"""
        if not confirm:
            raise ValueError("Must confirm data clearing with confirm=True")

        if not self._initialized:
            await self.initialize()

        try:
            await self._cognee.prune.prune_data()
            await self._cognee.prune.prune_system(metadata=True)
            logger.info("Cognee data cleared")
        except Exception as e:
            logger.error(f"Failed to clear data: {e}")
            raise


class FuzzForgeCogneeIntegration:
    """
    Main integration class for FuzzForge + Cognee.
    Provides high-level operations for security analysis.
    """

    def __init__(self, config):
        self.service = CogneeService(config)

    async def analyze_codebase(self, path: Path, recursive: bool = True) -> Dict[str, Any]:
        """
        Analyze a codebase and extract security-relevant insights.
        """
        # Collect code files
        from fuzzforge_ai.ingest_utils import collect_ingest_files

        files = collect_ingest_files(path, recursive, None, [])

        if not files:
            return {"error": "No files found to analyze"}

        # Ingest files
        results = await self.service.ingest_files(files, "security_analysis")

        if results["success"] == 0:
            return {"error": "Failed to ingest any files", "details": results}

        # Extract security insights
        security_queries = [
            "vulnerabilities security risks",
            "authentication authorization",
            "input validation sanitization",
            "encryption cryptography",
            "error handling exceptions",
            "logging sensitive data"
        ]

        insights = {}
        for query in security_queries:
            insight_results = await self.service.search_insights(query, "security_analysis")
            if insight_results:
                insights[query.replace(" ", "_")] = insight_results

        return {
            "files_processed": results["success"],
            "files_failed": results["failed"],
            "errors": results["errors"],
            "security_insights": insights
        }

    async def query_codebase(self, query: str, search_type: str = "insights") -> List[str]:
        """Query the ingested codebase"""
        if search_type == "insights":
            return await self.service.search_insights(query)
        elif search_type == "chunks":
            return await self.service.search_chunks(query)
        elif search_type == "graph":
            return await self.service.search_graph_completion(query)
        else:
            raise ValueError(f"Unknown search type: {search_type}")

    async def get_project_summary(self) -> Dict[str, Any]:
        """Get a summary of the analyzed project"""
        # Search for general project insights
        summary_queries = [
            "project structure components",
            "main functionality features",
            "programming languages frameworks",
            "dependencies libraries"
        ]

        summary = {}
        for query in summary_queries:
            results = await self.service.search_insights(query)
            if results:
                summary[query.replace(" ", "_")] = results[:3]  # Top 3 results

        return summary
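`CogneeService.initialize` resolves its LLM settings through a precedence chain: `LLM_MODEL` wins over `LITELLM_MODEL` (which itself has a default), `LLM_API_KEY` wins over `OPENAI_API_KEY`, and OpenAI-compatible providers fail fast without a key. A pure sketch of those rules (`resolve_llm_settings` is a hypothetical helper taking a plain dict instead of `os.environ`):

```python
def resolve_llm_settings(env):
    """Mirror the precedence rules used in CogneeService.initialize:
    LLM_MODEL > LITELLM_MODEL > default; LLM_API_KEY > OPENAI_API_KEY."""
    provider = env.get("LLM_PROVIDER", "openai")
    model = env.get("LLM_MODEL") or env.get("LITELLM_MODEL", "gpt-4o-mini")
    api_key = env.get("LLM_API_KEY") or env.get("OPENAI_API_KEY")

    # OpenAI-compatible providers require a key up front
    if provider.lower() in {"openai", "azure_openai", "custom"} and not api_key:
        raise ValueError("OpenAI-compatible API key is required")

    return provider, model, api_key

print(resolve_llm_settings({"OPENAI_API_KEY": "sk-test"}))
# → ('openai', 'gpt-4o-mini', 'sk-test')
```

Taking the environment as a parameter keeps the rule testable without mutating `os.environ`, whereas the real service also writes the resolved values back for downstream libraries.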
9
ai/src/fuzzforge_ai/config.yaml
Normal file
@@ -0,0 +1,9 @@
# FuzzForge Registered Agents
# These agents will be automatically registered on startup

registered_agents:

# Example entries:
# - name: Calculator
#   url: http://localhost:10201
#   description: Mathematical calculations agent
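Once parsed, this file yields a list of dicts that the CLI's `auto_register_agents` walks, skipping entries without a `url` and defaulting the name to "Unknown". A minimal sketch of that consumption (plain dicts stand in for the YAML parse; `extract_agents` is an illustrative helper, not part of the codebase):

```python
def extract_agents(config):
    """Return (name, url) pairs for entries that have a url,
    matching how auto_register_agents reads each agent_config."""
    agents = config.get("registered_agents") or []  # bare key parses as None
    pairs = []
    for agent_config in agents:
        url = agent_config.get("url")
        name = agent_config.get("name", "Unknown")
        if url:
            pairs.append((name, url))
    return pairs

config = {
    "registered_agents": [
        {"name": "Calculator", "url": "http://localhost:10201"},
        {"description": "missing url, so this entry is skipped"},
    ]
}
print(extract_agents(config))  # → [('Calculator', 'http://localhost:10201')]
```

The `or []` guard matters because a `registered_agents:` key with no entries, as shipped above, parses to `None` rather than an empty list.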
31
ai/src/fuzzforge_ai/config_bridge.py
Normal file
@@ -0,0 +1,31 @@
"""Bridge module providing access to the host CLI configuration manager."""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


try:
    from fuzzforge_cli.config import ProjectConfigManager as _ProjectConfigManager
except ImportError as exc:  # pragma: no cover - used when the CLI is not available
    # Keep a reference: the except target is unbound once the block exits,
    # so "from exc" would otherwise raise NameError at call time
    _import_error = exc

    class _ProjectConfigManager:  # type: ignore[no-redef]
        """Fallback implementation that raises a helpful error."""

        def __init__(self, *args, **kwargs):
            raise ImportError(
                "ProjectConfigManager is unavailable. Install the FuzzForge CLI "
                "package or supply a compatible configuration object."
            ) from _import_error

    def __getattr__(name):  # pragma: no cover - defensive
        raise ImportError("ProjectConfigManager unavailable") from _import_error

ProjectConfigManager = _ProjectConfigManager

__all__ = ["ProjectConfigManager"]
134 ai/src/fuzzforge_ai/config_manager.py Normal file
@@ -0,0 +1,134 @@
"""
Configuration manager for FuzzForge
Handles loading and saving registered agents
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import os
import yaml
from typing import Dict, Any, List


class ConfigManager:
    """Manages FuzzForge agent registry configuration"""

    def __init__(self, config_path: str = None):
        """Initialize config manager"""
        if config_path:
            self.config_path = config_path
        else:
            # Check for local .fuzzforge/agents.yaml first, then fall back to global
            local_config = os.path.join(os.getcwd(), '.fuzzforge', 'agents.yaml')
            global_config = os.path.join(os.path.dirname(__file__), 'config.yaml')

            if os.path.exists(local_config):
                self.config_path = local_config
                if os.getenv("FUZZFORGE_DEBUG", "0") == "1":
                    print(f"[CONFIG] Using local config: {local_config}")
            else:
                self.config_path = global_config
                if os.getenv("FUZZFORGE_DEBUG", "0") == "1":
                    print(f"[CONFIG] Using global config: {global_config}")

        self.config = self.load_config()

    def load_config(self) -> Dict[str, Any]:
        """Load configuration from YAML file"""
        if not os.path.exists(self.config_path):
            # Create default config if it doesn't exist
            return {'registered_agents': []}

        try:
            with open(self.config_path, 'r') as f:
                config = yaml.safe_load(f) or {}
            # Ensure registered_agents is a list
            if 'registered_agents' not in config or config['registered_agents'] is None:
                config['registered_agents'] = []
            return config
        except Exception as e:
            print(f"[WARNING] Failed to load config: {e}")
            return {'registered_agents': []}

    def save_config(self):
        """Save current configuration to file"""
        try:
            # Create a clean config with comments
            config_content = """# FuzzForge Registered Agents
# These agents will be automatically registered on startup

"""
            # Add the agents list
            if self.config.get('registered_agents'):
                config_content += yaml.dump({'registered_agents': self.config['registered_agents']},
                                            default_flow_style=False, sort_keys=False)
            else:
                config_content += "registered_agents: []\n"

            config_content += """
# Example entries:
# - name: Calculator
#   url: http://localhost:10201
#   description: Mathematical calculations agent
"""

            with open(self.config_path, 'w') as f:
                f.write(config_content)

            return True
        except Exception as e:
            print(f"[ERROR] Failed to save config: {e}")
            return False

    def get_registered_agents(self) -> List[Dict[str, Any]]:
        """Get list of registered agents from config"""
        return self.config.get('registered_agents', [])

    def add_registered_agent(self, name: str, url: str, description: str = "") -> bool:
        """Add a new registered agent to config"""
        if 'registered_agents' not in self.config:
            self.config['registered_agents'] = []

        # Check if agent already exists
        for agent in self.config['registered_agents']:
            if agent.get('url') == url:
                # Update existing agent
                agent['name'] = name
                agent['description'] = description
                return self.save_config()

        # Add new agent
        self.config['registered_agents'].append({
            'name': name,
            'url': url,
            'description': description
        })

        return self.save_config()

    def remove_registered_agent(self, name: str = None, url: str = None) -> bool:
        """Remove a registered agent from config"""
        if 'registered_agents' not in self.config:
            return False

        original_count = len(self.config['registered_agents'])

        # Filter out the agent
        self.config['registered_agents'] = [
            agent for agent in self.config['registered_agents']
            if not ((name and agent.get('name') == name) or
                    (url and agent.get('url') == url))
        ]

        if len(self.config['registered_agents']) < original_count:
            return self.save_config()

        return False
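The add/remove logic in this file is an upsert keyed on the agent URL; a self-contained sketch of just that behavior, with plain dicts and no YAML I/O (function names here are illustrative, not the module's API):

```python
def upsert_agent(agents, name, url, description=""):
    """Update an existing entry (matched by url) or append a new one."""
    for agent in agents:
        if agent.get("url") == url:
            agent["name"] = name
            agent["description"] = description
            return agents
    agents.append({"name": name, "url": url, "description": description})
    return agents

def remove_agent(agents, name=None, url=None):
    """Drop entries matching either the name or the url."""
    return [
        a for a in agents
        if not ((name and a.get("name") == name) or (url and a.get("url") == url))
    ]

agents = []
upsert_agent(agents, "Calculator", "http://localhost:10201")
upsert_agent(agents, "Calc v2", "http://localhost:10201")  # same url -> update in place
print(len(agents))            # 1
print(agents[0]["name"])      # Calc v2
agents = remove_agent(agents, url="http://localhost:10201")
print(agents)                 # []
```

Matching on URL rather than name means re-registering an agent at the same endpoint updates it instead of creating a duplicate.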
104 ai/src/fuzzforge_ai/ingest_utils.py Normal file
@@ -0,0 +1,104 @@
"""Utilities for collecting files to ingest into Cognee."""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


from __future__ import annotations

import fnmatch
from pathlib import Path
from typing import Iterable, List, Optional

_DEFAULT_FILE_TYPES = [
    ".py",
    ".js",
    ".ts",
    ".java",
    ".cpp",
    ".c",
    ".h",
    ".rs",
    ".go",
    ".rb",
    ".php",
    ".cs",
    ".swift",
    ".kt",
    ".scala",
    ".clj",
    ".hs",
    ".md",
    ".txt",
    ".yaml",
    ".yml",
    ".json",
    ".toml",
    ".cfg",
    ".ini",
]

_DEFAULT_EXCLUDE = [
    "*.pyc",
    "__pycache__",
    ".git",
    ".svn",
    ".hg",
    "node_modules",
    ".venv",
    "venv",
    ".env",
    "dist",
    "build",
    ".pytest_cache",
    ".mypy_cache",
    ".tox",
    "coverage",
    "*.log",
    "*.tmp",
]


def collect_ingest_files(
    path: Path,
    recursive: bool = True,
    file_types: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[Path]:
    """Return a list of files eligible for ingestion."""
    path = path.resolve()
    files: List[Path] = []

    extensions = list(file_types) if file_types else list(_DEFAULT_FILE_TYPES)
    exclusions = list(exclude) if exclude else []
    exclusions.extend(_DEFAULT_EXCLUDE)

    def should_exclude(file_path: Path) -> bool:
        file_str = str(file_path)
        for pattern in exclusions:
            if fnmatch.fnmatch(file_str, f"*{pattern}*") or fnmatch.fnmatch(file_path.name, pattern):
                return True
        return False

    if path.is_file():
        if not should_exclude(path) and any(str(path).endswith(ext) for ext in extensions):
            files.append(path)
        return files

    pattern = "**/*" if recursive else "*"
    for file_path in path.glob(pattern):
        if file_path.is_file() and not should_exclude(file_path):
            if any(str(file_path).endswith(ext) for ext in extensions):
                files.append(file_path)

    return files


__all__ = ["collect_ingest_files"]
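The exclusion check above matches each pattern both as a substring of the full path and against the bare file name; a small sketch showing the consequences of that choice:

```python
import fnmatch
from pathlib import Path

# Re-create the exclusion test used above: a path is skipped if any pattern
# matches anywhere in the full path, or matches the file name exactly.
def should_exclude(file_path: Path, exclusions) -> bool:
    file_str = str(file_path)
    return any(
        fnmatch.fnmatch(file_str, f"*{pattern}*") or fnmatch.fnmatch(file_path.name, pattern)
        for pattern in exclusions
    )

exclusions = ["__pycache__", "*.pyc", "node_modules"]
print(should_exclude(Path("pkg/__pycache__/mod.cpython-311.pyc"), exclusions))  # True
print(should_exclude(Path("pkg/module.py"), exclusions))                        # False
# Caveat of the substring-style match: any path *containing* a pattern is
# excluded, so a legitimately named file can be caught too.
print(should_exclude(Path("docs/node_modules_guide.md"), exclusions))           # True
```

The last case illustrates why callers may want tighter patterns when passing a custom `exclude` list.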
247 ai/src/fuzzforge_ai/memory_service.py Normal file
@@ -0,0 +1,247 @@
"""
FuzzForge Memory Service
Implements ADK MemoryService pattern for conversational memory
Separate from Cognee which will be used for RAG/codebase analysis
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import os
import json
from typing import Dict, List, Any, Optional
from datetime import datetime
import logging

# ADK Memory imports
from google.adk.memory import InMemoryMemoryService, BaseMemoryService
from google.adk.memory.base_memory_service import SearchMemoryResponse
from google.adk.memory.memory_entry import MemoryEntry

# Optional VertexAI Memory Bank
try:
    from google.adk.memory import VertexAiMemoryBankService
    VERTEX_AVAILABLE = True
except ImportError:
    VERTEX_AVAILABLE = False

logger = logging.getLogger(__name__)


class FuzzForgeMemoryService:
    """
    Manages conversational memory using ADK patterns
    This is separate from Cognee which will handle RAG/codebase
    """

    def __init__(self, memory_type: str = "inmemory", **kwargs):
        """
        Initialize memory service

        Args:
            memory_type: "inmemory" or "vertexai"
            **kwargs: Additional args for specific memory service
                For vertexai: project, location, agent_engine_id
        """
        self.memory_type = memory_type
        self.service = self._create_service(memory_type, **kwargs)

    def _create_service(self, memory_type: str, **kwargs) -> BaseMemoryService:
        """Create the appropriate memory service"""

        if memory_type == "inmemory":
            # Use ADK's InMemoryMemoryService for local development
            logger.info("Using InMemory MemoryService for conversational memory")
            return InMemoryMemoryService()

        elif memory_type == "vertexai" and VERTEX_AVAILABLE:
            # Use VertexAI Memory Bank for production
            project = kwargs.get('project') or os.getenv('GOOGLE_CLOUD_PROJECT')
            location = kwargs.get('location') or os.getenv('GOOGLE_CLOUD_LOCATION', 'us-central1')
            agent_engine_id = kwargs.get('agent_engine_id') or os.getenv('AGENT_ENGINE_ID')

            if not all([project, location, agent_engine_id]):
                logger.warning("VertexAI config missing, falling back to InMemory")
                return InMemoryMemoryService()

            logger.info(f"Using VertexAI MemoryBank: {agent_engine_id}")
            return VertexAiMemoryBankService(
                project=project,
                location=location,
                agent_engine_id=agent_engine_id
            )
        else:
            # Default to in-memory
            logger.info("Defaulting to InMemory MemoryService")
            return InMemoryMemoryService()

    async def add_session_to_memory(self, session: Any) -> None:
        """
        Add a completed session to long-term memory
        This extracts meaningful information from the conversation

        Args:
            session: The session object to process
        """
        try:
            # Let the underlying service handle the ingestion
            # It will extract relevant information based on the implementation
            await self.service.add_session_to_memory(session)

            logger.debug(f"Added session {session.id} to {self.memory_type} memory")

        except Exception as e:
            logger.error(f"Failed to add session to memory: {e}")

    async def search_memory(self,
                            query: str,
                            app_name: str = "fuzzforge",
                            user_id: str = None,
                            max_results: int = 10) -> SearchMemoryResponse:
        """
        Search long-term memory for relevant information

        Args:
            query: The search query
            app_name: Application name for filtering
            user_id: User ID for filtering (optional)
            max_results: Maximum number of results

        Returns:
            SearchMemoryResponse with relevant memories
        """
        try:
            # Search the memory service
            results = await self.service.search_memory(
                app_name=app_name,
                user_id=user_id,
                query=query
            )

            logger.debug(f"Memory search for '{query}' returned {len(results.memories)} results")
            return results

        except Exception as e:
            logger.error(f"Memory search failed: {e}")
            # Return empty results on error
            return SearchMemoryResponse(memories=[])

    async def ingest_completed_sessions(self, session_service) -> int:
        """
        Batch ingest all completed sessions into memory
        Useful for initial memory population

        Args:
            session_service: The session service containing sessions

        Returns:
            Number of sessions ingested
        """
        ingested = 0

        try:
            # Get all sessions from the session service
            sessions = await session_service.list_sessions(app_name="fuzzforge")

            for session_info in sessions:
                # Load full session
                session = await session_service.load_session(
                    app_name="fuzzforge",
                    user_id=session_info.get('user_id'),
                    session_id=session_info.get('id')
                )

                if session and len(session.get_events()) > 0:
                    await self.add_session_to_memory(session)
                    ingested += 1

            logger.info(f"Ingested {ingested} sessions into {self.memory_type} memory")

        except Exception as e:
            logger.error(f"Failed to batch ingest sessions: {e}")

        return ingested

    def get_status(self) -> Dict[str, Any]:
        """Get memory service status"""
        return {
            "type": self.memory_type,
            "active": self.service is not None,
            "vertex_available": VERTEX_AVAILABLE,
            "details": {
                "inmemory": "Non-persistent, keyword search",
                "vertexai": "Persistent, semantic search with LLM extraction"
            }.get(self.memory_type, "Unknown")
        }


class HybridMemoryManager:
    """
    Manages both ADK MemoryService (conversational) and Cognee (RAG/codebase)
    Provides unified interface for both memory systems
    """

    def __init__(self,
                 memory_service: FuzzForgeMemoryService = None,
                 cognee_tools = None):
        """
        Initialize with both memory systems

        Args:
            memory_service: ADK-pattern memory for conversations
            cognee_tools: Cognee MCP tools for RAG/codebase
        """
        # ADK memory for conversations
        self.memory_service = memory_service or FuzzForgeMemoryService()

        # Cognee for knowledge graphs and RAG (future)
        self.cognee_tools = cognee_tools

    async def search_conversational_memory(self, query: str) -> SearchMemoryResponse:
        """Search past conversations using ADK memory"""
        return await self.memory_service.search_memory(query)

    async def search_knowledge_graph(self, query: str, search_type: str = "GRAPH_COMPLETION"):
        """Search Cognee knowledge graph (for RAG/codebase in future)"""
        if not self.cognee_tools:
            return None

        try:
            # Use Cognee's graph search
            return await self.cognee_tools.search(
                query=query,
                search_type=search_type
            )
        except Exception as e:
            logger.debug(f"Cognee search failed: {e}")
            return None

    async def store_in_graph(self, content: str):
        """Store in Cognee knowledge graph (for codebase analysis later)"""
        if not self.cognee_tools:
            return None

        try:
            # Use cognify to create graph structures
            return await self.cognee_tools.cognify(content)
        except Exception as e:
            logger.debug(f"Cognee store failed: {e}")
            return None

    def get_status(self) -> Dict[str, Any]:
        """Get status of both memory systems"""
        return {
            "conversational_memory": self.memory_service.get_status(),
            "knowledge_graph": {
                "active": self.cognee_tools is not None,
                "purpose": "RAG/codebase analysis (future)"
            }
        }
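The backend selection in `_create_service` degrades gracefully: VertexAI is only used when the SDK is importable and the full project/location/engine configuration is present, otherwise it silently falls back to in-memory. A dependency-free sketch of that decision (the stub classes stand in for the ADK services, which are assumptions here):

```python
import os

class InMemoryStub:            # stands in for InMemoryMemoryService
    name = "inmemory"

class VertexStub:              # stands in for VertexAiMemoryBankService
    name = "vertexai"

VERTEX_AVAILABLE = True

def create_service(memory_type, **kwargs):
    if memory_type == "vertexai" and VERTEX_AVAILABLE:
        project = kwargs.get("project") or os.getenv("GOOGLE_CLOUD_PROJECT")
        location = kwargs.get("location") or os.getenv("GOOGLE_CLOUD_LOCATION", "us-central1")
        engine = kwargs.get("agent_engine_id") or os.getenv("AGENT_ENGINE_ID")
        if not all([project, location, engine]):
            return InMemoryStub()      # missing config -> silent fallback
        return VertexStub()
    return InMemoryStub()              # default path, incl. unknown types

print(create_service("inmemory").name)                                    # inmemory
print(create_service("vertexai").name)   # falls back unless env/kwargs configured
print(create_service("vertexai", project="p", agent_engine_id="e").name)  # vertexai
```

Because `location` defaults to `us-central1`, only the project and agent engine id actually gate the VertexAI path.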
148 ai/src/fuzzforge_ai/remote_agent.py Normal file
@@ -0,0 +1,148 @@
"""
Remote Agent Connection Handler
Handles A2A protocol communication with remote agents
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import httpx
import uuid
from typing import Dict, Any, Optional, List


class RemoteAgentConnection:
    """Handles A2A protocol communication with remote agents"""

    def __init__(self, url: str):
        """Initialize connection to a remote agent"""
        self.url = url.rstrip('/')
        self.agent_card = None
        self.client = httpx.AsyncClient(timeout=120.0)
        self.context_id = None

    async def get_agent_card(self) -> Optional[Dict[str, Any]]:
        """Get the agent card from the remote agent"""
        try:
            # Try new path first (A2A 0.3.0+)
            response = await self.client.get(f"{self.url}/.well-known/agent-card.json")
            response.raise_for_status()
            self.agent_card = response.json()
            return self.agent_card
        except Exception:
            # Try old path for compatibility
            try:
                response = await self.client.get(f"{self.url}/.well-known/agent.json")
                response.raise_for_status()
                self.agent_card = response.json()
                return self.agent_card
            except Exception as e:
                print(f"Failed to get agent card from {self.url}: {e}")
                return None

    async def send_message(self, message: str | Dict[str, Any] | List[Dict[str, Any]]) -> str:
        """Send a message to the remote agent using A2A protocol"""
        try:
            parts: List[Dict[str, Any]]
            metadata: Dict[str, Any] | None = None
            if isinstance(message, dict):
                metadata = message.get("metadata") if isinstance(message.get("metadata"), dict) else None
                raw_parts = message.get("parts", [])
                if not raw_parts:
                    text_value = message.get("text") or message.get("message")
                    if isinstance(text_value, str):
                        raw_parts = [{"type": "text", "text": text_value}]
                parts = [raw_part for raw_part in raw_parts if isinstance(raw_part, dict)]
            elif isinstance(message, list):
                parts = [part for part in message if isinstance(part, dict)]
                metadata = None
            else:
                parts = [{"type": "text", "text": message}]
                metadata = None

            if not parts:
                parts = [{"type": "text", "text": ""}]

            # Build JSON-RPC request per A2A spec
            payload = {
                "jsonrpc": "2.0",
                "method": "message/send",
                "params": {
                    "message": {
                        "messageId": str(uuid.uuid4()),
                        "role": "user",
                        "parts": parts,
                    }
                },
                "id": 1
            }

            if metadata:
                payload["params"]["message"]["metadata"] = metadata

            # Include context if we have one
            if self.context_id:
                payload["params"]["contextId"] = self.context_id

            # Send to root endpoint per A2A protocol
            response = await self.client.post(f"{self.url}/", json=payload)
            response.raise_for_status()
            result = response.json()

            # Extract response based on A2A JSON-RPC format
            if isinstance(result, dict):
                # Update context for continuity
                if "result" in result and isinstance(result["result"], dict):
                    if "contextId" in result["result"]:
                        self.context_id = result["result"]["contextId"]

                    # Extract text from artifacts
                    if "artifacts" in result["result"]:
                        texts = []
                        for artifact in result["result"]["artifacts"]:
                            if isinstance(artifact, dict) and "parts" in artifact:
                                for part in artifact["parts"]:
                                    if isinstance(part, dict) and "text" in part:
                                        texts.append(part["text"])
                        if texts:
                            return " ".join(texts)

                    # Extract from message format
                    if "message" in result["result"]:
                        msg = result["result"]["message"]
                        if isinstance(msg, dict) and "parts" in msg:
                            texts = []
                            for part in msg["parts"]:
                                if isinstance(part, dict) and "text" in part:
                                    texts.append(part["text"])
                            return " ".join(texts) if texts else str(msg)
                        return str(msg)

                    return str(result["result"])

                # Handle error response
                elif "error" in result:
                    error = result["error"]
                    if isinstance(error, dict):
                        return f"Error: {error.get('message', str(error))}"
                    return f"Error: {error}"

                # Fallback
                return result.get("response", result.get("message", str(result)))

            return str(result)

        except Exception as e:
            return f"Error communicating with agent: {e}"

    async def close(self):
        """Close the connection properly"""
        await self.client.aclose()
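The response parsing above walks `result.artifacts[*].parts[*].text` and joins the pieces; a self-contained sketch with a canned JSON-RPC reply (the payload literal mirrors the shape the code handles, not an official A2A fixture):

```python
def extract_text(result: dict) -> str:
    """Pull concatenated text parts out of an A2A-style JSON-RPC reply."""
    body = result.get("result", {})
    texts = []
    for artifact in body.get("artifacts", []):
        for part in artifact.get("parts", []):
            if isinstance(part, dict) and "text" in part:
                texts.append(part["text"])
    return " ".join(texts)

reply = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "contextId": "ctx-123",
        "artifacts": [
            {"parts": [{"type": "text", "text": "Scan complete:"}]},
            {"parts": [{"type": "text", "text": "3 findings"}]},
        ],
    },
}
print(extract_text(reply))   # Scan complete: 3 findings
```

The real method additionally caches `result.contextId` so the next `message/send` continues the same conversation.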
41 backend/Dockerfile Normal file
@@ -0,0 +1,41 @@
FROM python:3.11-slim

WORKDIR /app

# Install system dependencies including Docker client and rsync
RUN apt-get update && apt-get install -y \
    curl \
    ca-certificates \
    gnupg \
    lsb-release \
    rsync \
    && curl -fsSL https://download.docker.com/linux/debian/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg \
    && echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null \
    && apt-get update \
    && apt-get install -y docker-ce-cli \
    && rm -rf /var/lib/apt/lists/*

# Docker client configuration removed - localhost:5001 doesn't require insecure registry config

# Install uv for faster package management
RUN pip install uv

# Copy project files
COPY pyproject.toml ./
COPY uv.lock ./

# Install dependencies
RUN uv sync --no-dev

# Copy source code
COPY . .

# Expose port
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Start the application
CMD ["uv", "run", "uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
257 backend/README.md Normal file
@@ -0,0 +1,257 @@
# FuzzForge Backend

A stateless API server for security testing workflow orchestration using Prefect. This system dynamically discovers workflows, executes them in isolated Docker containers with volume mounting, and returns findings in SARIF format.

## Architecture Overview

### Core Components

1. **Workflow Discovery System**: Automatically discovers workflows at startup
2. **Module System**: Reusable components (scanner, analyzer, reporter) with a common interface
3. **Prefect Integration**: Handles container orchestration, workflow execution, and monitoring
4. **Volume Mounting**: Secure file access with configurable permissions (ro/rw)
5. **SARIF Output**: Standardized security findings format

### Key Features

- **Stateless**: No persistent data, fully scalable
- **Generic**: No hardcoded workflows, automatic discovery
- **Isolated**: Each workflow runs in its own Docker container
- **Extensible**: Easy to add new workflows and modules
- **Secure**: Read-only volume mounts by default, path validation
- **Observable**: Comprehensive logging and status tracking

## Quick Start

### Prerequisites

- Docker and Docker Compose

### Installation

From the project root, start all services:

```bash
docker-compose up -d
```

This will start:
- Prefect server (API at http://localhost:4200/api)
- PostgreSQL database
- Redis cache
- Docker registry (port 5001)
- Prefect worker (for running workflows)
- FuzzForge backend API (port 8000)
- FuzzForge MCP server (port 8010)

**Note**: The Prefect UI at http://localhost:4200 is not currently accessible from the host because the API is configured for inter-container communication. Use the REST API or MCP interface instead.

## API Endpoints

### Workflows

- `GET /workflows` - List all discovered workflows
- `GET /workflows/{name}/metadata` - Get workflow metadata and parameters
- `GET /workflows/{name}/parameters` - Get workflow parameter schema
- `GET /workflows/metadata/schema` - Get metadata.yaml schema
- `POST /workflows/{name}/submit` - Submit a workflow for execution

### Runs

- `GET /runs/{run_id}/status` - Get run status
- `GET /runs/{run_id}/findings` - Get SARIF findings from completed run
- `GET /runs/{workflow_name}/findings/{run_id}` - Alternative findings endpoint with workflow name

## Workflow Structure

Each workflow must have:

```
toolbox/workflows/{workflow_name}/
    workflow.py        # Prefect flow definition
    metadata.yaml      # Mandatory metadata (parameters, version, etc.)
    Dockerfile         # Optional custom container definition
    requirements.txt   # Optional Python dependencies
```

### Example metadata.yaml

```yaml
name: security_assessment
version: "1.0.0"
description: "Comprehensive security analysis workflow"
author: "FuzzForge Team"
category: "comprehensive"
tags:
  - "security"
  - "analysis"
  - "comprehensive"

supported_volume_modes:
  - "ro"
  - "rw"

requirements:
  tools:
    - "file_scanner"
    - "security_analyzer"
    - "sarif_reporter"
  resources:
    memory: "512Mi"
    cpu: "500m"
    timeout: 1800

has_docker: true

parameters:
  type: object
  properties:
    target_path:
      type: string
      default: "/workspace"
      description: "Path to analyze"
    volume_mode:
      type: string
      enum: ["ro", "rw"]
      default: "ro"
      description: "Volume mount mode"
    scanner_config:
      type: object
      description: "Scanner configuration"
      properties:
        max_file_size:
          type: integer
          description: "Maximum file size to scan (bytes)"

output_schema:
  type: object
  properties:
    sarif:
      type: object
      description: "SARIF-formatted security findings"
    summary:
      type: object
      description: "Scan execution summary"
```

### Metadata Field Descriptions

- **name**: Workflow identifier (must match directory name)
- **version**: Semantic version (x.y.z format)
- **description**: Human-readable description of the workflow
- **author**: Workflow author/maintainer
- **category**: Workflow category (comprehensive, specialized, fuzzing, focused)
- **tags**: Array of descriptive tags for categorization
- **requirements.tools**: Required security tools that the workflow uses
- **requirements.resources**: Resource requirements enforced at runtime:
  - `memory`: Memory limit (e.g., "512Mi", "1Gi")
  - `cpu`: CPU limit (e.g., "500m" for 0.5 cores, "1" for 1 core)
  - `timeout`: Maximum execution time in seconds
- **parameters**: JSON Schema object defining workflow parameters
- **output_schema**: Expected output format (typically SARIF)

### Resource Requirements

Resource requirements defined in workflow metadata are automatically enforced. Users can override defaults when submitting workflows:

```bash
curl -X POST "http://localhost:8000/workflows/security_assessment/submit" \
  -H "Content-Type: application/json" \
  -d '{
    "target_path": "/tmp/project",
    "volume_mode": "ro",
    "resource_limits": {
      "memory_limit": "1Gi",
      "cpu_limit": "1"
    }
  }'
```

Resource precedence: User limits > Workflow requirements > System defaults
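That precedence order can be expressed as a chained dict merge; a hypothetical sketch (the key names follow the request example above and the defaults are illustrative, not a documented internal API):

```python
# Illustrative system defaults; the real values live in the backend config.
SYSTEM_DEFAULTS = {"memory_limit": "512Mi", "cpu_limit": "500m", "timeout": 1800}

def effective_limits(workflow_reqs: dict, user_limits: dict) -> dict:
    """User limits override workflow requirements, which override system defaults."""
    merged = dict(SYSTEM_DEFAULTS)
    merged.update(workflow_reqs)   # metadata.yaml requirements
    merged.update(user_limits)     # per-request overrides win
    return merged

limits = effective_limits(
    {"memory_limit": "512Mi", "cpu_limit": "500m"},   # from metadata.yaml
    {"memory_limit": "1Gi", "cpu_limit": "1"},        # from the submit request
)
print(limits["memory_limit"])  # 1Gi
print(limits["timeout"])       # 1800 (system default survives)
```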
## Module Development

Modules implement the `BaseModule` interface:

```python
from src.toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult

class MyModule(BaseModule):
    def get_metadata(self) -> ModuleMetadata:
        return ModuleMetadata(
            name="my_module",
            version="1.0.0",
            description="Module description",
            category="scanner",
            ...
        )

    async def execute(self, config: Dict, workspace: Path) -> ModuleResult:
        # Module logic here
        findings = [...]
        return self.create_result(findings=findings)

    def validate_config(self, config: Dict) -> bool:
        # Validate configuration
        return True
```

## Submitting a Workflow

```bash
curl -X POST "http://localhost:8000/workflows/security_assessment/submit" \
  -H "Content-Type: application/json" \
  -d '{
    "target_path": "/home/user/project",
    "volume_mode": "ro",
    "parameters": {
      "scanner_config": {"patterns": ["*.py"]},
      "analyzer_config": {"check_secrets": true}
    }
  }'
```

## Getting Findings

```bash
curl "http://localhost:8000/runs/{run_id}/findings"
```

Returns SARIF-formatted findings:

```json
{
  "workflow": "security_assessment",
  "run_id": "abc-123",
  "sarif": {
    "version": "2.1.0",
    "runs": [{
      "tool": {...},
      "results": [...]
    }]
  }
}
```
|
||||
|
||||
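A quick way to work with the response above is to walk the standard SARIF layout: each entry in `sarif.runs` carries a `results` list. A minimal sketch (the sample `tool`/`results` contents are placeholders, not real scanner output):

```python
def count_results(findings: dict) -> int:
    """Count SARIF results across all runs in a findings payload."""
    runs = findings.get("sarif", {}).get("runs", [])
    return sum(len(run.get("results", [])) for run in runs)

sample = {
    "workflow": "security_assessment",
    "run_id": "abc-123",
    "sarif": {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": "demo"}}, "results": [{}, {}]}],
    },
}
print(count_results(sample))  # 2
```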
## Security Considerations

1. **Volume Mounting**: Only allowed directories can be mounted
2. **Read-Only Default**: Volumes are mounted read-only unless explicitly set otherwise
3. **Container Isolation**: Each workflow runs in an isolated container
4. **Resource Limits**: CPU/memory limits can be set via Prefect
5. **Network Isolation**: Containers use bridge networking

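The read-only default corresponds to the `mode` field in a docker-py style volume mapping. A minimal sketch, assuming a `/workspace` bind target inside the container (both the helper name and the bind path are illustrative, not taken from the FuzzForge source):

```python
def build_volume_mapping(target_path: str, volume_mode: str = "ro") -> dict:
    """Build a docker-py style volumes dict, defaulting to read-only."""
    if volume_mode not in ("ro", "rw"):
        raise ValueError(f"unsupported volume_mode: {volume_mode}")
    # docker-py accepts {host_path: {"bind": container_path, "mode": "ro"|"rw"}}
    return {target_path: {"bind": "/workspace", "mode": volume_mode}}

print(build_volume_mapping("/tmp/project"))
# {'/tmp/project': {'bind': '/workspace', 'mode': 'ro'}}
```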
## Development

### Adding a New Workflow

1. Create a directory: `toolbox/workflows/my_workflow/`
2. Add `workflow.py` with a Prefect flow
3. Add the mandatory `metadata.yaml`
4. Restart the backend: `docker-compose restart fuzzforge-backend`

### Adding a New Module

1. Create the module in `toolbox/modules/{category}/`
2. Implement the `BaseModule` interface
3. Use it in workflows via import

122
backend/mcp-config.json
Normal file
@@ -0,0 +1,122 @@
{
  "name": "FuzzForge Security Testing Platform",
  "description": "MCP server for FuzzForge security testing workflows via Docker Compose",
  "version": "0.6.0",
  "connection": {
    "type": "http",
    "host": "localhost",
    "port": 8010,
    "base_url": "http://localhost:8010",
    "mcp_endpoint": "/mcp"
  },
  "docker_compose": {
    "service": "fuzzforge-backend",
    "command": "docker compose up -d",
    "health_check": "http://localhost:8000/health"
  },
  "capabilities": {
    "tools": [
      {
        "name": "submit_security_scan_mcp",
        "description": "Submit a security scanning workflow for execution",
        "parameters": {
          "workflow_name": "string",
          "target_path": "string",
          "volume_mode": "string (ro|rw)",
          "parameters": "object"
        }
      },
      {
        "name": "get_comprehensive_scan_summary",
        "description": "Get a comprehensive summary of scan results with analysis",
        "parameters": {
          "run_id": "string"
        }
      }
    ],
    "fastapi_routes": [
      {
        "method": "GET",
        "path": "/",
        "description": "Get API status and loaded workflows count"
      },
      {
        "method": "GET",
        "path": "/workflows/",
        "description": "List all available security testing workflows"
      },
      {
        "method": "POST",
        "path": "/workflows/{workflow_name}/submit",
        "description": "Submit a security scanning workflow for execution"
      },
      {
        "method": "GET",
        "path": "/runs/{run_id}/status",
        "description": "Get the current status of a security scan run"
      },
      {
        "method": "GET",
        "path": "/runs/{run_id}/findings",
        "description": "Get security findings from a completed scan"
      },
      {
        "method": "GET",
        "path": "/fuzzing/{run_id}/stats",
        "description": "Get fuzzing statistics for a run"
      }
    ]
  },
  "examples": {
    "start_infrastructure_scan": {
      "description": "Run infrastructure security scan on a project",
      "steps": [
        "1. Start Docker Compose: docker compose up -d",
        "2. Submit scan via MCP tool: submit_security_scan_mcp",
        "3. Monitor status and get results"
      ],
      "workflow_name": "infrastructure_scan",
      "target_path": "/Users/tduhamel/Documents/FuzzingLabs/fuzzforge_alpha/test_projects/infrastructure_vulnerable",
      "parameters": {
        "checkov_config": {
          "severity": ["HIGH", "MEDIUM", "LOW"]
        },
        "hadolint_config": {
          "severity": ["error", "warning", "info", "style"]
        }
      }
    },
    "static_analysis_scan": {
      "description": "Run static analysis security scan",
      "workflow_name": "static_analysis_scan",
      "target_path": "/Users/tduhamel/Documents/FuzzingLabs/fuzzforge_alpha/test_projects/static_analysis_vulnerable",
      "parameters": {
        "bandit_config": {
          "severity": ["HIGH", "MEDIUM", "LOW"]
        },
        "opengrep_config": {
          "severity": ["HIGH", "MEDIUM", "LOW"]
        }
      }
    },
    "secret_detection_scan": {
      "description": "Run secret detection scan",
      "workflow_name": "secret_detection_scan",
      "target_path": "/Users/tduhamel/Documents/FuzzingLabs/fuzzforge_alpha/test_projects/secret_detection_vulnerable",
      "parameters": {
        "trufflehog_config": {
          "verified_only": false
        },
        "gitleaks_config": {
          "no_git": true
        }
      }
    }
  },
  "usage": {
    "via_mcp": "Connect MCP client to http://localhost:8010/mcp after starting Docker Compose",
    "via_api": "Use FastAPI endpoints directly at http://localhost:8000",
    "start_system": "docker compose up -d",
    "stop_system": "docker compose down"
  }
}
25
backend/pyproject.toml
Normal file
@@ -0,0 +1,25 @@
[project]
name = "backend"
version = "0.6.0"
description = "FuzzForge OSS backend"
authors = []
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "fastapi>=0.116.1",
    "prefect>=3.4.18",
    "pydantic>=2.0.0",
    "pyyaml>=6.0",
    "docker>=7.0.0",
    "aiofiles>=23.0.0",
    "uvicorn>=0.30.0",
    "aiohttp>=3.12.15",
    "fastmcp",
]

[project.optional-dependencies]
dev = [
    "pytest>=8.0.0",
    "pytest-asyncio>=0.23.0",
    "httpx>=0.27.0",
]
11
backend/src/__init__.py
Normal file
@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

11
backend/src/api/__init__.py
Normal file
@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

325
backend/src/api/fuzzing.py
Normal file
@@ -0,0 +1,325 @@
"""
API endpoints for fuzzing workflow management and real-time monitoring
"""

# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

import logging
from typing import List, Dict, Any
from fastapi import APIRouter, HTTPException, Depends, WebSocket, WebSocketDisconnect
from fastapi.responses import StreamingResponse
import asyncio
import json
from datetime import datetime

from src.models.findings import (
    FuzzingStats,
    CrashReport
)
from src.core.workflow_discovery import WorkflowDiscovery

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/fuzzing", tags=["fuzzing"])

# In-memory storage for real-time stats (in production, use Redis or similar)
fuzzing_stats: Dict[str, FuzzingStats] = {}
crash_reports: Dict[str, List[CrashReport]] = {}
active_connections: Dict[str, List[WebSocket]] = {}


def initialize_fuzzing_tracking(run_id: str, workflow_name: str):
    """
    Initialize fuzzing tracking for a new run.

    This function should be called when a workflow is submitted to enable
    real-time monitoring and stats collection.

    Args:
        run_id: The run identifier
        workflow_name: Name of the workflow
    """
    fuzzing_stats[run_id] = FuzzingStats(
        run_id=run_id,
        workflow=workflow_name
    )
    crash_reports[run_id] = []
    active_connections[run_id] = []


@router.get("/{run_id}/stats", response_model=FuzzingStats)
async def get_fuzzing_stats(run_id: str) -> FuzzingStats:
    """
    Get current fuzzing statistics for a run.

    Args:
        run_id: The fuzzing run ID

    Returns:
        Current fuzzing statistics

    Raises:
        HTTPException: 404 if run not found
    """
    if run_id not in fuzzing_stats:
        raise HTTPException(
            status_code=404,
            detail=f"Fuzzing run not found: {run_id}"
        )

    return fuzzing_stats[run_id]


@router.get("/{run_id}/crashes", response_model=List[CrashReport])
async def get_crash_reports(run_id: str) -> List[CrashReport]:
    """
    Get crash reports for a fuzzing run.

    Args:
        run_id: The fuzzing run ID

    Returns:
        List of crash reports

    Raises:
        HTTPException: 404 if run not found
    """
    if run_id not in crash_reports:
        raise HTTPException(
            status_code=404,
            detail=f"Fuzzing run not found: {run_id}"
        )

    return crash_reports[run_id]


@router.post("/{run_id}/stats")
async def update_fuzzing_stats(run_id: str, stats: FuzzingStats):
    """
    Update fuzzing statistics (called by fuzzing workflows).

    Args:
        run_id: The fuzzing run ID
        stats: Updated statistics

    Raises:
        HTTPException: 404 if run not found
    """
    if run_id not in fuzzing_stats:
        raise HTTPException(
            status_code=404,
            detail=f"Fuzzing run not found: {run_id}"
        )

    # Update stats
    fuzzing_stats[run_id] = stats

    # Debug: log reception for live instrumentation
    try:
        logger.info(
            "Received fuzzing stats update: run_id=%s exec=%s eps=%.2f crashes=%s corpus=%s elapsed=%ss",
            run_id,
            stats.executions,
            stats.executions_per_sec,
            stats.crashes,
            stats.corpus_size,
            stats.elapsed_time,
        )
    except Exception:
        pass

    # Notify connected WebSocket clients
    if run_id in active_connections:
        message = {
            "type": "stats_update",
            "data": stats.model_dump()
        }
        for websocket in active_connections[run_id][:]:  # Copy to avoid modification during iteration
            try:
                await websocket.send_text(json.dumps(message))
            except Exception:
                # Remove disconnected clients
                active_connections[run_id].remove(websocket)


@router.post("/{run_id}/crash")
async def report_crash(run_id: str, crash: CrashReport):
    """
    Report a new crash (called by fuzzing workflows).

    Args:
        run_id: The fuzzing run ID
        crash: Crash report details
    """
    if run_id not in crash_reports:
        crash_reports[run_id] = []

    # Add crash report
    crash_reports[run_id].append(crash)

    # Update stats
    if run_id in fuzzing_stats:
        fuzzing_stats[run_id].crashes += 1
        fuzzing_stats[run_id].last_crash_time = crash.timestamp

    # Notify connected WebSocket clients
    if run_id in active_connections:
        message = {
            "type": "crash_report",
            "data": crash.model_dump()
        }
        for websocket in active_connections[run_id][:]:
            try:
                await websocket.send_text(json.dumps(message))
            except Exception:
                active_connections[run_id].remove(websocket)


@router.websocket("/{run_id}/live")
async def websocket_endpoint(websocket: WebSocket, run_id: str):
    """
    WebSocket endpoint for real-time fuzzing updates.

    Args:
        websocket: WebSocket connection
        run_id: The fuzzing run ID to monitor
    """
    await websocket.accept()

    # Initialize connection tracking
    if run_id not in active_connections:
        active_connections[run_id] = []
    active_connections[run_id].append(websocket)

    try:
        # Send current stats on connection
        if run_id in fuzzing_stats:
            current = fuzzing_stats[run_id]
            if isinstance(current, dict):
                payload = current
            elif hasattr(current, "model_dump"):
                payload = current.model_dump()
            elif hasattr(current, "dict"):
                payload = current.dict()
            else:
                payload = getattr(current, "__dict__", {"run_id": run_id})
            message = {"type": "stats_update", "data": payload}
            await websocket.send_text(json.dumps(message))

        # Keep connection alive
        while True:
            try:
                # Wait for ping or handle disconnect
                data = await asyncio.wait_for(websocket.receive_text(), timeout=30.0)
                # Echo back for ping-pong
                if data == "ping":
                    await websocket.send_text("pong")
            except asyncio.TimeoutError:
                # Send periodic heartbeat
                await websocket.send_text(json.dumps({"type": "heartbeat"}))

    except WebSocketDisconnect:
        # Clean up connection
        if run_id in active_connections and websocket in active_connections[run_id]:
            active_connections[run_id].remove(websocket)
    except Exception as e:
        logger.error(f"WebSocket error for run {run_id}: {e}")
        if run_id in active_connections and websocket in active_connections[run_id]:
            active_connections[run_id].remove(websocket)


@router.get("/{run_id}/stream")
async def stream_fuzzing_updates(run_id: str):
    """
    Server-Sent Events endpoint for real-time fuzzing updates.

    Args:
        run_id: The fuzzing run ID to monitor

    Returns:
        Streaming response with real-time updates
    """
    if run_id not in fuzzing_stats:
        raise HTTPException(
            status_code=404,
            detail=f"Fuzzing run not found: {run_id}"
        )

    async def event_stream():
        """Generate server-sent events for fuzzing updates"""
        last_stats_time = datetime.utcnow()

        while True:
            try:
                # Send current stats
                if run_id in fuzzing_stats:
                    current_stats = fuzzing_stats[run_id]
                    if isinstance(current_stats, dict):
                        stats_payload = current_stats
                    elif hasattr(current_stats, "model_dump"):
                        stats_payload = current_stats.model_dump()
                    elif hasattr(current_stats, "dict"):
                        stats_payload = current_stats.dict()
                    else:
                        stats_payload = getattr(current_stats, "__dict__", {"run_id": run_id})
                    event_data = f"data: {json.dumps({'type': 'stats', 'data': stats_payload})}\n\n"
                    yield event_data

                # Send recent crashes
                if run_id in crash_reports:
                    recent_crashes = [
                        crash for crash in crash_reports[run_id]
                        if crash.timestamp > last_stats_time
                    ]
                    for crash in recent_crashes:
                        event_data = f"data: {json.dumps({'type': 'crash', 'data': crash.model_dump()})}\n\n"
                        yield event_data

                last_stats_time = datetime.utcnow()
                await asyncio.sleep(5)  # Update every 5 seconds

            except Exception as e:
                logger.error(f"Error in event stream for run {run_id}: {e}")
                break

    return StreamingResponse(
        event_stream(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
        }
    )


@router.delete("/{run_id}")
async def cleanup_fuzzing_run(run_id: str):
    """
    Clean up fuzzing run data.

    Args:
        run_id: The fuzzing run ID to clean up
    """
    # Clean up tracking data
    fuzzing_stats.pop(run_id, None)
    crash_reports.pop(run_id, None)

    # Close any active WebSocket connections
    if run_id in active_connections:
        for websocket in active_connections[run_id]:
            try:
                await websocket.close()
            except Exception:
                pass
        del active_connections[run_id]

    return {"message": f"Cleaned up fuzzing run {run_id}"}
184
backend/src/api/runs.py
Normal file
@@ -0,0 +1,184 @@
"""
API endpoints for workflow run management and findings retrieval
"""

# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

import logging
from typing import Dict, Any
from fastapi import APIRouter, HTTPException, Depends

from src.models.findings import WorkflowFindings, WorkflowStatus

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/runs", tags=["runs"])


def get_prefect_manager():
    """Dependency to get the Prefect manager instance"""
    from src.main import prefect_mgr
    return prefect_mgr


@router.get("/{run_id}/status", response_model=WorkflowStatus)
async def get_run_status(
    run_id: str,
    prefect_mgr=Depends(get_prefect_manager)
) -> WorkflowStatus:
    """
    Get the current status of a workflow run.

    Args:
        run_id: The flow run ID

    Returns:
        Status information including state, timestamps, and completion flags

    Raises:
        HTTPException: 404 if run not found
    """
    try:
        status = await prefect_mgr.get_flow_run_status(run_id)

        # Find workflow name from deployment
        workflow_name = "unknown"
        workflow_deployment_id = status.get("workflow", "")
        for name, deployment_id in prefect_mgr.deployments.items():
            if str(deployment_id) == str(workflow_deployment_id):
                workflow_name = name
                break

        return WorkflowStatus(
            run_id=status["run_id"],
            workflow=workflow_name,
            status=status["status"],
            is_completed=status["is_completed"],
            is_failed=status["is_failed"],
            is_running=status["is_running"],
            created_at=status["created_at"],
            updated_at=status["updated_at"]
        )

    except Exception as e:
        logger.error(f"Failed to get status for run {run_id}: {e}")
        raise HTTPException(
            status_code=404,
            detail=f"Run not found: {run_id}"
        )


@router.get("/{run_id}/findings", response_model=WorkflowFindings)
async def get_run_findings(
    run_id: str,
    prefect_mgr=Depends(get_prefect_manager)
) -> WorkflowFindings:
    """
    Get the findings from a completed workflow run.

    Args:
        run_id: The flow run ID

    Returns:
        SARIF-formatted findings from the workflow execution

    Raises:
        HTTPException: 404 if run not found, 400 if run not completed
    """
    try:
        # Get run status first
        status = await prefect_mgr.get_flow_run_status(run_id)

        if not status["is_completed"]:
            if status["is_running"]:
                raise HTTPException(
                    status_code=400,
                    detail=f"Run {run_id} is still running. Current status: {status['status']}"
                )
            elif status["is_failed"]:
                raise HTTPException(
                    status_code=400,
                    detail=f"Run {run_id} failed. Status: {status['status']}"
                )
            else:
                raise HTTPException(
                    status_code=400,
                    detail=f"Run {run_id} not completed. Status: {status['status']}"
                )

        # Get the findings
        findings = await prefect_mgr.get_flow_run_findings(run_id)

        # Find workflow name
        workflow_name = "unknown"
        workflow_deployment_id = status.get("workflow", "")
        for name, deployment_id in prefect_mgr.deployments.items():
            if str(deployment_id) == str(workflow_deployment_id):
                workflow_name = name
                break

        # Get workflow version if available
        metadata = {
            "completion_time": status["updated_at"],
            "workflow_version": "unknown"
        }

        if workflow_name in prefect_mgr.workflows:
            workflow_info = prefect_mgr.workflows[workflow_name]
            metadata["workflow_version"] = workflow_info.metadata.get("version", "unknown")

        return WorkflowFindings(
            workflow=workflow_name,
            run_id=run_id,
            sarif=findings,
            metadata=metadata
        )

    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"Failed to get findings for run {run_id}: {e}")
        raise HTTPException(
            status_code=500,
            detail=f"Failed to retrieve findings: {str(e)}"
        )


@router.get("/{workflow_name}/findings/{run_id}", response_model=WorkflowFindings)
async def get_workflow_findings(
    workflow_name: str,
    run_id: str,
    prefect_mgr=Depends(get_prefect_manager)
) -> WorkflowFindings:
    """
    Get findings for a specific workflow run.

    Alternative endpoint that includes workflow name in the path for clarity.

    Args:
        workflow_name: Name of the workflow
        run_id: The flow run ID

    Returns:
        SARIF-formatted findings from the workflow execution

    Raises:
        HTTPException: 404 if workflow or run not found, 400 if run not completed
    """
    if workflow_name not in prefect_mgr.workflows:
        raise HTTPException(
            status_code=404,
            detail=f"Workflow not found: {workflow_name}"
        )

    # Delegate to the main findings endpoint
    return await get_run_findings(run_id, prefect_mgr)
386
backend/src/api/workflows.py
Normal file
386
backend/src/api/workflows.py
Normal file
@@ -0,0 +1,386 @@
|
||||
"""
|
||||
API endpoints for workflow management with enhanced error handling
|
||||
"""
|
||||
|
||||
# Copyright (c) 2025 FuzzingLabs
|
||||
#
|
||||
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
|
||||
# at the root of this repository for details.
|
||||
#
|
||||
# After the Change Date (four years from publication), this version of the
|
||||
# Licensed Work will be made available under the Apache License, Version 2.0.
|
||||
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Additional attribution and requirements are provided in the NOTICE file.
|
||||
|
||||
import logging
|
||||
import traceback
|
||||
from typing import List, Dict, Any, Optional
|
||||
from fastapi import APIRouter, HTTPException, Depends
|
||||
from pathlib import Path
|
||||
|
||||
from src.models.findings import (
|
||||
WorkflowSubmission,
|
||||
WorkflowMetadata,
|
||||
WorkflowListItem,
|
||||
RunSubmissionResponse
|
||||
)
|
||||
from src.core.workflow_discovery import WorkflowDiscovery
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
router = APIRouter(prefix="/workflows", tags=["workflows"])
|
||||
|
||||
|
||||
def create_structured_error_response(
|
||||
error_type: str,
|
||||
message: str,
|
||||
workflow_name: Optional[str] = None,
|
||||
run_id: Optional[str] = None,
|
||||
container_info: Optional[Dict[str, Any]] = None,
|
||||
deployment_info: Optional[Dict[str, Any]] = None,
|
||||
suggestions: Optional[List[str]] = None
|
||||
) -> Dict[str, Any]:
|
||||
"""Create a structured error response with rich context."""
|
||||
error_response = {
|
||||
"error": {
|
||||
"type": error_type,
|
||||
"message": message,
|
||||
"timestamp": __import__("datetime").datetime.utcnow().isoformat() + "Z"
|
||||
}
|
||||
}
|
||||
|
||||
if workflow_name:
|
||||
error_response["error"]["workflow_name"] = workflow_name
|
||||
|
||||
if run_id:
|
||||
error_response["error"]["run_id"] = run_id
|
||||
|
||||
if container_info:
|
||||
error_response["error"]["container"] = container_info
|
||||
|
||||
if deployment_info:
|
||||
error_response["error"]["deployment"] = deployment_info
|
||||
|
||||
if suggestions:
|
||||
error_response["error"]["suggestions"] = suggestions
|
||||
|
||||
return error_response
|
||||
|
||||
|
||||
def get_prefect_manager():
|
||||
"""Dependency to get the Prefect manager instance"""
|
||||
from src.main import prefect_mgr
|
||||
return prefect_mgr
|
||||
|
||||
|
||||
@router.get("/", response_model=List[WorkflowListItem])
|
||||
async def list_workflows(
|
||||
prefect_mgr=Depends(get_prefect_manager)
|
||||
) -> List[WorkflowListItem]:
|
||||
"""
|
||||
List all discovered workflows with their metadata.
|
||||
|
||||
Returns a summary of each workflow including name, version, description,
|
||||
author, and tags.
|
||||
"""
|
||||
workflows = []
|
||||
for name, info in prefect_mgr.workflows.items():
|
||||
workflows.append(WorkflowListItem(
|
||||
name=name,
|
||||
version=info.metadata.get("version", "0.6.0"),
|
||||
description=info.metadata.get("description", ""),
|
||||
author=info.metadata.get("author"),
|
||||
tags=info.metadata.get("tags", [])
|
||||
))
|
||||
|
||||
return workflows
|
||||
|
||||
|
||||
@router.get("/metadata/schema")
|
||||
async def get_metadata_schema() -> Dict[str, Any]:
|
||||
"""
|
||||
Get the JSON schema for workflow metadata files.
|
||||
|
||||
This schema defines the structure and requirements for metadata.yaml files
|
||||
that must accompany each workflow.
|
||||
"""
|
||||
return WorkflowDiscovery.get_metadata_schema()
|
||||
|
||||
|
||||
@router.get("/{workflow_name}/metadata", response_model=WorkflowMetadata)
|
||||
async def get_workflow_metadata(
|
||||
workflow_name: str,
|
||||
prefect_mgr=Depends(get_prefect_manager)
|
||||
) -> WorkflowMetadata:
|
||||
"""
|
||||
Get complete metadata for a specific workflow.
|
||||
|
||||
Args:
|
||||
workflow_name: Name of the workflow
|
||||
|
||||
Returns:
|
||||
Complete metadata including parameters schema, supported volume modes,
|
||||
required modules, and more.
|
||||
|
||||
Raises:
|
||||
HTTPException: 404 if workflow not found
|
||||
"""
|
||||
if workflow_name not in prefect_mgr.workflows:
|
||||
available_workflows = list(prefect_mgr.workflows.keys())
|
||||
error_response = create_structured_error_response(
|
||||
error_type="WorkflowNotFound",
|
||||
message=f"Workflow '{workflow_name}' not found",
|
||||
workflow_name=workflow_name,
|
||||
suggestions=[
|
||||
f"Available workflows: {', '.join(available_workflows)}",
|
||||
"Use GET /workflows/ to see all available workflows",
|
||||
"Check workflow name spelling and case sensitivity"
|
||||
]
|
||||
)
|
||||
raise HTTPException(
|
||||
status_code=404,
|
||||
detail=error_response
|
||||
)
|
||||
|
||||
info = prefect_mgr.workflows[workflow_name]
|
||||
metadata = info.metadata
|
||||
|
||||
return WorkflowMetadata(
|
||||
name=workflow_name,
|
||||
version=metadata.get("version", "0.6.0"),
|
||||
description=metadata.get("description", ""),
|
||||
author=metadata.get("author"),
|
||||
tags=metadata.get("tags", []),
|
||||
parameters=metadata.get("parameters", {}),
|
||||
default_parameters=metadata.get("default_parameters", {}),
|
||||
required_modules=metadata.get("required_modules", []),
|
||||
supported_volume_modes=metadata.get("supported_volume_modes", ["ro", "rw"]),
|
||||
has_custom_docker=info.has_docker
|
||||
)
|
||||
|
||||
|
||||
@router.post("/{workflow_name}/submit", response_model=RunSubmissionResponse)
|
||||
async def submit_workflow(
|
||||
workflow_name: str,
|
||||
submission: WorkflowSubmission,
|
||||
prefect_mgr=Depends(get_prefect_manager)
|
||||
) -> RunSubmissionResponse:
|
||||
"""
|
||||
Submit a workflow for execution with volume mounting.
|
||||
|
||||
Args:
|
||||
workflow_name: Name of the workflow to execute
|
||||
submission: Submission parameters including target path and volume mode
|
||||
|
||||
Returns:
|
||||
Run submission response with run_id and initial status
|
||||
|
||||
Raises:
|
||||
HTTPException: 404 if workflow not found, 400 for invalid parameters
|
||||
"""
|
||||
if workflow_name not in prefect_mgr.workflows:
|
||||
available_workflows = list(prefect_mgr.workflows.keys())
|
||||
error_response = create_structured_error_response(
|
||||
error_type="WorkflowNotFound",
|
||||
message=f"Workflow '{workflow_name}' not found",
|
||||
workflow_name=workflow_name,
|
||||
suggestions=[
|
||||
f"Available workflows: {', '.join(available_workflows)}",
|
||||
"Use GET /workflows/ to see all available workflows",
|
||||
"Check workflow name spelling and case sensitivity"
|
||||
]
|
||||
)
|
||||
raise HTTPException(
|
||||
status_code=404,
|
||||
detail=error_response
|
||||
)
|
||||
|
||||
try:
|
||||
# Convert ResourceLimits to dict if provided
|
||||
resource_limits_dict = None
|
||||
if submission.resource_limits:
|
||||
resource_limits_dict = {
|
||||
"cpu_limit": submission.resource_limits.cpu_limit,
|
||||
"memory_limit": submission.resource_limits.memory_limit,
|
||||
"cpu_request": submission.resource_limits.cpu_request,
|
||||
"memory_request": submission.resource_limits.memory_request
|
||||
}
|
||||
|
||||
# Submit the workflow with enhanced parameters
|
||||
flow_run = await prefect_mgr.submit_workflow(
|
||||
workflow_name=workflow_name,
|
||||
target_path=submission.target_path,
|
||||
volume_mode=submission.volume_mode,
|
||||
parameters=submission.parameters,
|
||||
resource_limits=resource_limits_dict,
|
||||
additional_volumes=submission.additional_volumes,
|
||||
timeout=submission.timeout
|
||||
)
|
||||
|
||||
run_id = str(flow_run.id)
|
||||
|
||||
# Initialize fuzzing tracking if this looks like a fuzzing workflow
|
||||
workflow_info = prefect_mgr.workflows.get(workflow_name, {})
|
||||
workflow_tags = workflow_info.metadata.get("tags", []) if hasattr(workflow_info, 'metadata') else []
|
||||
if "fuzzing" in workflow_tags or "fuzz" in workflow_name.lower():
|
||||
from src.api.fuzzing import initialize_fuzzing_tracking
|
||||
initialize_fuzzing_tracking(run_id, workflow_name)
|
||||
|
||||
return RunSubmissionResponse(
|
||||
run_id=run_id,
|
||||
status=flow_run.state.name if flow_run.state else "PENDING",
|
||||
workflow=workflow_name,
|
||||
message=f"Workflow '{workflow_name}' submitted successfully"
|
||||
)
|
||||
|
||||
    except ValueError as e:
        # Parameter validation errors
        error_response = create_structured_error_response(
            error_type="ValidationError",
            message=str(e),
            workflow_name=workflow_name,
            suggestions=[
                "Check parameter types and values",
                "Use GET /workflows/{workflow_name}/parameters for schema",
                "Ensure all required parameters are provided"
            ]
        )
        raise HTTPException(status_code=400, detail=error_response)

    except Exception as e:
        logger.error(f"Failed to submit workflow '{workflow_name}': {e}")
        logger.error(f"Traceback: {traceback.format_exc()}")

        # Try to get more context about the error
        container_info = None
        deployment_info = None
        suggestions = []

        error_message = str(e)
        error_type = "WorkflowSubmissionError"

        # Detect specific error patterns
        if "deployment" in error_message.lower():
            error_type = "DeploymentError"
            deployment_info = {
                "status": "failed",
                "error": error_message
            }
            suggestions.extend([
                "Check if Prefect server is running and accessible",
                "Verify Docker is running and has sufficient resources",
                "Check container image availability",
                "Ensure volume paths exist and are accessible"
            ])

        elif "volume" in error_message.lower() or "mount" in error_message.lower():
            error_type = "VolumeError"
            suggestions.extend([
                "Check if the target path exists and is accessible",
                "Verify file permissions (Docker needs read access)",
                "Ensure the path is not in use by another process",
                "Try using an absolute path instead of relative path"
            ])

        elif "memory" in error_message.lower() or "resource" in error_message.lower():
            error_type = "ResourceError"
            suggestions.extend([
                "Check system memory and CPU availability",
                "Consider reducing resource limits or dataset size",
                "Monitor Docker resource usage",
                "Increase Docker memory limits if needed"
            ])

        elif "image" in error_message.lower():
            error_type = "ImageError"
            suggestions.extend([
                "Check if the workflow image exists",
                "Verify Docker registry access",
                "Try rebuilding the workflow image",
                "Check network connectivity to registries"
            ])

        else:
            suggestions.extend([
                "Check FuzzForge backend logs for details",
                "Verify all services are running (docker-compose up -d)",
                "Try restarting the workflow deployment",
                "Contact support if the issue persists"
            ])

        error_response = create_structured_error_response(
            error_type=error_type,
            message=f"Failed to submit workflow: {error_message}",
            workflow_name=workflow_name,
            container_info=container_info,
            deployment_info=deployment_info,
            suggestions=suggestions
        )

        raise HTTPException(
            status_code=500,
            detail=error_response
        )

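The substring checks in the exception handler above can be factored into a small pure function for review purposes; this is a sketch mirroring the handler's logic (the `classify_submission_error` name is illustrative, not part of the codebase):

```python
def classify_submission_error(error_message: str) -> str:
    """Map an error message to a coarse error type, mirroring the
    ordered substring checks used in the exception handler above."""
    msg = error_message.lower()
    if "deployment" in msg:
        return "DeploymentError"
    if "volume" in msg or "mount" in msg:
        return "VolumeError"
    if "memory" in msg or "resource" in msg:
        return "ResourceError"
    if "image" in msg:
        return "ImageError"
    return "WorkflowSubmissionError"

print(classify_submission_error("failed to mount /tmp"))  # VolumeError
```

Note that ordering matters: "deployment" is checked first, so a message mentioning both "deployment" and "volume" is classified as `DeploymentError`.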
@router.get("/{workflow_name}/parameters")
async def get_workflow_parameters(
    workflow_name: str,
    prefect_mgr=Depends(get_prefect_manager)
) -> Dict[str, Any]:
    """
    Get the parameters schema for a workflow.

    Args:
        workflow_name: Name of the workflow

    Returns:
        Parameters schema with types, descriptions, and defaults

    Raises:
        HTTPException: 404 if workflow not found
    """
    if workflow_name not in prefect_mgr.workflows:
        available_workflows = list(prefect_mgr.workflows.keys())
        error_response = create_structured_error_response(
            error_type="WorkflowNotFound",
            message=f"Workflow '{workflow_name}' not found",
            workflow_name=workflow_name,
            suggestions=[
                f"Available workflows: {', '.join(available_workflows)}",
                "Use GET /workflows/ to see all available workflows"
            ]
        )
        raise HTTPException(
            status_code=404,
            detail=error_response
        )

    info = prefect_mgr.workflows[workflow_name]
    metadata = info.metadata

    # Return parameters with enhanced schema information
    parameters_schema = metadata.get("parameters", {})

    # Extract the actual parameter definitions from JSON schema structure
    if "properties" in parameters_schema:
        param_definitions = parameters_schema["properties"]
    else:
        param_definitions = parameters_schema

    # Add default values to the schema
    default_params = metadata.get("default_parameters", {})
    for param_name, param_schema in param_definitions.items():
        if isinstance(param_schema, dict) and param_name in default_params:
            param_schema["default"] = default_params[param_name]

    return {
        "workflow": workflow_name,
        "parameters": param_definitions,
        "default_parameters": default_params,
        "required_parameters": [
            name for name, schema in param_definitions.items()
            if isinstance(schema, dict) and schema.get("required", False)
        ]
    }
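The schema handling in this endpoint can be exercised in isolation; a minimal sketch with a hypothetical workflow schema (the parameter names `iterations` and `dictionary` are invented for illustration):

```python
# Hypothetical workflow metadata, shaped like the JSON-schema dict the endpoint reads
parameters_schema = {
    "properties": {
        "iterations": {"type": "integer", "required": True},
        "dictionary": {"type": "string"},
    }
}
default_params = {"iterations": 1000}

# Unwrap the "properties" level if present, as the endpoint does
param_definitions = parameters_schema.get("properties", parameters_schema)

# Fold defaults into each parameter's schema
for name, schema in param_definitions.items():
    if isinstance(schema, dict) and name in default_params:
        schema["default"] = default_params[name]

required = [
    name for name, schema in param_definitions.items()
    if isinstance(schema, dict) and schema.get("required", False)
]

print(param_definitions["iterations"]["default"])  # 1000
print(required)  # ['iterations']
```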
11
backend/src/core/__init__.py
Normal file
@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
770
backend/src/core/prefect_manager.py
Normal file
@@ -0,0 +1,770 @@
"""
Prefect Manager - Core orchestration for workflow deployment and execution
"""

# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

import logging
import os
import platform
import re
from pathlib import Path
from typing import Dict, Optional, Any
from prefect import get_client
from prefect.docker import DockerImage
from prefect.client.schemas import FlowRun

from src.core.workflow_discovery import WorkflowDiscovery, WorkflowInfo

logger = logging.getLogger(__name__)


def get_registry_url(context: str = "default") -> str:
    """
    Get the container registry URL to use for a given operation context.

    Goals:
    - Work reliably across Linux and macOS Docker Desktop
    - Prefer in-network service discovery when running inside containers
    - Allow full override via env vars from docker-compose

    Env overrides:
    - FUZZFORGE_REGISTRY_PUSH_URL: used for image builds/pushes
    - FUZZFORGE_REGISTRY_PULL_URL: used for workers to pull images
    """
    # Normalize context
    ctx = (context or "default").lower()

    # Always honor explicit overrides first
    if ctx in ("push", "build"):
        push_url = os.getenv("FUZZFORGE_REGISTRY_PUSH_URL")
        if push_url:
            logger.debug("Using FUZZFORGE_REGISTRY_PUSH_URL: %s", push_url)
            return push_url
        # Default to host-published registry for Docker daemon operations
        return "localhost:5001"

    if ctx == "pull":
        pull_url = os.getenv("FUZZFORGE_REGISTRY_PULL_URL")
        if pull_url:
            logger.debug("Using FUZZFORGE_REGISTRY_PULL_URL: %s", pull_url)
            return pull_url
        # Prefect worker pulls via host Docker daemon as well
        return "localhost:5001"

    # Default/fallback
    return os.getenv("FUZZFORGE_REGISTRY_PULL_URL", os.getenv("FUZZFORGE_REGISTRY_PUSH_URL", "localhost:5001"))


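The resolution order above (explicit env-var override first, then the `localhost:5001` default) can be sketched standalone; this simplified version drops the logging but keeps the same precedence rules:

```python
import os

def resolve_registry(context: str = "default") -> str:
    """Simplified version of get_registry_url's precedence rules."""
    ctx = (context or "default").lower()
    if ctx in ("push", "build"):
        return os.getenv("FUZZFORGE_REGISTRY_PUSH_URL", "localhost:5001")
    if ctx == "pull":
        return os.getenv("FUZZFORGE_REGISTRY_PULL_URL", "localhost:5001")
    # Default/fallback: pull override wins over push override
    return os.getenv(
        "FUZZFORGE_REGISTRY_PULL_URL",
        os.getenv("FUZZFORGE_REGISTRY_PUSH_URL", "localhost:5001"),
    )

os.environ.pop("FUZZFORGE_REGISTRY_PUSH_URL", None)
print(resolve_registry("push"))  # localhost:5001
os.environ["FUZZFORGE_REGISTRY_PUSH_URL"] = "registry:5000"
print(resolve_registry("push"))  # registry:5000
```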
def _compose_project_name(default: str = "fuzzforge") -> str:
    """Return the docker-compose project name used for network/volume naming.

    Always returns 'fuzzforge' regardless of environment variables.
    """
    return "fuzzforge"


class PrefectManager:
    """
    Manages Prefect deployments and flow runs for discovered workflows.

    This class handles:
    - Workflow discovery and registration
    - Docker image building through Prefect
    - Deployment creation and management
    - Flow run submission with volume mounting
    - Findings retrieval from completed runs
    """

    def __init__(self, workflows_dir: Path = None):
        """
        Initialize the Prefect manager.

        Args:
            workflows_dir: Path to the workflows directory (default: toolbox/workflows)
        """
        if workflows_dir is None:
            workflows_dir = Path("toolbox/workflows")

        self.discovery = WorkflowDiscovery(workflows_dir)
        self.workflows: Dict[str, WorkflowInfo] = {}
        self.deployments: Dict[str, str] = {}  # workflow_name -> deployment_id

        # Security: Define allowed and forbidden paths for host mounting
        self.allowed_base_paths = [
            "/tmp",
            "/home",
            "/Users",  # macOS users
            "/opt",
            "/var/tmp",
            "/workspace",  # Common container workspace
            "/app"  # Container application directory (for test projects)
        ]

        self.forbidden_paths = [
            "/etc",
            "/root",
            "/var/run",
            "/sys",
            "/proc",
            "/dev",
            "/boot",
            "/var/lib/docker",  # Critical Docker data
            "/var/log",  # System logs
            "/usr/bin",  # System binaries
            "/usr/sbin",
            "/sbin",
            "/bin"
        ]

    @staticmethod
    def _parse_memory_to_bytes(memory_str: str) -> int:
        """
        Parse memory string (like '512Mi', '1Gi') to bytes.

        Args:
            memory_str: Memory string with unit suffix

        Returns:
            Memory in bytes

        Raises:
            ValueError: If format is invalid
        """
        if not memory_str:
            return 0

        match = re.match(r'^(\d+(?:\.\d+)?)\s*([GMK]i?)$', memory_str.strip())
        if not match:
            raise ValueError(f"Invalid memory format: {memory_str}. Expected format like '512Mi', '1Gi'")

        value, unit = match.groups()
        value = float(value)

        # Convert to bytes based on unit (binary units: Ki, Mi, Gi)
        if unit in ['K', 'Ki']:
            multiplier = 1024
        elif unit in ['M', 'Mi']:
            multiplier = 1024 * 1024
        elif unit in ['G', 'Gi']:
            multiplier = 1024 * 1024 * 1024
        else:
            raise ValueError(f"Unsupported memory unit: {unit}")

        return int(value * multiplier)

    @staticmethod
    def _parse_cpu_to_millicores(cpu_str: str) -> int:
        """
        Parse CPU string (like '500m', '1', '2.5') to millicores.

        Args:
            cpu_str: CPU string

        Returns:
            CPU in millicores (1 core = 1000 millicores)

        Raises:
            ValueError: If format is invalid
        """
        if not cpu_str:
            return 0

        cpu_str = cpu_str.strip()

        # Handle millicores format (e.g., '500m')
        if cpu_str.endswith('m'):
            try:
                return int(cpu_str[:-1])
            except ValueError:
                raise ValueError(f"Invalid CPU format: {cpu_str}")

        # Handle core format (e.g., '1', '2.5')
        try:
            cores = float(cpu_str)
            return int(cores * 1000)  # Convert to millicores
        except ValueError:
            raise ValueError(f"Invalid CPU format: {cpu_str}")

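The two parsers above follow Kubernetes-style conventions: binary memory units (Ki/Mi/Gi) and millicore CPU strings. A standalone sketch mirroring their arithmetic:

```python
import re

def parse_memory_to_bytes(memory_str: str) -> int:
    """Binary-unit parser mirroring _parse_memory_to_bytes ('512Mi' -> bytes)."""
    match = re.match(r'^(\d+(?:\.\d+)?)\s*([GMK]i?)$', memory_str.strip())
    if not match:
        raise ValueError(f"Invalid memory format: {memory_str}")
    value, unit = match.groups()
    multiplier = {'K': 1024, 'Ki': 1024,
                  'M': 1024 ** 2, 'Mi': 1024 ** 2,
                  'G': 1024 ** 3, 'Gi': 1024 ** 3}[unit]
    return int(float(value) * multiplier)

def parse_cpu_to_millicores(cpu_str: str) -> int:
    """'500m' -> 500 millicores; '2.5' cores -> 2500 millicores."""
    cpu_str = cpu_str.strip()
    if cpu_str.endswith('m'):
        return int(cpu_str[:-1])
    return int(float(cpu_str) * 1000)

print(parse_memory_to_bytes("512Mi"))   # 536870912
print(parse_cpu_to_millicores("500m"))  # 500
print(parse_cpu_to_millicores("2.5"))   # 2500
```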
    def _extract_resource_requirements(self, workflow_info: WorkflowInfo) -> Dict[str, str]:
        """
        Extract resource requirements from workflow metadata.

        Args:
            workflow_info: Workflow information with metadata

        Returns:
            Dictionary with resource requirements in Docker format
        """
        metadata = workflow_info.metadata
        requirements = metadata.get("requirements", {})
        resources = requirements.get("resources", {})

        resource_config = {}

        # Extract memory requirement
        memory = resources.get("memory")
        if memory:
            try:
                # Validate memory format and store original string for Docker
                self._parse_memory_to_bytes(memory)
                resource_config["memory"] = memory
            except ValueError as e:
                logger.warning(f"Invalid memory requirement in {workflow_info.name}: {e}")

        # Extract CPU requirement
        cpu = resources.get("cpu")
        if cpu:
            try:
                # Validate CPU format and store original string for Docker
                self._parse_cpu_to_millicores(cpu)
                resource_config["cpus"] = cpu
            except ValueError as e:
                logger.warning(f"Invalid CPU requirement in {workflow_info.name}: {e}")

        # Extract timeout
        timeout = resources.get("timeout")
        if timeout and isinstance(timeout, int):
            resource_config["timeout"] = str(timeout)

        return resource_config

    async def initialize(self):
        """
        Initialize the manager by discovering and deploying all workflows.

        This method:
        1. Discovers all valid workflows in the workflows directory
        2. Validates their metadata
        3. Deploys each workflow to Prefect with Docker images
        """
        try:
            # Discover workflows
            self.workflows = await self.discovery.discover_workflows()

            if not self.workflows:
                logger.warning("No workflows discovered")
                return

            logger.info(f"Discovered {len(self.workflows)} workflows: {list(self.workflows.keys())}")

            # Deploy each workflow
            for name, info in self.workflows.items():
                try:
                    await self._deploy_workflow(name, info)
                except Exception as e:
                    logger.error(f"Failed to deploy workflow '{name}': {e}")

        except Exception as e:
            logger.error(f"Failed to initialize Prefect manager: {e}")
            raise

    async def _deploy_workflow(self, name: str, info: WorkflowInfo):
        """
        Deploy a single workflow to Prefect with Docker image.

        Args:
            name: Workflow name
            info: Workflow information including metadata and paths
        """
        logger.info(f"Deploying workflow '{name}'...")

        # Get the flow function from registry
        flow_func = self.discovery.get_flow_function(name)
        if not flow_func:
            logger.error(
                f"Failed to get flow function for '{name}' from registry. "
                f"Ensure the workflow is properly registered in toolbox/workflows/registry.py"
            )
            return

        # Use the mandatory Dockerfile with absolute paths for Docker Compose
        # Get absolute paths for build context and dockerfile
        toolbox_path = info.path.parent.parent.resolve()
        dockerfile_abs_path = info.dockerfile.resolve()

        # Calculate relative dockerfile path from toolbox context
        try:
            dockerfile_rel_path = dockerfile_abs_path.relative_to(toolbox_path)
        except ValueError:
            # If relative path fails, use the workflow-specific path
            dockerfile_rel_path = Path("workflows") / name / "Dockerfile"

        # Determine deployment strategy based on Dockerfile presence
        base_image = "prefecthq/prefect:3-python3.11"
        has_custom_dockerfile = info.has_docker and info.dockerfile.exists()

        logger.info(f"=== DEPLOYMENT DEBUG for '{name}' ===")
        logger.info(f"info.has_docker: {info.has_docker}")
        logger.info(f"info.dockerfile: {info.dockerfile}")
        logger.info(f"info.dockerfile.exists(): {info.dockerfile.exists()}")
        logger.info(f"has_custom_dockerfile: {has_custom_dockerfile}")
        logger.info(f"toolbox_path: {toolbox_path}")
        logger.info(f"dockerfile_rel_path: {dockerfile_rel_path}")

        if has_custom_dockerfile:
            logger.info(f"Workflow '{name}' has custom Dockerfile - building custom image")
            # Decide whether to use registry or keep images local to host engine
            # Default to using the local registry; set FUZZFORGE_USE_REGISTRY=false to bypass (not recommended)
            use_registry = os.getenv("FUZZFORGE_USE_REGISTRY", "true").lower() == "true"

            if use_registry:
                registry_url = get_registry_url(context="push")
                image_spec = DockerImage(
                    name=f"{registry_url}/fuzzforge/{name}",
                    tag="latest",
                    dockerfile=str(dockerfile_rel_path),
                    context=str(toolbox_path)
                )
                deploy_image = f"{registry_url}/fuzzforge/{name}:latest"
                build_custom = True
                push_custom = True
                logger.info(f"Using registry: {registry_url} for '{name}'")
            else:
                # Single-host mode: build into host engine cache; no push required
                image_spec = DockerImage(
                    name=f"fuzzforge/{name}",
                    tag="latest",
                    dockerfile=str(dockerfile_rel_path),
                    context=str(toolbox_path)
                )
                deploy_image = f"fuzzforge/{name}:latest"
                build_custom = True
                push_custom = False
                logger.info("Using single-host image (no registry push): %s", deploy_image)
        else:
            logger.info(f"Workflow '{name}' using base image - no custom dependencies needed")
            deploy_image = base_image
            build_custom = False
            push_custom = False

        # Pre-validate registry connectivity when pushing
        if push_custom:
            try:
                from .setup import validate_registry_connectivity
                await validate_registry_connectivity(registry_url)
                logger.info(f"Registry connectivity validated for {registry_url}")
            except Exception as e:
                logger.error(f"Registry connectivity validation failed for {registry_url}: {e}")
                raise RuntimeError(f"Cannot deploy workflow '{name}': Registry {registry_url} is not accessible. {e}")

        # Deploy the workflow
        try:
            # Ensure any previous deployment is removed so job variables are updated
            try:
                async with get_client() as client:
                    existing = await client.read_deployment_by_name(
                        f"{name}/{name}-deployment"
                    )
                    if existing:
                        logger.info(f"Removing existing deployment for '{name}' to refresh settings...")
                        await client.delete_deployment(existing.id)
            except Exception:
                # If not found or deletion fails, continue with deployment
                pass

            # Extract resource requirements from metadata
            workflow_resource_requirements = self._extract_resource_requirements(info)
            logger.info(f"Workflow '{name}' resource requirements: {workflow_resource_requirements}")

            # Build job variables with resource requirements
            job_variables = {
                "image": deploy_image,  # Use the worker-accessible registry name
                "volumes": [],  # Populated at run submission with toolbox mount
                "env": {
                    "PYTHONPATH": "/opt/prefect/toolbox:/opt/prefect",
                    "WORKFLOW_NAME": name
                }
            }

            # Add resource requirements to job variables if present
            if workflow_resource_requirements:
                job_variables["resources"] = workflow_resource_requirements

            # Prepare deployment parameters
            deploy_params = {
                "name": f"{name}-deployment",
                "work_pool_name": "docker-pool",
                "image": image_spec if has_custom_dockerfile else deploy_image,
                "push": push_custom,
                "build": build_custom,
                "job_variables": job_variables
            }

            deployment = await flow_func.deploy(**deploy_params)

            self.deployments[name] = str(deployment.id) if hasattr(deployment, 'id') else name
            logger.info(f"Successfully deployed workflow '{name}'")

        except Exception as e:
            # Enhanced error reporting with more context
            import traceback
            logger.error(f"Failed to deploy workflow '{name}': {e}")
            logger.error(f"Deployment traceback: {traceback.format_exc()}")

            # Try to capture Docker-specific context
            error_context = {
                "workflow_name": name,
                "has_dockerfile": has_custom_dockerfile,
                "image_name": deploy_image if 'deploy_image' in locals() else "unknown",
                "registry_url": registry_url if 'registry_url' in locals() else "unknown",
                "error_type": type(e).__name__,
                "error_message": str(e)
            }

            # Check for specific error patterns with detailed categorization
            error_msg_lower = str(e).lower()
            if "registry" in error_msg_lower and ("no such host" in error_msg_lower or "connection" in error_msg_lower):
                error_context["category"] = "registry_connectivity_error"
                error_context["solution"] = f"Cannot reach registry at {error_context['registry_url']}. Check Docker network and registry service."
            elif "docker" in error_msg_lower:
                error_context["category"] = "docker_error"
                if "build" in error_msg_lower:
                    error_context["subcategory"] = "image_build_failed"
                    error_context["solution"] = "Check Dockerfile syntax and dependencies."
                elif "pull" in error_msg_lower:
                    error_context["subcategory"] = "image_pull_failed"
                    error_context["solution"] = "Check if image exists in registry and network connectivity."
                elif "push" in error_msg_lower:
                    error_context["subcategory"] = "image_push_failed"
                    error_context["solution"] = f"Check registry connectivity and push permissions to {error_context['registry_url']}."
            elif "registry" in error_msg_lower:
                error_context["category"] = "registry_error"
                error_context["solution"] = "Check registry configuration and accessibility."
            elif "prefect" in error_msg_lower:
                error_context["category"] = "prefect_error"
                error_context["solution"] = "Check Prefect server connectivity and deployment configuration."
            else:
                error_context["category"] = "unknown_deployment_error"
                error_context["solution"] = "Check logs for more specific error details."

            logger.error(f"Deployment error context: {error_context}")

            # Raise enhanced exception with context
            enhanced_error = Exception(f"Deployment failed for workflow '{name}': {str(e)} | Context: {error_context}")
            enhanced_error.original_error = e
            enhanced_error.context = error_context
            raise enhanced_error

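The image naming convention used during deployment can be shown standalone: with the registry enabled, images are tagged `<registry>/fuzzforge/<workflow>:latest`; in single-host mode they stay `fuzzforge/<workflow>:latest`. A sketch (the workflow name `security_assessment` is invented for illustration):

```python
from typing import Optional

def image_reference(workflow: str, registry_url: Optional[str]) -> str:
    """Build the image tag a workflow is deployed under."""
    if registry_url:
        return f"{registry_url}/fuzzforge/{workflow}:latest"
    return f"fuzzforge/{workflow}:latest"

print(image_reference("security_assessment", "localhost:5001"))
# localhost:5001/fuzzforge/security_assessment:latest
print(image_reference("security_assessment", None))
# fuzzforge/security_assessment:latest
```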
    async def submit_workflow(
        self,
        workflow_name: str,
        target_path: str,
        volume_mode: str = "ro",
        parameters: Dict[str, Any] = None,
        resource_limits: Dict[str, str] = None,
        additional_volumes: list = None,
        timeout: int = None
    ) -> FlowRun:
        """
        Submit a workflow for execution with volume mounting.

        Args:
            workflow_name: Name of the workflow to execute
            target_path: Host path to mount as volume
            volume_mode: Volume mount mode ("ro" for read-only, "rw" for read-write)
            parameters: Workflow-specific parameters
            resource_limits: CPU/memory limits for container
            additional_volumes: List of additional volume mounts
            timeout: Timeout in seconds

        Returns:
            FlowRun object with run information

        Raises:
            ValueError: If workflow not found or volume mode not supported
        """
        if workflow_name not in self.workflows:
            raise ValueError(f"Unknown workflow: {workflow_name}")

        # Validate volume mode
        workflow_info = self.workflows[workflow_name]
        supported_modes = workflow_info.metadata.get("supported_volume_modes", ["ro", "rw"])

        if volume_mode not in supported_modes:
            raise ValueError(
                f"Workflow '{workflow_name}' doesn't support volume mode '{volume_mode}'. "
                f"Supported modes: {supported_modes}"
            )

        # Validate target path with security checks
        self._validate_target_path(target_path)

        # Validate additional volumes if provided
        if additional_volumes:
            for volume in additional_volumes:
                self._validate_target_path(volume.host_path)

        async with get_client() as client:
            # Get the deployment, auto-redeploy once if missing
            try:
                deployment = await client.read_deployment_by_name(
                    f"{workflow_name}/{workflow_name}-deployment"
                )
            except Exception as e:
                import traceback
                logger.error(f"Failed to find deployment for workflow '{workflow_name}': {e}")
                logger.error(f"Deployment lookup traceback: {traceback.format_exc()}")

                # Attempt a one-time auto-deploy to recover from startup races
                try:
                    logger.info(f"Auto-deploying missing workflow '{workflow_name}' and retrying...")
                    await self._deploy_workflow(workflow_name, workflow_info)
                    deployment = await client.read_deployment_by_name(
                        f"{workflow_name}/{workflow_name}-deployment"
                    )
                except Exception as redeploy_exc:
                    # Enhanced error with context
                    error_context = {
                        "workflow_name": workflow_name,
                        "error_type": type(e).__name__,
                        "error_message": str(e),
                        "redeploy_error": str(redeploy_exc),
                        "available_deployments": list(self.deployments.keys()),
                    }
                    enhanced_error = ValueError(
                        f"Deployment not found and redeploy failed for workflow '{workflow_name}': {e} | Context: {error_context}"
                    )
                    enhanced_error.context = error_context
                    raise enhanced_error

            # Determine the Docker Compose network name and volume names
            # Hardcoded to 'fuzzforge' to avoid directory name dependencies
            compose_project = "fuzzforge"
            docker_network = "fuzzforge_default"

            # Build volume mounts
            # Add toolbox volume mount for workflow code access
            backend_toolbox_path = "/app/toolbox"  # Path in backend container

            # Hardcoded volume names
            prefect_storage_volume = "fuzzforge_prefect_storage"
            toolbox_code_volume = "fuzzforge_toolbox_code"

            volumes = [
                f"{target_path}:/workspace:{volume_mode}",
                f"{prefect_storage_volume}:/prefect-storage",  # Shared storage for results
                f"{toolbox_code_volume}:/opt/prefect/toolbox:ro"  # Mount workflow code
            ]

            # Add additional volumes if provided
            if additional_volumes:
                for volume in additional_volumes:
                    volume_spec = f"{volume.host_path}:{volume.container_path}:{volume.mode}"
                    volumes.append(volume_spec)

            # Build environment variables
            env_vars = {
                "PREFECT_API_URL": "http://prefect-server:4200/api",  # Use internal network hostname
                "PREFECT_LOGGING_LEVEL": "INFO",
                "PREFECT_LOCAL_STORAGE_PATH": "/prefect-storage",  # Use shared storage
                "PREFECT_RESULTS_PERSIST_BY_DEFAULT": "true",  # Enable result persistence
                "PREFECT_DEFAULT_RESULT_STORAGE_BLOCK": "local-file-system/fuzzforge-results",  # Use our storage block
                "WORKSPACE_PATH": "/workspace",
                "VOLUME_MODE": volume_mode,
                "WORKFLOW_NAME": workflow_name
            }

            # Add additional volume paths to environment for easy access
            if additional_volumes:
                for i, volume in enumerate(additional_volumes):
                    env_vars[f"ADDITIONAL_VOLUME_{i}_PATH"] = volume.container_path

            # Determine which image to use based on workflow configuration
            workflow_info = self.workflows[workflow_name]
            has_custom_dockerfile = workflow_info.has_docker and workflow_info.dockerfile.exists()
            # Use pull context for worker to pull from registry
            registry_url = get_registry_url(context="pull")
            workflow_image = f"{registry_url}/fuzzforge/{workflow_name}:latest" if has_custom_dockerfile else "prefecthq/prefect:3-python3.11"
            logger.debug(f"Worker will pull image: {workflow_image} (Registry: {registry_url})")

            # Configure job variables with volume mounting and network access
            job_variables = {
                # Use custom image if available, otherwise base Prefect image
                "image": workflow_image,
                "volumes": volumes,
                "networks": [docker_network],  # Connect to Docker Compose network
                "env": {
                    **env_vars,
                    "PYTHONPATH": "/opt/prefect/toolbox:/opt/prefect/toolbox/workflows",
                    "WORKFLOW_NAME": workflow_name
                }
            }

            # Apply resource requirements from workflow metadata and user overrides
            workflow_resource_requirements = self._extract_resource_requirements(workflow_info)
            final_resource_config = {}

            # Start with workflow requirements as base
            if workflow_resource_requirements:
                final_resource_config.update(workflow_resource_requirements)

            # Apply user-provided resource limits (overrides workflow defaults)
            if resource_limits:
                user_resource_config = {}
                if resource_limits.get("cpu_limit"):
                    user_resource_config["cpus"] = resource_limits["cpu_limit"]
                if resource_limits.get("memory_limit"):
                    user_resource_config["memory"] = resource_limits["memory_limit"]
                # Note: cpu_request and memory_request are not directly supported by Docker
                # but could be used for Kubernetes in the future

                # User overrides take precedence
                final_resource_config.update(user_resource_config)

            # Apply final resource configuration
            if final_resource_config:
                job_variables["resources"] = final_resource_config
                logger.info(f"Applied resource limits: {final_resource_config}")

            # Merge parameters with defaults from metadata
            default_params = workflow_info.metadata.get("default_parameters", {})
            final_params = {**default_params, **(parameters or {})}

            # Set flow parameters that match the flow signature
            final_params["target_path"] = "/workspace"  # Container path where volume is mounted
            final_params["volume_mode"] = volume_mode

            # Create and submit the flow run
            # Pass job_variables to ensure network, volumes, and environment are configured
            logger.info(f"Submitting flow with job_variables: {job_variables}")
            logger.info(f"Submitting flow with parameters: {final_params}")

            # Prepare flow run creation parameters
            flow_run_params = {
                "deployment_id": deployment.id,
                "parameters": final_params,
                "job_variables": job_variables
            }

            # Note: Timeout is handled through workflow-level configuration
            # Additional timeout configuration can be added to deployment metadata if needed

            flow_run = await client.create_flow_run_from_deployment(**flow_run_params)

            logger.info(
                f"Submitted workflow '{workflow_name}' with run_id: {flow_run.id}, "
                f"target: {target_path}, mode: {volume_mode}"
            )

            return flow_run

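The mount strings built in `submit_workflow` follow Docker's `host:container:mode` bind syntax; a minimal sketch of the three standard mounts (volume names mirror the hardcoded ones in the method):

```python
def build_volume_specs(target_path: str, volume_mode: str = "ro") -> list:
    """Reproduce the three standard mounts submit_workflow configures."""
    return [
        f"{target_path}:/workspace:{volume_mode}",          # analysis target
        "fuzzforge_prefect_storage:/prefect-storage",       # shared results storage
        "fuzzforge_toolbox_code:/opt/prefect/toolbox:ro",   # read-only workflow code
    ]

specs = build_volume_specs("/tmp/my_project", "rw")
print(specs[0])  # /tmp/my_project:/workspace:rw
```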
    async def get_flow_run_findings(self, run_id: str) -> Dict[str, Any]:
        """
        Retrieve findings from a completed flow run.

        Args:
            run_id: The flow run ID

        Returns:
            Dictionary containing SARIF-formatted findings

        Raises:
            ValueError: If run not completed or not found
        """
        async with get_client() as client:
            flow_run = await client.read_flow_run(run_id)

            if not flow_run.state.is_completed():
                raise ValueError(
                    f"Flow run {run_id} not completed. Current status: {flow_run.state.name}"
                )

            # Get the findings from the flow run result
            try:
                findings = await flow_run.state.result()
                return findings
            except Exception as e:
                logger.error(f"Failed to retrieve findings for run {run_id}: {e}")
                raise ValueError(f"Failed to retrieve findings: {e}")

async def get_flow_run_status(self, run_id: str) -> Dict[str, Any]:
|
||||
"""
|
||||
Get the current status of a flow run.
|
||||
|
||||
Args:
|
||||
run_id: The flow run ID
|
||||
|
||||
Returns:
|
||||
Dictionary with status information
|
||||
"""
|
||||
async with get_client() as client:
|
||||
flow_run = await client.read_flow_run(run_id)
|
||||
|
||||
return {
|
||||
"run_id": str(flow_run.id),
|
||||
"workflow": flow_run.deployment_id,
|
||||
"status": flow_run.state.name,
|
||||
"is_completed": flow_run.state.is_completed(),
|
||||
"is_failed": flow_run.state.is_failed(),
|
||||
"is_running": flow_run.state.is_running(),
|
||||
"created_at": flow_run.created,
|
||||
"updated_at": flow_run.updated
|
||||
}
|
||||
|
||||
def _validate_target_path(self, target_path: str) -> None:
|
||||
"""
|
||||
Validate target path for security before mounting as volume.
|
||||
|
||||
Args:
|
||||
target_path: Host path to validate
|
||||
|
||||
Raises:
|
||||
ValueError: If path is not allowed for security reasons
|
||||
"""
|
||||
target = Path(target_path)
|
||||
|
||||
# Path must be absolute
|
||||
if not target.is_absolute():
|
||||
raise ValueError(f"Target path must be absolute: {target_path}")
|
||||
|
||||
# Resolve path to handle symlinks and relative components
|
||||
try:
|
||||
resolved_path = target.resolve()
|
||||
except (OSError, RuntimeError) as e:
|
||||
raise ValueError(f"Cannot resolve target path: {target_path} - {e}")
|
||||
|
||||
resolved_str = str(resolved_path)
|
||||
|
||||
# Check against forbidden paths first (more restrictive)
|
||||
for forbidden in self.forbidden_paths:
|
||||
if resolved_str.startswith(forbidden):
|
||||
raise ValueError(
|
||||
f"Access denied: Path '{target_path}' resolves to forbidden directory '{forbidden}'. "
|
||||
f"This path contains sensitive system files and cannot be mounted."
|
||||
)
|
||||
|
||||
# Check if path starts with any allowed base path
|
||||
path_allowed = False
|
||||
for allowed in self.allowed_base_paths:
|
||||
if resolved_str.startswith(allowed):
|
||||
path_allowed = True
|
||||
break
|
||||
|
||||
if not path_allowed:
|
||||
allowed_list = ", ".join(self.allowed_base_paths)
|
||||
raise ValueError(
|
||||
f"Access denied: Path '{target_path}' is not in allowed directories. "
|
||||
f"Allowed base paths: {allowed_list}"
|
||||
)
|
||||
|
||||
# Additional security checks
|
||||
if resolved_str == "/":
|
||||
raise ValueError("Cannot mount root filesystem")
|
||||
|
||||
# Warn if path doesn't exist (but don't block - it might be created later)
|
||||
if not resolved_path.exists():
|
||||
logger.warning(f"Target path does not exist: {target_path}")
|
||||
|
||||
logger.info(f"Path validation passed for: {target_path} -> {resolved_str}")
|
||||
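The allow/deny logic in `_validate_target_path` can be sketched in isolation as a small helper. This is a minimal illustration, not FuzzForge's actual API: the function name and the example allow/deny lists below are hypothetical, and it mirrors the same resolve-then-prefix-check order used above.

```python
from pathlib import Path


def validate_mount_path(target_path: str, allowed: list[str], forbidden: list[str]) -> str:
    """Resolve a host path and enforce deny-first prefix rules before mounting."""
    target = Path(target_path)
    if not target.is_absolute():
        raise ValueError(f"Target path must be absolute: {target_path}")

    # resolve() collapses symlinks and '..' components, so checks see the real path
    resolved = str(target.resolve())
    if resolved == "/":
        raise ValueError("Cannot mount root filesystem")

    # Forbidden prefixes win over allowed ones, matching the order above
    if any(resolved.startswith(prefix) for prefix in forbidden):
        raise ValueError(f"Access denied: {resolved} is under a forbidden directory")
    if not any(resolved.startswith(prefix) for prefix in allowed):
        raise ValueError(f"Access denied: {resolved} is not under an allowed base path")
    return resolved
```

One caveat worth noting: a plain `startswith("/tmp")` check also matches `/tmpfoo`, so hardened variants often compare whole path components (e.g. `Path.is_relative_to`) instead of string prefixes.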
402 backend/src/core/setup.py Normal file
@@ -0,0 +1,402 @@
"""
Setup utilities for Prefect infrastructure
"""

# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

import logging
from prefect import get_client
from prefect.client.schemas.actions import WorkPoolCreate
from prefect.client.schemas.objects import WorkPool
from .prefect_manager import get_registry_url

logger = logging.getLogger(__name__)


async def setup_docker_pool():
    """
    Create or update the Docker work pool for container execution.

    This work pool is configured to:
    - Connect to the local Docker daemon
    - Support volume mounting at runtime
    - Clean up containers after execution
    - Use bridge networking by default
    """
    import os

    async with get_client() as client:
        pool_name = "docker-pool"

        # Add force recreation flag for debugging fresh install issues
        force_recreate = os.getenv('FORCE_RECREATE_WORK_POOL', 'false').lower() == 'true'
        debug_setup = os.getenv('DEBUG_WORK_POOL_SETUP', 'false').lower() == 'true'

        if force_recreate:
            logger.warning("FORCE_RECREATE_WORK_POOL=true - Will recreate work pool regardless of existing configuration")
        if debug_setup:
            logger.warning("DEBUG_WORK_POOL_SETUP=true - Enhanced logging enabled")
            # Temporarily set logging level to DEBUG for this function
            original_level = logger.level
            logger.setLevel(logging.DEBUG)

        try:
            # Check if pool already exists and supports custom images
            existing_pools = await client.read_work_pools()
            existing_pool = None
            for pool in existing_pools:
                if pool.name == pool_name:
                    existing_pool = pool
                    break

            if existing_pool and not force_recreate:
                logger.info(f"Found existing work pool '{pool_name}' - validating configuration...")

                # Check if the existing pool has the correct configuration
                base_template = existing_pool.base_job_template or {}
                logger.debug(f"Base template keys: {list(base_template.keys())}")

                job_config = base_template.get("job_configuration", {})
                logger.debug(f"Job config keys: {list(job_config.keys())}")

                image_config = job_config.get("image", "")
                has_image_variable = "{{ image }}" in str(image_config)
                logger.debug(f"Image config: '{image_config}' -> has_image_variable: {has_image_variable}")

                # Check if volume defaults include toolbox mount
                variables = base_template.get("variables", {})
                properties = variables.get("properties", {})
                volume_config = properties.get("volumes", {})
                volume_defaults = volume_config.get("default", [])
                has_toolbox_volume = any("toolbox_code" in str(vol) for vol in volume_defaults) if volume_defaults else False
                logger.debug(f"Volume defaults: {volume_defaults}")
                logger.debug(f"Has toolbox volume: {has_toolbox_volume}")

                # Check if environment defaults include required settings
                env_config = properties.get("env", {})
                env_defaults = env_config.get("default", {})
                has_api_url = "PREFECT_API_URL" in env_defaults
                has_storage_path = "PREFECT_LOCAL_STORAGE_PATH" in env_defaults
                has_results_persist = "PREFECT_RESULTS_PERSIST_BY_DEFAULT" in env_defaults
                has_required_env = has_api_url and has_storage_path and has_results_persist
                logger.debug(f"Environment defaults: {env_defaults}")
                logger.debug(f"Has API URL: {has_api_url}, Has storage path: {has_storage_path}, Has results persist: {has_results_persist}")
                logger.debug(f"Has required env: {has_required_env}")

                # Log the full validation result
                logger.info(f"Work pool validation - Image: {has_image_variable}, Toolbox: {has_toolbox_volume}, Environment: {has_required_env}")

                if has_image_variable and has_toolbox_volume and has_required_env:
                    logger.info(f"Docker work pool '{pool_name}' already exists with correct configuration")
                    return
                else:
                    reasons = []
                    if not has_image_variable:
                        reasons.append("missing image template")
                    if not has_toolbox_volume:
                        reasons.append("missing toolbox volume mount")
                    if not has_required_env:
                        if not has_api_url:
                            reasons.append("missing PREFECT_API_URL")
                        if not has_storage_path:
                            reasons.append("missing PREFECT_LOCAL_STORAGE_PATH")
                        if not has_results_persist:
                            reasons.append("missing PREFECT_RESULTS_PERSIST_BY_DEFAULT")

                    logger.warning(f"Docker work pool '{pool_name}' exists but lacks: {', '.join(reasons)}. Recreating...")
                    # Delete the old pool and recreate it
                    try:
                        await client.delete_work_pool(pool_name)
                        logger.info(f"Deleted old work pool '{pool_name}'")
                    except Exception as e:
                        logger.warning(f"Failed to delete old work pool: {e}")
            elif force_recreate and existing_pool:
                logger.warning(f"Force recreation enabled - deleting existing work pool '{pool_name}'")
                try:
                    await client.delete_work_pool(pool_name)
                    logger.info("Deleted existing work pool for force recreation")
                except Exception as e:
                    logger.warning(f"Failed to delete work pool for force recreation: {e}")

            logger.info(f"Creating Docker work pool '{pool_name}' with custom image support...")

            # Create the work pool with proper Docker configuration
            work_pool = WorkPoolCreate(
                name=pool_name,
                type="docker",
                description="Docker work pool for FuzzForge workflows with custom image support",
                base_job_template={
                    "job_configuration": {
                        "image": "{{ image }}",      # Template variable for custom images
                        "volumes": "{{ volumes }}",  # List of volume mounts
                        "env": "{{ env }}",          # Environment variables
                        "networks": "{{ networks }}",  # Docker networks
                        "stream_output": True,
                        "auto_remove": True,
                        "privileged": False,
                        "network_mode": None,  # Use networks instead
                        "labels": {},
                        "command": None  # Let the image's CMD/ENTRYPOINT run
                    },
                    "variables": {
                        "type": "object",
                        "properties": {
                            "image": {
                                "type": "string",
                                "title": "Docker Image",
                                "default": "prefecthq/prefect:3-python3.11",
                                "description": "Docker image for the flow run"
                            },
                            "volumes": {
                                "type": "array",
                                "title": "Volume Mounts",
                                "default": [
                                    "fuzzforge_prefect_storage:/prefect-storage",
                                    "fuzzforge_toolbox_code:/opt/prefect/toolbox:ro"
                                ],
                                "description": "Volume mounts in format 'host:container:mode'",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "networks": {
                                "type": "array",
                                "title": "Docker Networks",
                                "default": ["fuzzforge_default"],
                                "description": "Docker networks to connect container to",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "env": {
                                "type": "object",
                                "title": "Environment Variables",
                                "default": {
                                    "PREFECT_API_URL": "http://prefect-server:4200/api",
                                    "PREFECT_LOCAL_STORAGE_PATH": "/prefect-storage",
                                    "PREFECT_RESULTS_PERSIST_BY_DEFAULT": "true"
                                },
                                "description": "Environment variables for the container",
                                "additionalProperties": {
                                    "type": "string"
                                }
                            }
                        }
                    }
                }
            )

            await client.create_work_pool(work_pool)
            logger.info(f"Created Docker work pool '{pool_name}'")

        except Exception as e:
            logger.error(f"Failed to setup Docker work pool: {e}")
            raise
        finally:
            # Restore original logging level if debug mode was enabled
            if debug_setup and 'original_level' in locals():
                logger.setLevel(original_level)


def get_actual_compose_project_name():
    """
    Return the hardcoded compose project name for FuzzForge.

    Always returns 'fuzzforge' as per system requirements.
    """
    logger.info("Using hardcoded compose project name: fuzzforge")
    return "fuzzforge"


async def setup_result_storage():
    """
    Create or update Prefect result storage block for findings persistence.

    This sets up a LocalFileSystem storage block pointing to the shared
    /prefect-storage volume for result persistence.
    """
    from prefect.filesystems import LocalFileSystem

    storage_name = "fuzzforge-results"

    try:
        # Create the storage block, overwrite if it exists
        logger.info(f"Setting up storage block '{storage_name}'...")
        storage = LocalFileSystem(basepath="/prefect-storage")

        block_doc_id = await storage.save(name=storage_name, overwrite=True)
        logger.info(f"Storage block '{storage_name}' configured successfully")
        return str(block_doc_id)

    except Exception as e:
        logger.error(f"Failed to setup result storage: {e}")
        # Don't raise the exception - continue without storage block
        logger.warning("Continuing without result storage block - findings may not persist")
        return None


async def validate_docker_connection():
    """
    Validate that Docker is accessible and running.

    Note: In containerized deployments with Docker socket proxy,
    the backend doesn't need direct Docker access.

    Raises:
        RuntimeError: If Docker is not accessible
    """
    import os

    # Skip Docker validation if running in container without socket access
    if os.path.exists("/.dockerenv") and not os.path.exists("/var/run/docker.sock"):
        logger.info("Running in container without Docker socket - skipping Docker validation")
        return

    try:
        import docker
        client = docker.from_env()
        client.ping()
        logger.info("Docker connection validated")
    except Exception as e:
        logger.error(f"Docker is not accessible: {e}")
        raise RuntimeError(
            "Docker is not running or not accessible. "
            "Please ensure Docker is installed and running."
        )


async def validate_registry_connectivity(registry_url: str = None):
    """
    Validate that the Docker registry is accessible.

    Args:
        registry_url: URL of the Docker registry to validate (auto-detected if None)

    Raises:
        RuntimeError: If registry is not accessible
    """
    import os

    # Resolve a reachable test URL from within this process
    if registry_url is None:
        # If not specified, prefer internal service name in containers, host port on host
        if os.path.exists('/.dockerenv'):
            registry_url = "registry:5000"
        else:
            registry_url = "localhost:5001"

    # If we're running inside a container and asked to probe localhost:PORT,
    # the probe would hit the container, not the host. Use host.docker.internal instead.
    try:
        host_part, port_part = registry_url.split(":", 1)
    except ValueError:
        host_part, port_part = registry_url, "80"

    if os.path.exists('/.dockerenv') and host_part in ("localhost", "127.0.0.1"):
        test_host = "host.docker.internal"
    else:
        test_host = host_part
    test_url = f"http://{test_host}:{port_part}/v2/"

    import aiohttp
    import asyncio

    logger.info(f"Validating registry connectivity to {registry_url}...")

    try:
        async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=10)) as session:
            async with session.get(test_url) as response:
                if response.status == 200:
                    logger.info(f"Registry at {registry_url} is accessible (tested via {test_host})")
                    return
                else:
                    raise RuntimeError(f"Registry returned status {response.status}")
    except asyncio.TimeoutError:
        raise RuntimeError(f"Registry at {registry_url} is not responding (timeout)")
    except aiohttp.ClientError as e:
        raise RuntimeError(f"Registry at {registry_url} is not accessible: {e}")
    except Exception as e:
        raise RuntimeError(f"Failed to validate registry connectivity: {e}")


async def validate_docker_network(network_name: str):
    """
    Validate that the specified Docker network exists.

    Args:
        network_name: Name of the Docker network to validate

    Raises:
        RuntimeError: If network doesn't exist
    """
    import os

    # Skip network validation if running in container without Docker socket
    if os.path.exists("/.dockerenv") and not os.path.exists("/var/run/docker.sock"):
        logger.info("Running in container without Docker socket - skipping network validation")
        return

    try:
        import docker
        client = docker.from_env()

        # List all networks
        networks = client.networks.list(names=[network_name])

        if not networks:
            # Try to find networks with similar names
            all_networks = client.networks.list()
            similar_networks = [n.name for n in all_networks if "fuzzforge" in n.name.lower()]

            error_msg = f"Docker network '{network_name}' not found."
            if similar_networks:
                error_msg += f" Available networks: {similar_networks}"
            else:
                error_msg += " Please ensure Docker Compose is running."

            raise RuntimeError(error_msg)

        logger.info(f"Docker network '{network_name}' validated")

    except Exception as e:
        if isinstance(e, RuntimeError):
            raise
        logger.error(f"Network validation failed: {e}")
        raise RuntimeError(f"Failed to validate Docker network: {e}")


async def validate_infrastructure():
    """
    Validate all required infrastructure components.

    This should be called during startup to ensure everything is ready.
    """
    logger.info("Validating infrastructure...")

    # Validate Docker connection
    await validate_docker_connection()

    # Validate registry connectivity for custom image building
    await validate_registry_connectivity()

    # Validate network (hardcoded to avoid directory name dependencies)
    compose_project = "fuzzforge"
    docker_network = "fuzzforge_default"

    try:
        await validate_docker_network(docker_network)
    except RuntimeError as e:
        logger.warning(f"Network validation failed: {e}")
        logger.warning("Workflows may not be able to connect to Prefect services")

    logger.info("Infrastructure validation completed")
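The host-rewriting rule inside `validate_registry_connectivity` is easy to isolate into a pure function. The sketch below is illustrative (the function name and the `in_container` flag are assumptions, standing in for the `/.dockerenv` check above); it maps a registry URL to the host:port a `/v2/` probe should actually target.

```python
def resolve_probe_host(registry_url: str, in_container: bool) -> str:
    """Return the host:port a registry /v2/ probe should target from this process."""
    try:
        host, port = registry_url.split(":", 1)
    except ValueError:
        # No explicit port: fall back to port 80, as the validator above does
        host, port = registry_url, "80"

    # Inside a container, 'localhost' would point at the container itself,
    # so redirect the probe to the Docker host instead.
    if in_container and host in ("localhost", "127.0.0.1"):
        host = "host.docker.internal"
    return f"{host}:{port}"
```

A few example mappings: `resolve_probe_host("localhost:5001", True)` yields `host.docker.internal:5001`, while an internal service name such as `registry:5000` is left untouched.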
459 backend/src/core/workflow_discovery.py Normal file
@@ -0,0 +1,459 @@
"""
Workflow Discovery - Registry-based discovery and loading of workflows
"""

# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

import logging
import yaml
from pathlib import Path
from typing import Dict, Optional, Any, Callable
from pydantic import BaseModel, Field, ConfigDict

logger = logging.getLogger(__name__)


class WorkflowInfo(BaseModel):
    """Information about a discovered workflow"""
    name: str = Field(..., description="Workflow name")
    path: Path = Field(..., description="Path to workflow directory")
    workflow_file: Path = Field(..., description="Path to workflow.py file")
    dockerfile: Path = Field(..., description="Path to Dockerfile")
    has_docker: bool = Field(..., description="Whether workflow has custom Dockerfile")
    metadata: Dict[str, Any] = Field(..., description="Workflow metadata from YAML")
    flow_function_name: str = Field(default="main_flow", description="Name of the flow function")

    model_config = ConfigDict(arbitrary_types_allowed=True)


class WorkflowDiscovery:
    """
    Discovers workflows from the filesystem and validates them against the registry.

    This system:
    1. Scans for workflows with metadata.yaml files
    2. Cross-references them with the manual registry
    3. Provides registry-based flow functions for deployment

    Workflows must have:
    - workflow.py: Contains the Prefect flow
    - metadata.yaml: Mandatory metadata file
    - Entry in toolbox/workflows/registry.py: Manual registration
    - Dockerfile (optional): Custom container definition
    - requirements.txt (optional): Python dependencies
    """

    def __init__(self, workflows_dir: Path):
        """
        Initialize workflow discovery.

        Args:
            workflows_dir: Path to the workflows directory
        """
        self.workflows_dir = workflows_dir
        if not self.workflows_dir.exists():
            self.workflows_dir.mkdir(parents=True, exist_ok=True)
            logger.info(f"Created workflows directory: {self.workflows_dir}")

        # Import registry - this validates it on import
        try:
            from toolbox.workflows.registry import WORKFLOW_REGISTRY, list_registered_workflows
            self.registry = WORKFLOW_REGISTRY
            logger.info(f"Loaded workflow registry with {len(self.registry)} registered workflows")
        except ImportError as e:
            logger.error(f"Failed to import workflow registry: {e}")
            self.registry = {}
        except Exception as e:
            logger.error(f"Registry validation failed: {e}")
            self.registry = {}

        # Cache for discovered workflows
        self._workflow_cache: Optional[Dict[str, WorkflowInfo]] = None
        self._cache_timestamp: Optional[float] = None
        self._cache_ttl = 60.0  # Cache TTL in seconds

    async def discover_workflows(self) -> Dict[str, WorkflowInfo]:
        """
        Discover workflows by cross-referencing filesystem with registry.
        Uses caching to avoid frequent filesystem scans.

        Returns:
            Dictionary mapping workflow names to their information
        """
        # Check cache validity
        import time
        current_time = time.time()

        if (self._workflow_cache is not None and
                self._cache_timestamp is not None and
                (current_time - self._cache_timestamp) < self._cache_ttl):
            # Return cached results
            logger.debug(f"Returning cached workflow discovery ({len(self._workflow_cache)} workflows)")
            return self._workflow_cache

        workflows = {}
        discovered_dirs = set()
        registry_names = set(self.registry.keys())

        if not self.workflows_dir.exists():
            logger.warning(f"Workflows directory does not exist: {self.workflows_dir}")
            return workflows

        # Recursively scan all directories and subdirectories
        await self._scan_directory_recursive(self.workflows_dir, workflows, discovered_dirs)

        # Check for registry entries without corresponding directories
        missing_dirs = registry_names - discovered_dirs
        if missing_dirs:
            logger.warning(
                f"Registry contains workflows without filesystem directories: {missing_dirs}. "
                f"These workflows cannot be deployed."
            )

        logger.info(
            f"Discovery complete: {len(workflows)} workflows ready for deployment, "
            f"{len(missing_dirs)} registry entries missing directories, "
            f"{len(discovered_dirs - registry_names)} filesystem workflows not registered"
        )

        # Update cache
        self._workflow_cache = workflows
        self._cache_timestamp = current_time

        return workflows

    async def _scan_directory_recursive(self, directory: Path, workflows: Dict[str, WorkflowInfo], discovered_dirs: set):
        """
        Recursively scan directory for workflows.

        Args:
            directory: Directory to scan
            workflows: Dictionary to populate with discovered workflows
            discovered_dirs: Set to track discovered workflow names
        """
        for item in directory.iterdir():
            if not item.is_dir():
                continue

            if item.name.startswith('_') or item.name.startswith('.'):
                continue  # Skip hidden or private directories

            # Check if this directory contains workflow files (workflow.py and metadata.yaml)
            workflow_file = item / "workflow.py"
            metadata_file = item / "metadata.yaml"

            if workflow_file.exists() and metadata_file.exists():
                # This is a workflow directory
                workflow_name = item.name
                discovered_dirs.add(workflow_name)

                # Only process workflows that are in the registry
                if workflow_name not in self.registry:
                    logger.warning(
                        f"Workflow '{workflow_name}' found in filesystem but not in registry. "
                        f"Add it to toolbox/workflows/registry.py to enable deployment."
                    )
                    continue

                try:
                    workflow_info = await self._load_workflow(item)
                    if workflow_info:
                        workflows[workflow_info.name] = workflow_info
                        logger.info(f"Discovered and registered workflow: {workflow_info.name}")
                except Exception as e:
                    logger.error(f"Failed to load workflow from {item}: {e}")
            else:
                # This is a category directory, recurse into it
                await self._scan_directory_recursive(item, workflows, discovered_dirs)

    async def _load_workflow(self, workflow_dir: Path) -> Optional[WorkflowInfo]:
        """
        Load and validate a single workflow.

        Args:
            workflow_dir: Path to the workflow directory

        Returns:
            WorkflowInfo if valid, None otherwise
        """
        workflow_name = workflow_dir.name

        # Check for mandatory files
        workflow_file = workflow_dir / "workflow.py"
        metadata_file = workflow_dir / "metadata.yaml"

        if not workflow_file.exists():
            logger.warning(f"Workflow {workflow_name} missing workflow.py")
            return None

        if not metadata_file.exists():
            logger.error(f"Workflow {workflow_name} missing mandatory metadata.yaml")
            return None

        # Load and validate metadata
        try:
            metadata = self._load_metadata(metadata_file)
            if not self._validate_metadata(metadata, workflow_name):
                return None
        except Exception as e:
            logger.error(f"Failed to load metadata for {workflow_name}: {e}")
            return None

        # Check for mandatory Dockerfile
        dockerfile = workflow_dir / "Dockerfile"
        if not dockerfile.exists():
            logger.error(f"Workflow {workflow_name} missing mandatory Dockerfile")
            return None

        has_docker = True  # Always True since Dockerfile is mandatory

        # Get flow function name from metadata or use default
        flow_function_name = metadata.get("flow_function", "main_flow")

        return WorkflowInfo(
            name=workflow_name,
            path=workflow_dir,
            workflow_file=workflow_file,
            dockerfile=dockerfile,
            has_docker=has_docker,
            metadata=metadata,
            flow_function_name=flow_function_name
        )

    def _load_metadata(self, metadata_file: Path) -> Dict[str, Any]:
        """
        Load metadata from YAML file.

        Args:
            metadata_file: Path to metadata.yaml

        Returns:
            Dictionary containing metadata
        """
        with open(metadata_file, 'r') as f:
            metadata = yaml.safe_load(f)

        if metadata is None:
            raise ValueError("Empty metadata file")

        return metadata

    def _validate_metadata(self, metadata: Dict[str, Any], workflow_name: str) -> bool:
        """
        Validate that metadata contains all required fields.

        Args:
            metadata: Metadata dictionary
            workflow_name: Name of the workflow for logging

        Returns:
            True if valid, False otherwise
        """
        required_fields = ["name", "version", "description", "author", "category", "parameters", "requirements"]

        missing_fields = []
        for field in required_fields:
            if field not in metadata:
                missing_fields.append(field)

        if missing_fields:
            logger.error(
                f"Workflow {workflow_name} metadata missing required fields: {missing_fields}"
            )
            return False

        # Validate version format (semantic versioning)
        version = metadata.get("version", "")
        if not self._is_valid_version(version):
            logger.error(f"Workflow {workflow_name} has invalid version format: {version}")
            return False

        # Validate parameters structure
        parameters = metadata.get("parameters", {})
        if not isinstance(parameters, dict):
            logger.error(f"Workflow {workflow_name} parameters must be a dictionary")
            return False

        return True

    def _is_valid_version(self, version: str) -> bool:
        """
        Check if version follows semantic versioning (x.y.z).

        Args:
            version: Version string

        Returns:
            True if valid semantic version
        """
        try:
            parts = version.split('.')
            if len(parts) != 3:
                return False
            for part in parts:
                int(part)  # Check if each part is a number
            return True
        except (ValueError, AttributeError):
            return False

    def invalidate_cache(self) -> None:
        """
        Invalidate the workflow discovery cache.
        Useful when workflows are added or modified.
        """
        self._workflow_cache = None
        self._cache_timestamp = None
        logger.debug("Workflow discovery cache invalidated")

    def get_flow_function(self, workflow_name: str) -> Optional[Callable]:
        """
        Get the flow function from the registry.

        Args:
            workflow_name: Name of the workflow

        Returns:
            The flow function if found in registry, None otherwise
        """
        if workflow_name not in self.registry:
            logger.error(
                f"Workflow '{workflow_name}' not found in registry. "
                f"Available workflows: {list(self.registry.keys())}"
            )
            return None

        try:
            from toolbox.workflows.registry import get_workflow_flow
            flow_func = get_workflow_flow(workflow_name)
            logger.debug(f"Retrieved flow function for '{workflow_name}' from registry")
            return flow_func
        except Exception as e:
            logger.error(f"Failed to get flow function for '{workflow_name}': {e}")
            return None

    def get_registry_info(self, workflow_name: str) -> Optional[Dict[str, Any]]:
        """
        Get registry information for a workflow.

        Args:
            workflow_name: Name of the workflow

        Returns:
            Registry information if found, None otherwise
        """
        if workflow_name not in self.registry:
            return None

        try:
            from toolbox.workflows.registry import get_workflow_info
            return get_workflow_info(workflow_name)
        except Exception as e:
            logger.error(f"Failed to get registry info for '{workflow_name}': {e}")
            return None

    @staticmethod
    def get_metadata_schema() -> Dict[str, Any]:
        """
        Get the JSON schema for workflow metadata.

        Returns:
            JSON schema dictionary
        """
        return {
            "type": "object",
            "required": ["name", "version", "description", "author", "category", "parameters", "requirements"],
            "properties": {
                "name": {
                    "type": "string",
                    "description": "Workflow name"
                },
                "version": {
                    "type": "string",
                    "pattern": "^\\d+\\.\\d+\\.\\d+$",
                    "description": "Semantic version (x.y.z)"
                },
                "description": {
                    "type": "string",
                    "description": "Workflow description"
                },
                "author": {
                    "type": "string",
                    "description": "Workflow author"
                },
                "category": {
                    "type": "string",
                    "enum": ["comprehensive", "specialized", "fuzzing", "focused"],
                    "description": "Workflow category"
                },
                "tags": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Workflow tags for categorization"
                },
                "requirements": {
                    "type": "object",
                    "required": ["tools", "resources"],
                    "properties": {
                        "tools": {
                            "type": "array",
                            "items": {"type": "string"},
                            "description": "Required security tools"
                        },
                        "resources": {
                            "type": "object",
                            "required": ["memory", "cpu", "timeout"],
                            "properties": {
                                "memory": {
                                    "type": "string",
                                    "pattern": "^\\d+[GMK]i$",
                                    "description": "Memory limit (e.g., 1Gi, 512Mi)"
                                },
                                "cpu": {
                                    "type": "string",
                                    "pattern": "^\\d+m?$",
                                    "description": "CPU limit (e.g., 1000m, 2)"
                                },
                                "timeout": {
                                    "type": "integer",
                                    "minimum": 60,
                                    "maximum": 7200,
                                    "description": "Workflow timeout in seconds"
                                }
                            }
                        }
                    }
                }
            },
            "parameters": {
|
||||
"type": "object",
|
||||
"description": "Workflow parameters schema"
|
||||
},
|
||||
"default_parameters": {
|
||||
"type": "object",
|
||||
"description": "Default parameter values"
|
||||
},
|
||||
"required_modules": {
|
||||
"type": "array",
|
||||
"items": {"type": "string"},
|
||||
"description": "Required module names"
|
||||
},
|
||||
"supported_volume_modes": {
|
||||
"type": "array",
|
||||
"items": {"enum": ["ro", "rw"]},
|
||||
"default": ["ro", "rw"],
|
||||
"description": "Supported volume mount modes"
|
||||
},
|
||||
"flow_function": {
|
||||
"type": "string",
|
||||
"default": "main_flow",
|
||||
"description": "Name of the flow function in workflow.py"
|
||||
}
|
||||
}
|
||||
}
|
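A metadata file that satisfies this schema can be sanity-checked in a few lines. Below is a minimal stdlib-only sketch (not the project's validator); the sample metadata values are hypothetical, and only the `required` list and version `pattern` from the schema above are exercised:

```python
import re

# Required keys and version pattern, copied from the schema above.
SCHEMA_REQUIRED = ["name", "version", "description", "author", "category", "parameters", "requirements"]
VERSION_PATTERN = re.compile(r"^\d+\.\d+\.\d+$")

# Hypothetical metadata for illustration only; field names follow the schema.
metadata = {
    "name": "example_scan",
    "version": "1.0.0",
    "description": "Example workflow",
    "author": "FuzzingLabs",
    "category": "fuzzing",
    "parameters": {},
    "requirements": {"tools": [], "resources": {"memory": "1Gi", "cpu": "1000m", "timeout": 600}},
}

missing = [key for key in SCHEMA_REQUIRED if key not in metadata]
assert not missing, f"missing required keys: {missing}"
assert VERSION_PATTERN.match(metadata["version"]), "version must be x.y.z"
print("metadata OK")
```

A full validation would use a JSON Schema library against the complete schema dictionary; this sketch only covers the two constraints most likely to trip up a hand-written metadata file.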
864 backend/src/main.py (Normal file)
@@ -0,0 +1,864 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

import asyncio
import logging
import os
from uuid import UUID
from contextlib import AsyncExitStack, asynccontextmanager, suppress
from typing import Any, Dict, Optional, List

import uvicorn
from fastapi import FastAPI
from starlette.applications import Starlette
from starlette.routing import Mount

from fastmcp.server.http import create_sse_app

from src.core.prefect_manager import PrefectManager
from src.core.setup import setup_docker_pool, setup_result_storage, validate_infrastructure
from src.core.workflow_discovery import WorkflowDiscovery
from src.api import workflows, runs, fuzzing
from src.services.prefect_stats_monitor import prefect_stats_monitor

from fastmcp import FastMCP
from prefect.client.orchestration import get_client
from prefect.client.schemas.filters import (
    FlowRunFilter,
    FlowRunFilterDeploymentId,
    FlowRunFilterState,
    FlowRunFilterStateType,
)
from prefect.client.schemas.sorting import FlowRunSort
from prefect.states import StateType

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

prefect_mgr = PrefectManager()


class PrefectBootstrapState:
    """Tracks Prefect initialization progress for API and MCP consumers."""

    def __init__(self) -> None:
        self.ready: bool = False
        self.status: str = "not_started"
        self.last_error: Optional[str] = None
        self.task_running: bool = False

    def as_dict(self) -> Dict[str, Any]:
        return {
            "ready": self.ready,
            "status": self.status,
            "last_error": self.last_error,
            "task_running": self.task_running,
        }


prefect_bootstrap_state = PrefectBootstrapState()

# Configure retry strategy for bootstrapping Prefect + infrastructure
STARTUP_RETRY_SECONDS = max(1, int(os.getenv("FUZZFORGE_STARTUP_RETRY_SECONDS", "5")))
STARTUP_RETRY_MAX_SECONDS = max(
    STARTUP_RETRY_SECONDS,
    int(os.getenv("FUZZFORGE_STARTUP_RETRY_MAX_SECONDS", "60")),
)

prefect_bootstrap_task: Optional[asyncio.Task] = None

# ---------------------------------------------------------------------------
# FastAPI application (REST API remains unchanged)
# ---------------------------------------------------------------------------

app = FastAPI(
    title="FuzzForge API",
    description="Security testing workflow orchestration API with fuzzing support",
    version="0.6.0",
)

app.include_router(workflows.router)
app.include_router(runs.router)
app.include_router(fuzzing.router)


def get_prefect_status() -> Dict[str, Any]:
    """Return a snapshot of Prefect bootstrap state for diagnostics."""
    status = prefect_bootstrap_state.as_dict()
    status["workflows_loaded"] = len(prefect_mgr.workflows)
    status["deployments_tracked"] = len(prefect_mgr.deployments)
    status["bootstrap_task_running"] = (
        prefect_bootstrap_task is not None and not prefect_bootstrap_task.done()
    )
    return status


def _prefect_not_ready_status() -> Optional[Dict[str, Any]]:
    """Return status details if Prefect is not ready yet."""
    status = get_prefect_status()
    if status.get("ready"):
        return None
    return status


@app.get("/")
async def root() -> Dict[str, Any]:
    status = get_prefect_status()
    return {
        "name": "FuzzForge API",
        "version": "0.6.0",
        "status": "ready" if status.get("ready") else "initializing",
        "workflows_loaded": status.get("workflows_loaded", 0),
        "prefect": status,
    }


@app.get("/health")
async def health() -> Dict[str, str]:
    status = get_prefect_status()
    health_status = "healthy" if status.get("ready") else "initializing"
    return {"status": health_status}


# Map FastAPI OpenAPI operationIds to readable MCP tool names
FASTAPI_MCP_NAME_OVERRIDES: Dict[str, str] = {
    "list_workflows_workflows__get": "api_list_workflows",
    "get_metadata_schema_workflows_metadata_schema_get": "api_get_metadata_schema",
    "get_workflow_metadata_workflows__workflow_name__metadata_get": "api_get_workflow_metadata",
    "submit_workflow_workflows__workflow_name__submit_post": "api_submit_workflow",
    "get_workflow_parameters_workflows__workflow_name__parameters_get": "api_get_workflow_parameters",
    "get_run_status_runs__run_id__status_get": "api_get_run_status",
    "get_run_findings_runs__run_id__findings_get": "api_get_run_findings",
    "get_workflow_findings_runs__workflow_name__findings__run_id__get": "api_get_workflow_findings",
    "get_fuzzing_stats_fuzzing__run_id__stats_get": "api_get_fuzzing_stats",
    "update_fuzzing_stats_fuzzing__run_id__stats_post": "api_update_fuzzing_stats",
    "get_crash_reports_fuzzing__run_id__crashes_get": "api_get_crash_reports",
    "report_crash_fuzzing__run_id__crash_post": "api_report_crash",
    "stream_fuzzing_updates_fuzzing__run_id__stream_get": "api_stream_fuzzing_updates",
    "cleanup_fuzzing_run_fuzzing__run_id__delete": "api_cleanup_fuzzing_run",
    "root__get": "api_root",
    "health_health_get": "api_health",
}


# Create an MCP adapter exposing all FastAPI endpoints via OpenAPI parsing
FASTAPI_MCP_ADAPTER = FastMCP.from_fastapi(
    app,
    name="FuzzForge FastAPI",
    mcp_names=FASTAPI_MCP_NAME_OVERRIDES,
)
_fastapi_mcp_imported = False


# ---------------------------------------------------------------------------
# FastMCP server (runs on dedicated port outside FastAPI)
# ---------------------------------------------------------------------------

mcp = FastMCP(name="FuzzForge MCP")


async def _bootstrap_prefect_with_retries() -> None:
    """Initialize Prefect infrastructure with exponential backoff retries."""

    attempt = 0

    while True:
        attempt += 1
        prefect_bootstrap_state.task_running = True
        prefect_bootstrap_state.status = "starting"
        prefect_bootstrap_state.ready = False
        prefect_bootstrap_state.last_error = None

        try:
            logger.info("Bootstrapping Prefect infrastructure...")
            await validate_infrastructure()
            await setup_docker_pool()
            await setup_result_storage()
            await prefect_mgr.initialize()
            await prefect_stats_monitor.start_monitoring()

            prefect_bootstrap_state.ready = True
            prefect_bootstrap_state.status = "ready"
            prefect_bootstrap_state.task_running = False
            logger.info("Prefect infrastructure ready")
            return

        except asyncio.CancelledError:
            prefect_bootstrap_state.status = "cancelled"
            prefect_bootstrap_state.task_running = False
            logger.info("Prefect bootstrap task cancelled")
            raise

        except Exception as exc:  # pragma: no cover - defensive logging on infra startup
            logger.exception("Prefect bootstrap failed")
            prefect_bootstrap_state.ready = False
            prefect_bootstrap_state.status = "error"
            prefect_bootstrap_state.last_error = str(exc)

            # Ensure partial initialization does not leave stale state behind
            prefect_mgr.workflows.clear()
            prefect_mgr.deployments.clear()
            await prefect_stats_monitor.stop_monitoring()

            wait_time = min(
                STARTUP_RETRY_SECONDS * (2 ** (attempt - 1)),
                STARTUP_RETRY_MAX_SECONDS,
            )
            logger.info("Retrying Prefect bootstrap in %s second(s)", wait_time)

            try:
                await asyncio.sleep(wait_time)
            except asyncio.CancelledError:
                prefect_bootstrap_state.status = "cancelled"
                prefect_bootstrap_state.task_running = False
                raise
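With the defaults above (5-second base, 60-second cap), the retry delay doubles on each failed attempt until it reaches the cap. The schedule can be sketched as a standalone function (mirroring the `wait_time` computation, with the env-var defaults hard-coded):

```python
STARTUP_RETRY_SECONDS = 5       # default of FUZZFORGE_STARTUP_RETRY_SECONDS
STARTUP_RETRY_MAX_SECONDS = 60  # default of FUZZFORGE_STARTUP_RETRY_MAX_SECONDS

def retry_delay(attempt: int) -> int:
    """Delay before the next bootstrap retry: base * 2^(attempt-1), capped."""
    return min(STARTUP_RETRY_SECONDS * (2 ** (attempt - 1)), STARTUP_RETRY_MAX_SECONDS)

print([retry_delay(a) for a in range(1, 7)])  # [5, 10, 20, 40, 60, 60]
```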
def _lookup_workflow(workflow_name: str):
    info = prefect_mgr.workflows.get(workflow_name)
    if not info:
        return None
    metadata = info.metadata
    defaults = metadata.get("default_parameters", {})
    default_target_path = metadata.get("default_target_path") or defaults.get("target_path")
    supported_modes = metadata.get("supported_volume_modes") or ["ro", "rw"]
    if not isinstance(supported_modes, list) or not supported_modes:
        supported_modes = ["ro", "rw"]
    default_volume_mode = (
        metadata.get("default_volume_mode")
        or defaults.get("volume_mode")
        or supported_modes[0]
    )
    return {
        "name": workflow_name,
        "version": metadata.get("version", "0.6.0"),
        "description": metadata.get("description", ""),
        "author": metadata.get("author"),
        "tags": metadata.get("tags", []),
        "parameters": metadata.get("parameters", {}),
        "default_parameters": metadata.get("default_parameters", {}),
        "required_modules": metadata.get("required_modules", []),
        "supported_volume_modes": supported_modes,
        "default_target_path": default_target_path,
        "default_volume_mode": default_volume_mode,
        "has_custom_docker": bool(info.has_docker),
    }


@mcp.tool
async def list_workflows_mcp() -> Dict[str, Any]:
    """List all discovered workflows and their metadata summary."""
    not_ready = _prefect_not_ready_status()
    if not_ready:
        return {
            "workflows": [],
            "prefect": not_ready,
            "message": "Prefect infrastructure is still initializing",
        }

    workflows_summary = []
    for name, info in prefect_mgr.workflows.items():
        metadata = info.metadata
        defaults = metadata.get("default_parameters", {})
        workflows_summary.append({
            "name": name,
            "version": metadata.get("version", "0.6.0"),
            "description": metadata.get("description", ""),
            "author": metadata.get("author"),
            "tags": metadata.get("tags", []),
            "supported_volume_modes": metadata.get("supported_volume_modes", ["ro", "rw"]),
            "default_volume_mode": metadata.get("default_volume_mode")
            or defaults.get("volume_mode")
            or "ro",
            "default_target_path": metadata.get("default_target_path")
            or defaults.get("target_path"),
            "has_custom_docker": bool(info.has_docker),
        })
    return {"workflows": workflows_summary, "prefect": get_prefect_status()}


@mcp.tool
async def get_workflow_metadata_mcp(workflow_name: str) -> Dict[str, Any]:
    """Fetch detailed metadata for a workflow."""
    not_ready = _prefect_not_ready_status()
    if not_ready:
        return {
            "error": "Prefect infrastructure not ready",
            "prefect": not_ready,
        }

    data = _lookup_workflow(workflow_name)
    if not data:
        return {"error": f"Workflow not found: {workflow_name}"}
    return data


@mcp.tool
async def get_workflow_parameters_mcp(workflow_name: str) -> Dict[str, Any]:
    """Return the parameter schema and defaults for a workflow."""
    not_ready = _prefect_not_ready_status()
    if not_ready:
        return {
            "error": "Prefect infrastructure not ready",
            "prefect": not_ready,
        }

    data = _lookup_workflow(workflow_name)
    if not data:
        return {"error": f"Workflow not found: {workflow_name}"}
    return {
        "parameters": data.get("parameters", {}),
        "defaults": data.get("default_parameters", {}),
    }


@mcp.tool
async def get_workflow_metadata_schema_mcp() -> Dict[str, Any]:
    """Return the JSON schema describing workflow metadata files."""
    return WorkflowDiscovery.get_metadata_schema()


@mcp.tool
async def submit_security_scan_mcp(
    workflow_name: str,
    target_path: str | None = None,
    volume_mode: str | None = None,
    parameters: Dict[str, Any] | None = None,
) -> Dict[str, Any] | Dict[str, str]:
    """Submit a Prefect workflow via MCP."""
    try:
        not_ready = _prefect_not_ready_status()
        if not_ready:
            return {
                "error": "Prefect infrastructure not ready",
                "prefect": not_ready,
            }

        workflow_info = prefect_mgr.workflows.get(workflow_name)
        if not workflow_info:
            return {"error": f"Workflow '{workflow_name}' not found"}

        metadata = workflow_info.metadata or {}
        defaults = metadata.get("default_parameters", {})

        resolved_target_path = target_path or metadata.get("default_target_path") or defaults.get("target_path")
        if not resolved_target_path:
            return {
                "error": (
                    "target_path is required and no default_target_path is defined in metadata"
                ),
                "metadata": {
                    "workflow": workflow_name,
                    "default_target_path": metadata.get("default_target_path"),
                },
            }

        requested_volume_mode = volume_mode or metadata.get("default_volume_mode") or defaults.get("volume_mode")
        if not requested_volume_mode:
            requested_volume_mode = "ro"

        normalised_volume_mode = (
            str(requested_volume_mode).strip().lower().replace("-", "_")
        )
        if normalised_volume_mode in {"read_only", "readonly", "ro"}:
            normalised_volume_mode = "ro"
        elif normalised_volume_mode in {"read_write", "readwrite", "rw"}:
            normalised_volume_mode = "rw"
        else:
            supported_modes = metadata.get("supported_volume_modes", ["ro", "rw"])
            if isinstance(supported_modes, list) and normalised_volume_mode in supported_modes:
                pass
            else:
                normalised_volume_mode = "ro"

        parameters = parameters or {}

        cleaned_parameters: Dict[str, Any] = {**defaults, **parameters}

        # Ensure *_config structures default to dicts so Prefect validation passes.
        for key, value in list(cleaned_parameters.items()):
            if isinstance(key, str) and key.endswith("_config") and value is None:
                cleaned_parameters[key] = {}

        # Some workflows expect configuration dictionaries even when omitted.
        parameter_definitions = (
            metadata.get("parameters", {}).get("properties", {})
            if isinstance(metadata.get("parameters"), dict)
            else {}
        )
        for key, definition in parameter_definitions.items():
            if not isinstance(key, str) or not key.endswith("_config"):
                continue
            if key not in cleaned_parameters:
                default_value = definition.get("default") if isinstance(definition, dict) else None
                cleaned_parameters[key] = default_value if default_value is not None else {}
            elif cleaned_parameters[key] is None:
                cleaned_parameters[key] = {}

        flow_run = await prefect_mgr.submit_workflow(
            workflow_name=workflow_name,
            target_path=resolved_target_path,
            volume_mode=normalised_volume_mode,
            parameters=cleaned_parameters,
        )

        return {
            "run_id": str(flow_run.id),
            "status": flow_run.state.name if flow_run.state else "PENDING",
            "workflow": workflow_name,
            "message": f"Workflow '{workflow_name}' submitted successfully",
            "target_path": resolved_target_path,
            "volume_mode": normalised_volume_mode,
            "parameters": cleaned_parameters,
            "mcp_enabled": True,
        }
    except Exception as exc:  # pragma: no cover - defensive logging
        logger.exception("MCP submit failed")
        return {"error": f"Failed to submit workflow: {exc}"}
@mcp.tool
async def get_comprehensive_scan_summary(run_id: str) -> Dict[str, Any] | Dict[str, str]:
    """Return a summary for the given flow run via MCP."""
    try:
        not_ready = _prefect_not_ready_status()
        if not_ready:
            return {
                "error": "Prefect infrastructure not ready",
                "prefect": not_ready,
            }

        status = await prefect_mgr.get_flow_run_status(run_id)
        findings = await prefect_mgr.get_flow_run_findings(run_id)

        workflow_name = "unknown"
        deployment_id = status.get("workflow", "")
        for name, deployment in prefect_mgr.deployments.items():
            if str(deployment) == str(deployment_id):
                workflow_name = name
                break

        total_findings = 0
        severity_summary = {"critical": 0, "high": 0, "medium": 0, "low": 0, "info": 0}

        if findings and "sarif" in findings:
            sarif = findings["sarif"]
            if isinstance(sarif, dict):
                total_findings = sarif.get("total_findings", 0)

        return {
            "run_id": run_id,
            "workflow": workflow_name,
            "status": status.get("status", "unknown"),
            "is_completed": status.get("is_completed", False),
            "total_findings": total_findings,
            "severity_summary": severity_summary,
            "scan_duration": status.get("updated_at", "")
            if status.get("is_completed")
            else "In progress",
            "recommendations": (
                [
                    "Review high and critical severity findings first",
                    "Implement security fixes based on finding recommendations",
                    "Re-run scan after applying fixes to verify remediation",
                ]
                if total_findings > 0
                else ["No security issues found"]
            ),
            "mcp_analysis": True,
        }
    except Exception as exc:  # pragma: no cover
        logger.exception("MCP summary failed")
        return {"error": f"Failed to summarize run: {exc}"}


@mcp.tool
async def get_run_status_mcp(run_id: str) -> Dict[str, Any]:
    """Return current status information for a Prefect run."""
    try:
        not_ready = _prefect_not_ready_status()
        if not_ready:
            return {
                "error": "Prefect infrastructure not ready",
                "prefect": not_ready,
            }

        status = await prefect_mgr.get_flow_run_status(run_id)
        workflow_name = "unknown"
        deployment_id = status.get("workflow", "")
        for name, deployment in prefect_mgr.deployments.items():
            if str(deployment) == str(deployment_id):
                workflow_name = name
                break

        return {
            "run_id": status["run_id"],
            "workflow": workflow_name,
            "status": status["status"],
            "is_completed": status["is_completed"],
            "is_failed": status["is_failed"],
            "is_running": status["is_running"],
            "created_at": status["created_at"],
            "updated_at": status["updated_at"],
        }
    except Exception as exc:
        logger.exception("MCP run status failed")
        return {"error": f"Failed to get run status: {exc}"}


@mcp.tool
async def get_run_findings_mcp(run_id: str) -> Dict[str, Any]:
    """Return SARIF findings for a completed run."""
    try:
        not_ready = _prefect_not_ready_status()
        if not_ready:
            return {
                "error": "Prefect infrastructure not ready",
                "prefect": not_ready,
            }

        status = await prefect_mgr.get_flow_run_status(run_id)
        if not status.get("is_completed"):
            return {"error": f"Run {run_id} not completed. Status: {status.get('status')}"}

        findings = await prefect_mgr.get_flow_run_findings(run_id)

        workflow_name = "unknown"
        deployment_id = status.get("workflow", "")
        for name, deployment in prefect_mgr.deployments.items():
            if str(deployment) == str(deployment_id):
                workflow_name = name
                break

        metadata = {
            "completion_time": status.get("updated_at"),
            "workflow_version": "unknown",
        }
        info = prefect_mgr.workflows.get(workflow_name)
        if info:
            metadata["workflow_version"] = info.metadata.get("version", "unknown")

        return {
            "workflow": workflow_name,
            "run_id": run_id,
            "sarif": findings,
            "metadata": metadata,
        }
    except Exception as exc:
        logger.exception("MCP findings failed")
        return {"error": f"Failed to retrieve findings: {exc}"}
@mcp.tool
async def list_recent_runs_mcp(
    limit: int = 10,
    workflow_name: str | None = None,
    states: List[str] | None = None,
) -> Dict[str, Any]:
    """List recent Prefect runs with optional workflow/state filters."""

    not_ready = _prefect_not_ready_status()
    if not_ready:
        return {
            "runs": [],
            "prefect": not_ready,
            "message": "Prefect infrastructure is still initializing",
        }

    try:
        limit_value = int(limit)
    except (TypeError, ValueError):
        limit_value = 10
    limit_value = max(1, min(limit_value, 100))

    deployment_map = {
        str(deployment_id): workflow
        for workflow, deployment_id in prefect_mgr.deployments.items()
    }

    deployment_filter_value = None
    if workflow_name:
        deployment_id = prefect_mgr.deployments.get(workflow_name)
        if not deployment_id:
            return {
                "runs": [],
                "prefect": get_prefect_status(),
                "error": f"Workflow '{workflow_name}' has no registered deployment",
            }
        try:
            deployment_filter_value = UUID(str(deployment_id))
        except ValueError:
            return {
                "runs": [],
                "prefect": get_prefect_status(),
                "error": (
                    f"Deployment id '{deployment_id}' for workflow '{workflow_name}' is invalid"
                ),
            }

    desired_state_types: List[StateType] = []
    if states:
        for raw_state in states:
            if not raw_state:
                continue
            normalised = raw_state.strip().upper()
            if normalised == "ALL":
                desired_state_types = []
                break
            try:
                desired_state_types.append(StateType[normalised])
            except KeyError:
                continue
    if not desired_state_types:
        desired_state_types = [
            StateType.RUNNING,
            StateType.COMPLETED,
            StateType.FAILED,
            StateType.CANCELLED,
        ]

    flow_filter = FlowRunFilter()
    if desired_state_types:
        flow_filter.state = FlowRunFilterState(
            type=FlowRunFilterStateType(any_=desired_state_types)
        )
    if deployment_filter_value:
        flow_filter.deployment_id = FlowRunFilterDeploymentId(
            any_=[deployment_filter_value]
        )

    async with get_client() as client:
        flow_runs = await client.read_flow_runs(
            limit=limit_value,
            flow_run_filter=flow_filter,
            sort=FlowRunSort.START_TIME_DESC,
        )

    results: List[Dict[str, Any]] = []
    for flow_run in flow_runs:
        deployment_id = getattr(flow_run, "deployment_id", None)
        workflow = deployment_map.get(str(deployment_id), "unknown")
        state = getattr(flow_run, "state", None)
        state_name = getattr(state, "name", None) if state else None
        state_type = getattr(state, "type", None) if state else None

        results.append(
            {
                "run_id": str(flow_run.id),
                "workflow": workflow,
                "deployment_id": str(deployment_id) if deployment_id else None,
                "state": state_name or (state_type.name if state_type else None),
                "state_type": state_type.name if state_type else None,
                "is_completed": bool(getattr(state, "is_completed", lambda: False)()),
                "is_running": bool(getattr(state, "is_running", lambda: False)()),
                "is_failed": bool(getattr(state, "is_failed", lambda: False)()),
                "created_at": getattr(flow_run, "created", None),
                "updated_at": getattr(flow_run, "updated", None),
                "expected_start_time": getattr(flow_run, "expected_start_time", None),
                "start_time": getattr(flow_run, "start_time", None),
            }
        )

    # Normalise datetimes to ISO 8601 strings for serialization
    for entry in results:
        for key in ("created_at", "updated_at", "expected_start_time", "start_time"):
            value = entry.get(key)
            if value is None:
                continue
            try:
                entry[key] = value.isoformat()
            except AttributeError:
                entry[key] = str(value)

    return {"runs": results, "prefect": get_prefect_status()}
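The timestamp normalisation at the end of `list_recent_runs_mcp` relies on EAFP: try `isoformat()`, and fall back to `str()` for anything that is not datetime-like. Factored into a helper (a sketch; the function name is ours), it behaves like this:

```python
from datetime import datetime, timezone

def to_iso(value):
    """Normalise a timestamp field to a string, as the loop above does."""
    if value is None:
        return None
    try:
        return value.isoformat()       # datetime-like objects
    except AttributeError:
        return str(value)              # anything else becomes its str()

print(to_iso(datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc)))  # 2025-01-02T03:04:05+00:00
print(to_iso("already-a-string"))
print(to_iso(None))
```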
@mcp.tool
async def get_fuzzing_stats_mcp(run_id: str) -> Dict[str, Any]:
    """Return fuzzing statistics for a run if available."""
    not_ready = _prefect_not_ready_status()
    if not_ready:
        return {
            "error": "Prefect infrastructure not ready",
            "prefect": not_ready,
        }

    stats = fuzzing.fuzzing_stats.get(run_id)
    if not stats:
        return {"error": f"Fuzzing run not found: {run_id}"}
    # Be resilient if a plain dict slipped into the cache
    if isinstance(stats, dict):
        return stats
    if hasattr(stats, "model_dump"):
        return stats.model_dump()
    if hasattr(stats, "dict"):
        return stats.dict()
    # Last resort
    return getattr(stats, "__dict__", {"run_id": run_id})
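The serialization cascade in `get_fuzzing_stats_mcp` degrades gracefully across object kinds: plain dicts pass through, `model_dump()` covers Pydantic v2 models, `dict()` covers Pydantic v1, and `__dict__` is the last resort. A standalone sketch of the same cascade (helper name and sample class are ours):

```python
def to_plain_dict(stats):
    """Serialise a stats object the same way get_fuzzing_stats_mcp does."""
    if isinstance(stats, dict):
        return stats
    if hasattr(stats, "model_dump"):       # Pydantic v2 models
        return stats.model_dump()
    if hasattr(stats, "dict"):             # Pydantic v1 models
        return stats.dict()
    return getattr(stats, "__dict__", {})  # plain objects

class LegacyStats:  # hypothetical non-Pydantic stats object
    def __init__(self):
        self.execs = 100

print(to_plain_dict({"execs": 1}))     # {'execs': 1}
print(to_plain_dict(LegacyStats()))    # {'execs': 100}
```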
@mcp.tool
async def get_fuzzing_crash_reports_mcp(run_id: str) -> Dict[str, Any]:
    """Return crash reports collected for a fuzzing run."""
    not_ready = _prefect_not_ready_status()
    if not_ready:
        return {
            "error": "Prefect infrastructure not ready",
            "prefect": not_ready,
        }

    reports = fuzzing.crash_reports.get(run_id)
    if reports is None:
        return {"error": f"Fuzzing run not found: {run_id}"}
    return {"run_id": run_id, "crashes": [report.model_dump() for report in reports]}


@mcp.tool
async def get_backend_status_mcp() -> Dict[str, Any]:
    """Expose backend readiness, workflows, and registered MCP tools."""

    status = get_prefect_status()
    response: Dict[str, Any] = {"prefect": status}

    if status.get("ready"):
        response["workflows"] = list(prefect_mgr.workflows.keys())

    try:
        tools = await mcp._tool_manager.list_tools()
        response["mcp_tools"] = sorted(tool.name for tool in tools)
    except Exception as exc:  # pragma: no cover - defensive logging
        logger.debug("Failed to enumerate MCP tools: %s", exc)

    return response


def create_mcp_transport_app() -> Starlette:
    """Build a Starlette app serving HTTP + SSE transports on one port."""

    http_app = mcp.http_app(path="/", transport="streamable-http")
    sse_app = create_sse_app(
        server=mcp,
        message_path="/messages",
        sse_path="/",
        auth=mcp.auth,
    )

    routes = [
        Mount("/mcp", app=http_app),
        Mount("/mcp/sse", app=sse_app),
    ]

    @asynccontextmanager
    async def lifespan(app: Starlette):  # pragma: no cover - integration wiring
        async with AsyncExitStack() as stack:
            await stack.enter_async_context(
                http_app.router.lifespan_context(http_app)
            )
            await stack.enter_async_context(
                sse_app.router.lifespan_context(sse_app)
            )
            yield

    combined_app = Starlette(routes=routes, lifespan=lifespan)
    combined_app.state.fastmcp_server = mcp
    combined_app.state.http_app = http_app
    combined_app.state.sse_app = sse_app
    return combined_app


# ---------------------------------------------------------------------------
# Combined lifespan: Prefect init + dedicated MCP transports
# ---------------------------------------------------------------------------

@asynccontextmanager
async def combined_lifespan(app: FastAPI):
    global prefect_bootstrap_task, _fastapi_mcp_imported

    logger.info("Starting FuzzForge backend...")

    # Ensure FastAPI endpoints are exposed via MCP once
    if not _fastapi_mcp_imported:
        try:
            await mcp.import_server(FASTAPI_MCP_ADAPTER)
            _fastapi_mcp_imported = True
            logger.info("Mounted FastAPI endpoints as MCP tools")
        except Exception as exc:
            logger.exception("Failed to import FastAPI endpoints into MCP", exc_info=exc)

    # Kick off Prefect bootstrap in the background if needed
    if prefect_bootstrap_task is None or prefect_bootstrap_task.done():
        prefect_bootstrap_task = asyncio.create_task(_bootstrap_prefect_with_retries())
        logger.info("Prefect bootstrap task started")
    else:
        logger.info("Prefect bootstrap task already running")

    # Start MCP transports on shared port (HTTP + SSE)
    mcp_app = create_mcp_transport_app()
    mcp_config = uvicorn.Config(
        app=mcp_app,
        host="0.0.0.0",
        port=8010,
        log_level="info",
        lifespan="on",
    )
    mcp_server = uvicorn.Server(mcp_config)
    mcp_server.install_signal_handlers = lambda: None  # type: ignore[assignment]
    mcp_task = asyncio.create_task(mcp_server.serve())

    async def _wait_for_uvicorn_startup() -> None:
        started_attr = getattr(mcp_server, "started", None)
        if hasattr(started_attr, "wait"):
            await asyncio.wait_for(started_attr.wait(), timeout=10)
            return

        # Fallback for uvicorn versions where "started" is a bool
        poll_interval = 0.1
        checks = int(10 / poll_interval)
        for _ in range(checks):
            if getattr(mcp_server, "started", False):
                return
            await asyncio.sleep(poll_interval)
        raise asyncio.TimeoutError

    try:
        await _wait_for_uvicorn_startup()
    except asyncio.TimeoutError:  # pragma: no cover - defensive logging
        if mcp_task.done():
            raise RuntimeError("MCP server failed to start") from mcp_task.exception()
        logger.warning("Timed out waiting for MCP server startup; continuing anyway")

    logger.info("MCP HTTP available at http://0.0.0.0:8010/mcp")
    logger.info("MCP SSE available at http://0.0.0.0:8010/mcp/sse")

    try:
        yield
    finally:
        logger.info("Shutting down MCP transports...")
        mcp_server.should_exit = True
        mcp_server.force_exit = True
|
||||
await asyncio.gather(mcp_task, return_exceptions=True)
|
||||
|
||||
if prefect_bootstrap_task and not prefect_bootstrap_task.done():
|
||||
prefect_bootstrap_task.cancel()
|
||||
with suppress(asyncio.CancelledError):
|
||||
await prefect_bootstrap_task
|
||||
prefect_bootstrap_state.task_running = False
|
||||
if not prefect_bootstrap_state.ready:
|
||||
prefect_bootstrap_state.status = "stopped"
|
||||
prefect_bootstrap_state.next_retry_seconds = None
|
||||
prefect_bootstrap_task = None
|
||||
|
||||
logger.info("Shutting down Prefect statistics monitor...")
|
||||
await prefect_stats_monitor.stop_monitoring()
|
||||
logger.info("Shutting down FuzzForge backend...")
|
||||
|
||||
|
||||
app.router.lifespan_context = combined_lifespan
|
||||
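The startup-wait fallback in `_wait_for_uvicorn_startup` (await an event-like `started` attribute when one exists, otherwise poll a boolean) can be sketched as a generic helper. `wait_for_flag` below is a hypothetical standalone function, not part of this codebase:

```python
import asyncio


async def wait_for_flag(obj, attr: str = "started",
                        timeout: float = 10.0, poll: float = 0.1) -> None:
    """Wait until obj.<attr> is set, supporting event-like and bool attributes."""
    flag = getattr(obj, attr, None)
    if hasattr(flag, "wait"):
        # Event-like attribute: delegate to its own wait() with a timeout.
        await asyncio.wait_for(flag.wait(), timeout=timeout)
        return
    # Bool fallback: poll until the flag flips or the timeout budget is spent.
    for _ in range(int(timeout / poll)):
        if getattr(obj, attr, False):
            return
        await asyncio.sleep(poll)
    raise asyncio.TimeoutError(f"{attr} was never set")


class _FakeServer:
    started = True


# Returns immediately because the flag is already set.
asyncio.run(wait_for_flag(_FakeServer()))
```

This mirrors why the production code checks `hasattr(started_attr, "wait")` first: newer uvicorn versions expose `Server.started` as an `asyncio.Event`, older ones as a plain bool.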
backend/src/models/__init__.py (new file, 11 lines)
@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
backend/src/models/findings.py (new file, 182 lines)
@@ -0,0 +1,182 @@
"""
Models for workflow findings and submissions
"""

# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

from pydantic import BaseModel, Field, field_validator
from typing import Dict, Any, Optional, Literal, List
from datetime import datetime
from pathlib import Path


class WorkflowFindings(BaseModel):
    """Findings from a workflow execution in SARIF format"""
    workflow: str = Field(..., description="Workflow name")
    run_id: str = Field(..., description="Unique run identifier")
    sarif: Dict[str, Any] = Field(..., description="SARIF formatted findings")
    metadata: Dict[str, Any] = Field(default_factory=dict, description="Additional metadata")


class ResourceLimits(BaseModel):
    """Resource limits for workflow execution"""
    cpu_limit: Optional[str] = Field(None, description="CPU limit (e.g., '2' for 2 cores, '500m' for 0.5 cores)")
    memory_limit: Optional[str] = Field(None, description="Memory limit (e.g., '1Gi', '512Mi')")
    cpu_request: Optional[str] = Field(None, description="CPU request (guaranteed)")
    memory_request: Optional[str] = Field(None, description="Memory request (guaranteed)")


class VolumeMount(BaseModel):
    """Volume mount specification"""
    host_path: str = Field(..., description="Host path to mount")
    container_path: str = Field(..., description="Container path for mount")
    mode: Literal["ro", "rw"] = Field(default="ro", description="Mount mode")

    @field_validator("host_path")
    @classmethod
    def validate_host_path(cls, v):
        """Validate that the host path is absolute (existence checked at runtime)"""
        path = Path(v)
        if not path.is_absolute():
            raise ValueError(f"Host path must be absolute: {v}")
        # Note: Path existence is validated at workflow runtime.
        # We can't validate existence here as this runs inside the Docker container.
        return str(path)

    @field_validator("container_path")
    @classmethod
    def validate_container_path(cls, v):
        """Validate that the container path is absolute"""
        if not v.startswith('/'):
            raise ValueError(f"Container path must be absolute: {v}")
        return v


class WorkflowSubmission(BaseModel):
    """Submit a workflow with configurable settings"""
    target_path: str = Field(..., description="Absolute path to analyze")
    volume_mode: Literal["ro", "rw"] = Field(
        default="ro",
        description="Volume mount mode: read-only (ro) or read-write (rw)"
    )
    parameters: Dict[str, Any] = Field(
        default_factory=dict,
        description="Workflow-specific parameters"
    )
    timeout: Optional[int] = Field(
        default=None,  # Allow workflow-specific defaults
        description="Timeout in seconds (None for workflow default)",
        ge=1,
        le=604800  # Max 7 days to support fuzzing campaigns
    )
    resource_limits: Optional[ResourceLimits] = Field(
        None,
        description="Resource limits for workflow container"
    )
    additional_volumes: List[VolumeMount] = Field(
        default_factory=list,
        description="Additional volume mounts (e.g., for corpus, output directories)"
    )

    @field_validator("target_path")
    @classmethod
    def validate_path(cls, v):
        """Validate that the target path is absolute (existence checked at runtime)"""
        path = Path(v)
        if not path.is_absolute():
            raise ValueError(f"Path must be absolute: {v}")
        # Note: Path existence is validated at workflow runtime when volumes are mounted.
        # We can't validate existence here as this runs inside the Docker container.
        return str(path)


class WorkflowStatus(BaseModel):
    """Status of a workflow run"""
    run_id: str = Field(..., description="Unique run identifier")
    workflow: str = Field(..., description="Workflow name")
    status: str = Field(..., description="Current status")
    is_completed: bool = Field(..., description="Whether the run is completed")
    is_failed: bool = Field(..., description="Whether the run failed")
    is_running: bool = Field(..., description="Whether the run is currently running")
    created_at: datetime = Field(..., description="Run creation time")
    updated_at: datetime = Field(..., description="Last update time")


class WorkflowMetadata(BaseModel):
    """Complete metadata for a workflow"""
    name: str = Field(..., description="Workflow name")
    version: str = Field(..., description="Semantic version")
    description: str = Field(..., description="Workflow description")
    author: Optional[str] = Field(None, description="Workflow author")
    tags: List[str] = Field(default_factory=list, description="Workflow tags")
    parameters: Dict[str, Any] = Field(..., description="Parameters schema")
    default_parameters: Dict[str, Any] = Field(
        default_factory=dict,
        description="Default parameter values"
    )
    required_modules: List[str] = Field(
        default_factory=list,
        description="Required module names"
    )
    supported_volume_modes: List[Literal["ro", "rw"]] = Field(
        default=["ro", "rw"],
        description="Supported volume mount modes"
    )
    has_custom_docker: bool = Field(
        default=False,
        description="Whether workflow has custom Dockerfile"
    )


class WorkflowListItem(BaseModel):
    """Summary information for a workflow in list views"""
    name: str = Field(..., description="Workflow name")
    version: str = Field(..., description="Semantic version")
    description: str = Field(..., description="Workflow description")
    author: Optional[str] = Field(None, description="Workflow author")
    tags: List[str] = Field(default_factory=list, description="Workflow tags")


class RunSubmissionResponse(BaseModel):
    """Response after submitting a workflow"""
    run_id: str = Field(..., description="Unique run identifier")
    status: str = Field(..., description="Initial status")
    workflow: str = Field(..., description="Workflow name")
    message: str = Field(default="Workflow submitted successfully")


class FuzzingStats(BaseModel):
    """Real-time fuzzing statistics"""
    run_id: str = Field(..., description="Unique run identifier")
    workflow: str = Field(..., description="Workflow name")
    executions: int = Field(default=0, description="Total executions")
    executions_per_sec: float = Field(default=0.0, description="Current execution rate")
    crashes: int = Field(default=0, description="Total crashes found")
    unique_crashes: int = Field(default=0, description="Unique crashes")
    coverage: Optional[float] = Field(None, description="Code coverage percentage")
    corpus_size: int = Field(default=0, description="Current corpus size")
    elapsed_time: int = Field(default=0, description="Elapsed time in seconds")
    last_crash_time: Optional[datetime] = Field(None, description="Time of last crash")


class CrashReport(BaseModel):
    """Individual crash report from fuzzing"""
    run_id: str = Field(..., description="Run identifier")
    crash_id: str = Field(..., description="Unique crash identifier")
    timestamp: datetime = Field(default_factory=datetime.utcnow)
    signal: Optional[str] = Field(None, description="Crash signal (SIGSEGV, etc.)")
    crash_type: Optional[str] = Field(None, description="Type of crash")
    stack_trace: Optional[str] = Field(None, description="Stack trace")
    input_file: Optional[str] = Field(None, description="Path to crashing input")
    reproducer: Optional[str] = Field(None, description="Minimized reproducer")
    severity: str = Field(default="medium", description="Crash severity")
    exploitability: Optional[str] = Field(None, description="Exploitability assessment")
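As a standalone illustration of the path rule enforced by the `VolumeMount` and `WorkflowSubmission` validators, here is the same check as a plain function (a sketch mirroring the validator bodies, not the Pydantic models themselves):

```python
from pathlib import Path


def validate_absolute_path(v: str) -> str:
    # Same rule as the field validators above: only absoluteness is enforced;
    # existence is deferred to workflow runtime, where volumes are mounted.
    path = Path(v)
    if not path.is_absolute():
        raise ValueError(f"Path must be absolute: {v}")
    return str(path)


validate_absolute_path("/data/corpus")      # accepted, returned unchanged
# validate_absolute_path("relative/path")  # raises ValueError
```

Deferring the existence check is deliberate: the API process runs inside a container, so a host path that is invisible there may still be perfectly valid when Docker mounts it for the workflow.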
backend/src/services/prefect_stats_monitor.py (new file, 394 lines)
@@ -0,0 +1,394 @@
"""
Generic Prefect Statistics Monitor Service

This service monitors ALL workflows for structured live data logging and
updates the appropriate statistics APIs. It works with any workflow that
follows the standard LIVE_STATS logging pattern.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import asyncio
import json
import logging
from datetime import datetime, timedelta, timezone
from typing import Dict, Any, Optional

from prefect.client.orchestration import get_client
from prefect.client.schemas.objects import FlowRun, TaskRun

from src.models.findings import FuzzingStats
from src.api.fuzzing import fuzzing_stats, initialize_fuzzing_tracking, active_connections

logger = logging.getLogger(__name__)


class PrefectStatsMonitor:
    """Monitors Prefect flows and tasks for live statistics from any workflow"""

    def __init__(self):
        self.monitoring = False
        self.monitor_task = None
        self.monitored_runs = set()
        self.last_log_ts: Dict[str, datetime] = {}
        self._client = None
        self._client_refresh_time = None
        self._client_refresh_interval = 300  # Refresh connection every 5 minutes

    async def start_monitoring(self):
        """Start the Prefect statistics monitoring service"""
        if self.monitoring:
            logger.warning("Prefect stats monitor already running")
            return

        self.monitoring = True
        self.monitor_task = asyncio.create_task(self._monitor_flows())
        logger.info("Started Prefect statistics monitor")

    async def stop_monitoring(self):
        """Stop the monitoring service"""
        self.monitoring = False
        if self.monitor_task:
            self.monitor_task.cancel()
            try:
                await self.monitor_task
            except asyncio.CancelledError:
                pass
        logger.info("Stopped Prefect statistics monitor")

    async def _get_or_refresh_client(self):
        """Get or refresh the Prefect client with connection pooling."""
        now = datetime.now(timezone.utc)

        if (self._client is None or
                self._client_refresh_time is None or
                (now - self._client_refresh_time).total_seconds() > self._client_refresh_interval):

            if self._client:
                try:
                    await self._client.aclose()
                except Exception:
                    pass

            self._client = get_client()
            self._client_refresh_time = now
            await self._client.__aenter__()

        return self._client

    async def _monitor_flows(self):
        """Main monitoring loop that watches Prefect flows"""
        try:
            while self.monitoring:
                try:
                    # Use connection pooling for better performance
                    client = await self._get_or_refresh_client()

                    # Get recent flow runs (limit to reduce load)
                    flow_runs = await client.read_flow_runs(
                        limit=50,
                        sort="START_TIME_DESC",
                    )

                    # Only consider runs from the last 15 minutes
                    recent_cutoff = datetime.now(timezone.utc) - timedelta(minutes=15)
                    for flow_run in flow_runs:
                        created = getattr(flow_run, "created", None)
                        if created is None:
                            continue
                        try:
                            # Ensure timezone-aware comparison
                            if created.tzinfo is None:
                                created = created.replace(tzinfo=timezone.utc)
                            if created >= recent_cutoff:
                                await self._monitor_flow_run(client, flow_run)
                        except Exception:
                            # If the comparison fails, attempt monitoring anyway
                            await self._monitor_flow_run(client, flow_run)

                    await asyncio.sleep(5)  # Check every 5 seconds

                except Exception as e:
                    logger.error(f"Error in Prefect monitoring: {e}")
                    await asyncio.sleep(10)

        except asyncio.CancelledError:
            logger.info("Prefect monitoring cancelled")
        except Exception as e:
            logger.error(f"Fatal error in Prefect monitoring: {e}")
        finally:
            # Clean up client on exit
            if self._client:
                try:
                    await self._client.__aexit__(None, None, None)
                except Exception:
                    pass
                self._client = None

    async def _monitor_flow_run(self, client, flow_run: FlowRun):
        """Monitor a specific flow run for statistics"""
        run_id = str(flow_run.id)
        workflow_name = flow_run.name or "unknown"

        try:
            # Initialize tracking if it does not exist yet - only for workflows
            # that might have live stats
            if run_id not in fuzzing_stats:
                initialize_fuzzing_tracking(run_id, workflow_name)
                self.monitored_runs.add(run_id)

            # Skip corrupted entries (should not happen after startup cleanup, but defensive)
            elif not isinstance(fuzzing_stats[run_id], FuzzingStats):
                logger.warning(f"Skipping corrupted stats entry for {run_id}, reinitializing")
                initialize_fuzzing_tracking(run_id, workflow_name)
                self.monitored_runs.add(run_id)

            # Get task runs for this flow
            task_runs = await client.read_task_runs(
                flow_run_filter={"id": {"any_": [flow_run.id]}},
                limit=25,
            )

            # Check all tasks for live statistics logging
            for task_run in task_runs:
                await self._extract_stats_from_task(client, run_id, task_run, workflow_name)

            # Also scan flow-level logs as a fallback
            await self._extract_stats_from_flow_logs(client, run_id, flow_run, workflow_name)

        except Exception as e:
            logger.warning(f"Error monitoring flow run {run_id}: {e}")

    async def _extract_stats_from_task(self, client, run_id: str, task_run: TaskRun, workflow_name: str):
        """Extract statistics from any task that logs live stats"""
        try:
            # Get task run logs
            logs = await client.read_logs(
                log_filter={
                    "task_run_id": {"any_": [task_run.id]}
                },
                limit=100,
                sort="TIMESTAMP_ASC"
            )

            # Parse logs for LIVE_STATS entries (generic pattern for any workflow)
            latest_stats = None
            for log in logs:
                # Prefer the structured extra field if present
                extra_data = getattr(log, "extra", None) or getattr(log, "extra_fields", None) or None
                if isinstance(extra_data, dict):
                    stat_type = extra_data.get("stats_type")
                    if stat_type in ["fuzzing_live_update", "scan_progress", "analysis_update", "live_stats"]:
                        latest_stats = extra_data
                        continue

                # Fallback to parsing from the message text
                if ("FUZZ_STATS" in log.message or "LIVE_STATS" in log.message):
                    stats = self._parse_stats_from_log(log.message)
                    if stats:
                        latest_stats = stats

            # Update statistics if we found any
            if latest_stats:
                # Calculate elapsed time from task start
                elapsed_time = 0
                if task_run.start_time:
                    # Ensure timezone-aware arithmetic
                    now = datetime.now(timezone.utc)
                    try:
                        elapsed_time = int((now - task_run.start_time).total_seconds())
                    except Exception:
                        # Fallback to naive UTC if types mismatch
                        elapsed_time = int((datetime.utcnow() - task_run.start_time.replace(tzinfo=None)).total_seconds())

                updated_stats = FuzzingStats(
                    run_id=run_id,
                    workflow=workflow_name,
                    executions=latest_stats.get("executions", 0),
                    executions_per_sec=latest_stats.get("executions_per_sec", 0.0),
                    crashes=latest_stats.get("crashes", 0),
                    unique_crashes=latest_stats.get("unique_crashes", 0),
                    corpus_size=latest_stats.get("corpus_size", 0),
                    elapsed_time=elapsed_time
                )

                # Update the global stats
                fuzzing_stats[run_id] = updated_stats

                # Broadcast to any active WebSocket clients for this run
                if active_connections.get(run_id):
                    # Handle both Pydantic objects and plain dicts
                    if isinstance(updated_stats, dict):
                        stats_data = updated_stats
                    elif hasattr(updated_stats, 'model_dump'):
                        stats_data = updated_stats.model_dump()
                    elif hasattr(updated_stats, 'dict'):
                        stats_data = updated_stats.dict()
                    else:
                        stats_data = updated_stats.__dict__

                    message = {
                        "type": "stats_update",
                        "data": stats_data,
                    }
                    disconnected = []
                    for ws in active_connections[run_id]:
                        try:
                            await ws.send_text(json.dumps(message))
                        except Exception:
                            disconnected.append(ws)
                    # Clean up disconnected sockets
                    for ws in disconnected:
                        try:
                            active_connections[run_id].remove(ws)
                        except ValueError:
                            pass

                logger.debug(f"Updated Prefect stats for {run_id}: {updated_stats.executions} execs")

        except Exception as e:
            logger.warning(f"Error extracting stats from task {task_run.id}: {e}")

    async def _extract_stats_from_flow_logs(self, client, run_id: str, flow_run: FlowRun, workflow_name: str):
        """Extract statistics by scanning flow-level logs for LIVE/FUZZ stats"""
        try:
            logs = await client.read_logs(
                log_filter={
                    "flow_run_id": {"any_": [flow_run.id]}
                },
                limit=200,
                sort="TIMESTAMP_ASC"
            )

            latest_stats = None
            last_seen = self.last_log_ts.get(run_id)
            max_ts = last_seen

            for log in logs:
                # Skip logs we've already processed
                ts = getattr(log, "timestamp", None)
                if last_seen and ts and ts <= last_seen:
                    continue
                if ts and (max_ts is None or ts > max_ts):
                    max_ts = ts

                # Prefer the structured extra field if available
                extra_data = getattr(log, "extra", None) or getattr(log, "extra_fields", None) or None
                if isinstance(extra_data, dict):
                    stat_type = extra_data.get("stats_type")
                    if stat_type in ["fuzzing_live_update", "scan_progress", "analysis_update", "live_stats"]:
                        latest_stats = extra_data
                        continue

                # Fallback to message parse
                if ("FUZZ_STATS" in log.message or "LIVE_STATS" in log.message):
                    stats = self._parse_stats_from_log(log.message)
                    if stats:
                        latest_stats = stats

            if max_ts:
                self.last_log_ts[run_id] = max_ts

            if latest_stats:
                # Use flow_run timestamps for elapsed time if available
                elapsed_time = 0
                start_time = getattr(flow_run, "start_time", None)
                if start_time:
                    now = datetime.now(timezone.utc)
                    try:
                        if start_time.tzinfo is None:
                            start_time = start_time.replace(tzinfo=timezone.utc)
                        elapsed_time = int((now - start_time).total_seconds())
                    except Exception:
                        elapsed_time = int((datetime.utcnow() - start_time.replace(tzinfo=None)).total_seconds())

                updated_stats = FuzzingStats(
                    run_id=run_id,
                    workflow=workflow_name,
                    executions=latest_stats.get("executions", 0),
                    executions_per_sec=latest_stats.get("executions_per_sec", 0.0),
                    crashes=latest_stats.get("crashes", 0),
                    unique_crashes=latest_stats.get("unique_crashes", 0),
                    corpus_size=latest_stats.get("corpus_size", 0),
                    elapsed_time=elapsed_time
                )

                fuzzing_stats[run_id] = updated_stats

                # Broadcast if listeners exist
                if active_connections.get(run_id):
                    # Handle both Pydantic objects and plain dicts
                    if isinstance(updated_stats, dict):
                        stats_data = updated_stats
                    elif hasattr(updated_stats, 'model_dump'):
                        stats_data = updated_stats.model_dump()
                    elif hasattr(updated_stats, 'dict'):
                        stats_data = updated_stats.dict()
                    else:
                        stats_data = updated_stats.__dict__

                    message = {
                        "type": "stats_update",
                        "data": stats_data,
                    }
                    disconnected = []
                    for ws in active_connections[run_id]:
                        try:
                            await ws.send_text(json.dumps(message))
                        except Exception:
                            disconnected.append(ws)
                    for ws in disconnected:
                        try:
                            active_connections[run_id].remove(ws)
                        except ValueError:
                            pass

        except Exception as e:
            logger.warning(f"Error extracting stats from flow logs {run_id}: {e}")

    def _parse_stats_from_log(self, log_message: str) -> Optional[Dict[str, Any]]:
        """Parse statistics from a log message"""
        try:
            import re

            # Prefer explicit JSON after marker tokens
            m = re.search(r'(?:FUZZ_STATS|LIVE_STATS)\s+(\{.*\})', log_message)
            if m:
                try:
                    return json.loads(m.group(1))
                except Exception:
                    pass

            # Fallback: extract the extra= dict and coerce it to JSON
            stats_match = re.search(r'extra=({.*?})', log_message)
            if not stats_match:
                return None

            extra_str = stats_match.group(1)
            extra_str = extra_str.replace("'", '"')
            extra_str = extra_str.replace('None', 'null')
            extra_str = extra_str.replace('True', 'true')
            extra_str = extra_str.replace('False', 'false')

            stats_data = json.loads(extra_str)

            # Support multiple stat types for different workflows
            stat_type = stats_data.get("stats_type")
            if stat_type in ["fuzzing_live_update", "scan_progress", "analysis_update", "live_stats"]:
                return stats_data

        except Exception as e:
            logger.debug(f"Error parsing log stats: {e}")

        return None


# Global instance
prefect_stats_monitor = PrefectStatsMonitor()
backend/tests/conftest.py (new file, 19 lines)
@@ -0,0 +1,19 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

import sys
from pathlib import Path

# Ensure the project root is on sys.path so `src` is importable
ROOT = Path(__file__).resolve().parents[1]
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))
backend/tests/test_prefect_stats_monitor.py (new file, 82 lines)
@@ -0,0 +1,82 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

import asyncio
from datetime import datetime, timezone, timedelta

from src.services.prefect_stats_monitor import PrefectStatsMonitor
from src.api import fuzzing


class FakeLog:
    def __init__(self, message: str):
        self.message = message


class FakeClient:
    def __init__(self, logs):
        self._logs = logs

    async def read_logs(self, log_filter=None, limit=100, sort="TIMESTAMP_ASC"):
        return self._logs


class FakeTaskRun:
    def __init__(self):
        self.id = "task-1"
        self.start_time = datetime.now(timezone.utc) - timedelta(seconds=5)


def test_parse_stats_from_log_fuzzing():
    mon = PrefectStatsMonitor()
    msg = (
        "INFO LIVE_STATS extra={'stats_type': 'fuzzing_live_update', "
        "'executions': 42, 'executions_per_sec': 3.14, 'crashes': 1, 'unique_crashes': 1, 'corpus_size': 9}"
    )
    stats = mon._parse_stats_from_log(msg)
    assert stats is not None
    assert stats["stats_type"] == "fuzzing_live_update"
    assert stats["executions"] == 42


def test_extract_stats_updates_and_broadcasts():
    mon = PrefectStatsMonitor()
    run_id = "run-123"
    workflow = "wf"
    fuzzing.initialize_fuzzing_tracking(run_id, workflow)

    # Prepare a fake websocket to capture messages
    sent = []

    class FakeWS:
        async def send_text(self, text: str):
            sent.append(text)

    fuzzing.active_connections[run_id] = [FakeWS()]

    # Craft a log line the parser understands
    msg = (
        "INFO LIVE_STATS extra={'stats_type': 'fuzzing_live_update', "
        "'executions': 10, 'executions_per_sec': 1.5, 'crashes': 0, 'unique_crashes': 0, 'corpus_size': 2}"
    )
    fake_client = FakeClient([FakeLog(msg)])
    task_run = FakeTaskRun()

    asyncio.run(mon._extract_stats_from_task(fake_client, run_id, task_run, workflow))

    # Verify the stats were updated
    stats = fuzzing.fuzzing_stats[run_id]
    assert stats.executions == 10
    assert stats.executions_per_sec == 1.5

    # Verify a message was sent to the WebSocket
    assert sent, "Expected a stats_update message to be sent"
backend/toolbox/__init__.py (new file, 11 lines)
@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
backend/toolbox/modules/__init__.py (new file, 11 lines)
@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
14
backend/toolbox/modules/analyzer/__init__.py
Normal file
@@ -0,0 +1,14 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

from .security_analyzer import SecurityAnalyzer

__all__ = ["SecurityAnalyzer"]
368
backend/toolbox/modules/analyzer/security_analyzer.py
Normal file
@@ -0,0 +1,368 @@
"""
Security Analyzer Module - Analyzes code for security vulnerabilities
"""

# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

import logging
import re
from pathlib import Path
from typing import Dict, Any, List, Optional

try:
    from toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
except ImportError:
    try:
        from modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
    except ImportError:
        from src.toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding

logger = logging.getLogger(__name__)


class SecurityAnalyzer(BaseModule):
    """
    Analyzes source code for common security vulnerabilities.

    This module:
    - Detects hardcoded secrets and credentials
    - Identifies dangerous function calls
    - Finds SQL injection vulnerabilities
    - Detects insecure configurations
    """

    def get_metadata(self) -> ModuleMetadata:
        """Get module metadata"""
        return ModuleMetadata(
            name="security_analyzer",
            version="1.0.0",
            description="Analyzes code for security vulnerabilities",
            author="FuzzForge Team",
            category="analyzer",
            tags=["security", "vulnerabilities", "static-analysis"],
            input_schema={
                "file_extensions": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "File extensions to analyze",
                    "default": [".py", ".js", ".java", ".php", ".rb", ".go"]
                },
                "check_secrets": {
                    "type": "boolean",
                    "description": "Check for hardcoded secrets",
                    "default": True
                },
                "check_sql": {
                    "type": "boolean",
                    "description": "Check for SQL injection risks",
                    "default": True
                },
                "check_dangerous_functions": {
                    "type": "boolean",
                    "description": "Check for dangerous function calls",
                    "default": True
                }
            },
            output_schema={
                "findings": {
                    "type": "array",
                    "description": "List of security findings"
                }
            },
            requires_workspace=True
        )

    def validate_config(self, config: Dict[str, Any]) -> bool:
        """Validate module configuration"""
        extensions = config.get("file_extensions", [])
        if not isinstance(extensions, list):
            raise ValueError("file_extensions must be a list")

        return True

    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        """
        Execute the security analysis module.

        Args:
            config: Module configuration
            workspace: Path to the workspace directory

        Returns:
            ModuleResult with security findings
        """
        self.start_timer()
        self.validate_workspace(workspace)
        self.validate_config(config)

        findings = []
        files_analyzed = 0

        # Get configuration
        file_extensions = config.get("file_extensions", [".py", ".js", ".java", ".php", ".rb", ".go"])
        check_secrets = config.get("check_secrets", True)
        check_sql = config.get("check_sql", True)
        check_dangerous = config.get("check_dangerous_functions", True)

        logger.info(f"Analyzing files with extensions: {file_extensions}")

        try:
            # Analyze each file
            for ext in file_extensions:
                for file_path in workspace.rglob(f"*{ext}"):
                    if not file_path.is_file():
                        continue

                    files_analyzed += 1
                    relative_path = file_path.relative_to(workspace)

                    try:
                        content = file_path.read_text(encoding='utf-8', errors='ignore')
                        lines = content.splitlines()

                        # Check for secrets
                        if check_secrets:
                            secret_findings = self._check_hardcoded_secrets(
                                content, lines, relative_path
                            )
                            findings.extend(secret_findings)

                        # Check for SQL injection
                        if check_sql and ext in [".py", ".php", ".java", ".js"]:
                            sql_findings = self._check_sql_injection(
                                content, lines, relative_path
                            )
                            findings.extend(sql_findings)

                        # Check for dangerous functions
                        if check_dangerous:
                            dangerous_findings = self._check_dangerous_functions(
                                content, lines, relative_path, ext
                            )
                            findings.extend(dangerous_findings)

                    except Exception as e:
                        logger.error(f"Error analyzing file {relative_path}: {e}")

            # Create summary
            summary = {
                "files_analyzed": files_analyzed,
                "total_findings": len(findings),
                "extensions_scanned": file_extensions
            }

            return self.create_result(
                findings=findings,
                status="success" if files_analyzed > 0 else "partial",
                summary=summary,
                metadata={
                    "workspace": str(workspace),
                    "config": config
                }
            )

        except Exception as e:
            logger.error(f"Security analyzer failed: {e}")
            return self.create_result(
                findings=findings,
                status="failed",
                error=str(e)
            )

    def _check_hardcoded_secrets(
        self, content: str, lines: List[str], file_path: Path
    ) -> List[ModuleFinding]:
        """
        Check for hardcoded secrets in code.

        Args:
            content: File content
            lines: File lines
            file_path: Relative file path

        Returns:
            List of findings
        """
        findings = []

        # Patterns for secrets
        secret_patterns = [
            (r'api[_-]?key\s*=\s*["\']([^"\']{20,})["\']', 'API Key'),
            (r'api[_-]?secret\s*=\s*["\']([^"\']{20,})["\']', 'API Secret'),
            (r'password\s*=\s*["\']([^"\']+)["\']', 'Hardcoded Password'),
            (r'token\s*=\s*["\']([^"\']{20,})["\']', 'Authentication Token'),
            (r'aws[_-]?access[_-]?key\s*=\s*["\']([^"\']+)["\']', 'AWS Access Key'),
            (r'aws[_-]?secret[_-]?key\s*=\s*["\']([^"\']+)["\']', 'AWS Secret Key'),
            (r'private[_-]?key\s*=\s*["\']([^"\']+)["\']', 'Private Key'),
            (r'["\']([A-Za-z0-9]{32,})["\']', 'Potential Secret Hash'),
            (r'Bearer\s+([A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+)', 'JWT Token'),
        ]

        for pattern, secret_type in secret_patterns:
            for match in re.finditer(pattern, content, re.IGNORECASE):
                # Find line number
                line_num = content[:match.start()].count('\n') + 1
                line_content = lines[line_num - 1] if line_num <= len(lines) else ""

                # Skip common false positives
                if self._is_false_positive_secret(match.group(0)):
                    continue

                findings.append(self.create_finding(
                    title=f"Hardcoded {secret_type} detected",
                    description=f"Found potential hardcoded {secret_type} in {file_path}",
                    severity="high" if "key" in secret_type.lower() else "medium",
                    category="hardcoded_secret",
                    file_path=str(file_path),
                    line_start=line_num,
                    code_snippet=line_content.strip()[:100],
                    recommendation=f"Remove hardcoded {secret_type} and use environment variables or secure vault",
                    metadata={"secret_type": secret_type}
                ))

        return findings

    def _check_sql_injection(
        self, content: str, lines: List[str], file_path: Path
    ) -> List[ModuleFinding]:
        """
        Check for potential SQL injection vulnerabilities.

        Args:
            content: File content
            lines: File lines
            file_path: Relative file path

        Returns:
            List of findings
        """
        findings = []

        # SQL injection patterns
        sql_patterns = [
            (r'(SELECT|INSERT|UPDATE|DELETE).*\+\s*[\'"]?\s*\+?\s*\w+', 'String concatenation in SQL'),
            (r'(SELECT|INSERT|UPDATE|DELETE).*%\s*[\'"]?\s*%?\s*\w+', 'String formatting in SQL'),
            (r'f[\'"].*?(SELECT|INSERT|UPDATE|DELETE).*?\{.*?\}', 'F-string in SQL query'),
            (r'query\s*=.*?\+', 'Dynamic query building'),
            (r'execute\s*\(.*?\+.*?\)', 'Dynamic execute statement'),
        ]

        for pattern, vuln_type in sql_patterns:
            for match in re.finditer(pattern, content, re.IGNORECASE):
                line_num = content[:match.start()].count('\n') + 1
                line_content = lines[line_num - 1] if line_num <= len(lines) else ""

                findings.append(self.create_finding(
                    title=f"Potential SQL Injection: {vuln_type}",
                    description=f"Detected potential SQL injection vulnerability via {vuln_type}",
                    severity="high",
                    category="sql_injection",
                    file_path=str(file_path),
                    line_start=line_num,
                    code_snippet=line_content.strip()[:100],
                    recommendation="Use parameterized queries or prepared statements instead",
                    metadata={"vulnerability_type": vuln_type}
                ))

        return findings

    def _check_dangerous_functions(
        self, content: str, lines: List[str], file_path: Path, ext: str
    ) -> List[ModuleFinding]:
        """
        Check for dangerous function calls.

        Args:
            content: File content
            lines: File lines
            file_path: Relative file path
            ext: File extension

        Returns:
            List of findings
        """
        findings = []

        # Language-specific dangerous functions
        dangerous_functions = {
            ".py": [
                (r'eval\s*\(', 'eval()', 'Arbitrary code execution'),
                (r'exec\s*\(', 'exec()', 'Arbitrary code execution'),
                (r'os\.system\s*\(', 'os.system()', 'Command injection risk'),
                (r'subprocess\.call\s*\(.*shell=True', 'subprocess with shell=True', 'Command injection risk'),
                (r'pickle\.loads?\s*\(', 'pickle.load()', 'Deserialization vulnerability'),
            ],
            ".js": [
                (r'eval\s*\(', 'eval()', 'Arbitrary code execution'),
                (r'new\s+Function\s*\(', 'new Function()', 'Arbitrary code execution'),
                (r'innerHTML\s*=', 'innerHTML', 'XSS vulnerability'),
                (r'document\.write\s*\(', 'document.write()', 'XSS vulnerability'),
            ],
            ".php": [
                (r'eval\s*\(', 'eval()', 'Arbitrary code execution'),
                (r'exec\s*\(', 'exec()', 'Command execution'),
                (r'system\s*\(', 'system()', 'Command execution'),
                (r'shell_exec\s*\(', 'shell_exec()', 'Command execution'),
                (r'\$_GET\[', 'Direct $_GET usage', 'Input validation missing'),
                (r'\$_POST\[', 'Direct $_POST usage', 'Input validation missing'),
            ]
        }

        if ext in dangerous_functions:
            for pattern, func_name, risk_type in dangerous_functions[ext]:
                for match in re.finditer(pattern, content):
                    line_num = content[:match.start()].count('\n') + 1
                    line_content = lines[line_num - 1] if line_num <= len(lines) else ""

                    findings.append(self.create_finding(
                        title=f"Dangerous function: {func_name}",
                        description=f"Use of potentially dangerous function {func_name}: {risk_type}",
                        severity="medium",
                        category="dangerous_function",
                        file_path=str(file_path),
                        line_start=line_num,
                        code_snippet=line_content.strip()[:100],
                        recommendation=f"Consider safer alternatives to {func_name}",
                        metadata={
                            "function": func_name,
                            "risk": risk_type
                        }
                    ))

        return findings

    def _is_false_positive_secret(self, value: str) -> bool:
        """
        Check if a potential secret is likely a false positive.

        Args:
            value: Potential secret value

        Returns:
            True if likely false positive
        """
        false_positive_patterns = [
            'example',
            'test',
            'demo',
            'sample',
            'dummy',
            'placeholder',
            'xxx',
            '123',
            'change',
            'your',
            'here'
        ]

        value_lower = value.lower()
        return any(pattern in value_lower for pattern in false_positive_patterns)
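The match-to-line mapping used throughout `SecurityAnalyzer` (`content[:match.start()].count('\n') + 1`) can be checked in isolation; the sample content below is made up for illustration:

```python
import re

# Hypothetical file content; line 2 holds the hardcoded password.
content = 'first line\npassword = "hunter2"\nlast line\n'
lines = content.splitlines()

# Same pattern as the 'Hardcoded Password' entry in secret_patterns.
match = re.search(r'password\s*=\s*["\']([^"\']+)["\']', content, re.IGNORECASE)

# Count newlines before the match start to get a 1-based line number,
# then index back into the split lines to recover the snippet.
line_num = content[:match.start()].count('\n') + 1
line_content = lines[line_num - 1] if line_num <= len(lines) else ""
```

Here `line_num` is 2 and `line_content` is the `password = "hunter2"` line, matching what `create_finding` receives as `line_start` and `code_snippet`.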
25
backend/toolbox/modules/android/__init__.py
Normal file
@@ -0,0 +1,25 @@
"""
Android Security Modules

This package contains modules for Android static code analysis and security testing.

Available modules:
- MobSF: Mobile Security Framework
- Jadx: Dex to Java decompiler
- OpenGrep: Open-source pattern-based static analysis tool
"""

from typing import List, Type
from ..base import BaseModule

# Module registry for automatic discovery
ANDROID_MODULES: List[Type[BaseModule]] = []

def register_module(module_class: Type[BaseModule]):
    """Register an Android security module"""
    ANDROID_MODULES.append(module_class)
    return module_class

def get_available_modules() -> List[Type[BaseModule]]:
    """Get all available Android modules"""
    return ANDROID_MODULES.copy()
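The registry above is used by decorating a module class, as `JadxModule` does later in this diff. A minimal sketch of the mechanism (with simplified stand-ins for `BaseModule` and a hypothetical `ExampleModule`, since the real package imports are not available here):

```python
from typing import List, Type

class BaseModule:  # simplified stand-in for toolbox.modules.base.BaseModule
    pass

# Same registry shape as backend/toolbox/modules/android/__init__.py
ANDROID_MODULES: List[Type[BaseModule]] = []

def register_module(module_class: Type[BaseModule]):
    """Append the class to the registry and return it unchanged."""
    ANDROID_MODULES.append(module_class)
    return module_class

def get_available_modules() -> List[Type[BaseModule]]:
    """Return a copy so callers cannot mutate the registry."""
    return ANDROID_MODULES.copy()

@register_module
class ExampleModule(BaseModule):  # hypothetical module for illustration
    pass
```

Because the decorator returns the class unchanged, `ExampleModule` stays usable as a normal class while also appearing in `get_available_modules()`.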
@@ -0,0 +1,15 @@
rules:
  - id: clipboard-sensitive-data
    severity: WARNING
    languages: [java]
    message: "Sensitive data may be copied to the clipboard."
    metadata:
      authors:
        - Guerric ELOI (FuzzingLabs)
      category: security
      area: clipboard
      verification-level: [L1]
    paths:
      include:
        - "**/*.java"
    pattern: "$CLIPBOARD.setPrimaryClip($CLIP)"
@@ -0,0 +1,23 @@
rules:
  - id: hardcoded-secrets
    severity: WARNING
    languages: [java]
    message: "Possible hardcoded secret found in variable '$NAME'."
    metadata:
      authors:
        - Guerric ELOI (FuzzingLabs)
      owasp-mobile: M2
      category: secrets
      verification-level: [L1]
    paths:
      include:
        - "**/*.java"
    patterns:
      - pattern-either:
          - pattern: 'String $NAME = "$VAL";'
          - pattern: 'final String $NAME = "$VAL";'
          - pattern: 'private String $NAME = "$VAL";'
          - pattern: 'public static String $NAME = "$VAL";'
          - pattern: 'static final String $NAME = "$VAL";'
      - pattern-regex: "$NAME =~ /(?i).*(api|key|token|secret|pass|auth|session|bearer|access|private).*/"
@@ -0,0 +1,18 @@
rules:
  - id: insecure-data-storage
    severity: WARNING
    languages: [java]
    message: "Potential insecure data storage (external storage)."
    metadata:
      authors:
        - Guerric ELOI (FuzzingLabs)
      owasp-mobile: M2
      category: security
      area: storage
      verification-level: [L1]
    paths:
      include:
        - "**/*.java"
    pattern-either:
      - pattern: "$CTX.openFileOutput($NAME, $MODE)"
      - pattern: "Environment.getExternalStorageDirectory()"
@@ -0,0 +1,16 @@
rules:
  - id: insecure-deeplink
    severity: WARNING
    languages: [xml]
    message: "Potential insecure deeplink found in intent-filter."
    metadata:
      authors:
        - Guerric ELOI (FuzzingLabs)
      category: component
      area: manifest
      verification-level: [L1]
    paths:
      include:
        - "**/AndroidManifest.xml"
    pattern: |
      <intent-filter>
@@ -0,0 +1,21 @@
rules:
  - id: insecure-logging
    severity: WARNING
    languages: [java]
    message: "Sensitive data logged via Android Log API."
    metadata:
      authors:
        - Guerric ELOI (FuzzingLabs)
      owasp-mobile: M2
      category: logging
      verification-level: [L1]
    paths:
      include:
        - "**/*.java"
    patterns:
      - pattern-either:
          - pattern: "Log.d($TAG, $MSG)"
          - pattern: "Log.e($TAG, $MSG)"
          - pattern: "System.out.println($MSG)"
      - pattern-regex: "$MSG =~ /(?i).*(password|token|secret|api|auth|session).*/"
@@ -0,0 +1,15 @@
rules:
  - id: intent-redirection
    severity: WARNING
    languages: [java]
    message: "Potential intent redirection: using getIntent().getExtras() without validation."
    metadata:
      authors:
        - Guerric ELOI (FuzzingLabs)
      category: intent
      area: intercomponent
      verification-level: [L1]
    paths:
      include:
        - "**/*.java"
    pattern: "$ACT.getIntent().getExtras()"
@@ -0,0 +1,18 @@
rules:
  - id: sensitive-data-in-shared-preferences
    severity: WARNING
    languages: [java]
    message: "Sensitive data may be stored in SharedPreferences. Please review the key '$KEY'."
    metadata:
      authors:
        - Guerric ELOI (FuzzingLabs)
      owasp-mobile: M2
      category: security
      area: storage
      verification-level: [L1]
    paths:
      include:
        - "**/*.java"
    patterns:
      - pattern: "$EDITOR.putString($KEY, $VAL);"
      - pattern-regex: "$KEY =~ /(?i).*(username|password|pass|token|auth_token|api_key|secret|sessionid|email).*/"
@@ -0,0 +1,21 @@
rules:
  - id: sqlite-injection
    severity: ERROR
    languages: [java]
    message: "Possible SQL injection: concatenated input in rawQuery or execSQL."
    metadata:
      authors:
        - Guerric ELOI (FuzzingLabs)
      owasp-mobile: M7
      category: injection
      area: database
      verification-level: [L1]
    paths:
      include:
        - "**/*.java"
    patterns:
      - pattern-either:
          - pattern: "$DB.rawQuery($QUERY, ...)"
          - pattern: "$DB.execSQL($QUERY)"
      - pattern-regex: "$QUERY =~ /.*\".*\".*\\+.*/"
@@ -0,0 +1,16 @@
rules:
  - id: vulnerable-activity
    severity: WARNING
    languages: [xml]
    message: "Activity exported without permission."
    metadata:
      authors:
        - Guerric ELOI (FuzzingLabs)
      category: component
      area: manifest
      verification-level: [L1]
    paths:
      include:
        - "**/AndroidManifest.xml"
    pattern: |
      <activity android:exported="true"
@@ -0,0 +1,16 @@
rules:
  - id: vulnerable-content-provider
    severity: WARNING
    languages: [xml]
    message: "ContentProvider exported without permission."
    metadata:
      authors:
        - Guerric ELOI (FuzzingLabs)
      category: component
      area: manifest
      verification-level: [L1]
    paths:
      include:
        - "**/AndroidManifest.xml"
    pattern: |
      <provider android:exported="true"
@@ -0,0 +1,16 @@
rules:
  - id: vulnerable-service
    severity: WARNING
    languages: [xml]
    message: "Service exported without permission."
    metadata:
      authors:
        - Guerric ELOI (FuzzingLabs)
      category: component
      area: manifest
      verification-level: [L1]
    paths:
      include:
        - "**/AndroidManifest.xml"
    pattern: |
      <service android:exported="true"
@@ -0,0 +1,16 @@
rules:
  - id: webview-javascript-enabled
    severity: ERROR
    languages: [java]
    message: "WebView with JavaScript enabled can be dangerous if loading untrusted content."
    metadata:
      authors:
        - Guerric ELOI (FuzzingLabs)
      owasp-mobile: M7
      category: webview
      area: ui
      verification-level: [L1]
    paths:
      include:
        - "**/*.java"
    pattern: "$W.getSettings().setJavaScriptEnabled(true)"
@@ -0,0 +1,16 @@
rules:
  - id: webview-load-arbitrary-url
    severity: WARNING
    languages: [java]
    message: "Loading unvalidated URL in WebView may cause open redirect or XSS."
    metadata:
      authors:
        - Guerric ELOI (FuzzingLabs)
      owasp-mobile: M7
      category: webview
      area: ui
      verification-level: [L1]
    paths:
      include:
        - "**/*.java"
    pattern: "$W.loadUrl($URL)"
197
backend/toolbox/modules/android/jadx.py
Normal file
@@ -0,0 +1,197 @@
"""Jadx APK Decompilation Module"""

import asyncio
import shutil
from pathlib import Path
from typing import Dict, Any
import logging

from ..base import BaseModule, ModuleMetadata, ModuleResult
from . import register_module

logger = logging.getLogger(__name__)


@register_module
class JadxModule(BaseModule):
    """Module responsible for decompiling APK files with Jadx"""

    def get_metadata(self) -> ModuleMetadata:
        return ModuleMetadata(
            name="jadx",
            version="1.5.0",
            description="Android APK decompilation using Jadx",
            author="FuzzForge Team",
            category="android",
            tags=["android", "jadx", "decompilation", "reverse"],
            input_schema={
                "type": "object",
                "properties": {
                    "apk_path": {
                        "type": "string",
                        "description": "Path to the APK to decompile (absolute or relative to workspace)",
                    },
                    "output_dir": {
                        "type": "string",
                        "description": "Directory (relative to workspace) where Jadx output should be written",
                        "default": "jadx_output",
                    },
                    "overwrite": {
                        "type": "boolean",
                        "description": "Overwrite existing output directory if present",
                        "default": True,
                    },
                    "threads": {
                        "type": "integer",
                        "description": "Number of Jadx decompilation threads",
                        "default": 4,
                    },
                    "decompiler_args": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "Additional arguments passed directly to Jadx",
                    },
                },
                "required": ["apk_path"],
            },
            output_schema={
                "type": "object",
                "properties": {
                    "output_dir": {"type": "string"},
                    "source_dir": {"type": "string"},
                    "resource_dir": {"type": "string"},
                },
            },
        )

    def validate_config(self, config: Dict[str, Any]) -> bool:
        apk_path = config.get("apk_path")
        if not apk_path:
            raise ValueError("'apk_path' must be provided for Jadx decompilation")

        threads = config.get("threads", 4)
        if not isinstance(threads, int) or threads < 1 or threads > 32:
            raise ValueError("threads must be between 1 and 32")

        return True

    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        self.start_timer()

        try:
            self.validate_config(config)

            workspace = workspace.resolve()
            if not workspace.exists():
                raise ValueError(f"Workspace does not exist: {workspace}")

            apk_path = Path(config["apk_path"])
            if not apk_path.is_absolute():
                apk_path = (workspace / apk_path).resolve()

            if not apk_path.exists():
                raise ValueError(f"APK not found: {apk_path}")

            if apk_path.is_dir():
                raise ValueError(f"APK path must be a file, not a directory: {apk_path}")

            output_dir = Path(config.get("output_dir", "jadx_output"))
            if not output_dir.is_absolute():
                output_dir = (workspace / output_dir).resolve()

            if output_dir.exists():
                if config.get("overwrite", True):
                    shutil.rmtree(output_dir)
                else:
                    raise ValueError(
                        f"Output directory already exists: {output_dir}. Set overwrite=true to replace it."
                    )

            output_dir.mkdir(parents=True, exist_ok=True)

            threads = str(config.get("threads", 4))
            extra_args = config.get("decompiler_args", []) or []

            cmd = [
                "jadx",
                "--threads-count",
                threads,
                "--deobf",
                "--output-dir",
                str(output_dir),
            ]
            cmd.extend(extra_args)
            cmd.append(str(apk_path))

            logger.info("Running Jadx decompilation: %s", " ".join(cmd))

            process = await asyncio.create_subprocess_exec(
                *cmd,
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
                cwd=str(workspace),
            )

            stdout, stderr = await process.communicate()
            stdout_str = stdout.decode(errors="ignore") if stdout else ""
            stderr_str = stderr.decode(errors="ignore") if stderr else ""

            if stdout_str:
                logger.debug("Jadx stdout: %s", stdout_str[:200])
            if stderr_str:
                logger.debug("Jadx stderr: %s", stderr_str[:200])

            if process.returncode != 0:
                error_output = stderr_str or stdout_str or "No error output"
                raise RuntimeError(
                    f"Jadx failed with exit code {process.returncode}: {error_output[:500]}"
                )

            source_dir = output_dir / "sources"
            resource_dir = output_dir / "resources"

            if not source_dir.exists():
                logger.warning("Jadx sources directory not found at expected path: %s", source_dir)
            else:
                sample_files = []
                for idx, file_path in enumerate(source_dir.rglob("*.java")):
                    sample_files.append(str(file_path))
                    if idx >= 4:
                        break
                logger.info("Sample Jadx Java files: %s", sample_files or "<none>")

            java_files = 0
            if source_dir.exists():
                java_files = sum(1 for _ in source_dir.rglob("*.java"))

            summary = {
                "output_dir": str(output_dir),
                "source_dir": str(source_dir if source_dir.exists() else output_dir),
                "resource_dir": str(resource_dir if resource_dir.exists() else output_dir),
                "java_files": java_files,
            }

            metadata = {
                "apk_path": str(apk_path),
                "output_dir": str(output_dir),
                "source_dir": summary["source_dir"],
                "resource_dir": summary["resource_dir"],
                "threads": threads,
            }

            return self.create_result(
                findings=[],
                status="success",
                summary=summary,
                metadata=metadata,
            )

        except Exception as exc:
            logger.error("Jadx module failed: %s", exc)
            return self.create_result(
                findings=[],
                status="failed",
                error=str(exc),
            )
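The Jadx invocation in `execute` boils down to assembling an argument list and handing it to `asyncio.create_subprocess_exec`. A sketch of just the command assembly, factored into a helper (the helper name and paths are illustrative, not part of the module):

```python
from pathlib import Path

def build_jadx_cmd(apk_path: Path, output_dir: Path, threads: int = 4, extra_args=None):
    """Mirror of the command assembly in JadxModule.execute."""
    cmd = [
        "jadx",
        "--threads-count",
        str(threads),       # Jadx expects the count as a string argument
        "--deobf",          # enable identifier deobfuscation
        "--output-dir",
        str(output_dir),
    ]
    cmd.extend(extra_args or [])  # user-supplied decompiler_args come before the APK
    cmd.append(str(apk_path))     # the APK is always the final positional argument
    return cmd

cmd = build_jadx_cmd(Path("app.apk"), Path("jadx_output"), threads=2)
```

Keeping `decompiler_args` before the positional APK argument means callers can pass flags such as `--no-res` without disturbing the required trailing operand.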
293
backend/toolbox/modules/android/mobsf.py
Normal file
@@ -0,0 +1,293 @@
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any
|
||||
from toolbox.modules.base import BaseModule, ModuleResult, ModuleMetadata, ModuleFinding
|
||||
import requests
|
||||
import os
|
||||
import time
|
||||
import json
|
||||
from collections import Counter
|
||||
|
||||
"""
|
||||
TODO:
|
||||
* Configure workspace storage for apk and reports
|
||||
* Think about mobsf repo implementation inside workflow
|
||||
* Curl mobsf pdf report
|
||||
* Save Json mobsf report
|
||||
* Export Web server interface from the Workflow docker
|
||||
"""
|
||||
|
||||
class MobSFModule(BaseModule):
|
||||
|
||||
def __init__(self):
|
||||
self.mobsf_url = "http://localhost:8877"
|
||||
self.file_path = ""
|
||||
self.api_key = ""
|
||||
self.scan_id = None
|
||||
self.scan_hash = ""
|
||||
self.report_file = ""
|
||||
self._metadata = self.get_metadata()
|
||||
self.start_timer() # <-- Add this line
|
||||
|
||||
|
||||
def upload_file(self):
|
||||
"""
|
||||
Upload file to MobSF VM
|
||||
Returns scan hash if upload succeeded
|
||||
"""
|
||||
# Ensure file_path is set and valid
|
||||
if not self.file_path or not os.path.isfile(self.file_path):
|
||||
raise ValueError("Invalid or missing file_path for upload.")
|
||||
|
||||
# Don't set Content-Type manually - let requests handle it
|
||||
# MobSF expects API key in X-Mobsf-Api-Key header
|
||||
headers = {'X-Mobsf-Api-Key': self.api_key}
|
||||
|
||||
# Keep the file open during the entire request
|
||||
with open(self.file_path, 'rb') as f:
|
||||
f.seek(0)
|
||||
# Extract just the filename from the full path
|
||||
filename = os.path.basename(self.file_path)
|
||||
files = {'file': (filename, f, 'application/vnd.android.package-archive')}
|
||||
|
||||
# Make the request while the file is still open
|
||||
response = requests.post(f"{self.mobsf_url}/api/v1/upload", files=files, headers=headers)
|
||||
|
||||
if response.status_code == 200:
|
||||
resp_json = response.json()
|
||||
if resp_json.get('hash'):
|
||||
print("[+] Upload succeeded, scan hash:", resp_json['hash'])
|
||||
return resp_json['hash']
|
||||
else:
|
||||
raise Exception(f"File upload failed: {resp_json}")
|
||||
else:
|
||||
raise Exception(f"Failed to upload file: {response.text}")
|
||||
|
||||
def start_scan(self, re_scan: int = 0, max_attempts: int = 10, delay: int = 3):
|
||||
"""
|
||||
Scan file that is already uploaded. Retries if scan is not ready.
|
||||
Returns scan result or raises Exception after max_attempts.
|
||||
"""
|
||||
print("[+] Starting scan for hash", self.scan_hash)
|
||||
data = {'hash': self.scan_hash}
|
||||
headers = {'X-Mobsf-Api-Key': self.api_key}
|
||||
response = requests.post(f"{self.mobsf_url}/api/v1/scan", data=data, headers=headers)
|
||||
if response.status_code == 200:
|
||||
try:
|
||||
result = response.json()
|
||||
# Heuristic: check for expected keys in result
|
||||
if result:
|
||||
print("[+] Scan succeeded for hash", self.scan_hash)
|
||||
return result
|
||||
except Exception as e:
|
||||
print(f"Error parsing scan result: {e}")
|
||||
|
||||
def get_json_results(self):
|
||||
"""
|
||||
Retrieve JSON results for the scanned file
|
||||
"""
|
||||
headers = {'X-Mobsf-Api-Key': self.api_key}
|
||||
data = {'hash': self.scan_hash}
|
||||
response = requests.post(f"{self.mobsf_url}/api/v1/report_json", data=data, headers=headers)
|
||||
if response.status_code == 200:
|
||||
f = open('dump.json', 'w').write(json.dumps(response.json(), indent=2))
|
||||
print("[+] Retrieved JSON results")
|
||||
return response.json()
|
||||
else:
|
||||
raise Exception(f"Failed to retrieve JSON results: {response.text}")
|
||||
|
||||
    def create_summary(self, findings):
        """
        Summarize findings by severity.

        Returns a dict like {'high': 3, 'info': 2, ...}
        """
        severity_counter = Counter()
        for finding in findings:
            sev = getattr(finding, "severity", None)
            if sev is None and isinstance(finding, dict):
                sev = finding.get("severity")
            if sev:
                severity_counter[sev] += 1
        res = dict(severity_counter)
        print("Total Findings:", len(findings))
        print("Severity counts:")
        print(res)
        return res

    def parse_json_results(self):
        if self.report_file == "" or not os.path.isfile(self.report_file):
            raise ValueError("Invalid or missing report_file for parsing.")
        with open(self.report_file, 'r') as f:
            data = json.load(f)

        findings = []

        # Parse the sections of the MobSF report we care about
        sections_to_parse = ['permissions', 'manifest_analysis', 'code_analysis', 'behaviour']

        for section_name in sections_to_parse:
            if section_name not in data:
                continue
            section = data[section_name]

            # Permissions
            if section_name == 'permissions':
                for name, attrs in section.items():
                    findings.append(self.create_finding(
                        title=name,
                        description=attrs.get('description'),
                        severity=attrs.get('status'),
                        category="permission",
                        metadata={
                            'info': attrs.get('info'),
                        }
                    ))

            # Manifest analysis
            elif section_name == 'manifest_analysis':
                findings_list = section.get('manifest_findings', [])
                for attrs in findings_list:
                    findings.append(self.create_finding(
                        title=attrs.get('title') or attrs.get('name') or "unknown",
                        description=attrs.get('description', "No description"),
                        severity=attrs.get('severity', "unknown"),
                        category=section_name,
                        metadata={
                            'tag': attrs.get('rule')
                        }))

            # Code analysis
            elif section_name == 'code_analysis':
                findings_list = section.get('findings', {})
                for name, attrs in findings_list.items():
                    metadata = attrs.get('metadata', {})
                    findings.append(self.create_finding(
                        title=name,
                        description=metadata.get('description'),
                        severity=metadata.get('severity'),
                        category="code_analysis",
                        metadata={
                            'cwe': metadata.get('cwe'),
                            'owasp': metadata.get('owasp'),
                            'files': attrs.get('file')
                        }))

            # Behaviour
            elif section_name == 'behaviour':
                for key, value in section.items():
                    metadata = value.get('metadata', {})
                    labels = metadata.get('label') or ["unknown"]
                    findings.append(self.create_finding(
                        title="behaviour_" + labels[0],
                        description=metadata.get('description'),
                        severity=metadata.get('severity'),
                        category="behaviour",
                        metadata={
                            'file': value.get('files', {})
                        }
                    ))
        return findings

    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        findings = []

        # Read configuration
        self.mobsf_url = config.get("mobsf_url", "")
        self.file_path = config.get("file_path", "")
        # Get the API key from the config first, falling back to the environment variable
        self.api_key = config.get("api_key", "") or os.environ.get("MOBSF_API_KEY", "")

        # Check that the file to scan exists
        file_path = config.get("file_path", None)
        if not file_path or not os.path.isfile(file_path):
            raise ValueError(f"Invalid or missing file_path in configuration: {file_path}")

        try:
            self.scan_hash = self.upload_file()
        except Exception as e:
            raise Exception(f"Failed to upload file to MobSF: {e}")

        if self.scan_hash == "":
            raise Exception("scan_hash not returned after upload.")
        try:
            scan_result = self.start_scan()
        except Exception as e:
            raise Exception(f"Failed to scan file in MobSF: {e}")

        # Retrieve the JSON report and convert it to findings
        try:
            json_data = self.get_json_results()
        except json.JSONDecodeError:
            return self.create_result(
                findings=[],
                status="failed",
                summary={"error": "Invalid JSON output from MobSF"},
                metadata={"engine": "mobsf", "file_scanned": file_path, "mobsf_url": self.mobsf_url}
            )

        self.report_file = 'dump.json'
        findings = self.parse_json_results()

        tmp_summary = self.create_summary(findings)
        summary = {
            "total_findings": len(findings),
            "dangerous_severity": tmp_summary.get('dangerous', 0),
            "warning_severity": tmp_summary.get('warning', 0),
            "high_severity": tmp_summary.get('high', 0),
            "medium_severity": tmp_summary.get('medium', 0),
            "low_severity": tmp_summary.get('low', 0),
            "info_severity": tmp_summary.get('info', 0),
        }
        # TODO: include the JSON report path in the metadata
        metadata = {"engine": "mobsf", "file_scanned": file_path, "mobsf_url": self.mobsf_url}

        return self.create_result(findings=findings, status="success", summary=summary, metadata=metadata)

    def get_metadata(self) -> ModuleMetadata:
        return ModuleMetadata(
            name="Mobile Security Framework (MobSF)",
            version="1.0.0",
            description="Integrates MobSF for mobile app security scanning",
            author="FuzzForge Team",
            category="scanner",
            tags=["mobsf", "mobile", "sast", "scanner"]
        )

    def validate_config(self, config: Dict[str, Any]) -> bool:
        """
        Expected config keys:
            "mobsf_url": MobSF server URL (default: http://localhost:8000)
            "file_path": path to the APK or IPA file to scan
        """
        if "mobsf_url" in config and not isinstance(config["mobsf_url"], str):
            return False
        # TODO: check that mobsf_url is reachable (e.g. does not return 404 on /)

        if "file_path" in config and not isinstance(config["file_path"], str):
            return False
        return True

if __name__ == "__main__":
    import asyncio

    module = MobSFModule()
    config = {
        "mobsf_url": "http://localhost:8877",
        "file_path": "./toolbox/modules/android/beetlebug.apk",
    }
    workspace = Path("./toolbox/modules/android/")
    result = asyncio.run(module.execute(config, workspace))
    print(result)
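The severity-counting heuristic in `create_summary` above accepts both plain dicts and objects exposing a `severity` attribute. A standalone sketch of the same logic (the `summarize` helper and `Finding` class here are illustrative only, not part of the module):

```python
from collections import Counter


def summarize(findings):
    """Count findings by severity, accepting dicts or objects."""
    counter = Counter()
    for finding in findings:
        # Prefer an object attribute; fall back to a dict key
        sev = getattr(finding, "severity", None)
        if sev is None and isinstance(finding, dict):
            sev = finding.get("severity")
        if sev:
            counter[sev] += 1
    return dict(counter)


class Finding:
    def __init__(self, severity):
        self.severity = severity


mixed = [Finding("high"), {"severity": "high"}, {"severity": "info"}, {"title": "no severity"}]
print(summarize(mixed))  # {'high': 2, 'info': 1}
```

Entries with no severity at all are simply skipped rather than counted under a placeholder key.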
411
backend/toolbox/modules/android/opengrep.py
Normal file
@@ -0,0 +1,411 @@
"""
OpenGrep Static Analysis Module

This module uses OpenGrep (the open-source fork of Semgrep) for pattern-based
static analysis across multiple programming languages.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import asyncio
import json
import logging
from pathlib import Path
from typing import Dict, Any, List

from ..base import BaseModule, ModuleMetadata, ModuleFinding, ModuleResult
from . import register_module

logger = logging.getLogger(__name__)


@register_module
class OpenGrepModule(BaseModule):
    """OpenGrep static analysis module"""

    def get_metadata(self) -> ModuleMetadata:
        """Get module metadata"""
        return ModuleMetadata(
            name="opengrep",
            version="1.45.0",
            description="Open-source pattern-based static analysis tool for security vulnerabilities",
            author="FuzzForge Team",
            category="static_analysis",
            tags=["sast", "pattern-matching", "multi-language", "security"],
            input_schema={
                "type": "object",
                "properties": {
                    "config": {
                        "type": "string",
                        "enum": ["auto", "p/security-audit", "p/owasp-top-ten", "p/cwe-top-25"],
                        "default": "auto",
                        "description": "Rule configuration to use"
                    },
                    "custom_rules_path": {
                        "type": "string",
                        "description": "Path to a directory containing custom OpenGrep rules"
                    },
                    "languages": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "Specific languages to analyze"
                    },
                    "include_patterns": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "File patterns to include"
                    },
                    "exclude_patterns": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "File patterns to exclude"
                    },
                    "max_target_bytes": {
                        "type": "integer",
                        "default": 1000000,
                        "description": "Maximum file size to analyze (bytes)"
                    },
                    "timeout": {
                        "type": "integer",
                        "default": 300,
                        "description": "Analysis timeout in seconds"
                    },
                    "severity": {
                        "type": "array",
                        "items": {"type": "string", "enum": ["ERROR", "WARNING", "INFO"]},
                        "default": ["ERROR", "WARNING", "INFO"],
                        "description": "Severity levels to report"
                    },
                    "confidence": {
                        "type": "array",
                        "items": {"type": "string", "enum": ["HIGH", "MEDIUM", "LOW"]},
                        "default": ["HIGH", "MEDIUM", "LOW"],
                        "description": "Confidence levels to report"
                    }
                }
            },
            output_schema={
                "type": "object",
                "properties": {
                    "findings": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "rule_id": {"type": "string"},
                                "severity": {"type": "string"},
                                "confidence": {"type": "string"},
                                "file_path": {"type": "string"},
                                "line_number": {"type": "integer"}
                            }
                        }
                    }
                }
            }
        )

    def validate_config(self, config: Dict[str, Any]) -> bool:
        """Validate configuration"""
        timeout = config.get("timeout", 300)
        if not isinstance(timeout, int) or timeout < 30 or timeout > 3600:
            raise ValueError("Timeout must be between 30 and 3600 seconds")

        max_bytes = config.get("max_target_bytes", 1000000)
        if not isinstance(max_bytes, int) or max_bytes < 1000 or max_bytes > 10000000:
            raise ValueError("max_target_bytes must be between 1000 and 10000000")

        custom_rules_path = config.get("custom_rules_path")
        if custom_rules_path:
            if not Path(custom_rules_path).is_dir():
                raise ValueError(f"Custom rules path must be a valid directory: {custom_rules_path}")

        return True

    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        """Execute OpenGrep static analysis"""
        self.start_timer()

        try:
            # Validate inputs
            self.validate_config(config)
            self.validate_workspace(workspace)

            logger.info(f"Running OpenGrep analysis on {workspace}")

            # Build the opengrep command
            cmd = ["opengrep", "scan", "--json"]

            # Add rule configuration: custom rules take precedence over named rulesets
            custom_rules_path = config.get("custom_rules_path")
            use_custom_rules = False
            if custom_rules_path:
                cmd.extend(["--config", custom_rules_path])
                use_custom_rules = True
            else:
                cmd.extend(["--config", config.get("config", "auto")])

            # Add timeout
            cmd.extend(["--timeout", str(config.get("timeout", 300))])

            # Add max target bytes
            cmd.extend(["--max-target-bytes", str(config.get("max_target_bytes", 1000000))])

            # Add languages if specified (but NOT when using custom rules, as rules define their own languages)
            if config.get("languages") and not use_custom_rules:
                cmd.extend(["--lang", ",".join(config["languages"])])

            # Add include patterns
            if config.get("include_patterns"):
                for pattern in config["include_patterns"]:
                    cmd.extend(["--include", pattern])

            # Add exclude patterns
            if config.get("exclude_patterns"):
                for pattern in config["exclude_patterns"]:
                    cmd.extend(["--exclude", pattern])

            # Pass a severity filter on the command line only when a single level
            # is requested; multi-level filtering happens when parsing the output.
            severity_levels = config.get("severity", ["ERROR", "WARNING", "INFO"])
            if severity_levels and len(severity_levels) == 1:
                cmd.extend(["--severity", severity_levels[0]])

            # Confidence filtering is applied post-processing in _parse_opengrep_output

            # Skip the version check and also scan files ignored by git
            cmd.append("--disable-version-check")
            cmd.append("--no-git-ignore")

            # Add target directory
            cmd.append(str(workspace))

            logger.debug(f"Running command: {' '.join(cmd)}")

            # Run OpenGrep
            process = await asyncio.create_subprocess_exec(
                *cmd,
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
                cwd=workspace
            )

            stdout, stderr = await process.communicate()

            # Parse results
            findings = []
            if process.returncode in [0, 1]:  # 0 = no findings, 1 = findings found
                findings = self._parse_opengrep_output(stdout.decode(), workspace, config)
            else:
                error_msg = stderr.decode()
                logger.error(f"OpenGrep failed: {error_msg}")
                return self.create_result(
                    findings=[],
                    status="failed",
                    error=f"OpenGrep execution failed: {error_msg}"
                )

            # Create summary
            summary = self._create_summary(findings)

            logger.info(f"OpenGrep found {len(findings)} potential issues")

            return self.create_result(
                findings=findings,
                status="success",
                summary=summary
            )

        except Exception as e:
            logger.error(f"OpenGrep module failed: {e}")
            return self.create_result(
                findings=[],
                status="failed",
                error=str(e)
            )

    def _parse_opengrep_output(self, output: str, workspace: Path, config: Dict[str, Any]) -> List[ModuleFinding]:
        """Parse OpenGrep JSON output into findings"""
        findings = []

        if not output.strip():
            return findings

        try:
            data = json.loads(output)
            results = data.get("results", [])
            logger.debug(f"OpenGrep returned {len(results)} raw results")

            # Get filtering criteria
            allowed_severities = set(config.get("severity", ["ERROR", "WARNING", "INFO"]))
            allowed_confidences = set(config.get("confidence", ["HIGH", "MEDIUM", "LOW"]))

            for result in results:
                # Extract basic info
                rule_id = result.get("check_id", "unknown")
                message = result.get("message", "")
                extra = result.get("extra", {})
                severity = extra.get("severity", "INFO").upper()

                # File location info
                path_info = result.get("path", "")
                start_line = result.get("start", {}).get("line", 0)
                end_line = result.get("end", {}).get("line", 0)
                start_col = result.get("start", {}).get("col", 0)
                end_col = result.get("end", {}).get("col", 0)

                # Code snippet
                lines = extra.get("lines", "")

                # Metadata
                rule_metadata = extra.get("metadata", {})
                cwe = rule_metadata.get("cwe", [])
                owasp = rule_metadata.get("owasp", [])
                confidence = extra.get("confidence", rule_metadata.get("confidence", "MEDIUM")).upper()

                # Apply severity filter
                if severity not in allowed_severities:
                    continue

                # Apply confidence filter
                if confidence not in allowed_confidences:
                    continue

                # Make file path relative to workspace
                if path_info:
                    try:
                        path_info = str(Path(path_info).relative_to(workspace))
                    except ValueError:
                        pass

                # Map severity to our standard levels
                finding_severity = self._map_severity(severity)

                # Create finding
                finding = self.create_finding(
                    title=f"Security issue: {rule_id}",
                    description=message or f"OpenGrep rule {rule_id} triggered",
                    severity=finding_severity,
                    category=self._get_category(rule_id, extra),
                    file_path=path_info if path_info else None,
                    line_start=start_line if start_line > 0 else None,
                    line_end=end_line if end_line > 0 and end_line != start_line else None,
                    code_snippet=lines.strip() if lines else None,
                    recommendation=self._get_recommendation(rule_id, extra),
                    metadata={
                        "rule_id": rule_id,
                        "opengrep_severity": severity,
                        "confidence": confidence,
                        "cwe": cwe,
                        "owasp": owasp,
                        "fix": extra.get("fix", ""),
                        "impact": extra.get("impact", ""),
                        "likelihood": extra.get("likelihood", ""),
                        "references": extra.get("references", [])
                    }
                )

                findings.append(finding)

        except json.JSONDecodeError as e:
            logger.warning(f"Failed to parse OpenGrep output: {e}. Output snippet: {output[:200]}...")
        except Exception as e:
            logger.warning(f"Error processing OpenGrep results: {e}")

        return findings

    def _map_severity(self, opengrep_severity: str) -> str:
        """Map OpenGrep severity to our standard severity levels"""
        severity_map = {
            "ERROR": "high",
            "WARNING": "medium",
            "INFO": "low"
        }
        return severity_map.get(opengrep_severity.upper(), "medium")

    def _get_category(self, rule_id: str, extra: Dict[str, Any]) -> str:
        """Determine finding category based on rule and metadata"""
        rule_metadata = extra.get("metadata", {})
        cwe_list = rule_metadata.get("cwe", [])
        owasp_list = rule_metadata.get("owasp", [])
        rule_id_lower = rule_id.lower()

        # Check for common security categories
        if "injection" in rule_id_lower:
            return "injection"
        elif "xss" in rule_id_lower:
            return "xss"
        elif "csrf" in rule_id_lower:
            return "csrf"
        elif "auth" in rule_id_lower:
            return "authentication"
        elif "crypto" in rule_id_lower:
            return "cryptography"
        elif cwe_list:
            return f"cwe-{cwe_list[0]}"
        elif owasp_list:
            return f"owasp-{owasp_list[0].replace(' ', '-').lower()}"
        else:
            return "security"

    def _get_recommendation(self, rule_id: str, extra: Dict[str, Any]) -> str:
        """Generate a recommendation based on the rule and its metadata"""
        fix_suggestion = extra.get("fix", "")
        if fix_suggestion:
            return fix_suggestion

        # Generic recommendations based on rule type
        rule_id_lower = rule_id.lower()
        if "injection" in rule_id_lower:
            return "Use parameterized queries or prepared statements to prevent injection attacks."
        elif "xss" in rule_id_lower:
            return "Properly encode/escape user input before displaying it in web pages."
        elif "crypto" in rule_id_lower:
            return "Use cryptographically secure algorithms and proper key management."
        elif "hardcode" in rule_id_lower:
            return "Remove hardcoded secrets and use secure configuration management."
        else:
            return "Review this security issue and apply appropriate fixes based on your security requirements."

    def _create_summary(self, findings: List[ModuleFinding]) -> Dict[str, Any]:
        """Create analysis summary"""
        severity_counts = {"critical": 0, "high": 0, "medium": 0, "low": 0, "info": 0}
        category_counts = {}
        rule_counts = {}

        for finding in findings:
            # Count by severity
            severity_counts[finding.severity] = severity_counts.get(finding.severity, 0) + 1

            # Count by category
            category = finding.category
            category_counts[category] = category_counts.get(category, 0) + 1

            # Count by rule
            rule_id = finding.metadata.get("rule_id", "unknown")
            rule_counts[rule_id] = rule_counts.get(rule_id, 0) + 1

        return {
            "total_findings": len(findings),
            "severity_counts": severity_counts,
            "category_counts": category_counts,
            "top_rules": dict(sorted(rule_counts.items(), key=lambda x: x[1], reverse=True)[:10]),
            "files_analyzed": len(set(f.file_path for f in findings if f.file_path))
        }
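The severity and confidence filtering in `_parse_opengrep_output` is plain set membership over fields of each raw result. A minimal standalone sketch with a fabricated record (field names mirror the parser above; the rule IDs and values are made up for illustration):

```python
import json

# Fabricated OpenGrep-style JSON output (illustrative only)
raw = json.dumps({
    "results": [
        {"check_id": "python.sql-injection", "extra": {"severity": "ERROR", "confidence": "HIGH"}},
        {"check_id": "python.debug-log", "extra": {"severity": "INFO", "confidence": "LOW"}},
    ]
})

allowed_severities = {"ERROR", "WARNING"}
allowed_confidences = {"HIGH", "MEDIUM"}

kept = []
for result in json.loads(raw).get("results", []):
    extra = result.get("extra", {})
    severity = extra.get("severity", "INFO").upper()
    confidence = extra.get("confidence", "MEDIUM").upper()
    # Keep the result only if both filters pass
    if severity in allowed_severities and confidence in allowed_confidences:
        kept.append(result["check_id"])

print(kept)  # ['python.sql-injection']
```

Because the defaults allow every level, filtering only kicks in when the caller narrows the `severity` or `confidence` lists in the config.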
272
backend/toolbox/modules/base.py
Normal file
@@ -0,0 +1,272 @@
"""
Base module interface for all FuzzForge modules
"""

# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Any, List, Optional
from pydantic import BaseModel, Field
import logging

logger = logging.getLogger(__name__)


class ModuleMetadata(BaseModel):
    """Metadata describing a module's capabilities and requirements"""
    name: str = Field(..., description="Module name")
    version: str = Field(..., description="Module version")
    description: str = Field(..., description="Module description")
    author: Optional[str] = Field(None, description="Module author")
    category: str = Field(..., description="Module category (scanner, analyzer, reporter, etc.)")
    tags: List[str] = Field(default_factory=list, description="Module tags")
    input_schema: Dict[str, Any] = Field(default_factory=dict, description="Expected input schema")
    output_schema: Dict[str, Any] = Field(default_factory=dict, description="Output schema")
    requires_workspace: bool = Field(True, description="Whether module requires workspace access")


class ModuleFinding(BaseModel):
    """Individual finding from a module"""
    id: str = Field(..., description="Unique finding ID")
    title: str = Field(..., description="Finding title")
    description: str = Field(..., description="Detailed description")
    severity: str = Field(..., description="Severity level (info, low, medium, high, critical)")
    category: str = Field(..., description="Finding category")
    file_path: Optional[str] = Field(None, description="Affected file path relative to workspace")
    line_start: Optional[int] = Field(None, description="Starting line number")
    line_end: Optional[int] = Field(None, description="Ending line number")
    code_snippet: Optional[str] = Field(None, description="Relevant code snippet")
    recommendation: Optional[str] = Field(None, description="Remediation recommendation")
    metadata: Dict[str, Any] = Field(default_factory=dict, description="Additional metadata")


class ModuleResult(BaseModel):
    """Standard result format from module execution"""
    module: str = Field(..., description="Module name")
    version: str = Field(..., description="Module version")
    status: str = Field(default="success", description="Execution status (success, partial, failed)")
    execution_time: float = Field(..., description="Execution time in seconds")
    findings: List[ModuleFinding] = Field(default_factory=list, description="List of findings")
    summary: Dict[str, Any] = Field(default_factory=dict, description="Summary statistics")
    metadata: Dict[str, Any] = Field(default_factory=dict, description="Additional metadata")
    error: Optional[str] = Field(None, description="Error message if failed")
    sarif: Optional[Dict[str, Any]] = Field(None, description="SARIF report if generated by reporter module")


class BaseModule(ABC):
    """
    Base interface for all security testing modules.

    All modules must inherit from this class and implement the required methods.
    Modules are designed to be stateless and reusable across different workflows.
    """

    def __init__(self):
        """Initialize the module"""
        self._metadata = self.get_metadata()
        self._start_time = None
        logger.info(f"Initialized module: {self._metadata.name} v{self._metadata.version}")

    @abstractmethod
    def get_metadata(self) -> ModuleMetadata:
        """
        Get module metadata.

        Returns:
            ModuleMetadata object describing the module
        """
        pass

    @abstractmethod
    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        """
        Execute the module with given configuration and workspace.

        Args:
            config: Module-specific configuration parameters
            workspace: Path to the mounted workspace directory

        Returns:
            ModuleResult containing findings and metadata
        """
        pass

    @abstractmethod
    def validate_config(self, config: Dict[str, Any]) -> bool:
        """
        Validate the provided configuration against module requirements.

        Args:
            config: Configuration to validate

        Returns:
            True if configuration is valid, False otherwise

        Raises:
            ValueError: If configuration is invalid, with details
        """
        pass

    def validate_workspace(self, workspace: Path) -> bool:
        """
        Validate that the workspace exists and is accessible.

        Args:
            workspace: Path to the workspace

        Returns:
            True if workspace is valid

        Raises:
            ValueError: If workspace is invalid
        """
        if not workspace.exists():
            raise ValueError(f"Workspace does not exist: {workspace}")

        if not workspace.is_dir():
            raise ValueError(f"Workspace is not a directory: {workspace}")

        return True

    def create_finding(
        self,
        title: str,
        description: str,
        severity: str,
        category: str,
        **kwargs
    ) -> ModuleFinding:
        """
        Helper method to create a standardized finding.

        Args:
            title: Finding title
            description: Detailed description
            severity: Severity level
            category: Finding category
            **kwargs: Additional finding fields

        Returns:
            ModuleFinding object
        """
        import uuid
        finding_id = str(uuid.uuid4())

        return ModuleFinding(
            id=finding_id,
            title=title,
            description=description,
            severity=severity,
            category=category,
            **kwargs
        )

    def start_timer(self):
        """Start the execution timer"""
        from time import time
        self._start_time = time()

    def get_execution_time(self) -> float:
        """Get the execution time in seconds"""
        from time import time
        if self._start_time is None:
            return 0.0
        return time() - self._start_time

    def create_result(
        self,
        findings: List[ModuleFinding],
        status: str = "success",
        summary: Optional[Dict[str, Any]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        error: Optional[str] = None
    ) -> ModuleResult:
        """
        Helper method to create a module result.

        Args:
            findings: List of findings
            status: Execution status
            summary: Summary statistics
            metadata: Additional metadata
            error: Error message if failed

        Returns:
            ModuleResult object
        """
        return ModuleResult(
            module=self._metadata.name,
            version=self._metadata.version,
            status=status,
            execution_time=self.get_execution_time(),
            findings=findings,
            summary=summary or self._generate_summary(findings),
            metadata=metadata or {},
            error=error
        )

    def _generate_summary(self, findings: List[ModuleFinding]) -> Dict[str, Any]:
        """
        Generate summary statistics from findings.

        Args:
            findings: List of findings

        Returns:
            Summary dictionary
        """
        severity_counts = {
            "info": 0,
            "low": 0,
            "medium": 0,
            "high": 0,
            "critical": 0
        }

        category_counts = {}

        for finding in findings:
            # Count by severity
            if finding.severity in severity_counts:
                severity_counts[finding.severity] += 1

            # Count by category
            if finding.category not in category_counts:
                category_counts[finding.category] = 0
            category_counts[finding.category] += 1

        return {
            "total_findings": len(findings),
            "severity_counts": severity_counts,
            "category_counts": category_counts,
            "highest_severity": self._get_highest_severity(findings)
        }

    def _get_highest_severity(self, findings: List[ModuleFinding]) -> str:
        """
        Get the highest severity from findings.

        Args:
            findings: List of findings

        Returns:
            Highest severity level
        """
        severity_order = ["critical", "high", "medium", "low", "info"]

        for severity in severity_order:
            if any(f.severity == severity for f in findings):
                return severity

        return "none"
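The ordering logic in `_get_highest_severity` can be rendered standalone; a brief sketch of the same walk over the severity ranking (the `highest_severity` function here is illustrative, not part of the module):

```python
# Severity ranking used by the base module, most severe first
SEVERITY_ORDER = ["critical", "high", "medium", "low", "info"]


def highest_severity(severities):
    """Return the most severe level present, or 'none' if there are no findings."""
    for level in SEVERITY_ORDER:
        if level in severities:
            return level
    return "none"


print(highest_severity(["low", "medium", "low"]))  # medium
print(highest_severity([]))                        # none
```

Scanning the fixed ranking rather than sorting the findings keeps the result well defined even when unknown severity strings are present: they are simply never matched.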
14
backend/toolbox/modules/reporter/__init__.py
Normal file
@@ -0,0 +1,14 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

from .sarif_reporter import SARIFReporter

__all__ = ["SARIFReporter"]
401
backend/toolbox/modules/reporter/sarif_reporter.py
Normal file
@@ -0,0 +1,401 @@
"""
SARIF Reporter Module - Generates SARIF-formatted security reports
"""

# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

import logging
from pathlib import Path
from typing import Dict, Any, List
from datetime import datetime
import json

try:
    from toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
except ImportError:
    try:
        from modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
    except ImportError:
        from src.toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding

logger = logging.getLogger(__name__)
class SARIFReporter(BaseModule):
    """
    Generates SARIF (Static Analysis Results Interchange Format) reports.

    This module:
    - Converts findings to SARIF format
    - Aggregates results from multiple modules
    - Adds metadata and context
    - Provides actionable recommendations
    """

    def get_metadata(self) -> ModuleMetadata:
        """Get module metadata"""
        return ModuleMetadata(
            name="sarif_reporter",
            version="1.0.0",
            description="Generates SARIF-formatted security reports",
            author="FuzzForge Team",
            category="reporter",
            tags=["reporting", "sarif", "output"],
            input_schema={
                "findings": {
                    "type": "array",
                    "description": "List of findings to report",
                    "required": True
                },
                "tool_name": {
                    "type": "string",
                    "description": "Name of the tool",
                    "default": "FuzzForge Security Assessment"
                },
                "tool_version": {
                    "type": "string",
                    "description": "Tool version",
                    "default": "1.0.0"
                },
                "include_code_flows": {
                    "type": "boolean",
                    "description": "Include code flow information",
                    "default": False
                }
            },
            output_schema={
                "sarif": {
                    "type": "object",
                    "description": "SARIF 2.1.0 formatted report"
                }
            },
            requires_workspace=False  # Reporter doesn't need direct workspace access
        )
    def validate_config(self, config: Dict[str, Any]) -> bool:
        """Validate module configuration"""
        if "findings" not in config and "modules_results" not in config:
            raise ValueError("Either 'findings' or 'modules_results' must be provided")
        return True

    async def execute(self, config: Dict[str, Any], workspace: Path = None) -> ModuleResult:
        """
        Execute the SARIF reporter module.

        Args:
            config: Module configuration with findings
            workspace: Optional workspace path for context

        Returns:
            ModuleResult with SARIF report
        """
        self.start_timer()
        self.validate_config(config)

        # Get configuration
        tool_name = config.get("tool_name", "FuzzForge Security Assessment")
        tool_version = config.get("tool_version", "1.0.0")
        include_code_flows = config.get("include_code_flows", False)

        # Collect findings from either direct findings or module results
        all_findings = []

        if "findings" in config:
            # Direct findings provided
            all_findings = config["findings"]
            if isinstance(all_findings, list) and all(isinstance(f, dict) for f in all_findings):
                # Convert dict findings to ModuleFinding objects
                all_findings = [ModuleFinding(**f) if isinstance(f, dict) else f for f in all_findings]
        elif "modules_results" in config:
            # Aggregate from module results
            for module_result in config["modules_results"]:
                if isinstance(module_result, dict):
                    findings = module_result.get("findings", [])
                    all_findings.extend(findings)
                elif hasattr(module_result, "findings"):
                    all_findings.extend(module_result.findings)

        logger.info(f"Generating SARIF report for {len(all_findings)} findings")

        try:
            # Generate SARIF report
            sarif_report = self._generate_sarif(
                findings=all_findings,
                tool_name=tool_name,
                tool_version=tool_version,
                include_code_flows=include_code_flows,
                workspace_path=str(workspace) if workspace else None
            )

            # Create summary
            summary = self._generate_report_summary(all_findings)

            return ModuleResult(
                module=self.get_metadata().name,
                version=self.get_metadata().version,
                status="success",
                execution_time=self.get_execution_time(),
                findings=[],  # Reporter doesn't generate new findings
                summary=summary,
                metadata={
                    "tool_name": tool_name,
                    "tool_version": tool_version,
                    "report_format": "SARIF 2.1.0",
                    "total_findings": len(all_findings)
                },
                error=None,
                sarif=sarif_report  # Add SARIF as custom field
            )

        except Exception as e:
            logger.error(f"SARIF reporter failed: {e}")
            return self.create_result(
                findings=[],
                status="failed",
                error=str(e)
            )
    def _generate_sarif(
        self,
        findings: List[ModuleFinding],
        tool_name: str,
        tool_version: str,
        include_code_flows: bool,
        workspace_path: str = None
    ) -> Dict[str, Any]:
        """
        Generate SARIF 2.1.0 formatted report.

        Args:
            findings: List of findings to report
            tool_name: Name of the tool
            tool_version: Tool version
            include_code_flows: Whether to include code flow information
            workspace_path: Optional workspace path

        Returns:
            SARIF formatted dictionary
        """
        # Create rules from unique finding types
        rules = self._create_rules(findings)

        # Create results from findings
        results = self._create_results(findings, include_code_flows)

        # Build SARIF structure
        sarif = {
            "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
            "version": "2.1.0",
            "runs": [
                {
                    "tool": {
                        "driver": {
                            "name": tool_name,
                            "version": tool_version,
                            "informationUri": "https://fuzzforge.io",
                            "rules": rules
                        }
                    },
                    "results": results,
                    "invocations": [
                        {
                            "executionSuccessful": True,
                            "endTimeUtc": datetime.utcnow().isoformat() + "Z"
                        }
                    ]
                }
            ]
        }

        # Add workspace information if available
        if workspace_path:
            sarif["runs"][0]["originalUriBaseIds"] = {
                "WORKSPACE": {
                    "uri": f"file://{workspace_path}/",
                    "description": "The workspace root directory"
                }
            }

        return sarif
    def _create_rules(self, findings: List[ModuleFinding]) -> List[Dict[str, Any]]:
        """
        Create SARIF rules from findings.

        Args:
            findings: List of findings

        Returns:
            List of SARIF rule objects
        """
        rules_dict = {}

        for finding in findings:
            rule_id = f"{finding.category}_{finding.severity}"

            if rule_id not in rules_dict:
                rules_dict[rule_id] = {
                    "id": rule_id,
                    "name": finding.category.replace("_", " ").title(),
                    "shortDescription": {
                        "text": f"{finding.category} vulnerability"
                    },
                    "fullDescription": {
                        "text": f"Detection rule for {finding.category} vulnerabilities with {finding.severity} severity"
                    },
                    "defaultConfiguration": {
                        "level": self._severity_to_sarif_level(finding.severity)
                    },
                    "properties": {
                        "category": finding.category,
                        "severity": finding.severity,
                        "tags": ["security", finding.category, finding.severity]
                    }
                }

        return list(rules_dict.values())
    def _create_results(
        self, findings: List[ModuleFinding], include_code_flows: bool
    ) -> List[Dict[str, Any]]:
        """
        Create SARIF results from findings.

        Args:
            findings: List of findings
            include_code_flows: Whether to include code flows

        Returns:
            List of SARIF result objects
        """
        results = []

        for finding in findings:
            result = {
                "ruleId": f"{finding.category}_{finding.severity}",
                "level": self._severity_to_sarif_level(finding.severity),
                "message": {
                    "text": finding.description
                },
                "locations": []
            }

            # Add location information if available
            if finding.file_path:
                location = {
                    "physicalLocation": {
                        "artifactLocation": {
                            "uri": finding.file_path,
                            "uriBaseId": "WORKSPACE"
                        }
                    }
                }

                # Add line information if available
                if finding.line_start:
                    location["physicalLocation"]["region"] = {
                        "startLine": finding.line_start
                    }
                    if finding.line_end:
                        location["physicalLocation"]["region"]["endLine"] = finding.line_end

                    # Add code snippet if available
                    if finding.code_snippet:
                        location["physicalLocation"]["region"]["snippet"] = {
                            "text": finding.code_snippet
                        }

                result["locations"].append(location)

            # Add fix suggestions if available
            if finding.recommendation:
                result["fixes"] = [
                    {
                        "description": {
                            "text": finding.recommendation
                        }
                    }
                ]

            # Add properties
            result["properties"] = {
                "findingId": finding.id,
                "title": finding.title,
                "metadata": finding.metadata
            }

            results.append(result)

        return results
    def _severity_to_sarif_level(self, severity: str) -> str:
        """
        Convert severity to SARIF level.

        Args:
            severity: Finding severity

        Returns:
            SARIF level string
        """
        mapping = {
            "critical": "error",
            "high": "error",
            "medium": "warning",
            "low": "note",
            "info": "none"
        }
        return mapping.get(severity.lower(), "warning")
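The mapping above collapses five finding severities into the four SARIF levels (`error`, `warning`, `note`, `none`). A standalone sketch of the same lookup, with the fallback made explicit:

```python
# Severity-to-SARIF-level lookup, mirroring the mapping in _severity_to_sarif_level.
SARIF_LEVELS = {
    "critical": "error",
    "high": "error",
    "medium": "warning",
    "low": "note",
    "info": "none",
}

def to_sarif_level(severity: str) -> str:
    # Unknown severities fall back to "warning", matching the module's default.
    return SARIF_LEVELS.get(severity.lower(), "warning")

print(to_sarif_level("CRITICAL"))  # error
print(to_sarif_level("unknown"))   # warning
```

Lower-casing the input first means callers can pass severities in any case.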
    def _generate_report_summary(self, findings: List[ModuleFinding]) -> Dict[str, Any]:
        """
        Generate summary statistics for the report.

        Args:
            findings: List of findings

        Returns:
            Summary dictionary
        """
        severity_counts = {
            "critical": 0,
            "high": 0,
            "medium": 0,
            "low": 0,
            "info": 0
        }

        category_counts = {}
        affected_files = set()

        for finding in findings:
            # Count by severity
            if finding.severity in severity_counts:
                severity_counts[finding.severity] += 1

            # Count by category
            if finding.category not in category_counts:
                category_counts[finding.category] = 0
            category_counts[finding.category] += 1

            # Track affected files
            if finding.file_path:
                affected_files.add(finding.file_path)

        return {
            "total_findings": len(findings),
            "severity_distribution": severity_counts,
            "category_distribution": category_counts,
            "affected_files": len(affected_files),
            "report_format": "SARIF 2.1.0",
            "generated_at": datetime.utcnow().isoformat()
        }
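For reference, the shape the reporter builds can be seen in a minimal standalone SARIF 2.1.0 document with one rule and one result. This is a sketch with fabricated names ("ExampleTool", `src/app.py`), not the module's exact output:

```python
import json

# Minimal SARIF 2.1.0 document: one tool, one rule, one result with a location.
sarif = {
    "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {
            "name": "ExampleTool",
            "rules": [{"id": "secret_leak_high", "name": "Secret Leak"}],
        }},
        "results": [{
            "ruleId": "secret_leak_high",
            "level": "error",
            "message": {"text": "Hardcoded credential found"},
            "locations": [{"physicalLocation": {
                "artifactLocation": {"uri": "src/app.py", "uriBaseId": "WORKSPACE"},
                "region": {"startLine": 42},
            }}],
        }],
    }],
}

report = json.dumps(sarif, indent=2)
print(report[:60])
```

Serializing with `json.dumps` is all a SARIF consumer (e.g. a code-scanning UI) needs, since SARIF is plain JSON against the published schema.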
14
backend/toolbox/modules/scanner/__init__.py
Normal file
@@ -0,0 +1,14 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

from .file_scanner import FileScanner

__all__ = ["FileScanner"]
315
backend/toolbox/modules/scanner/file_scanner.py
Normal file
@@ -0,0 +1,315 @@
"""
File Scanner Module - Scans and enumerates files in the workspace
"""

# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

import logging
import mimetypes
from pathlib import Path
from typing import Dict, Any, List
import hashlib

try:
    from toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
except ImportError:
    try:
        from modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
    except ImportError:
        from src.toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding

logger = logging.getLogger(__name__)
class FileScanner(BaseModule):
    """
    Scans files in the mounted workspace and collects information.

    This module:
    - Enumerates files based on patterns
    - Detects file types
    - Calculates file hashes
    - Identifies potentially sensitive files
    """

    def get_metadata(self) -> ModuleMetadata:
        """Get module metadata"""
        return ModuleMetadata(
            name="file_scanner",
            version="1.0.0",
            description="Scans and enumerates files in the workspace",
            author="FuzzForge Team",
            category="scanner",
            tags=["files", "enumeration", "discovery"],
            input_schema={
                "patterns": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "File patterns to scan (e.g., ['*.py', '*.js'])",
                    "default": ["*"]
                },
                "max_file_size": {
                    "type": "integer",
                    "description": "Maximum file size to scan in bytes",
                    "default": 10485760  # 10MB
                },
                "check_sensitive": {
                    "type": "boolean",
                    "description": "Check for sensitive file patterns",
                    "default": True
                },
                "calculate_hashes": {
                    "type": "boolean",
                    "description": "Calculate SHA256 hashes for files",
                    "default": False
                }
            },
            output_schema={
                "findings": {
                    "type": "array",
                    "description": "List of discovered files with metadata"
                }
            },
            requires_workspace=True
        )
    def validate_config(self, config: Dict[str, Any]) -> bool:
        """Validate module configuration"""
        patterns = config.get("patterns", ["*"])
        if not isinstance(patterns, list):
            raise ValueError("patterns must be a list")

        max_size = config.get("max_file_size", 10485760)
        if not isinstance(max_size, int) or max_size <= 0:
            raise ValueError("max_file_size must be a positive integer")

        return True
    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        """
        Execute the file scanning module.

        Args:
            config: Module configuration
            workspace: Path to the workspace directory

        Returns:
            ModuleResult with file findings
        """
        self.start_timer()
        self.validate_workspace(workspace)
        self.validate_config(config)

        findings = []
        file_count = 0
        total_size = 0
        file_types = {}

        # Get configuration
        patterns = config.get("patterns", ["*"])
        max_file_size = config.get("max_file_size", 10485760)
        check_sensitive = config.get("check_sensitive", True)
        calculate_hashes = config.get("calculate_hashes", False)

        logger.info(f"Scanning workspace with patterns: {patterns}")

        try:
            # Scan for each pattern
            for pattern in patterns:
                for file_path in workspace.rglob(pattern):
                    if not file_path.is_file():
                        continue

                    file_count += 1
                    relative_path = file_path.relative_to(workspace)

                    # Get file stats
                    try:
                        stats = file_path.stat()
                        file_size = stats.st_size
                        total_size += file_size

                        # Skip large files
                        if file_size > max_file_size:
                            logger.warning(f"Skipping large file: {relative_path} ({file_size} bytes)")
                            continue

                        # Detect file type
                        file_type = self._detect_file_type(file_path)
                        if file_type not in file_types:
                            file_types[file_type] = 0
                        file_types[file_type] += 1

                        # Check for sensitive files
                        if check_sensitive and self._is_sensitive_file(file_path):
                            findings.append(self.create_finding(
                                title=f"Potentially sensitive file: {relative_path.name}",
                                description=f"Found potentially sensitive file at {relative_path}",
                                severity="medium",
                                category="sensitive_file",
                                file_path=str(relative_path),
                                metadata={
                                    "file_size": file_size,
                                    "file_type": file_type
                                }
                            ))

                        # Calculate hash if requested
                        file_hash = None
                        if calculate_hashes and file_size < 1048576:  # Only hash files < 1MB
                            file_hash = self._calculate_hash(file_path)

                        # Create informational finding for each file
                        findings.append(self.create_finding(
                            title=f"File discovered: {relative_path.name}",
                            description=f"File: {relative_path}",
                            severity="info",
                            category="file_enumeration",
                            file_path=str(relative_path),
                            metadata={
                                "file_size": file_size,
                                "file_type": file_type,
                                "file_hash": file_hash
                            }
                        ))

                    except Exception as e:
                        logger.error(f"Error processing file {relative_path}: {e}")

            # Create summary
            summary = {
                "total_files": file_count,
                "total_size_bytes": total_size,
                "file_types": file_types,
                "patterns_scanned": patterns
            }

            return self.create_result(
                findings=findings,
                status="success",
                summary=summary,
                metadata={
                    "workspace": str(workspace),
                    "config": config
                }
            )

        except Exception as e:
            logger.error(f"File scanner failed: {e}")
            return self.create_result(
                findings=findings,
                status="failed",
                error=str(e)
            )
    def _detect_file_type(self, file_path: Path) -> str:
        """
        Detect the type of a file.

        Args:
            file_path: Path to the file

        Returns:
            File type string
        """
        # Try to determine from extension
        mime_type, _ = mimetypes.guess_type(str(file_path))
        if mime_type:
            return mime_type

        # Check by extension
        ext = file_path.suffix.lower()
        type_map = {
            '.py': 'text/x-python',
            '.js': 'application/javascript',
            '.java': 'text/x-java',
            '.cpp': 'text/x-c++',
            '.c': 'text/x-c',
            '.go': 'text/x-go',
            '.rs': 'text/x-rust',
            '.rb': 'text/x-ruby',
            '.php': 'text/x-php',
            '.yaml': 'text/yaml',
            '.yml': 'text/yaml',
            '.json': 'application/json',
            '.xml': 'text/xml',
            '.md': 'text/markdown',
            '.txt': 'text/plain',
            '.sh': 'text/x-shellscript',
            '.bat': 'text/x-batch',
            '.ps1': 'text/x-powershell'
        }

        return type_map.get(ext, 'application/octet-stream')
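The two-stage lookup above tries the stdlib `mimetypes` table first and only then falls back to a manual extension map. The pattern can be exercised on its own; the fallback entries here are illustrative, not the module's full table:

```python
import mimetypes
from pathlib import Path

# Illustrative fallback map for extensions mimetypes may not know.
FALLBACK_TYPES = {".rs": "text/x-rust", ".ps1": "text/x-powershell"}

def detect_type(name: str) -> str:
    """Two-stage MIME detection: stdlib guess first, manual map second."""
    mime, _ = mimetypes.guess_type(name)
    if mime:
        return mime
    return FALLBACK_TYPES.get(Path(name).suffix.lower(), "application/octet-stream")

print(detect_type("report.json"))  # application/json
print(detect_type("README"))       # application/octet-stream
```

Keeping the stdlib lookup first means the manual map only has to cover extensions the platform's MIME database misses.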
    def _is_sensitive_file(self, file_path: Path) -> bool:
        """
        Check if a file might contain sensitive information.

        Args:
            file_path: Path to the file

        Returns:
            True if potentially sensitive
        """
        sensitive_patterns = [
            '.env',
            '.env.local',
            '.env.production',
            'credentials',
            'password',
            'secret',
            'private_key',
            'id_rsa',
            'id_dsa',
            '.pem',
            '.key',
            '.pfx',
            '.p12',
            'wallet',
            '.ssh',
            'token',
            'api_key',
            'config.json',
            'settings.json',
            '.git-credentials',
            '.npmrc',
            '.pypirc',
            '.docker/config.json'
        ]

        file_name_lower = file_path.name.lower()
        for pattern in sensitive_patterns:
            if pattern in file_name_lower:
                return True

        return False
    def _calculate_hash(self, file_path: Path) -> str:
        """
        Calculate SHA256 hash of a file.

        Args:
            file_path: Path to the file

        Returns:
            Hex string of SHA256 hash
        """
        try:
            sha256_hash = hashlib.sha256()
            with open(file_path, "rb") as f:
                for byte_block in iter(lambda: f.read(4096), b""):
                    sha256_hash.update(byte_block)
            return sha256_hash.hexdigest()
        except Exception as e:
            logger.error(f"Failed to calculate hash for {file_path}: {e}")
            return None
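Hashing in fixed 4 KiB blocks, as `_calculate_hash` does, keeps memory use flat regardless of file size. A self-contained equivalent over an in-memory stream shows the same `iter(callable, sentinel)` idiom:

```python
import hashlib
import io

def sha256_stream(stream, chunk_size: int = 4096) -> str:
    """Hash a binary stream in fixed-size chunks (constant memory)."""
    digest = hashlib.sha256()
    # iter() with a sentinel calls read() until it returns b"" at EOF.
    for block in iter(lambda: stream.read(chunk_size), b""):
        digest.update(block)
    return digest.hexdigest()

data = b"a" * 10_000  # larger than one chunk, so multiple update() calls
assert sha256_stream(io.BytesIO(data)) == hashlib.sha256(data).hexdigest()
```

Chunked updates produce the same digest as hashing the whole buffer at once, which is why the assertion holds.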
36
backend/toolbox/modules/secret_detection/__init__.py
Normal file
@@ -0,0 +1,36 @@
"""
Secret Detection Modules

This package contains modules for detecting secrets, credentials, and sensitive information
in codebases and repositories.

Available modules:
- TruffleHog: Comprehensive secret detection with verification
- Gitleaks: Git-specific secret scanning and leak detection
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


from typing import List, Type
from ..base import BaseModule

# Module registry for automatic discovery
SECRET_DETECTION_MODULES: List[Type[BaseModule]] = []


def register_module(module_class: Type[BaseModule]):
    """Register a secret detection module"""
    SECRET_DETECTION_MODULES.append(module_class)
    return module_class


def get_available_modules() -> List[Type[BaseModule]]:
    """Get all available secret detection modules"""
    return SECRET_DETECTION_MODULES.copy()
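The `@register_module` decorator above implements a simple registry pattern: importing a module file is enough to make its class discoverable, because decoration runs at import time. A minimal standalone version of the same pattern, with a fabricated `DemoDetector` class:

```python
from typing import List, Type

REGISTRY: List[Type] = []

def register(cls: Type) -> Type:
    """Append the class to the registry and return it unchanged."""
    REGISTRY.append(cls)
    return cls

@register
class DemoDetector:
    name = "demo"

print([c.name for c in REGISTRY])  # ['demo']
```

Returning the class unchanged is what makes the decorator transparent: `DemoDetector` is still a normal class after registration, so decoration has no effect beyond the side entry in `REGISTRY`.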
351
backend/toolbox/modules/secret_detection/gitleaks.py
Normal file
@@ -0,0 +1,351 @@
"""
Gitleaks Secret Detection Module

This module uses Gitleaks to detect secrets and sensitive information in Git repositories
and file systems.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import asyncio
import json
from pathlib import Path
from typing import Dict, Any, List
import subprocess
import logging

from ..base import BaseModule, ModuleMetadata, ModuleFinding, ModuleResult
from . import register_module

logger = logging.getLogger(__name__)
@register_module
class GitleaksModule(BaseModule):
    """Gitleaks secret detection module"""

    def get_metadata(self) -> ModuleMetadata:
        """Get module metadata"""
        return ModuleMetadata(
            name="gitleaks",
            version="8.18.0",
            description="Git-specific secret scanning and leak detection using Gitleaks",
            author="FuzzForge Team",
            category="secret_detection",
            tags=["secrets", "git", "leak-detection", "credentials"],
            input_schema={
                "type": "object",
                "properties": {
                    "scan_mode": {
                        "type": "string",
                        "enum": ["detect", "protect"],
                        "default": "detect",
                        "description": "Scan mode: detect (entire repo history) or protect (staged changes)"
                    },
                    "config_file": {
                        "type": "string",
                        "description": "Path to custom Gitleaks configuration file"
                    },
                    "baseline_file": {
                        "type": "string",
                        "description": "Path to baseline file to ignore known findings"
                    },
                    "max_target_megabytes": {
                        "type": "integer",
                        "default": 100,
                        "description": "Maximum size of files to scan (in MB)"
                    },
                    "redact": {
                        "type": "boolean",
                        "default": True,
                        "description": "Redact secrets in output"
                    },
                    "no_git": {
                        "type": "boolean",
                        "default": False,
                        "description": "Scan files without Git context"
                    }
                }
            },
            output_schema={
                "type": "object",
                "properties": {
                    "findings": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "rule_id": {"type": "string"},
                                "category": {"type": "string"},
                                "file_path": {"type": "string"},
                                "line_number": {"type": "integer"},
                                "secret": {"type": "string"}
                            }
                        }
                    }
                }
            }
        )
    def validate_config(self, config: Dict[str, Any]) -> bool:
        """Validate configuration"""
        scan_mode = config.get("scan_mode", "detect")
        if scan_mode not in ["detect", "protect"]:
            raise ValueError("scan_mode must be 'detect' or 'protect'")

        max_size = config.get("max_target_megabytes", 100)
        if not isinstance(max_size, int) or max_size < 1 or max_size > 1000:
            raise ValueError("max_target_megabytes must be between 1 and 1000")

        return True
    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        """Execute Gitleaks secret detection"""
        self.start_timer()

        try:
            # Validate inputs
            self.validate_config(config)
            self.validate_workspace(workspace)

            logger.info(f"Running Gitleaks on {workspace}")

            # Build Gitleaks command
            scan_mode = config.get("scan_mode", "detect")
            cmd = ["gitleaks", scan_mode]

            # Add source path
            cmd.extend(["--source", str(workspace)])

            # Create temp file for JSON output
            import tempfile
            output_file = tempfile.NamedTemporaryFile(mode='w+', suffix='.json', delete=False)
            output_path = output_file.name
            output_file.close()

            # Add report format and output file
            cmd.extend(["--report-format", "json"])
            cmd.extend(["--report-path", output_path])

            # Add redact option
            if config.get("redact", True):
                cmd.append("--redact")

            # Add max target size
            max_size = config.get("max_target_megabytes", 100)
            cmd.extend(["--max-target-megabytes", str(max_size)])

            # Add config file if specified
            if config.get("config_file"):
                config_path = Path(config["config_file"])
                if config_path.exists():
                    cmd.extend(["--config", str(config_path)])

            # Add baseline file if specified
            if config.get("baseline_file"):
                baseline_path = Path(config["baseline_file"])
                if baseline_path.exists():
                    cmd.extend(["--baseline-path", str(baseline_path)])

            # Add no-git flag if specified
            if config.get("no_git", False):
                cmd.append("--no-git")

            # Add verbose output
            cmd.append("--verbose")

            logger.debug(f"Running command: {' '.join(cmd)}")

            # Run Gitleaks
            process = await asyncio.create_subprocess_exec(
                *cmd,
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
                cwd=workspace
            )

            stdout, stderr = await process.communicate()

            # Parse results
            findings = []
            try:
                # Read the JSON output from file
                with open(output_path, 'r') as f:
                    output_content = f.read()

                if process.returncode == 0:
                    # No secrets found
                    logger.info("No secrets detected by Gitleaks")
                elif process.returncode == 1:
                    # Secrets found - parse from file content
                    findings = self._parse_gitleaks_output(output_content, workspace)
                else:
                    # Error occurred
                    error_msg = stderr.decode()
                    logger.error(f"Gitleaks failed: {error_msg}")
                    return self.create_result(
                        findings=[],
                        status="failed",
                        error=f"Gitleaks execution failed: {error_msg}"
                    )
            finally:
                # Clean up temp file
                import os
                try:
                    os.unlink(output_path)
                except OSError:
                    pass

            # Create summary
            summary = {
                "total_leaks": len(findings),
                "unique_rules": len(set(f.metadata.get("rule_id", "") for f in findings)),
                "files_with_leaks": len(set(f.file_path for f in findings if f.file_path)),
                "scan_mode": scan_mode
            }

            logger.info(f"Gitleaks found {len(findings)} potential leaks")

            return self.create_result(
                findings=findings,
                status="success",
                summary=summary
            )

        except Exception as e:
            logger.error(f"Gitleaks module failed: {e}")
            return self.create_result(
                findings=[],
                status="failed",
                error=str(e)
            )
    def _parse_gitleaks_output(self, output: str, workspace: Path) -> List[ModuleFinding]:
        """Parse Gitleaks JSON output into findings"""
        findings = []

        if not output.strip():
            return findings

        try:
            # Gitleaks outputs JSON array
            results = json.loads(output)
            if not isinstance(results, list):
                logger.warning("Unexpected Gitleaks output format")
                return findings

            for result in results:
                # Extract information
                rule_id = result.get("RuleID", "unknown")
                description = result.get("Description", "")
                file_path = result.get("File", "")
                line_number = result.get("LineNumber", 0)
                secret = result.get("Secret", "")
                match_text = result.get("Match", "")

                # Commit info (if available)
                commit = result.get("Commit", "")
                author = result.get("Author", "")
                email = result.get("Email", "")
                date = result.get("Date", "")

                # Make file path relative to workspace
                if file_path:
                    try:
                        rel_path = Path(file_path).relative_to(workspace)
                        file_path = str(rel_path)
                    except ValueError:
                        # If file is outside workspace, keep absolute path
                        pass

                # Determine severity based on rule type
                severity = self._get_leak_severity(rule_id, description)

                # Create finding
                finding = self.create_finding(
                    title=f"Secret leak detected: {rule_id}",
                    description=self._get_leak_description(rule_id, description, commit),
                    severity=severity,
                    category="secret_leak",
                    file_path=file_path if file_path else None,
                    line_start=line_number if line_number > 0 else None,
                    code_snippet=match_text if match_text else secret,
                    recommendation=self._get_leak_recommendation(rule_id),
                    metadata={
                        "rule_id": rule_id,
                        "secret_type": description,
                        "commit": commit,
                        "author": author,
                        "email": email,
                        "date": date,
                        "entropy": result.get("Entropy", 0),
                        "fingerprint": result.get("Fingerprint", "")
                    }
                )

                findings.append(finding)

        except json.JSONDecodeError as e:
            logger.warning(f"Failed to parse Gitleaks output: {e}")
        except Exception as e:
            logger.warning(f"Error processing Gitleaks results: {e}")

        return findings
def _get_leak_severity(self, rule_id: str, description: str) -> str:
|
||||
"""Determine severity based on secret type"""
|
||||
critical_patterns = [
|
||||
"aws", "amazon", "gcp", "google", "azure", "microsoft",
|
||||
"private_key", "rsa", "ssh", "certificate", "database",
|
||||
"password", "auth", "token", "secret", "key"
|
||||
]
|
||||
|
||||
rule_lower = rule_id.lower()
|
||||
desc_lower = description.lower()
|
||||
|
||||
# Check for critical patterns
|
||||
for pattern in critical_patterns:
|
||||
if pattern in rule_lower or pattern in desc_lower:
|
||||
if any(x in rule_lower for x in ["aws", "gcp", "azure"]):
|
||||
return "critical"
|
||||
elif any(x in rule_lower for x in ["private", "key", "password"]):
|
||||
return "high"
|
||||
else:
|
||||
return "medium"
|
||||
|
||||
return "low"
|
||||
|
||||
def _get_leak_description(self, rule_id: str, description: str, commit: str) -> str:
|
||||
"""Get description for the leak finding"""
|
||||
base_desc = f"Gitleaks detected a potential secret leak matching rule '{rule_id}'"
|
||||
if description:
|
||||
base_desc += f" ({description})"
|
||||
|
||||
if commit:
|
||||
base_desc += f" in commit {commit[:8]}"
|
||||
|
||||
base_desc += ". This may indicate sensitive information has been committed to version control."
|
||||
|
||||
return base_desc
|
||||
|
||||
def _get_leak_recommendation(self, rule_id: str) -> str:
|
||||
"""Get remediation recommendation"""
|
||||
base_rec = "Remove the secret from the codebase and Git history. "
|
||||
|
||||
if any(pattern in rule_id.lower() for pattern in ["aws", "gcp", "azure"]):
|
||||
base_rec += "Revoke the cloud credentials immediately and rotate them. "
|
||||
|
||||
base_rec += "Consider using Git history rewriting tools (git-filter-branch, BFG) " \
|
||||
"to remove sensitive data from commit history. Implement pre-commit hooks " \
|
||||
"to prevent future secret commits."
|
||||
|
||||
return base_rec
|
||||
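The severity heuristic above is plain substring matching, so it can be exercised in isolation. The sketch below re-implements `_get_leak_severity` as a free function with hypothetical rule IDs (the real module receives them from Gitleaks output):

```python
# Standalone re-implementation of the _get_leak_severity heuristic above.
# Rule IDs here are hypothetical examples, not real Gitleaks rule names.
def leak_severity(rule_id: str, description: str = "") -> str:
    critical_patterns = [
        "aws", "amazon", "gcp", "google", "azure", "microsoft",
        "private_key", "rsa", "ssh", "certificate", "database",
        "password", "auth", "token", "secret", "key",
    ]
    rule_lower, desc_lower = rule_id.lower(), description.lower()
    for pattern in critical_patterns:
        if pattern in rule_lower or pattern in desc_lower:
            # Cloud-provider rules outrank key/password rules, which
            # outrank every other matched pattern.
            if any(x in rule_lower for x in ["aws", "gcp", "azure"]):
                return "critical"
            if any(x in rule_lower for x in ["private", "key", "password"]):
                return "high"
            return "medium"
    return "low"

print(leak_severity("aws-access-token"))        # critical
print(leak_severity("generic-password"))        # high
print(leak_severity("slack-webhook", "token"))  # medium
```

Note that the heuristic short-circuits on the first pattern that matches either the rule ID or the description, so the tier is decided by the rule ID alone once any pattern hits.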
294
backend/toolbox/modules/secret_detection/trufflehog.py
Normal file
@@ -0,0 +1,294 @@
"""
TruffleHog Secret Detection Module

This module uses TruffleHog to detect secrets, credentials, and sensitive information
with verification capabilities.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import asyncio
import json
import logging
from pathlib import Path
from typing import Dict, Any, List

from ..base import BaseModule, ModuleMetadata, ModuleFinding, ModuleResult
from . import register_module

logger = logging.getLogger(__name__)


@register_module
class TruffleHogModule(BaseModule):
    """TruffleHog secret detection module"""

    def get_metadata(self) -> ModuleMetadata:
        """Get module metadata"""
        return ModuleMetadata(
            name="trufflehog",
            version="3.63.2",
            description="Comprehensive secret detection with verification using TruffleHog",
            author="FuzzForge Team",
            category="secret_detection",
            tags=["secrets", "credentials", "sensitive-data", "verification"],
            input_schema={
                "type": "object",
                "properties": {
                    "verify": {
                        "type": "boolean",
                        "default": False,
                        "description": "Verify discovered secrets"
                    },
                    "include_detectors": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "Specific detectors to include"
                    },
                    "exclude_detectors": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "Specific detectors to exclude"
                    },
                    "max_depth": {
                        "type": "integer",
                        "default": 10,
                        "description": "Maximum directory depth to scan"
                    },
                    "concurrency": {
                        "type": "integer",
                        "default": 10,
                        "description": "Number of concurrent workers"
                    }
                }
            },
            output_schema={
                "type": "object",
                "properties": {
                    "findings": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "detector": {"type": "string"},
                                "verified": {"type": "boolean"},
                                "file_path": {"type": "string"},
                                "line": {"type": "integer"},
                                "secret": {"type": "string"}
                            }
                        }
                    }
                }
            }
        )

    def validate_config(self, config: Dict[str, Any]) -> bool:
        """Validate configuration"""
        # Check concurrency bounds
        concurrency = config.get("concurrency", 10)
        if not isinstance(concurrency, int) or concurrency < 1 or concurrency > 50:
            raise ValueError("Concurrency must be between 1 and 50")

        # Check max_depth bounds
        max_depth = config.get("max_depth", 10)
        if not isinstance(max_depth, int) or max_depth < 1 or max_depth > 20:
            raise ValueError("Max depth must be between 1 and 20")

        return True

    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        """Execute TruffleHog secret detection"""
        self.start_timer()

        try:
            # Validate inputs
            self.validate_config(config)
            self.validate_workspace(workspace)

            logger.info(f"Running TruffleHog on {workspace}")

            # Build TruffleHog command
            cmd = ["trufflehog", "filesystem", str(workspace)]

            # Add verification flag
            if config.get("verify", False):
                cmd.append("--verify")

            # Add JSON output
            cmd.extend(["--json", "--no-update"])

            # Add concurrency
            cmd.extend(["--concurrency", str(config.get("concurrency", 10))])

            # Add max depth
            cmd.extend(["--max-depth", str(config.get("max_depth", 10))])

            # Add include/exclude detectors
            if config.get("include_detectors"):
                cmd.extend(["--include-detectors", ",".join(config["include_detectors"])])

            if config.get("exclude_detectors"):
                cmd.extend(["--exclude-detectors", ",".join(config["exclude_detectors"])])

            logger.debug(f"Running command: {' '.join(cmd)}")

            # Run TruffleHog
            process = await asyncio.create_subprocess_exec(
                *cmd,
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
                cwd=workspace
            )

            stdout, stderr = await process.communicate()

            # Parse results
            findings = []
            if process.returncode in (0, 1):  # 1 indicates secrets were found
                findings = self._parse_trufflehog_output(stdout.decode(), workspace)
            else:
                error_msg = stderr.decode()
                logger.error(f"TruffleHog failed: {error_msg}")
                return self.create_result(
                    findings=[],
                    status="failed",
                    error=f"TruffleHog execution failed: {error_msg}"
                )

            # Create summary
            summary = {
                "total_secrets": len(findings),
                "verified_secrets": len([f for f in findings if f.metadata.get("verified", False)]),
                "detectors_triggered": len(set(f.metadata.get("detector", "") for f in findings)),
                "files_with_secrets": len(set(f.file_path for f in findings if f.file_path))
            }

            logger.info(f"TruffleHog found {len(findings)} secrets")

            return self.create_result(
                findings=findings,
                status="success",
                summary=summary
            )

        except Exception as e:
            logger.error(f"TruffleHog module failed: {e}")
            return self.create_result(
                findings=[],
                status="failed",
                error=str(e)
            )

    def _parse_trufflehog_output(self, output: str, workspace: Path) -> List[ModuleFinding]:
        """Parse TruffleHog JSON output (one JSON object per line) into findings"""
        findings = []

        for line in output.strip().split('\n'):
            if not line.strip():
                continue

            try:
                result = json.loads(line)

                # Extract information
                detector = result.get("DetectorName", "unknown")
                verified = result.get("Verified", False)
                raw_secret = result.get("Raw", "")

                # Source info
                source_metadata = result.get("SourceMetadata", {})
                source_data = source_metadata.get("Data", {})
                file_path = source_data.get("Filesystem", {}).get("file", "")
                line_num = source_data.get("Filesystem", {}).get("line", 0)

                # Make file path relative to workspace
                if file_path:
                    try:
                        rel_path = Path(file_path).relative_to(workspace)
                        file_path = str(rel_path)
                    except ValueError:
                        # If the file is outside the workspace, keep the absolute path
                        pass

                # Determine severity based on verification and detector type
                severity = self._get_secret_severity(detector, verified, raw_secret)

                # Create finding
                finding = self.create_finding(
                    title=f"{detector} secret detected",
                    description=self._get_secret_description(detector, verified),
                    severity=severity,
                    category="secret_detection",
                    file_path=file_path if file_path else None,
                    line_start=line_num if line_num > 0 else None,
                    code_snippet=self._truncate_secret(raw_secret),
                    recommendation=self._get_secret_recommendation(detector, verified),
                    metadata={
                        "detector": detector,
                        "verified": verified,
                        "detector_type": result.get("DetectorType", ""),
                        "decoder_type": result.get("DecoderType", ""),
                        "structured_data": result.get("StructuredData", {})
                    }
                )

                findings.append(finding)

            except json.JSONDecodeError as e:
                logger.warning(f"Failed to parse TruffleHog output line: {e}")
                continue
            except Exception as e:
                logger.warning(f"Error processing TruffleHog result: {e}")
                continue

        return findings

    def _get_secret_severity(self, detector: str, verified: bool, secret: str) -> str:
        """Determine severity based on secret type and verification status"""
        if verified:
            # Verified secrets are always high risk
            critical_detectors = ["aws", "gcp", "azure", "github", "gitlab", "database"]
            if any(crit in detector.lower() for crit in critical_detectors):
                return "critical"
            return "high"

        # Unverified secrets
        high_risk_detectors = ["private_key", "certificate", "password", "token"]
        if any(high in detector.lower() for high in high_risk_detectors):
            return "medium"

        return "low"

    def _get_secret_description(self, detector: str, verified: bool) -> str:
        """Build the description for a secret finding"""
        verification_status = "verified and active" if verified else "unverified"
        return f"A {detector} secret was detected and is {verification_status}. " \
               f"This may represent a security risk if the credential is valid."

    def _get_secret_recommendation(self, detector: str, verified: bool) -> str:
        """Build the remediation recommendation"""
        if verified:
            return f"IMMEDIATE ACTION REQUIRED: This {detector} secret is verified and active. " \
                   f"Revoke the credential immediately, remove it from the codebase, and " \
                   f"implement proper secret management practices."
        else:
            return f"Review this {detector} secret to determine whether it is valid. " \
                   f"If real, revoke the credential and remove it from the codebase. " \
                   f"Consider implementing secret scanning in CI/CD pipelines."

    def _truncate_secret(self, secret: str, max_length: int = 50) -> str:
        """Truncate a secret for display purposes"""
        if len(secret) <= max_length:
            return secret
        return secret[:max_length] + "..."
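`_parse_trufflehog_output` consumes newline-delimited JSON, one record per line. The sketch below builds one record with the field layout the parser reads (`DetectorName`, `Verified`, `Raw`, and the nested `SourceMetadata.Data.Filesystem` block) and extracts the same fields; the values are hypothetical, not real TruffleHog output:

```python
import json

# One NDJSON record shaped the way _parse_trufflehog_output expects.
# Values are hypothetical; real records come from `trufflehog ... --json`.
line = json.dumps({
    "DetectorName": "GitHub",
    "Verified": True,
    "Raw": "ghp_example0000000000000000000000000000",
    "SourceMetadata": {"Data": {"Filesystem": {"file": "src/config.py", "line": 42}}},
})

record = json.loads(line)
# Walk the nested structure with .get() defaults, as the parser does,
# so a missing level degrades to empty values instead of raising.
fs = record.get("SourceMetadata", {}).get("Data", {}).get("Filesystem", {})
print(record["DetectorName"], record["Verified"], fs["file"], fs["line"])
# → GitHub True src/config.py 42
```

The chained `.get(..., {})` calls are why the module tolerates records from other TruffleHog sources (git, S3, …) that lack a `Filesystem` block: `file_path` and `line` simply come back empty.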
11
backend/toolbox/workflows/__init__.py
Normal file
@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

59
backend/toolbox/workflows/android_static_analysis/Dockerfile
Normal file
@@ -0,0 +1,59 @@
FROM prefecthq/prefect:3-python3.11

WORKDIR /app

# Install system dependencies for MobSF and Jadx
RUN apt-get update && apt-get install -y \
    git \
    default-jdk \
    wget \
    unzip \
    xfonts-75dpi \
    xfonts-base \
    && rm -rf /var/lib/apt/lists/* \
    && wget https://github.com/wkhtmltopdf/packaging/releases/download/0.12.6.1-3/wkhtmltox_0.12.6.1-3.bookworm_amd64.deb \
    && apt-get update \
    && apt-get install -y ./wkhtmltox_0.12.6.1-3.bookworm_amd64.deb \
    && rm wkhtmltox_0.12.6.1-3.bookworm_amd64.deb \
    && rm -rf /var/lib/apt/lists/*

# Install Jadx
RUN wget https://github.com/skylot/jadx/releases/download/v1.5.0/jadx-1.5.0.zip -O /tmp/jadx.zip \
    && unzip /tmp/jadx.zip -d /opt/jadx \
    && rm /tmp/jadx.zip \
    && ln -s /opt/jadx/bin/jadx /usr/local/bin/jadx

# The upstream OpenGrep CLI is not yet published on PyPI. Use semgrep (the
# engine that OpenGrep builds upon) and expose it under the `opengrep` name so
# the workflow module can invoke it transparently.
RUN pip install --no-cache-dir semgrep==1.45.0 \
    && ln -sf /usr/local/bin/semgrep /usr/local/bin/opengrep

# Clone and set up MobSF
RUN git clone https://github.com/MobSF/Mobile-Security-Framework-MobSF.git /app/mobsf \
    && cd /app/mobsf \
    && git checkout v3.9.7 \
    && ./setup.sh

# Force rebuild after this point
ARG CACHEBUST=2

# Copy the entire toolbox directory structure
COPY . /app/toolbox

# Copy Android custom rules to a well-known location
COPY ./modules/android/custom_rules /app/custom_opengrep_rules

ENV PYTHONPATH=/app/toolbox:$PYTHONPATH
ENV MOBSF_PORT=8877

# Create a startup script to launch MobSF in the background and then Prefect
RUN echo '#!/bin/bash\n\
cd /app/mobsf && ./run.sh 127.0.0.1:8877 &\n\
echo "Waiting for MobSF to start..."\n\
sleep 10\n\
echo "Starting Prefect engine..."\n\
exec python -m prefect.engine\n\
' > /app/start.sh && chmod +x /app/start.sh

CMD ["/app/start.sh"]
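The Dockerfile above makes one binary answer to two names with `ln -sf /usr/local/bin/semgrep /usr/local/bin/opengrep`. The same trick can be sketched outside Docker with a stand-in script in a scratch directory (all paths and the fake version string below are hypothetical):

```python
# Reproduce the Dockerfile's alias trick: a symlink lets the same
# executable be invoked under a second name ("opengrep").
import os
import stat
import subprocess
import tempfile

d = tempfile.mkdtemp()
tool = os.path.join(d, "semgrep")
with open(tool, "w") as f:
    f.write('#!/bin/sh\necho "semgrep 1.45.0"\n')  # stand-in binary
os.chmod(tool, os.stat(tool).st_mode | stat.S_IXUSR)

# Equivalent of `ln -sf semgrep opengrep`
os.symlink(tool, os.path.join(d, "opengrep"))

out = subprocess.run([os.path.join(d, "opengrep")],
                     capture_output=True, text=True)
print(out.stdout.strip())
```

Because the alias resolves to the very same file, the workflow module can shell out to `opengrep` without knowing (or caring) that semgrep is what actually runs.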
@@ -0,0 +1,16 @@
# Use existing image with MobSF already installed
FROM localhost:5001/fuzzforge/android_static_analysis:latest

# Install unzip and Jadx
RUN apt-get update && apt-get install -y unzip && rm -rf /var/lib/apt/lists/* \
    && wget https://github.com/skylot/jadx/releases/download/v1.5.0/jadx-1.5.0.zip \
    && unzip -o jadx-1.5.0.zip -d /opt/jadx \
    && rm jadx-1.5.0.zip \
    && chmod +x /opt/jadx/bin/jadx \
    && ln -sf /opt/jadx/bin/jadx /usr/local/bin/jadx

# Copy updated toolbox files
COPY . /app/toolbox

# Copy Android custom rules
COPY ./modules/android/custom_rules /app/custom_opengrep_rules
@@ -0,0 +1,6 @@
"""
Android Static Analysis Security Testing (SAST) Workflow

This package contains the Android SAST workflow that combines
multiple static analysis tools optimized for Java code security.
"""
135
backend/toolbox/workflows/android_static_analysis/metadata.yaml
Normal file
@@ -0,0 +1,135 @@
name: android_static_analysis
version: "1.0.0"
description: "Perform static analysis on Android applications using OpenGrep and MobSF."
author: "FuzzForge Team"
category: "specialized"
tags:
  - "android"
  - "static-analysis"
  - "security"
  - "opengrep"
  - "semgrep"
  - "mobsf"

supported_volume_modes:
  - "ro"
  - "rw"

default_volume_mode: "ro"
default_target_path: "/workspace/android_test"

requirements:
  tools:
    - "opengrep"
    - "mobsf"
    - "sarif_reporter"
  resources:
    memory: "2Gi"
    cpu: "2000m"
    timeout: 3600
  environment:
    python: "3.11"

has_docker: true

default_parameters:
  target_path: "/workspace/android_test"
  volume_mode: "ro"
  apk_path: ""
  opengrep_config: {}
  custom_rules_path: "/app/custom_opengrep_rules"
  reporter_config: {}

parameters:
  type: object
  properties:
    target_path:
      type: string
      default: "/workspace/android_test"
      description: "Path to the decompiled Android source code for OpenGrep analysis."
    volume_mode:
      type: string
      enum: ["ro", "rw"]
      default: "ro"
      description: "Volume mount mode for the attached workspace."
    apk_path:
      type: string
      default: ""
      description: "Path to the APK file for MobSF analysis (relative to workspace parent or absolute). If empty, MobSF analysis will be skipped."
    opengrep_config:
      type: object
      description: "Configuration object forwarded to the OpenGrep module."
      properties:
        config:
          type: string
          enum: ["auto", "p/security-audit", "p/owasp-top-ten", "p/cwe-top-25"]
          description: "Preset OpenGrep ruleset to run."
        custom_rules_path:
          type: string
          description: "Directory that contains custom OpenGrep rules."
        languages:
          type: array
          items:
            type: string
          description: "Restrict analysis to specific languages."
        include_patterns:
          type: array
          items:
            type: string
          description: "File patterns to include in the scan."
        exclude_patterns:
          type: array
          items:
            type: string
          description: "File patterns to exclude from the scan."
        max_target_bytes:
          type: integer
          description: "Maximum file size to analyze (bytes)."
        timeout:
          type: integer
          description: "Analysis timeout in seconds."
        severity:
          type: array
          items:
            type: string
            enum: ["ERROR", "WARNING", "INFO"]
          description: "Severities to include in the results."
        confidence:
          type: array
          items:
            type: string
            enum: ["HIGH", "MEDIUM", "LOW"]
          description: "Confidence levels to include in the results."
    custom_rules_path:
      type:
        - string
        - "null"
      default: "/app/custom_opengrep_rules"
      description: "Optional in-container path pointing to custom OpenGrep rules."
    reporter_config:
      type: object
      description: "Configuration overrides for the SARIF reporter."
      properties:
        include_code_flows:
          type: boolean
          description: "Include code flow information in the SARIF output."
        logical_id:
          type: string
          description: "Custom identifier to attach to the generated SARIF report."

output_schema:
  type: object
  properties:
    sarif:
      type: object
      description: "SARIF-formatted findings produced by the workflow."
    summary:
      type: object
      description: "Summary information about the analysis execution."
      properties:
        total_findings:
          type: integer
        severity_counts:
          type: object
        tool_metadata:
          type: object
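The `default_parameters` block in the metadata gives every workflow parameter a fallback value. One plausible way a caller could combine those defaults with user overrides is a shallow dict merge, sketched below (this is an illustration, not necessarily how the FuzzForge engine actually applies them):

```python
# Hypothetical shallow merge of the workflow's default_parameters with
# user-supplied overrides: override keys win, untouched keys keep defaults.
defaults = {
    "target_path": "/workspace/android_test",
    "volume_mode": "ro",
    "apk_path": "",
    "opengrep_config": {},
    "custom_rules_path": "/app/custom_opengrep_rules",
    "reporter_config": {},
}
overrides = {"apk_path": "app-release.apk", "volume_mode": "rw"}

params = {**defaults, **overrides}
print(params["apk_path"], params["volume_mode"], params["target_path"])
# → app-release.apk rw /workspace/android_test
```

A shallow merge replaces nested objects like `opengrep_config` wholesale; merging their inner keys would require a recursive merge instead.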
@@ -0,0 +1,2 @@
requests
pydantic
280
backend/toolbox/workflows/android_static_analysis/workflow.py
Normal file
@@ -0,0 +1,280 @@
|
||||
"""
|
||||
Android Static Analysis Workflow - Analyze APKs using Jadx, MobSF, and OpenGrep
|
||||
"""
|
||||
|
||||
import sys
|
||||
import os
|
||||
import logging
|
||||
import subprocess
|
||||
import time
|
||||
import signal
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any
|
||||
|
||||
from prefect import flow, task
|
||||
|
||||
# S'assurer que /app est dans le PYTHONPATH (exécutions Docker)
|
||||
sys.path.insert(0, "/app")
|
||||
|
||||
# Import des modules internes
|
||||
from toolbox.modules.android.jadx import JadxModule
|
||||
from toolbox.modules.android.opengrep import OpenGrepModule
|
||||
from toolbox.modules.reporter import SARIFReporter
|
||||
from toolbox.modules.android.mobsf import MobSFModule
|
||||
|
||||
# Logging
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
# ---------------------- TASKS ---------------------- #
|
||||
|
||||
@task(name="jadx_decompilation")
|
||||
async def run_jadx_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
|
||||
print("Running Jadx APK decompilation")
|
||||
print(f" APK file: {config.get('apk_path')}")
|
||||
print(f" Output dir: {config.get('output_dir')}")
|
||||
module = JadxModule()
|
||||
result = await module.execute(config, workspace)
|
||||
print(f"Jadx completed: {result.status}")
|
||||
if result.error:
|
||||
print(f"Jadx error: {result.error}")
|
||||
if result.status == "success":
|
||||
print(f"Jadx decompiled {result.summary.get('java_files', 0)} Java files")
|
||||
print(f"Source dir: {result.summary.get('source_dir')}")
|
||||
return result.dict()
|
||||
|
||||
@task(name="opengrep_analysis")
|
||||
async def run_opengrep_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
|
||||
print("Running OpenGrep static analysis")
|
||||
print(f" Workspace: {workspace}")
|
||||
print(f" Config: {config}")
|
||||
module = OpenGrepModule()
|
||||
result = await module.execute(config, workspace)
|
||||
print(f"OpenGrep completed: {result.status}")
|
||||
print(f"OpenGrep findings count: {len(result.findings)}")
|
||||
print(f"OpenGrep summary: {result.summary}")
|
||||
return result.dict()
|
||||
|
||||
@task(name="mobsf_analysis")
|
||||
async def run_mobsf_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
|
||||
print("Running MobSF static analysis")
|
||||
print(f" APK file: {config.get('file_path')}")
|
||||
print(f" MobSF URL: {config.get('mobsf_url')}")
|
||||
|
||||
module = MobSFModule()
|
||||
result = await module.execute(config, workspace)
|
||||
|
||||
print(f"MobSF scan completed: {result.status}")
|
||||
print(f"MobSF findings count: {len(result.findings)}")
|
||||
return result.dict()
|
||||
|
||||
@task(name="android_report_generation")
|
||||
async def generate_android_sarif_report(
|
||||
opengrep_result: Dict[str, Any],
|
||||
mobsf_result: Dict[str, Any],
|
||||
config: Dict[str, Any],
|
||||
workspace: Path
|
||||
) -> Dict[str, Any]:
|
||||
logger.info("Generating SARIF report for Android scan")
|
||||
reporter = SARIFReporter()
|
||||
|
||||
all_findings = []
|
||||
all_findings.extend(opengrep_result.get("findings", []))
|
||||
|
||||
# Add MobSF findings if available
|
||||
if mobsf_result:
|
||||
all_findings.extend(mobsf_result.get("findings", []))
|
||||
|
||||
reporter_config = {
|
||||
**(config or {}),
|
||||
"findings": all_findings,
|
||||
"tool_name": "FuzzForge Android Static Analysis",
|
||||
"tool_version": "1.0.0",
|
||||
}
|
||||
|
||||
result = await reporter.execute(reporter_config, workspace)
|
||||
# Le reporter renvoie typiquement {"sarif": {...}} dans result.dict()
|
||||
return result.dict().get("sarif", {})
|
||||
|
||||
|
||||
# ---------------------- FLOW ---------------------- #
|
||||
|
||||
@flow(name="android_static_analysis", log_prints=True)
|
||||
async def main_flow(
|
||||
target_path: str = os.getenv("FF_TARGET_PATH", "/workspace/android_test"),
|
||||
volume_mode: str = "ro",
|
||||
apk_path: str = "",
|
||||
opengrep_config: Dict[str, Any] = {},
|
||||
custom_rules_path: str = None,
|
||||
reporter_config: Dict[str, Any] = {},
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Android static analysis workflow using OpenGrep and MobSF.
|
||||
|
||||
Args:
|
||||
target_path: Path to decompiled source code (for OpenGrep analysis)
|
||||
volume_mode: Volume mount mode (ro/rw)
|
||||
apk_path: Path to APK file for MobSF analysis (relative to workspace or absolute)
|
||||
opengrep_config: Configuration for OpenGrep module
|
||||
custom_rules_path: Path to custom OpenGrep rules
|
||||
reporter_config: Configuration for SARIF reporter
|
||||
"""
|
||||
print("📱 Starting Android Static Analysis Workflow")
|
||||
print(f"Workspace: {target_path} (mode: {volume_mode})")
|
||||
workspace = Path(target_path)
|
||||
|
||||
# Start MobSF server in background if APK analysis is needed
|
||||
mobsf_process = None
|
||||
if apk_path:
|
||||
print("🚀 Starting MobSF server in background...")
|
||||
try:
|
||||
mobsf_process = subprocess.Popen(
|
||||
["bash", "-c", "cd /app/mobsf && ./run.sh 127.0.0.1:8877"],
|
||||
stdout=subprocess.PIPE,
|
||||
stderr=subprocess.PIPE
|
||||
)
|
||||
print("⏳ Waiting for MobSF to initialize (45 seconds)...")
|
||||
time.sleep(45)
|
||||
print("✅ MobSF should be ready now")
|
||||
|
||||
# Retrieve MobSF API key from secret file
|
||||
print("🔑 Retrieving MobSF API key...")
|
||||
try:
|
||||
secret_file = Path("/root/.MobSF/secret")
|
||||
if secret_file.exists():
|
||||
secret = secret_file.read_text().strip()
|
||||
if secret:
|
||||
# API key is SHA256 hash of the secret file contents
|
||||
import hashlib
|
||||
api_key = hashlib.sha256(secret.encode()).hexdigest()
|
||||
os.environ["MOBSF_API_KEY"] = api_key
|
||||
print(f"✅ MobSF API key retrieved")
|
||||
else:
|
||||
print("⚠️ API key file is empty")
|
||||
else:
|
||||
print(f"⚠️ API key file not found at {secret_file}")
|
||||
except Exception as e:
|
||||
print(f"⚠️ Error retrieving API key: {e}")
|
||||
except Exception as e:
|
||||
print(f"⚠️ Failed to start MobSF: {e}")
|
||||
mobsf_process = None
|
||||
|
||||
# Resolve APK path if provided
|
||||
# Note: target_path gets mounted as /workspace/ in the execution container
|
||||
# So all paths should be relative to /workspace/
|
||||
apk_file_path = None
|
||||
if apk_path:
|
||||
apk_path_obj = Path(apk_path)
|
||||
if apk_path_obj.is_absolute():
|
||||
apk_file_path = str(apk_path_obj)
|
||||
else:
|
||||
# Relative paths are relative to /workspace/ (the mounted target directory)
|
||||
apk_file_path = f"/workspace/{apk_path}"
|
||||
print(f"APK path resolved to: {apk_file_path}")
|
||||
print(f"Checking if APK exists in target: {(Path(target_path) / apk_path).exists()}")
|
||||
|
||||
# Set default Android-specific configuration if not provided
|
||||
if not opengrep_config:
|
||||
opengrep_config = {
|
||||
"languages": ["java", "kotlin"], # Focus on Android languages
|
||||
}
|
||||
|
||||
# Use custom Android rules if available, otherwise use custom_rules_path param
|
||||
if custom_rules_path:
|
||||
opengrep_config["custom_rules_path"] = custom_rules_path
|
||||
elif "custom_rules_path" not in opengrep_config:
|
||||
# Default to custom Android security rules
|
||||
opengrep_config["custom_rules_path"] = "/app/custom_opengrep_rules"
|
||||
|
||||
try:
|
||||
# --- Phase 1 : Jadx Decompilation ---
|
||||
jadx_result = None
|
||||
        actual_workspace = workspace

        if apk_file_path:
            print(f"Phase 1: Jadx decompilation of APK: {apk_file_path}")
            jadx_config = {
                "apk_path": apk_file_path,
                "output_dir": "jadx_output",
                "overwrite": True,
                "threads": 4,
            }
            jadx_result = await run_jadx_task(workspace, jadx_config)

            if jadx_result.get("status") == "success":
                # Use Jadx source output as workspace for OpenGrep
                source_dir = jadx_result.get("summary", {}).get("source_dir")
                if source_dir:
                    actual_workspace = Path(source_dir)
                    print(f"✅ Jadx decompiled {jadx_result.get('summary', {}).get('java_files', 0)} Java files")
                    print(f" OpenGrep will analyze: {source_dir}")
            else:
                print(f"⚠️ Jadx failed: {jadx_result.get('error', 'unknown error')}")
        else:
            print("Phase 1: Jadx decompilation skipped (no APK provided)")

        # --- Phase 2: OpenGrep ---
        print("Phase 2: OpenGrep analysis on source code")
        print(f"Using config: {opengrep_config}")
        opengrep_result = await run_opengrep_task(actual_workspace, opengrep_config)

        # --- Phase 3: MobSF ---
        mobsf_result = None
        if apk_file_path:
            print(f"Phase 3: MobSF analysis on APK: {apk_file_path}")
            mobsf_config = {
                "mobsf_url": "http://localhost:8877",
                "file_path": apk_file_path,
                "api_key": os.environ.get("MOBSF_API_KEY", "")
            }
            print(f"Using MobSF config (api_key={mobsf_config['api_key'][:10]}...): {mobsf_config}")
            mobsf_result = await run_mobsf_task(workspace, mobsf_config)
            print(f"MobSF result: {mobsf_result}")
        else:
            print(f"Phase 3: MobSF analysis skipped (apk_file_path='{apk_file_path}' empty)")

        # --- Phase 4: SARIF report ---
        print("Phase 4: SARIF report generation")
        sarif_report = await generate_android_sarif_report(
            opengrep_result, mobsf_result, reporter_config or {}, workspace
        )

        findings = sarif_report.get("runs", [{}])[0].get("results", []) if sarif_report else []
        print(f"✅ Workflow complete with {len(findings)} findings")
        return sarif_report

    except Exception as e:
        logger.error(f"Workflow failed: {e}")
        print(f"❌ Workflow failed: {e}")
        # Return a minimal SARIF skeleton on failure
        return {
            "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
            "version": "2.1.0",
            "runs": [
                {
                    "tool": {"driver": {"name": "FuzzForge Android Static Analysis"}},
                    "results": [],
                    "invocations": [
                        {
                            "executionSuccessful": False,
                            "exitCode": 1,
                            "exitCodeDescription": str(e),
                        }
                    ],
                }
            ],
        }
    finally:
        # Cleanup: stop MobSF if it was started
        if mobsf_process:
            print("🛑 Stopping MobSF server...")
            try:
                mobsf_process.terminate()
                mobsf_process.wait(timeout=5)
                print("✅ MobSF stopped")
            except Exception as e:
                print(f"⚠️ Error stopping MobSF: {e}")
                try:
                    mobsf_process.kill()
                except Exception:
                    pass
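The findings count above reads `runs[0].results` straight off the SARIF dict; a small hedged helper in the same spirit (the function name is illustrative, not part of the codebase) makes the empty and malformed cases explicit:

```python
from typing import Any, Dict, List

def sarif_results(report: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the results of the first SARIF run, tolerating missing keys."""
    if not report:
        return []
    runs = report.get("runs") or [{}]
    return runs[0].get("results", []) or []

# Malformed or empty reports yield an empty list instead of raising KeyError.
print(len(sarif_results({})))  # 0
print(len(sarif_results({"runs": [{"results": [{"ruleId": "hardcoded-key"}]}]})))  # 1
```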
12
backend/toolbox/workflows/comprehensive/__init__.py
Normal file
@@ -0,0 +1,12 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

@@ -0,0 +1,47 @@
# Secret Detection Workflow Dockerfile
FROM prefecthq/prefect:3-python3.11

# Install system dependencies
RUN apt-get update && apt-get install -y \
    curl \
    wget \
    git \
    ca-certificates \
    gnupg \
    && rm -rf /var/lib/apt/lists/*

# Install TruffleHog (use direct binary download to avoid install script issues)
RUN curl -sSfL "https://github.com/trufflesecurity/trufflehog/releases/download/v3.63.2/trufflehog_3.63.2_linux_amd64.tar.gz" -o trufflehog.tar.gz \
    && tar -xzf trufflehog.tar.gz \
    && mv trufflehog /usr/local/bin/ \
    && rm trufflehog.tar.gz

# Install Gitleaks (use specific version to avoid API rate limiting)
RUN wget https://github.com/gitleaks/gitleaks/releases/download/v8.18.2/gitleaks_8.18.2_linux_x64.tar.gz \
    && tar -xzf gitleaks_8.18.2_linux_x64.tar.gz \
    && mv gitleaks /usr/local/bin/ \
    && rm gitleaks_8.18.2_linux_x64.tar.gz

# Verify installations
RUN trufflehog --version && gitleaks version

# Set working directory
WORKDIR /opt/prefect

# Create toolbox directory structure
RUN mkdir -p /opt/prefect/toolbox

# Set environment variables
ENV PYTHONPATH=/opt/prefect/toolbox:/opt/prefect/toolbox/workflows
ENV WORKFLOW_NAME=secret_detection_scan

# The toolbox code will be mounted at runtime from the backend container
# This includes:
# - /opt/prefect/toolbox/modules/base.py
# - /opt/prefect/toolbox/modules/secret_detection/ (TruffleHog, Gitleaks modules)
# - /opt/prefect/toolbox/modules/reporter/ (SARIF reporter)
# - /opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan/
VOLUME /opt/prefect/toolbox

# Set working directory for execution
WORKDIR /opt/prefect
@@ -0,0 +1,58 @@
# Secret Detection Workflow Dockerfile - Self-Contained Version
# This version copies all required modules into the image for complete isolation
FROM prefecthq/prefect:3-python3.11

# Install system dependencies
RUN apt-get update && apt-get install -y \
    curl \
    wget \
    git \
    ca-certificates \
    gnupg \
    && rm -rf /var/lib/apt/lists/*

# Install TruffleHog
RUN curl -sSfL https://raw.githubusercontent.com/trufflesecurity/trufflehog/main/scripts/install.sh | sh -s -- -b /usr/local/bin

# Install Gitleaks
RUN wget https://github.com/gitleaks/gitleaks/releases/latest/download/gitleaks_linux_x64.tar.gz \
    && tar -xzf gitleaks_linux_x64.tar.gz \
    && mv gitleaks /usr/local/bin/ \
    && rm gitleaks_linux_x64.tar.gz

# Verify installations
RUN trufflehog --version && gitleaks version

# Set working directory
WORKDIR /opt/prefect

# Create directory structure
RUN mkdir -p /opt/prefect/toolbox/modules/secret_detection \
    /opt/prefect/toolbox/modules/reporter \
    /opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan

# Copy the base module and required modules
COPY toolbox/modules/base.py /opt/prefect/toolbox/modules/base.py
COPY toolbox/modules/__init__.py /opt/prefect/toolbox/modules/__init__.py
COPY toolbox/modules/secret_detection/ /opt/prefect/toolbox/modules/secret_detection/
COPY toolbox/modules/reporter/ /opt/prefect/toolbox/modules/reporter/

# Copy the workflow code
COPY toolbox/workflows/comprehensive/secret_detection_scan/ /opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan/

# Copy toolbox init files
COPY toolbox/__init__.py /opt/prefect/toolbox/__init__.py
COPY toolbox/workflows/__init__.py /opt/prefect/toolbox/workflows/__init__.py
COPY toolbox/workflows/comprehensive/__init__.py /opt/prefect/toolbox/workflows/comprehensive/__init__.py

# Install Python dependencies for the modules
# (asyncio is part of the standard library; installing the obsolete PyPI
# package of the same name can break Python 3, so only pydantic is installed)
RUN pip install --no-cache-dir pydantic

# Set environment variables
ENV PYTHONPATH=/opt/prefect/toolbox:/opt/prefect/toolbox/workflows
ENV WORKFLOW_NAME=secret_detection_scan

# Set default command (can be overridden)
CMD ["python", "-m", "toolbox.workflows.comprehensive.secret_detection_scan.workflow"]
@@ -0,0 +1,130 @@
# Secret Detection Scan Workflow

This workflow performs comprehensive secret detection using multiple industry-standard tools:

- **TruffleHog**: Comprehensive secret detection with verification capabilities
- **Gitleaks**: Git-specific secret scanning and leak detection

## Features

- **Parallel Execution**: Runs TruffleHog and Gitleaks concurrently for faster results
- **Deduplication**: Automatically removes duplicate findings across tools
- **SARIF Output**: Generates standardized SARIF reports for integration with security tools
- **Configurable**: Supports extensive configuration for both tools

## Dependencies

### Required Modules
- `toolbox.modules.secret_detection.trufflehog`
- `toolbox.modules.secret_detection.gitleaks`
- `toolbox.modules.reporter` (SARIF reporter)
- `toolbox.modules.base` (Base module interface)

### External Tools
- TruffleHog v3.63.2+
- Gitleaks v8.18.0+

## Docker Deployment

This workflow provides two Docker deployment approaches:

### 1. Volume-Based Approach (Default: `Dockerfile`)

**Advantages:**
- Live code updates without rebuilding images
- Smaller image sizes
- Consistent module versions across workflows
- Faster development iteration

**How it works:**
- Docker image contains only external tools (TruffleHog, Gitleaks)
- Python modules are mounted at runtime from the backend container
- Backend manages code synchronization via shared volumes

### 2. Self-Contained Approach (`Dockerfile.self-contained`)

**Advantages:**
- Complete isolation and reproducibility
- No runtime dependencies on backend code
- Can run independently of FuzzForge platform
- Better for CI/CD integration

**How it works:**
- All required Python modules are copied into the Docker image
- Image is completely self-contained
- Larger image size but fully portable

## Configuration

### TruffleHog Configuration

```json
{
  "trufflehog_config": {
    "verify": true,              // Verify discovered secrets
    "concurrency": 10,           // Number of concurrent workers
    "max_depth": 10,             // Maximum directory depth
    "include_detectors": [],     // Specific detectors to include
    "exclude_detectors": []      // Specific detectors to exclude
  }
}
```

### Gitleaks Configuration

```json
{
  "gitleaks_config": {
    "scan_mode": "detect",           // "detect" or "protect"
    "redact": true,                  // Redact secrets in output
    "max_target_megabytes": 100,     // Maximum file size (MB)
    "no_git": false,                 // Scan without Git context
    "config_file": "",               // Custom Gitleaks config
    "baseline_file": ""              // Baseline file for known findings
  }
}
```

## Usage Example

```bash
curl -X POST "http://localhost:8000/workflows/secret_detection_scan/submit" \
  -H "Content-Type: application/json" \
  -d '{
    "target_path": "/path/to/scan",
    "volume_mode": "ro",
    "parameters": {
      "trufflehog_config": {
        "verify": true,
        "concurrency": 15
      },
      "gitleaks_config": {
        "scan_mode": "detect",
        "max_target_megabytes": 200
      }
    }
  }'
```

## Output Format

The workflow generates a SARIF report containing:
- All unique findings from both tools
- Severity levels mapped to a standard scale
- File locations and line numbers
- Detailed descriptions and recommendations
- Tool-specific metadata

## Performance Considerations

- **TruffleHog**: CPU-intensive with verification enabled
- **Gitleaks**: Memory-intensive for large repositories
- **Recommended Resources**: 512Mi memory, 500m CPU
- **Typical Runtime**: 1-5 minutes for small repos, 10-30 minutes for large ones

## Security Notes

- Secrets are redacted in output by default
- Verified secrets are marked with higher severity
- Both tools support custom rules and exclusions
- Consider using baseline files for known false positives
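The deduplication feature described above keys each finding on its file path, start line, and a lowercased title prefix. A minimal sketch of that idea (the field names follow the workflow's finding dicts; the helper itself is illustrative):

```python
from typing import Any, Dict, List

def dedupe_findings(findings: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop findings that share file path, start line, and title prefix."""
    seen = set()
    unique = []
    for f in findings:
        sig = (
            f.get("file_path", ""),
            f.get("line_start", 0),
            f.get("title", "").lower()[:50],  # case-insensitive title prefix
        )
        if sig not in seen:
            seen.add(sig)
            unique.append(f)
    return unique

dupes = [
    {"file_path": "a.py", "line_start": 3, "title": "AWS key"},
    {"file_path": "a.py", "line_start": 3, "title": "aws KEY"},  # same signature
    {"file_path": "b.py", "line_start": 7, "title": "Token"},
]
print(len(dedupe_findings(dupes)))  # 2
```

Because the title is lowercased, the same secret reported with different casing by TruffleHog and Gitleaks collapses into one finding.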
@@ -0,0 +1,17 @@
"""
Secret Detection Scan Workflow

This package contains the comprehensive secret detection workflow that combines
multiple secret detection tools for thorough analysis.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

@@ -0,0 +1,113 @@
name: secret_detection_scan
version: "2.0.0"
description: "Comprehensive secret detection using TruffleHog and Gitleaks"
author: "FuzzForge Team"
category: "comprehensive"
tags:
  - "secrets"
  - "credentials"
  - "detection"
  - "trufflehog"
  - "gitleaks"
  - "comprehensive"

supported_volume_modes:
  - "ro"
  - "rw"

default_volume_mode: "ro"
default_target_path: "/workspace"

requirements:
  tools:
    - "trufflehog"
    - "gitleaks"
  resources:
    memory: "512Mi"
    cpu: "500m"
    timeout: 1800

has_docker: true

default_parameters:
  target_path: "/workspace"
  volume_mode: "ro"
  trufflehog_config: {}
  gitleaks_config: {}
  reporter_config: {}

parameters:
  type: object
  properties:
    target_path:
      type: string
      default: "/workspace"
      description: "Path to analyze"
    volume_mode:
      type: string
      enum: ["ro", "rw"]
      default: "ro"
      description: "Volume mount mode"
    trufflehog_config:
      type: object
      description: "TruffleHog configuration"
      properties:
        verify:
          type: boolean
          description: "Verify discovered secrets"
        concurrency:
          type: integer
          description: "Number of concurrent workers"
        max_depth:
          type: integer
          description: "Maximum directory depth to scan"
        include_detectors:
          type: array
          items:
            type: string
          description: "Specific detectors to include"
        exclude_detectors:
          type: array
          items:
            type: string
          description: "Specific detectors to exclude"
    gitleaks_config:
      type: object
      description: "Gitleaks configuration"
      properties:
        scan_mode:
          type: string
          enum: ["detect", "protect"]
          description: "Scan mode"
        redact:
          type: boolean
          description: "Redact secrets in output"
        max_target_megabytes:
          type: integer
          description: "Maximum file size to scan (MB)"
        no_git:
          type: boolean
          description: "Scan files without Git context"
        config_file:
          type: string
          description: "Path to custom configuration file"
        baseline_file:
          type: string
          description: "Path to baseline file"
    reporter_config:
      type: object
      description: "SARIF reporter configuration"
      properties:
        output_file:
          type: string
          description: "Output SARIF file name"
        include_code_flows:
          type: boolean
          description: "Include code flow information"

output_schema:
  type: object
  properties:
    sarif:
      type: object
      description: "SARIF-formatted security findings"
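The `parameters` block above follows JSON-Schema conventions; a tiny hand-rolled check mirroring the `volume_mode` enum (illustrative only — a real validator such as the `jsonschema` package would handle the whole schema generically):

```python
def check_volume_mode(params: dict) -> None:
    """Reject volume_mode values outside the enum declared in metadata.yaml."""
    mode = params.get("volume_mode", "ro")  # "ro" is the declared default
    if mode not in ("ro", "rw"):
        raise ValueError(f"volume_mode must be 'ro' or 'rw', got {mode!r}")

check_volume_mode({})                      # default "ro" passes
check_volume_mode({"volume_mode": "rw"})   # declared enum value passes
try:
    check_volume_mode({"volume_mode": "rwx"})
except ValueError as e:
    print(e)  # volume_mode must be 'ro' or 'rw', got 'rwx'
```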
@@ -0,0 +1,290 @@
"""
Secret Detection Scan Workflow

This workflow performs comprehensive secret detection using multiple tools:
- TruffleHog: Comprehensive secret detection with verification
- Gitleaks: Git-specific secret scanning
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import sys
import logging
from pathlib import Path
from typing import Dict, Any, List, Optional
from prefect import flow, task
from prefect.artifacts import create_markdown_artifact, create_table_artifact
import asyncio
import json

# Add modules to path
sys.path.insert(0, '/app')

# Import modules
from toolbox.modules.secret_detection.trufflehog import TruffleHogModule
from toolbox.modules.secret_detection.gitleaks import GitleaksModule
from toolbox.modules.reporter import SARIFReporter

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


@task(name="trufflehog_scan")
async def run_trufflehog_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
    """
    Task to run TruffleHog secret detection.

    Args:
        workspace: Path to the workspace
        config: TruffleHog configuration

    Returns:
        TruffleHog results
    """
    logger.info("Running TruffleHog secret detection")
    module = TruffleHogModule()
    result = await module.execute(config, workspace)
    logger.info(f"TruffleHog completed: {result.summary.get('total_secrets', 0)} secrets found")
    return result.dict()


@task(name="gitleaks_scan")
async def run_gitleaks_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
    """
    Task to run Gitleaks secret detection.

    Args:
        workspace: Path to the workspace
        config: Gitleaks configuration

    Returns:
        Gitleaks results
    """
    logger.info("Running Gitleaks secret detection")
    module = GitleaksModule()
    result = await module.execute(config, workspace)
    logger.info(f"Gitleaks completed: {result.summary.get('total_leaks', 0)} leaks found")
    return result.dict()


@task(name="aggregate_findings")
async def aggregate_findings_task(
    trufflehog_results: Dict[str, Any],
    gitleaks_results: Dict[str, Any],
    config: Dict[str, Any],
    workspace: Path
) -> Dict[str, Any]:
    """
    Task to aggregate findings from all secret detection tools.

    Args:
        trufflehog_results: Results from TruffleHog
        gitleaks_results: Results from Gitleaks
        config: Reporter configuration
        workspace: Path to workspace

    Returns:
        Aggregated SARIF report
    """
    logger.info("Aggregating secret detection findings")

    # Combine all findings
    all_findings = []

    # Add TruffleHog findings
    trufflehog_findings = trufflehog_results.get("findings", [])
    all_findings.extend(trufflehog_findings)

    # Add Gitleaks findings
    gitleaks_findings = gitleaks_results.get("findings", [])
    all_findings.extend(gitleaks_findings)

    # Deduplicate findings based on file path and line number
    unique_findings = []
    seen_signatures = set()

    for finding in all_findings:
        # Create signature for deduplication
        signature = (
            finding.get("file_path", ""),
            finding.get("line_start", 0),
            finding.get("title", "").lower()[:50]  # First 50 chars of title
        )

        if signature not in seen_signatures:
            seen_signatures.add(signature)
            unique_findings.append(finding)
        else:
            logger.debug(f"Deduplicated finding: {signature}")

    logger.info(f"Aggregated {len(unique_findings)} unique findings from {len(all_findings)} total")

    # Generate SARIF report
    reporter = SARIFReporter()
    reporter_config = {
        **config,
        "findings": unique_findings,
        "tool_name": "FuzzForge Secret Detection",
        "tool_version": "1.0.0",
        "tool_description": "Comprehensive secret detection using TruffleHog and Gitleaks"
    }

    result = await reporter.execute(reporter_config, workspace)
    return result.dict().get("sarif", {})


@flow(name="secret_detection_scan", log_prints=True)
async def main_flow(
    target_path: str = "/workspace",
    volume_mode: str = "ro",
    trufflehog_config: Optional[Dict[str, Any]] = None,
    gitleaks_config: Optional[Dict[str, Any]] = None,
    reporter_config: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """
    Main secret detection workflow.

    This workflow:
    1. Runs TruffleHog for comprehensive secret detection
    2. Runs Gitleaks for Git-specific secret detection
    3. Aggregates and deduplicates findings
    4. Generates a unified SARIF report

    Args:
        target_path: Path to the mounted workspace (default: /workspace)
        volume_mode: Volume mount mode (ro/rw)
        trufflehog_config: Configuration for TruffleHog
        gitleaks_config: Configuration for Gitleaks
        reporter_config: Configuration for SARIF reporter

    Returns:
        SARIF-formatted findings report
    """
    logger.info("Starting comprehensive secret detection workflow")
    logger.info(f"Workspace: {target_path}, Mode: {volume_mode}")

    # Set workspace path
    workspace = Path(target_path)

    if not workspace.exists():
        logger.error(f"Workspace does not exist: {workspace}")
        return {
            "error": f"Workspace not found: {workspace}",
            "sarif": None
        }

    # Default configurations - merge with provided configs to ensure defaults are always applied
    default_trufflehog_config = {
        "verify": False,
        "concurrency": 10,
        "max_depth": 10,
        "no_git": True  # Add no_git for filesystem scanning
    }
    trufflehog_config = {**default_trufflehog_config, **(trufflehog_config or {})}

    default_gitleaks_config = {
        "scan_mode": "detect",
        "redact": True,
        "max_target_megabytes": 100,
        "no_git": True  # Critical for non-git directories
    }
    gitleaks_config = {**default_gitleaks_config, **(gitleaks_config or {})}

    default_reporter_config = {
        "include_code_flows": False
    }
    reporter_config = {**default_reporter_config, **(reporter_config or {})}

    try:
        # Run secret detection tools in parallel
        logger.info("Phase 1: Running secret detection tools")

        # Create tasks for parallel execution
        trufflehog_task_result = run_trufflehog_task(workspace, trufflehog_config)
        gitleaks_task_result = run_gitleaks_task(workspace, gitleaks_config)

        # Wait for both to complete
        trufflehog_results, gitleaks_results = await asyncio.gather(
            trufflehog_task_result,
            gitleaks_task_result,
            return_exceptions=True
        )

        # Handle any exceptions
        if isinstance(trufflehog_results, Exception):
            logger.error(f"TruffleHog failed: {trufflehog_results}")
            trufflehog_results = {"findings": [], "status": "failed"}

        if isinstance(gitleaks_results, Exception):
            logger.error(f"Gitleaks failed: {gitleaks_results}")
            gitleaks_results = {"findings": [], "status": "failed"}

        # Aggregate findings
        logger.info("Phase 2: Aggregating findings")
        sarif_report = await aggregate_findings_task(
            trufflehog_results,
            gitleaks_results,
            reporter_config,
            workspace
        )

        # Log summary
        if sarif_report and "runs" in sarif_report:
            results_count = len(sarif_report["runs"][0].get("results", []))
            logger.info(f"Workflow completed successfully with {results_count} unique secret findings")

            # Log tool-specific stats
            trufflehog_count = len(trufflehog_results.get("findings", []))
            gitleaks_count = len(gitleaks_results.get("findings", []))
            logger.info(f"Tool results - TruffleHog: {trufflehog_count}, Gitleaks: {gitleaks_count}")
        else:
            logger.info("Workflow completed successfully with no findings")

        return sarif_report

    except Exception as e:
        logger.error(f"Secret detection workflow failed: {e}")
        # Return error in SARIF format
        return {
            "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
            "version": "2.1.0",
            "runs": [
                {
                    "tool": {
                        "driver": {
                            "name": "FuzzForge Secret Detection",
                            "version": "1.0.0"
                        }
                    },
                    "results": [],
                    "invocations": [
                        {
                            "executionSuccessful": False,
                            "exitCode": 1,
                            "exitCodeDescription": str(e)
                        }
                    ]
                }
            ]
        }


if __name__ == "__main__":
    # For local testing (asyncio is already imported above)
    asyncio.run(main_flow(
        target_path="/tmp/test",
        trufflehog_config={"verify": True, "max_depth": 5},
        gitleaks_config={"scan_mode": "detect"}
    ))
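The `asyncio.gather(..., return_exceptions=True)` pattern used in Phase 1 is what lets one tool crash without cancelling the other. A self-contained sketch of the same pattern (the coroutine names are made up for illustration):

```python
import asyncio

async def ok() -> str:
    return "findings"

async def boom() -> str:
    raise RuntimeError("tool crashed")

async def scan_all() -> list:
    # return_exceptions=True turns a failure into a returned value instead
    # of cancelling the whole batch, mirroring the workflow's Phase 1
    return await asyncio.gather(ok(), boom(), return_exceptions=True)

results = asyncio.run(scan_all())
for r in results:
    if isinstance(r, Exception):
        print(f"failed: {r}")      # failed: tool crashed
    else:
        print(f"succeeded: {r}")   # succeeded: findings
```

Order is preserved, so each result can be matched back to the tool that produced it, exactly as the workflow does with its `isinstance(..., Exception)` checks.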
204
backend/toolbox/workflows/registry.py
Normal file
@@ -0,0 +1,204 @@
|
||||
"""
|
||||
Manual Workflow Registry for Prefect Deployment
|
||||
|
||||
This file contains the manual registry of all workflows that can be deployed.
|
||||
Developers MUST add their workflows here after creating them.
|
||||
|
||||
This approach is required because:
|
||||
1. Prefect cannot deploy dynamically imported flows
|
||||
2. Docker deployment needs static flow references
|
||||
3. Explicit registration provides better control and visibility
|
||||
"""
|
||||
|
||||
# Copyright (c) 2025 FuzzingLabs
|
||||
#
|
||||
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
|
||||
# at the root of this repository for details.
|
||||
#
|
||||
# After the Change Date (four years from publication), this version of the
|
||||
# Licensed Work will be made available under the Apache License, Version 2.0.
|
||||
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Additional attribution and requirements are provided in the NOTICE file.
|
||||
|
||||
from typing import Dict, Any, Callable
|
||||
import logging
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Import only essential workflows
|
||||
# Import each workflow individually to handle failures gracefully
|
||||
security_assessment_flow = None
|
||||
secret_detection_flow = None
|
||||
android_static_analysis_flow = None
|
||||
|
||||
# Try to import each workflow individually
|
||||
try:
|
||||
from .security_assessment.workflow import main_flow as security_assessment_flow
|
||||
except ImportError as e:
|
||||
logger.warning(f"Failed to import security_assessment workflow: {e}")
|
||||
|
||||
try:
|
||||
from .comprehensive.secret_detection_scan.workflow import main_flow as secret_detection_flow
|
||||
except ImportError as e:
|
||||
logger.warning(f"Failed to import secret_detection_scan workflow: {e}")
|
||||
|
||||
try:
|
||||
from .android_static_analysis.workflow import main_flow as android_static_analysis_flow
|
||||
except ImportError as e:
|
||||
logger.warning(f"Failed to import android_static_analysis workflow: {e}")
|
||||
|
||||
|
||||
# Manual registry - developers add workflows here after creation
|
||||
# Only include workflows that were successfully imported
|
||||
WORKFLOW_REGISTRY: Dict[str, Dict[str, Any]] = {}
|
||||
|
||||
# Add workflows that were successfully imported
|
||||
if security_assessment_flow is not None:
|
||||
WORKFLOW_REGISTRY["security_assessment"] = {
|
||||
"flow": security_assessment_flow,
|
||||
"module_path": "toolbox.workflows.security_assessment.workflow",
|
||||
"function_name": "main_flow",
|
||||
"description": "Comprehensive security assessment workflow that scans files, analyzes code for vulnerabilities, and generates SARIF reports",
|
||||
"version": "1.0.0",
|
||||
"author": "FuzzForge Team",
|
||||
"tags": ["security", "scanner", "analyzer", "static-analysis", "sarif"]
|
||||
}
|
||||
|
||||
if secret_detection_flow is not None:
|
||||
WORKFLOW_REGISTRY["secret_detection_scan"] = {
|
||||
"flow": secret_detection_flow,
|
||||
"module_path": "toolbox.workflows.comprehensive.secret_detection_scan.workflow",
|
||||
"function_name": "main_flow",
|
||||
"description": "Comprehensive secret detection using TruffleHog and Gitleaks for thorough credential scanning",
|
||||
"version": "1.0.0",
|
||||
"author": "FuzzForge Team",
|
||||
"tags": ["secrets", "credentials", "detection", "trufflehog", "gitleaks", "comprehensive"]
|
||||
}
|
||||
|
||||
if android_static_analysis_flow is not None:
|
||||
WORKFLOW_REGISTRY["android_static_analysis"] = {
|
||||
"flow": android_static_analysis_flow,
|
||||
"module_path": "toolbox.workflows.android_static_analysis.workflow",
|
||||
"function_name": "main_flow",
|
||||
"description": "Perform static analysis on Android applications using OpenGrep",
|
||||
"version": "1.0.0",
|
||||
"author": "FuzzForge Team",
|
||||
"tags": ["android", "static-analysis", "security", "opengrep", "semgrep"]
|
||||
}
|
||||
|
||||
#
|
||||
# To add a new workflow, follow this pattern:
|
||||
#
|
||||
# "my_new_workflow": {
|
||||
# "flow": my_new_flow_function, # Import the flow function above
|
||||
# "module_path": "toolbox.workflows.my_new_workflow.workflow",
|
||||
# "function_name": "my_new_flow_function",
|
||||
# "description": "Description of what this workflow does",
|
||||
# "version": "1.0.0",
|
||||
# "author": "Developer Name",
|
||||
# "tags": ["tag1", "tag2"]
|
||||
# }
|
||||
|
||||
|
||||
def get_workflow_flow(workflow_name: str) -> Callable:
|
||||
"""
|
||||
Get the flow function for a workflow.
|
||||
|
||||
Args:
|
||||
workflow_name: Name of the workflow
|
||||
|
||||
Returns:
|
||||
Flow function
|
||||
|
||||
Raises:
|
||||
KeyError: If workflow not found in registry
|
||||
"""
|
||||
if workflow_name not in WORKFLOW_REGISTRY:
|
||||
available = list(WORKFLOW_REGISTRY.keys())
|
||||
raise KeyError(
|
||||
f"Workflow '{workflow_name}' not found in registry. "
|
||||
f"Available workflows: {available}. "
|
||||
f"Please add the workflow to toolbox/workflows/registry.py"
|
||||
)
|
||||
|
||||
return WORKFLOW_REGISTRY[workflow_name]["flow"]
|
def get_workflow_info(workflow_name: str) -> Dict[str, Any]:
    """
    Get registry information for a workflow.

    Args:
        workflow_name: Name of the workflow

    Returns:
        Registry information dictionary

    Raises:
        KeyError: If workflow not found in registry
    """
    if workflow_name not in WORKFLOW_REGISTRY:
        available = list(WORKFLOW_REGISTRY.keys())
        raise KeyError(
            f"Workflow '{workflow_name}' not found in registry. "
            f"Available workflows: {available}"
        )

    return WORKFLOW_REGISTRY[workflow_name]


def list_registered_workflows() -> Dict[str, Dict[str, Any]]:
    """
    Get all registered workflows.

    Returns:
        Dictionary of all workflow registry entries
    """
    return WORKFLOW_REGISTRY.copy()


def validate_registry() -> bool:
    """
    Validate the workflow registry for consistency.

    Returns:
        True if valid, raises exceptions if not

    Raises:
        ValueError: If registry is invalid
    """
    if not WORKFLOW_REGISTRY:
        raise ValueError("Workflow registry is empty")

    required_fields = ["flow", "module_path", "function_name", "description"]

    for name, entry in WORKFLOW_REGISTRY.items():
        # Check required fields
        missing_fields = [field for field in required_fields if field not in entry]
        if missing_fields:
            raise ValueError(
                f"Workflow '{name}' missing required fields: {missing_fields}"
            )

        # Check if flow is callable
        if not callable(entry["flow"]):
            raise ValueError(f"Workflow '{name}' flow is not callable")

        # Check if flow has the required Prefect attributes
        if not hasattr(entry["flow"], "deploy"):
            raise ValueError(
                f"Workflow '{name}' flow is not a Prefect flow (missing deploy method)"
            )

    logger.info(f"Registry validation passed. {len(WORKFLOW_REGISTRY)} workflows registered.")
    return True


# Validate registry on import
try:
    validate_registry()
    logger.info(f"Workflow registry loaded successfully with {len(WORKFLOW_REGISTRY)} workflows")
except Exception as e:
    logger.error(f"Workflow registry validation failed: {e}")
    raise
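The registry helpers can be exercised with a minimal stand-in registry. This is a sketch: `_FakeFlow` and the entry values are hypothetical placeholders for real Prefect flow objects, chosen only to satisfy the callable/`deploy` checks that `validate_registry` performs.

```python
from typing import Any, Dict


class _FakeFlow:
    """Hypothetical stand-in for a Prefect flow: callable with a .deploy attribute."""

    def __call__(self) -> None:
        pass

    def deploy(self) -> None:
        pass


# Hypothetical registry entry; real entries are populated from workflow modules.
WORKFLOW_REGISTRY: Dict[str, Dict[str, Any]] = {
    "security_assessment": {
        "flow": _FakeFlow(),
        "module_path": "toolbox.workflows.security_assessment.workflow",
        "function_name": "security_assessment_flow",
        "description": "Static security assessment",
    }
}


def get_workflow_info(workflow_name: str) -> Dict[str, Any]:
    # Same lookup logic as the registry helper above
    if workflow_name not in WORKFLOW_REGISTRY:
        raise KeyError(
            f"Workflow '{workflow_name}' not found in registry. "
            f"Available workflows: {list(WORKFLOW_REGISTRY.keys())}"
        )
    return WORKFLOW_REGISTRY[workflow_name]


info = get_workflow_info("security_assessment")
print(info["description"])  # → Static security assessment

try:
    get_workflow_info("does_not_exist")
except KeyError as exc:
    print("lookup failed as expected")
```

The `KeyError` message lists the available workflow names, which keeps typos in workflow names easy to diagnose from API error responses.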
30  backend/toolbox/workflows/security_assessment/Dockerfile  Normal file
@@ -0,0 +1,30 @@
FROM prefecthq/prefect:3-python3.11

WORKDIR /app

# Create toolbox directory structure to match expected import paths
RUN mkdir -p /app/toolbox/workflows /app/toolbox/modules

# Copy base module infrastructure
COPY modules/__init__.py /app/toolbox/modules/
COPY modules/base.py /app/toolbox/modules/

# Copy only required modules (manual selection)
COPY modules/scanner /app/toolbox/modules/scanner
COPY modules/analyzer /app/toolbox/modules/analyzer
COPY modules/reporter /app/toolbox/modules/reporter

# Copy this workflow
COPY workflows/security_assessment /app/toolbox/workflows/security_assessment

# Install workflow-specific requirements if they exist
RUN if [ -f /app/toolbox/workflows/security_assessment/requirements.txt ]; then pip install --no-cache-dir -r /app/toolbox/workflows/security_assessment/requirements.txt; fi

# Install common requirements
RUN pip install --no-cache-dir pyyaml

# Set Python path
ENV PYTHONPATH=/app:$PYTHONPATH

# Create workspace directory
RUN mkdir -p /workspace
11  backend/toolbox/workflows/security_assessment/__init__.py  Normal file
@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
Some files were not shown because too many files have changed in this diff.