Initial commit

This commit is contained in:
Tanguy Duhamel
2025-09-29 21:26:41 +02:00
parent f0fd367ed8
commit 323a434c73
208 changed files with 72069 additions and 53 deletions
{
  "label": "Reference",
  "position": 5,
  "link": {
    "type": "generated-index",
    "description": "Reference pages that are information-oriented."
  }
}
# FuzzForge AI Reference: CLI, Environment, and API
Welcome to the FuzzForge AI Reference! This document provides a comprehensive, no-nonsense guide to the commands, environment variables, and API endpoints you'll need to work with the FuzzForge AI system. Use it as a quick lookup for syntax, options, and integration details.
---
## CLI Commands Reference
| Command | Description | Example |
|---------|-------------|---------|
| `/register <url>` | Register an A2A agent | `/register http://localhost:10201` |
| `/unregister <name>` | Remove a registered agent | `/unregister CalculatorAgent` |
| `/list` | Show all registered agents | `/list` |
| `/memory [action]` | Knowledge graph operations | `/memory search security` |
| `/recall <query>` | Search conversation history | `/recall past calculations` |
| `/artifacts [id]` | List or view artifacts | `/artifacts artifact_abc123` |
| `/tasks [id]` | Show task status | `/tasks task_001` |
| `/skills` | Display FuzzForge skills | `/skills` |
| `/sessions` | List active sessions | `/sessions` |
| `/sendfile <agent> <path>` | Send file to agent | `/sendfile Analyzer ./code.py` |
| `/clear` | Clear the screen | `/clear` |
| `/help` | Show help | `/help` |
| `/quit` | Exit the CLI | `/quit` |
---
## Built-in Function Tools
### Knowledge Management
```python
search_project_knowledge(query, dataset, search_type)
list_project_knowledge()
ingest_to_dataset(content, dataset)
```
### File Operations
```python
list_project_files(path, pattern)
read_project_file(file_path, max_lines)
search_project_files(search_pattern, file_pattern, path)
```
### Agent Management
```python
get_agent_capabilities(agent_name)
send_file_to_agent(agent_name, file_path, note)
```
### FuzzForge Platform
```python
list_fuzzforge_workflows()
submit_security_scan_mcp(workflow_name, target_path, parameters)
get_comprehensive_scan_summary(run_id)
get_fuzzforge_run_status(run_id)
get_fuzzforge_summary(run_id)
get_fuzzforge_findings(run_id)
```
### Task Management
```python
create_task_list(tasks)
update_task_status(task_list_id, task_id, status)
get_task_list(task_list_id)
```
---
## Environment Variables
Set these in `.fuzzforge/.env` to configure your FuzzForge AI instance.
### Model Configuration
```env
LITELLM_MODEL=gpt-4o-mini # Any LiteLLM-supported model
OPENAI_API_KEY=sk-... # API key for model provider
ANTHROPIC_API_KEY=sk-ant-... # For Claude models
GEMINI_API_KEY=... # For Gemini models
```
### Memory & Persistence
```env
SESSION_PERSISTENCE=sqlite # sqlite|inmemory
SESSION_DB_PATH=./fuzzforge_sessions.db
MEMORY_SERVICE=inmemory # inmemory|vertexai
```
### Server & Communication
```env
FUZZFORGE_PORT=10100 # A2A server port
ARTIFACT_STORAGE=inmemory # inmemory|gcs
GCS_ARTIFACT_BUCKET=artifacts # For GCS storage
```
### Debug & Observability
```env
FUZZFORGE_DEBUG=1 # Enable debug logging
AGENTOPS_API_KEY=... # Optional observability
```
### Platform Integration
```env
FUZZFORGE_MCP_URL=http://localhost:8010/mcp
```
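If you script around FuzzForge AI, it can help to check which of these variables are actually set before launching. The sketch below parses `.fuzzforge/.env` with the standard library only. It assumes simple `KEY=VALUE` lines where a `#` preceded by whitespace starts a comment, as in the examples above; it does not handle quoting or multi-line values, and `parse_env_text` / `load_env_file` are illustrative names, not part of FuzzForge.

```python
from pathlib import Path
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments.

    Assumptions: no quoting or multi-line values, and an inline comment
    starts at a '#' preceded by whitespace (as in the examples above).
    """
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop an inline comment such as "gpt-4o-mini # Any LiteLLM-supported model"
        for marker in (" #", "\t#"):
            value = value.split(marker, 1)[0]
        values[key.strip()] = value.strip()
    return values


def load_env_file(path: Path = Path(".fuzzforge/.env")) -> Dict[str, str]:
    """Read and parse a .env file (raises FileNotFoundError if it is missing)."""
    return parse_env_text(path.read_text(encoding="utf-8"))
```

For example, `load_env_file().get("LITELLM_MODEL")` returns the configured model name, or `None` if the variable is not set.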
---
## MCP (Model Context Protocol) Integration
FuzzForge supports the Model Context Protocol (MCP), allowing LLM clients and AI assistants to interact directly with the security testing platform. All FastAPI endpoints are available as MCP-compatible tools, making security automation accessible to any MCP-aware client.
### MCP Endpoints
- **HTTP MCP endpoint:** `http://localhost:8010/mcp`
- **SSE (Server-Sent Events):** `http://localhost:8010/mcp/sse`
- **Base API:** `http://localhost:8000`
### MCP Tools
- `submit_security_scan_mcp` — Submit security scanning workflows
- `get_comprehensive_scan_summary` — Get detailed scan analysis with recommendations
### FastAPI Endpoints (now MCP tools)
- `GET /` — API status
- `GET /workflows/` — List available workflows
- `POST /workflows/{workflow_name}/submit` — Submit security scans
- `GET /runs/{run_id}/status` — Check scan status
- `GET /runs/{run_id}/findings` — Get scan results
- `GET /fuzzing/{run_id}/stats` — Fuzzing statistics
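These endpoints can also be called without an MCP client, which is a quick way to confirm the backend works before wiring up an assistant. The sketch below uses only the Python standard library and assumes the backend listens on `http://localhost:8000`. The request body matches the `curl` submission shown under Troubleshooting MCP; sending extra workflow options under a top-level `parameters` key is an assumption borrowed from the MCP example. `build_submit_request` and `send_json` are illustrative names.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:8000"  # Base API address listed above


def build_submit_request(
    workflow_name: str,
    target_path: str,
    volume_mode: str = "ro",
    parameters: Optional[Dict[str, Any]] = None,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build a POST request for /workflows/{workflow_name}/submit."""
    body: Dict[str, Any] = {"target_path": target_path, "volume_mode": volume_mode}
    if parameters:
        # Assumption: workflow options go under "parameters", as in the MCP example
        body["parameters"] = parameters
    workflow = urllib.parse.quote(workflow_name, safe="")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/workflows/{workflow}/submit",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_json(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send a request and decode its JSON response (needs a running backend)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Checking a run works the same way, e.g. `send_json(urllib.request.Request(f"{BASE_URL}/runs/{run_id}/status"))`; the shape of the returned JSON is not documented here.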
### Usage Example: Submit a Security Scan via MCP
```json
{
  "tool": "submit_security_scan_mcp",
  "parameters": {
    "workflow_name": "infrastructure_scan",
    "target_path": "/path/to/your/project",
    "volume_mode": "ro",
    "parameters": {
      "checkov_config": {
        "severity": ["HIGH", "MEDIUM", "LOW"]
      },
      "hadolint_config": {
        "severity": ["error", "warning", "info", "style"]
      }
    }
  }
}
```
### Usage Example: Get a Comprehensive Scan Summary
```json
{
  "tool": "get_comprehensive_scan_summary",
  "parameters": {
    "run_id": "your-run-id-here"
  }
}
```
### Available Workflows
1. **infrastructure_scan** — Docker/Kubernetes/Terraform security analysis
2. **static_analysis_scan** — Code vulnerability detection
3. **secret_detection_scan** — Credential and secret scanning
4. **penetration_testing_scan** — Network and web app testing
5. **security_assessment** — Comprehensive security evaluation
### MCP Client Configuration Example
```json
{
  "mcpServers": {
    "fuzzforge": {
      "command": "curl",
      "args": ["-X", "POST", "http://localhost:8010/mcp"],
      "env": {}
    }
  }
}
```
### Troubleshooting MCP
- **MCP Connection Failed:** Check the backend status:
  - `docker compose ps fuzzforge-backend`
  - `curl http://localhost:8000/health`
- **Workflows Not Found:** Check which workflows the API exposes:
  `curl http://localhost:8000/workflows/`
- **Scan Submission Errors:** Submit a scan directly against the API to isolate the problem:
  `curl -X POST http://localhost:8000/workflows/infrastructure_scan/submit -H "Content-Type: application/json" -d '{"target_path": "/your/path", "volume_mode": "ro"}'`
- **General Support:**
  - Check Docker Compose logs: `docker compose logs fuzzforge-backend`
  - Verify the MCP endpoint: `curl http://localhost:8010/mcp`
  - Test FastAPI endpoints directly before using MCP
For more, see the [How-To: MCP Integration](../how-to/mcp-integration.md).
---
## API Endpoints
When running as an A2A server (`python -m fuzzforge_ai --port 10100`):
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/.well-known/agent-card.json` | GET | Agent capabilities |
| `/` | POST | A2A message processing |
| `/artifacts/{artifact_id}` | GET | Artifact file serving |
| `/health` | GET | Health check |
### Example: Agent Card Format
```json
{
  "name": "FuzzForge",
  "description": "Multi-agent orchestrator with memory and security tools",
  "version": "1.0.0",
  "url": "http://localhost:10100",
  "protocolVersion": "0.3.0",
  "preferredTransport": "JSONRPC",
  "defaultInputModes": ["text/plain", "application/json"],
  "defaultOutputModes": ["text/plain", "application/json"],
  "capabilities": {
    "streaming": false,
    "pushNotifications": true,
    "multiTurn": true,
    "contextRetention": true
  },
  "skills": [
    {
      "id": "orchestration",
      "name": "Agent Orchestration",
      "description": "Route requests to appropriate agents",
      "tags": ["orchestration", "routing"]
    }
  ]
}
```
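A client can discover what a FuzzForge A2A server offers by fetching its agent card from the well-known path in the table above. The sketch below builds that URL and reduces a card to the fields a client typically checks first, using the field names from the example card. `agent_card_url`, `fetch_agent_card`, and `summarize_agent_card` are illustrative names; missing fields are treated as empty or `False`.

```python
import json
import urllib.request
from typing import Any, Dict, List


def agent_card_url(base_url: str) -> str:
    """Return the well-known agent card location for an A2A server."""
    return base_url.rstrip("/") + "/.well-known/agent-card.json"


def fetch_agent_card(base_url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Download and decode the agent card (requires a running server)."""
    with urllib.request.urlopen(agent_card_url(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_agent_card(card: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the fields a client usually needs before sending messages."""
    capabilities = card.get("capabilities", {})
    skill_ids: List[str] = [skill.get("id", "") for skill in card.get("skills", [])]
    return {
        "name": card.get("name", ""),
        "protocol_version": card.get("protocolVersion", ""),
        "streaming": bool(capabilities.get("streaming", False)),
        "multi_turn": bool(capabilities.get("multiTurn", False)),
        "skills": skill_ids,
    }
```

With the server from this section running, `summarize_agent_card(fetch_agent_card("http://localhost:10100"))` would report the card's name, protocol version, and skill IDs.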
### Example: A2A Message Format
```json
{
  "id": "msg_001",
  "method": "agent.invoke",
  "params": {
    "message": {
      "role": "user",
      "parts": [
        {
          "type": "text",
          "content": "Calculate factorial of 10"
        }
      ]
    },
    "context": {
      "sessionId": "session_abc123",
      "conversationId": "conv_001"
    }
  }
}
```
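When messages are generated programmatically, a small builder keeps them consistent with the format above. This sketch mirrors the example field for field; the sequential `msg_001`, `msg_002`, ... ID scheme is only an illustration, and `build_invoke_message` is a hypothetical helper. The resulting dict would be serialized with `json.dumps` and POSTed to `/` as listed in the endpoint table.

```python
import itertools
from typing import Any, Dict

# Illustrative ID source; any unique string should work as a message id
_message_ids = itertools.count(1)


def build_invoke_message(text: str, session_id: str, conversation_id: str) -> Dict[str, Any]:
    """Build a request shaped like the A2A message example above."""
    return {
        "id": f"msg_{next(_message_ids):03d}",
        "method": "agent.invoke",
        "params": {
            "message": {
                "role": "user",
                "parts": [{"type": "text", "content": text}],
            },
            "context": {
                "sessionId": session_id,
                "conversationId": conversation_id,
            },
        },
    }
```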
---
## Project Structure Reference
```
project_root/
├── .fuzzforge/          # Project-local config
│   ├── .env             # Environment variables
│   ├── config.json      # Project configuration
│   ├── agents.yaml      # Registered agents
│   ├── sessions.db      # Session storage
│   ├── artifacts/       # Local artifact cache
│   └── data/            # Knowledge graphs
└── your_project_files...
```
### Agent Registry Example (`agents.yaml`)
```yaml
registered_agents:
  - name: CalculatorAgent
    url: http://localhost:10201
    description: Mathematical calculations
  - name: SecurityAnalyzer
    url: http://localhost:10202
    description: Code security analysis
```
---
## Quick Troubleshooting
- **Agent Registration Fails:** Check agent is running and accessible at its URL.
- **Memory Not Persisting:** Ensure `SESSION_PERSISTENCE=sqlite` is set and `SESSION_DB_PATH` points to the intended database file.
- **Files Not Found:** Use paths relative to project root.
- **Model API Errors:** Verify API key and model name.
# Common Patterns Cookbook 👨‍🍳
A collection of proven patterns and recipes for FuzzForge modules and workflows. Copy, paste, and adapt these examples to build your own security tools quickly!
## Module Patterns
### File Processing Patterns
#### Pattern 1: Selective File Scanner
```python
import logging
from pathlib import Path
from typing import Any, Dict

# BaseModule and ModuleResult come from the FuzzForge module SDK
# (import them the same way your existing modules do).

logger = logging.getLogger(__name__)


class SelectiveScanner(BaseModule):
    """Scan only specific file types with size limits"""

    SUPPORTED_EXTENSIONS = {'.py', '.js', '.java', '.cpp', '.c', '.go', '.rs'}
    DEFAULT_MAX_SIZE = 5 * 1024 * 1024  # 5 MB

    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        max_size = config.get('max_file_size', self.DEFAULT_MAX_SIZE)
        # Normalise configured extensions so '.PY' and '.py' behave the same
        extensions = {ext.lower() for ext in config.get('extensions', self.SUPPORTED_EXTENSIONS)}
        findings = []
        processed_files = 0

        for file_path in workspace.rglob('*'):
            if (file_path.is_file() and
                    file_path.suffix.lower() in extensions and
                    file_path.stat().st_size <= max_size):
                try:
                    # _process_file is your per-file hook returning a list of findings
                    result = await self._process_file(file_path, workspace)
                    findings.extend(result)
                    processed_files += 1
                except Exception as e:
                    # Log the error but keep processing the remaining files
                    logger.warning(f"Failed to process {file_path}: {e}")

        return self.create_result(
            findings=findings,
            summary={'files_processed': processed_files}
        )
```
#### Pattern 2: Content-Based File Analysis
```python
from pathlib import Path
from typing import Any, Dict, List

# BaseModule, ModuleResult and ModuleFinding come from the FuzzForge module SDK.


class ContentAnalyzer(BaseModule):
    """Analyze file content with an encoding fallback"""

    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        findings = []
        for file_path in workspace.rglob('*'):
            if file_path.is_file():
                content = await self._safe_read_file(file_path)
                if content:
                    analysis_result = await self._analyze_content(content, file_path, workspace)
                    findings.extend(analysis_result)
        return self.create_result(findings=findings)

    async def _safe_read_file(self, file_path: Path) -> str:
        """Read a file as UTF-8, fall back to latin-1, return '' if unreadable"""
        try:
            return file_path.read_text(encoding='utf-8')
        except UnicodeDecodeError:
            # latin-1 maps every byte to a character, so this decode cannot fail
            try:
                return file_path.read_text(encoding='latin-1')
            except OSError:
                return ""
        except OSError:
            # Unreadable file: permission denied, broken symlink, removed mid-scan
            return ""

    async def _analyze_content(self, content: str, file_path: Path, workspace: Path) -> List[ModuleFinding]:
        """Override this method in your specific analyzer"""
        # Example: find TODO comments
        findings = []
        for i, line in enumerate(content.splitlines(), 1):
            if 'TODO' in line.upper():
                findings.append(self.create_finding(
                    title="TODO comment found",
                    description=f"TODO comment: {line.strip()}",
                    severity="info",
                    category="code_quality",
                    file_path=str(file_path.relative_to(workspace)),
                    line_start=i,
                    code_snippet=line.strip()
                ))
        return findings
```
#### Pattern 3: Directory Structure Analysis
```python
from pathlib import Path
from typing import Any, Dict

# BaseModule and ModuleResult come from the FuzzForge module SDK.


class StructureAnalyzer(BaseModule):
    """Analyze project directory structure"""

    IMPORTANT_FILES = {
        'README.md': 'documentation',
        'LICENSE': 'legal',
        '.gitignore': 'vcs',
        'requirements.txt': 'dependencies',
        'package.json': 'dependencies',
        'Dockerfile': 'deployment'
    }

    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        findings = []
        structure_analysis = {
            'total_directories': 0,
            'max_depth': 0,
            'important_files_found': [],
            'important_files_missing': []
        }

        # Analyze directory structure
        for item in workspace.rglob('*'):
            if item.is_dir():
                structure_analysis['total_directories'] += 1
                depth = len(item.relative_to(workspace).parts)
                structure_analysis['max_depth'] = max(structure_analysis['max_depth'], depth)

        # Check for important files at the project root
        for filename, category in self.IMPORTANT_FILES.items():
            file_path = workspace / filename
            if file_path.exists():
                structure_analysis['important_files_found'].append(filename)
            else:
                structure_analysis['important_files_missing'].append(filename)
                findings.append(self.create_finding(
                    title=f"Missing {category} file",
                    description=f"Recommended file '{filename}' not found",
                    severity="info",
                    category=category,
                    metadata={'file_type': category, 'recommended_file': filename}
                ))

        return self.create_result(
            findings=findings,
            summary=structure_analysis
        )
```
### Configuration Patterns
#### Pattern 1: Schema-Based Configuration
```python
from enum import Enum
from pathlib import Path
from typing import Any, Dict, List

# `validator` is the Pydantic v1 API; Pydantic v2 still accepts it but
# deprecates it in favour of `field_validator`.
from pydantic import BaseModel, Field, ValidationError, validator

# BaseModule and ModuleResult come from the FuzzForge module SDK.


class SeverityLevel(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


class ModuleConfig(BaseModel):
    """Type-safe configuration with validation"""
    severity_threshold: SeverityLevel = SeverityLevel.MEDIUM
    max_file_size_mb: int = Field(default=10, gt=0, le=100)
    include_patterns: List[str] = Field(default=['**/*.py', '**/*.js'])
    exclude_patterns: List[str] = Field(default=['**/node_modules/**', '**/.git/**'])
    timeout_seconds: int = Field(default=300, gt=0, le=3600)

    @validator('include_patterns')
    def validate_patterns(cls, v):
        if not v:
            raise ValueError('At least one include pattern required')
        return v


class ConfigurableModule(BaseModule):
    def validate_config(self, config: Dict[str, Any]) -> bool:
        try:
            ModuleConfig(**config)
            return True
        except ValidationError:
            return False

    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        # Parse and validate the configuration (raises ValidationError if invalid)
        validated_config = ModuleConfig(**config)
        # Use the type-safe configuration
        max_size = validated_config.max_file_size_mb * 1024 * 1024
        severity = validated_config.severity_threshold
        # ... rest of implementation
```
#### Pattern 2: Configuration Templates
```python
from pathlib import Path
from typing import Any, Dict

# BaseModule and ModuleResult come from the FuzzForge module SDK.


class TemplateBasedModule(BaseModule):
    """Module with configuration templates"""

    TEMPLATES = {
        'quick': {
            'max_file_size_mb': 5,
            'timeout_seconds': 60,
            'severity_threshold': 'medium'
        },
        'thorough': {
            'max_file_size_mb': 50,
            'timeout_seconds': 1800,
            'severity_threshold': 'low'
        },
        'critical_only': {
            'max_file_size_mb': 100,
            'timeout_seconds': 3600,
            'severity_threshold': 'critical'
        }
    }

    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        # Load template if specified
        template_name = config.get('template')
        if template_name and template_name in self.TEMPLATES:
            base_config = self.TEMPLATES[template_name].copy()
            base_config.update(config)  # Explicit config keys override the template
            config = base_config
        # Continue with normal execution (_execute_with_config is your own hook)
        return await self._execute_with_config(config, workspace)
```
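The merge rule is worth pinning down in isolation, because it decides which value wins when a template and an explicit setting disagree. The standalone sketch below uses two of the templates above; `resolve_config` is a hypothetical helper. Unlike the module above, which silently ignores an unknown template name, this version raises `ValueError`, so typos in `template` surface immediately.

```python
from typing import Any, Dict

TEMPLATES: Dict[str, Dict[str, Any]] = {
    "quick": {"max_file_size_mb": 5, "timeout_seconds": 60, "severity_threshold": "medium"},
    "thorough": {"max_file_size_mb": 50, "timeout_seconds": 1800, "severity_threshold": "low"},
}


def resolve_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a named template with explicit overrides (explicit keys win)."""
    template_name = config.get("template")
    if template_name is None:
        return dict(config)
    if template_name not in TEMPLATES:
        raise ValueError(f"Unknown template: {template_name!r}")
    # Build a new dict so the shared template is never mutated
    merged = {**TEMPLATES[template_name], **config}
    merged.pop("template")
    return merged
```

For instance, `resolve_config({"template": "quick", "timeout_seconds": 120})` keeps the quick template's size limit and severity but uses the explicit 120-second timeout.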
### Error Handling Recipes
#### Pattern 1: Graceful Degradation
```python
from pathlib import Path
from typing import Any, Dict

# BaseModule and ModuleResult come from the FuzzForge module SDK.


class ResilientModule(BaseModule):
    """Module that handles errors gracefully"""

    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        findings = []
        errors = []
        processed_files = 0

        for file_path in workspace.rglob('*'):
            if not file_path.is_file():
                continue
            relative = str(file_path.relative_to(workspace))
            try:
                result = await self._analyze_file(file_path, workspace, config)
                findings.extend(result)
                processed_files += 1
            except PermissionError:
                errors.append({'file': relative, 'error': 'Permission denied', 'type': 'permission_error'})
            except UnicodeDecodeError:
                errors.append({'file': relative, 'error': 'Encoding error', 'type': 'encoding_error'})
            except Exception as e:
                errors.append({'file': relative, 'error': str(e), 'type': 'analysis_error'})

        # Report "partial" when more than half of the files failed
        total_files = processed_files + len(errors)
        status = "partial" if len(errors) > total_files * 0.5 else "success"

        return self.create_result(
            findings=findings,
            status=status,
            summary={
                'files_processed': processed_files,
                'files_failed': len(errors),
                'error_rate': len(errors) / total_files if total_files > 0 else 0
            },
            metadata={'errors': errors}
        )
```
#### Pattern 2: Circuit Breaker
```python
import logging
import time
from pathlib import Path
from typing import Any, Dict

# BaseModule and ModuleResult come from the FuzzForge module SDK.

logger = logging.getLogger(__name__)


class CircuitBreakerModule(BaseModule):
    """Module with circuit breaker for expensive operations"""

    def __init__(self):
        super().__init__()
        self.failure_count = 0
        self.last_failure_time = 0.0
        self.circuit_open = False
        self.failure_threshold = 5
        self.recovery_timeout = 60  # seconds

    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        findings = []
        for file_path in workspace.rglob('*'):
            if not file_path.is_file():
                continue
            if self._is_circuit_open():
                # Circuit is open: skip the expensive operation for this file
                findings.append(self.create_finding(
                    title="Analysis skipped",
                    description="Circuit breaker is open due to previous failures",
                    severity="info",
                    category="system",
                    file_path=str(file_path.relative_to(workspace))
                ))
                continue
            try:
                result = await self._expensive_analysis(file_path, workspace)
                findings.extend(result)
                self._on_success()
            except Exception as e:
                self._on_failure()
                logger.warning(f"Analysis failed for {file_path}: {e}")
        return self.create_result(findings=findings)

    def _is_circuit_open(self) -> bool:
        if not self.circuit_open:
            return False
        # Close the circuit again once the recovery timeout has passed
        if time.monotonic() - self.last_failure_time > self.recovery_timeout:
            self.circuit_open = False
            self.failure_count = 0
            return False
        return True

    def _on_failure(self):
        self.failure_count += 1
        # monotonic() is unaffected by wall-clock adjustments
        self.last_failure_time = time.monotonic()
        if self.failure_count >= self.failure_threshold:
            self.circuit_open = True

    def _on_success(self):
        # Reset on every success so only consecutive failures trip the breaker
        self.failure_count = 0
        self.circuit_open = False
```
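The breaker's state logic is easier to reason about, and to test, when it is separated from the module. The standalone sketch below is illustrative; the `CircuitBreaker` class is not part of the FuzzForge SDK. It deliberately differs from the pattern above in two ways: the clock is injectable so tests need not sleep, and after the recovery timeout it enters a half-open state in which a single further failure reopens the circuit.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Consecutive-failure circuit breaker with an injectable clock."""

    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if a call may proceed."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.recovery_timeout:
            # Half-open: allow calls again, but the next failure reopens at once
            self._opened_at = None
            self._failures = self.failure_threshold - 1
            return True
        return False

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

Inside a module, call `allow()` before each expensive analysis and `record_success()` or `record_failure()` after it.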
### Performance Patterns
#### Pattern 1: Batch Processing
```python
import asyncio
import logging
from pathlib import Path
from typing import Any, AsyncGenerator, Dict, List

# BaseModule, ModuleResult and ModuleFinding come from the FuzzForge module SDK.

logger = logging.getLogger(__name__)


class BatchProcessor(BaseModule):
    """Process files in fixed-size batches to bound the work in flight"""

    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        batch_size = config.get('batch_size', 10)
        findings = []
        async for batch_findings in self._process_in_batches(workspace, batch_size, config):
            findings.extend(batch_findings)
        return self.create_result(findings=findings)

    async def _process_in_batches(
        self,
        workspace: Path,
        batch_size: int,
        config: Dict[str, Any]
    ) -> AsyncGenerator[List[ModuleFinding], None]:
        """Analyze up to batch_size files concurrently, yielding each batch's findings"""
        files = [f for f in workspace.rglob('*') if f.is_file()]
        for i in range(0, len(files), batch_size):
            batch = files[i:i + batch_size]
            results = await asyncio.gather(
                *(self._analyze_file(file_path, workspace, config) for file_path in batch),
                return_exceptions=True,
            )
            batch_findings = []
            for file_path, result in zip(batch, results):
                if isinstance(result, BaseException):
                    logger.warning(f"Failed to process {file_path}: {result}")
                else:
                    batch_findings.extend(result)
            yield batch_findings
```
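The pattern above builds the full file list before slicing it. If the workspace is large, the batching can work on a lazy iterator instead, for example `(f for f in workspace.rglob('*') if f.is_file())`. On Python 3.12+ `itertools.batched` does this (yielding tuples); the sketch below is a small equivalent for older versions, with `batched` as an illustrative name.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items without materializing the input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls at most `size` items; an empty list means the input is exhausted
    while (batch := list(islice(iterator, size))):
        yield batch
```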
#### Pattern 2: Concurrent Processing with Limits
```python
import asyncio
import logging
from pathlib import Path
from typing import Any, Dict, List

# BaseModule, ModuleResult and ModuleFinding come from the FuzzForge module SDK.

logger = logging.getLogger(__name__)


class ConcurrentProcessor(BaseModule):
    """Process files concurrently with semaphore limits"""

    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        max_concurrent = config.get('max_concurrent', 5)
        semaphore = asyncio.Semaphore(max_concurrent)
        files = [f for f in workspace.rglob('*') if f.is_file()]

        # Schedule every file; the semaphore caps how many run at once
        tasks = [
            self._process_file_with_semaphore(file_path, workspace, config, semaphore)
            for file_path in files
        ]
        results = await asyncio.gather(*tasks, return_exceptions=True)

        # Collect findings and log exceptions
        findings = []
        for file_path, result in zip(files, results):
            if isinstance(result, BaseException):
                logger.warning(f"Processing failed for {file_path}: {result}")
            else:
                findings.extend(result)
        return self.create_result(findings=findings)

    async def _process_file_with_semaphore(
        self,
        file_path: Path,
        workspace: Path,
        config: Dict[str, Any],
        semaphore: asyncio.Semaphore
    ) -> List[ModuleFinding]:
        """Process a single file with semaphore protection"""
        async with semaphore:
            return await self._analyze_file(file_path, workspace, config)
```
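To check that the semaphore really caps parallelism, the self-contained sketch below runs dummy jobs under the same `async with semaphore` guard and records the peak number in flight. The job body is a placeholder `asyncio.sleep` standing in for file analysis; nothing here is FuzzForge-specific, and `run_limited` is an illustrative name.

```python
import asyncio


async def run_limited(n_jobs: int, max_concurrent: int) -> int:
    """Run n_jobs dummy jobs under a semaphore and return the peak concurrency."""
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def job() -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for real file analysis
            in_flight -= 1

    await asyncio.gather(*(job() for _ in range(n_jobs)))
    return peak
```

With 20 jobs and a limit of 5, the peak never exceeds 5 even though all 20 coroutines are scheduled at once.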
## ⚡ Workflow Patterns
### Sequential Processing
```python
from pathlib import Path
from typing import Any, Dict

from prefect import flow

# file_scan_task, analysis_task and report_task are your own @task functions.


@flow(name="sequential_analysis")
async def sequential_workflow(target_path: str, **kwargs) -> Dict[str, Any]:
    """Execute analysis steps in sequence"""
    workspace = Path(target_path)

    # Step 1: File discovery
    scanner_config = kwargs.get('scanner_config', {})
    scan_results = await file_scan_task(workspace, scanner_config)

    # Step 2: Analysis (depends on scan results)
    analyzer_config = {
        **kwargs.get('analyzer_config', {}),
        'discovered_files': scan_results.get('summary', {}).get('total_files', 0)
    }
    analysis_results = await analysis_task(scan_results, workspace, analyzer_config)

    # Step 3: Report generation (depends on analysis)
    reporter_config = kwargs.get('reporter_config', {})
    final_report = await report_task(analysis_results, workspace, reporter_config)
    return final_report
```
### Parallel Execution
```python
from pathlib import Path
from typing import Any, Dict

from prefect import flow

# static_analysis_task, secret_detection_task, license_check_task and
# combine_results_task are your own @task functions.


@flow(name="parallel_analysis")
async def parallel_workflow(target_path: str, **kwargs) -> Dict[str, Any]:
    """Execute independent analyses in parallel"""
    workspace = Path(target_path)

    # Submit parallel tasks
    static_future = static_analysis_task.submit(workspace, kwargs.get('static_config', {}))
    secret_future = secret_detection_task.submit(workspace, kwargs.get('secret_config', {}))
    license_future = license_check_task.submit(workspace, kwargs.get('license_config', {}))

    # Wait for all to complete
    static_results = await static_future.result()
    secret_results = await secret_future.result()
    license_results = await license_future.result()

    # Combine results
    combined_report = await combine_results_task(
        [static_results, secret_results, license_results],
        workspace,
        kwargs.get('reporter_config', {})
    )
    return combined_report
```
### Conditional Logic
```python
from pathlib import Path
from typing import Any, Dict

from prefect import flow

# quick_assessment_task and the *_analysis_task / merge_results_task
# functions are your own @task functions.


@flow(name="conditional_analysis")
async def conditional_workflow(target_path: str, **kwargs) -> Dict[str, Any]:
    """Execute workflow with conditional branches"""
    workspace = Path(target_path)

    # Initial assessment
    assessment = await quick_assessment_task(workspace)

    # Branch based on project type
    if assessment.get('project_type') == 'web_application':
        # Web app specific analysis
        final_results = await web_security_task(workspace, kwargs.get('web_config', {}))
    elif assessment.get('project_type') == 'library':
        # Library specific analysis
        final_results = await library_analysis_task(workspace, kwargs.get('lib_config', {}))
    else:
        # Generic analysis
        final_results = await generic_analysis_task(workspace, kwargs.get('generic_config', {}))

    # Optional deep analysis for high-risk projects
    if assessment.get('risk_level', 'low') in ['high', 'critical']:
        deep_results = await deep_analysis_task(workspace, kwargs.get('deep_config', {}))
        final_results = await merge_results_task(final_results, deep_results)

    return final_results
```
### Data Transformation
```python
from typing import Any, Dict, List

from prefect import task

# calculate_risk_score is defined under Utility Functions below;
# determine_priority and estimate_effort are your own helpers.


@task(name="filter_and_transform")
async def filter_transform_task(
    raw_results: Dict[str, Any],
    config: Dict[str, Any]
) -> Dict[str, Any]:
    """Filter and transform findings based on criteria"""
    findings = raw_results.get('findings', [])

    # Filter by severity
    min_severity = config.get('min_severity', 'low')
    severity_order = {'info': 0, 'low': 1, 'medium': 2, 'high': 3, 'critical': 4}
    min_level = severity_order.get(min_severity, 0)
    filtered_findings = [
        f for f in findings
        if severity_order.get(f.get('severity', 'info'), 0) >= min_level
    ]

    # Group by category
    categorized: Dict[str, List[Dict[str, Any]]] = {}
    for finding in filtered_findings:
        categorized.setdefault(finding.get('category', 'other'), []).append(finding)

    # Enrich findings with risk scores, priorities, etc.
    enriched_findings = [
        {
            **finding,
            'risk_score': calculate_risk_score(finding),
            'priority': determine_priority(finding),
            'remediation_effort': estimate_effort(finding)
        }
        for finding in filtered_findings
    ]

    return {
        'findings': enriched_findings,
        'summary': {
            'total_findings': len(enriched_findings),
            'by_category': {k: len(v) for k, v in categorized.items()},
            'by_severity': {
                severity: len([f for f in enriched_findings if f.get('severity') == severity])
                for severity in ['info', 'low', 'medium', 'high', 'critical']
            }
        }
    }
```
## 🧪 Testing Patterns
### Pattern 1: Comprehensive Module Testing
```python
import tempfile
from pathlib import Path
from unittest.mock import AsyncMock, patch

import pytest

# @pytest.mark.asyncio requires the pytest-asyncio plugin.
# MyModule is the module under test.


class TestMyModule:
    @pytest.fixture
    def temp_workspace(self):
        with tempfile.TemporaryDirectory() as temp_dir:
            workspace = Path(temp_dir)
            # Create test files
            (workspace / 'test.py').write_text('print("hello")')
            (workspace / 'config.json').write_text('{"key": "value"}')
            yield workspace

    @pytest.fixture
    def module(self):
        return MyModule()

    @pytest.fixture
    def base_config(self):
        return {
            'max_file_size_mb': 10,
            'severity_threshold': 'medium',
            'timeout_seconds': 60
        }

    @pytest.mark.asyncio
    async def test_execute_success(self, module, temp_workspace, base_config):
        result = await module.execute(base_config, temp_workspace)
        assert result.status == "success"
        assert isinstance(result.findings, list)
        assert isinstance(result.summary, dict)
        assert 'total_files' in result.summary

    @pytest.mark.asyncio
    async def test_execute_empty_workspace(self, module, base_config):
        with tempfile.TemporaryDirectory() as empty_dir:
            result = await module.execute(base_config, Path(empty_dir))
            assert result.summary['total_files'] == 0
            assert len(result.findings) == 0

    def test_config_validation(self, module):
        # validate_config is synchronous, so no asyncio marker is needed
        assert module.validate_config({'max_file_size_mb': 10})
        assert not module.validate_config({'max_file_size_mb': -1})
        assert not module.validate_config({'max_file_size_mb': 'invalid'})

    @pytest.mark.asyncio
    async def test_error_handling(self, module, temp_workspace, base_config):
        # Use the fixture workspace so the test does not depend on /tmp contents
        with patch.object(module, '_analyze_file', new_callable=AsyncMock,
                          side_effect=Exception("Test error")):
            result = await module.execute(base_config, temp_workspace)
        # Errors should be recorded rather than raised
        assert 'errors' in result.metadata
        assert len(result.metadata['errors']) > 0

    @pytest.mark.asyncio
    @pytest.mark.parametrize("severity,expected", [
        ('low', ['low', 'medium', 'high', 'critical']),
        ('medium', ['medium', 'high', 'critical']),
        ('high', ['high', 'critical']),
        ('critical', ['critical'])
    ])
    async def test_severity_filtering(self, module, temp_workspace, severity, expected):
        config = {'severity_threshold': severity}
        result = await module.execute(config, temp_workspace)
        found_severities = {f.severity for f in result.findings}
        assert found_severities.issubset(set(expected))
```
## 🔧 Utility Functions
### File Type Detection
```python
from pathlib import Path


def detect_file_type(file_path: Path) -> str:
    """Detect file type from the extension, falling back to the first line"""
    # Extension-based detection
    extension_map = {
        '.py': 'python',
        '.js': 'javascript',
        '.ts': 'typescript',
        '.java': 'java',
        '.cpp': 'cpp',
        '.c': 'c',
        '.go': 'go',
        '.rs': 'rust',
        '.json': 'json',
        '.yaml': 'yaml',
        '.yml': 'yaml',
        '.xml': 'xml',
        '.html': 'html',
        '.css': 'css',
        '.md': 'markdown',
        '.txt': 'text'
    }
    file_type = extension_map.get(file_path.suffix.lower())
    if file_type:
        return file_type

    # Content-based detection for files without a known extension
    try:
        with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
            first_line = f.readline().strip()
    except OSError:
        return 'unknown'

    if first_line.startswith('#!'):
        if 'python' in first_line:
            return 'python'
        elif 'sh' in first_line:  # matches sh, bash, zsh, ...
            return 'shell'
        elif 'node' in first_line:
            return 'javascript'
    lowered = first_line.lower()
    if lowered.startswith('<?xml'):
        return 'xml'
    if lowered.startswith('<!doctype html') or lowered.startswith('<html'):
        return 'html'
    return 'unknown'
```
### Risk Scoring
```python
from typing import Any, Dict


def calculate_risk_score(finding: Dict[str, Any]) -> int:
    """Calculate numeric risk score for a finding"""
    base_scores = {
        'critical': 100,
        'high': 75,
        'medium': 50,
        'low': 25,
        'info': 10
    }
    severity = finding.get('severity', 'info')
    base_score = base_scores.get(severity, 10)

    # Adjust based on category
    category_multipliers = {
        'security': 1.0,
        'vulnerability': 1.0,
        'credential': 1.2,
        'injection': 1.1,
        'authentication': 1.1,
        'authorization': 1.1,
        'code_quality': 0.8,
        'performance': 0.7,
        'documentation': 0.5
    }
    category = finding.get('category', 'other')
    multiplier = category_multipliers.get(category, 0.9)

    # Boost findings in files whose path suggests sensitive content
    # (`or ''` also covers an explicit None file_path)
    file_path = (finding.get('file_path') or '').lower()
    if any(sensitive in file_path for sensitive in ['config', 'secret', 'password', 'key']):
        multiplier *= 1.2

    return int(base_score * multiplier)
```
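Written as a formula, the scoring function computes the following, where the base score comes from `base_scores` (10 for unknown severities) and the category multiplier from `category_multipliers` (0.9 for unlisted categories). Since `int()` truncates and the scores are positive, this is a floor. Note that the scale is not capped at 100: a critical credential finding under a `secrets/` path scores 100 × 1.2 × 1.2 = 144.

```latex
\mathrm{risk\_score} = \left\lfloor b_{\mathrm{severity}} \cdot m_{\mathrm{category}} \cdot p_{\mathrm{path}} \right\rfloor,
\qquad
p_{\mathrm{path}} =
\begin{cases}
1.2 & \text{if the lower-cased path contains config, secret, password, or key} \\
1   & \text{otherwise}
\end{cases}

% Example: a high-severity credential finding in config/settings.py
75 \cdot 1.2 \cdot 1.2 = 108
```

Because the products are computed in floating point, a result that should be a whole number can land a hair below it and lose a point to truncation; `round()` instead of `int()` avoids that if exact integer scores matter.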
### Finding Deduplication
```python
from typing import Any, Dict, List, Tuple


def deduplicate_findings(findings: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Remove duplicate findings that share title, file, line, and category"""
    # Map each key to the first finding seen with it
    kept: Dict[Tuple[Any, ...], Dict[str, Any]] = {}
    deduplicated = []

    for finding in findings:
        key = (
            finding.get('title', ''),
            finding.get('file_path', ''),
            finding.get('line_start', 0),
            finding.get('category', '')
        )
        existing = kept.get(key)
        if existing is None:
            kept[key] = finding
            deduplicated.append(finding)
        else:
            # Count the duplicate on the exact finding it duplicates
            metadata = existing.setdefault('metadata', {})
            metadata['duplicate_count'] = metadata.get('duplicate_count', 1) + 1

    return deduplicated
```
---
**🎯 Next Steps**: Use these patterns as building blocks for your own modules and workflows. Mix and match patterns to create powerful security analysis tools!
# Contributing
Contributions are much appreciated.
## How to contribute
### Development environment setup
We recommend using the excellent [basedpyright](https://docs.basedpyright.com/latest/) language server, a fork of [Pyright](https://github.com/microsoft/pyright) with type-checking improvements, Pylance features, and more. It is available in all major editors (VS Code, Vim, Emacs, Zed).
To work on the project, you will need to install `uv`. Check the [installation instructions](https://docs.astral.sh/uv/getting-started/installation/) for your platform.
We also recommend using Just to manage your development environment. Just is a command runner similar to Make, with a simpler syntax and more features, and it is available on all major platforms. Check the [installation instructions](https://just.systems/man/en/) for your platform. We wrapped a number of useful commands in the `Justfile` at the root of the repository; run `just` to see the available commands.
### Code conventions
We try to follow the [PEP 8 style guide](https://www.python.org/dev/peps/pep-0008/) and the [Google Python Style Guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md) for Python code. We use [Ruff](https://docs.astral.sh/ruff/) as both linter and formatter to keep the code consistent with these guides.
### Git usage
We use the [Conventional Commits 1.0.0](https://www.conventionalcommits.org/en/v1.0.0/) specification to format our commits.
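A commit header can be checked locally before pushing, for example from a Git `commit-msg` hook. The sketch below validates only the first line against the specification's `<type>[optional scope][!]: <description>` shape. The specification itself only requires `feat` and `fix`, so the allow-list of types is an assumption drawn from the branch prefixes listed below (minus `hotfix`); `parse_commit_header` is an illustrative name, not an existing project script.

```python
import re
from typing import Dict, Optional

# Assumption: allowed types mirror the branch prefixes (hotfix excluded)
ALLOWED_TYPES = {
    "feat", "fix", "docs", "ci", "chore", "refactor",
    "perf", "test", "build", "revert", "style",
}

HEADER_RE = re.compile(
    r"(?P<type>[a-z]+)"            # commit type, e.g. feat
    r"(?:\((?P<scope>[^()]+)\))?"  # optional scope in parentheses
    r"(?P<breaking>!)?"            # optional breaking-change marker
    r": (?P<description>\S.*)"     # colon, space, non-empty description
)


def parse_commit_header(header: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the header's parts, or None if it does not follow the convention."""
    match = HEADER_RE.fullmatch(header)
    if match is None or match.group("type") not in ALLOWED_TYPES:
        return None
    return match.groupdict()
```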
Our workflow uses the following branch names:
- main: `main` - for production code
- hotfix: `hotfix/<hotfix name>` - for urgent fixes to the production code
- develop: `dev` - for development
- feature: `feat/<feature name>` - for new features
- continuous integration: `ci/<ci name>` - for continuous integration related changes
- documentation: `docs/<documentation name>` - for documentation changes
- fix: `fix/<bug name>` - for bug fixes
- chore: `chore/<chore name>` - for changes that do not modify src or test files
- refactor: `refactor/<refactor name>` - for code refactoring
- perf: `perf/<performance name>` - for performance improvements
- test: `test/<test name>` - for adding or modifying tests
- build: `build/<build name>` - for build-related changes
- revert: `revert/<revert name>` - for reverting changes
- style: `style/<style name>` - for style-related changes
![Git workflow](./img/git-workflow.png)
In addition to the branch naming rules, only `dev` and `hotfix` branches may be merged into `main`. All other branches must be merged into `dev` first.
The `dev` branch is the main development branch, and all new features and bug fixes should be merged into it. The `main` branch is the production branch, and only stable code should be merged into it.
`hotfix` branches are used for urgent fixes to the production code and should be merged into both `main` and `dev`.
This workflow is derived from the [Git Flow](https://nvie.com/posts/a-successful-git-branching-model/) workflow, with some modifications to fit our needs.
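The naming and merge rules above can be expressed as a small check, for example in a local script or a CI step. This is a sketch under assumptions: the function names are hypothetical, branch names are limited to a single `<prefix>/<name>` segment, and `main` is treated only as a merge target.

```python
import re

# Prefixes for short-lived branches, taken from the branch list above.
TOPIC_PREFIXES = (
    "hotfix", "feat", "ci", "docs", "fix", "chore", "refactor",
    "perf", "test", "build", "revert", "style",
)
# "<prefix>/<name>", where the name uses letters, digits, "_", "." or "-".
BRANCH_RE = re.compile(r"(?:" + "|".join(TOPIC_PREFIXES) + r")/[\w.-]+")


def is_valid_branch_name(name: str) -> bool:
    """`main` and `dev` are long-lived; every other branch needs a known prefix."""
    return name in ("main", "dev") or bool(BRANCH_RE.fullmatch(name))


def allowed_merge_targets(source: str) -> set[str]:
    """Branches that `source` may be merged into under this workflow."""
    if source == "dev":
        return {"main"}
    if source.startswith("hotfix/") and is_valid_branch_name(source):
        # Hotfixes reach production directly and are merged back into dev.
        return {"main", "dev"}
    if source == "main" or not is_valid_branch_name(source):
        # This sketch treats main only as a target; unknown names get nothing.
        return set()
    return {"dev"}


print(allowed_merge_targets("feat/sendfile-command"))  # {'dev'}
print(allowed_merge_targets("hotfix/crash-on-start") == {"main", "dev"})  # True
```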
!!! note
Following these conventions allows the CI to label pull requests and commits automatically. The labels can also be used to generate the changelog and release notes, but their main purpose is to facilitate the review process.
### Testing
We use [pytest](https://docs.pytest.org/en/latest/) for unit and integration testing, and [PyTestArch](https://pypi.org/project/PyTestArch/) for architectural rules. The tests are located in the `tests` directory.
A test is required for every new feature and bug fix. The tests should be located in the `tests` directory of the corresponding module.
The tests should be run before merging any changes into the `dev` or `main` branches.
### Continuous integration
We use [GitHub Actions](https://docs.github.com/en/actions) for continuous integration. The CI workflow is located in the `.github/workflows` directory.
The CI workflow is triggered on every push to the `dev` and `main` branches, and on every pull request to the `dev` and `main` branches.
The CI workflow runs the tests and linter, and builds the documentation. The CI workflow is required to pass before merging any changes into the `dev` or `main` branches.
### Bug report
To-do
@@ -0,0 +1,64 @@
# {Title of solution to solve the problem}
## Context and problem statement
{Describe the context and problem in free form, using two to three sentences or in the form of an illustrative story.
You may want to articulate the problem in form of a question and add links to collaboration boards or issue management systems.}
<!-- This is an optional element. Feel free to remove. -->
## Decision Drivers
* {decision driver, e.g., a force, facing concern, ...}
* ...
## Considered Options
* [{title of option}](#{title of option})
* ...
## Decision Outcome
Chosen option: "{title of chosen option}", because
{justification. e.g., only option, which meets k.o. criterion decision driver | which resolves force {force} | … | comes out best (see below)}.
<!-- This is an optional element. Feel free to remove. -->
## Decision Revisit
Last revisit: {information about the last revisit e.g. never | {date} by {author}}
<!-- This is an optional element. Feel free to remove. -->
### Consequences
* Good, because {positive consequence, e.g., improvement of one or more desired qualities, …}
* Bad, because {negative consequence, e.g., compromising one or more desired qualities, …}
* … <!-- numbers of consequences can vary -->
<!-- This is an optional element. Feel free to remove. -->
## Validation
{describe how the implementation of/compliance with the ADR is validated. E.g., by a review or an ArchUnit test}
<!-- This is an optional element. Feel free to remove. -->
## Pros and Cons of the Options
<!-- This is a repeated element per option. Use when necessary. -->
### {title of option}
<!-- This is an optional element. Feel free to remove. -->
{example | description | pointer to more information | …}
* Good, because {argument a}
* Good, because {argument b}
<!-- use "neutral" if the given argument weighs neither for good nor bad -->
* Neutral, because {argument c}
* Bad, because {argument d}
* ... <!-- numbers of pros and cons can vary -->
<!-- This is an optional element. Feel free to remove. -->
## More Information
{Provide additional evidence/confidence for the decision outcome here and/or
document the team agreement on the decision and/or
define when and how the decision should be realized and if/when it should be revisited and/or
how the decision is validated.
Links to other decisions and resources might appear here as well.}
@@ -0,0 +1,17 @@
# Diataxis documentation
This project uses the [Diátaxis](https://diataxis.fr) technical documentation framework.
There are 4 main parts:
1. [Getting started (tutorials)](https://diataxis.fr/tutorials): learning-oriented
   - Pages that contain tutorials needed to get people up and running, for instance [Getting Started](/intro.md).
2. [Concepts (explanation)](https://diataxis.fr/explanation): understanding-oriented
   - Pages explaining concepts that are relevant to the domain, for instance [Working with documentation](../concept/working-with-documentation.md).
3. [How-to guides](https://diataxis.fr/how-to-guides): goal-oriented
   - Pages that contain practical steps for accomplishing a specific task, for instance [How-to: start the local documentation server](#).
4. [Reference](https://diataxis.fr/reference): information-oriented
   - Pages that contain reference information, for instance about [Diataxis documentation](../reference/diataxis-documentation.md).
## Working with Diátaxis documentation
See [working with Diátaxis documentation](../concept/working-with-documentation.md).
@@ -0,0 +1,17 @@
# {Title}
**Status**: active
## Description
{Provide a detailed description of the issue, include things such as how to identify this particular issue}
<!-- This is an optional element. Feel free to remove. -->
## Impact
{Describe the impact of the issue on users or the system.}
<!-- This is an optional element. Feel free to remove. -->
## Workaround
{If available, describe any possible workarounds to mitigate the issue until it is resolved.}
@@ -0,0 +1,257 @@
# Static Analysis Workflow Reference
The Static Analysis workflow in FuzzForge helps you find vulnerabilities, code quality issues, and compliance problems—before they reach production. This workflow uses multiple Static Application Security Testing (SAST) tools to analyze your source code without executing it, providing fast, actionable feedback in a standardized format.
---
## What Does This Workflow Do?
- **Workflow ID:** `static_analysis_scan`
- **Primary Tools:** Semgrep (multi-language), Bandit (Python)
- **Supported Languages:** Python, JavaScript, Java, Go, C/C++, PHP, Ruby, and more
- **Typical Duration:** 15 minutes (varies by codebase size)
- **Output Format:** SARIF 2.1.0 (industry standard)
---
## How Does It Work?
The workflow orchestrates multiple SAST tools in a containerized environment:
- **Semgrep:** Pattern-based static analysis for 30+ languages, with rule sets for OWASP Top 10, CWE Top 25, and more.
- **Bandit:** Python-specific security scanner, focused on issues like hardcoded secrets, injection, and unsafe code patterns.
Each tool runs independently, and their findings are merged and normalized into a single SARIF report.
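FuzzForge's own merge and normalization code is not shown on this page. As a rough sketch of the idea, SARIF 2.1.0 lets one log hold several `runs`, one per tool, so per-tool logs can be combined by concatenating their runs. The helper name, the lower-cased tool tag, and the sample logs below are illustrative assumptions.

```python
import copy

SARIF_SCHEMA = "https://json.schemastore.org/sarif-2.1.0.json"


def merge_sarif_logs(*logs: dict) -> dict:
    """Combine per-tool SARIF 2.1.0 logs into one log, keeping one run per tool."""
    merged = {"version": "2.1.0", "$schema": SARIF_SCHEMA, "runs": []}
    for log in logs:
        for original in log.get("runs", []):
            run = copy.deepcopy(original)  # leave the caller's logs untouched
            tool_name = run["tool"]["driver"]["name"].lower()
            for result in run.get("results", []):
                # Tag every finding with its tool, like `properties.tool`
                # in the finding example later on this page.
                result.setdefault("properties", {})["tool"] = tool_name
            merged["runs"].append(run)
    return merged


# Illustrative input, not real tool output.
semgrep_log = {"runs": [{"tool": {"driver": {"name": "Semgrep"}},
                         "results": [{"ruleId": "example-rule"}]}]}
bandit_log = {"runs": [{"tool": {"driver": {"name": "Bandit"}},
                        "results": [{"ruleId": "bandit.B608"}]}]}
report = merge_sarif_logs(semgrep_log, bandit_log)
print([run["results"][0]["properties"]["tool"] for run in report["runs"]])
```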
---
## How to Use the Static Analysis Workflow
### Basic Usage
**CLI:**
```bash
fuzzforge runs submit static_analysis_scan /path/to/your/project
```
**API:**
```bash
curl -X POST "http://localhost:8000/workflows/static_analysis_scan/submit" \
  -H "Content-Type: application/json" \
  -d '{"target_path": "/path/to/your/project"}'
```
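For scripts that cannot shell out to `curl`, the same submission can be built with Python's standard library. A minimal sketch: `build_submit_request` is a hypothetical helper, the endpoint and body follow the curl examples on this page, and the response format is not documented here, so the sending step only prints it.

```python
import json
import urllib.request


def build_submit_request(base_url, workflow, target_path, parameters=None):
    """Build (but do not send) a POST to /workflows/<workflow>/submit."""
    body = {"target_path": target_path}
    if parameters:
        body["parameters"] = parameters
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/workflows/{workflow}/submit",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_submit_request(
    "http://localhost:8000", "static_analysis_scan", "/path/to/your/project"
)
# Sending requires a running FuzzForge backend:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```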
### Advanced Configuration
You can fine-tune the workflow by passing parameters for each tool:
**CLI:**
```bash
fuzzforge runs submit static_analysis_scan /path/to/project \
  --parameters '{
    "semgrep_config": {
      "rules": ["p/security-audit", "p/owasp-top-ten"],
      "severity": ["ERROR", "WARNING"],
      "exclude_patterns": ["test/*", "vendor/*", "node_modules/*"]
    },
    "bandit_config": {
      "confidence": "MEDIUM",
      "severity": "MEDIUM",
      "exclude_dirs": ["tests", "migrations"]
    }
  }'
```
**API:**
```json
{
  "target_path": "/path/to/project",
  "parameters": {
    "semgrep_config": {
      "rules": ["p/security-audit"],
      "languages": ["python", "javascript"],
      "severity": ["ERROR", "WARNING"],
      "exclude_patterns": ["*.test.js", "test_*.py", "vendor/*"]
    },
    "bandit_config": {
      "confidence": "MEDIUM",
      "severity": "LOW",
      "tests": ["B201", "B301"],
      "exclude_dirs": ["tests", ".git"]
    }
  }
}
```
---
## Configuration Reference
### Semgrep Parameters
| Parameter | Type | Default | Description |
|-------------------|-----------|-------------------------------|---------------------------------------------|
| `rules` | array | `"auto"` | Rule sets to use (e.g., `"p/security-audit"`)|
| `languages` | array | `null` | Languages to analyze |
| `severity` | array | `["ERROR", "WARNING", "INFO"]`| Severities to include |
| `exclude_patterns`| array | `[]` | File patterns to exclude |
| `include_patterns`| array | `[]` | File patterns to include |
| `max_target_bytes`| integer | `1000000` | Max file size to analyze (bytes) |
| `timeout` | integer | `300` | Tool timeout (seconds) |
### Bandit Parameters
| Parameter | Type | Default | Description |
|-------------------|-----------|-------------------------------|---------------------------------------------|
| `confidence` | string | `"LOW"` | Minimum confidence (`"LOW"`, `"MEDIUM"`, `"HIGH"`) |
| `severity` | string | `"LOW"` | Minimum severity (`"LOW"`, `"MEDIUM"`, `"HIGH"`) |
| `tests` | array | `null` | Specific test IDs to run |
| `exclude_dirs` | array | `["tests", ".git"]` | Directories to exclude |
| `aggregate` | string | `"file"` | Aggregation mode (`"file"`, `"vuln"`) |
| `context_lines` | integer | `3` | Context lines around findings |
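Before submitting, a client could apply the Bandit defaults from the table above and reject invalid values early. This is a client-side sketch with hypothetical names (`resolve_bandit_config`, `meets_minimum`); it assumes that "minimum" means findings at the configured level and above are kept. FuzzForge's own validation is not described here and may differ.

```python
import copy

LEVELS = ("LOW", "MEDIUM", "HIGH")

# Defaults copied from the Bandit Parameters table.
BANDIT_DEFAULTS = {
    "confidence": "LOW",
    "severity": "LOW",
    "tests": None,
    "exclude_dirs": ["tests", ".git"],
    "aggregate": "file",
    "context_lines": 3,
}


def resolve_bandit_config(overrides: dict) -> dict:
    """Merge user overrides onto the documented defaults and check enum values."""
    unknown = set(overrides) - set(BANDIT_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown bandit_config keys: {sorted(unknown)}")
    # deepcopy so callers never mutate the shared default list.
    config = copy.deepcopy(BANDIT_DEFAULTS)
    config.update(overrides)
    for key in ("confidence", "severity"):
        if config[key] not in LEVELS:
            raise ValueError(f"{key} must be one of {LEVELS}, got {config[key]!r}")
    if config["aggregate"] not in ("file", "vuln"):
        raise ValueError(f"aggregate must be 'file' or 'vuln', got {config['aggregate']!r}")
    return config


def meets_minimum(level: str, minimum: str) -> bool:
    """True if a finding's level is at or above the configured minimum."""
    return LEVELS.index(level) >= LEVELS.index(minimum)


config = resolve_bandit_config({"confidence": "MEDIUM", "exclude_dirs": ["tests"]})
print(config["severity"])                           # LOW
print(meets_minimum("HIGH", config["confidence"]))  # True
print(meets_minimum("LOW", config["confidence"]))   # False
```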
---
## What Can It Detect?
### Vulnerability Categories
- **OWASP Top 10:** Broken Access Control, Injection, Security Misconfiguration, etc.
- **CWE Top 25:** SQL Injection, XSS, Command Injection, Information Exposure, etc.
- **Language-Specific:** Python (unsafe eval, Django/Flask issues), JavaScript (XSS, prototype pollution), Java (deserialization), Go (race conditions), C/C++ (buffer overflows).
### Example Detections
**SQL Injection (Python)**
```python
query = f"SELECT * FROM users WHERE id = {user_id}" # CWE-89
```
*Recommendation: Use parameterized queries.*
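A minimal version of that fix, sketched with Python's built-in `sqlite3` module and a throwaway in-memory `users` table (both chosen for illustration). The placeholder syntax depends on the database driver; many PostgreSQL drivers use `%s` instead of `?`.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, user_id):
    # "?" is a placeholder: sqlite3 binds user_id as a value, so it can
    # never change the structure of the SQL statement.
    cursor = conn.execute("SELECT * FROM users WHERE id = ?", (user_id,))
    return cursor.fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "alice"))
print(find_user(conn, 1))           # (1, 'alice')
print(find_user(conn, "1 OR 1=1"))  # None: the payload is just a non-matching value
```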
**Command Injection (Python)**
```python
os.system(f"cp {filename} backup/") # CWE-78
```
*Recommendation: Use subprocess with argument arrays.*
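A sketch of that recommendation: `backup_file` is a hypothetical name and calling `cp` assumes a POSIX system. For a plain copy, `shutil.copy(filename, "backup/")` avoids spawning a process at all.

```python
import subprocess


def backup_file(filename: str, backup_dir: str = "backup/") -> None:
    # An argument list (with the default shell=False) executes `cp` directly:
    # the filename reaches cp as one argument, so ";", "|" or "$(...)" inside
    # it are never interpreted by a shell. "--" ends option parsing, so a
    # name starting with "-" is not mistaken for a cp flag.
    subprocess.run(["cp", "--", filename, backup_dir], check=True)
```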
**XSS (JavaScript)**
```javascript
element.innerHTML = userInput; // CWE-79
```
*Recommendation: Use textContent or sanitize input.*
---
## Output Format
All results are returned in SARIF 2.1.0 format, which is supported by many IDEs and security tools.
**Summary Example:**
```json
{
  "workflow": "static_analysis_scan",
  "status": "completed",
  "total_findings": 18,
  "severity_counts": {
    "critical": 0,
    "high": 6,
    "medium": 5,
    "low": 7
  },
  "tool_counts": {
    "semgrep": 12,
    "bandit": 6
  }
}
```
**Finding Example:**
```json
{
  "ruleId": "bandit.B608",
  "level": "error",
  "message": {
    "text": "Possible SQL injection vector through string-based query construction"
  },
  "locations": [
    {
      "physicalLocation": {
        "artifactLocation": {
          "uri": "src/database.py"
        },
        "region": {
          "startLine": 42,
          "startColumn": 15,
          "endLine": 42,
          "endColumn": 65
        }
      }
    }
  ],
  "properties": {
    "severity": "high",
    "category": "sql_injection",
    "cwe": "CWE-89",
    "confidence": "high",
    "tool": "bandit"
  }
}
```
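To turn SARIF results like this one into a readable list and a summary shaped like the summary example, a script could do something like the following. This is a sketch: `format_finding` and `summarize_findings` are hypothetical, they rely on the `properties.severity` and `properties.tool` fields shown in the examples (not core SARIF fields), and they do not claim to match how FuzzForge computes its own summary.

```python
from collections import Counter

SEVERITIES = ("critical", "high", "medium", "low")


def format_finding(result: dict) -> str:
    """Render one SARIF result as `path:line:column [rule] message`."""
    location = result["locations"][0]["physicalLocation"]
    region = location.get("region", {})
    return (
        f"{location['artifactLocation']['uri']}:"
        f"{region.get('startLine', '?')}:{region.get('startColumn', '?')} "
        f"[{result['ruleId']}] {result['message']['text']}"
    )


def summarize_findings(results: list[dict]) -> dict:
    """Count results by `properties.severity` and `properties.tool`."""
    props = [r.get("properties", {}) for r in results]
    severities = Counter(p.get("severity") for p in props)
    tools = Counter(p.get("tool", "unknown") for p in props)
    return {
        "total_findings": len(results),
        # Levels outside SEVERITIES still count towards total_findings.
        "severity_counts": {level: severities[level] for level in SEVERITIES},
        "tool_counts": dict(tools),
    }


finding = {
    "ruleId": "bandit.B608",
    "message": {"text": "Possible SQL injection vector through string-based query construction"},
    "locations": [{"physicalLocation": {
        "artifactLocation": {"uri": "src/database.py"},
        "region": {"startLine": 42, "startColumn": 15},
    }}],
    "properties": {"severity": "high", "tool": "bandit"},
}
print(format_finding(finding))
# src/database.py:42:15 [bandit.B608] Possible SQL injection vector through string-based query construction
print(summarize_findings([finding])["severity_counts"])
# {'critical': 0, 'high': 1, 'medium': 0, 'low': 0}
```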
---
## Performance Tips
- For large codebases, increase `max_target_bytes` and `timeout` as needed.
- Exclude large generated or dependency directories (`vendor/`, `node_modules/`, `dist/`).
- Run focused scans on changed files for faster CI/CD feedback.
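A focused CI scan could be prepared along these lines. The sketch assumes that `semgrep_config.include_patterns` accepts plain file paths (the configuration table only says "file patterns"); `changed_files`, `focused_scan_parameters`, and the `origin/main` base are hypothetical choices. Bandit has no include option in the reference table, so it would still scan its usual scope.

```python
import subprocess


def changed_files(base_ref: str = "origin/main") -> list[str]:
    """Files added or modified on this branch relative to base_ref (needs git)."""
    # --diff-filter=d drops deleted files, which cannot be scanned.
    output = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=d", f"{base_ref}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in output.splitlines() if line]


def focused_scan_parameters(
    paths: list[str],
    excluded: tuple[str, ...] = ("vendor/", "node_modules/", "dist/"),
) -> dict:
    """Build `parameters` that limit Semgrep to the given paths (assumed semantics)."""
    kept = sorted(p for p in paths if not p.startswith(excluded))
    return {"semgrep_config": {"include_patterns": kept}}
```

If `changed_files()` returns nothing, skip the submission rather than sending an empty `include_patterns`: that matches the documented default `[]` and may therefore scan everything. Otherwise, the dictionary can be serialized with `json.dumps` and passed to `fuzzforge runs submit ... --parameters`.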
---
## Integration Examples
### GitHub Actions
```yaml
- name: Run Static Analysis
  run: |
    curl -X POST "${{ secrets.FUZZFORGE_URL }}/workflows/static_analysis_scan/submit" \
      -H "Content-Type: application/json" \
      -d '{
        "target_path": "${{ github.workspace }}",
        "parameters": {
          "semgrep_config": {"severity": ["ERROR", "WARNING"]},
          "bandit_config": {"confidence": "MEDIUM"}
        }
      }'
```
### Pre-commit Hook
```bash
fuzzforge runs submit static_analysis_scan . --wait --json > /tmp/analysis.json
HIGH_ISSUES=$(jq '.sarif.severity_counts.high // 0' /tmp/analysis.json)
if [ "$HIGH_ISSUES" -gt 0 ]; then
  echo "❌ Found $HIGH_ISSUES high-severity security issues. Commit blocked."
  exit 1
fi
```
---
## Best Practices
- **Target the right code:** Focus on your main source directories, not dependencies or build artifacts.
- **Start broad, then refine:** Use default rule sets first, then add exclusions or custom rules as needed.
- **Triage findings:** Address high-severity issues first, and document false positives for future runs.
- **Monitor trends:** Track your security posture over time to measure improvement.
- **Optimize for speed:** Use file size limits and timeouts for very large projects.
---
## Troubleshooting
- **No Python files found:** Bandit reports zero findings if your project isn't written in Python; this is normal.
- **High memory usage:** Exclude large files and directories, or increase Docker memory limits.
- **Slow scans:** Use inclusion/exclusion patterns and increase timeouts for big repos.
- **Workflow errors:** See the [Troubleshooting Guide](/how-to/troubleshooting.md) for help with registry, Docker, or workflow issues.