FuzzForge Benchmark Suite
Performance benchmarking infrastructure organized by module category.
Directory Structure
benchmarks/
├── conftest.py # Benchmark fixtures
├── category_configs.py # Category-specific thresholds
├── by_category/ # Benchmarks organized by category
│ ├── fuzzer/
│ │ ├── bench_cargo_fuzz.py
│ │ └── bench_atheris.py
│ ├── scanner/
│ │ └── bench_file_scanner.py
│ ├── secret_detection/
│ │ ├── bench_gitleaks.py
│ │ └── bench_trufflehog.py
│ └── analyzer/
│ └── bench_security_analyzer.py
├── fixtures/ # Benchmark test data
│ ├── small/ # ~1K LOC
│ ├── medium/ # ~10K LOC
│ └── large/ # ~100K LOC
└── results/ # Benchmark results (JSON)
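The benchmark fixtures referenced in conftest.py load test data from the fixtures/ directory. A minimal sketch of what such a fixture could look like, assuming a parametrized fixture over the three sizes (the layout and fixture body are illustrative; only the test_workspace name appears elsewhere in this suite):
# conftest.py -- illustrative sketch only; the real fixtures may differ
from pathlib import Path
import pytest

FIXTURES_DIR = Path(__file__).parent / "fixtures"

@pytest.fixture(params=["small", "medium", "large"])
def test_workspace(request):
    """Yield the path to one sized fixture set (~1K / ~10K / ~100K LOC)."""
    workspace = FIXTURES_DIR / request.param
    if not workspace.exists():
        pytest.skip(f"fixture set {request.param!r} is not present")
    return workspace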
Module Categories
Fuzzer
Expected Metrics: execs/sec, coverage_rate, time_to_crash, memory_usage
Performance Thresholds:
- Min 1000 execs/sec
- Max 10s execution time for small projects
- Max 2GB memory
Scanner
Expected Metrics: files/sec, LOC/sec, findings_count
Performance Thresholds:
- Min 100 files/sec
- Min 10K LOC/sec
- Max 512MB memory
Secret Detection
Expected Metrics: patterns/sec, precision, recall, F1
Performance Thresholds:
- Min 90% precision
- Min 95% recall
- Max 5 false positives per 100 secrets
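As a worked example of these thresholds, precision, recall, and F1 derive directly from true/false positive and false negative counts (the counts below are made-up illustration values):
# Illustrative quality check for a secret-detection run (example counts only)
true_positives = 95    # real secrets that were flagged
false_positives = 4    # non-secrets that were flagged
false_negatives = 5    # real secrets that were missed

precision = true_positives / (true_positives + false_positives)  # ~0.96
recall = true_positives / (true_positives + false_negatives)     # 0.95
f1 = 2 * precision * recall / (precision + recall)               # ~0.95

assert precision >= 0.90      # Min 90% precision
assert recall >= 0.95         # Min 95% recall
assert false_positives <= 5   # Max 5 false positives per 100 secrets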
Analyzer
Expected Metrics: analysis_depth, files/sec, accuracy
Performance Thresholds:
- Min 10 files/sec (deep analysis)
- Min 85% accuracy
- Max 2GB memory
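These thresholds are centralized in category_configs.py and looked up via get_threshold (see the example under "Adding New Benchmarks" below). A minimal sketch of how that module could be laid out; apart from max_execution_time_small, which is used later in this document, the key names are assumptions:
# category_configs.py -- sketch; most key names here are illustrative
from enum import Enum

class ModuleCategory(Enum):
    FUZZER = "fuzzer"
    SCANNER = "scanner"
    SECRET_DETECTION = "secret_detection"
    ANALYZER = "analyzer"

THRESHOLDS = {
    ModuleCategory.FUZZER: {
        "min_execs_per_sec": 1000,
        "max_execution_time_small": 10.0,  # seconds, small fixtures
        "max_memory_mb": 2048,
    },
    ModuleCategory.SCANNER: {
        "min_files_per_sec": 100,
        "min_loc_per_sec": 10_000,
        "max_memory_mb": 512,
    },
    # ... secret_detection and analyzer thresholds follow the same pattern
}

def get_threshold(category: ModuleCategory, name: str):
    """Return a single threshold value for the given category."""
    return THRESHOLDS[category][name]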
Running Benchmarks
All Benchmarks
cd backend
pytest benchmarks/ --benchmark-only -v
Specific Category
pytest benchmarks/by_category/fuzzer/ --benchmark-only -v
With Comparison
# Run and save baseline
pytest benchmarks/ --benchmark-only --benchmark-save=baseline
# Compare against baseline
pytest benchmarks/ --benchmark-only --benchmark-compare=baseline
Generate Histogram
pytest benchmarks/ --benchmark-only --benchmark-histogram=histogram
Benchmark Results
Results are saved as JSON and include:
- Mean execution time
- Standard deviation
- Min/Max values
- Iterations per second
- Memory usage
Example output:
------------------------ benchmark: fuzzer --------------------------
Name                         Mean     StdDev    Ops/Sec
bench_cargo_fuzz[discovery]  0.0012s  0.0001s   833.33
bench_cargo_fuzz[execution]  0.1250s  0.0050s     8.00
bench_cargo_fuzz[memory]     0.0100s  0.0005s   100.00
----------------------------------------------------------------------
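To inspect a saved run programmatically, the JSON produced by pytest-benchmark can be read directly; a small sketch (the file path is illustrative, the field names follow pytest-benchmark's JSON output):
# Summarize one saved results file (path is illustrative)
import json
from pathlib import Path

data = json.loads(Path("benchmarks/results/fuzzer_baseline.json").read_text())
for bench in data["benchmarks"]:
    stats = bench["stats"]
    print(f"{bench['name']:40s} mean={stats['mean']:.4f}s "
          f"stddev={stats['stddev']:.4f}s ops={stats['ops']:.2f}")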
CI/CD Integration
Benchmarks run:
- Nightly: Full benchmark suite, track trends
- On PR: When benchmarks/ or modules/ changed
- Manual: Via workflow_dispatch
Regression Detection
Benchmarks automatically fail if:
- Performance degrades >10%
- Memory usage exceeds thresholds
- Throughput drops below minimum
See .github/workflows/benchmark.yml for configuration.
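Locally, the same kind of gate can be reproduced with pytest-benchmark's comparison-failure option, for example failing when the mean regresses by more than 10% against a saved baseline:
# Fail if mean time regresses >10% versus the saved baseline
pytest benchmarks/ --benchmark-only --benchmark-compare=baseline --benchmark-compare-fail=mean:10%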
Adding New Benchmarks
1. Create benchmark file in category directory
# benchmarks/by_category/fuzzer/bench_new_fuzzer.py
import pytest
from benchmarks.category_configs import ModuleCategory, get_threshold

@pytest.mark.benchmark(group="fuzzer")
def test_execution_performance(benchmark, new_fuzzer, config, test_workspace):
    """Benchmark execution speed (new_fuzzer, config, and test_workspace are fixtures)."""
    result = benchmark(new_fuzzer.execute, config, test_workspace)
    # Validate against the category threshold
    threshold = get_threshold(ModuleCategory.FUZZER, "max_execution_time_small")
    assert result.execution_time < threshold
2. Update category_configs.py if needed
Add new thresholds or metrics for your module.
3. Run locally
pytest benchmarks/by_category/fuzzer/bench_new_fuzzer.py --benchmark-only -v
Best Practices
- Use mocking for external dependencies (network, disk I/O)
- Use a fixed number of iterations for consistent results
- Add warm-up runs for JIT-compiled code
- Choose category-specific metrics aligned with the module's purpose
- Use realistic fixtures that represent actual use cases
- Profile memory with tracemalloc (see the sketch after this list)
- Compare apples to apples within the same category
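A minimal way to wire tracemalloc into a memory check, as mentioned above (the helper name and the fixtures are hypothetical; the 512MB limit is the scanner threshold from earlier in this document):
# Illustrative peak-memory check using tracemalloc
import tracemalloc

def measure_peak_memory_mb(func, *args, **kwargs):
    """Run func and return its peak Python heap usage in megabytes."""
    tracemalloc.start()
    try:
        func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / (1024 * 1024)

def test_scanner_memory(file_scanner, test_workspace):  # fixture names are hypothetical
    peak_mb = measure_peak_memory_mb(file_scanner.scan, test_workspace)
    assert peak_mb <= 512  # scanner category: max 512MB memory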
Interpreting Results
Good Performance
- ✅ Execution time below threshold
- ✅ Memory usage within limits
- ✅ Throughput meets minimum
- ✅ <5% variance across runs
Performance Issues
- ⚠️ Execution time 10-20% over threshold
- ❌ Execution time >20% over threshold
- ❌ Memory leaks (increasing over iterations)
- ❌ High variance (>10%) indicates instability
Tracking Performance Over Time
Benchmark results are stored as artifacts with:
- Commit SHA
- Timestamp
- Environment details (Python version, OS)
- Full metrics
Use these to track long-term performance trends and detect gradual degradation.
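A simple way to tabulate those trends is to read the metadata pytest-benchmark embeds in each JSON file; a sketch, assuming the artifacts have been collected into a single directory (field names follow pytest-benchmark's output format; adjust if the stored artifacts differ):
# Summarize mean times across collected benchmark artifacts (directory is illustrative)
import json
from pathlib import Path

rows = []
for path in sorted(Path("benchmark-artifacts").glob("*.json")):
    data = json.loads(path.read_text())
    commit = data.get("commit_info", {}).get("id", "unknown")[:8]
    when = data.get("datetime", "unknown")
    for bench in data["benchmarks"]:
        rows.append((when, commit, bench["name"], bench["stats"]["mean"]))

for when, commit, name, mean in sorted(rows):
    print(f"{when}  {commit}  {name:40s}  mean={mean:.4f}s")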