CalvinBackup/fuzzforge_ai

Fork 0

mirror of https://github.com/FuzzingLabs/fuzzforge_ai.git synced 2026-02-12 17:12:46 +00:00

Files

AFredefon 8b8662d7af feat(modules): add harness-tester module for Rust fuzzing pipeline

2026-02-03 18:12:28 +01:00

4.5 KiB

Raw Blame History

Harness Tester Module

Tests and evaluates fuzz harnesses with comprehensive feedback for AI-driven iteration.

Overview

The harness-tester module runs a battery of tests on fuzz harnesses to provide actionable feedback:

Compilation Testing - Validates harness compiles correctly
Execution Testing - Ensures harness runs without immediate crashes
Fuzzing Trial - Runs short fuzzing session (default: 30s) to measure:
- Coverage growth
- Execution performance (execs/sec)
- Stability (crashes, hangs)
Quality Assessment - Generates scored evaluation with specific issues and suggestions

Feedback Categories

1. Compilation Feedback

Undefined variables → "Check variable names match function signature"
Type mismatches → "Convert fuzzer input to correct type"
Missing traits → "Ensure you're using correct types"

2. Execution Feedback

Stack overflow → "Check for infinite recursion, use heap allocation"
Immediate panic → "Check initialization code and input validation"
Timeout/infinite loop → "Add iteration limits"

3. Coverage Feedback

No coverage → "Harness may not be using fuzzer input"
Very low coverage (<5%) → "May not be reaching target code, check entry point"
Low coverage (<20%) → "Try fuzzing multiple entry points"
Good/Excellent coverage → "Harness is exploring code paths well"

4. Performance Feedback

Extremely slow (<10 execs/s) → "Remove file I/O or network operations"
Slow (<100 execs/s) → "Optimize harness, avoid allocations in hot path"
Good (>500 execs/s) → Ready for production
Excellent (>1000 execs/s) → Optimal performance

5. Stability Feedback

Frequent crashes → "Add error handling for edge cases"
Hangs detected → "Add timeouts to prevent infinite loops"
Stable → Ready for production

Usage

# Via MCP
result = execute_module("harness-tester",
    assets_path="/path/to/rust/project",
    configuration={
        "trial_duration_sec": 30,
        "execution_timeout_sec": 10
    })

Input Requirements

Rust project with Cargo.toml
Fuzz harnesses in fuzz/fuzz_targets/
Source code to analyze

Output Artifacts

`harness-evaluation.json`

Complete structured evaluation with:

{
  "harnesses": [
    {
      "name": "fuzz_png_decode",
      "compilation": { "success": true, "time_ms": 4523 },
      "execution": { "success": true },
      "fuzzing_trial": {
        "coverage": {
          "final_edges": 891,
          "growth_rate": "good",
          "percentage_estimate": 67.0
        },
        "performance": {
          "execs_per_sec": 1507.0,
          "performance_rating": "excellent"
        },
        "stability": { "status": "stable" }
      },
      "quality": {
        "score": 85,
        "verdict": "production-ready",
        "issues": [],
        "strengths": ["Excellent performance", "Good coverage"],
        "recommended_actions": ["Ready for production fuzzing"]
      }
    }
  ],
  "summary": {
    "total_harnesses": 1,
    "production_ready": 1,
    "average_score": 85.0
  }
}

`feedback-summary.md`

Human-readable summary with all issues and suggestions.

Quality Scoring

Harnesses are scored 0-100 based on:

Compilation (20 points): Must compile to proceed
Execution (20 points): Must run without crashing
Coverage (40 points):
- Excellent growth: 40 pts
- Good growth: 30 pts
- Poor growth: 10 pts
Performance (25 points):
- 1000 execs/s: 25 pts
- 500 execs/s: 20 pts
- 100 execs/s: 10 pts
Stability (15 points):
- Stable: 15 pts
- Unstable: 10 pts
- Crashes frequently: 5 pts

Verdicts:

70-100: production-ready
30-69: needs-improvement
0-29: broken

AI Agent Iteration Pattern

1. AI generates harness
2. harness-tester evaluates it
3. Returns: score=35, verdict="needs-improvement"
   Issues: "Low coverage (8%), slow execution (7.8 execs/s)"
   Suggestions: "Check entry point function, remove I/O operations"
4. AI fixes harness based on feedback
5. harness-tester re-evaluates
6. Returns: score=85, verdict="production-ready"
7. Proceed to production fuzzing

Configuration Options

Option	Default	Description
`trial_duration_sec`	30	How long to run fuzzing trial
`execution_timeout_sec`	10	Timeout for execution test

4.5 KiB Raw Blame History