Files
fuzzforge_ai/fuzzforge-modules/harness-tester

Harness Tester Module

Tests and evaluates fuzz harnesses with comprehensive feedback for AI-driven iteration.

Overview

The harness-tester module runs a battery of tests on fuzz harnesses to provide actionable feedback:

  1. Compilation Testing - Validates harness compiles correctly
  2. Execution Testing - Ensures harness runs without immediate crashes
  3. Fuzzing Trial - Runs short fuzzing session (default: 30s) to measure:
    • Coverage growth
    • Execution performance (execs/sec)
    • Stability (crashes, hangs)
  4. Quality Assessment - Generates scored evaluation with specific issues and suggestions

Feedback Categories

1. Compilation Feedback

  • Undefined variables → "Check variable names match function signature"
  • Type mismatches → "Convert fuzzer input to correct type"
  • Missing traits → "Ensure you're using correct types"

2. Execution Feedback

  • Stack overflow → "Check for infinite recursion, use heap allocation"
  • Immediate panic → "Check initialization code and input validation"
  • Timeout/infinite loop → "Add iteration limits"

3. Coverage Feedback

  • No coverage → "Harness may not be using fuzzer input"
  • Very low coverage (<5%) → "May not be reaching target code, check entry point"
  • Low coverage (<20%) → "Try fuzzing multiple entry points"
  • Good/Excellent coverage → "Harness is exploring code paths well"

4. Performance Feedback

  • Extremely slow (<10 execs/s) → "Remove file I/O or network operations"
  • Slow (<100 execs/s) → "Optimize harness, avoid allocations in hot path"
  • Good (>500 execs/s) → Ready for production
  • Excellent (>1000 execs/s) → Optimal performance

5. Stability Feedback

  • Frequent crashes → "Add error handling for edge cases"
  • Hangs detected → "Add timeouts to prevent infinite loops"
  • Stable → Ready for production

Usage

# Via MCP
result = execute_module("harness-tester",
    assets_path="/path/to/rust/project",
    configuration={
        "trial_duration_sec": 30,
        "execution_timeout_sec": 10
    })

Input Requirements

  • Rust project with Cargo.toml
  • Fuzz harnesses in fuzz/fuzz_targets/
  • Source code to analyze

Output Artifacts

harness-evaluation.json

Complete structured evaluation with:

{
  "harnesses": [
    {
      "name": "fuzz_png_decode",
      "compilation": { "success": true, "time_ms": 4523 },
      "execution": { "success": true },
      "fuzzing_trial": {
        "coverage": {
          "final_edges": 891,
          "growth_rate": "good",
          "percentage_estimate": 67.0
        },
        "performance": {
          "execs_per_sec": 1507.0,
          "performance_rating": "excellent"
        },
        "stability": { "status": "stable" }
      },
      "quality": {
        "score": 85,
        "verdict": "production-ready",
        "issues": [],
        "strengths": ["Excellent performance", "Good coverage"],
        "recommended_actions": ["Ready for production fuzzing"]
      }
    }
  ],
  "summary": {
    "total_harnesses": 1,
    "production_ready": 1,
    "average_score": 85.0
  }
}

feedback-summary.md

Human-readable summary with all issues and suggestions.

Quality Scoring

Harnesses are scored 0-100 based on:

  • Compilation (20 points): Must compile to proceed
  • Execution (20 points): Must run without crashing
  • Coverage (40 points):
    • Excellent growth: 40 pts
    • Good growth: 30 pts
    • Poor growth: 10 pts
  • Performance (25 points):
    • 1000 execs/s: 25 pts

    • 500 execs/s: 20 pts

    • 100 execs/s: 10 pts

  • Stability (15 points):
    • Stable: 15 pts
    • Unstable: 10 pts
    • Crashes frequently: 5 pts

Verdicts:

  • 70-100: production-ready
  • 30-69: needs-improvement
  • 0-29: broken

AI Agent Iteration Pattern

1. AI generates harness
2. harness-tester evaluates it
3. Returns: score=35, verdict="needs-improvement"
   Issues: "Low coverage (8%), slow execution (7.8 execs/s)"
   Suggestions: "Check entry point function, remove I/O operations"
4. AI fixes harness based on feedback
5. harness-tester re-evaluates
6. Returns: score=85, verdict="production-ready"
7. Proceed to production fuzzing

Configuration Options

Option Default Description
trial_duration_sec 30 How long to run fuzzing trial
execution_timeout_sec 10 Timeout for execution test

See Also