fuzzforge_ai/test_projects/secret_detection_benchmark/README.md
tduhamel42 3be4d34531 test: Add secret detection benchmark dataset and ground truth
Add comprehensive benchmark dataset with 32 documented secrets for testing
secret detection workflows (gitleaks, trufflehog, llm_secret_detection).

- Add test_projects/secret_detection_benchmark/ with 19 test files
- Add ground truth JSON with precise line-by-line secret mappings
- Update .gitignore with exceptions for benchmark files (not real secrets)

Dataset breakdown:
- 12 Easy secrets (standard patterns)
- 10 Medium secrets (obfuscated)
- 10 Hard secrets (well hidden)
2025-10-16 11:46:28 +02:00

Secret Detection Benchmark Dataset

Ground truth dataset with exactly 32 known secrets for testing secret detection tools.

Contents

  • 12 Easy Secrets: Standard patterns (AWS keys, GitHub PATs, Stripe keys, etc.)
  • 10 Medium Secrets: Slightly obfuscated (Base64, hex, concatenated, in comments)
  • 10 Hard Secrets: Well hidden (ROT13, binary, XOR, reversed, template strings)

Files

├── .env                        # 2 secrets
├── config/
│   ├── settings.py            # 3 secrets
│   ├── database.yaml          # 1 secret
│   ├── app.properties         # 1 secret
│   ├── oauth.json             # 1 secret
│   ├── keys.yaml              # 2 secrets
│   └── legacy.ini             # 2 secrets
├── src/
│   ├── app.py                 # 1 secret
│   ├── Main.java              # 1 secret
│   ├── config.py              # 3 secrets (medium difficulty)
│   ├── obfuscated.py          # 4 secrets (hard difficulty)
│   ├── advanced.js            # 4 secrets (hard difficulty)
│   ├── Crypto.go              # 2 secrets (hard difficulty)
│   └── database.sql           # 1 secret
├── scripts/
│   ├── webhook.js             # 1 secret
│   └── deploy.sh              # 2 secrets
└── id_rsa                     # 1 secret

Total: 17 files with 32 secrets

Secret Difficulty Breakdown

Easy (12 secrets)

Standard, unobfuscated patterns that any pattern-based secret scanner should detect:

  • Plain AWS access keys
  • GitHub Personal Access Tokens
  • Stripe API keys
  • Database passwords in plain text
  • JWT secrets
  • SSH private keys
  • OAuth secrets
  • Slack webhooks

Medium (10 secrets)

Requires some parsing or contextual understanding:

  • Base64 encoded AWS key
  • Hex-encoded tokens
  • Split strings concatenated at runtime
  • URL-encoded passwords
  • Multi-line private keys in YAML
  • Secrets with Unicode characters
  • Secrets in SQL/shell comments
  • Deprecated config formats
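To illustrate why these cases need more than a plain regex, here is a minimal sketch of generating decoded "views" of a string before pattern matching. The function names and the AWS key pattern are assumptions made for illustration; they are not taken from the benchmark files or from any scanner's implementation.

```python
import base64
import re
from urllib.parse import unquote

# Assumed pattern for illustration: "AKIA" followed by 16 uppercase alphanumerics.
AWS_KEY_RE = re.compile(r"AKIA[0-9A-Z]{16}")


def candidate_decodings(text: str) -> list[str]:
    """Return the raw text plus any URL-decoded, Base64, and hex views of it."""
    views = [text, unquote(text)]
    try:
        # validate=True rejects input containing non-Base64 characters.
        views.append(base64.b64decode(text, validate=True).decode("utf-8"))
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        pass
    try:
        views.append(bytes.fromhex(text).decode("utf-8"))
    except ValueError:  # not valid hex, or not valid UTF-8 after decoding
        pass
    return views


def contains_aws_key(text: str) -> bool:
    """Check every decoded view of the text against the key pattern."""
    return any(AWS_KEY_RE.search(view) for view in candidate_decodings(text))
```

A scanner that only matches the raw text would miss the Base64 and hex forms; checking each decoded view is one simple way to recover them.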

Hard (10 secrets)

Well hidden, may challenge even advanced tools:

  • ROT13 encoded secrets
  • Binary string representations
  • Character array joins
  • Reversed strings
  • Template string constructs
  • Secrets in regex patterns
  • XOR encrypted values
  • Escaped JSON within strings
  • Heredoc patterns
  • Intentional typos corrected programmatically
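A few of the transformations listed above can be sketched as small reversible helpers. This is only an illustration of the techniques: the helper names, the space-separated binary format, and the single-byte XOR key are assumptions, and the encodings actually used in the benchmark files may differ.

```python
import codecs


def rot13(text: str) -> str:
    """ROT13 is its own inverse, so this both encodes and decodes."""
    return codecs.decode(text, "rot_13")


def reverse(text: str) -> str:
    """Undo a reversed string."""
    return text[::-1]


def from_binary(bits: str) -> str:
    """Decode space-separated 8-bit groups such as '01000001 01000010'."""
    return "".join(chr(int(group, 2)) for group in bits.split())


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)
```

None of these transformed values resembles a known key format, which is why pattern-based tools are expected to miss most of them.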

Usage

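Run a secret detection tool against this directory, then compare its findings with the ground truth file (backend/benchmarks/by_category/secret_detection/secret_detection_benchmark_GROUND_TRUTH.json) to calculate the following metrics, where TP, FP, and FN are the counts of true positives, false positives, and false negatives: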

  • Precision: TP / (TP + FP) - How many detected secrets are real?
  • Recall: TP / (TP + FN) - How many real secrets were found?
  • F1 Score: 2 × (Precision × Recall) / (Precision + Recall)
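The three metrics can be computed with a few set operations. The sketch below assumes each finding is identified by a (file, line) pair; that keying is an assumption for illustration, since the ground truth JSON schema is not shown here.

```python
def score(detected: set, ground_truth: set) -> dict:
    """Compute precision, recall, and F1 from two sets of (file, line) findings."""
    tp = len(detected & ground_truth)   # reported and real
    fp = len(detected - ground_truth)   # reported but not real
    fn = len(ground_truth - detected)   # real but not reported

    # Guard against division by zero when a set is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}
```

Matching on exact (file, line) pairs is strict; a looser comparison (for example, tolerating a one-line offset) would change the TP and FP counts.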

Expected Performance

| Tool Type | Expected Easy | Expected Medium | Expected Hard | Total Expected |
|---|---|---|---|---|
| Pattern-based (Gitleaks) | 12/12 (100%) | 6-8/10 (60-80%) | 2-4/10 (20-40%) | 20-24/32 |
| Entropy-based (TruffleHog) | 12/12 (100%) | 5-7/10 (50-70%) | 1-3/10 (10-30%) | 18-22/32 |
| LLM-based | 12/12 (100%) | 8-10/10 (80-100%) | 4-8/10 (40-80%) | 24-30/32 |
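To compare a tool's results with the ranges above, the per-difficulty counts can be turned into the same "found/total (percent)" form. This is a minimal sketch: the per-level totals come from this README, while the function names and the shape of the input dictionary are assumptions.

```python
# Number of secrets per difficulty level, as documented in this README.
TOTALS = {"easy": 12, "medium": 10, "hard": 10}


def recall_by_difficulty(found: dict) -> dict:
    """Return the fraction of secrets found at each difficulty level."""
    return {level: found.get(level, 0) / total for level, total in TOTALS.items()}


def format_row(tool: str, found: dict) -> str:
    """Format counts like a row of the expected-performance table."""
    cells = [
        f"{found.get(level, 0)}/{total} ({found.get(level, 0) / total:.0%})"
        for level, total in TOTALS.items()
    ]
    total_found = sum(found.get(level, 0) for level in TOTALS)
    return f"{tool}: " + ", ".join(cells) + f", total {total_found}/{sum(TOTALS.values())}"
```

For example, `format_row("example-tool", {"easy": 12, "medium": 7, "hard": 3})` produces a row for a hypothetical tool that can be read against the ranges in the table.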

Validation

Use the validation script to check tool performance:

python validate_ground_truth.py --tool-output results.json

This will calculate precision, recall, and F1 score against the ground truth.