Secret Detection Benchmark Dataset
Ground truth dataset with exactly 32 known secrets for testing secret detection tools.
Contents
- 12 Easy Secrets: Standard patterns (AWS keys, GitHub PATs, Stripe keys, etc.)
- 10 Medium Secrets: Slightly obfuscated (Base64, hex, concatenated, in comments)
- 10 Hard Secrets: Well hidden (ROT13, binary, XOR, reversed, template strings)
Files
├── .env # 2 secrets
├── config/
│ ├── settings.py # 3 secrets
│ ├── database.yaml # 1 secret
│ ├── app.properties # 1 secret
│ ├── oauth.json # 1 secret
│ ├── keys.yaml # 2 secrets
│ └── legacy.ini # 2 secrets
├── src/
│ ├── app.py # 1 secret
│ ├── Main.java # 1 secret
│ ├── config.py # 3 secrets (medium difficulty)
│ ├── obfuscated.py # 4 secrets (hard difficulty)
│ ├── advanced.js # 4 secrets (hard difficulty)
│ ├── Crypto.go # 2 secrets (hard difficulty)
│ └── database.sql # 1 secret
├── scripts/
│ ├── webhook.js # 1 secret
│ └── deploy.sh # 2 secrets
└── id_rsa # 1 secret
Total: 17 files with 32 secrets
Secret Difficulty Breakdown
Easy (12 secrets)
Any competent secret scanner should detect these:
- Plain AWS access keys
- GitHub Personal Access Tokens
- Stripe API keys
- Database passwords in plain text
- JWT secrets
- SSH private keys
- OAuth secrets
- Slack webhooks
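To illustrate why this tier is considered easy, here is a minimal sketch of pattern-based matching. The three regular expressions are simplified illustrations of well-known token shapes, not the rule set of Gitleaks or any other tool, and `scan_text` is a hypothetical helper name. The sample uses the placeholder access key ID from AWS documentation, not a value from this dataset.

```python
import re

# Illustrative patterns only; real scanners ship much larger, tuned rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_pat_classic": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan_text(text):
    """Return (rule_name, line_number) pairs for every pattern hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((name, lineno))
    return hits


sample = "\n".join([
    "region = us-east-1",
    "aws_access_key_id = AKIAIOSFODNN7EXAMPLE",  # AWS documentation placeholder
    "debug = true",
])
print(scan_text(sample))
```

Because each easy secret appears verbatim in a recognizable format, a single regular expression per secret type is usually enough to find it.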
Medium (10 secrets)
Requires some parsing or contextual understanding:
- Base64-encoded AWS key
- Hex-encoded tokens
- Split strings concatenated at runtime
- URL-encoded passwords
- Multi-line private keys in YAML
- Secrets with Unicode characters
- Secrets in SQL/shell comments
- Deprecated config formats
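A scanner that wants to catch the encoded variants in this tier generally has to try decoding candidate tokens before matching them. The sketch below shows one way to do that with the Python standard library; `candidate_decodings` is a hypothetical helper, and the placeholder value is made up for the example rather than taken from the dataset.

```python
import base64
import binascii
from urllib.parse import unquote


def candidate_decodings(token):
    """Return plausible plaintext decodings of a token, keyed by encoding name."""
    results = {}
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        results["base64"] = base64.b64decode(token, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        pass
    try:
        results["hex"] = bytes.fromhex(token).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        pass
    unquoted = unquote(token)
    if unquoted != token:
        results["url"] = unquoted
    return results


placeholder = "not-a-real-secret"
encoded = base64.b64encode(placeholder.encode()).decode()
print(candidate_decodings(encoded))
print(candidate_decodings(placeholder.encode().hex()))
print(candidate_decodings("p%40ss%20word"))
```

Each decoded string can then be passed through the same pattern rules used for plain-text secrets.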
Hard (10 secrets)
Well hidden, may challenge even advanced tools:
- ROT13 encoded secrets
- Binary string representations
- Character array joins
- Reversed strings
- Template string constructs
- Secrets in regex patterns
- XOR-encrypted values
- Escaped JSON within strings
- Heredoc patterns
- Intentional typos corrected programmatically
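For reference, several of the transformations listed above can be reversed in a few lines of Python. This is a sketch of the general techniques only; the helper names, the XOR key, and the placeholder value are assumptions made for the example, and the benchmark files may apply these encodings differently.

```python
import codecs


def rot13(text):
    """ROT13 is its own inverse: applying it twice returns the input."""
    return codecs.decode(text, "rot_13")


def from_binary(bits):
    """Decode space-separated 8-bit groups into a string."""
    return "".join(chr(int(group, 2)) for group in bits.split())


def xor_bytes(data, key):
    """XOR each byte with a repeating key; applying it twice restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


placeholder = "not-a-real-secret"

obscured_rot13 = rot13(placeholder)
obscured_reversed = placeholder[::-1]
obscured_xor = xor_bytes(placeholder.encode(), b"k")

print(rot13(obscured_rot13))
print(obscured_reversed[::-1])
print(xor_bytes(obscured_xor, b"k").decode())
print(from_binary("01101000 01101001"))
```

A tool has to either execute or reason about this kind of code to recover the secret, which is why simple pattern matching is expected to struggle with this tier.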
Usage
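Run a secret detection tool against this directory and compare its findings to the ground truth file (backend/benchmarks/by_category/secret_detection/secret_detection_benchmark_GROUND_TRUTH.json). Here TP is the number of reported findings that match a ground truth secret, FP is the number of reported findings that match nothing, and FN is the number of ground truth secrets the tool missed. From these counts, calculate:
- Precision: TP / (TP + FP). What fraction of the reported findings are real secrets?
- Recall: TP / (TP + FN). What fraction of the real secrets did the tool find?
- F1 Score: 2 × (Precision × Recall) / (Precision + Recall). The harmonic mean of precision and recall.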
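The three formulas can be computed directly from the TP, FP, and FN counts. The sketch below assumes a hypothetical run in which a tool reports 25 findings, 22 of which match ground truth entries, leaving 10 of the 32 secrets undetected; those numbers are invented for illustration. Returning 0.0 when a denominator is zero is a convention chosen here, not something the benchmark specifies.

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1


# Hypothetical run: 22 correct findings, 3 spurious ones, 10 secrets missed.
precision, recall, f1 = detection_metrics(tp=22, fp=3, fn=10)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```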
Expected Performance
| Tool Type | Expected Easy | Expected Medium | Expected Hard | Total Expected |
|---|---|---|---|---|
| Pattern-based (Gitleaks) | 12/12 (100%) | 6-8/10 (60-80%) | 2-4/10 (20-40%) | 20-24/32 |
| Entropy-based (TruffleHog) | 12/12 (100%) | 5-7/10 (50-70%) | 1-3/10 (10-30%) | 18-22/32 |
| LLM-based | 12/12 (100%) | 8-10/10 (80-100%) | 4-8/10 (40-80%) | 24-30/32 |
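The Total Expected column is the sum of the per-tier ranges in each row. A short sketch that reproduces those totals from the table values and converts them to an overall recall range (the names `EXPECTED` and `total_range` are made up for this example):

```python
TOTAL_SECRETS = 32

# (low, high) expected detections per tier, copied from the table above.
EXPECTED = {
    "Pattern-based (Gitleaks)": {"easy": (12, 12), "medium": (6, 8), "hard": (2, 4)},
    "Entropy-based (TruffleHog)": {"easy": (12, 12), "medium": (5, 7), "hard": (1, 3)},
    "LLM-based": {"easy": (12, 12), "medium": (8, 10), "hard": (4, 8)},
}


def total_range(tiers):
    """Sum the low and high ends of every tier's expected range."""
    low = sum(lo for lo, _ in tiers.values())
    high = sum(hi for _, hi in tiers.values())
    return low, high


for tool, tiers in EXPECTED.items():
    low, high = total_range(tiers)
    print(f"{tool}: {low}-{high}/{TOTAL_SECRETS} "
          f"(recall {low / TOTAL_SECRETS:.0%} to {high / TOTAL_SECRETS:.0%})")
```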
Validation
Use the validation script to check tool performance:
python validate_ground_truth.py --tool-output results.json
This will calculate precision, recall, and F1 score against the ground truth.
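As a rough illustration of what such a comparison involves, the sketch below matches findings to ground truth entries by file path and line number. It is not the logic of validate_ground_truth.py; the `file` and `line` keys, the `compare_findings` helper, and the line numbers in the example are all assumptions, so check the ground truth JSON for its real schema before adapting it.

```python
def compare_findings(ground_truth, findings):
    """Count TP, FP, and FN by matching on (file, line) pairs.

    Both arguments are lists of dicts with hypothetical "file" and "line" keys.
    """
    expected = {(entry["file"], entry["line"]) for entry in ground_truth}
    reported = {(entry["file"], entry["line"]) for entry in findings}
    return {
        "tp": len(expected & reported),
        "fp": len(reported - expected),
        "fn": len(expected - reported),
    }


# Made-up entries: the file names come from the tree above, the lines do not.
ground_truth = [
    {"file": ".env", "line": 3},
    {"file": "src/app.py", "line": 12},
    {"file": "scripts/deploy.sh", "line": 7},
]
findings = [
    {"file": ".env", "line": 3},
    {"file": "src/app.py", "line": 40},
]
print(compare_findings(ground_truth, findings))
```

Matching on exact line numbers is strict. A tool that reports a multi-line secret on a neighboring line would count as both a false positive and a false negative under this scheme, so a tolerance window may be worth considering.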