# Secret Detection Benchmark Dataset

A ground-truth dataset containing **exactly 32 known secrets** for benchmarking secret detection tools.

## Contents

- **12 Easy Secrets**: Standard patterns (AWS keys, GitHub PATs, Stripe keys, etc.)
- **10 Medium Secrets**: Slightly obfuscated (Base64, hex, concatenated, in comments)
- **10 Hard Secrets**: Well hidden (ROT13, binary, XOR, reversed, template strings)

## Files

```
├── .env                        # 2 secrets
├── config/
│   ├── settings.py            # 3 secrets
│   ├── database.yaml          # 1 secret
│   ├── app.properties         # 1 secret
│   ├── oauth.json             # 1 secret
│   ├── keys.yaml              # 2 secrets
│   └── legacy.ini             # 2 secrets
├── src/
│   ├── app.py                 # 1 secret
│   ├── Main.java              # 1 secret
│   ├── config.py              # 3 secrets (medium difficulty)
│   ├── obfuscated.py          # 4 secrets (hard difficulty)
│   ├── advanced.js            # 4 secrets (hard difficulty)
│   ├── Crypto.go              # 2 secrets (hard difficulty)
│   └── database.sql           # 1 secret
├── scripts/
│   ├── webhook.js             # 1 secret
│   └── deploy.sh              # 2 secrets
└── id_rsa                     # 1 secret

Total: 17 files with 32 secrets
```

## Secret Difficulty Breakdown

### Easy (12 secrets)
Any competent secret scanner should detect these:
- Plain AWS access keys
- GitHub Personal Access Tokens
- Stripe API keys
- Database passwords in plain text
- JWT secrets
- SSH private keys
- OAuth secrets
- Slack webhooks

### Medium (10 secrets)
Requires some parsing or contextual understanding:
- Base64-encoded AWS key
- Hex-encoded tokens
- Split strings concatenated at runtime
- URL-encoded passwords
- Multi-line private keys in YAML
- Secrets with Unicode characters
- Secrets in SQL/shell comments
- Deprecated config formats
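
One way a scanner might handle the encoded cases above is to try a few cheap decodings before pattern matching. The sketch below is illustrative only and not part of this dataset: `candidate_decodings` and `contains_aws_key` are hypothetical helper names, the regex is a simplified AWS access key ID pattern, and the sample value is the placeholder key used in AWS documentation.

```python
import base64
import re
from urllib.parse import unquote

# Simplified AWS access key ID pattern (an assumption, not this benchmark's rule set).
AWS_KEY_RE = re.compile(r"AKIA[0-9A-Z]{16}")


def candidate_decodings(text: str) -> list[str]:
    """Return the raw text plus any URL, Base64, and hex decodings that succeed."""
    views = [text, unquote(text)]
    try:
        views.append(base64.b64decode(text, validate=True).decode("utf-8"))
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        pass
    try:
        views.append(bytes.fromhex(text).decode("utf-8"))
    except ValueError:  # not hex, or not valid UTF-8 once decoded
        pass
    return views


def contains_aws_key(text: str) -> bool:
    """True if the text, or any successful decoding of it, matches the pattern."""
    return any(AWS_KEY_RE.search(view) for view in candidate_decodings(text))


# Placeholder key from AWS documentation, not a real credential.
SAMPLE = "AKIAIOSFODNN7EXAMPLE"
encoded = base64.b64encode(SAMPLE.encode("ascii")).decode("ascii")
```

A plain regex misses `encoded`, while `contains_aws_key(encoded)` finds the key after decoding. Split strings and secrets in comments need different handling (for example, evaluating concatenations or scanning comment text), which this sketch does not attempt.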

### Hard (10 secrets)
Well hidden, may challenge even advanced tools:
- ROT13-encoded secrets
- Binary string representations
- Character array joins
- Reversed strings
- Template string constructs
- Secrets in regex patterns
- XOR-encrypted values
- Escaped JSON within strings
- Heredoc patterns
- Intentional typos corrected programmatically
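
To make a few of these transformations concrete, the sketch below applies them to a made-up placeholder and then reverses them. It is an illustration under assumptions, not the code in this dataset: `PLAINTEXT`, `XOR_KEY`, and `xor_bytes` are hypothetical names, and the actual files may use different keys and constructions.

```python
import codecs

# Made-up placeholder; not one of the dataset's secrets.
PLAINTEXT = "example-secret-value-123"
XOR_KEY = 0x5A  # arbitrary single-byte key chosen for this illustration


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


# How the value might appear in source code.
hidden = {
    "rot13": codecs.encode(PLAINTEXT, "rot_13"),
    "reversed": PLAINTEXT[::-1],
    "char_array": list(PLAINTEXT),
    "binary": " ".join(f"{byte:08b}" for byte in PLAINTEXT.encode("ascii")),
    "xor": xor_bytes(PLAINTEXT.encode("ascii"), XOR_KEY),
}

# How the program (or a scanner that models the transformation) recovers it.
recovered = {
    "rot13": codecs.decode(hidden["rot13"], "rot_13"),
    "reversed": hidden["reversed"][::-1],
    "char_array": "".join(hidden["char_array"]),
    "binary": bytes(int(bits, 2) for bits in hidden["binary"].split()).decode("ascii"),
    "xor": xor_bytes(hidden["xor"], XOR_KEY).decode("ascii"),
}
```

None of the `hidden` forms contains the original string, which is why tools that match only literal patterns tend to miss this tier.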

## Usage

Run a secret detection tool against this directory, then compare its findings to the ground truth file at `backend/benchmarks/by_category/secret_detection/secret_detection_benchmark_GROUND_TRUTH.json` to calculate:

- **Precision**: TP / (TP + FP) - How many detected secrets are real?
- **Recall**: TP / (TP + FN) - How many real secrets were found?
- **F1 Score**: 2 × (Precision × Recall) / (Precision + Recall)
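
As a worked example of the formulas above, here is a minimal sketch that computes all three metrics from raw counts. The `score` function and the sample counts are hypothetical, chosen only to illustrate the arithmetic on a 32-secret dataset.

```python
def score(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical run: 25 findings, 22 of them real, so 3 false positives
# and 32 - 22 = 10 missed secrets.
result = score(tp=22, fp=3, fn=10)
```

For these counts, precision is 22/25 = 0.88 and recall is 22/32 = 0.6875, giving an F1 score of roughly 0.77. The zero checks avoid a division error when a tool reports nothing.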

### Expected Performance

| Tool Type | Expected Easy | Expected Medium | Expected Hard | Total Expected |
|-----------|---------------|-----------------|---------------|----------------|
| Pattern-based (Gitleaks) | 12/12 (100%) | 6-8/10 (60-80%) | 2-4/10 (20-40%) | 20-24/32 |
| Entropy-based (TruffleHog) | 12/12 (100%) | 5-7/10 (50-70%) | 1-3/10 (10-30%) | 18-22/32 |
| LLM-based | 12/12 (100%) | 8-10/10 (80-100%) | 4-8/10 (40-80%) | 24-30/32 |
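
The table above can also be encoded as data so a run is checked against the expected ranges automatically. This is a sketch under assumptions: the `EXPECTED` keys, `expected_total`, and `compare` are hypothetical names, and the per-tier detected counts would have to come from your own matching against the ground truth.

```python
# Number of secrets per difficulty tier, as listed in this README.
TIER_SIZES = {"easy": 12, "medium": 10, "hard": 10}

# (low, high) expected detections per tier, copied from the table above.
EXPECTED = {
    "pattern": {"easy": (12, 12), "medium": (6, 8), "hard": (2, 4)},
    "entropy": {"easy": (12, 12), "medium": (5, 7), "hard": (1, 3)},
    "llm": {"easy": (12, 12), "medium": (8, 10), "hard": (4, 8)},
}


def expected_total(tool_type: str) -> tuple[int, int]:
    """Sum the per-tier lows and highs to get the expected overall range."""
    ranges = list(EXPECTED[tool_type].values())
    return sum(low for low, _ in ranges), sum(high for _, high in ranges)


def compare(tool_type: str, detected: dict[str, int]) -> dict[str, tuple[float, str]]:
    """Return (recall, status) per tier, where status is below, within, or above."""
    report = {}
    for tier, size in TIER_SIZES.items():
        low, high = EXPECTED[tool_type][tier]
        found = detected[tier]
        status = "below" if found < low else "above" if found > high else "within"
        report[tier] = (found / size, status)
    return report


# Hypothetical per-tier counts for a pattern-based tool.
report = compare("pattern", {"easy": 12, "medium": 5, "hard": 3})
```

In this hypothetical run the medium tier falls below its expected range (5 found against an expected 6 to 8), while the easy and hard tiers are within range.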

## Validation

Use the validation script to check tool performance:

```bash
python validate_ground_truth.py --tool-output results.json
```

This calculates precision, recall, and F1 score against the ground truth.