Mirror of https://github.com/FuzzingLabs/fuzzforge_ai.git (synced 2026-02-25 09:05:12 +00:00)
Secret Detection Benchmark Dataset
A ground-truth dataset containing exactly 32 known secrets, for evaluating secret detection tools against a known baseline.
Contents
- 12 Easy Secrets: Standard patterns (AWS keys, GitHub PATs, Stripe keys, etc.)
- 10 Medium Secrets: Slightly obfuscated (Base64, hex, concatenated, in comments)
- 10 Hard Secrets: Well hidden (ROT13, binary, XOR, reversed, template strings)
Files
├── .env                    # 2 secrets
├── config/
│   ├── settings.py         # 3 secrets
│   ├── database.yaml       # 1 secret
│   ├── app.properties      # 1 secret
│   ├── oauth.json          # 1 secret
│   ├── keys.yaml           # 2 secrets
│   └── legacy.ini          # 2 secrets
├── src/
│   ├── app.py              # 1 secret
│   ├── Main.java           # 1 secret
│   ├── config.py           # 3 secrets (medium difficulty)
│   ├── obfuscated.py       # 4 secrets (hard difficulty)
│   ├── advanced.js         # 4 secrets (hard difficulty)
│   ├── Crypto.go           # 2 secrets (hard difficulty)
│   └── database.sql        # 1 secret
├── scripts/
│   ├── webhook.js          # 1 secret
│   └── deploy.sh           # 2 secrets
└── id_rsa                  # 1 secret
Total: 17 files with 32 secrets
Secret Difficulty Breakdown
Easy (12 secrets)
Should be detected by any decent secret scanner:
- Plain AWS access keys
- GitHub Personal Access Tokens
- Stripe API keys
- Database passwords in plain text
- JWT secrets
- SSH private keys
- OAuth secrets
- Slack webhooks
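Easy-tier secrets are exactly what pattern-based scanners target. As a minimal sketch, the regexes below are simplified illustrations (not Gitleaks' actual rules, which are far more extensive and combined with entropy checks), and the matched key is a fake placeholder:

```python
import re

# Simplified, illustrative detection patterns (not a real scanner's ruleset)
PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github_pat": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "stripe_live_key": re.compile(r"sk_live_[A-Za-z0-9]{24,}"),
}

# Placeholder value, not a real credential
line = 'AWS_ACCESS_KEY_ID = "AKIAFAKEFAKEFAKEFAKE"'
hits = [name for name, pat in PATTERNS.items() if pat.search(line)]
```

Because easy-tier secrets appear verbatim in source, a single regex pass over each line is sufficient to find them.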
Medium (10 secrets)
Requires some parsing or contextual understanding:
- Base64 encoded AWS key
- Hex-encoded tokens
- Split strings concatenated at runtime
- URL-encoded passwords
- Multi-line private keys in YAML
- Secrets with Unicode characters
- Secrets in SQL/shell comments
- Deprecated config formats
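A few of the medium-tier obfuscations above can be sketched as follows. All values are fake placeholders invented for illustration, not secrets from the dataset:

```python
import base64
from urllib.parse import unquote

# Base64-encoded AWS-style key, decoded at runtime (placeholder value)
ENCODED_KEY = base64.b64encode(b"AKIAFAKEFAKEFAKEFAKE").decode()
aws_key = base64.b64decode(ENCODED_KEY).decode()

# Split string concatenated at runtime: no single literal holds the full token
token = "ghp_" + "fakefake" + "fakefake" + "fake1234"

# URL-encoded password inside a connection string
db_url = "postgres://admin:p%40ssw0rd%21@db.example.com/app"
password = unquote(db_url.split(":")[2].split("@")[0])  # decodes to "p@ssw0rd!"
```

Each of these defeats a naive line-by-line regex: the scanner must either decode the value or track string construction across expressions.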
Hard (10 secrets)
Well hidden, may challenge even advanced tools:
- ROT13 encoded secrets
- Binary string representations
- Character array joins
- Reversed strings
- Template string constructs
- Secrets in regex patterns
- XOR encrypted values
- Escaped JSON within strings
- Heredoc patterns
- Intentional typos corrected programmatically
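Several of the hard-tier hiding techniques can be sketched in the same way. Again, every value below is a fake placeholder, not a secret from the dataset:

```python
import codecs

# ROT13-encoded secret, decoded at runtime
rot13_secret = codecs.decode("frperg_inyhr", "rot13")  # -> "secret_value"

# Reversed string: the secret never appears forwards in the source
reversed_secret = "321_nekot_ekaf"[::-1]  # -> "fake_token_123"

# Character array join: defeats substring and regex matching on literals
char_secret = "".join(["f", "a", "k", "e", "_", "k", "e", "y"])

# XOR "encryption" with a single-byte key, decrypted on use
KEY = 0x2A
encrypted = bytes(b ^ KEY for b in b"fake_xor_secret")
xor_secret = bytes(b ^ KEY for b in encrypted).decode()
```

Static pattern matching cannot recover these; a tool needs dataflow analysis, emulation, or LLM-style reasoning to see that a secret is being reconstructed.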
Usage
Run secret detection tools against this directory and compare results to the ground truth file (located in backend/benchmarks/by_category/secret_detection/secret_detection_benchmark_GROUND_TRUTH.json) to calculate:
- Precision: TP / (TP + FP) - How many detected secrets are real?
- Recall: TP / (TP + FN) - How many real secrets were found?
- F1 Score: 2 × (Precision × Recall) / (Precision + Recall)
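The three metrics above can be computed from sets of secret locations. This is a minimal sketch; the `(file, line)` key shape is an assumption for illustration, and the actual ground-truth JSON schema may differ:

```python
def score(truth: set, detected: set) -> dict:
    """Compute precision, recall, and F1 from ground-truth and detected locations."""
    tp = len(truth & detected)   # real secrets the tool found
    fp = len(detected - truth)   # detections that are not real secrets
    fn = len(truth - detected)   # real secrets the tool missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical run: 24 of the 32 real secrets found, plus 4 false positives
truth = {("src/app.py", i) for i in range(32)}
detected = {("src/app.py", i) for i in range(24)} | {("other.py", i) for i in range(4)}
result = score(truth, detected)
```

In this hypothetical run, recall is 24/32 = 0.75 and precision is 24/28 ≈ 0.857.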
Expected Performance
| Tool Type | Expected Easy | Expected Medium | Expected Hard | Total Expected |
|---|---|---|---|---|
| Pattern-based (Gitleaks) | 12/12 (100%) | 6-8/10 (60-80%) | 2-4/10 (20-40%) | 20-24/32 |
| Entropy-based (TruffleHog) | 12/12 (100%) | 5-7/10 (50-70%) | 1-3/10 (10-30%) | 18-22/32 |
| LLM-based | 12/12 (100%) | 8-10/10 (80-100%) | 4-8/10 (40-80%) | 24-30/32 |
Validation
Use the validation script to check tool performance:
python validate_ground_truth.py --tool-output results.json
This will calculate precision, recall, and F1 score against the ground truth.