Comprehensive Testing Guide

Quick Start

Option 1: Full Comprehensive Suite (2-4 hours)

cd /home/e/Desktop/ai-llm-red-team-handbook/scripts
./tests/run_comprehensive_tests.sh

This runs ALL test phases:

  • Environment validation
  • Static analysis & syntax checking
  • Functional testing (all 386+ scripts)
  • Tool integration verification
  • Existing pytest suite
  • Performance benchmarks
  • Compliance checks (OWASP, MITRE, NIST, Ethical)
  • LLM-powered validation
  • Comprehensive report generation
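
To also keep a copy of the console output from a full run alongside the generated reports, tee it to a log (the log path here is illustrative):

./tests/run_comprehensive_tests.sh 2>&1 | tee /tmp/comprehensive_run.log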

Option 2: Quick Test (5-10 minutes)

cd /home/e/Desktop/ai-llm-red-team-handbook/scripts

# Test specific category
python3 tests/test_orchestrator.py \
    --category prompt_injection \
    --test-type functional \
    --verbose

# Test with LLM validation
python3 tests/test_orchestrator.py \
    --category reconnaissance \
    --llm-endpoint http://localhost:11434 \
    --llm-validate \
    --verbose

Option 3: Specific Test Types

Functional Testing Only

python3 tests/test_orchestrator.py \
    --all \
    --test-type functional \
    --generate-report \
    --output functional_report.json

Integration Testing Only

python3 tests/test_orchestrator.py \
    --all \
    --test-type integration \
    --verbose

Performance Testing Only

python3 tests/test_orchestrator.py \
    --all \
    --test-type performance \
    --generate-report \
    --format html \
    --output performance_report.html

Compliance Testing

# Test OWASP LLM Top 10 compliance
python3 tests/test_orchestrator.py \
    --all \
    --test-type compliance \
    --standard OWASP-LLM-TOP-10 \
    --verbose

# Test MITRE ATLAS compliance
python3 tests/test_orchestrator.py \
    --all \
    --test-type compliance \
    --standard MITRE-ATLAS \
    --verbose

# Test NIST AI RMF compliance
python3 tests/test_orchestrator.py \
    --all \
    --test-type compliance \
    --standard NIST-AI-RMF \
    --verbose

Available Test Types

Test Type     Description                              Duration
functional    Syntax, imports, help text validation    10-15 min
integration   External tool and dependency checks      2-3 min
performance   Response time and resource usage         5-10 min
compliance    Standards adherence (OWASP/MITRE/NIST)   5-10 min
all           Runs all test types                      20-30 min
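
These test types also compose with --category, so a single type can be scoped to one category. A sketch, assuming the orchestrator accepts this flag combination (both flags appear individually above):

python3 tests/test_orchestrator.py \
    --category jailbreak \
    --test-type performance \
    --verbose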

Report Formats

JSON Report

python3 tests/test_orchestrator.py \
    --all \
    --test-type all \
    --generate-report \
    --format json \
    --output test_report.json

HTML Report (Visual Dashboard)

python3 tests/test_orchestrator.py \
    --all \
    --test-type all \
    --generate-report \
    --format html \
    --output test_report.html

# View in browser
xdg-open test_report.html

Executive Summary (Text)

python3 tests/test_orchestrator.py \
    --all \
    --test-type all \
    --generate-report \
    --format summary \
    --output executive_summary.txt

# View in terminal
cat executive_summary.txt

Testing Individual Categories

Test scripts from specific attack categories:

# Reconnaissance scripts
python3 tests/test_orchestrator.py --category reconnaissance --verbose

# Prompt injection scripts
python3 tests/test_orchestrator.py --category prompt_injection --verbose

# Data extraction scripts
python3 tests/test_orchestrator.py --category data_extraction --verbose

# Jailbreak scripts
python3 tests/test_orchestrator.py --category jailbreak --verbose

# Plugin exploitation scripts
python3 tests/test_orchestrator.py --category plugin_exploitation --verbose

# RAG attacks scripts
python3 tests/test_orchestrator.py --category rag_attacks --verbose

# Evasion scripts
python3 tests/test_orchestrator.py --category evasion --verbose

# Model attacks scripts
python3 tests/test_orchestrator.py --category model_attacks --verbose

# Multimodal scripts
python3 tests/test_orchestrator.py --category multimodal --verbose

# Post-exploitation scripts
python3 tests/test_orchestrator.py --category post_exploitation --verbose

# Social engineering scripts
python3 tests/test_orchestrator.py --category social_engineering --verbose

# Automation scripts
python3 tests/test_orchestrator.py --category automation --verbose

# Supply chain scripts
python3 tests/test_orchestrator.py --category supply_chain --verbose

# Compliance scripts
python3 tests/test_orchestrator.py --category compliance --verbose

# Utility scripts
python3 tests/test_orchestrator.py --category utils --verbose
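
To sweep every category in one pass, a simple shell loop over the names above works:

for cat in reconnaissance prompt_injection data_extraction jailbreak \
    plugin_exploitation rag_attacks evasion model_attacks multimodal \
    post_exploitation social_engineering automation supply_chain \
    compliance utils; do
    python3 tests/test_orchestrator.py --category "$cat" --verbose
done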

LLM-Powered Testing

Leverage the local LLM for intelligent validation:

# Test LLM connection first
curl http://localhost:11434/api/tags

# Run with LLM validation
python3 tests/test_orchestrator.py \
    --all \
    --llm-endpoint http://localhost:11434 \
    --llm-validate \
    --verbose \
    --generate-report \
    --format html \
    --output llm_validated_report.html

The LLM will:

  • Analyze script purpose and implementation
  • Identify potential security concerns
  • Provide code quality ratings
  • Suggest improvements
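
To get a feel for a single validation round, you can query the Ollama generate API by hand. A minimal sketch; the model name is an assumption, so substitute one reported by the /api/tags call shown above:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Briefly review this script for purpose, security concerns, and code quality:\n<paste script here>",
  "stream": false
}'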

Running Existing Pytest Suite

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=. --cov-report=html --cov-report=term

# Run specific test file
pytest tests/test_prompt_injection.py -v

# Run in parallel
pytest tests/ -n auto -v

# Run with timeout
pytest tests/ --timeout=60

Troubleshooting

LLM Endpoint Not Accessible

If you see warnings that the LLM endpoint is unreachable:

# Check if Ollama/LMStudio is running
curl http://localhost:11434/api/tags

# Start Ollama if needed
ollama serve

# Or use a different endpoint
python3 tests/test_orchestrator.py \
    --llm-endpoint http://localhost:8080 \
    --all

Import Errors

# Ensure virtual environment is activated
source venv/bin/activate

# Install dependencies
pip install -r config/requirements.txt

# Install test dependencies
pip install pytest pytest-cov pytest-timeout pytest-xdist
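
# Confirm the test dependencies resolve from the active environment
python3 -c "import pytest; print(pytest.__version__)"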

Permission Denied

# Make scripts executable
chmod +x tests/test_orchestrator.py
chmod +x tests/run_comprehensive_tests.sh

Performance Optimization

For faster testing:

# Test fewer scripts per category (sampling)
python3 tests/test_orchestrator.py --all --test-type functional

# Skip LLM validation for speed (omit --llm-validate)
python3 tests/test_orchestrator.py --all --test-type all

# Run pytest in parallel
pytest tests/ -n auto

Continuous Integration

Add to cron for weekly testing:

# Edit crontab
crontab -e

# Add line for Sunday 2 AM testing
0 2 * * 0 cd /home/e/Desktop/ai-llm-red-team-handbook/scripts && ./tests/run_comprehensive_tests.sh > /tmp/test_results.log 2>&1
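
Cron overwrites that log on every run; to keep one log per week instead, date-stamp the filename (note that % must be escaped as \% inside crontab entries):

0 2 * * 0 cd /home/e/Desktop/ai-llm-red-team-handbook/scripts && ./tests/run_comprehensive_tests.sh > /tmp/test_results_$(date +\%Y\%m\%d).log 2>&1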

Expected Outputs

After running tests, you'll find output like the following in test_reports/ (timestamps will vary):

test_reports/
├── functional_tests_20260107_120000.json
├── integration_tests_20260107_120500.json
├── performance_20260107_121000.json
├── compliance_OWASP-LLM-TOP-10_20260107_121500.json
├── compliance_MITRE-ATLAS_20260107_122000.json
├── compliance_NIST-AI-RMF_20260107_122500.json
├── compliance_ETHICAL_20260107_123000.json
├── llm_validation_20260107_123500.json
├── comprehensive_report_20260107_124000.html
├── executive_summary_20260107_124000.txt
├── coverage_20260107_124000/
│   └── index.html
├── pytest_20260107_120000.log
└── syntax_check_20260107_120000.log
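
To pretty-print the newest JSON report without typing out the timestamp (requires jq):

ls -t test_reports/*.json | head -1 | xargs jq .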

Success Criteria

All tests should meet these criteria:

  • 100% syntax validation (all scripts compile)
  • ≥90% functional test pass rate
  • All critical dependencies available
  • Performance within acceptable ranges
  • Compliance coverage >80% for each standard
  • Zero critical security vulnerabilities
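
The first criterion is easy to spot-check by hand: compileall byte-compiles every script and exits non-zero if any file fails:

cd /home/e/Desktop/ai-llm-red-team-handbook/scripts
python3 -m compileall -q . && echo "All scripts compile"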

Next Steps

After testing:

  1. Review executive_summary_*.txt for high-level results
  2. Open comprehensive_report_*.html in browser for detailed analysis
  3. Check coverage_*/index.html for code coverage metrics
  4. Address any failures found in reports
  5. Re-run tests to verify fixes

Total Testing Time:

  • Quick test (single category): 2-5 minutes
  • Moderate test (all functional): 15-20 minutes
  • Full comprehensive suite: 2-4 hours

Recommended frequency: Weekly or before major releases