Comprehensive Testing Guide
Quick Start
Option 1: Run Full Comprehensive Test Suite (Recommended)
cd /home/e/Desktop/ai-llm-red-team-handbook/scripts
./tests/run_comprehensive_tests.sh
This will run ALL test phases (2-4 hours):
- ✅ Environment validation
- ✅ Static analysis & syntax checking
- ✅ Functional testing (all 386+ scripts)
- ✅ Tool integration verification
- ✅ Existing pytest suite
- ✅ Performance benchmarks
- ✅ Compliance checks (OWASP, MITRE, NIST, Ethical)
- ✅ LLM-powered validation
- ✅ Comprehensive report generation
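Because the full suite can run for hours, you may prefer to launch it detached from the terminal; a minimal sketch using standard shell tools (the log filename full_run.log is arbitrary):
# Run detached so the suite survives a closed terminal
nohup ./tests/run_comprehensive_tests.sh > full_run.log 2>&1 &
# Follow progress live
tail -f full_run.log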
Option 2: Quick Test (5-10 minutes)
cd /home/e/Desktop/ai-llm-red-team-handbook/scripts
# Test specific category
python3 tests/test_orchestrator.py \
--category prompt_injection \
--test-type functional \
--verbose
# Test with LLM validation
python3 tests/test_orchestrator.py \
--category reconnaissance \
--llm-endpoint http://localhost:11434 \
--llm-validate \
--verbose
Option 3: Specific Test Types
Functional Testing Only
python3 tests/test_orchestrator.py \
--all \
--test-type functional \
--generate-report \
--output functional_report.json
Integration Testing Only
python3 tests/test_orchestrator.py \
--all \
--test-type integration \
--verbose
Performance Testing Only
python3 tests/test_orchestrator.py \
--all \
--test-type performance \
--generate-report \
--format html \
--output performance_report.html
Compliance Testing
# Test OWASP LLM Top 10 compliance
python3 tests/test_orchestrator.py \
--all \
--test-type compliance \
--standard OWASP-LLM-TOP-10 \
--verbose
# Test MITRE ATLAS compliance
python3 tests/test_orchestrator.py \
--all \
--test-type compliance \
--standard MITRE-ATLAS \
--verbose
# Test NIST AI RMF compliance
python3 tests/test_orchestrator.py \
--all \
--test-type compliance \
--standard NIST-AI-RMF \
--verbose
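To sweep all three documented standards in one pass, a small shell loop over the flags shown above works; note that the Expected Outputs section below also lists an ETHICAL compliance report, whose exact --standard value is not shown here:
# Run each documented standard back to back
for std in OWASP-LLM-TOP-10 MITRE-ATLAS NIST-AI-RMF; do
    python3 tests/test_orchestrator.py --all --test-type compliance --standard "$std"
done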
Available Test Types
| Test Type | Description | Duration |
|---|---|---|
| functional | Syntax, imports, help text validation | 10-15 min |
| integration | External tool and dependency checks | 2-3 min |
| performance | Response time and resource usage | 5-10 min |
| compliance | Standards adherence (OWASP/MITRE/NIST) | 5-10 min |
| all | Runs all test types | 20-30 min |
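The durations above are rough guides and will vary with hardware; to measure an actual run on your machine, prefix any command with the standard time utility:
# Time a single-category functional pass
time python3 tests/test_orchestrator.py --category utils --test-type functional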
Report Formats
JSON Report
python3 tests/test_orchestrator.py \
--all \
--test-type all \
--generate-report \
--format json \
--output test_report.json
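For quick terminal inspection, jq is handy; a sketch that assumes the report exposes a top-level summary object (adjust the key to the actual schema):
# Pull headline numbers out of the JSON report (key name is an assumption)
jq '.summary' test_report.json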
HTML Report (Visual Dashboard)
python3 tests/test_orchestrator.py \
--all \
--test-type all \
--generate-report \
--format html \
--output test_report.html
# View in browser
xdg-open test_report.html
Executive Summary (Text)
python3 tests/test_orchestrator.py \
--all \
--test-type all \
--generate-report \
--format summary \
--output executive_summary.txt
# View in terminal
cat executive_summary.txt
Testing Individual Categories
Test scripts from specific attack categories (a loop over several categories is sketched after this list):
# Reconnaissance scripts
python3 tests/test_orchestrator.py --category reconnaissance --verbose
# Prompt injection scripts
python3 tests/test_orchestrator.py --category prompt_injection --verbose
# Data extraction scripts
python3 tests/test_orchestrator.py --category data_extraction --verbose
# Jailbreak scripts
python3 tests/test_orchestrator.py --category jailbreak --verbose
# Plugin exploitation scripts
python3 tests/test_orchestrator.py --category plugin_exploitation --verbose
# RAG attack scripts
python3 tests/test_orchestrator.py --category rag_attacks --verbose
# Evasion scripts
python3 tests/test_orchestrator.py --category evasion --verbose
# Model attack scripts
python3 tests/test_orchestrator.py --category model_attacks --verbose
# Multimodal scripts
python3 tests/test_orchestrator.py --category multimodal --verbose
# Post-exploitation scripts
python3 tests/test_orchestrator.py --category post_exploitation --verbose
# Social engineering scripts
python3 tests/test_orchestrator.py --category social_engineering --verbose
# Automation scripts
python3 tests/test_orchestrator.py --category automation --verbose
# Supply chain scripts
python3 tests/test_orchestrator.py --category supply_chain --verbose
# Compliance scripts
python3 tests/test_orchestrator.py --category compliance --verbose
# Utility scripts
python3 tests/test_orchestrator.py --category utils --verbose
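As referenced above, several categories can be chained in one shell loop; a minimal sketch using only the flags already shown:
# Test a handful of categories sequentially
for cat in reconnaissance prompt_injection jailbreak evasion; do
    python3 tests/test_orchestrator.py --category "$cat" --verbose
done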
LLM-Powered Testing
Leverage the local LLM for intelligent validation:
# Test LLM connection first
curl http://localhost:11434/api/tags
# Run with LLM validation
python3 tests/test_orchestrator.py \
--all \
--llm-endpoint http://localhost:11434 \
--llm-validate \
--verbose \
--generate-report \
--format html \
--output llm_validated_report.html
The LLM will:
- ✅ Analyze script purpose and implementation
- ✅ Identify potential security concerns
- ✅ Provide code quality ratings
- ✅ Suggest improvements
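If the endpoint is down the orchestrator only warns (see Troubleshooting below), so it can be useful to gate the flag on a connectivity check first; a sketch using the endpoint and flags shown above:
# Enable LLM validation only when the endpoint answers
if curl -sf http://localhost:11434/api/tags > /dev/null; then
    python3 tests/test_orchestrator.py --all --llm-validate --llm-endpoint http://localhost:11434
else
    echo "LLM endpoint unreachable; running without LLM validation"
    python3 tests/test_orchestrator.py --all
fi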
Running Existing Pytest Suite
# Run all tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=. --cov-report=html --cov-report=term
# Run specific test file
pytest tests/test_prompt_injection.py -v
# Run in parallel
pytest tests/ -n auto -v
# Run with timeout
pytest tests/ --timeout=60
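These flags combine freely; a typical invocation mixing the options above (requires the plugins installed under Troubleshooting below):
# Parallel run with per-test timeouts and terminal coverage output
pytest tests/ -n auto --timeout=60 --cov=. --cov-report=term -v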
Troubleshooting
LLM Endpoint Not Accessible
If you see warnings about the LLM endpoint:
# Check if Ollama/LMStudio is running
curl http://localhost:11434/api/tags
# Start Ollama if needed
ollama serve
# Or use a different endpoint
python3 tests/test_orchestrator.py \
--llm-endpoint http://localhost:8080 \
--all
Import Errors
# Ensure virtual environment is activated
source venv/bin/activate
# Install dependencies
pip install -r config/requirements.txt
# Install test dependencies
pip install pytest pytest-cov pytest-timeout pytest-xdist
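To confirm the dependencies actually resolve inside the active environment:
# Verify pytest imports from the current interpreter
python3 -c "import pytest; print(pytest.__version__)"
# List the installed pytest plugins
pip list | grep pytest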
Permission Denied
# Make scripts executable
chmod +x tests/test_orchestrator.py
chmod +x tests/run_comprehensive_tests.sh
Performance Optimization
For faster testing:
# Run only the functional test type (fastest orchestrator pass)
python3 tests/test_orchestrator.py --all --test-type functional
# Skip LLM validation for speed
python3 tests/test_orchestrator.py --all --test-type all
# Run pytest in parallel
pytest tests/ -n auto
Continuous Integration
Add to cron for weekly testing:
# Edit crontab
crontab -e
# Add line for Sunday 2 AM testing
0 2 * * 0 cd /home/e/Desktop/ai-llm-red-team-handbook/scripts && ./tests/run_comprehensive_tests.sh > /tmp/test_results.log 2>&1
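The cron entry redirects all output to /tmp/test_results.log, so confirming that a scheduled run completed is a one-liner:
# Inspect the end of the most recent scheduled run
tail -n 20 /tmp/test_results.log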
Expected Outputs
After running tests, you'll find the following in test_reports/:
test_reports/
├── functional_tests_20260107_120000.json
├── integration_tests_20260107_120500.json
├── performance_20260107_121000.json
├── compliance_OWASP-LLM-TOP-10_20260107_121500.json
├── compliance_MITRE-ATLAS_20260107_122000.json
├── compliance_NIST-AI-RMF_20260107_122500.json
├── compliance_ETHICAL_20260107_123000.json
├── llm_validation_20260107_123500.json
├── comprehensive_report_20260107_124000.html
├── executive_summary_20260107_124000.txt
├── coverage_20260107_124000/
│ └── index.html
├── pytest_20260107_120000.log
└── syntax_check_20260107_120000.log
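Since every artifact carries a timestamp, the newest reports are easy to surface:
# Show the most recently written reports first
ls -t test_reports/ | head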
Success Criteria
✅ All tests should meet these criteria:
- 100% syntax validation (all scripts compile; a spot-check is sketched below this list)
- ≥90% functional test pass rate
- All critical dependencies available
- Performance within acceptable ranges
- Compliance coverage >80% for each standard
- Zero critical security vulnerabilities
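The syntax criterion can be spot-checked independently of the orchestrator with Python's built-in compileall module:
# Byte-compile every script; exits non-zero on any syntax error
python3 -m compileall -q . && echo "all scripts compile"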
Next Steps
After testing:
- Review `executive_summary_*.txt` for high-level results
- Open `comprehensive_report_*.html` in a browser for detailed analysis
- Check `coverage_*/index.html` for code coverage metrics
- Address any failures found in the reports
- Re-run tests to verify fixes
Total Testing Time:
- Quick test (single category): 2-5 minutes
- Moderate test (all functional): 15-20 minutes
- Full comprehensive suite: 2-4 hours
Recommended frequency: Weekly or before major releases