# Comprehensive Testing Guide

## Quick Start

### Option 1: Run the Full Comprehensive Test Suite (Recommended)

```bash
cd /home/e/Desktop/ai-llm-red-team-handbook/scripts
./tests/run_comprehensive_tests.sh
```


This will run ALL test phases (2-4 hours):

- ✅ Environment validation
- ✅ Static analysis & syntax checking
- ✅ Functional testing (all 386+ scripts)
- ✅ Tool integration verification
- ✅ Existing pytest suite
- ✅ Performance benchmarks
- ✅ Compliance checks (OWASP, MITRE, NIST, Ethical)
- ✅ LLM-powered validation
- ✅ Comprehensive report generation


### Option 2: Quick Test (5-10 minutes)

```bash
cd /home/e/Desktop/ai-llm-red-team-handbook/scripts

# Test specific category
python3 tests/test_orchestrator.py \
  --category prompt_injection \
  --test-type functional \
  --verbose

# Test with LLM validation
python3 tests/test_orchestrator.py \
  --category reconnaissance \
  --llm-endpoint http://localhost:11434 \
  --llm-validate \
  --verbose
```


### Option 3: Specific Test Types

#### Functional Testing Only

```bash
python3 tests/test_orchestrator.py \
  --all \
  --test-type functional \
  --generate-report \
  --output functional_report.json
```


#### Integration Testing Only

```bash
python3 tests/test_orchestrator.py \
  --all \
  --test-type integration \
  --verbose
```


#### Performance Testing Only

```bash
python3 tests/test_orchestrator.py \
  --all \
  --test-type performance \
  --generate-report \
  --format html \
  --output performance_report.html
```


#### Compliance Testing

```bash
# Test OWASP LLM Top 10 compliance
python3 tests/test_orchestrator.py \
  --all \
  --test-type compliance \
  --standard OWASP-LLM-TOP-10 \
  --verbose

# Test MITRE ATLAS compliance
python3 tests/test_orchestrator.py \
  --all \
  --test-type compliance \
  --standard MITRE-ATLAS \
  --verbose

# Test NIST AI RMF compliance
python3 tests/test_orchestrator.py \
  --all \
  --test-type compliance \
  --standard NIST-AI-RMF \
  --verbose
```


## Available Test Types

| Test Type       | Description                            | Duration  |
| --------------- | -------------------------------------- | --------- |
| **functional**  | Syntax, imports, help text validation  | 10-15 min |
| **integration** | External tool and dependency checks    | 2-3 min   |
| **performance** | Response time and resource usage       | 5-10 min  |
| **compliance**  | Standards adherence (OWASP/MITRE/NIST) | 5-10 min  |
| **all**         | Runs all test types                    | 20-30 min |

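
The four focused test types can also be run back to back in one pass. A minimal sketch that builds one orchestrator command per type; it prints the commands as a dry run, so pipe the output to `sh` to actually execute them:

```bash
#!/bin/sh
# Emit one orchestrator invocation per focused test type.
# Dry run: commands are printed, not executed; pipe to `sh` to run them.
for type in functional integration performance compliance; do
  echo "python3 tests/test_orchestrator.py --all --test-type $type"
done
```

Running the types as separate passes keeps each type's report distinct and makes it easy to stop after the first failing type.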

## Report Formats

### JSON Report

```bash
python3 tests/test_orchestrator.py \
  --all \
  --test-type all \
  --generate-report \
  --format json \
  --output test_report.json
```

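
The JSON report can be post-processed with nothing beyond the Python standard library. The report structure used below (a `summary` object with `total`/`passed`/`failed` counts) is an assumption for illustration, not confirmed; adjust the keys to match what your generated report actually contains:

```bash
#!/bin/sh
# Write a small sample report (hypothetical structure), then compute the
# pass rate from it. Point the second command at your real test_report.json.
cat > /tmp/sample_report.json <<'EOF'
{"summary": {"total": 386, "passed": 370, "failed": 16}}
EOF

python3 - /tmp/sample_report.json <<'EOF'
import json
import sys

# Load the summary block and print an overall pass rate.
with open(sys.argv[1]) as f:
    summary = json.load(f)["summary"]
rate = 100.0 * summary["passed"] / summary["total"]
print(f"pass rate: {rate:.1f}% ({summary['passed']}/{summary['total']})")
EOF
```

With the sample numbers above this prints a pass rate of 95.9%.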

### HTML Report (Visual Dashboard)

```bash
python3 tests/test_orchestrator.py \
  --all \
  --test-type all \
  --generate-report \
  --format html \
  --output test_report.html

# View in browser
xdg-open test_report.html
```


### Executive Summary (Text)

```bash
python3 tests/test_orchestrator.py \
  --all \
  --test-type all \
  --generate-report \
  --format summary \
  --output executive_summary.txt

# View in terminal
cat executive_summary.txt
```


## Testing Individual Categories

Test scripts from specific attack categories:

```bash
# Reconnaissance scripts
python3 tests/test_orchestrator.py --category reconnaissance --verbose

# Prompt injection scripts
python3 tests/test_orchestrator.py --category prompt_injection --verbose

# Data extraction scripts
python3 tests/test_orchestrator.py --category data_extraction --verbose

# Jailbreak scripts
python3 tests/test_orchestrator.py --category jailbreak --verbose

# Plugin exploitation scripts
python3 tests/test_orchestrator.py --category plugin_exploitation --verbose

# RAG attack scripts
python3 tests/test_orchestrator.py --category rag_attacks --verbose

# Evasion scripts
python3 tests/test_orchestrator.py --category evasion --verbose

# Model attack scripts
python3 tests/test_orchestrator.py --category model_attacks --verbose

# Multimodal scripts
python3 tests/test_orchestrator.py --category multimodal --verbose

# Post-exploitation scripts
python3 tests/test_orchestrator.py --category post_exploitation --verbose

# Social engineering scripts
python3 tests/test_orchestrator.py --category social_engineering --verbose

# Automation scripts
python3 tests/test_orchestrator.py --category automation --verbose

# Supply chain scripts
python3 tests/test_orchestrator.py --category supply_chain --verbose

# Compliance scripts
python3 tests/test_orchestrator.py --category compliance --verbose

# Utility scripts
python3 tests/test_orchestrator.py --category utils --verbose
```

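
Since every per-category command above follows the same pattern, the whole sweep can be generated with a loop. A sketch, printed as a dry run; pipe the output to `sh` to execute:

```bash
#!/bin/sh
# All fifteen categories from the list above.
CATEGORIES="reconnaissance prompt_injection data_extraction jailbreak \
plugin_exploitation rag_attacks evasion model_attacks multimodal \
post_exploitation social_engineering automation supply_chain compliance utils"

# Dry run: print one orchestrator command per category.
for cat in $CATEGORIES; do
  echo "python3 tests/test_orchestrator.py --category $cat --verbose"
done
```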

## LLM-Powered Testing

Leverage the local LLM for intelligent validation:

```bash
# Test LLM connection first
curl http://localhost:11434/api/tags

# Run with LLM validation
python3 tests/test_orchestrator.py \
  --all \
  --llm-endpoint http://localhost:11434 \
  --llm-validate \
  --verbose \
  --generate-report \
  --format html \
  --output llm_validated_report.html
```

The LLM will:

- ✅ Analyze script purpose and implementation
- ✅ Identify potential security concerns
- ✅ Provide code quality ratings
- ✅ Suggest improvements

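
Ollama's `/api/tags` endpoint returns JSON with a `models` array, so before a long validation run it is worth confirming the model you intend to use is actually installed. A sketch that parses a sample payload; in practice substitute the live `curl` output:

```bash
#!/bin/sh
# Sample /api/tags payload; live equivalent:
#   curl -s http://localhost:11434/api/tags
# The model names below are placeholders, not an inventory of your machine.
payload='{"models":[{"name":"llama3:8b"},{"name":"mistral:7b"}]}'

echo "$payload" | python3 -c '
import json, sys
models = [m["name"] for m in json.load(sys.stdin)["models"]]
print("installed models:", ", ".join(models))
'
```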

## Running Existing Pytest Suite

```bash
# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=. --cov-report=html --cov-report=term

# Run specific test file
pytest tests/test_prompt_injection.py -v

# Run in parallel
pytest tests/ -n auto -v

# Run with timeout
pytest tests/ --timeout=60
```


## Troubleshooting

### LLM Endpoint Not Accessible

If you see warnings about the LLM endpoint:

```bash
# Check if Ollama/LM Studio is running
curl http://localhost:11434/api/tags

# Start Ollama if needed
ollama serve

# Or use a different endpoint
python3 tests/test_orchestrator.py \
  --llm-endpoint http://localhost:8080 \
  --all
```

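
When the endpoint is slow to come up (for example, right after `ollama serve`), a short retry loop avoids launching the orchestrator against a dead endpoint. A minimal sketch; the helper name and retry counts are our own choices:

```bash
#!/bin/sh
# Poll a URL until it answers or the attempt budget runs out.
wait_for_endpoint() {
  url=$1
  attempts=${2:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -sf --max-time 2 "$url" > /dev/null 2>&1; then
      echo "endpoint ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "endpoint not reachable after $attempts attempts: $url" >&2
  return 1
}

# Example: give the LLM endpoint a few seconds before giving up.
wait_for_endpoint http://localhost:11434/api/tags 3 || echo "start Ollama first"
```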

### Import Errors

```bash
# Ensure virtual environment is activated
source venv/bin/activate

# Install dependencies
pip install -r config/requirements.txt

# Install test dependencies
pip install pytest pytest-cov pytest-timeout pytest-xdist
```


### Permission Denied

```bash
# Make scripts executable
chmod +x tests/test_orchestrator.py
chmod +x tests/run_comprehensive_tests.sh
```


## Performance Optimization

For faster testing:

```bash
# Run only functional tests (the quickest per-script checks)
python3 tests/test_orchestrator.py --all --test-type functional

# Run all test types but skip LLM validation (omit --llm-validate)
python3 tests/test_orchestrator.py --all --test-type all

# Run pytest in parallel
pytest tests/ -n auto
```


## Continuous Integration

Add to cron for weekly testing:

```bash
# Edit crontab
crontab -e

# Add line for Sunday 2 AM testing
0 2 * * 0 cd /home/e/Desktop/ai-llm-red-team-handbook/scripts && ./tests/run_comprehensive_tests.sh > /tmp/test_results.log 2>&1
```

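
A cron line that redirects to a fixed log path overwrites the previous run's output. If you want history, point the crontab entry at a small wrapper that timestamps each log; everything about this wrapper (its name, the log directory) is our own convention, not part of the repository:

```bash
#!/bin/sh
# Hypothetical wrapper, e.g. saved as tests/cron_wrapper.sh.
# Each run gets its own timestamped log instead of clobbering the last one.
LOG_DIR=${LOG_DIR:-/tmp/test_logs}
mkdir -p "$LOG_DIR"
stamp=$(date +%Y%m%d_%H%M%S)
log="$LOG_DIR/run_$stamp.log"
echo "logging to $log"

# Uncomment to actually run the suite:
# cd /home/e/Desktop/ai-llm-red-team-handbook/scripts && \
#   ./tests/run_comprehensive_tests.sh > "$log" 2>&1
```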

## Expected Outputs

After running tests, you'll find in `test_reports/`:

```text
test_reports/
├── functional_tests_20260107_120000.json
├── integration_tests_20260107_120500.json
├── performance_20260107_121000.json
├── compliance_OWASP-LLM-TOP-10_20260107_121500.json
├── compliance_MITRE-ATLAS_20260107_122000.json
├── compliance_NIST-AI-RMF_20260107_122500.json
├── compliance_ETHICAL_20260107_123000.json
├── llm_validation_20260107_123500.json
├── comprehensive_report_20260107_124000.html
├── executive_summary_20260107_124000.txt
├── coverage_20260107_124000/
│   └── index.html
├── pytest_20260107_120000.log
└── syntax_check_20260107_120000.log
```

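
Because report filenames embed a `YYYYMMDD_HHMMSS` timestamp, the newest report sorts last lexically, which makes "open the latest report" a one-liner. A sketch assuming the `test_reports/` layout shown above:

```bash
#!/bin/sh
# Find the most recent comprehensive HTML report by name (timestamps sort
# lexically), then report it; prints a notice when no reports exist yet.
latest=$(ls test_reports/comprehensive_report_*.html 2>/dev/null | sort | tail -n 1)
if [ -n "$latest" ]; then
  echo "latest report: $latest"
  # xdg-open "$latest"
else
  echo "no reports found in test_reports/"
fi
```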

## Success Criteria

✅ **All tests should meet these criteria:**

- 100% syntax validation (all scripts compile)
- Functional test pass rate above 90%
- All critical dependencies available
- Performance within acceptable ranges
- Compliance coverage >80% for each standard
- Zero critical security vulnerabilities


## Next Steps

After testing:

1. Review `executive_summary_*.txt` for high-level results
2. Open `comprehensive_report_*.html` in a browser for detailed analysis
3. Check `coverage_*/index.html` for code coverage metrics
4. Address any failures found in the reports
5. Re-run tests to verify fixes


---

**Total Testing Time:**

- Quick test (single category): 2-5 minutes
- Moderate test (all functional): 15-20 minutes
- Full comprehensive suite: 2-4 hours

**Recommended frequency:** Weekly or before major releases