AI/LLM Red Team Field Manual - Index

Mission: Get junior penetration testers operational in 15 minutes with standalone, actionable attack playbooks.


🚀 Quick Start

New to LLM Red Teaming? Start here

  1. Setup (15 min): Complete environment setup
  2. First Test (5 min): Run your first prompt injection test
  3. Choose Attack: Pick a playbook below based on your target

📚 Attack Playbooks

Each playbook is completely self-contained with:

  • Step-by-step procedures (no theory)
  • Copy-paste attack code
  • Success indicators ("You'll see X")
  • Troubleshooting guide
  • Tool commands

Core Attack Playbooks

| # | Playbook | Use When | Difficulty |
|----|----------|----------|------------|
| 01 | Prompt Injection | Testing any LLM chat/completion API | Beginner |
| 02 | Data Leakage & Extraction | Target has training data you want to extract | Intermediate |
| 03 | Jailbreaks & Bypass | Need to bypass content filters/safety | Beginner |
| 04 | Plugin & API Exploitation | Target uses plugins/function calling | Advanced |
| 05 | Evasion & Obfuscation | Bypassing input filters/WAFs | Intermediate |
| 06 | Data Poisoning | Can inject training data or RAG docs | Advanced |
| 07 | Model Theft & Inference | Want to extract/steal the model | Advanced |
| 08 | DoS & Resource Exhaustion | Testing availability/cost inflation | Intermediate |
| 09 | Multimodal Attacks | Target uses vision/audio/multimodal | Intermediate |
| 10 | Persistence & Chaining | Need multi-turn/persistent compromise | Advanced |
| 11 | Social Engineering | AI-powered phishing/impersonation | Intermediate |

Reference Materials


15-Minute Setup Guide

Prerequisites Checklist

Before starting, ensure you have:

  • Written authorization (RoE/SOW) to test the target
  • Python 3.8+ installed (python3 --version)
  • Internet access for tool downloads
  • API credentials (OpenAI, Anthropic, or local model)
  • Terminal/command line access

Step 1: Create Testing Environment

```bash
# Create project directory
mkdir ~/llm-redteam
cd ~/llm-redteam

# Create Python virtual environment
python3 -m venv venv
source venv/bin/activate  # Linux/Mac
# venv\Scripts\activate   # Windows

# Create directory structure
mkdir -p {logs,evidence,configs,playbooks}
```

Step 2: Install Core Tools

```bash
# Upgrade pip
pip install --upgrade pip

# Install essential tools
pip install spikee requests python-dotenv

# Verify installation
spikee --version
```

Expected output


Step 3: Configure API Access

```bash
# Create API configuration
cat > configs/.env << 'EOF'
# OpenAI Configuration
OPENAI_API_KEY=sk-your-key-here
OPENAI_MODEL=gpt-3.5-turbo

# Logging
LOG_DIR=../logs
EVIDENCE_DIR=../evidence
EOF

# Secure the file
chmod 600 configs/.env
```

Get API Keys
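Before burning API calls, it can help to sanity-check the config from Python. In practice `python-dotenv` (installed in Step 2) handles loading; the stdlib-only sketch below just illustrates the check, and the `check_config` helper and its placeholder-detection rule are illustrative, not part of any tool:

```python
def load_env(path="configs/.env"):
    """Minimal .env parser: KEY=VALUE lines; blank lines and '#' comments ignored."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values

def check_config(path="configs/.env"):
    """Return (ok, model): ok is False if the key is missing or still the placeholder."""
    cfg = load_env(path)
    api_key = cfg.get("OPENAI_API_KEY", "")
    ok = api_key.startswith("sk-") and "your-key-here" not in api_key
    return ok, cfg.get("OPENAI_MODEL", "gpt-3.5-turbo")

# Usage: ok, model = check_config(); fail fast if ok is False.
```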

Step 4: Verify Setup

```bash
# Initialize spikee workspace
spikee init

# Test with a basic prompt injection dataset
spikee generate --seed-folder workspace/datasets/seeds-cybersec-2025-04 --format full-prompt

# Test against OpenAI (configure target with your API key)
# Expected: ✓ Dataset generated, ✓ Ready for testing
```

Setup complete! You're ready to use the playbooks.


Your First Test: Prompt Injection

Goal: Test if you can override the system's instructions.

5-Minute Quick Test

```bash
# Navigate to your testing directory
cd ~/llm-redteam

# Initialize spikee workspace
spikee init

# Generate prompt injection dataset
spikee generate --seed-folder datasets/seeds-cybersec-2025-04 --format full-prompt

# Test against your target (using OpenAI as example)
spikee test --target openai_api --dataset datasets/cybersec-2025-04-full-prompt-dataset-*.jsonl

# Check results
ls results/
```

What to look for

  • Pass rate: % of successful injections
  • ⚠️ Vulnerabilities: Specific bypasses found
  • 📊 Report: first_test_report.html

Next steps


Attack Decision Tree

Use this to decide which playbook to use

```
START: What's your target?
│
├─ Chat/Completion API?
│  ├─ Want to bypass filters? → Playbook 03 (Jailbreaks)
│  ├─ Want to inject instructions? → Playbook 01 (Prompt Injection)
│  └─ Want to extract training data? → Playbook 02 (Data Leakage)
│
├─ Has Plugins/Tools?
│  └─ → Playbook 04 (Plugin Exploitation)
│
├─ Multimodal (images/audio)?
│  └─ → Playbook 09 (Multimodal Attacks)
│
├─ Can inject training data?
│  └─ → Playbook 06 (Data Poisoning)
│
├─ Want to steal the model?
│  └─ → Playbook 07 (Model Theft)
│
├─ Test availability/costs?
│  └─ → Playbook 08 (DoS)
│
├─ Need persistent access?
│  └─ → Playbook 10 (Persistence)
│
└─ AI-powered social engineering?
   └─ → Playbook 11 (Social Engineering)
```
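If you script engagement planning, the same tree can be encoded as a first-match-wins rule list. A sketch; the boolean attribute names are made up for this example, and the mapping mirrors the tree above:

```python
def pick_playbook(target):
    """Map a dict of boolean target attributes to a playbook.
    Rules are checked in order; first match wins."""
    rules = [
        ("has_plugins",        "04 Plugin & API Exploitation"),
        ("is_multimodal",      "09 Multimodal Attacks"),
        ("can_poison_data",    "06 Data Poisoning"),
        ("want_model_theft",   "07 Model Theft & Inference"),
        ("test_availability",  "08 DoS & Resource Exhaustion"),
        ("need_persistence",   "10 Persistence & Chaining"),
        ("social_engineering", "11 Social Engineering"),
        ("bypass_filters",     "03 Jailbreaks & Bypass"),
        ("extract_data",       "02 Data Leakage & Extraction"),
    ]
    for attr, playbook in rules:
        if target.get(attr):
            return playbook
    # Default for a plain chat/completion API
    return "01 Prompt Injection"
```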

Detailed Tool Setup

Option 1: Docker Environment

```bash
# Create Dockerfile
cat > Dockerfile << 'EOF'
FROM python:3.10-slim
WORKDIR /workspace
RUN apt-get update && apt-get install -y git curl
RUN pip install spikee requests python-dotenv textattack
CMD ["/bin/bash"]
EOF

# Build and run
docker build -t llm-redteam .
docker run -it -v $(pwd):/workspace llm-redteam
```

Option 2: Native Installation

See Step 2: Install Core Tools above.

Additional Tools (Optional)

```bash
# For advanced attacks
pip install textattack transformers torch

# For web/API testing
pip install selenium playwright

# For reporting
pip install jinja2 markdown2
```

Common Issues & Fixes

| Issue | Solution |
|-------|----------|
| Authentication Error | Check API key in `.env`; verify the key is active |
| Rate Limit Exceeded | Add `--delay 2` to commands; check API quotas |
| ModuleNotFoundError | Activate the venv: `source venv/bin/activate` |
| Command not found: spikee | Install with `pip install spikee`; check the venv is active |
| No output files | Verify the `--report-prefix` path exists; check permissions |
| Slow responses | Normal for API testing; use `--runs` to limit tests |
| Connection timeout | Check internet connection; verify the API endpoint |

Still stuck? Check the troubleshooting section in the specific playbook you're using.
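For the rate-limit case, when a tool has no built-in delay flag, a generic exponential-backoff wrapper around your own test scripts is often enough. A sketch; the retry count, base delay, and `is_rate_limited` hook are arbitrary choices, not part of any tool:

```python
import time

def with_backoff(call, max_retries=5, base_delay=1.0, is_rate_limited=None):
    """Retry call() with exponential backoff (1s, 2s, 4s, ...) when it
    raises a rate-limit error. is_rate_limited(exc) decides whether an
    exception is retryable; by default every exception is retried."""
    if is_rate_limited is None:
        is_rate_limited = lambda exc: True
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            if not is_rate_limited(exc) or attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Usage: with_backoff(lambda: send_test_prompt(payload))
```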


📖 How to Use These Playbooks

Structure of Each Playbook

Every playbook follows the same format:

  1. What & When: What this attack is, when to use it
  2. Prerequisites: What you need before starting
  3. Step-by-Step Procedure: Numbered steps, exact commands
  4. Code Examples: Copy-paste ready attack scripts
  5. Success Indicators: How to know if it worked
  6. Troubleshooting: Common problems & fixes
  7. Next Steps: What to do after finding vulnerabilities

Reading the Playbooks

Numbered steps (1, 2, 3...): Execute in order
Code blocks: Copy-paste into terminal
"Expected output:": What you should see
"✓ Success": Attack worked
"✗ Failed": Try troubleshooting section

Best Practices

DO:

  • Read the entire playbook before starting
  • Copy-paste code examples exactly
  • Document every finding with screenshots
  • Follow cleanup procedures
  • Report critical findings immediately

DON'T:

  • Skip prerequisite checks
  • Modify code without understanding it
  • Test without authorization
  • Ignore rate limits (you'll get blocked)
  • Delete evidence/logs before reporting

📋 Engagement Workflow

Complete workflow for a red team engagement

Phase 1: Pre-Engagement

  1. Verify authorization (signed RoE/SOW)
  2. Complete setup (this page)
  3. Identify target systems/APIs
  4. Choose relevant playbooks

Phase 2: Reconnaissance

  1. Map LLM endpoints
  2. Identify plugins/integrations
  3. Document baseline behavior
  4. Choose attack sequence

Phase 3: Execution

  1. Start with low-impact tests
  2. Follow playbook procedures
  3. Document all findings
  4. Escalate critical issues
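"Document all findings" is easiest to honor if every test appends a timestamped record as it runs. A minimal evidence-logger sketch writing JSON lines under the `evidence/` directory from the setup; the field names are an assumption for illustration, not a handbook standard:

```python
import json
import time
from pathlib import Path

def log_finding(evidence_dir, playbook, payload, response, severity="info"):
    """Append one timestamped finding as a JSON line and return the file path."""
    Path(evidence_dir).mkdir(parents=True, exist_ok=True)
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "playbook": playbook,
        "payload": payload,
        "response": response,
        "severity": severity,
    }
    path = Path(evidence_dir) / "findings.jsonl"
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return path

# Usage: log_finding("evidence", "01 Prompt Injection", sent_prompt,
#                    model_reply, severity="high")
```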

Phase 4: Reporting

  1. Compile evidence
  2. Write technical report
  3. Create executive summary
  4. Present findings

Phase 5: Cleanup

  1. Revoke test API keys
  2. Delete test accounts
  3. Remove injected data
  4. Secure/encrypt evidence

CRITICAL: Only test systems you have written authorization to test.

Illegal without authorization

  • Testing production systems without permission
  • Accessing other users' data
  • Causing service disruption
  • Extracting proprietary models

Always

  • Get signed RoE before testing
  • Stay within agreed scope
  • Report critical findings immediately
  • Follow responsible disclosure

Laws & frameworks that may apply

  • Computer Fraud and Abuse Act (CFAA)
  • GDPR (for EU personal data)
  • SOC 2 and other contractual compliance obligations
  • Industry-specific regulations

📞 Support & Resources

Need help?

  • Check playbook troubleshooting sections
  • Review common issues
  • Escalate to senior team member

Additional resources


📝 Quick Reference

Most common commands

```bash
# Initialize workspace (one-time setup)
spikee init

# Prompt injection test
spikee generate --seed-folder workspace/datasets/seeds-cybersec-2025-04 --format full-prompt
spikee test --target openai_api --dataset datasets/cybersec-2025-04-full-prompt-dataset-*.jsonl

# Jailbreak test
spikee generate --seed-folder datasets/seeds-simsonsun-high-quality-jailbreaks --include-standalone-inputs
spikee test --target openai_api --dataset datasets/simsonsun-high-quality-jailbreaks-*.jsonl

# Data extraction test (using custom seeds)
spikee generate --seed-folder datasets/seeds-data-extraction --format full-prompt
spikee test --target openai_api --dataset datasets/data-extraction-*.jsonl

# View results
ls results/
```

File structure

```
~/llm-redteam/
├── venv/               (Python environment)
├── configs/.env        (API credentials)
├── logs/               (Test execution logs)
├── evidence/           (Screenshots, outputs)
└── playbooks/          (Downloaded playbooks)
```

Ready to start testing? → Pick a playbook from the Attack Playbooks section above!


Last Updated: December 2025
Version: 2.0 (Modular)
Handbook Chapters: Based on Chapters 14-24