mirror of
https://github.com/dongdongunique/EvoSynth.git
synced 2026-02-12 17:22:44 +00:00
🖼️ Add EvoSynth architecture diagram (PNG format)
- Convert diagram.pdf to diagram.png for GitHub rendering
- Update README to reference PNG instead of PDF
- Added ActorAttack judge prompt acknowledgment
- Diagram shows: Reconnaissance → Algorithm Creation → Exploitation → Coordinator workflow
160
README.md
@@ -1,26 +1,93 @@
# EvoSynth

**Advanced Multi-Agent Jailbreak Attack Framework**
<div align="center">

EvoSynth is a modular jailbreak attack framework designed to test the safety alignments of AI models through autonomous multi-agent workflows. It provides a comprehensive toolkit for AI security research, enabling systematic evaluation of model robustness against adversarial attacks.
**🚨 Evolutionary Synthesis of Jailbreak Attacks on LLMs 🚨**

[](https://arxiv.org/abs/2511.12710)
[](https://www.python.org/downloads/)
[](https://opensource.org/licenses/MIT)

**The first framework to autonomously engineer novel, executable, code-based attack algorithms**

[Key Features](#features) • [Installation](#installation) • [Quick Start](#quick-start) • [Paper](https://arxiv.org/abs/2511.12710)

</div>

---

## 🎯 Research Highlights

**EvoSynth** introduces a paradigm shift from attack planning to **evolutionary synthesis** of jailbreak methods. Unlike existing frameworks that refine known strategies, EvoSynth autonomously engineers entirely new, executable attack algorithms through multi-agent collaboration.

### 🏆 State-of-the-Art Performance

| Target Model | EvoSynth | X-Teaming | Improvement (percentage points) |
|--------------|----------|-----------|---------------------------------|
| **Claude-Sonnet-4.5** | **85.5%** | 52.5% | +33.0 |
| **GPT-5-Chat** | **94.5%** | 88.5% | +6.0 |
| **GPT-4o** | **97.5%** | 96.0% | +1.5 |
| **Llama-3.1-70B** | **98.5%** | 83.5% | +15.0 |
| **Average ASR** | **95.9%** | 85.7% | **+10.2** |

*Evaluated against 11 state-of-the-art baselines on Harmbench Standard dataset*
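
The improvement figures in the table above are absolute gaps in attack success rate (ASR) between EvoSynth and X-Teaming. A minimal sketch that recomputes them from the reported values; the dictionary layout and the `improvement_pp` helper are illustrative and not part of EvoSynth:

```python
# Illustrative only: ASR values (%) copied from the table above as
# (EvoSynth, X-Teaming) pairs. The "Average ASR" row is taken as reported.
ASR = {
    "Claude-Sonnet-4.5": (85.5, 52.5),
    "GPT-5-Chat": (94.5, 88.5),
    "GPT-4o": (97.5, 96.0),
    "Llama-3.1-70B": (98.5, 83.5),
    "Average ASR": (95.9, 85.7),
}


def improvement_pp(evosynth: float, baseline: float) -> float:
    """Absolute ASR gap in percentage points, rounded to one decimal."""
    return round(evosynth - baseline, 1)


for model, (evo, base) in ASR.items():
    print(f"{model}: +{improvement_pp(evo, base):.1f} pp")
```

Reading the gap as percentage points rather than as a relative change matters here: for Claude-Sonnet-4.5 the absolute gap is 33.0 points, while the relative increase over 52.5% would be considerably larger.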

### 🔬 Key Innovation

**From Prompt Refinement to Method Evolution**

Existing frameworks select, combine, or refine known attack strategies. EvoSynth breaks this limitation by:

1. **Autonomously Engineering New Attacks**: Generates executable code-based algorithms, not just prompts
2. **Code-Level Self-Correction**: Iteratively rewrites attack logic in response to failure
3. **Evolutionary Synthesis**: Discovers novel attack mechanisms through multi-agent collaboration
4. **Higher Diversity**: Produces more semantically diverse attacks (median diversity score: 0.82 vs 0.63 for X-Teaming)

### 📊 EvoSynth Architecture Overview

<div align="center">

*The EvoSynth multi-agent framework: From reconnaissance to evolutionary synthesis of executable attack algorithms*

</div>

---

## Features

- **Multi-Agent Architecture**: Coordinated attack system with specialized agents:
  - **Master Coordinator**: Orchestrates workflow and strategic decisions
  - **Reconnaissance Agent**: Analyzes target models and identifies vulnerabilities
  - **Tool Synthesizer**: Generates and evolves AI-powered attack tools
  - **Exploitation Agent**: Executes multi-turn attack conversations
### 🧬 Multi-Agent Evolutionary Architecture

- **Modular Design**: Registry-based component system supporting:
  - Multiple model backends (OpenAI, HuggingFace, etc.)
  - Customizable attack strategies
  - Pluggable evaluation metrics
  - LLM-based judging systems
EvoSynth employs four specialized agents working in concert:

- **AI-Generated Tools**: Dynamic tool creation with Python code execution and performance-based evolution
- **🔍 Reconnaissance Agent**: Identifies vulnerabilities and formulates attack strategies
  - Analyzes target model behavior
  - Generates attack categories and concepts
  - Adapts strategy based on interaction history

- **⚙️ Algorithm Creation Agent**: Engineers executable attack algorithms
  - Synthesizes code-based attack methods
  - Implements evolutionary code improvement loop
  - Validates algorithm functionality and performance

- **🎯 Exploitation Agent**: Deploys and executes attacks
  - Selects optimal algorithms via reinforcement learning
  - Manages multi-turn conversations
  - Learns from execution feedback

- **🎼 Master Coordinator**: Orchestrates the entire workflow
  - Coordinates phase transitions
  - Performs failure analysis and adaptive re-tasking
  - Updates algorithm arsenal based on results

### 🚀 Technical Capabilities

- **Black-Box Evaluation**: Tests against real production APIs with full safety infrastructure
- **Async Support**: Full asynchronous orchestration for scalable evaluations
- **Dynamic Evolution**: Code-level self-correction with dual feedback (judge + target response)

---

## Installation

@@ -29,6 +96,7 @@ git clone https://github.com/dongdongunique/EvoSynth.git
cd EvoSynth
pip install -r requirements.txt
```

## Create an .env file
```bash
OPENAI_KEY="YOUR-KEY"
@@ -52,6 +120,9 @@ For best results, we recommend using router platforms that provide unified acces
- **BoyueRichData** (https://boyuerichdata.apifox.cn/) - Alternative router platform

Set your `base_url` parameter to the router's endpoint when initializing OpenAIModel:
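
A minimal sketch of how the key and a router endpoint might be read from the `.env` file before the model is constructed. Only `OPENAI_KEY` appears in this README; the `OPENAI_BASE_URL` variable name and the `load_env` helper are assumptions made for illustration, and the actual `OpenAIModel` constructor arguments should be checked in the repository:

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY="VALUE" lines (hypothetical helper, not EvoSynth's loader)."""
    values: dict[str, str] = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


env = load_env()
# Fall back to the process environment when the file does not define a value.
api_key = env.get("OPENAI_KEY") or os.environ.get("OPENAI_KEY")
# OPENAI_BASE_URL is an assumed name for the router endpoint variable.
base_url = env.get("OPENAI_BASE_URL") or os.environ.get("OPENAI_BASE_URL")
```

Keeping the endpoint in the environment rather than in code makes it easy to switch router platforms without editing scripts.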

---

## Quick Start

### Environment Setup
@@ -124,6 +195,8 @@ python eval_async.py \
| `--base-url` | API base URL | From env |
| `--dataset` | Dataset to use | `harmbench` |

---

## Project Structure

```
@@ -154,6 +227,8 @@ EvoSynth/
└── datasets/ # Data handling
```

---

## Configuration

### EvosynthConfig Options
@@ -179,19 +254,71 @@ class EvosynthConfig:
- `start_tool_creation`: Begin from tool creation phase
- `start_exploitation`: Begin from exploitation phase

---

## Evaluation

Results are evaluated using:

- **LLM Judge**: Scores responses on a 1-5 scale. Prompts are from ActorAttack.
- **LLM Judge**: Scores responses on a 1-5 scale using judge prompts from [ActorAttack](https://github.com/AI45Lab/ActorAttack).
- **Success Threshold**: A score >= 5 (the top of the scale) indicates a successful jailbreak
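
The scoring rule above can be sketched as a small aggregation over judge scores. This is an illustration under stated assumptions, not EvoSynth's evaluation code: the function names are hypothetical, and it assumes one integer judge score per query.

```python
SUCCESS_THRESHOLD = 5  # from the README: a score >= 5 counts as a successful jailbreak


def is_success(score: int) -> bool:
    """Apply the success threshold to a single 1-5 judge score."""
    if not 1 <= score <= 5:
        raise ValueError(f"judge score out of range: {score}")
    return score >= SUCCESS_THRESHOLD


def attack_success_rate(scores: list[int]) -> float:
    """Attack success rate (ASR) in percent over a list of judge scores."""
    if not scores:
        return 0.0
    return 100.0 * sum(is_success(s) for s in scores) / len(scores)


# Made-up scores for four queries; two reach the threshold.
print(attack_success_rate([5, 3, 5, 1]))  # 50.0
```

Because the threshold sits at the top of the scale, a response scored 4 is still counted as a failed attempt.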

### Evaluation Dataset

EvoSynth is evaluated on **Harmbench Standard**, a comprehensive dataset with instructions balanced across 6 risk categories:
- Cybercrime & Unauthorized Intrusion
- Chemical & Biological Weapons/Drugs
- Misinformation & Disinformation
- Harassment & Bullying
- Illegal Activities
- General Harm

### Target Models

Evaluated against 7 state-of-the-art LLMs:
- GPT-5-Chat-2025-08-07
- GPT-4o
- Claude-Sonnet-4.5-2025-09-29
- Llama 3.1-8B-Instruct
- Llama 3.1-70B-Instruct
- Qwen-Max-2025-01-25
- Deepseek-V3.2-Exp

---

## Key Research Findings

### 🎓 Attack Diversity

EvoSynth generates significantly more diverse attacks than existing methods:

- **Median Diversity Score**: 0.82 (vs 0.63 for X-Teaming)
- **Fewer Low-Diversity Pairs**: Avoids repetitive attack patterns
- **Greater High-Diversity Coverage**: More novel attack mechanisms
- **Stable Diversity**: Consistent novelty across generations

### ⚡ Learning Efficiency

- **90%** of successful jailbreaks occur within **6 refinement iterations**
- **74%** succeed within the first **12 total agent actions**
- Rapid convergence indicates a directed synthesis process
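
Figures of this kind are cumulative fractions: the share of successful runs whose success arrived within a given budget. A minimal sketch of that calculation, assuming a per-run count of refinement iterations is available; the function name and the sample data are made up for illustration and are not EvoSynth results:

```python
def fraction_within(iterations_to_success: list[int], budget: int) -> float:
    """Share of successful runs that needed at most `budget` iterations."""
    if not iterations_to_success:
        return 0.0
    hits = sum(1 for n in iterations_to_success if n <= budget)
    return hits / len(iterations_to_success)


# Synthetic example: iterations needed by ten successful runs.
example_runs = [1, 2, 2, 3, 3, 4, 5, 6, 6, 9]
print(fraction_within(example_runs, 6))  # 0.9
```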

### 🔄 Algorithm Transferability

- **20%** of synthesized algorithms are "universal keys" effective on **80%+ of queries**
- **50%** of algorithms are effective on **45%+ of queries**
- Demonstrates discovery of robust, general-purpose attack methods
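
Transferability statistics like these can be read as the share of algorithms whose per-query success rate clears a threshold. A minimal sketch under that assumption; the helper name and the sample rates are hypothetical and do not reproduce the reported numbers:

```python
def share_at_or_above(success_rates: list[float], threshold: float) -> float:
    """Share of algorithms whose query success rate is >= threshold."""
    if not success_rates:
        return 0.0
    return sum(1 for r in success_rates if r >= threshold) / len(success_rates)


# Synthetic per-algorithm success rates (fraction of queries each one works on).
rates = [0.85, 0.60, 0.50, 0.30, 0.10]
print(share_at_or_above(rates, 0.80))  # 0.2
print(share_at_or_above(rates, 0.45))  # 0.6
```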

---

## TODO

- [x] **Phase 1: Framework Development** - Core architecture and multi-agent system implementation
- [ ] **Phase 2: SOTA LLM Testing** - Evaluating the framework against recently released state-of-the-art LLMs (GPT-5.1, GEMINI 3.0 Pro, Claude 4.5 Opus, etc.)
- [ ] **Phase 3: Dataset Curation** - Filtering and curating generated attack results to create a new research dataset

---

## Ethical Disclaimer

**For Defensive Security Research Only.**
@@ -205,6 +332,7 @@ EvoSynth is designed to aid AI researchers and developers in identifying safety

Do not use this framework for malicious purposes or against systems without explicit permission.

---

## Citation

@@ -219,6 +347,8 @@ If you use EvoSynth in your research, please cite:
}
```

---

## Contributing

Contributions are welcome!
Contributions are welcome! Please feel free to submit issues and enhancement requests.

BIN
asset/diagram.png
Normal file
Binary file not shown.
Size: 308 KiB