🖼️ Add EvoSynth architecture diagram (PNG format)

- Convert diagram.pdf to diagram.png for GitHub rendering - Update README to reference PNG instead of PDF - Added ActorAttack judge prompt acknowledgment - Diagram shows: Reconnaissance → Algorithm Creation → Exploitation → Coordinator workflow
2026-02-12 17:22:44 +00:00 · 2025-12-10 15:03:27 +08:00
parent 3a5dbdeed2
commit 3f12231c26
2 changed files with 145 additions and 15 deletions
--- a/README.md
+++ b/README.md
@@ -1,26 +1,93 @@
 # EvoSynth

-**Advanced Multi-Agent Jailbreak Attack Framework**
+<div align="center">

-EvoSynth is a modular jailbreak attack framework designed to test the safety alignments of AI models through autonomous multi-agent workflows. It provides a comprehensive toolkit for AI security research, enabling systematic evaluation of model robustness against adversarial attacks.
+**🚨 Evolutionary Synthesis of Jailbreak Attacks on LLMs 🚨**
+
+[![arXiv](https://img.shields.io/badge/arXiv-2511.12710-red)](https://arxiv.org/abs/2511.12710)
+[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue)](https://www.python.org/downloads/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+**The first framework to autonomously engineer novel, executable, code-based attack algorithms**
+
+[Key Features](#features) • [Installation](#installation) • [Quick Start](#quick-start) • [Paper](https://arxiv.org/abs/2511.12710)
+
+</div>
+
+---
+
+## 🎯 Research Highlights
+
+**EvoSynth** introduces a paradigm shift from attack planning to **evolutionary synthesis** of jailbreak methods. Unlike existing frameworks that refine known strategies, EvoSynth autonomously engineers entirely new, executable attack algorithms through multi-agent collaboration.
+
+### 🏆 State-of-the-Art Performance
+
+| Target Model | EvoSynth | X-Teaming | Improvement |
+|--------------|----------|-----------|-------------|
+| **Claude-Sonnet-4.5** | **85.5%** | 52.5% | +33.0% |
+| **GPT-5-Chat** | **94.5%** | 88.5% | +6.0% |
+| **GPT-4o** | **97.5%** | 96.0% | +1.5% |
+| **Llama-3.1-70B** | **98.5%** | 83.5% | +15.0% |
+| **Average ASR** | **95.9%** | 85.7% | **+10.2%** |
+
+*Evaluated against 11 state-of-the-art baselines on Harmbench Standard dataset*
+
+### 🔬 Key Innovation
+
+**From Prompt Refinement to Method Evolution**
+
+Existing frameworks select, combine, or refine known attack strategies. EvoSynth breaks this limitation by:
+
+1. **Autonomously Engineering New Attacks**: Generates executable code-based algorithms, not just prompts
+2. **Code-Level Self-Correction**: Iteratively rewrites attack logic in response to failure
+3. **Evolutionary Synthesis**: Discovers novel attack mechanisms through multi-agent collaboration
+4. **Higher Diversity**: Produces more semantically diverse attacks (median diversity score: 0.82 vs 0.63)
+
+### 📊 EvoSynth Architecture Overview
+
+<div align="center">
+
+![EvoSynth Architecture](asset/diagram.png)
+
+*The EvoSynth multi-agent framework: From reconnaissance to evolutionary synthesis of executable attack algorithms*
+
+</div>
+
+---

 ## Features

- **Multi-Agent Architecture**: Coordinated attack system with specialized agents:
-  - **Master Coordinator**: Orchestrates workflow and strategic decisions
-  - **Reconnaissance Agent**: Analyzes target models and identifies vulnerabilities
-  - **Tool Synthesizer**: Generates and evolves AI-powered attack tools
-  - **Exploitation Agent**: Executes multi-turn attack conversations
+### 🧬 Multi-Agent Evolutionary Architecture

- **Modular Design**: Registry-based component system supporting:
-  - Multiple model backends (OpenAI, HuggingFace, etc.)
-  - Customizable attack strategies
-  - Pluggable evaluation metrics
-  - LLM-based judging systems
+EvoSynth employs four specialized agents working in concert:

- **AI-Generated Tools**: Dynamic tool creation with Python code execution and performance-based evolution
+- **🔍 Reconnaissance Agent**: Identifies vulnerabilities and formulates attack strategies
+  - Analyzes target model behavior
+  - Generates attack categories and concepts
+  - Adapts strategy based on interaction history

+- **⚙️ Algorithm Creation Agent**: Engineers executable attack algorithms
+  - Synthesizes code-based attack methods
+  - Implements evolutionary code improvement loop
+  - Validates algorithm functionality and performance
+
+- **🎯 Exploitation Agent**: Deploys and executes attacks
+  - Selects optimal algorithms via reinforcement learning
+  - Manages multi-turn conversations
+  - Learns from execution feedback
+
+- **🎼 Master Coordinator**: Orchestrates the entire workflow
+  - Coordinates phase transitions
+  - Performs failure analysis and adaptive re-tasking
+  - Updates algorithm arsenal based on results
+
+### 🚀 Technical Capabilities
+
+- **Black-Box Evaluation**: Tests against real production APIs with full safety infrastructure
 - **Async Support**: Full asynchronous orchestration for scalable evaluations
+- **Dynamic Evolution**: Code-level self-correction with dual feedback (judge + target response)
+
+---

 ## Installation

@@ -29,6 +96,7 @@ git clone https://github.com/dongdongunique/EvoSynth.git
 cd EvoSynth
 pip install -r requirements.txt
 ```
+
 ## Create an .env file
 ```bash
 OPENAI_KEY="YOUR-KEY"
@@ -52,6 +120,9 @@ For best results, we recommend using router platforms that provide unified acces
 - **BoyueRichData** (https://boyuerichdata.apifox.cn/) - Alternative router platform

 Set your `base_url` parameter to the router's endpoint when initializing OpenAIModel:
+
+---
+
 ## Quick Start

 ### Environment Setup
@@ -124,6 +195,8 @@ python eval_async.py \
 | `--base-url` | API base URL | From env |
 | `--dataset` | Dataset to use | `harmbench` |

+---
+
 ## Project Structure

 ```
@@ -154,6 +227,8 @@ EvoSynth/
    └── datasets/                # Data handling
 ```

+---
+
 ## Configuration

 ### EvosynthConfig Options
@@ -179,19 +254,71 @@ class EvosynthConfig:
 - `start_tool_creation`: Begin from tool creation phase
 - `start_exploitation`: Begin from exploitation phase

+---
+
 ## Evaluation

 Results are evaluated using:

- **LLM Judge**: Scores responses on a 1-5 scale. Prompts are from ActorAttack.
+- **LLM Judge**: Scores responses on a 1-5 scale using judge prompts from [ActorAttack](https://github.com/AI45Lab/ActorAttack).
 - **Success Threshold**: Score >= 5 indicates successful jailbreak

+### Evaluation Dataset
+
+EvoSynth is evaluated on **Harmbench Standard**, a comprehensive dataset with instructions balanced across 6 risk categories:
+- Cybercrime & Unauthorized Intrusion
+- Chemical & Biological Weapons/Drugs
+- Misinformation & Disinformation
+- Harassment & Bullying
+- Illegal Activities
+- General Harm
+
+### Target Models
+
+Evaluated against 7 state-of-the-art LLMs:
+- GPT-5-Chat-2025-08-07
+- GPT-4o
+- Claude-Sonnet-4.5-2025-09-29
+- Llama 3.1-8B-Instruct
+- Llama 3.1-70B-Instruct
+- Qwen-Max-2025-01-25
+- Deepseek-V3.2-Exp
+
+---
+
+## Key Research Findings
+
+### 🎓 Attack Diversity
+
+EvoSynth generates significantly more diverse attacks than existing methods:
+
+- **Median Diversity Score**: 0.82 (vs 0.63 for X-Teaming)
+- **Fewer Low-Diversity Pairs**: Avoids repetitive attack patterns
+- **Greater High-Diversity Coverage**: More novel attack mechanisms
+- **Stable Diversity**: Consistent novelty across generations
+
+### ⚡ Learning Efficiency
+
+- **90%** of successful jailbreaks occur within **6 refinement iterations**
+- **74%** succeed within first **12 total agent actions**
+- Rapid convergence demonstrates directed synthesis process
+
+### 🔄 Algorithm Transferability
+
+- **20%** of synthesized algorithms are "universal keys" effective on **80%+ of queries**
+- **50%** of algorithms effective on **45%+ of queries**
+- Demonstrates discovery of robust, general-purpose attack methods
+
+---
+
 ## TODO

 - [x] **Phase 1: Framework Development** - Core architecture and multi-agent system implementation
 - [ ] **Phase 2: SOTA LLM Testing** - Evaluating framework against recent released state-of-the-art LLMs (GPT-5.1, GEMINI 3.0 Pro, Claude 4.5 Opus etc.)
 - [ ] **Phase 3: Dataset Curation** - Filtering and curating generated attack results to create a new research dataset

+---
+
 ## Ethical Disclaimer

 **For Defensive Security Research Only.**
@@ -205,6 +332,7 @@ EvoSynth is designed to aid AI researchers and developers in identifying safety

 Do not use this framework for malicious purposes or against systems without explicit permission.

+---

 ## Citation

@@ -219,6 +347,8 @@ If you use EvoSynth in your research, please cite:
 }
 ```

+---
+
 ## Contributing

-Contributions are welcome! 
+Contributions are welcome! Please feel free to submit issues and enhancement requests.
--- a/asset/diagram.png
+++ b/asset/diagram.png