Mirror of https://github.com/dongdongunique/EvoSynth.git (synced 2026-02-12 17:22:44 +00:00)
✅ Complete Phase 2: SOTA LLM Testing
- Add EvoSynth vs X-Teaming comparison table with 20 models
- EvoSynth achieves 98.8% avg ASR vs X-Teaming's 87.9% (+10.9% improvement)
- 100% ASR on 17 out of 20 evaluated models
- Add arxiv/ folder to .gitignore
.gitignore (vendored): 2 changes
@@ -76,3 +76,5 @@ async_logs/
 asset/sec/
 # Original PDF (using PNG version)
 asset/diagram.pdf
+arXiv-2601.01592v2.tar.gz
+arxiv/
README.md: 35 changes
@@ -30,6 +30,39 @@
 
 *Evaluated against 11 state-of-the-art baselines on the HarmBench Standard dataset*
 
+### 🚀 EvoSynth vs X-Teaming: 2025 SOTA Model Evaluation
+
+Evaluated on **20 recent state-of-the-art models**, EvoSynth achieves near-perfect ASR across the board:
+
+| Model | X-Teaming | EvoSynth | Improvement |
+|-------|-----------|----------|-------------|
+| **GPT-5.2** | 75.5% | 99.0% | +23.5% |
+| **GPT-5.1** | 95.5% | 100.0% | +4.5% |
+| **Claude Haiku 4.5** | 47.5% | 74.0% | +26.5% |
+| **Gemini 3 Pro Preview** | 86.5% | 100.0% | +13.5% |
+| **Gemini 2.5 Flash Thinking** | 89.0% | 100.0% | +11.0% |
+| **Mistral Large 3** | 91.0% | 100.0% | +9.0% |
+| **Llama-4-Maverick** | 86.0% | 100.0% | +14.0% |
+| **Llama-4-Scout** | 98.0% | 100.0% | +2.0% |
+| **Grok 4.1 Fast** | 90.5% | 100.0% | +9.5% |
+| **Doubao-Seed-1.6** | 87.0% | 100.0% | +13.0% |
+| **Qwen3-Max** | 94.0% | 100.0% | +6.0% |
+| **Qwen3-235B-A22B** | 98.5% | 100.0% | +1.5% |
+| **Qwen3-Next-80B-A3B** | 80.5% | 100.0% | +19.5% |
+| **DeepSeek-R1** | 94.0% | 100.0% | +6.0% |
+| **DeepSeek-V3.2** | 99.0% | 100.0% | +1.0% |
+| **Kimi K2-Instruct** | 89.5% | 100.0% | +10.5% |
+| **MiniMax-M2** | 93.0% | 100.0% | +7.0% |
+| **GLM-4.6** | 98.5% | 100.0% | +1.5% |
+| **Hunyuan-A13B-Instruct** | 97.0% | 100.0% | +3.0% |
+| **ERNIE-4.5-300B-A47B** | 95.0% | 100.0% | +5.0% |
+| **Average** | **87.9%** | **98.8%** | **+10.9%** |
+
+**Key Findings:**
+
+- EvoSynth achieves **100% ASR** on 17 out of 20 models
+- Largest gains on harder-to-break models (Claude Haiku 4.5: +26.5%)
+- Higher attack diversity (0.82 vs. 0.79) yields more varied jailbreak strategies
+
 ### 🔬 Key Innovation
 
 **From Prompt Refinement to Method Evolution**
@@ -317,7 +350,7 @@ EvoSynth generates significantly more diverse attacks than existing methods:
 
 ## TODO
 
 - [x] **Phase 1: Framework Development** - Core architecture and multi-agent system implementation
-- [ ] **Phase 2: SOTA LLM Testing** - Evaluating framework against recently released state-of-the-art LLMs (GPT-5.1, Gemini 3.0 Pro, Claude 4.5 Opus, etc.)
+- [x] **Phase 2: SOTA LLM Testing** - Evaluated against 20 recent SOTA models (GPT-5.1/5.2, Claude Haiku 4.5, Gemini 3 Pro, Llama-4, DeepSeek-V3.2, etc.) with 98.8% average ASR
 - [ ] **Phase 3: Dataset Curation** - Filtering and curating generated attack results to create a new research dataset
 
 ---