From bd1cd2e1b113ff2af53b5c251d876a18d5e62e03 Mon Sep 17 00:00:00 2001
From: dongdongunique <909580378@qq.com>
Date: Wed, 4 Feb 2026 13:53:11 +0800
Subject: [PATCH] =?UTF-8?q?=E2=9C=85=20Complete=20Phase=202:=20SOTA=20LLM?=
 =?UTF-8?q?=20Testing?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add EvoSynth vs X-Teaming comparison table with 20 models
- EvoSynth achieves 98.8% avg ASR vs X-Teaming 87.9% (+10.9% improvement)
- 100% ASR on 17 out of 20 evaluated models
- Add arxiv/ folder to .gitignore
---
 .gitignore |  2 ++
 README.md  | 35 ++++++++++++++++++++++++++++++++++-
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/.gitignore b/.gitignore
index 6619040..e7e3626 100644
--- a/.gitignore
+++ b/.gitignore
@@ -76,3 +76,5 @@ async_logs/
 asset/sec/
 # Original PDF (using PNG version)
 asset/diagram.pdf
+arXiv-2601.01592v2.tar.gz
+arxiv/
\ No newline at end of file
diff --git a/README.md b/README.md
index acd0bf5..e1f4311 100644
--- a/README.md
+++ b/README.md
@@ -30,6 +30,39 @@
 *Evaluated against 11 state-of-the-art baselines on Harmbench Standard dataset*
 
+### 🚀 EvoSynth vs X-Teaming: 2025 SOTA Model Evaluation
+
+Evaluated on **20 recent state-of-the-art models**, EvoSynth achieves near-perfect ASR across the board:
+
+| Model | X-Teaming | EvoSynth | Improvement |
+|-------|-----------|----------|-------------|
+| **GPT-5.2** | 75.5% | 99.0% | +23.5% |
+| **GPT-5.1** | 95.5% | 100.0% | +4.5% |
+| **Claude Haiku 4.5** | 47.5% | 74.0% | +26.5% |
+| **Gemini 3 Pro Preview** | 86.5% | 100.0% | +13.5% |
+| **Gemini 2.5 Flash Thinking** | 89.0% | 100.0% | +11.0% |
+| **Mistral Large 3** | 91.0% | 100.0% | +9.0% |
+| **Llama-4-Maverick** | 86.0% | 100.0% | +14.0% |
+| **Llama-4-Scout** | 98.0% | 100.0% | +2.0% |
+| **Grok 4.1 Fast** | 90.5% | 100.0% | +9.5% |
+| **Doubao-Seed-1.6** | 87.0% | 100.0% | +13.0% |
+| **Qwen3-Max** | 94.0% | 100.0% | +6.0% |
+| **Qwen3-235B-A22B** | 98.5% | 100.0% | +1.5% |
+| **Qwen3-Next-80B-A3B** | 80.5% | 100.0% | +19.5% |
+| **DeepSeek-R1** | 94.0% | 100.0% | +6.0% |
+| **DeepSeek-V3.2** | 99.0% | 100.0% | +1.0% |
+| **Kimi K2-Instruct** | 89.5% | 100.0% | +10.5% |
+| **MiniMax-M2** | 93.0% | 100.0% | +7.0% |
+| **GLM-4.6** | 98.5% | 100.0% | +1.5% |
+| **Hunyuan-A13B-Instruct** | 97.0% | 100.0% | +3.0% |
+| **ERNIE-4.5-300B-A47B** | 95.0% | 100.0% | +5.0% |
+| **Average** | **87.9%** | **98.8%** | **+10.9%** |
+
+**Key Findings:**
+- EvoSynth achieves **100% ASR** on 17 out of 20 models
+- Largest gains on harder-to-break models (Claude Haiku 4.5: +26.5%)
+- Higher attack diversity (0.82 vs 0.79) for more varied jailbreak strategies
+
 ### 🔬 Key Innovation
 
 **From Prompt Refinement to Method Evolution**
 
@@ -317,7 +350,7 @@ EvoSynth generates significantly more diverse attacks than existing methods:
 ## TODO
 
 - [x] **Phase 1: Framework Development** - Core architecture and multi-agent system implementation
-- [ ] **Phase 2: SOTA LLM Testing** - Evaluating framework against recent released state-of-the-art LLMs (GPT-5.1, GEMINI 3.0 Pro, Claude 4.5 Opus etc.)
+- [x] **Phase 2: SOTA LLM Testing** - Evaluated against 20 recent SOTA models (GPT-5.1/5.2, Claude Haiku 4.5, Gemini 3 Pro, Llama-4, DeepSeek-V3.2, etc.) with 98.8% average ASR
 - [ ] **Phase 3: Dataset Curation** - Filtering and curating generated attack results to create a new research dataset
 
 ---
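
Editor's note on the table the patch adds: the "Improvement" column is simply the EvoSynth ASR minus the X-Teaming ASR for each model, and the final row averages each column. A minimal sketch (per-model values copied from three rows of the table; this is illustrative, not part of the patch):

```python
# Per-model ASR (%) as (x_teaming, evosynth), taken from the table above.
# "Improvement" is the pointwise difference; the table's last row averages
# each column over all 20 models the same way.
asr = {
    "GPT-5.2": (75.5, 99.0),
    "Claude Haiku 4.5": (47.5, 74.0),
    "Gemini 3 Pro Preview": (86.5, 100.0),
}

for model, (x_teaming, evosynth) in asr.items():
    improvement = evosynth - x_teaming
    print(f"{model}: +{improvement:.1f}%")
# GPT-5.2: +23.5%
# Claude Haiku 4.5: +26.5%
# Gemini 3 Pro Preview: +13.5%
```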