Mirror of https://github.com/jiaxiaojunQAQ/OmniSafeBench-MM.git (synced 2026-02-12 17:52:46 +00:00)
Update README.md
18
README.md
@@ -2,6 +2,24 @@
A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack–Defense Evaluation

## Integrated Attack Methods
| Name | Full paper title | Paper | Code | Classification |
|------|------------------|-------|------|----------------|
| FigStep / FigStep-Pro | FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts | [paper link](https://arxiv.org/abs/2311.05608) | [code link](https://github.com/ThuCCSLab/FigStep) | Black-box — Structured visual carriers |
| QR-Attack (MM-SafetyBench) | MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models | [paper link](https://arxiv.org/abs/2311.17600) | [code link](https://github.com/isXinLiu/MM-SafetyBench) | Black-box — Structured visual carriers |
| MML | Jailbreak Large Vision-Language Models Through Multi-Modal Linkage | [paper link](https://aclanthology.org/2025.acl-long.74/) | [code link](https://github.com/wangyu-ovo/MML) | Black-box — Structured visual carriers |
| CS-DJ | Distraction is All You Need for Multimodal Large Language Model Jailbreaking | [paper link](https://arxiv.org/abs/2502.10794) | [code link](https://github.com/TeamPigeonLab/CS-DJ) | Black-box — OOD (attention / distribution manipulation) |
| SI-Attack | Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency | [paper link](https://arxiv.org/abs/2501.04931) | [code link](https://github.com/zhaoshiji123/SI-Attack) | Black-box — OOD (shuffle / attention inconsistency) |
| JOOD | Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy | [paper link](https://arxiv.org/abs/2503.20823) | [code link](https://github.com/naver-ai/JOOD) | Black-box — OOD (OOD strategy) |
| HIMRD | Heuristic-Induced Multimodal Risk Distribution (HIMRD) Jailbreak Attack | [paper link](https://arxiv.org/abs/2412.05934) | [code link](https://github.com/MaTengSYSU/HIMRD-jailbreak) | Black-box — OOD / risk distribution |
| HADES | Images are Achilles’ Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models | [paper link](https://arxiv.org/abs/2403.09792) | [code link](https://github.com/AoiDragon/HADES) | Black-box — Structured visual carriers |
| BAP | Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt (BAP) | [paper link](https://arxiv.org/abs/2406.04031) | [code link](https://github.com/NY1024/BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt) | White-box — Cross-modal |
| visual_adv | Visual Adversarial Examples Jailbreak Aligned Large Language Models | [paper link](https://ojs.aaai.org/index.php/AAAI/article/view/30150) | [code link](https://github.com/Unispac/Visual-Adversarial-Examples-Jailbreak-Large-Language-Models) | White-box — Single-modal |
| VisCRA | VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models | [paper link](https://arxiv.org/abs/2505.19684) | [code link](https://github.com/DyMessi/VisCRA) | Black-box — Structured visual carriers |
| UMK | White-box Multimodal Jailbreaks Against Large Vision-Language Models (Universal Master Key, UMK) | [paper link](https://arxiv.org/abs/2405.17894) | [code link](https://github.com/roywang021/UMK) | White-box — Cross-modal |
| PBI-Attack | Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization (PBI-Attack) | [paper link](https://aclanthology.org/2025.emnlp-main.32.pdf) | [code link](https://github.com/Rosy0912/PBI-Attack) | Black-box — Query optimization & transfer |
| ImgJP / DeltaJP | Jailbreaking Attack against Multimodal Large Language Model (imgJP / deltaJP) | [paper link](https://arxiv.org/abs/2402.02309) | [code link](https://github.com/abc03570128/Jailbreaking-Attack-against-Multimodal-Large-Language-Model) | White-box — Single-modal |
| JPS | JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering | [paper link](https://arxiv.org/abs/2508.05087) | [code link](https://github.com/thu-coai/JPS) | White-box — Cross-modal |

## 🛡️Integrated Defense Methods

| No. | Title | Type | Venue | Paper | Code |