Mirror of https://github.com/jiaxiaojunQAQ/OmniSafeBench-MM.git (synced 2026-02-12 17:52:46 +00:00, commit 0a30a1e973d28a1e35320f8b402e76cd38cc9d60)
# OmniSafeBench-MM

A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack–Defense Evaluation

## 🗡️ Integrated Attack Methods
| Name | Full paper title | Paper | Code | Classification |
|---|---|---|---|---|
| FigStep / FigStep-Pro | FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts | paper link | code link | Black-box — Structured visual carriers |
| QR-Attack (MM-SafetyBench) | MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models | paper link | code link | Black-box — Structured visual carriers |
| MML | Jailbreak Large Vision-Language Models Through Multi-Modal Linkage | paper link | code link | Black-box — Structured visual carriers |
| CS-DJ | Distraction is All You Need for Multimodal Large Language Model Jailbreaking | paper link | code link | Black-box — OOD (attention / distribution manipulation) |
| SI-Attack | Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency | paper link | code link | Black-box — OOD (shuffle / attention inconsistency) |
| JOOD | Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy | paper link | code link | Black-box — OOD (OOD strategy) |
| HIMRD | Heuristic-Induced Multimodal Risk Distribution (HIMRD) Jailbreak Attack | paper link | code link | Black-box — OOD / risk distribution |
| HADES | Images are Achilles’ Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models | paper link | code link | Black-box — Structured visual carriers |
| BAP | Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt (BAP) | paper link | code link | White-box — Cross-modal |
| visual_adv | Visual Adversarial Examples Jailbreak Aligned Large Language Models | paper link | code link | White-box — Single-modal |
| VisCRA | VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models | paper link | code link | Black-box — Structured visual carriers |
| UMK | White-box Multimodal Jailbreaks Against Large Vision-Language Models (Universal Master Key, UMK) | paper link | code link | White-box — Cross-modal |
| PBI-Attack | Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization (PBI-Attack) | paper link | code link | Black-box — Query optimization & transfer |
| ImgJP / DeltaJP | Jailbreaking Attack against Multimodal Large Language Model (imgJP / deltaJP) | paper link | code link | White-box — Single-modal |
| JPS | JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering | paper link | code link | White-box — Cross-modal |
## 🛡️ Integrated Defense Methods
| No. | Title | Type | Venue | Paper | Code |
|---|---|---|---|---|---|
| 1 | JailGuard: A Universal Detection Framework for LLM Prompt-Based Attacks | 1 | arXiv | paper link | code link |
| 2 | MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance | 1 | EMNLP 2024 | paper link | code link |
| 3 | Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation | 1 | ECCV 2024 | paper link | code link |
| 4 | ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors | 1 | EMNLP 2024 | paper link | code link |
| 5 | AdaShield: Safeguarding Multimodal Large Language Models from Structure-Based Attack via Adaptive Shield Prompting | 1 | ECCV 2024 | paper link | code link |
| 6 | UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models | 1 | ECCV 2024 | paper link | code link |
| 7 | Defending LVLMs Against Vision Attacks Through Partial-Perception Supervision | 1 | ICML 2025 | paper link | code link |
| 8 | Cross-Modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models | 1 | EMNLP 2024 | paper link | code link |
| 9 | GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning | 1 | ICML 2025 | paper link | code link |
| 10 | Llama Guard 4 | 1 | — | document link | model link |
| 11 | QGuard: Question-based Zero-shot Guard for Multi-modal LLM Safety | 1 | arXiv | paper link | code link |
| 12 | LlavaGuard: VLM-based Safeguard for Vision Dataset Curation and Safety Assessment | 1 | CVPR 2024 | paper link | code link |
More methods are coming soon!