
🗡️ OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation 🛡️

OmniSafeBench-MM

📰 News 📰

🗡️ What is OmniSafeBench-MM? 🛡️

OmniSafeBench-MM is a unified benchmark and open-source toolbox for evaluating multimodal jailbreak attacks and defenses in multimodal large language models (MLLMs). It integrates a large-scale dataset spanning 89 major risk domains and 50 fine-grained categories, supports three real-world prompt types (consultative, imperative, and declarative), and implements 13 representative attack methods and 15 defense strategies in a modular pipeline. Beyond the traditional attack success rate (ASR), it introduces a three-dimensional evaluation protocol measuring harmfulness, intent alignment, and response detail, enabling fine-grained safety-utility analysis. Evaluated across 18 open-source and closed-source MLLMs, OmniSafeBench-MM provides a comprehensive, reproducible, and extensible platform for benchmarking multimodal safety.
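To make the modular pipeline concrete, the sketch below shows the shape of such an attack-defense evaluation loop: an attack turns a harmful intent into a multimodal prompt, a defense wraps the target model, and a judge scores the response. All names here (`Prompt`, `Attack`, `Defense`, `run_benchmark`) are illustrative assumptions, not the toolbox's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Protocol

# Hypothetical interfaces -- a minimal sketch of the modular design,
# not OmniSafeBench-MM's real classes.

@dataclass
class Prompt:
    text: str
    image_path: str | None = None  # attacks may attach a typographic/adversarial image


class Attack(Protocol):
    def transform(self, intent: str) -> Prompt:
        """Turn a harmful intent into a jailbreak prompt (text, image, or both)."""


class Defense(Protocol):
    def guard(self, prompt: Prompt, model: Callable[[Prompt], str]) -> str:
        """Filter the prompt and/or the model response before returning it."""


def run_benchmark(intents, attack: Attack, defense: Defense,
                  model: Callable[[Prompt], str], judge) -> list:
    """Evaluation loop: attack -> (defended) model -> judge."""
    results = []
    for intent in intents:
        prompt = attack.transform(intent)        # craft the multimodal jailbreak
        response = defense.guard(prompt, model)  # defense may refuse or sanitize
        results.append(judge(intent, response))  # three-dimensional score
    return results
```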

🌐 Overview 🌐

OmniSafeBench-MM Evaluation Pipeline

Overview of OmniSafeBench-MM. The benchmark unifies multimodal jailbreak attack-defense evaluation, integrating 13 attack and 15 defense methods with a three-dimensional scoring protocol that measures harmfulness, alignment, and detail.
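A hedged sketch of how the three scoring dimensions might be recorded and aggregated follows; the field names, the 0-1 scale, and the ASR threshold are illustrative assumptions, not the benchmark's published rubric. It also shows how the classic attack success rate can be recovered as a special case of the richer protocol.

```python
from dataclasses import dataclass


@dataclass
class Score:
    harmfulness: float  # how harmful the response content is (assumed 0-1 scale)
    alignment: float    # how closely it follows the attacker's intent (assumed 0-1)
    detail: float       # how specific/actionable the response is (assumed 0-1)


def summarize(scores: list[Score], asr_threshold: float = 0.5) -> dict[str, float]:
    """Per-dimension means plus a binary ASR derived from the richer scores.

    The threshold is an illustrative assumption: a response counts as a
    successful jailbreak only if it is both harmful and on-intent.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("no scores to summarize")
    return {
        "harmfulness": sum(s.harmfulness for s in scores) / n,
        "alignment": sum(s.alignment for s in scores) / n,
        "detail": sum(s.detail for s in scores) / n,
        "asr": sum(
            s.harmfulness >= asr_threshold and s.alignment >= asr_threshold
            for s in scores
        ) / n,
    }
```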

🗡️ Integrated Attack Methods

| Name | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| FigStep / FigStep-Pro | FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts | AAAI 2025 | [Paper](https://arxiv.org/abs/2311.05608) | [Code](https://github.com/ThuCCSLab/FigStep) |
| QR-Attack (MM-SafetyBench) | MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models | ECCV 2024 | [Paper](https://arxiv.org/abs/2311.17600) | [Code](https://github.com/isXinLiu/MM-SafetyBench) |
| MML | Jailbreak Large Vision-Language Models Through Multi-Modal Linkage | ACL 2025 | [Paper](https://aclanthology.org/2025.acl-long.74/) | [Code](https://github.com/wangyu-ovo/MML) |
| CS-DJ | Distraction is All You Need for Multimodal Large Language Model Jailbreaking | CVPR 2025 | [Paper](https://arxiv.org/abs/2502.10794) | [Code](https://github.com/TeamPigeonLab/CS-DJ) |
| SI-Attack | Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency | ICCV 2025 | [Paper](https://arxiv.org/abs/2501.04931) | [Code](https://github.com/zhaoshiji123/SI-Attack) |
| JOOD | Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy | CVPR 2025 | [Paper](https://arxiv.org/abs/2503.20823) | [Code](https://github.com/naver-ai/JOOD) |
| HIMRD | Heuristic-Induced Multimodal Risk Distribution (HIMRD) Jailbreak Attack | ICCV 2025 | [Paper](https://arxiv.org/abs/2412.05934) | [Code](https://github.com/MaTengSYSU/HIMRD-jailbreak) |
| HADES | Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking MLLMs | ECCV 2024 | [Paper](https://arxiv.org/abs/2403.09792) | [Code](https://github.com/AoiDragon/HADES) |
| BAP | Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt (BAP) | TIFS 2025 | [Paper](https://arxiv.org/abs/2406.04031) | [Code](https://github.com/NY1024/BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt) |
| visual_adv | Visual Adversarial Examples Jailbreak Aligned Large Language Models | AAAI 2024 | [Paper](https://ojs.aaai.org/index.php/AAAI/article/view/30150) | [Code](https://github.com/Unispac/Visual-Adversarial-Examples-Jailbreak-Large-Language-Models) |
| VisCRA | VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models | EMNLP 2025 | [Paper](https://arxiv.org/abs/2505.19684) | [Code](https://github.com/DyMessi/VisCRA) |
| UMK | White-box Multimodal Jailbreaks Against Large Vision-Language Models (Universal Master Key) | ACM MM 2024 | [Paper](https://arxiv.org/abs/2405.17894) | [Code](https://github.com/roywang021/UMK) |
| PBI-Attack | Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization | EMNLP 2025 | [Paper](https://aclanthology.org/2025.emnlp-main.32.pdf) | [Code](https://github.com/Rosy0912/PBI-Attack) |
| ImgJP / DeltaJP | Jailbreaking Attack against Multimodal Large Language Models | arXiv 2024 | [Paper](https://arxiv.org/abs/2402.02309) | [Code](https://github.com/abc03570128/Jailbreaking-Attack-against-Multimodal-Large-Language-Model) |
| JPS | JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering | ACM MM 2025 | [Paper](https://arxiv.org/abs/2508.05087) | [Code](https://github.com/thu-coai/JPS) |

🛡️ Integrated Defense Methods

| Name | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| JailGuard | JailGuard: A Universal Detection Framework for Prompt-based Attacks on LLM Systems | TOSEM 2025 | link | link |
| MLLM-Protector | MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance | EMNLP 2024 | link | link |
| ECSO | Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation | ECCV 2024 | link | link |
| ShieldLM | ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors | EMNLP 2024 | link | link |
| AdaShield | AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting | ECCV 2024 | link | link |
| Uniguard | UNIGUARD: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models | ECCV 2024 | link | link |
| DPS | Defending LVLMs Against Vision Attacks Through Partial-Perception Supervision | ICML 2025 | link | link |
| CIDER | Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models | EMNLP 2024 | link | link |
| GuardReasoner-VL | GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning | ICML 2025 | link | link |
| Llama-Guard-4 | Llama Guard 4 Model Card | - | link | link |
| QGuard | QGuard: Question-based Zero-shot Guard for Multi-modal LLM Safety | arXiv | link | link |
| LlavaGuard | LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models | ICML 2025 | link | link |
| Llama-Guard-3 | Llama Guard 3 Model Card | - | link | link |
| HiddenDetect | HiddenDetect: Detecting Jailbreak Attacks against Multimodal Large Language Models via Monitoring Hidden States | ACL 2025 | link | link |
| CoCA | CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration | COLM 2024 | link | - |
| VLGuard | Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models | ICML 2024 | link | link |

More methods are coming soon!!
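Until then, the claimed extensibility suggests a new defense only needs to match the pipeline's interface. Below is a toy example of that plug-in shape, reusing the hypothetical `Prompt`/`Defense` sketch from the overview above; the keyword filter is purely illustrative, not a defense shipped with OmniSafeBench-MM.

```python
class KeywordRefusalDefense:
    """Toy defense: refuse when the text prompt contains a blocked keyword.

    Illustrative only -- shows the plug-in shape of the hypothetical
    Defense protocol above, not a real OmniSafeBench-MM component.
    """

    def __init__(self, blocked: set[str]):
        self.blocked = {w.lower() for w in blocked}

    def guard(self, prompt, model):
        if any(word in prompt.text.lower() for word in self.blocked):
            return "I can't help with that request."
        return model(prompt)  # pass through to the undefended model
```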
