jiaxiaojunQAQ 0a30a1e973 add readme
2025-11-03 02:28:54 +08:00
2025-11-03 02:28:54 +08:00
2025-11-03 02:28:54 +08:00

OmniSafeBench-MM

A Unified Benchmark and Toolbox for Multimodal Jailbreak AttackDefense Evaluation

🗡️ Integrated Attack Methods

Name Full paper title Paper Code Classification
FigStep / FigStep-Pro FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts paper link code link Black-box — Structured visual carriers
QR-Attack (MM-SafetyBench) MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models paper link code link Black-box — Structured visual carriers
MML Jailbreak Large Vision-Language Models Through Multi-Modal Linkage paper link code link Black-box — Structured visual carriers
CS-DJ Distraction is All You Need for Multimodal Large Language Model Jailbreaking paper link code link Black-box — OOD (attention / distribution manipulation)
SI-Attack Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency paper link code link Black-box — OOD (shuffle / attention inconsistency)
JOOD Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy paper link code link Black-box — OOD (OOD strategy)
HIMRD Heuristic-Induced Multimodal Risk Distribution (HIMRD) Jailbreak Attack paper link code link Black-box — OOD / risk distribution
HADES Images are Achilles Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models paper link code link Black-box — Structured visual carriers
BAP Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt (BAP) paper link code link White-box — Cross-modal
visual_adv Visual Adversarial Examples Jailbreak Aligned Large Language Models paper link code link White-box — Single-modal
VisCRA VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models paper link code link Black-box — Structured visual carriers
UMK White-box Multimodal Jailbreaks Against Large Vision-Language Models (Universal Master Key, UMK) paper link code link White-box — Cross-modal
PBI-Attack Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization (PBI-Attack) paper link code link Black-box — Query optimization & transfer
ImgJP / DeltaJP Jailbreaking Attack against Multimodal Large Language Model (imgJP / deltaJP) paper link code link White-box — Single-modal
JPS JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering paper link code link White-box — Cross-modal

🛡️Integrated Defense Methods

No. Title Type Venue Paper Code
1 Jailguard: A universal detection framework for llm prompt-based attacks 1 arXiv paper link code link
2 Mllm-protector: Ensuring mllm's safety without hurting performance 1 EMNLP 2024 paper link code link
3 Eyes closed, safety on: Protecting multimodal llms via image-to-text transformation 1 ECCV 2024 paper link code link
4 Shieldlm: Empowering llms as aligned, customizable and explainable safety detectors 1 EMNLP 2024 paper link code link
5 Adashield: Safeguarding multimodal large language models from structure-based attack via adaptive shield prompting 1 ECCV 2024 paper link code link
6 Uniguard: Towards universal safety guardrails for jailbreak attacks on multimodal large language models 1 ECCV 2024 paper link code link
7 Defending LVLMs Against Vision Attacks Through Partial-Perception Supervision 1 ICML 2025 paper link code link
8 Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models 1 EMNLP 2024 paper link code link
9 Guardreasoner-vl: Safeguarding vlms via reinforced reasoning 1 ICML 2025 paper link code link
10 Llama Guard 4 1 None document link model link
11 QGuard: Question-based Zero-shot Guard for Multi-modal LLM Safety 1 arXiv paper link code link
12 Llavaguard: Vlm-based safeguard for vision dataset curation and safety assessment 1 CVPR 2024 paper link code link

More methods are coming soon!!

Description
No description provided
Readme 22 MiB
Languages
Python 100%