Mirror of https://github.com/jiaxiaojunQAQ/OmniSafeBench-MM.git (synced 2026-02-12 17:52:46 +00:00, commit 0a30a1e973d28a1e35320f8b402e76cd38cc9d60)
# OmniSafeBench-MM

A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack–Defense Evaluation

## 🗡️ Integrated Attack Methods
| Name | Full paper title | Paper | Code | Classification |
|---|---|---|---|---|
| FigStep / FigStep-Pro | FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts | paper link | code link | Black-box — Structured visual carriers |
| QR-Attack (MM-SafetyBench) | MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models | paper link | code link | Black-box — Structured visual carriers |
| MML | Jailbreak Large Vision-Language Models Through Multi-Modal Linkage | paper link | code link | Black-box — Structured visual carriers |
| CS-DJ | Distraction is All You Need for Multimodal Large Language Model Jailbreaking | paper link | code link | Black-box — OOD (attention / distribution manipulation) |
| SI-Attack | Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency | paper link | code link | Black-box — OOD (shuffle / attention inconsistency) |
| JOOD | Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy | paper link | code link | Black-box — OOD (OOD strategy) |
| HIMRD | Heuristic-Induced Multimodal Risk Distribution (HIMRD) Jailbreak Attack | paper link | code link | Black-box — OOD / risk distribution |
| HADES | Images are Achilles’ Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models | paper link | code link | Black-box — Structured visual carriers |
| BAP | Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt (BAP) | paper link | code link | White-box — Cross-modal |
| visual_adv | Visual Adversarial Examples Jailbreak Aligned Large Language Models | paper link | code link | White-box — Single-modal |
| VisCRA | VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models | paper link | code link | Black-box — Structured visual carriers |
| UMK | White-box Multimodal Jailbreaks Against Large Vision-Language Models (Universal Master Key, UMK) | paper link | code link | White-box — Cross-modal |
| PBI-Attack | Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization (PBI-Attack) | paper link | code link | Black-box — Query optimization & transfer |
| ImgJP / DeltaJP | Jailbreaking Attack against Multimodal Large Language Model (imgJP / deltaJP) | paper link | code link | White-box — Single-modal |
| JPS | JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering | paper link | code link | White-box — Cross-modal |
## 🛡️ Integrated Defense Methods
| No. | Title | Type | Venue | Paper | Code |
|---|---|---|---|---|---|
| 1 | JailGuard: A Universal Detection Framework for LLM Prompt-Based Attacks | 1 | arXiv | paper link | code link |
| 2 | MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance | 1 | EMNLP 2024 | paper link | code link |
| 3 | Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation | 1 | ECCV 2024 | paper link | code link |
| 4 | ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors | 1 | EMNLP 2024 | paper link | code link |
| 5 | AdaShield: Safeguarding Multimodal Large Language Models from Structure-Based Attack via Adaptive Shield Prompting | 1 | ECCV 2024 | paper link | code link |
| 6 | UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models | 1 | ECCV 2024 | paper link | code link |
| 7 | Defending LVLMs Against Vision Attacks Through Partial-Perception Supervision | 1 | ICML 2025 | paper link | code link |
| 8 | Cross-Modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models | 1 | EMNLP 2024 | paper link | code link |
| 9 | GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning | 1 | ICML 2025 | paper link | code link |
| 10 | Llama Guard 4 | 1 | — | document link | model link |
| 11 | QGuard: Question-based Zero-shot Guard for Multi-modal LLM Safety | 1 | arXiv | paper link | code link |
| 12 | LlavaGuard: VLM-based Safeguard for Vision Dataset Curation and Safety Assessment | 1 | CVPR 2024 | paper link | code link |
More methods are coming soon!