2025-11-02 21:47:16 +08:00

OmniSafeBench-MM

A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation

Integrated Attack Methods

🛡️ Integrated Defense Methods

| No. | Title | Type | Venue | Paper | Code |
|-----|-------|------|-------|-------|------|
| 1 | JailGuard: A Universal Detection Framework for LLM Prompt-Based Attacks | 1 | arXiv | paper link | code link |
| 2 | MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance | 1 | EMNLP 2024 | paper link | code link |
| 3 | Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation | 1 | ECCV 2024 | paper link | code link |
| 4 | ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors | 1 | EMNLP 2024 | paper link | code link |
| 5 | AdaShield: Safeguarding Multimodal Large Language Models from Structure-Based Attack via Adaptive Shield Prompting | 1 | ECCV 2024 | paper link | code link |
| 6 | UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models | 1 | ECCV 2024 | paper link | code link |
| 7 | Defending LVLMs Against Vision Attacks Through Partial-Perception Supervision | 1 | ICML 2025 | paper link | code link |
| 8 | Cross-Modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models | 1 | EMNLP 2024 | paper link | code link |
| 9 | GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning | 1 | ICML 2025 | paper link | code link |
| 10 | Llama Guard 4 | 1 | N/A | document link | model link |
| 11 | QGuard: Question-Based Zero-Shot Guard for Multi-Modal LLM Safety | 1 | arXiv | paper link | code link |
| 12 | LlavaGuard: VLM-Based Safeguard for Vision Dataset Curation and Safety Assessment | 1 | CVPR 2024 | paper link | code link |

More methods are coming soon!
