From c8f00cb565de8ce5297d325eb58a4a93071006e5 Mon Sep 17 00:00:00 2001
From: jiaxiaojunQAQ <1642768580@qq.com>
Date: Wed, 3 Dec 2025 01:42:18 +0800
Subject: [PATCH] Update
---
README.md | 82 ++++++++++++++++++++++++++++---------------------------
1 file changed, 42 insertions(+), 40 deletions(-)
diff --git a/README.md b/README.md
index 80abb45..4964751 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,3 @@
-
-
-
-
-
@@ -10,6 +5,13 @@
+
+
+
+
+
+
+
## ๐ฐ News ๐ฐ
* ***[2024/02/06] ๐ [Initial release of OmniSafeBench-MM with 13 attack methods and 15 defense strategies]()*** ๐
@@ -32,42 +34,42 @@ OmniSafeBench-MM is a unified benchmark and open-source toolbox for evaluating m
The benchmark unifies multi-modal jailbreak attackโdefense evaluation, 13 attack and 15 defense methods, and a three-dimensional scoring protocol measuring harmfulness, alignment, and detail.
## ๐ก๏ธ Integrated Attack Methods
-| Name | Title | Paper | Code | Classification |
-|------|-----------------------------------------------------------------------------------------------------------------------------|-------|------|----------------|
-| FigStep / FigStep-Pro | FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts | [link](https://arxiv.org/abs/2311.05608) | [link](https://github.com/ThuCCSLab/FigStep) | Black-box โ Structured visual carriers |
-| QR-Attack (MM-SafetyBench) | MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models | [link](https://arxiv.org/abs/2311.17600) | [link](https://github.com/isXinLiu/MM-SafetyBench) | Black-box โ Structured visual carriers |
-| MML | Jailbreak Large Vision-Language Models Through Multi-Modal Linkage | [link](https://aclanthology.org/2025.acl-long.74/) | [link](https://github.com/wangyu-ovo/MML) | Black-box โ Structured visual carriers |
-| CS-DJ | Distraction is All You Need for Multimodal Large Language Model Jailbreaking | [link](https://arxiv.org/abs/2502.10794) | [link](https://github.com/TeamPigeonLab/CS-DJ) | Black-box โ OOD (attention / distribution manipulation) |
-| SI-Attack | Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency | [link](https://arxiv.org/abs/2501.04931) | [link](https://github.com/zhaoshiji123/SI-Attack) | Black-box โ OOD (shuffle / attention inconsistency) |
-| JOOD | Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy | [link](https://arxiv.org/abs/2503.20823) | [link](https://github.com/naver-ai/JOOD) | Black-box โ OOD (OOD strategy) |
-| HIMRD | Heuristic-Induced Multimodal Risk Distribution (HIMRD) Jailbreak Attack | [link](https://arxiv.org/abs/2412.05934) | [link](https://github.com/MaTengSYSU/HIMRD-jailbreak) | Black-box โ OOD / risk distribution |
-| HADES | Images are Achillesโ Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models | [link](https://arxiv.org/abs/2403.09792) | [link](https://github.com/AoiDragon/HADES) | Black-box โ Structured visual carriers |
-| BAP | Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt (BAP) | [link](https://arxiv.org/abs/2406.04031) | [link](https://github.com/NY1024/BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt) | White-box โ Cross-modal |
-| visual_adv | Visual Adversarial Examples Jailbreak Aligned Large Language Models | [link](https://ojs.aaai.org/index.php/AAAI/article/view/30150) | [link](https://github.com/Unispac/Visual-Adversarial-Examples-Jailbreak-Large-Language-Models) | White-box โ Single-modal |
-| VisCRA | VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models | [link](https://arxiv.org/abs/2505.19684) | [link](https://github.com/DyMessi/VisCRA) | Black-box โ Structured visual carriers |
-| UMK | White-box Multimodal Jailbreaks Against Large Vision-Language Models (Universal Master Key, UMK) | [link](https://arxiv.org/abs/2405.17894) | [link](https://github.com/roywang021/UMK) | White-box โ Cross-modal |
-| PBI-Attack | Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization (PBI-Attack) | [link](https://aclanthology.org/2025.emnlp-main.32.pdf) | [link](https://github.com/Rosy0912/PBI-Attack) | Black-box โ Query optimization & transfer |
-| ImgJP / DeltaJP | Jailbreaking Attack against Multimodal Large Language Model (imgJP / deltaJP) | [link](https://arxiv.org/abs/2402.02309) | [link](https://github.com/abc03570128/Jailbreaking-Attack-against-Multimodal-Large-Language-Model) | White-box โ Single-modal |
-| JPS | JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering | [link](https://arxiv.org/abs/2508.05087) | [link](https://github.com/thu-coai/JPS) | White-box โ Cross-modal |
+| Name | Title | Paper | Code | Classification |
+|:----------------------------:|:-----------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------:|
+| FigStep / FigStep-Pro | FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts | [link](https://arxiv.org/abs/2311.05608) | [link](https://github.com/ThuCCSLab/FigStep) | Black-box โ Structured visual carriers |
+| QR-Attack (MM-SafetyBench) | MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models | [link](https://arxiv.org/abs/2311.17600) | [link](https://github.com/isXinLiu/MM-SafetyBench) | Black-box โ Structured visual carriers |
+| MML | Jailbreak Large Vision-Language Models Through Multi-Modal Linkage | [link](https://aclanthology.org/2025.acl-long.74/) | [link](https://github.com/wangyu-ovo/MML) | Black-box โ Structured visual carriers |
+| CS-DJ | Distraction is All You Need for Multimodal Large Language Model Jailbreaking | [link](https://arxiv.org/abs/2502.10794) | [link](https://github.com/TeamPigeonLab/CS-DJ) | Black-box โ OOD (attention / distribution manipulation) |
+| SI-Attack | Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency | [link](https://arxiv.org/abs/2501.04931) | [link](https://github.com/zhaoshiji123/SI-Attack) | Black-box โ OOD (shuffle / attention inconsistency) |
+| JOOD | Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy | [link](https://arxiv.org/abs/2503.20823) | [link](https://github.com/naver-ai/JOOD) | Black-box โ OOD (OOD strategy) |
+| HIMRD | Heuristic-Induced Multimodal Risk Distribution (HIMRD) Jailbreak Attack | [link](https://arxiv.org/abs/2412.05934) | [link](https://github.com/MaTengSYSU/HIMRD-jailbreak) | Black-box โ OOD / risk distribution |
+| HADES | Images are Achillesโ Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models | [link](https://arxiv.org/abs/2403.09792) | [link](https://github.com/AoiDragon/HADES) | Black-box โ Structured visual carriers |
+| BAP | Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt (BAP) | [link](https://arxiv.org/abs/2406.04031) | [link](https://github.com/NY1024/BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt) | White-box โ Cross-modal |
+| visual_adv | Visual Adversarial Examples Jailbreak Aligned Large Language Models | [link](https://ojs.aaai.org/index.php/AAAI/article/view/30150) | [link](https://github.com/Unispac/Visual-Adversarial-Examples-Jailbreak-Large-Language-Models) | White-box โ Single-modal |
+| VisCRA | VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models | [link](https://arxiv.org/abs/2505.19684) | [link](https://github.com/DyMessi/VisCRA) | Black-box โ Structured visual carriers |
+| UMK | White-box Multimodal Jailbreaks Against Large Vision-Language Models (Universal Master Key, UMK) | [link](https://arxiv.org/abs/2405.17894) | [link](https://github.com/roywang021/UMK) | White-box โ Cross-modal |
+| PBI-Attack | Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization (PBI-Attack) | [link](https://aclanthology.org/2025.emnlp-main.32.pdf) | [link](https://github.com/Rosy0912/PBI-Attack) | Black-box โ Query optimization & transfer |
+| ImgJP / DeltaJP | Jailbreaking Attack against Multimodal Large Language Model (imgJP / deltaJP) | [link](https://arxiv.org/abs/2402.02309) | [link](https://github.com/abc03570128/Jailbreaking-Attack-against-Multimodal-Large-Language-Model) | White-box โ Single-modal |
+| JPS | JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering | [link](https://arxiv.org/abs/2508.05087) | [link](https://github.com/thu-coai/JPS) | White-box โ Cross-modal |
## ๐ก๏ธIntegrated Defense Methods
-| Name | Title | Venue | Paper | Code |
-|---|--------------------------------------------------------------------------------------------------------------------|---|---|---|
-| JailGuard | JailGuard: A Universal Detection Framework for Prompt-based Attacks on LLM Systems | TOSEM2025 | [link](https://arxiv.org/abs/2312.10766) | [link](https://github.com/shiningrain/JailGuard) |
-| MLLM-Protector | MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance | EMNLP2024 | [link](https://arxiv.org/abs/2401.02906) | [link](https://github.com/pipilurj/MLLM-protector) |
-| ECSO | Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation | ECCV2024 | [link](https://arxiv.org/abs/2403.09572) | [link](https://gyhdog99.github.io/projects/ecso/) |
-| ShieldLM | ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors | EMNLP2024 | [link](https://arxiv.org/abs/2402.16444) | [link](https://github.com/thu-coai/ShieldLM) |
-| AdaShield | AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting | ECCV2024 | [link](https://arxiv.org/abs/2403.09513) | [link](https://github.com/rain305f/AdaShield) |
-| Uniguard | UNIGUARD: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models | ECCV2024 | [link](https://arxiv.org/abs/2411.01703) | [link](https://anonymous.4open.science/r/UniGuard/README.md) |
-| DPS | Defending LVLMs Against Vision Attacks Through Partial-Perception Supervision | ICML2025 | [link](https://arxiv.org/abs/2412.12722) | [link](https://github.com/tools-only/DPS) |
-| CIDER | Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models | EMNLP2024 | [link](https://arxiv.org/abs/2407.21659) | [link](https://github.com/PandragonXIII/CIDER) |
-| GuardReasoner-VL | GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning | ICML2025 | [link](https://arxiv.org/abs/2505.11049) | [link](https://github.com/yueliu1999/GuardReasoner-VL) |
-| Llama-Guard-4 | Llama Guard 4 | Model Card | [link](https://www.llama.com/docs/model-cards-and-prompt-formats/llama-guard-4/) | [link](https://huggingface.co/meta-llama/Llama-Guard-4-12B) |
-| QGuard | QGuard: Question-based Zero-shot Guard for Multi-modal LLM Safety | arXiv | [link](https://arxiv.org/abs/2506.12299) | [link](https://github.com/taegyeong-lee/QGuard-Question-based-Zero-shot-Guard-for-Multi-modal-LLM-Safety) |
-| LlavaGuard | LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models | ICML2025 | [link](https://arxiv.org/abs/2406.05113) | [link](https://github.com/ml-research/LlavaGuard) |
-| Llama-Guard-3 | Llama Guard 3 | Model Card | [link](https://www.llama.com/docs/model-cards-and-prompt-formats/llama-guard-3/) | [link](https://huggingface.co/meta-llama/Llama-Guard-3-8B) |
-| HiddenDetect | HiddenDetect: Detecting Jailbreak Attacks against Multimodal Large Language Models via Monitoring Hidden States | ACL2025 | [link](https://arxiv.org/abs/2502.14744) | [link](https://github.com/leigest519/HiddenDetect) |
-| CoCA | CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration | COLM2024 | [link](https://arxiv.org/abs/2409.11365) | Null |
-| VLGuard | Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models | ICML2024 | [link](https://arxiv.org/abs/2402.02207) | [link](https://github.com/ys-zong/VLGuard) |
+| Name | Title | Venue | Paper | Code |
+|:-----------------:|:--------------------------------------------------------------------------------------------------------------------:|:----------:|:---------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------:|
+| JailGuard | JailGuard: A Universal Detection Framework for Prompt-based Attacks on LLM Systems | TOSEM2025 | [link](https://arxiv.org/abs/2312.10766) | [link](https://github.com/shiningrain/JailGuard) |
+| MLLM-Protector | MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance | EMNLP2024 | [link](https://arxiv.org/abs/2401.02906) | [link](https://github.com/pipilurj/MLLM-protector) |
+| ECSO | Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation | ECCV2024 | [link](https://arxiv.org/abs/2403.09572) | [link](https://gyhdog99.github.io/projects/ecso/) |
+| ShieldLM | ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors | EMNLP2024 | [link](https://arxiv.org/abs/2402.16444) | [link](https://github.com/thu-coai/ShieldLM) |
+| AdaShield | AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting | ECCV2024 | [link](https://arxiv.org/abs/2403.09513) | [link](https://github.com/rain305f/AdaShield) |
+| Uniguard | UNIGUARD: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models | ECCV2024 | [link](https://arxiv.org/abs/2411.01703) | [link](https://anonymous.4open.science/r/UniGuard/README.md) |
+| DPS | Defending LVLMs Against Vision Attacks Through Partial-Perception Supervision | ICML2025 | [link](https://arxiv.org/abs/2412.12722) | [link](https://github.com/tools-only/DPS) |
+| CIDER | Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models | EMNLP2024 | [link](https://arxiv.org/abs/2407.21659) | [link](https://github.com/PandragonXIII/CIDER) |
+| GuardReasoner-VL | GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning | ICML2025 | [link](https://arxiv.org/abs/2505.11049) | [link](https://github.com/yueliu1999/GuardReasoner-VL) |
+| Llama-Guard-4 | Llama Guard 4 | Model Card | [link](https://www.llama.com/docs/model-cards-and-prompt-formats/llama-guard-4/) | [link](https://huggingface.co/meta-llama/Llama-Guard-4-12B) |
+| QGuard | QGuard: Question-based Zero-shot Guard for Multi-modal LLM Safety | ArXiv | [link](https://arxiv.org/abs/2506.12299) | [link](https://github.com/taegyeong-lee/QGuard-Question-based-Zero-shot-Guard-for-Multi-modal-LLM-Safety) |
+| LlavaGuard | LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models | ICML2025 | [link](https://arxiv.org/abs/2406.05113) | [link](https://github.com/ml-research/LlavaGuard) |
+| Llama-Guard-3 | Llama Guard 3 | Model Card | [link](https://www.llama.com/docs/model-cards-and-prompt-formats/llama-guard-3/) | [link](https://huggingface.co/meta-llama/Llama-Guard-3-8B) |
+| HiddenDetect | HiddenDetect: Detecting Jailbreak Attacks against Multimodal Large Language Models via Monitoring Hidden States | ACL2025 | [link](https://arxiv.org/abs/2502.14744) | [link](https://github.com/leigest519/HiddenDetect) |
+| CoCA | CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration | COLM2024 | [link](https://arxiv.org/abs/2409.11365) | - |
+| VLGuard | Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models | ICML2024 | [link](https://arxiv.org/abs/2402.02207) | [link](https://github.com/ys-zong/VLGuard) |
> More methods are coming soon!!