From 94ffd2e59c19de969cbea0ccb862b2de19f08027 Mon Sep 17 00:00:00 2001
From: gq-max <74671963+gq-max@users.noreply.github.com>
Date: Wed, 3 Dec 2025 20:52:41 +0800
Subject: [PATCH] Update table format and add venue information

---
 README.md | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/README.md b/README.md
index 8bf9f5c..bc8ab76 100644
--- a/README.md
+++ b/README.md
@@ -34,23 +34,23 @@ OmniSafeBench-MM is a unified benchmark and open-source toolbox for evaluating m
 The benchmark unifies multi-modal jailbreak attack–defense evaluation, 13 attack and 15 defense methods, and a three-dimensional scoring protocol measuring harmfulness, alignment, and detail.
 
 ## 🗡️ Integrated Attack Methods
-| Name | Title | Paper | Code | Classification |
-|:----------------------------:|:-----------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------:|
-| FigStep / FigStep-Pro | FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts | [link](https://arxiv.org/abs/2311.05608) | [link](https://github.com/ThuCCSLab/FigStep) | Black-box — Structured visual carriers |
-| QR-Attack (MM-SafetyBench) | MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models | [link](https://arxiv.org/abs/2311.17600) | [link](https://github.com/isXinLiu/MM-SafetyBench) | Black-box — Structured visual carriers |
-| MML | Jailbreak Large Vision-Language Models Through Multi-Modal Linkage | [link](https://aclanthology.org/2025.acl-long.74/) | [link](https://github.com/wangyu-ovo/MML) | Black-box — Structured visual carriers |
-| CS-DJ | Distraction is All You Need for Multimodal Large Language Model Jailbreaking | [link](https://arxiv.org/abs/2502.10794) | [link](https://github.com/TeamPigeonLab/CS-DJ) | Black-box — OOD (attention / distribution manipulation) |
-| SI-Attack | Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency | [link](https://arxiv.org/abs/2501.04931) | [link](https://github.com/zhaoshiji123/SI-Attack) | Black-box — OOD (shuffle / attention inconsistency) |
-| JOOD | Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy | [link](https://arxiv.org/abs/2503.20823) | [link](https://github.com/naver-ai/JOOD) | Black-box — OOD (OOD strategy) |
-| HIMRD | Heuristic-Induced Multimodal Risk Distribution (HIMRD) Jailbreak Attack | [link](https://arxiv.org/abs/2412.05934) | [link](https://github.com/MaTengSYSU/HIMRD-jailbreak) | Black-box — OOD / risk distribution |
-| HADES | Images are Achilles’ Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models | [link](https://arxiv.org/abs/2403.09792) | [link](https://github.com/AoiDragon/HADES) | Black-box — Structured visual carriers |
-| BAP | Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt (BAP) | [link](https://arxiv.org/abs/2406.04031) | [link](https://github.com/NY1024/BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt) | White-box — Cross-modal |
-| visual_adv | Visual Adversarial Examples Jailbreak Aligned Large Language Models | [link](https://ojs.aaai.org/index.php/AAAI/article/view/30150) | [link](https://github.com/Unispac/Visual-Adversarial-Examples-Jailbreak-Large-Language-Models) | White-box — Single-modal |
-| VisCRA | VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models | [link](https://arxiv.org/abs/2505.19684) | [link](https://github.com/DyMessi/VisCRA) | Black-box — Structured visual carriers |
-| UMK | White-box Multimodal Jailbreaks Against Large Vision-Language Models (Universal Master Key, UMK) | [link](https://arxiv.org/abs/2405.17894) | [link](https://github.com/roywang021/UMK) | White-box — Cross-modal |
-| PBI-Attack | Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization (PBI-Attack) | [link](https://aclanthology.org/2025.emnlp-main.32.pdf) | [link](https://github.com/Rosy0912/PBI-Attack) | Black-box — Query optimization & transfer |
-| ImgJP / DeltaJP | Jailbreaking Attack against Multimodal Large Language Model (imgJP / deltaJP) | [link](https://arxiv.org/abs/2402.02309) | [link](https://github.com/abc03570128/Jailbreaking-Attack-against-Multimodal-Large-Language-Model) | White-box — Single-modal |
-| JPS | JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering | [link](https://arxiv.org/abs/2508.05087) | [link](https://github.com/thu-coai/JPS) | White-box — Cross-modal |
+| Name | Title | Venue | Paper | Code |
+| :------------------------: | :----------------------------------------------------------: | :--------: | :----------------------------------------------------: | :----------------------------------------------------------: |
+| FigStep / FigStep-Pro | FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts | AAAI 2025 | https://arxiv.org/abs/2311.05608 | https://github.com/ThuCCSLab/FigStep |
+| QR-Attack (MM-SafetyBench) | MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models | ECCV 2024 | https://arxiv.org/abs/2311.17600 | https://github.com/isXinLiu/MM-SafetyBench |
+| MML | Jailbreak Large Vision-Language Models Through Multi-Modal Linkage | ACL 2025 | https://aclanthology.org/2025.acl-long.74/ | https://github.com/wangyu-ovo/MML |
+| CS-DJ | Distraction is All You Need for Multimodal Large Language Model Jailbreaking | CVPR 2025 | https://arxiv.org/abs/2502.10794 | https://github.com/TeamPigeonLab/CS-DJ |
+| SI-Attack | Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency | ICCV 2025 | https://arxiv.org/abs/2501.04931 | https://github.com/zhaoshiji123/SI-Attack |
+| JOOD | Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy | CVPR 2025 | https://arxiv.org/abs/2503.20823 | https://github.com/naver-ai/JOOD |
+| HIMRD | Heuristic-Induced Multimodal Risk Distribution (HIMRD) Jailbreak Attack | ICCV 2025 | https://arxiv.org/abs/2412.05934 | https://github.com/MaTengSYSU/HIMRD-jailbreak |
+| HADES | Images are Achilles’ Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking MLLMs | ECCV 2024 | https://arxiv.org/abs/2403.09792 | https://github.com/AoiDragon/HADES |
+| BAP | Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt (BAP) | TIFS 2025 | https://arxiv.org/abs/2406.04031 | https://github.com/NY1024/BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt |
+| visual_adv | Visual Adversarial Examples Jailbreak Aligned Large Language Models | AAAI 2024 | https://ojs.aaai.org/index.php/AAAI/article/view/30150 | https://github.com/Unispac/Visual-Adversarial-Examples-Jailbreak-Large-Language-Models |
+| VisCRA | VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models | EMNLP 2025 | https://arxiv.org/abs/2505.19684 | https://github.com/DyMessi/VisCRA |
+| UMK | White-box Multimodal Jailbreaks Against Large Vision-Language Models (Universal Master Key) | ACMMM 2024 | https://arxiv.org/abs/2405.17894 | https://github.com/roywang021/UMK |
+| PBI-Attack | Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization | EMNLP 2025 | https://aclanthology.org/2025.emnlp-main.32.pdf | https://github.com/Rosy0912/PBI-Attack |
+| ImgJP / DeltaJP | Jailbreaking Attack against Multimodal Large Language Models | arXiv 2024 | https://arxiv.org/abs/2402.02309 | https://github.com/abc03570128/Jailbreaking-Attack-against-Multimodal-Large-Language-Model |
+| JPS | JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering | ACMMM 2025 | https://arxiv.org/abs/2508.05087 | https://github.com/thu-coai/JPS |
 
 ## 🛡️Integrated Defense Methods