diff --git a/README.md b/README.md index 10802f6..80abb45 100644 --- a/README.md +++ b/README.md @@ -55,19 +55,19 @@ The benchmark unifies multi-modal jailbreak attack–defense evaluation, 13 atta | Name | Title | Venue | Paper | Code | |---|--------------------------------------------------------------------------------------------------------------------|---|---|---| | JailGuard | JailGuard: A Universal Detection Framework for Prompt-based Attacks on LLM Systems | TOSEM2025 | [link](https://arxiv.org/abs/2312.10766) | [link](https://github.com/shiningrain/JailGuard) | -| MLLM-Protector | MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance | EMNLP 2024 | [link](https://arxiv.org/abs/2401.02906) | [link](https://github.com/pipilurj/MLLM-protector) | -| ECSO | Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation | ECCV 2024 | [link](https://arxiv.org/abs/2403.09572) | [link](https://gyhdog99.github.io/projects/ecso/) | -| ShieldLM | ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors | EMNLP 2024 | [link](https://arxiv.org/abs/2402.16444) | [link](https://github.com/thu-coai/ShieldLM) | -| AdaShield | AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting | ECCV 2024 | [link](https://arxiv.org/abs/2403.09513) | [link](https://github.com/rain305f/AdaShield) | -| Uniguard | UNIGUARD: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models | ECCV 2024 | [link](https://arxiv.org/abs/2411.01703) | [link](https://anonymous.4open.science/r/UniGuard/README.md) | -| DPS | Defending LVLMs Against Vision Attacks Through Partial-Perception Supervision | ICML 2025 | [link](https://arxiv.org/abs/2412.12722) | [link](https://github.com/tools-only/DPS) | -| CIDER | Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models | EMNLP 2024 | [link](https://arxiv.org/abs/2407.21659) | [link](https://github.com/PandragonXIII/CIDER) | -| GuardReasoner-VL | GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning | ICML 2025 | [link](https://arxiv.org/abs/2505.11049) | [link](https://github.com/yueliu1999/GuardReasoner-VL) | +| MLLM-Protector | MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance | EMNLP2024 | [link](https://arxiv.org/abs/2401.02906) | [link](https://github.com/pipilurj/MLLM-protector) | +| ECSO | Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation | ECCV2024 | [link](https://arxiv.org/abs/2403.09572) | [link](https://gyhdog99.github.io/projects/ecso/) | +| ShieldLM | ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors | EMNLP2024 | [link](https://arxiv.org/abs/2402.16444) | [link](https://github.com/thu-coai/ShieldLM) | +| AdaShield | AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting | ECCV2024 | [link](https://arxiv.org/abs/2403.09513) | [link](https://github.com/rain305f/AdaShield) | +| Uniguard | UNIGUARD: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models | ECCV2024 | [link](https://arxiv.org/abs/2411.01703) | [link](https://anonymous.4open.science/r/UniGuard/README.md) | +| DPS | Defending LVLMs Against Vision Attacks Through Partial-Perception Supervision | ICML2025 | [link](https://arxiv.org/abs/2412.12722) | [link](https://github.com/tools-only/DPS) | +| CIDER | Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models | EMNLP2024 | [link](https://arxiv.org/abs/2407.21659) | [link](https://github.com/PandragonXIII/CIDER) | +| GuardReasoner-VL | GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning | ICML2025 | [link](https://arxiv.org/abs/2505.11049) | [link](https://github.com/yueliu1999/GuardReasoner-VL) | | Llama-Guard-4 | Llama Guard 4 | Model Card | [link](https://www.llama.com/docs/model-cards-and-prompt-formats/llama-guard-4/) | [link](https://huggingface.co/meta-llama/Llama-Guard-4-12B) | | QGuard | QGuard: Question-based Zero-shot Guard for Multi-modal LLM Safety | arXiv | [link](https://arxiv.org/abs/2506.12299) | [link](https://github.com/taegyeong-lee/QGuard-Question-based-Zero-shot-Guard-for-Multi-modal-LLM-Safety) | -| LlavaGuard | LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models | ICML 2025 | [link](https://arxiv.org/abs/2406.05113) | [link](https://github.com/ml-research/LlavaGuard) | +| LlavaGuard | LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models | ICML2025 | [link](https://arxiv.org/abs/2406.05113) | [link](https://github.com/ml-research/LlavaGuard) | | Llama-Guard-3 | Llama Guard 3 | Model Card | [link](https://www.llama.com/docs/model-cards-and-prompt-formats/llama-guard-3/) | [link](https://huggingface.co/meta-llama/Llama-Guard-3-8B) | -| HiddenDetect | HiddenDetect: Detecting Jailbreak Attacks against Multimodal Large Language Models via Monitoring Hidden States | ACL 2025 | [link](https://arxiv.org/abs/2502.14744) | [link](https://github.com/leigest519/HiddenDetect) | -| CoCA | CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration | COLM 2024 | [link](https://arxiv.org/abs/2409.11365) | Null | -| VLGuard | Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models | ICML 2024 | [link](https://arxiv.org/abs/2402.02207) | [link](https://github.com/ys-zong/VLGuard) | +| HiddenDetect | HiddenDetect: Detecting Jailbreak Attacks against Multimodal Large Language Models via Monitoring Hidden States | ACL2025 | [link](https://arxiv.org/abs/2502.14744) | [link](https://github.com/leigest519/HiddenDetect) | +| CoCA | CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration | COLM2024 | [link](https://arxiv.org/abs/2409.11365) | Null | +| VLGuard | Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models | ICML2024 | [link](https://arxiv.org/abs/2402.02207) | [link](https://github.com/ys-zong/VLGuard) | > More methods are coming soon!!