From bf552d20f1d7ea1c3eda765a94c4048bffd55ba6 Mon Sep 17 00:00:00 2001
From: Matteo Meucci
Date: Sun, 23 Nov 2025 13:34:17 +0100
Subject: [PATCH] Update AITG-APP-05_Testing_for_Unsafe_Outputs.md

---
 .../tests/AITG-APP-05_Testing_for_Unsafe_Outputs.md | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/Document/content/tests/AITG-APP-05_Testing_for_Unsafe_Outputs.md b/Document/content/tests/AITG-APP-05_Testing_for_Unsafe_Outputs.md
index 05ba97e..77a22f9 100644
--- a/Document/content/tests/AITG-APP-05_Testing_for_Unsafe_Outputs.md
+++ b/Document/content/tests/AITG-APP-05_Testing_for_Unsafe_Outputs.md
@@ -100,12 +100,9 @@ A vulnerability is confirmed if the AI model:
 - Employ ongoing monitoring and manual review processes to detect and rectify unsafe outputs quickly.
 ### Suggested Tools
-- **Garak – Unsafe Output Detection Module**: Specialized Garak module explicitly designed to identify unsafe model outputs.
- - **URL**: [Garak AV Spam Scanning](https://github.com/NVIDIA/garak/blob/main/garak/probes/av_spam_scanning.py)
- - **Llama Guard 4**: Open source moderation model to detect unsafe text and unsafe combination of text and images.
- - **URL**: [Llama Guard 4](https://www.llama.com/docs/model-cards-and-prompt-formats/llama-guard-4/)
- - **LlavaGuard**, **ShieldGemma2**: Open source moderation model to detection unsafe images.
- - **URL**: [ShieldGemma2](https://deepmind.google/models/gemma/shieldgemma-2/)
+- **Garak – Unsafe Output Detection Module**: Specialized Garak module explicitly designed to identify unsafe model outputs - [Garak AV Spam Scanning](https://github.com/NVIDIA/garak/blob/main/garak/probes/av_spam_scanning.py)
+ - **Llama Guard 4**: Open source moderation model to detect unsafe text and unsafe combination of text and images - [Llama Guard 4](https://www.llama.com/docs/model-cards-and-prompt-formats/llama-guard-4/)
+ - **LlavaGuard**, **ShieldGemma2**: Open source moderation model to detection unsafe images- [ShieldGemma2](https://deepmind.google/models/gemma/shieldgemma-2/)
 ### References