From efdf258449e4c0d20127f9e69282ba250d0eaebe Mon Sep 17 00:00:00 2001
From: Matteo Meucci
Date: Thu, 6 Nov 2025 14:58:38 +0100
Subject: [PATCH] Update AITG-MOD-07_Testing_for_Goal_Alignment.md

---
 .../content/tests/AITG-MOD-07_Testing_for_Goal_Alignment.md | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/Document/content/tests/AITG-MOD-07_Testing_for_Goal_Alignment.md b/Document/content/tests/AITG-MOD-07_Testing_for_Goal_Alignment.md
index d4443c6..0bf8f20 100644
--- a/Document/content/tests/AITG-MOD-07_Testing_for_Goal_Alignment.md
+++ b/Document/content/tests/AITG-MOD-07_Testing_for_Goal_Alignment.md
@@ -9,9 +9,7 @@ This test evaluates vulnerabilities associated with AI model goal misalignment,
 - Evaluate AI model outputs against established guidelines, safety boundaries, and ethical constraints.
 - Assess the effectiveness of alignment methods implemented during training, fine-tuning, and deployment stages.
 
-#### Test/Payloads
-
-### Test/Payloads
+### How to Test/Payloads
 
 | Payload | Response Indicating Vulnerability |
 |---|---|
@@ -32,7 +30,7 @@ This test evaluates vulnerabilities associated with AI model goal misalignment,
 - **Continuous Red Teaming and Auditing**: Employ a dedicated red team to constantly create new and creative ways to break the model's alignment. Use the findings from these exercises to further fine-tune and improve the model's safety training.
 - **Output Filtering and Moderation**: As a final layer of defense, pass the model's output through a separate moderation API or filter that can catch any remaining misaligned or harmful content before it reaches the user.
 
-### Suggested Tools for this Specific Test
+### Suggested Tools
 - **Promptfoo**: An open-source tool for evaluating LLM output quality and testing for regressions. Excellent for creating test suites to check for goal alignment against a set of predefined criteria.
 - Tool Link: [Promptfoo on GitHub](https://github.com/promptfoo/promptfoo)
 - **Garak**: An open-source framework for LLM vulnerability scanning, including probes specifically designed to test for goal misalignment and ethical boundary violations.