Update AITG-MOD-07_Testing_for_Goal_Alignment.md

This commit is contained in:
Matteo Meucci
2025-11-14 11:07:30 +01:00
committed by GitHub
parent fbd1277207
commit 2ae88e9fc9

View File

@@ -42,5 +42,5 @@ This test evaluates vulnerabilities associated with AI model goal misalignment,
### References
- Askell, Amanda, et al. "A General Language Assistant as a Laboratory for Alignment." Anthropic, 2021. [Link](https://arxiv.org/abs/2112.00861) (Constitutional AI)
- OWASP Top 10 for LLM Applications 2025 - LLM06: Excessive Agency - OWASP, 2025. [Link](https://genai.owasp.org/llmrisk/llm062025-excessive-agency/))
- OWASP Top 10 for LLM Applications 2025 - LLM06: Excessive Agency - OWASP, 2025. [Link](https://genai.owasp.org/llmrisk/llm062025-excessive-agency/)
- NIST AI 100-2e2025, "Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations," Section 4 "Evaluation Alignment and Trustworthiness." NIST, March 2025. [Link](https://doi.org/10.6028/NIST.AI.100-2e2025)