Files
www-project-ai-testing-guide/Document/content/1.2_Principles_of_AI_Testing.md
Matteo Meucci 64059cf00f Add OWASP AI Testing principles and lifecycle phases
This document outlines the principles of OWASP AI Testing, detailing four macro domains: Security, Privacy, Responsible AI, and Trustworthy AI Systems. It also describes the phases of the AI system lifecycle and the importance of integrating testing throughout.
2025-10-30 18:01:13 +01:00

8.8 KiB
Raw Blame History

1.2 Principles of OWASP AI Testing

Effective AI testing is built upon four macro domains: Security, Privacy, Responsible AI, and Trustworthy AI Systems. We chose these four core domains because they collectively address the full range of AI risks. Security ensures resilience against adversarial and infrastructure threats. Privacy prevents unintended data exposure and inference attacks. Responsible AI focuses on ethical behavior and fairness, guarding against bias and misuse. Trustworthy AI Systems maintain ongoing confidence through explainability, stability, and governance alignment. Together, they form a comprehensive framework for validating, controlling, and sustaining safe and reliable AI deployments. Each domain includes key principles that guide the evaluation of modern AI applications.

1. Security

AI systems must be resilient to adversarial threats and systemic exploitation. This includes not just model robustness but also the security of the full stack.

  • Prompt & Input Control: Ensure system prompts, instructions, and user inputs are protected from injection or manipulation.
  • Adversarial Robustness: Validate the system's resistance to evasion, poisoning, model theft, jailbreaks, and indirect prompt injections.
  • Infrastructure Security: Evaluate API endpoints, plugins, RAG pipelines, and agentic workflows for vulnerabilities.
  • Supply Chain Risk: Test models and dependencies for poisoning, unauthorized tampering, or third-party compromise.

2. Privacy

Ensure confidentiality and control over sensitive data exposed to or generated by AI systems.

  • Data Leakage Prevention: Test against unintended disclosure of training data, private context, or user inputs.
  • Membership & Property Inference Resistance: Assess model exposure to privacy attacks that infer if specific data was used in training.
  • Model Extraction & Exfiltration: Simulate attacks that try to copy or replicate proprietary models.

3. Responsible AI

Promote safe, ethical, and aligned outcomes through ongoing evaluation and mitigation strategies.

  • Bias & Fairness Audits: Identify discriminatory outputs and test model behavior across diverse demographic groups and edge cases.
  • Toxicity & Abuse Detection: Validate how models handle hate speech, misinformation, and harmful outputs.
  • Safety Alignment: Evaluate the systems response to alignment bypass attacks (e.g., DAN, roleplay exploits).
  • Guardrail Coverage: Test safety filters, refusal behaviors, and abuse-prevention mechanisms.

4. Trustworthy AI Systems

Support long-term confidence through transparency, monitoring, and governance.

  • Explainability: Ensure users and auditors can understand how and why decisions are made.
  • Consistency & Stability: Test models for response variance, regressions, and unexpected behavior under slight prompt changes.
  • Continuous Monitoring: Apply post-deployment observability, drift detection, and incident alerting.
  • Policy & Regulatory Alignment: Ensure testing processes and system behaviors comply with frameworks like NIST AI RMF [1], ISO 42001 [2], and OWASP Top 10 LLM [3].

When to Test AI

ISO/IEC 23053 [4] structures the ML-based AI system lifecycle into a series of repeatable phases, each with clear objectives, artifacts, and governance touchpoints:

  1. Planning & Scoping: In this phase, you establish clear business objectives, success metrics, and ML use cases while identifying key stakeholders, regulatory requirements, and the organizations risk tolerance.
  2. Data Preparation: In this phase, you gather and document raw data sources, conduct profiling and quality checks through preprocessing pipelines, and implement versioning and lineage tracking for full data traceability.
  3. Model Development & Training: In this phase, you choose appropriate algorithms and architectures, train models on curated datasets with feature engineering, and record experiments, including the parameters that govern the learning process (i.e hyperparameters) and performance metrics in a model registry.
  4. Validation & Evaluation: in this phase, you test models using reserved and adversarial datasets, perform fairness, robustness, and security evaluations, and ensure they meet functional, ethical, and regulatory standards.
  5. Deployment & Integration: in this phase, you are preparing and bundling your trained AI model into a deployable artifact for either service (i.e. wrap the model in a microservice or API) or edge deployment (i.e. convert and optimize the model for resource-constrained devices such as IoT gateways or mobile phones) automate build-test-release workflows via CI/CD, and verify infrastructure security measures
  6. Operation & Maintenance: in this phase while the AI product is in production environment, you will continuously monitor performance, data drift, and audit logs, triggering alerts on anomalies or compliance breaches, while periodically retraining models with fresh data, re-validating security, privacy, and fairness controls, and updating documentation, training, and policies as needed.

AI testing should be integrated throughout the entire AI system lifecycle to ensure AI systems remain accurate, secure, fair, and trustworthy from inception through ongoing operation:

  1. Planning & Scoping Phase: Confirm that business objectives, success metrics, and ML use cases are testable and traceable. Identify AI-specific risks (adversarial, privacy, compliance) and map them to controls. Verify stakeholder roles, regulatory constraints, and risk-tolerance criteria are documented.
  2. Data Preparation: Perform data quality tests to check for missing values, outliers, schema mismatches, and duplicates.Validate feature distributions (i.e. how the values of a particular variable are spread out or arranged) against historical profiles to set drift thresholds (i.e. for data drifts from this baseline). Ensure every data source, transformation, and version is recorded and traceable.
  3. Model Development & Training: Validate preprocessing code, custom layers, and feature engineering functions behave as expected. Run static code scans (e.g. SAST) on model code for insecure dependencies or misconfigurations. Confirm no data leakage between training, validation, and test splits. Ensure tuning changes improve generalization without regressions.
  4. Validation & Evaluation: Validate performance against benchmarks to measure accuracy, precision/recall, AUC, etc., on hold-out and adversarial test sets. Conduct fairness & bias audits to evaluate model outputs across demographic slices and edge cases. Conduct adversarial robustness tests by applying well-known techniques for crafting adversarial examples against neural networks or other adversarial attacks to assess resistance. Conduct privacy attacks to simulate membership inference, model extraction, and poisoning to confirm privacy protections. Verify model decisions are interpretable and valid by attributing predictions back to input features.
  5. Operation & Maintenance: Conduct regression tests for drift detection by Continuously comparing production inputs and outputs to validation baselines. Verify monitoring rules fire correctly on performance dips, data drift, or security anomalies.Re-evaluate performance, fairness, and robustness after model updates or data refreshes. Periodically confirm that security, privacy, and ethical controls remain effective and documented.

Among the testing goals of this guide is to integrate OWASPs LLMspecific test cases and broader OWASP AI Exchange [5] threats into your lifecycle phases to ensure both prerelease validation and continuous protection against emerging vulnerabilities. During planning and scoping phase for example threat modeling exercises can be used to enumerate OWASP Top 10 LLM risks (prompt injection, data leakage, model poisoning, overreliance, etc.) and AI Exchange threats to define your test scope and controls.

During the Validation & Evaluation phase for example, prompt injection tests can test direct and indirect prompt manipulations to verify guardrail coverage and refusal behaviors and Inject malicious samples in a controlled retraining loop to ensure poisoning defenses work, During development & operation tests can be directed to continuously scan newly installed or updated plugins for OWASPidentified weaknesses and to monitor outputs for signs of jailbreaks, backdoor prompts, or exploitation of known OWASP AI Exchange threat vectors.

In this initial release, the OWASP AI testing methodology is focused on guiding AI product owners to define the test scope and execute a comprehensive suite of assessments once an initial AI product version is test-ready; future updates will expand this guidance to cover earlier pre-production phases as well.