4.3 Appendix C: Risk Lifecycle Mapping for Secure AI Systems (Based on the SAIF Framework)
This table provides a structured view of how key AI risks identified by the Secure AI Framework (SAIF) manifest across the AI system lifecycle. For each risk, we highlight:
- Where the risk is introduced – the layers or components of the system where the root cause originates (e.g., insecure data ingestion, inadequate model training controls).
- Where the risk is exposed – the stages or components where the consequences of the risk are likely to be observed (e.g., during model inference or application use).
- Where the risk is controlled – the parts of the system where mitigation efforts can be applied to reduce the likelihood or impact of the risk.
- How the risk is controlled – specific technical or procedural countermeasures (e.g., input validation, access control, CI/CD validation).
Key Insights:
- Risks Originating in Data and Infrastructure Are Foundational – Several risks, such as Data Poisoning, Unauthorized Training Data, and Model Source Tampering, originate in and propagate through the Data and Infrastructure layers, reflecting how security gaps early in the pipeline can compromise downstream components. These risks emphasize the need for:
  - Data validation and provenance tracking
  - Secure storage and access controls
  - Rigorous CI/CD and artifact integrity validation
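  To make the artifact-integrity point concrete, the following minimal Python sketch records SHA-256 hashes of approved training artifacts in a provenance manifest and re-verifies them before a pipeline run. The file paths and manifest format are illustrative assumptions, not part of SAIF itself.

  ```python
  import hashlib
  import json
  from pathlib import Path

  def sha256_of(path: Path) -> str:
      """Stream a file and return its SHA-256 digest (avoids loading large datasets into memory)."""
      digest = hashlib.sha256()
      with path.open("rb") as f:
          for chunk in iter(lambda: f.read(1 << 20), b""):
              digest.update(chunk)
      return digest.hexdigest()

  def build_manifest(artifact_paths: list[str], manifest_path: str) -> None:
      """Record the hash of each approved artifact (dataset shard, model file) in a manifest."""
      manifest = {p: sha256_of(Path(p)) for p in artifact_paths}
      Path(manifest_path).write_text(json.dumps(manifest, indent=2))

  def verify_manifest(manifest_path: str) -> list[str]:
      """Return the artifacts whose current hash no longer matches the recorded one."""
      manifest = json.loads(Path(manifest_path).read_text())
      return [p for p, expected in manifest.items() if sha256_of(Path(p)) != expected]

  # Example (hypothetical paths): fail the training job if any artifact was tampered with.
  # build_manifest(["data/train.csv", "models/base.bin"], "provenance.json")
  # tampered = verify_manifest("provenance.json")
  # assert not tampered, f"Integrity check failed for: {tampered}"
  ```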
- Model Risks Require Both Static and Runtime Controls – Threats like Model Evasion, Model Reverse Engineering, and Sensitive Data Disclosure demonstrate how model behavior can be manipulated or interrogated post-deployment. These require dual layers of defense:
  - Pre-deployment controls such as adversarial training and differential privacy
  - Post-deployment controls including query monitoring, output filtering, and behavior analytics
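  As one illustration of a post-deployment control, the sketch below keeps a sliding-window count of inference queries per client and flags unusually high volumes, a coarse signal of reverse-engineering or extraction probing. The window size, threshold, and client identifier are assumptions to be tuned per deployment.

  ```python
  import time
  from collections import defaultdict, deque

  class QueryMonitor:
      """Flag clients whose query volume within a sliding window exceeds a threshold.
      Sustained high query rates are one (coarse) signal of model-extraction probing."""

      def __init__(self, window_seconds: float = 60.0, max_queries: int = 100):
          self.window = window_seconds
          self.max_queries = max_queries
          self.history: dict[str, deque] = defaultdict(deque)

      def record(self, client_id: str, now: float | None = None) -> bool:
          """Record one query; return True if this client should be flagged for review."""
          now = time.time() if now is None else now
          q = self.history[client_id]
          q.append(now)
          # Drop timestamps that have fallen out of the sliding window.
          while q and now - q[0] > self.window:
              q.popleft()
          return len(q) > self.max_queries

  # Example: feed every inference request through the monitor and alert on flagged clients.
  # monitor = QueryMonitor(window_seconds=60, max_queries=100)
  # if monitor.record(client_id="abc-123"):
  #     ...  # escalate to security review (alerting hook is deployment-specific)
  ```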
- Application Layer Hosts High-Frequency Attack Vectors – Risks like Prompt Injection, Denial of ML Service, and Insecure Integrated Components expose the application layer as a frequent point of compromise, especially in user-facing LLM interfaces or agent-based architectures. Recommended mitigations focus on:
  - Input sanitization and validation
  - Request throttling and anomaly detection
  - Trust boundaries and plugin governance
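  The sketch below illustrates an application-layer input gate of the kind listed above: encoding normalization, control-character stripping, a length bound, and a small deny-list of common prompt-injection phrasings. The patterns and limits are illustrative and not an exhaustive defense.

  ```python
  import re
  import unicodedata

  # Illustrative deny-list of phrasings commonly seen in prompt-injection attempts.
  SUSPICIOUS_PATTERNS = [
      re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
      re.compile(r"reveal (your )?(system|hidden) prompt", re.IGNORECASE),
      re.compile(r"disregard .* (rules|policy)", re.IGNORECASE),
  ]
  MAX_PROMPT_CHARS = 4000  # assumed limit; tune per application

  def sanitize_prompt(raw: str) -> str:
      """Normalize encoding, strip control characters, and enforce a length bound.
      Raises ValueError if the prompt trips a basic injection heuristic."""
      text = unicodedata.normalize("NFKC", raw)
      text = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
      if len(text) > MAX_PROMPT_CHARS:
          raise ValueError("prompt exceeds maximum allowed length")
      for pattern in SUSPICIOUS_PATTERNS:
          if pattern.search(text):
              raise ValueError("prompt matched an injection heuristic; route to review")
      return text
  ```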
- Control Implementation Requires Multi-Layer Enforcement – Many risks are controlled across multiple SAIF layers. For example:
  - Inferred Sensitive Data requires both Model-layer (privacy-preserving techniques) and Data-layer (access limitation) controls.
  - Model Exfiltration touches the Infrastructure layer (e.g., through exposed APIs) but also implicates the Model layer when embeddings or weights are accessed. This highlights the necessity of defense-in-depth, where single-point controls are insufficient against complex threat vectors.
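  As an example of one Model-layer privacy-preserving technique that would sit alongside Data-layer access limits for Inferred Sensitive Data, the sketch below perturbs an aggregate count with Laplace noise, the standard epsilon-differential-privacy mechanism for counting queries. The epsilon and sensitivity values are illustrative.

  ```python
  import numpy as np

  def dp_count(true_count: int, epsilon: float = 1.0, sensitivity: float = 1.0) -> float:
      """Perturb a counting-query result with Laplace noise (scale = sensitivity / epsilon),
      so repeated probing cannot reliably reveal whether any one record is in the data."""
      scale = sensitivity / epsilon
      return float(true_count + np.random.laplace(loc=0.0, scale=scale))

  # Example: release a noisy count instead of the exact one.
  # print(dp_count(true_count=128, epsilon=0.5))
  ```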
- Runtime Governance and Observability Are Critical – Across many risks (e.g., Model Reverse Engineering, Rogue Actions, Denial of ML Service), runtime observability and policy enforcement mechanisms such as rate limiting, sandboxing, and output moderation are central to mitigating emerging AI threats in dynamic environments.
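  A minimal sketch of runtime policy enforcement for model-triggered tool calls follows: only allowlisted tools run, arguments are validated first, and the caller waits at most a fixed budget for a result. Tool names, validators, and the timeout are illustrative assumptions; production systems would additionally isolate tools in sandboxed processes.

  ```python
  from concurrent.futures import ThreadPoolExecutor, TimeoutError as ToolTimeout
  from typing import Any, Callable

  # Allowlisted tools the model may trigger, each paired with an argument validator.
  # The tool names and validators here are illustrative, not a fixed SAIF interface.
  ALLOWED_TOOLS: dict[str, dict[str, Callable]] = {
      "get_weather": {
          "run": lambda city: f"(stub) weather for {city}",
          "validate": lambda args: isinstance(args.get("city"), str) and len(args["city"]) < 100,
      },
  }

  _pool = ThreadPoolExecutor(max_workers=4)

  def invoke_tool(name: str, args: dict[str, Any], timeout_s: float = 5.0) -> Any:
      """Run a model-requested tool only if it is allowlisted and its arguments validate.
      The timeout bounds how long the caller waits for a result."""
      spec = ALLOWED_TOOLS.get(name)
      if spec is None:
          raise PermissionError(f"tool '{name}' is not on the allowlist")
      if not spec["validate"](args):
          raise ValueError(f"arguments for '{name}' failed validation")
      future = _pool.submit(spec["run"], **args)
      try:
          return future.result(timeout=timeout_s)
      except ToolTimeout:
          raise RuntimeError(f"tool '{name}' exceeded its {timeout_s}s budget")

  # Example: an unlisted tool request is refused at runtime.
  # invoke_tool("delete_files", {"path": "/"})      # raises PermissionError
  # invoke_tool("get_weather", {"city": "Zurich"})  # returns the stub response
  ```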
| SAIF Risk | Description | Introduced In | Exposed In | Controlled In | How the Risk Is Controlled |
|---|---|---|---|---|---|
| Data Poisoning | Attackers inject malicious data to influence model behavior or degrade performance. | Data, Infrastructure | Data, Infrastructure | Data, Infrastructure | Data sanitization, validation, access controls, integrity checks during storage and training. |
| Unauthorized Training Data | Unapproved or low-integrity datasets used in training introduce bias or backdoors. | Data, Infrastructure | Model | Data, Infrastructure | Enforce dataset provenance, auditing, approval workflows, and restricted access. |
| Model Source Tampering | Modification of model files during storage, retrieval, or versioning operations. | Infrastructure | Infrastructure | Infrastructure | Use signing, integrity verification (hashing), and secure storage for model artifacts. |
| Excessive Data Handling | Exposing excessive or unnecessary data during model or pipeline execution. | Data, Infrastructure | Model, Infrastructure | Model, Infrastructure | Limit data exposure using minimization principles and role-based access controls. |
| Model Exfiltration | Extraction of model weights, architecture, or embeddings via direct access or inference. | Infrastructure | Infrastructure | Infrastructure, Model | Apply rate limiting, watermarking, access logs, and isolation of sensitive operations. |
| Model Deployment Tampering | Tampering with model configuration, routing, or versions during deployment. | Infrastructure | Infrastructure | Infrastructure | Automate CI/CD validation, secure configuration management, immutable deployment tools. |
| Denial of ML Service | Overloading the model serving layer to degrade availability or response quality. | Application, Model | Application | Application | Rate limiting, request quotas, circuit breakers, anomaly detection on input load. |
| Model Reverse Engineering | Inferred model behavior or logic via excessive querying or adversarial probes. | Application | Application | Application | Obfuscate outputs, monitor queries, detect extraction patterns, and enforce rate limits. |
| Insecure Integrated Component | Compromised plugin/tool introduces vulnerabilities or unexpected model behavior. | Application | Application | Application | Enforce plugin trust boundaries, validate APIs, use sandboxing and allowlisting. |
| Prompt Injection | User-supplied prompts hijack model behavior via embedded adversarial instructions. | Application | Application | Application | Sanitize input, enforce encoding rules, validate context pre/post inference. |
| Model Evasion | Attackers craft inputs to bypass detection or manipulate model behavior. | Model | Model | Training, Evaluation | Use adversarial training, implement runtime behavior monitoring, enforce input constraints. |
| Sensitive Data Disclosure | Outputs may inadvertently reveal PII or training data. | Model, Data | Model | Model, Data | Apply output filtering, redaction, enforce data minimization, validate prompt responses. |
| Inferred Sensitive Data | Attackers infer private data via repeated querying or model inversion. | Model, Data | Model | Model, Data | Rate limit queries, apply differential privacy, monitor unusual probing activity. |
| Insecure Model Output | Outputs may contain unsafe, toxic, or policy-violating content. | Model | Model | Model | Use content moderation, toxicity detection, post-process responses. |
| Rogue Actions | Plugins or tools triggered by the model perform unsafe operations. | Application | Application | Application | Sandbox tool use, validate outputs, enforce plugin governance and output filtering. |
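To make the output-side controls in the table concrete (e.g., for Sensitive Data Disclosure and Insecure Model Output), the following sketch redacts PII-like spans and withholds responses containing blocked policy phrases before they are returned to the caller. The regexes and blocked-term list are illustrative and far from exhaustive.

```python
import re

# Illustrative PII patterns; production systems would use a vetted detection library.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}
BLOCKED_TERMS = ["internal use only"]  # assumed policy phrases that must never be emitted

def filter_output(response: str) -> str:
    """Redact PII-like spans and refuse responses containing blocked policy phrases."""
    lowered = response.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return "[response withheld by output policy]"
    for label, pattern in PII_PATTERNS.items():
        response = pattern.sub(f"[REDACTED {label}]", response)
    return response

# Example:
# print(filter_output("Contact jane.doe@example.com or call 555-01-2345."))
# -> "Contact [REDACTED EMAIL] or call [REDACTED SSN]."
```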