4.3 Appendix C: Risk Lifecycle Mapping for Secure AI Systems (Based on the SAIF Framework)
This table provides a structured view of how key AI risks identified by the Secure AI Framework (SAIF) manifest across the AI system lifecycle. For each risk, we highlight:
- Where the risk is introduced – the layers or components of the system where the root cause originates (e.g., insecure data ingestion, inadequate model training controls).
- Where the risk is exposed – the stages or components where the consequences of the risk are likely to be observed (e.g., during model inference or application use).
- Where the risk is controlled – the parts of the system where mitigation efforts can be applied to reduce the likelihood or impact of the risk.
- How the risk is controlled – specific technical or procedural countermeasures (e.g., input validation, access control, CI/CD validation).
Key Insights:
- Risks Originating in Data and Infrastructure Are Foundational – Several risks, such as Data Poisoning, Unauthorized Training Data, and Model Source Tampering, originate in and propagate through the Data and Infrastructure layers, reflecting how security gaps early in the pipeline can compromise downstream components. These risks emphasize the need for:
  - Data validation and provenance tracking
  - Secure storage and access controls
  - Rigorous CI/CD and artifact integrity validation
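  To make the artifact-integrity point concrete, the following minimal Python sketch records SHA-256 hashes of approved training artifacts in a provenance manifest and re-verifies them before a pipeline run. The file paths and manifest format are illustrative assumptions, not part of SAIF itself.

  ```python
  import hashlib
  import json
  from pathlib import Path

  def sha256_of(path: Path) -> str:
      """Stream a file and return its SHA-256 digest (avoids loading large datasets into memory)."""
      digest = hashlib.sha256()
      with path.open("rb") as f:
          for chunk in iter(lambda: f.read(1 << 20), b""):
              digest.update(chunk)
      return digest.hexdigest()

  def build_manifest(artifact_paths: list[str], manifest_path: str) -> None:
      """Record the hash of each approved artifact (dataset shard, model file) in a manifest."""
      manifest = {p: sha256_of(Path(p)) for p in artifact_paths}
      Path(manifest_path).write_text(json.dumps(manifest, indent=2))

  def verify_manifest(manifest_path: str) -> list[str]:
      """Return the artifacts whose current hash no longer matches the recorded one."""
      manifest = json.loads(Path(manifest_path).read_text())
      return [p for p, expected in manifest.items() if sha256_of(Path(p)) != expected]

  # Example (hypothetical paths): fail the training job if any artifact was tampered with.
  # build_manifest(["data/train.csv", "models/base.bin"], "provenance.json")
  # tampered = verify_manifest("provenance.json")
  # assert not tampered, f"Integrity check failed for: {tampered}"
  ```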
- Model Risks Require Both Static and Runtime Controls – Threats like Model Evasion, Model Reverse Engineering, and Sensitive Data Disclosure demonstrate how model behavior can be manipulated or interrogated post-deployment. These require dual layers of defense:
  - Pre-deployment controls such as adversarial training and differential privacy
  - Post-deployment controls including query monitoring, output filtering, and behavior analytics
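  As one illustration of a post-deployment control, the sketch below keeps a sliding-window count of inference queries per client and flags unusually high volumes, a coarse signal of reverse-engineering or extraction probing. The window size, threshold, and client identifier are assumptions to be tuned per deployment.

  ```python
  import time
  from collections import defaultdict, deque

  class QueryMonitor:
      """Flag clients whose query volume within a sliding window exceeds a threshold.
      Sustained high query rates are one (coarse) signal of model-extraction probing."""

      def __init__(self, window_seconds: float = 60.0, max_queries: int = 100):
          self.window = window_seconds
          self.max_queries = max_queries
          self.history: dict[str, deque] = defaultdict(deque)

      def record(self, client_id: str, now: float | None = None) -> bool:
          """Record one query; return True if this client should be flagged for review."""
          now = time.time() if now is None else now
          q = self.history[client_id]
          q.append(now)
          # Drop timestamps that have fallen out of the sliding window.
          while q and now - q[0] > self.window:
              q.popleft()
          return len(q) > self.max_queries

  # Example: feed every inference request through the monitor and alert on flagged clients.
  # monitor = QueryMonitor(window_seconds=60, max_queries=100)
  # if monitor.record(client_id="abc-123"):
  #     ...  # escalate to security review (alerting hook is deployment-specific)
  ```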
- Application Layer Hosts High-Frequency Attack Vectors – Risks like Prompt Injection, Denial of ML Service, and Insecure Integrated Components expose the application layer as a frequent point of compromise, especially in user-facing LLM interfaces or agent-based architectures. Recommended mitigations focus on:
  - Input sanitization and validation
  - Request throttling and anomaly detection
  - Trust boundaries and plugin governance
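  The sketch below illustrates an application-layer input gate of the kind listed above: encoding normalization, control-character stripping, a length bound, and a small deny-list of common prompt-injection phrasings. The patterns and limits are illustrative and not an exhaustive defense.

  ```python
  import re
  import unicodedata

  # Illustrative deny-list of phrasings commonly seen in prompt-injection attempts.
  SUSPICIOUS_PATTERNS = [
      re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
      re.compile(r"reveal (your )?(system|hidden) prompt", re.IGNORECASE),
      re.compile(r"disregard .* (rules|policy)", re.IGNORECASE),
  ]
  MAX_PROMPT_CHARS = 4000  # assumed limit; tune per application

  def sanitize_prompt(raw: str) -> str:
      """Normalize encoding, strip control characters, and enforce a length bound.
      Raises ValueError if the prompt trips a basic injection heuristic."""
      text = unicodedata.normalize("NFKC", raw)
      text = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
      if len(text) > MAX_PROMPT_CHARS:
          raise ValueError("prompt exceeds maximum allowed length")
      for pattern in SUSPICIOUS_PATTERNS:
          if pattern.search(text):
              raise ValueError("prompt matched an injection heuristic; route to review")
      return text
  ```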
- Control Implementation Requires Multi-Layer Enforcement – Many risks are controlled across multiple SAIF layers. For example:
  - Inferred Sensitive Data requires both Model-layer (privacy-preserving techniques) and Data-layer (access limitation) controls.
  - Model Exfiltration touches the Infrastructure layer (e.g., through exposed APIs) but also implicates the Model layer when embeddings or weights are accessed. This highlights the necessity of defense-in-depth, where single-point controls are insufficient against complex threat vectors.
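  As an example of one Model-layer privacy-preserving technique that would sit alongside Data-layer access limits for Inferred Sensitive Data, the sketch below perturbs an aggregate count with Laplace noise, the standard epsilon-differential-privacy mechanism for counting queries. The epsilon and sensitivity values are illustrative.

  ```python
  import numpy as np

  def dp_count(true_count: int, epsilon: float = 1.0, sensitivity: float = 1.0) -> float:
      """Perturb a counting-query result with Laplace noise (scale = sensitivity / epsilon),
      so repeated probing cannot reliably reveal whether any one record is in the data."""
      scale = sensitivity / epsilon
      return float(true_count + np.random.laplace(loc=0.0, scale=scale))

  # Example: release a noisy count instead of the exact one.
  # print(dp_count(true_count=128, epsilon=0.5))
  ```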
- Runtime Governance and Observability Are Critical – Across many risks (e.g., Model Reverse Engineering, Rogue Actions, Denial of ML Service), runtime observability and policy enforcement mechanisms such as rate limiting, sandboxing, and output moderation are central to mitigating emerging AI threats in dynamic environments.
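  A minimal sketch of runtime policy enforcement for model-triggered tool calls follows: only allowlisted tools run, arguments are validated first, and the caller waits at most a fixed budget for a result. Tool names, validators, and the timeout are illustrative assumptions; production systems would additionally isolate tools in sandboxed processes.

  ```python
  from concurrent.futures import ThreadPoolExecutor, TimeoutError as ToolTimeout
  from typing import Any, Callable

  # Allowlisted tools the model may trigger, each paired with an argument validator.
  # The tool names and validators here are illustrative, not a fixed SAIF interface.
  ALLOWED_TOOLS: dict[str, dict[str, Callable]] = {
      "get_weather": {
          "run": lambda city: f"(stub) weather for {city}",
          "validate": lambda args: isinstance(args.get("city"), str) and len(args["city"]) < 100,
      },
  }

  _pool = ThreadPoolExecutor(max_workers=4)

  def invoke_tool(name: str, args: dict[str, Any], timeout_s: float = 5.0) -> Any:
      """Run a model-requested tool only if it is allowlisted and its arguments validate.
      The timeout bounds how long the caller waits for a result."""
      spec = ALLOWED_TOOLS.get(name)
      if spec is None:
          raise PermissionError(f"tool '{name}' is not on the allowlist")
      if not spec["validate"](args):
          raise ValueError(f"arguments for '{name}' failed validation")
      future = _pool.submit(spec["run"], **args)
      try:
          return future.result(timeout=timeout_s)
      except ToolTimeout:
          raise RuntimeError(f"tool '{name}' exceeded its {timeout_s}s budget")

  # Example: an unlisted tool request is refused at runtime.
  # invoke_tool("delete_files", {"path": "/"})      # raises PermissionError
  # invoke_tool("get_weather", {"city": "Zurich"})  # returns the stub response
  ```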
| SAIF Risk | Description | Introduced In | Exposed In | Controlled In | How the Risk Is Controlled |
|---|---|---|---|---|---|
| Data Poisoning | Attackers inject malicious data to influence model behavior or degrade performance. | Data, Infrastructure | Data, Infrastructure | Data, Infrastructure | Data sanitization, validation, access controls, integrity checks during storage and training. |
| Unauthorized Training Data | Unapproved or low-integrity datasets used in training introduce bias or backdoors. | Data, Infrastructure | Model | Data, Infrastructure | Enforce dataset provenance, auditing, approval workflows, and restricted access. |
| Model Source Tampering | Modification of model files during storage, retrieval, or versioning operations. | Infrastructure | Infrastructure | Infrastructure | Use signing, integrity verification (hashing), and secure storage for model artifacts. |
| Excessive Data Handling | Exposing excessive or unnecessary data during model or pipeline execution. | Data, Infrastructure | Model, Infrastructure | Model, Infrastructure | Limit data exposure using minimization principles and role-based access controls. |
| Model Exfiltration | Extraction of model weights, architecture, or embeddings via direct access or inference. | Infrastructure | Infrastructure | Infrastructure, Model | Apply rate limiting, watermarking, access logs, and isolation of sensitive operations. |
| Model Deployment Tampering | Tampering with model configuration, routing, or versions during deployment. | Infrastructure | Infrastructure | Infrastructure | Automate CI/CD validation, secure configuration management, immutable deployment tools. |
| Denial of ML Service | Overloading the model serving layer to degrade availability or response quality. | Application, Model | Application | Application | Rate limiting, request quotas, circuit breakers, anomaly detection on input load. |
| Model Reverse Engineering | Inferred model behavior or logic via excessive querying or adversarial probes. | Application | Application | Application | Obfuscate outputs, monitor queries, detect extraction patterns, and enforce rate limits. |
| Insecure Integrated Component | Compromised plugin/tool introduces vulnerabilities or unexpected model behavior. | Application | Application | Application | Enforce plugin trust boundaries, validate APIs, use sandboxing and allowlisting. |
| Prompt Injection | User-supplied prompts hijack model behavior via embedded adversarial instructions. | Application | Application | Application | Sanitize input, enforce encoding rules, validate context pre/post inference. |
| Model Evasion | Attackers craft inputs to bypass detection or manipulate model behavior. | Model | Model | Training, Evaluation | Use adversarial training, implement runtime behavior monitoring, enforce input constraints. |
| Sensitive Data Disclosure | Outputs may inadvertently reveal PII or training data. | Model, Data | Model | Model, Data | Apply output filtering, redaction, enforce data minimization, validate prompt responses. |
| Inferred Sensitive Data | Attackers infer private data via repeated querying or model inversion. | Model, Data | Model | Model, Data | Rate limit queries, apply differential privacy, monitor unusual probing activity. |
| Insecure Model Output | Outputs may contain unsafe, toxic, or policy-violating content. | Model | Model | Model | Use content moderation, toxicity detection, post-process responses. |
| Rogue Actions | Plugins or tools triggered by the model perform unsafe operations. | Application | Application | Application | Sandbox tool use, validate outputs, enforce plugin governance and output filtering. |
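To make the output-side controls in the table concrete (e.g., for Sensitive Data Disclosure and Insecure Model Output), the following sketch redacts PII-like spans and withholds responses containing blocked policy phrases before they are returned to the caller. The regexes and blocked-term list are illustrative and far from exhaustive.

```python
import re

# Illustrative PII patterns; production systems would use a vetted detection library.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}
BLOCKED_TERMS = ["internal use only"]  # assumed policy phrases that must never be emitted

def filter_output(response: str) -> str:
    """Redact PII-like spans and refuse responses containing blocked policy phrases."""
    lowered = response.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return "[response withheld by output policy]"
    for label, pattern in PII_PATTERNS.items():
        response = pattern.sub(f"[REDACTED {label}]", response)
    return response

# Example:
# print(filter_output("Contact jane.doe@example.com or call 555-01-2345."))
# -> "Contact [REDACTED EMAIL] or call [REDACTED SSN]."
```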