4.3 Appendix C: Risk Lifecycle Mapping for Secure AI Systems (Based on the SAIF Framework)

This table provides a structured view of how key AI risks identified by the Secure AI Framework (SAIF) manifest across the AI system lifecycle. For each risk, we highlight:

  1. Where the risk is introduced: the layers or components of the system where the root cause originates (e.g., insecure data ingestion, inadequate model training controls).
  2. Where the risk is exposed: the stages or components where the consequences of the risk are likely to be observed (e.g., during model inference or application use).
  3. Where the risk is controlled: the parts of the system where mitigation efforts can be applied to reduce the likelihood or impact of the risk.
  4. How the risk is controlled: the specific technical or procedural countermeasures (e.g., input validation, access control, CI/CD validation).

Key Insights:

  1. Risks Originating in Data and Infrastructure Are Foundational - Several risks, such as Data Poisoning, Unauthorized Training Data, and Model Source Tampering, originate and propagate through the Data and Infrastructure layers, reflecting how security gaps early in the pipeline can compromise downstream components. These risks emphasize the need for:

    1. Data validation and provenance tracking
    2. Secure storage and access controls
    3. Rigorous CI/CD and artifact integrity validation
  2. Model Risks Require Both Static and Runtime Controls - Threats like Model Evasion, Model Reverse Engineering, and Sensitive Data Disclosure demonstrate how model behavior can be manipulated or interrogated post-deployment. These require dual layers of defense:

    1. Pre-deployment controls such as adversarial training and differential privacy
    2. Post-deployment controls including query monitoring, output filtering, and behavior analytics
  3. Application Layer Hosts High-Frequency Attack Vectors - Risks like Prompt Injection, Denial of ML Service, and Insecure Integrated Components expose the application layer as a frequent point of compromise, especially in user-facing LLM interfaces or agent-based architectures. Recommended mitigations focus on:

    1. Input sanitization and validation
    2. Request throttling and anomaly detection
    3. Trust boundaries and plugin governance
  4. Control Implementation Requires Multi-Layer Enforcement - Many risks are controlled across multiple SAIF layers. For example:

  • Inferred Sensitive Data requires both Model-layer (privacy-preserving techniques) and Data-layer (access limitation) controls.
  • Model Exfiltration touches the Infrastructure layer (e.g., through exposed APIs) but also implicates the Model layer when embeddings or weights are accessed.
  These examples highlight the necessity of defense in depth, where single-point controls are insufficient against complex threat vectors.
  5. Runtime Governance and Observability Are Critical - Across many risks (e.g., Model Reverse Engineering, Rogue Actions, Denial of ML Service), runtime observability and policy enforcement mechanisms such as rate limiting, sandboxing, and output moderation are central to mitigating emerging AI threats in dynamic environments; a minimal rate-limiting sketch follows this list.
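
To make the runtime-governance controls above more concrete, the sketch below shows a minimal per-client token-bucket rate limiter in Python of the kind that could sit in front of a model-serving endpoint to blunt Denial of ML Service and bulk-extraction query patterns. The class name, capacity, and refill rate are illustrative assumptions, not part of SAIF; a production deployment would typically use a shared store and pair this with query monitoring and anomaly detection.

```python
import time
from collections import defaultdict


class TokenBucketLimiter:
    """Per-client token bucket: each client may burst up to `capacity`
    requests, with tokens refilled at `refill_rate` per second."""

    def __init__(self, capacity: int = 20, refill_rate: float = 1.0):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self._tokens = defaultdict(lambda: float(capacity))  # current budget per client
        self._last_seen = {}  # timestamp of each client's previous request

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        last = self._last_seen.get(client_id, now)
        # Refill tokens in proportion to the time elapsed since the last request.
        self._tokens[client_id] = min(
            self.capacity, self._tokens[client_id] + (now - last) * self.refill_rate
        )
        self._last_seen[client_id] = now
        if self._tokens[client_id] >= 1.0:
            self._tokens[client_id] -= 1.0
            return True
        return False  # Budget exhausted: reject before the request reaches the model.


# Hypothetical usage in a serving layer:
limiter = TokenBucketLimiter(capacity=10, refill_rate=0.5)
if not limiter.allow("client-42"):
    raise RuntimeError("Rate limit exceeded; request rejected before inference.")
```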
| SAIF Risk | Description | Introduced In | Exposed In | Controlled In | How the Risk Is Controlled |
| --- | --- | --- | --- | --- | --- |
| Data Poisoning | Attackers inject malicious data to influence model behavior or degrade performance. | Data, Infrastructure | Data, Infrastructure | Data, Infrastructure | Data sanitization, validation, access controls, integrity checks during storage and training. |
| Unauthorized Training Data | Unapproved or low-integrity datasets used in training introduce bias or backdoors. | Data, Infrastructure | Model | Data, Infrastructure | Enforce dataset provenance, auditing, approval workflows, and restricted access. |
| Model Source Tampering | Modification of model files during storage, retrieval, or versioning operations. | Infrastructure | Infrastructure | Infrastructure | Use signing, integrity verification (hashing), and secure storage for model artifacts. |
| Excessive Data Handling | Exposing excessive or unnecessary data during model or pipeline execution. | Data, Infrastructure | Model, Infrastructure | Model, Infrastructure | Limit data exposure using minimization principles and role-based access controls. |
| Model Exfiltration | Extraction of model weights, architecture, or embeddings via direct access or inference. | Infrastructure | Infrastructure | Infrastructure, Model | Apply rate limiting, watermarking, access logs, and isolation of sensitive operations. |
| Model Deployment Tampering | Tampering with model configuration, routing, or versions during deployment. | Infrastructure | Infrastructure | Infrastructure | Automate CI/CD validation, secure configuration management, immutable deployment tools. |
| Denial of ML Service | Overloading the model serving layer to degrade availability or response quality. | Application, Model | Application | Application | Rate limiting, request quotas, circuit breakers, anomaly detection on input load. |
| Model Reverse Engineering | Inference of model behavior or logic via excessive querying or adversarial probes. | Application | Application | Application | Obfuscate outputs, monitor queries, detect extraction patterns, and enforce rate limits. |
| Insecure Integrated Component | Compromised plugin/tool introduces vulnerabilities or unexpected model behavior. | Application | Application | Application | Enforce plugin trust boundaries, validate APIs, use sandboxing and allowlisting. |
| Prompt Injection | User-supplied prompts hijack model behavior via embedded adversarial instructions. | Application | Application | Application | Sanitize input, enforce encoding rules, validate context pre/post inference. |
| Model Evasion | Attackers craft inputs to bypass detection or manipulate model behavior. | Model | Model | Training, Evaluation | Use adversarial training, implement runtime behavior monitoring, enforce input constraints. |
| Sensitive Data Disclosure | Outputs may inadvertently reveal PII or training data. | Model, Data | Model | Model, Data | Apply output filtering, redaction, enforce data minimization, validate prompt responses. |
| Inferred Sensitive Data | Attackers infer private data via repeated querying or model inversion. | Model, Data | Model | Model, Data | Rate limit queries, apply differential privacy, monitor unusual probing activity. |
| Insecure Model Output | Outputs may contain unsafe, toxic, or policy-violating content. | Model | Model | Model | Use content moderation, toxicity detection, post-process responses. |
| Rogue Actions | Plugins or tools triggered by the model perform unsafe operations. | Application | Application | Application | Sandbox tool use, validate outputs, enforce plugin governance and output filtering. |
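
As a concrete illustration of the integrity controls listed for Model Source Tampering and Model Deployment Tampering (signing, hashing, secure storage), the following is a minimal Python sketch of a hash-based artifact check performed before a model is loaded. The function names, file path, and the idea of reading the expected digest from a signed manifest are illustrative assumptions rather than SAIF requirements; hashing alone detects tampering but does not replace artifact signing and access controls.

```python
import hashlib
from pathlib import Path


def sha256_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large model artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_digest: str) -> None:
    """Refuse to load a model artifact whose hash does not match the recorded digest."""
    actual = sha256_digest(path)
    if actual != expected_digest.lower():
        raise ValueError(
            f"Integrity check failed for {path}: expected {expected_digest}, got {actual}"
        )


# Hypothetical usage: the expected digest would come from a signed manifest
# produced by the training/CI pipeline, not from the serving environment itself.
# verify_artifact(Path("models/classifier-v3.bin"), expected_digest_from_manifest)
```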