diff --git a/Document/content/4.4_Appendix_D.md b/Document/content/4.4_Appendix_D.md
index 988549c..af21ac2 100644
--- a/Document/content/4.4_Appendix_D.md
+++ b/Document/content/4.4_Appendix_D.md
@@ -124,28 +124,29 @@ This section enumerates key AI-related security threats from OWASP LLM Top 10 an
 This table presents a detailed correlation between OWASP AI-related threats including the OWASP Top 10 for LLMs (2025) and selected OWASP AI Exchange threats and the Secure AI Framework (SAIF) components they exposed to risk. Each row links a specific threat to a corresponding test name, mapped risk category, and the SAIF architectural components (denoted by component numbers) where the risk is most likely to manifest.
-| Threat ID | Threat Name | OWASP ID (Type) | Source | Tests Name | Mapped SAIF Risk | Impacted Component(s) (SAIF\#) |
-| :---- | :---- | :---- | :---- | :---- | :---- | :---- |
-| T01-DPIJ | Direct Prompt Injection (DPIJ) | LLM01:Prompt Injection (Direct) | OWASP Top 10 For LLM 2025 | Testing for Direct Prompt Injection (DPIJ) | (PIJ) Prompt Injection | **Application (4) :** the application interface receives user input, making it a key vector for prompt injection. **Input Handling (7):** Input handling forwards prompts to the model; without validation, it enables injection. **Modal Usage (9):** If successful, the injection alters model behavior during inference, producing harmful outputs. |
-| T01-IPIJ | Indirect Prompt Injection (IPIJ) | LLM01:Prompt Injection (Indirect) | OWASP Top 10 For LLM 2025 | Testing for Indirect Prompt Injection (IPIJ) | (PIJ) Prompt Injection | **Application (4)**: May include external or user-generated content in prompts, enabling hidden injection paths. **Agents/Plugins (5)**: May insert unverified content into prompts via internal extensions. **External Sources (6)**: Supply data to agents; if not trusted or sanitized, they become indirect injection vectors. **Input Handling (7)**: Merges all prompt inputs; lack of filtering enables injection to reach the model **Model Usage (9)** *(if successful)*: Injected content alters model output or behavior |
-| T01-AIE | Adversarial Input Evasion (AIE) | Threat 2.1. Evasion | OWASP AI Exchange | Testing for Evasion Attacks | (ME) Model Evasion | **Input Handling(7)**: May fail to detect adversarial inputs and accept and forward potentially adversarial inputs without proper validation. **Model Usage (9)** The model processes adversarial inputs at inference time, potentially leading to misclassification **Evaluation (12**) Responsible for robustness testing and drift detection, insufficient evaluation leads to blind spots for evasion. **Training & Tuning (13):** Mitigation occurs via adversarial training and regularization to resist manipulation. |
-| T01-RMP | Runtime Model Poisoning (RMP) | LLM04: Data and Model Poisoning | OWASP Top 10 For LLM 2025 | Testing for Runtime Model Poisoning | (DP) Data Poisoning (Note\#1) | **Model Usage(9)** : Poisoned at runtime (via memory corruption, poisoning adaptive logic) during live inference or adaptive execution, attackers may influence internal states, memory, or adaptive logic at runtime **Data Filtering & Processing (17)** Poisoned or malicious input data streams can be injected here if not properly validated or sanitized, feeding into model decisions or updates **Model Storage Infrastructure** **(15)** *(optional)*if the model supports persistent adaptive behavior, poisoned model states could be written and persisted. |
-| T01-DMP | Data Model Poisoning (MP) | LLM04: Data and Model Poisoning (Model) | OWASP Top 10 For LLM 2025 | Testing for Poisoned Training Sets | (DP) Data Poisoning | **Data Sources (6):** Plugins or agents sourcing data from external APIs, user content, or services may introduce poisoned inputs. **Agents/Plugins(5):**May inject or propagate poisoned data/model payloads during runtime, training, or feedback loop interactions. **Model Usage(9)** :Even though the poisoning happens earlier (e.g., in training or storage), the effects of poisoning (e.g., backdoors, biased behavior) can be triggered and become visible at inference time (model usage) **Evaluation**(12): Ineffective robustness or validation processes may fail to detect poisoned behavior **Model Storage Infrastructure (15)** Malicious models or poisoned weights may be stored and deployed if integrity checks (e.g., signatures or hashes) are not enforced. **Data Filtering & Processing (17)** Malicious models or poisoned weights may be stored and deployed if integrity checks (e.g., signatures or hashes) are not enforced. **Data Sources(18):** The original sources of training or fine-tuning data; if tainted, they propagate poisoning into the model. **External Data Sources(19): T**hird-party datasets (e.g., scraped, purchased) often lack integrity guarantees, enabling injection risks. |
-| T01-DPFT | Data Model Poisoning during Fine-tuning (DPFT) (Note\#2) | LLM04: Data and Model Poisoning (Model) | OWASP Top 10 For LLM 2025 | Testing for Fine-tuning Poisoning | (DP) Data Poisoning | **Model Usage**(9) Even if poisoning occurred earlier, the malicious effects (e.g., specific trigger phrases or biased output) manifest during inference. **Model Training & Tuning (13),** Primary point of injection: Malicious data used during fine-tuning alters model weights to encode biased, harmful, or backdoored behavior. **Model Storage Infrastructure(10)**: Poisoned models may be persisted and deployed without artifact validation, enabling long-term backdoor access or policy violations. **Data Filtering & Processing (17)** if fine-tuning data is not validated or sanitized, poisoned examples can be injected (e.g., via feedback, logs, unlabeled public data). |
-| T01-SCMP | Supply Chain Model Poisoning (SCMP) | LLM03: Supply-Chain | OWASP Top 10 For LLM 2025 | Testing for Supply Chain Tampering | (MST) Model Source Tampering | **Model (9)** Executes the poisoned logic if compromised artifacts are loaded **Model Storage Infrastructure (10)** Poisoned models or tampered weights may be persisted or retrieved without integrity verification. **Model Serving Infrastructure (11)** Compromised models may be injected during the inference phase if serving pipelines are not secured. **Model Training & Tuning (13)** Pre-trained models used during fine-tuning could introduce malicious behaviors from unverified sources. **Model Frameworks & Code (14)** Threat actors may tamper with model libraries or framework codebases (e.g., PyTorch, TensorFlow) **to introduce backdoors. Data Storage Infrastructure (15)** Poisoned data or model artifacts stored insecurely may be used in future training or inference, bypassing controls. |
-| T01-SID | Sensitive Information Disclosure (SID) | LLM02: Sensitive Information Disclosure | OWASP Top 10 For LLM 2025 | Testing for Sensitive Data Leak | (SDD) Sensitive Data Disclosure | **Application (4)** LLM output might leak sensitive data (e.g., PII, credentials) via the user-facing app interface. **Agent/plugin(5):** May forward, store, or mishandle sensitive data without appropriate filtering, logging, or access control. **External Sources (6)** May include sensitive or malicious data that gets passed into prompts or outputs without validation. **Input Handling (7)** Inputs may trigger leakage if they reference prior interactions or cached memory (e.g., via embeddings). **Output handling (8)** Prompts may trigger leakage, and outputs may return sensitive or memorized data to the user **Model (9)** The LLM itself may inadvertently disclose memorized or unfiltered training data during inference. **Evaluation (12)** Weaknesses in evaluation pipelines may fail to detect leakage (e.g., prompt red teaming gaps).. **Data Storage Infrastructure (15)** Logs or intermediate data (e.g., chat histories, feedback) might store leaked info if not anonymized**. Training data(16):**Training on raw support tickets leads to leakage of customer PII in LLM output. **Data Filtering & Processing (17)** Inadequate sanitization of training/fine-tuning data leads to downstream leakage risks **Data Sources (18)** Sensitive responses may be stored in logs or telemetry without redaction or access control |
-| T01-MIMI | Model Inversion & Membership Inference (MIMI) | Threat 2.3.2-MIMI | OWASP AI Exchange | Testing for Membership Inference | (ISD) Inferred Sensitive Data | **Model (9)** Inference-time attacks extract training data characteristics or individual records from model behavior. **Training Data (16)** The target of inference attacks; if not well protected, allows reconstruction of sensitive records. **Data Filtering & Processing (17)** Lack of anonymization or sanitization during ingestion may increase model susceptibility **External Data Sources (19)** Use of third-party datasets (e.g., scraped data, user submissions) may include unvetted or sensitive content. |
-| T01-TDL | Training Data Leakage (TDL) | Threat 3.2 (Sensitive Data Leak Development Time) | OWASP AI Exchange | Testing for Training Data Exposure | (SDD) Sensitive Data Disclosure | **OutPut Handling(8):** Outputs from the model may directly leak training content if not filtered or rate-limited. **Model(9):** The model may return memorized training data during inference, especially if overfit or trained improperly. **Evaluation (12) L**ack of red-teaming, prompt injection fuzzing, or hallucination analysis leads to undetected leakage. **Training Data (16)** The original data used in training is the leakage source, especially if it includes unredacted PII. **Data Filtering & Processing (17) I**nadequate sanitization or anonymization of training data allows sensitive records to be memorized. |
-| T01-MTU | Model Theft Through Use (MTU) | Threat 2.4 Model Theft Through Use | OWASP AI Exchange | Testing for Model Extraction | (MRE) Model Reverse Engineering | **Application (4) t**he application interface (e.g., chat UI, API endpoint) is the primary access point attackers use to interact with and probe the model. **Output Handling(8)** Outputs from the model may directly leak training content if not filtered or rate-limited. **Model (9):** The model may return memorized training data during inference, especially if overfit or trained improperly**. Model Serving Infrastructure(11)**Exposed inference APIs and endpoints can be abused to extract model knowledge without rate limiting or auth. **Evaluation(12)** Insufficient testing may fail to detect signs of extraction (e.g., suspicious query patterns or drift)**.** |
-| T01-MTR | Model Theft at Runtime (MTR) | Threat 4.3 Direct Runtime Model Theft | OWASP AI Exchange | Testing for Runtime Exfiltration | (MXF)Model Exfiltration | **Model (9)** The model is resident in memory during inference and can be targeted for in-memory extraction or copying. Entry point for interactive model probing and adversarial input (red teaming surface) **Model Storage Infrastructure (10)** If runtime pipelines rely on mounting model files or artifacts from storage, improper access controls can lead to exfiltration. **Model Serving Infrastructure(11)** If inference pipelines (e.g., containers, APIs, cloud functions) are insecure, attackers may gain access to model internals. |
-| T01-MTD | Model Theft during Development (MTD) | Threat 3.2.2 Model Theft Through Development \-Time Model Parameter | OWASP AI Exchange | Testing for Dev-Time Model Theft | (MST) Model Source Tampering | **External Sources (6)** Third-party integrations (e.g., open-source models or cloud notebooks) may be a vector for model theft. **Agents/Plugins(5):**May act as vectors during dev-time for reverse engineering or unauthorized access to model behavior and parameters. **Model Storage Infrastructure (10)** Dev-time models stored in artifact registries or file systems may be exfiltrated if not secured properly. **Model Training & Tuning(13)** Core area where theft occurs; stolen models can include configurations, weights, or tuned architectures. **Model Frameworks & Code(14)** Source code (e.g., PyTorch, TensorFlow scripts) can be tampered with or copied. **Data Storage Infrastructure(15)** Intermediate training data or logs stored during development may reveal model internals. |
-| T01-DoSM | Denial of Service of Model Services (DoSM) | LLM10:Unbounded Consumption | OWASP Top 10 For LLM 2025 | Testing for Resource Exhaustion | (DMS) Denial of ML Service | **Application (4) T**he primary entry point where users can submit prompts or requests attackers may flood the system via this surface. **Input Handling (7)** Can be overwhelmed with excessive or malformed inputs, particularly large prompts or recursive queries **Output Handling(8)** Responses to large input payloads may lead to large outputs, consuming additional memory and bandwidth **Model Usage(9).** The model itself consumes compute resources during inference. Resource-intensive queries can degrade or halt service. **Model Serving Infrastructure(11)** Hosting inference endpoints can be targeted with high-frequency or complex queries to cause compute/latency bottlenecks. **Evaluation (12)** May fail to detect abuse patterns (e.g., prompt bombs, recursive inputs) during testing, allowing DoSM vectors to persist. |
-| T01-LSID | Leak Sensitive Input Data (LSID) | LLM02:Sensitive Information Disclosure | OWASP Top 10 For LLM 2025 | Testing for Input Leakage | (SDD) Sensitive Data Disclosure | **Input Handling (7)** Accepts and processes user input. If not properly scoped (e.g., within a session), sensitive input may be cached and later leaked. **Output Handling (8)** Outputs may inadvertently reflect prior user inputs if the system lacks clear memory boundaries or context separation. **Model Usage (9)** The model may memorize or reference prior inputs during inference, especially if configured with long context memory or chat history features. **Evaluation(12)** Failure to test for prompt injections or prompt history leakage can result in undetected sensitive data exposure. **Data Storage Infrastructure (15)** Logs or histories storing raw inputs (e.g., telemetry, chat logs) may be accessed or leaked without redaction or anonymization. |
-| T01-IOH | Improper Output Handling (IOH) | LLM05:Improper Output Handling | OWASP Top 10 For LLM 2025 | Testing for Unsafe Outputs | (IMO) Insecure Model Output | **Application(4)** If unsafe outputs are directly rendered in the app (e.g., UI/API), and no user-facing safeguards (rate-limiting, warnings) exist, the application layer becomes a risk surface. **Output Handling(8)** Core point where model responses are post-processed before user delivery. Weak sanitization or filters here can result in unsafe outputs. **Model Usage(9)** The model may generate unvalidated or harmful content at inference time if not aligned with safety controls**. Evaluation (12)** Lacking evaluation (e.g., hallucination detection, safety benchmarks, bias testing) fails to catch unsafe outputs before deployment. |
-| T01-EA | Excessive Agency (EA) | LLM06:Excessive Agency | OWASP Top 10 For LLM 2025 | Testing for Agentic Behavior Limits | (RO) Rogue Actions | **Application (4)** The core logic that manages tool and agent invocations may not enforce strong boundaries or limitations. **Agents/Plugins(5)** Third-party or in-house plugins may invoke system actions (file write, API call, etc.) without restriction. **External Sources (6)** If agents fetch data or execute actions based on untrusted external input, it may trigger rogue or unbounded behavior. **Output Handling(8)** Autonomous models might generate directives that cause the agent to act beyond scope. **Model Usage(9)** Autonomous models might generate directives that cause the agent to act beyond scope. |
-| T01-SPL | System Prompt Leakage (SPL) | LLM07:System Prompt Leakage | OWASP Top 10 For LLM 2025 | Testing for System Prompt Leakage | (SDD) Sensitive Data Disclosure | **Application (4)** The system prompt is typically embedded within the application logic (e.g., chat UI or agent framework); if mishandled or exposed via the interface, it can leak. **Input Handling(7)** If malicious prompts can extract system-level instructions, this layer enables the injection or probing**. Output Handling(8)** If outputs are not filtered, system prompts may be reflected or leaked to users. **Model (9)** The model’s behavior is influenced by system prompts; leakage may occur via unintended completions. |
-| T01-VEW | Vector & Embedding Weaknesses (VEW) | LLM08:Vector and Embedding Weaknesses . | OWASP Top 10 For LLM 2025 | Testing for Embedding Manipulation | (PJ) Prompt Injection (MST) Model Source Tampering | **Agents/Plugins(5):**Plugins that interface with embedding models (e.g., RAG pipelines, semantic search, context injection) may alter vector inputs dynamically by dynamically injecting, altering, or retrieving poisoned vectors. **External Sources(6)** Embedding poisoning often originates from untrusted external data (e.g., web content, scraped data). **Input Handling(7)** User inputs may be transformed into embeddings, introducing poisoned or manipulated vectors. **Model Usage(9)** The model relies on embedding representations; poisoned vectors can alter inference behavior. **Model Frameworks & Code(14)** Embedding layers and vector DB integrations can be vulnerable to manipulation or unvalidated input. **Data Filtering & Processing(17)** Lack of sanitization during vector generation or ingestion can lead to embedding-based vulnerabilities. **Data Source(18)** Pre-trained embeddings or third-party vector databases may contain malicious or biased vector entries. |
-| T01-MIS | Misinformation (MIS) | LLM09:Misinformation | OWASP Top 10 For LLM 2025 | Testing for Harmful Content Bias | (IMO) Insecure Model Output | **Application (4):**The application is the delivery channel through which misinformation or biased content reaches users (e.g., chat UI, dashboards, APIs). **Output Handling(8):** Output filters, post-processing, and content moderation may fail to catch or redact misleading or biased content. **Model Usage(9):** The LLM itself may generate misinformation due to biased training data, misaligned objectives, or manipulated inference conditions. **Evaluation(12):**Inadequate evaluation (e.g., missing hallucination tests, lack of diversity metrics) may fail to detect and mitigate content bias or misinformation risks. **Model Training & Tuning(\!3);** Bias can be introduced through skewed or unbalanced training datasets or during fine-tuning phases, reinforcing misinformation in outputs. **Data Filtering and Processing(17):**Poor pre-processing, tokenization, or curation of training data can fail to remove false, misleading, or biased examples**. Data Sources(18):** Misinformation may originate from unreliable or low-quality data sources used for model training. **External Data Sources(19):T**hird-party sources (e.g., scraped web content, user-generated input) can introduce biased or false narratives into training or inference stages. |
+| Threat ID | Test Name | Mapped SAIF Risk | Impacted Component(s) (SAIF#) |
+| :---- | :---- | :---- | :---- |
+| T01-DPIJ | Testing for Direct Prompt Injection (DPIJ) | (PIJ) Prompt Injection | **Application (4):** receives user input → injection vector. **Input Handling (7):** forwards prompts without validation. **Model Usage (9):** injection alters inference behavior. |
+| T01-IPIJ | Testing for Indirect Prompt Injection (IPIJ) | (PIJ) Prompt Injection | **Application (4):** includes external/user content in prompts. **Agents/Plugins (5):** inject unverified content. **External Sources (6):** untrusted data becomes an indirect vector. **Input Handling (7):** merges inputs without filtering. **Model Usage (9):** injected content alters output. |
+| T01-AIE | Testing for Evasion Attacks | (ME) Model Evasion | **Input Handling (7):** accepts adversarial inputs. **Model Usage (9):** adversarial inputs cause misclassification. **Evaluation (12):** weak robustness tests. **Training & Tuning (13):** mitigated by adversarial training. |
+| T01-RMP | Testing for Runtime Model Poisoning | (DP) Data Poisoning (Note#1) | **Model Usage (9):** runtime state corruption. **Data Filtering (17):** malicious streams injected. **Model Storage (10):** persistent poisoning in adaptive models. |
+| T01-DMP | Testing for Poisoned Training Sets | (DP) Data Poisoning | **Data Sources (6/18/19):** poisoned inputs. **Agents/Plugins (5):** propagate poisoned payloads. **Model Usage (9):** backdoors visible at inference. **Evaluation (12):** undetected poisoning. **Model Storage (10):** poisoned weights deployed without integrity checks. |
+| T01-DPFT | Testing for Fine-tuning Poisoning | (DP) Data Poisoning | **Model Usage (9):** poisoned behavior emerges at inference. **Training & Tuning (13):** primary injection point. **Model Storage (10):** poisoned models persisted. **Data Filtering (17):** unvalidated fine-tuning sets. |
+| T01-SCMP | Testing for Supply Chain Tampering | (MST) Model Source Tampering | **Model (9):** executes tampered logic. **Model Storage (10):** poisoned artifacts. **Serving Infra (11):** tampered models loaded. **Training (13):** compromised base models. **Frameworks/Code (14):** tampered libraries introduce backdoors. |
+| T01-SID | Testing for Sensitive Data Leak | (SDD) Sensitive Data Disclosure | **Application (4):** leaks via outputs. **Agents (5):** mishandle data. **External Sources (6):** inject sensitive content. **Input (7):** triggers leakage. **Output (8):** returns memorized data. **Model (9):** discloses training data. **Evaluation (12):** misses leakage. **Data Storage, Training Data, Filtering, Sources (15–18):** unredacted logs and training data. |
+| T01-MIMI | Testing for Membership Inference | (ISD) Inferred Sensitive Data | **Model (9):** enables reconstruction. **Training Data (16):** target of inference. **Filtering (17):** poor anonymization. **External Data Sources (19):** unvetted third-party data. |
+| T01-TDL | Testing for Training Data Exposure | (SDD) Sensitive Data Disclosure | **Output (8):** direct leaks. **Model (9):** memorized data returned. **Evaluation (12):** misses leakage. **Training Data (16):** unredacted PII. **Filtering (17):** weak anonymization. |
+| T01-MTU | Testing for Model Extraction | (MRE) Model Reverse Engineering | **Application (4):** probing surface. **Output (8):** leaks features. **Model (9):** overfit leaks. **Serving Infra (11):** exposed endpoints without rate limiting. **Evaluation (12):** misses extraction query patterns. |
+| T01-MTR | Testing for Runtime Exfiltration | (MXF) Model Exfiltration | **Model (9):** in-memory theft. **Model Storage (10):** insecure artifacts. **Serving (11):** compromised inference pipelines. |
+| T01-MTD | Testing for Dev-Time Model Theft | (MST) Model Source Tampering | **External Sources (6):** unsafe integrations. **Plugins (5):** dev-time theft vector. **Model Storage (10):** dev models exposed. **Training (13):** theft of configs/weights. **Frameworks (14):** source code copied or tampered with. |
+| T01-DoSM | Testing for Resource Exhaustion | (DMS) Denial of ML Service | **Application (4):** flooding. **Input (7):** oversized queries. **Output (8):** heavy payloads. **Model (9):** compute exhaustion. **Serving (11):** bottlenecks. **Evaluation (12):** misses abuse patterns. |
+| T01-LSID | Testing for Input Leakage | (SDD) Sensitive Data Disclosure | **Input (7):** cached input leaked. **Output (8):** reflects prior sessions. **Model (9):** memorization. **Evaluation (12):** misses prompt-history leakage. **Data Storage (15):** unredacted input logs. |
+| T01-IOH | Testing for Unsafe Outputs | (IMO) Insecure Model Output | **Application (4):** unsafe rendering. **Output (8):** weak filters. **Model (9):** harmful outputs. **Evaluation (12):** poor testing. |
+| T01-EA | Testing for Agentic Behavior Limits | (RO) Rogue Actions | **Application (4):** poor boundaries. **Agents (5):** unrestricted actions. **External Sources (6):** trigger rogue behavior. **Output (8):** unsafe directives. **Model (9):** generates out-of-scope directives. |
+| T01-SPL | Testing for System Prompt Leakage | (SDD) Sensitive Data Disclosure | **Application (4):** mishandles system prompts. **Input (7):** extraction vectors. **Output (8):** reflects prompts. **Model (9):** leaks via unintended completions. |
+| T01-VEW | Testing for Embedding Manipulation | (PIJ/MST) Prompt Injection / Model Source Tampering | **Plugins (5):** vector manipulation. **External Sources (6):** poisoned data. **Input (7):** unsafe embeddings. **Model (9):** poisoned vectors. **Frameworks (14):** vulnerable vector DB integrations. **Filtering (17):** weak sanitization. **Data Source (18):** malicious pre-trained embeddings. |
+| T01-MIS | Testing for Harmful Content Bias | (IMO) Insecure Model Output | **Application (4):** misinformation delivery. **Output (8):** weak moderation. **Model (9):** biased generation. **Evaluation (12):** missing hallucination tests. **Training (13):** skewed fine-tuning data. **Filtering (17):** false examples not removed. **Sources (18–19):** unreliable or scraped data. |
 *Note (1) Runtime Model Poisoning (RMP) and not general data poisoning during training so we’ll focus solely on runtime-impact components involved in model use, mutable memory, adaptive updates, or live data feedback loops and not general data poisoning during training (e.g. SAIF components related to training (SAIF \#13), evaluation (SAIF \#12), and initial data ingestion pipelines)*