Merge pull request #38 from mmorana1/patch-11

This commit is contained in:
Matteo Meucci
2025-10-15 07:43:28 +02:00
committed by GitHub
+615
View File
@@ -0,0 +1,615 @@
# Appendix E: SAIF AI Threat Targeted Components & CVEs/CWEs
The purpose of this appendix is to provide pen testers with a detailed tech stack of the targeted SAIF component and sub-components of the AI architecture, whether it be user inputs, the model layer, supporting infrastructure, or data sources and CVEs/CWEs that can be exploited by the AI threats targeting these components and sub-components. During test scoping, the tester determines which components and sub-components of the AI application are in scope for testing. For example, user input through a Slack bot connected to a FastAPI backend may be included, while certain external APIs might be excluded. The tester then can cross-reference the **Tech Stack** column of the mapping table to understand exactly which frameworks and services are deployed, such as FastAPI, Redis, or Pinecone, and uses this knowledge to conduct specific tech-stack aware pen tests. This is a step by step process.
The first step is **Threat enumeration and CVE exploit path mapping**. The CVE column provides known vulnerabilities that can be validated with scanners (like Nessus or Nuclei) or manual proof-of-concept scripts. For instance, Redis in the data storage layer may expose `CVE-2022-0543` (Lua sandbox escape), which could be exploited to poison embeddings and trigger runtime data poisoning (`T01-RMP`). Similarly, an outdated Confluence instance may expose `CVE-2021-22911`, leading to sensitive information disclosure (`T01-SID`) if training data leaks from a knowledge base. By tying each CVE to an AI-specific threat, the pen tester demonstrates not only a technical flaw but also its effect on model behavior and trustworthiness.
Once vulnerabilities are identified, they are mapped to AI-specific threats using the **AI Threats** column. This is where the table delivers unique value: it bridges traditional software flaws with AI-centric risks. For example, FastAPI sanitization weaknesses (`CVE-2022-36067`) might appear to be routine web vulnerabilities, but in the context of an LLM they translate to `T01-DPJI` (direct prompt injection). Similarly, weaknesses in an Airflow ETL pipeline (`CVE-2022-40127`) could be exploited not just for remote code execution but for `T01-DMP` (data poisoning), corrupting training or retrieval data. This mapping ensures testers move beyond “server RCE” reports and instead demonstrate AI model compromise impacts.
With this foundation, the tester moves into a **systematic execution strategy**. For each SAIF component, they review the subcomponents to see where injection, poisoning, or manipulation is possible. They verify which technologies are actually deployed and run tests to identify CVEs that can be exploited due to vulnerable and un-patched libraries/components. They then simulate attack driven tests for AI-specific threats such as prompt injection, model inversion, membership inference, poisoning attacks, or runtime denial of service. For example, in the case of data storage infrastructure (SAIF component 15), a vulnerable CVE in Weaviate instance could be targeted. A known plugin path traversal vulnerability (`CVE-2023-41267`) may allow the attacker to inject poisoned vector entries, resulting in `T01-RMP` runtime data poisoning where the chatbot retrieves manipulated facts.
Reporting leverages the tables structure to maintain **traceability**. A finding might read: “Redis vulnerable to `CVE-2022-0543`,” which maps to `CWE-94` (code injection) and aligns with AI Threat `T01-RMP` (runtime data poisoning). The impact statement would explain that this weakness allows the chatbot to output attacker-controlled responses. This creates a clear chain from vulnerability to exploit to AI-specific risk, making the report resonate with both security engineers and AI/ML practitioners.
The second step is to conduct a **Threat enumeration and CWE exploit path mapping**. The CWE-based table adds another layer by framing these vulnerabilities/findings as **design weaknesses**, not just CVEs that need patching. For example, `CWE-20` (improper input validation) points to weak parsing logic, `CWE-276` (incorrect default permissions) highlights misconfigurations in data storage or S3 buckets, and `CWE-345` (insufficient verification of data authenticity) shows systemic flaws in RAG ingestion.
Finally, the third step is to look at **AI Threats, Targeted CWEs and Provide Recommendations to Fix Them** in the Pen Testing Report. CWEs being targeted by a threat needs to be accompanied by secure design recommendations, such as enforcing schema validation, disabling default public access, verifying dataset authenticity, or encrypting sensitive data. This means pen testers can move from “here is how I broke it” to “here is how you should redesign it to prevent recurrence.”
As pen testers revisit systems, they can update the CVE and CWE of newly discovered vulnerabilities and use the AI Threats column as a checklist for attack simulations in future red-team exercises. Over time, this evolving matrix becomes a living test harness — a fusion of exploit paths, systemic weaknesses, AI threats, and design-level fixes — that supports secure design, ongoing validation, and resilience in AI-enabled systems.
## AI Threat enumeration and CVE exploit path mapping
> In this section we provide a mapping of SAIF components to AI threats and examples of component dependent tech-stack CVEs that can be exploited
| SAIF Component (Number) | Sub-Components | Tech Stack (Chatbot + RAG) | Mapped Threats | Example CVEs in Tech Stack |
|--------------------------|----------------|-----------------------------|----------------|----------------------------|
| (2) User Input | Text, voice, multimodal parsers | React/Next.js, Slack SDK, Teams Bot, Twilio, Whisper/ASR, FastAPI/Pydantic | T01-DPIJ, T01-IPI J, T01-SID, T01-DoSM, T01-IOH, T01-MTU | React XSS (CVE-2021-24033); FastAPI vuln (CVE-2023-27533); Twilio SDK (CVE-2022-36449) |
| (3) User Output | Renderers, formatting, TTS/visual output | React chat widgets, Slack/Teams cards, Polly/ElevenLabs, Markdown renderers | T01-EA, T01-SPL, T01-MIS, T01-IOH | Slack API auth bypass (CVE-2020-10753); Markdown injection (CVE-2022-21681) |
| (4) Application | Orchestration, session mgmt, APIs, business logic | LangChain, LlamaIndex, Semantic Kernel, FastAPI/Flask, Redis sessions, GraphQL APIs | T01-DPIJ, T01-IPI J, T01-SID, T01-DoSM, T01-MTU, T01-IOH, T01-EA, T01-SPL, T01-MIS | Flask template injection (CVE-2019-8341); Redis RCE (CVE-2022-0543); GraphQL DoS (CVE-2020-15159) |
| (5) Agent/Plugin | Connectors, plugin registry, tool adapters | LangGraph Agents, OpenAI Functions, Zapier/n8n, custom OpenAPI tools | T01-IPI J, T01-SID, T01-MTD, T01-EA, T01-VEW | n8n RCE (CVE-2023-37925); OpenAPI tooling parser injection (CVE-2021-32640) |
| (6) External Sources (App) | APIs, SaaS services, enterprise connectors | Salesforce, ServiceNow, Confluence, SharePoint APIs | T01-IPI J, T01-MTD, T01-SID, T01-EA, T01-VEW, T01-DMP | Confluence RCE (CVE-2023-22515); SharePoint RCE (CVE-2023-29357) |
| (7) Input Handling | Validation, sanitization, PII detection, scanning | Pydantic, JSON Schema, Presidio, ClamAV | T01-DPIJ, T01-AIE, T01-SID, T01-LSID, T01-DoSM, T01-SPL, T01-VEW | ClamAV RCE (CVE-2023-20032); JSON Schema validator injection (GitHub advisories) |
| (8) Output Handling | Filters, moderation, redaction, grounding checks | Guardrails.ai, OpenAI Moderation, NeMo Guardrails, RAGAS | T01-LSID, T01-SID, T01-DoSM, T01-SPL, T01-IOH, T01-TDL, T01-MTU, T01-EA, T01-MIS | NeMo Guardrails Python deps RCE (via PyTorch CVEs) |
| (9) Model | LLM weights, embeddings, rerankers | GPT-4o, Claude, Llama-3, Mistral, Cohere reranker, BGE embeddings | T01-DPIJ, T01-IPI J, T01-SCMP, T01-AIE, T01-DPFT, T01-RMP, T01-DMP, T01-SID, T01-MIMI, T01-TDL, T01-DoSM, T01-LSID, T01-SPL, T01-VEW, T01-MTU, T01-IOH, T01-MTR, T01-EA, T01-MIS | PyTorch vuln (CVE-2022-45907); TensorFlow overflow (CVE-2021-37678); Hugging Face sandbox escape (CVE-2023-6730) |
| (10) Model Storage Infrastructure | Registry, encrypted artifacts | MLflow, S3/GCS, Azure Blob, Vertex AI Registry | T01-DPFT, T01-SCMP, T01-MTR, T01-MTD | MLflow path traversal (CVE-2023-6836); AWS S3 bucket takeover misconfigs (CWE-based) |
| (11) Model Serving Infrastructure | GPU runtimes, inference servers, autoscaling | vLLM, NVIDIA Triton, TensorRT-LLM, Kubernetes GPU nodes | T01-SCMP, T01-MTU, T01-MTR, T01-DoSM | NVIDIA Triton RCE (CVE-2023-31036); Kubernetes privilege escalation (CVE-2023-3676); NVIDIA GPU DoS (CVE-2024-0146) |
| (12) Evaluation | Golden sets, drift/bias eval, safety harness | RAGAS, DeepEval, W&B, Evidently AI, Great Expectations | T01-AIE, T01-DMP, T01-LSID, T01-SID, T01-TDL, T01-DoSM, T01-MTU, T01-IOH, T01-MIS | Weights & Biases CLI vuln (GitHub advisories); Great Expectations YAML injection (potential CWE-74) |
| (13) Training & Tuning | Pipelines, fine-tuning, HPO | Kubeflow, SageMaker, Hugging Face PEFT, Optuna | T01-AIE, T01-MIS, T01-DPFT, T01-SCMP, T01-MTD | Kubeflow dashboard RCE (CVE-2021-31812); SageMaker Jupyter RCE (AWS advisory); Hugging Face PEFT vuln (CVE-2023-6730) |
| (14) Model Frameworks & Code | Frameworks, tokenizers, compilers | PyTorch, TensorFlow, Hugging Face, ONNX Runtime | T01-SCMP, T01-MTD, T01-VEW | TensorFlow buffer overflow (CVE-2021-37678); PyTorch vulnerability (CVE-2022-45907); ONNX Runtime DoS (CVE-2022-25883) |
| (15) Data Storage Infrastructure | Vector DBs, RDBMS, object stores | Weaviate, Pinecone, Milvus, Redis, Postgres, S3 | T01-RMP, T01-DMP, T01-DPFT, T01-SCMP, T01-SID, T01-MTD, T01-LSID | Redis RCE (CVE-2022-0543); PostgreSQL escalation (CVE-2023-2454); Milvus injection (CVE-2023-48022) |
| (16) Training Data | Raw corpora, labeled, synthetic | Chat logs, FAQs, Label Studio, synthetic Q&A | T01-MIMI, T01-TDL, T01-SID | Label Studio auth bypass (CVE-2021-36701) |
| (17) Data Filtering & Processing | ETL, cleaning, chunking, tagging | Airflow, dbt, Unstructured.io, spaCy, NLTK | T01-RMP, T01-DMP, T01-DPFT, T01-SID, T01-MIMI, T01-TDL, T01-VEW, T01-MIS | Apache Airflow RCE (CVE-2023-42793); dbt adapter injection (GitHub advisories) |
| (18) Data Sources | Internal KBs, CRM, telemetry | Confluence, Jira, Elastic, Splunk | T01-SID, T01-DMP, T01-VEW, T01-MIS | Confluence RCE (CVE-2023-22515); Jira auth bypass (CVE-2020-14181); ElasticSearch RCE (CVE-2015-1427); Splunk RCE (CVE-2022-32158) |
| (19) External Sources | Public datasets, 3rd party APIs/feeds | Wikipedia, Common Crawl, arXiv, News APIs | T01-MIMI, T01-SID, T01-DMP, T01-MIS | Dataset poisoning risks (no CVEs, CWE-driven); API poisoning (CWE-345: Insufficient Verification of Data Authenticity) |
## AI Threat enumeration and Targeted CWEs
> In this section we provide a mapping of SAIF components to AI threats and examples of vulnerability types/CWEs that can be exploited
| SAIF Component | Mapped Threats | Targeted CWEs |
|----------------|----------------|----------------|
| (2) User Input | T01-DPIJ, T01-IPI J, T01-SID, T01-DoSM, T01-IOH, T01-MTU | CWE-116, CWE-1204, CWE-1389, CWE-20, CWE-200, CWE-359, CWE-400, CWE-522, CWE-74, CWE-75, CWE-770, CWE-787, CWE-79, CWE-94 |
| (3) User Output | T01-EA, T01-SPL, T01-MIS, T01-IOH | CWE-116, CWE-209, CWE-284, CWE-285, CWE-345, CWE-352, CWE-359, CWE-640, CWE-79, CWE-825 |
| (4) Application | T01-DPIJ, T01-IPI J, T01-SID, T01-DoSM, T01-MTU, T01-IOH, T01-EA, T01-SPL, T01-MIS | CWE-116, CWE-1204, CWE-1389, CWE-20, CWE-200, CWE-209, CWE-284, CWE-285, CWE-345, CWE-352, CWE-359, CWE-400, CWE-522, CWE-640, CWE-74, CWE-75, CWE-770, CWE-787, CWE-79, CWE-825, CWE-94 |
| (5) Agent/Plugin | T01-IPI J, T01-SID, T01-MTD, T01-EA, T01-VEW | CWE-1389, CWE-20, CWE-200, CWE-276, CWE-284, CWE-285, CWE-359, CWE-494, CWE-502, CWE-522, CWE-74, CWE-829, CWE-918, CWE-94 |
| (6) External Sources | T01-IPI J, T01-MTD, T01-SID, T01-EA, T01-VEW, T01-DMP | CWE-1389, CWE-20, CWE-200, CWE-276, CWE-284, CWE-285, CWE-359, CWE-494, CWE-502, CWE-522, CWE-74, CWE-829, CWE-918, CWE-94 |
| (7) Input Handling | T01-DPIJ, T01-AIE, T01-SID, T01-LSID, T01-DoSM, T01-SPL, T01-VEW | CWE-117, CWE-1389, CWE-20, CWE-200, CWE-209, CWE-359, CWE-400, CWE-502, CWE-522, CWE-532, CWE-640, CWE-693, CWE-74, CWE-770, CWE-787, CWE-829, CWE-918 |
| (8) Output Handling | T01-LSID, T01-SID, T01-DoSM, T01-SPL, T01-IOH, T01-TDL, T01-MTU, T01-EA, T01-MIS | CWE-116, CWE-117, CWE-1204, CWE-200, CWE-201, CWE-209, CWE-284, CWE-285, CWE-345, CWE-352, CWE-359, CWE-400, CWE-522, CWE-532, CWE-640, CWE-75, CWE-770, CWE-787, CWE-79, CWE-825 |
| (9) Model | T01-DPIJ, T01-IPI J, T01-SCMP, T01-AIE, T01-DPFT, T01-RMP, T01-DMP, T01-SID, T01-MIMI, T01-TDL, T01-DoSM, T01-LSID, T01-SPL, T01-VEW, T01-MTU, T01-IOH, T01-MTR, T01-EA, T01-MIS | CWE-116, CWE-117, CWE-119, CWE-1204, CWE-1389, CWE-20, CWE-200, CWE-201, CWE-203, CWE-209, CWE-276, CWE-284, CWE-285, CWE-345, CWE-352, CWE-359, CWE-400, CWE-494, CWE-502, CWE-522, CWE-532, CWE-640, CWE-693, CWE-74, CWE-75, CWE-770, CWE-787, CWE-79, CWE-825, CWE-829, CWE-830, CWE-918, CWE-94 |
| (10) Model Storage Infra | T01-DPFT, T01-SCMP, T01-MTR, T01-MTD | CWE-276, CWE-284, CWE-285, CWE-494, CWE-522, CWE-829, CWE-830 |
| (11) Model Serving Infra | T01-SCMP, T01-MTU, T01-MTR, T01-DoSM | CWE-1204, CWE-276, CWE-284, CWE-400, CWE-494, CWE-522, CWE-75, CWE-770, CWE-787, CWE-829 |
| (12) Evaluation | T01-AIE, T01-DMP, T01-LSID, T01-SID, T01-TDL, T01-DoSM, T01-MTU, T01-IOH, T01-MIS | CWE-116, CWE-117, CWE-1204, CWE-1389, CWE-20, CWE-200, CWE-201, CWE-345, CWE-352, CWE-359, CWE-400, CWE-494, CWE-522, CWE-532, CWE-693, CWE-74, CWE-75, CWE-770, CWE-787, CWE-79, CWE-825 |
| (13) Training & Tuning | T01-AIE, T01-MIS, T01-DPFT, T01-SCMP, T01-MTD | CWE-1389, CWE-20, CWE-276, CWE-285, CWE-345, CWE-352, CWE-494, CWE-693, CWE-825, CWE-829, CWE-830 |
| (14) Model Frameworks & Code | T01-SCMP, T01-MTD, T01-VEW | CWE-276, CWE-285, CWE-494, CWE-502, CWE-829, CWE-918 |
| (15) Data Storage Infra | T01-RMP, T01-DMP, T01-DPFT, T01-SCMP, T01-SID, T01-MTD, T01-LSID | CWE-117, CWE-119, CWE-20, CWE-200, CWE-276, CWE-285, CWE-359, CWE-494, CWE-522, CWE-532, CWE-74, CWE-829, CWE-830, CWE-94 |
| (16) Training Data | T01-MIMI, T01-TDL, T01-SID | CWE-200, CWE-201, CWE-203, CWE-359, CWE-522 |
| (17) Data Filtering & Processing | T01-RMP, T01-DMP, T01-DPFT, T01-SID, T01-MIMI, T01-TDL, T01-VEW, T01-MIS | CWE-119, CWE-20, CWE-200, CWE-201, CWE-203, CWE-345, CWE-352, CWE-359, CWE-494, CWE-502, CWE-522, CWE-74, CWE-825, CWE-829, CWE-830, CWE-918, CWE-94 |
| (18) Data Sources | T01-SID, T01-DMP, T01-VEW, T01-MIS | CWE-20, CWE-200, CWE-345, CWE-352, CWE-359, CWE-494, CWE-502, CWE-522, CWE-74, CWE-825, CWE-829, CWE-918 |
| (19) External Sources | T01-MIMI, T01-SID, T01-DMP, T01-MIS | CWE-20, CWE-200, CWE-203, CWE-345, CWE-352, CWE-359, CWE-494, CWE-522, CWE-74, CWE-825 |
## AI Threats, Targeted CWEs and Recommendations to Fix Them
> In this section we provide a mapping of SAIF components to threats, possibly targeted CWEs, the rationale for CWEs being targeted, and recommendations for fixing them.
- [(2) User Input](#2-user-input)
- [(3) User Output](#3-user-output)
- [(4) Application](#4-application)
- [(5) Agent / Plugin](#5-agent--plugin)
- [(6) External Sources](#6-external-sources)
- [(7) Input Handling](#7-input-handling)
- [(8) Output Handling](#8-output-handling)
- [(9) Model](#9-model)
- [(10) Model Storage Infrastructure](#10-model-storage-infrastructure)
- [(11) Model Serving Infrastructure](#11-model-serving-infrastructure)
- [(12) Evaluation](#12-evaluation)
- [(13) Training & Tuning](#13-training--tuning)
- [(14) Model Frameworks & Code](#14-model-frameworks--code)
- [(15) Data Storage Infrastructure](#15-data-storage-infrastructure)
- [(16) Training Data](#16-training-data)
- [(17) Data Filtering & Processing](#17-data-filtering--processing)
- [(18) Data Sources](#18-data-sources)
- [(19) External Sources](#19-external-sources)
---
## (2) User Input
**Summary:** User Input is the front door of the system — every downstream component depends on it. Without strong input validation, filtering, and limits, it becomes the main vector for prompt injection, data leakage, DoS, and toxicity propagation.
**Threats:** T01-DPIJ, T01-IPI J, T01-SID, T01-DoSM, T01-IOH, T01-MTU
**Targeted CWEs:**
CWE-20, CWE-74, CWE-94, CWE-707, CWE-200, CWE-359, CWE-522, CWE-400, CWE-770, CWE-787, CWE-116, CWE-79
### Direct Prompt Injection (T01-DPIJ) & Indirect Prompt Injection (T01-IPIJ)
**Mapped CWEs:** CWE-20, CWE-74, CWE-94, CWE-707
**Rationale:** Maliciously crafted inputs (user prompts or embedded instructions) can override instructions or trigger unintended actions.
**Recommendations:**
- Apply strict input validation and canonicalization before passing content to the model.
- Use prompt isolation/sandboxing (separate user and system instructions).
- Enforce allowlist-based instruction patterns.
- Test with adversarial prompt fuzzing.
### Sensitive Information Disclosure (T01-SID)
**Mapped CWEs:** CWE-200, CWE-359, CWE-522
**Rationale:** Inputs may include secrets/PII that can be reflected in outputs or logs.
**Recommendations:**
- Integrate DLP filters into input channels.
- Mask/tokenize secrets and PII before forwarding to the model.
- Restrict logging of raw inputs.
### Denial of Service Model (T01-DoSM)
**Mapped CWEs:** CWE-400, CWE-770, CWE-787
**Rationale:** Oversized or adversarial inputs can exhaust tokens/compute.
**Recommendations:**
- Set input size and tokenization limits.
- Apply rate-limits and per-user quotas.
- Use circuit breakers/autoscaling.
### Insecure Output Handling Triggered by Inputs (T01-IOH)
**Mapped CWEs:** CWE-116, CWE-79
**Rationale:** Malicious inputs may propagate to rendered outputs (e.g., XSS).
**Recommendations:**
- Sanitize and encode outputs by context (HTML/MD/JSON).
- Separate data from control characters; use safe rendering frameworks.
### Model Toxicity / Unreliable Outputs (T01-MTU)
**Mapped CWEs:** CWE-707, CWE-345, CWE-1204
**Rationale:** Inputs can steer models toward toxic or unreliable content.
**Recommendations:**
- Add toxicity/bias classifiers and context filters.
- Escalate high-risk cases to human review.
---
## (3) User Output
**Summary:** The last mile to users/connected systems; without control, its a vector for excessive agency, prompt leakage, misinformation, and unsafe rendering.
**Threats:** T01-EA, T01-SPL, T01-MIS, T01-IOH
**Targeted CWEs:**
CWE-284, CWE-285, CWE-200, CWE-209, CWE-359, CWE-532, CWE-116, CWE-79, CWE-75, CWE-345, CWE-1204
### Excessive Agency (T01-EA)
**Mapped CWEs:** CWE-284, CWE-285
**Rationale:** Action-bearing outputs can trigger privileged operations without proper scoping.
**Recommendations:**
- Enforce least-privilege scopes for action outputs.
- Require policy checks before rendering actionable UI.
- Use allowlists and out-of-band approvals for high-risk actions.
### Sensitive Prompt Leakage (T01-SPL)
**Mapped CWEs:** CWE-200, CWE-209, CWE-359, CWE-532
**Rationale:** Hidden prompts/keys/PII can surface in responses, errors, or logs.
**Recommendations:**
- Redact secrets/PII/system instructions before render/logging.
- Wrap errors safely; never show raw tool/model errors.
- Separate user-visible and operator logs with DLP.
### Misinformation (T01-MIS)
**Mapped CWEs:** CWE-345, CWE-1204
**Rationale:** Ungrounded claims appear credible in UI.
**Recommendations:**
- Require grounding/citations for high-risk claims.
- Add verification metrics and “needs review” flags.
### Insecure Output Handling (T01-IOH)
**Mapped CWEs:** CWE-116, CWE-79, CWE-75
**Rationale:** Unsanitized text can execute in rich renderers.
**Recommendations:**
- Render from structured formats; encode per context.
- Sanitize Markdown/HTML via allowlists; disable unsafe embeds.
---
## (4) Application
**Summary:** Orchestration brain (sessions, APIs, business logic). Weak validation or access controls can cascade into systemic compromise.
**Threats:** T01-DPIJ, T01-IPI J, T01-SID, T01-DoSM, T01-MTU, T01-IOH, T01-EA, T01-SPL, T01-MIS
**Targeted CWEs:**
CWE-20, CWE-74, CWE-94, CWE-200, CWE-209, CWE-359, CWE-522, CWE-400, CWE-770, CWE-787, CWE-116, CWE-79, CWE-75, CWE-284, CWE-285, CWE-345, CWE-1204
### Prompt Injection (T01-DPIJ, T01-IPIJ)
**Mapped CWEs:** CWE-20, CWE-74, CWE-94
**Rationale:** Unvalidated inputs into core instruction sets allow overrides.
**Recommendations:** Schema validation, role separation, safe interpreter layer.
### Sensitive Information Disclosure (T01-SID, T01-SPL)
**Mapped CWEs:** CWE-200, CWE-209, CWE-359, CWE-522
**Rationale:** Secrets leak via logs/prompts/plugins.
**Recommendations:** Redact secrets, RBAC on sensitive data, safe error handling.
### Denial of Service Model (T01-DoSM)
**Mapped CWEs:** CWE-400, CWE-770, CWE-787
**Recommendations:** Rate-limit orchestration, circuit breakers, size checks.
### Model Toxicity / Misinformation (T01-MTU, T01-MIS)
**Mapped CWEs:** CWE-345, CWE-1204
**Recommendations:** Grounding checks, toxicity/bias filters, confidence flags.
### Insecure Output Handling (T01-IOH)
**Mapped CWEs:** CWE-79, CWE-116, CWE-75
**Recommendations:** Contextual encoding/sanitization; strip unsafe HTML/MD.
### Excessive Agency (T01-EA)
**Mapped CWEs:** CWE-284, CWE-285
**Recommendations:** Least privilege, allowlists, secondary approvals.
---
## (5) Agent / Plugin
**Summary:** Extended arms of the system; vulnerable to IPIJ, secrets handling, tampering, excessive actions, and unsafe workflows.
**Threats:** T01-IPI J, T01-SID, T01-MTD, T01-EA, T01-VEW
**Targeted CWEs:**
CWE-20, CWE-74, CWE-94, CWE-200, CWE-359, CWE-522, CWE-284, CWE-285, CWE-276, CWE-494, CWE-829, CWE-918, CWE-502
### Indirect Prompt Injection (T01-IPIJ)
**Mapped CWEs:** CWE-20, CWE-74, CWE-94
**Recommendations:** Strict I/O schemas, escape parameters, forbid dynamic eval.
### Sensitive Information Disclosure (T01-SID)
**Mapped CWEs:** CWE-200, CWE-359, CWE-522
**Recommendations:** Scoped credentials, redact tool responses, data minimization.
### Model Tampering / Disclosure (T01-MTD)
**Mapped CWEs:** CWE-276, CWE-285, CWE-494
**Recommendations:** Hardened permissions, signed manifests, artifact signing.
### Excessive Agency (T01-EA)
**Mapped CWEs:** CWE-284, CWE-285
**Recommendations:** Per-action least privilege, policy gates, human-in-the-loop.
### Vulnerable External Workflow (T01-VEW)
**Mapped CWEs:** CWE-829, CWE-918, CWE-502
**Recommendations:** Tool allowlists, egress proxy, safe content types.
**Operational Hardening (cross-cutting):** Per-tool rate limits/timeouts; container isolation; telemetry; signed releases/SBOMs; tenant isolation for state.
---
## (6) External Sources
**Summary:** Bridges to the outside world; unverified data can inject poison, trigger unsafe actions, or spread misinformation.
**Threats:** T01-IPI J, T01-MTD, T01-SID, T01-EA, T01-VEW, T01-DMP
**Targeted CWEs:**
CWE-20, CWE-74, CWE-94, CWE-200, CWE-359, CWE-522, CWE-276, CWE-284, CWE-285, CWE-494, CWE-829, CWE-918, CWE-502, CWE-353, CWE-345
### Indirect Prompt Injection (T01-IPIJ)
**Recommendations:** Sanitize/normalize external content; restrict content types; segregate retrieved content.
### Model Tampering/Disclosure (T01-MTD)
**Recommendations:** Integrity/signature checks; least-privilege access; explicit approvals; hardened storage permissions.
### Sensitive Information Disclosure (T01-SID)
**Recommendations:** Mask sensitive fields; scoped OAuth; DLP policies.
### Excessive Agency (T01-EA)
**Recommendations:** RBAC and allowlists for sources; policy checks before executing; sandboxed connectors.
### Vulnerable External Workflow (T01-VEW)
**Recommendations:** Egress proxy + allowlists; safe content types; SBOM verification.
### Data / Model Poisoning (T01-DMP)
**Recommendations:** Provenance/reputation scoring; adversarial sample testing; cryptographic integrity checks.
---
## (7) Input Handling
**Summary:** The filter layer; weak parsing/schema enforcement lets adversarial inputs/injections slip through.
**Threats:** T01-DPIJ, T01-AIE, T01-SID, T01-LSID, T01-DoSM, T01-SPL, T01-VEW
**Targeted CWEs:**
CWE-20, CWE-74, CWE-94, CWE-200, CWE-359, CWE-522, CWE-532, CWE-209, CWE-400, CWE-770, CWE-787, CWE-79, CWE-116, CWE-75, CWE-918
### Prompt Injection (T01-DPIJ)
**Recommendations:** Strict schemas and typing; strip unsafe control sequences; sandbox inputs.
### Adversarial Input Evasion (T01-AIE)
**Recommendations:** Unicode normalization; adversarial testing; layered validation.
### Sensitive Information Disclosure (T01-SID, T01-LSID, T01-SPL)
**Recommendations:** Ingestion-time redaction; masked logging; sanitize logs and errors.
### Denial of Service Model (T01-DoSM)
**Recommendations:** Input size/rate quotas; buffer validation.
### Vulnerable External Workflow (T01-VEW)
**Recommendations:** Domain allowlists + proxy; content-type validation.
---
## (8) Output Handling
**Summary:** Safety gate before delivery; failure here leaks sensitive data, misinformation, and unsafe content.
**Threats:** T01-LSID, T01-SID, T01-DoSM, T01-SPL, T01-IOH, T01-TDL, T01-MTU, T01-EA, T01-MIS
**Targeted CWEs:**
CWE-79, CWE-116, CWE-75, CWE-200, CWE-209, CWE-359, CWE-532, CWE-522, CWE-400, CWE-770, CWE-787, CWE-284, CWE-285, CWE-345, CWE-1204
### Log/Storage Information Disclosure (T01-LSID)
**Recommendations:** Strip sensitive context; RBAC for logs; safe error messages.
### Sensitive Information Disclosure (T01-SID, T01-SPL, T01-TDL)
**Recommendations:** Post-output DLP; encrypt/mask sensitive fields; prevent recall of sensitive training rows.
### Denial of Service Model (T01-DoSM)
**Recommendations:** Cap output size/tokens; quarantine oversized outputs; validate downstream buffers.
### Insecure Output Handling (T01-IOH)
**Recommendations:** Contextual encoding; allowlist sanitizers; disable rich rendering for untrusted text.
### Training Data Leakage (T01-TDL)
**Recommendations:** Differential privacy; verbatim/entropy filters; redact prompts; restrict logging.
### Model Toxicity / Misinformation (T01-MTU, T01-MIS)
**Recommendations:** Toxicity/bias filters; grounding/citations; fallbacks.
### Excessive Agency (T01-EA)
**Recommendations:** Allowlisted commands; authorization checks; explicit confirmation.
---
## (9) Model
**Summary:** The core intelligence; targeted by injection, poisoning, theft, inversion, DoS, and unsafe outputs.
**Threats:**
T01-DPIJ, T01-IPI J, T01-SCMP, T01-AIE, T01-DPFT, T01-RMP, T01-DMP, T01-SID, T01-MIMI, T01-TDL, T01-DoSM, T01-LSID, T01-SPL, T01-VEW, T01-MTU, T01-IOH, T01-MTR, T01-EA, T01-MIS
**Targeted CWEs:**
CWE-20, CWE-74, CWE-94, CWE-200, CWE-209, CWE-359, CWE-522, CWE-532, CWE-276, CWE-284, CWE-285, CWE-400, CWE-770, CWE-787, CWE-918, CWE-502, CWE-494, CWE-345, CWE-353, CWE-1204, CWE-116, CWE-119, CWE-830, CWE-829, CWE-640, CWE-693, CWE-75, CWE-79
### Prompt Injection (T01-DPIJ, T01-IPIJ)
**Recommendations:** Separate system/developer prompts; tokenizer-stage filtering; adversarial training.
### Supply Chain / Data & Fine-tuning Poisoning (T01-SCMP, T01-DPFT, T01-RMP, T01-DMP)
**Recommendations:** Signed weights/datasets; provenance scoring; adversarial sanitation; SBOMs.
### Adversarial Input Evasion (T01-AIE)
**Recommendations:** Normalize before tokenization; robustness testing; monitor embeddings.
### Sensitive Information Disclosure / Training Data Leakage (T01-SID, T01-TDL, T01-LSID, T01-SPL)
**Recommendations:** DP in training; block verbatim sequences; redact system prompts; restrict logging.
### Model Inversion / Membership Inference (T01-MIMI)
**Recommendations:** DP-SGD; rate limits/randomization; run MI red-teaming.
### Denial of Service Model (T01-DoSM)
**Recommendations:** Cap context; detect anomalies; harden serving buffers.
### Insecure Output Handling / Unsafe Integrations (T01-IOH, T01-VEW)
**Recommendations:** Sanitize outputs; whitelist tools; enforce policy layers.
### Model Theft / Exfiltration (T01-MTR, T01-MTD)
**Recommendations:** Access controls; encryption at rest; monitor for exfil.
### Model Toxicity / Misinformation / Excessive Agency (T01-MTU, T01-MIS, T01-EA)
**Recommendations:** Toxicity/bias post-filters; grounding; restrict actionable outputs; approvals.
---
## (10) Model Storage Infrastructure
**Summary:** Crown jewels at rest — must be encrypted, signed, and access-controlled.
**Threats:** T01-DPFT, T01-SCMP, T01-MTR, T01-MTD
**Targeted CWEs:**
CWE-276, CWE-284, CWE-285, CWE-200, CWE-359, CWE-522, CWE-494, CWE-353, CWE-922
### Data/Prompt Fine-Tuning Poisoning (T01-DPFT)
**Recommendations:** Cryptographic signing + checksums; read-only versioned storage; attestation.
### Supply Chain Model Poisoning (T01-SCMP)
**Recommendations:** Trusted registries; verify lineage; pin dependencies.
### Model Theft / Exfiltration (T01-MTR)
**Recommendations:** Encrypt with KMS; least-privilege; monitor bulk downloads; harden defaults.
### Model Tampering / Disclosure (T01-MTD)
**Recommendations:** WORM storage; integrity verification on load; restrict access to service accounts.
---
## (11) Model Serving Infrastructure
**Summary:** Execution gateway; must resist poisoning, theft, DoS, and unsafe outputs.
**Threats:** T01-SCMP, T01-MTU, T01-MTR, T01-DoSM
**Targeted CWEs:**
CWE-276, CWE-284, CWE-285, CWE-400, CWE-770, CWE-787, CWE-494, CWE-353, CWE-345, CWE-1204, CWE-75
### Supply Chain Model Poisoning (T01-SCMP)
**Recommendations:** Signed container images; checksums; SBOM-enforced provenance; block untrusted registries.
### Model Toxicity / Unreliable Outputs (T01-MTU)
**Recommendations:** Moderation/toxicity filters; grounding checks; safe fallbacks.
### Model Theft / Exfiltration (T01-MTR)
**Recommendations:** Rate limits/anomaly detection; mTLS + RBAC; encrypt weights; harden FS perms.
### Denial of Service Model (T01-DoSM)
**Recommendations:** Cap request size/tokens; quotas at gateway; circuit breakers/autoscaling; robust parsers.
---
## (12) Evaluation
**Summary:** The safety lens; poison/bypass here yields false assurance.
**Threats:** T01-AIE, T01-DMP, T01-LSID, T01-SID, T01-TDL, T01-DoSM, T01-MTU, T01-IOH, T01-MIS
**Targeted CWEs:**
CWE-20, CWE-116, CWE-200, CWE-209, CWE-359, CWE-532, CWE-400, CWE-770, CWE-787, CWE-345, CWE-1204
### Adversarial Input Evasion (T01-AIE)
**Recommendations:** Schema validation; normalization; adversarial red-teaming.
### Data/Model Poisoning (T01-DMP)
**Recommendations:** Verify dataset provenance; cross-check baselines; ensemble evaluation.
### Information Disclosure (T01-LSID, T01-SID, T01-TDL)
**Recommendations:** Sanitize logs; encrypt/ACL datasets; monitor for memorization leakage.
### Denial of Service Model (T01-DoSM)
**Recommendations:** Limit dataset size/runs; rate-limit jobs; fault isolation.
### Model Toxicity / Unsafe Output / Misinformation (T01-MTU, T01-IOH, T01-MIS)
**Recommendations:** Include toxicity/factuality benchmarks; require grounding; scan for unsafe HTML/MD.
---
## (13) Training & Tuning
**Summary:** Where knowledge is forged; poor data embeds lasting bias/backdoors.
**Threats:** T01-AIE, T01-MIS, T01-DPFT, T01-SCMP, T01-MTD
**Targeted CWEs:**
CWE-20, CWE-116, CWE-345, CWE-353, CWE-494, CWE-276, CWE-284, CWE-285, CWE-200, CWE-359
### Adversarial Input Evasion (T01-AIE)
**Recommendations:** Enforce schemas + canonical normalization; adversarial resilience tests; anomaly detection in preprocessing.
### Misinformation (T01-MIS)
**Recommendations:** Validate vs trusted sources; human oversight; training-time grounding.
### Data/Prompt Fine-Tuning Poisoning (T01-DPFT)
**Recommendations:** Signed datasets; immutable baselines; adversarial testing pre-deploy.
### Supply Chain Model Poisoning (T01-SCMP)
**Recommendations:** Trusted registries; signatures; hardened defaults and scoped access.
### Model Tampering / Disclosure (T01-MTD)
**Recommendations:** Encrypt checkpoints/logs; RBAC; regular permission audits.
---
## (14) Model Frameworks & Code
**Summary:** ML runtime backbone; supply chain or unsafe integrations taint the system.
**Threats:** T01-SCMP, T01-MTD, T01-VEW
**Targeted CWEs:**
CWE-94, CWE-95, CWE-829, CWE-494, CWE-353, CWE-276, CWE-284, CWE-285, CWE-918, CWE-502
### Supply Chain Model Poisoning (T01-SCMP)
**Recommendations:** Pin versions; require signed packages; scan dependencies; maintain SBOMs.
### Model Tampering / Disclosure (T01-MTD)
**Recommendations:** Harden runtimes; least-privilege service accounts; audit framework binaries.
### Vulnerable External Workflow / Unsafe Integration (T01-VEW)
**Recommendations:** Disable/sandbox dynamic eval; restrict plugin loading; isolate untrusted code; harden deserialization.
---
## (15) Data Storage Infrastructure
**Summary:** Knowledge vault; poisoning/tampering/leaks here undermine integrity & confidentiality.
**Threats:** T01-RMP, T01-DMP, T01-DPFT, T01-SCMP, T01-SID, T01-MTD, T01-LSID
**Targeted CWEs:**
CWE-276, CWE-284, CWE-285, CWE-200, CWE-359, CWE-522, CWE-532, CWE-400, CWE-770, CWE-787, CWE-494, CWE-353, CWE-345, CWE-922
### Runtime/Model/Data Poisoning (T01-RMP, T01-DMP, T01-DPFT, T01-SCMP)
**Recommendations:** Integrity checks; provenance scoring; append-only/versioned stores; anomaly monitoring.
### Sensitive Information Disclosure (T01-SID, T01-LSID)
**Recommendations:** Encrypt at rest + KMS; RBAC; sanitized logging; access monitoring.
### Model/Data Tampering or Exfiltration (T01-MTD)
**Recommendations:** Disable public/broad ACLs; per-tenant keys; least-privilege; immutable storage for critical data.
### Denial of Service Storage
**Recommendations:** Quotas and rate limits; hardened parsers/buffers; ingestion throttling.
---
## (16) Training Data
**Summary:** Root of trust; compromise propagates to all downstream behavior.
**Threats:** T01-MIMI, T01-TDL, T01-SID
**Targeted CWEs:**
CWE-200, CWE-359, CWE-522, CWE-345, CWE-353, CWE-494, CWE-276, CWE-284, CWE-285
### Model Inversion / Membership Inference (T01-MIMI)
**Recommendations:** Differential privacy; strict RBAC on raw data; detect inversion patterns.
### Training Data Leakage (T01-TDL)
**Recommendations:** Encrypt datasets; keep creds out of pipelines; tokenize sensitive fields pre-ingestion.
### Sensitive Information Disclosure (T01-SID)
**Recommendations:** Least-privilege; row/column-level policies; audit all access.
### Data Authenticity
**Recommendations:** Signed/versioned datasets; provenance scoring; golden-set cross-validation.
---
## (17) Data Filtering & Processing
**Summary:** Gatekeeper stage; weak validation lets poisoned/sensitive data pass.
**Threats:** T01-RMP, T01-DMP, T01-DPFT, T01-SID, T01-MIMI, T01-TDL, T01-VEW, T01-MIS
**Targeted CWEs:**
CWE-20, CWE-116, CWE-200, CWE-359, CWE-345, CWE-353, CWE-494, CWE-276, CWE-284, CWE-285, CWE-400, CWE-770, CWE-787, CWE-829, CWE-918, CWE-502
### Runtime / Data Poisoning (T01-RMP, T01-DMP, T01-DPFT)
**Recommendations:** Signed datasets; hash verification; drift detection.
### Sensitive Information Disclosure (T01-SID, T01-TDL, T01-MIMI)
**Recommendations:** DLP in preprocessing; masking/tokenization; RBAC for feature stores.
### Vulnerable External Workflow (T01-VEW)
**Recommendations:** Sandbox transforms; egress filtering; forbid unsafe deserialization.
### Misinformation (T01-MIS)
**Recommendations:** Reputation/ground-truth validation; cross-dataset checks; human review for high-risk domains.
### Denial of Service on Pipelines
**Recommendations:** Size quotas; ingestion rate limits; anomaly monitoring.
---
## (18) Data Sources
**Summary:** Entry point of truth; without provenance checks, they introduce poisoned/unsafe content.
**Threats:** T01-SID, T01-DMP, T01-VEW, T01-MIS
**Targeted CWEs:**
CWE-200, CWE-359, CWE-522, CWE-345, CWE-353, CWE-494, CWE-829, CWE-918, CWE-502
### Sensitive Information Disclosure (T01-SID)
**Recommendations:** DLP at ingestion; least-privilege credentials; encrypt sensitive datasets.
### Data/Model Poisoning (T01-DMP)
**Recommendations:** Signature/hash checks; reputation scoring; golden-set cross-validation.
### Vulnerable External Workflow (T01-VEW)
**Recommendations:** Proxy + allowlists; forbid unsafe formats; isolate connectors.
### Misinformation (T01-MIS)
**Recommendations:** Reliability scoring; ground-truth cross-referencing; drift monitoring.
---
## (19) External Sources
**Summary:** Outside the trust boundary; major vectors for poisoning, leakage, and misinformation.
**Threats:** T01-MIMI, T01-SID, T01-DMP, T01-MIS
**Targeted CWEs:**
CWE-200, CWE-359, CWE-522, CWE-345, CWE-353, CWE-494, CWE-918, CWE-829
### Model Inversion / Membership Inference (T01-MIMI)
**Recommendations:** Privacy-preserving APIs; throttle/detect anomalies; k-anonymity/data minimization.
### Sensitive Information Disclosure (T01-SID)
**Recommendations:** Secret managers; token rotation; TLS + mutual auth.
### Data/Model Poisoning (T01-DMP)
**Recommendations:** Data signing/ checksums; cross-validate with references; vendor trust contracts.
### Misinformation (T01-MIS)
**Recommendations:** Source reliability scores; ground-truth validation; human review for high-impact feeds.
---