
2.2 Appendix E: AI Threats Mapping to AI Components Vulnerabilities (CVEs & CWEs)

AI Penetration Testing Framework: Scoping, CVE/CWE Mapping, and Threat Correlation

This appendix guides penetration testers on mapping discovered CVEs and CWEs in SAIF components of an AI architecture to AI-specific threats. CVEs (Common Vulnerabilities and Exposures) generally point to specific, documented vulnerabilities in the underlying technology stack, such as libraries, frameworks, or APIs used to build AI systems and applications. CWEs (Common Weakness Enumerations), on the other hand, describe classes of software design or implementation flaws that may lead to such vulnerabilities.

Step 1 — Scoping AI Penetration Tests Within the SAIF Architecture

Because the pen tests described here target a live AI system/Application, careful scoping is essential: testers must first identify which SAIF components and subcomponents are in scope, enumerate the exact technologies deployed for each, and use that inventory to prioritize CVE/CWE enumeration and threat simulations. In-scope items commonly include components owned or operated by the organization and directly involved in the request→response flow, for example, chat UIs, API backends (e.g., FastAPI), session/orchestration layers, model orchestration frameworks (e.g., LangChain or LlamaIndex), vector stores (Redis, Pinecone, Weaviate), ETL/data pipelines, model-serving endpoints, and internally managed connectors. Because these components can contain outdated, misconfigured, or otherwise exploitable dependencies, the first operational step is threat enumeration: map each in-scope SAIF component to its tech stack, identify relevant CVEs (and corresponding CWEs), and derive likely exploit paths. That mapping then drives focused validation with scanners, SCA tools, and proof-of-concept testing so testers can prioritize, reproduce, and demonstrate how conventional software flaws translate into AI-centric impacts.
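
As an illustration, the scoping inventory described above can be kept as a simple machine-readable structure that later drives CVE/CWE enumeration. This is a minimal sketch; the component names, owners, and tech stacks below are hypothetical examples, not a prescribed schema:

```python
# Hypothetical scoping inventory: map each in-scope SAIF component to its
# deployed tech stack so CVE/CWE enumeration can be prioritized.
# Component numbers follow the SAIF scheme used in this appendix.
SCOPE_INVENTORY = {
    "4-application": {
        "owner": "platform-team",
        "tech_stack": ["fastapi", "redis", "langchain"],
        "in_scope": True,
    },
    "15-data-storage": {
        "owner": "data-team",
        "tech_stack": ["weaviate", "postgres", "s3"],
        "in_scope": True,
    },
    "19-external-sources": {
        "owner": None,  # third party: out of scope for active exploitation
        "tech_stack": ["wikipedia", "news-apis"],
        "in_scope": False,
    },
}

def prioritized_targets(inventory):
    """Return the tech stacks of in-scope components — the starting point
    for CVE/CWE enumeration with SCA tools and scanners."""
    return {
        name: entry["tech_stack"]
        for name, entry in inventory.items()
        if entry["in_scope"]
    }
```

Feeding only the in-scope stacks to SCA tooling keeps the test focused on components the organization owns and operates.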

Step 2 — Threat Enumeration and CVE Exploit Path Mapping

The process of mapping threats to AI system vulnerabilities starts by identifying known vulnerabilities, expressed as CVEs, in AI systems/applications using software composition analysis (SCA) and runtime tools. SCA tools (e.g., Snyk, Trivy, Dependabot, OWASP Dependency-Check, and GitHub Advanced Security) flag vulnerable third-party software dependencies, while scanners such as Nessus and Nuclei can confirm active CVE exposures in APIs and services. Runtime telemetry and host inspection can also validate which CVEs are exploitable in live environments. These CVEs are then mapped to the AI-specific threats (i.e., T01-XX threats) outlined in this guide: for example, a FastAPI sanitization flaw (CVE-2022-36067) can be part of a prompt-injection vector (T01-DPIJ), and an Airflow ETL vulnerability (CVE-2022-40127) can lead to data poisoning (T01-DMP) in a RAG pipeline.
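
A traceability record linking scanner findings to AI threats can be as simple as a lookup table. The sketch below mirrors the two examples in the text; real mappings come from the tables later in this appendix, and the record fields shown are an illustrative assumption:

```python
# Illustrative CVE-to-AI-threat traceability records. The two sample rows
# mirror the examples in the text above; populate from the appendix tables.
CVE_TO_AI_THREATS = {
    "CVE-2022-36067": {"component": "(2) User Input / FastAPI stack",
                       "threats": ["T01-DPIJ"]},
    "CVE-2022-40127": {"component": "(17) Data Filtering / Airflow ETL",
                       "threats": ["T01-DMP"]},
}

def threats_for_findings(cve_ids):
    """Given CVE identifiers flagged by an SCA tool or scanner, return the
    set of AI-specific threats a tester should attempt to demonstrate."""
    threats = set()
    for cve in cve_ids:
        threats.update(CVE_TO_AI_THREATS.get(cve, {}).get("threats", []))
    return threats
```

The resulting threat set then drives which AI-specific attack simulations (prompt injection, poisoning, etc.) are attempted first.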

For each SAIF component in scope, testers review subcomponents, confirm deployed technologies, and run focused tests to find exploitable or unpatched libraries. These findings drive AI-specific attack simulations such as prompt injection, model inversion, data poisoning, or runtime DoS to reveal real application impact. Using the CVE exploit-path mapping table, testers can maintain traceability from vulnerability to AI impact. For instance, Redis in SAIF #4 (Application Layer) vulnerable to CVE-2022-0543 links to risks like data leakage (T01-SID), model disruption (T01-DoSM), and manipulation (T01-MTD). A single Redis compromise can escalate from infrastructure control to model tampering—compromising data integrity, availability, and trust.

Step 3 — AI Threat-to-CWE Mapping for Root Cause and Remediation

The final recommended step is to perform AI threat enumeration and CWE exploit-path mapping, transforming vulnerability-centric testing into design-level assurance. This appendix provides a Threat-to-SAIF-Component-to-CWE mapping, complementing the Threat-to-Test-Case mapping (AITG tests) presented earlier in this guide. Together, these enable testers to link AI-specific vulnerabilities—such as prompt injection, data leakage, or model poisoning—to their root causes, whether insecure design, implementation weakness, or misconfiguration. By classifying findings under CWE categories, testers connect penetration testing results to recognized software weakness patterns. This approach bridges the gap between patch management and secure architecture, guiding fixes that strengthen entire system layers rather than individual components. For example, CWE-20 (Improper Input Validation) reveals weak parsing logic; CWE-276 (Incorrect Default Permissions) highlights insecure cloud storage defaults; and CWE-345 (Insufficient Verification of Data Authenticity) exposes trust flaws in RAG ingestion pipelines.

During AITG testing across in-scope SAIF components, each failed test should identify the immediate issue and trace it to a corresponding CWE root cause. Reports should include both the weakness and an actionable recommendation—for instance, enforcing input validation, disabling public defaults, verifying dataset authenticity, or encrypting sensitive data. This shifts the tester's message from “how I broke it” to “how to fix and redesign it.” As systems evolve, testers can update the CVE and CWE mappings to reflect new vulnerabilities and use the AI Threats column as a living checklist for future red-team exercises. This evolving matrix supports continuous validation and resilience in AI-enabled systems. Once fixes are implemented, corresponding AITG tests should be re-run to verify closure, with findings prioritized by risk severity (Critical, High, Medium, Low) and resolved per SLA targets. This structured, CWE-driven approach ensures AI testing results are not just diagnostic but actionable, improving both software resilience and long-term AI system risk posture.

AI Threat enumeration and CVE exploit path mapping

In this section, we map SAIF components to AI threats and give examples of component-dependent tech-stack CVEs that can be exploited.

| SAIF Component (Number) | Sub-Components | Tech Stack (Chatbot + RAG) | Mapped Threats | Example CVEs in Tech Stack |
|---|---|---|---|---|
| (2) User Input | Text, voice, multimodal parsers | React/Next.js, Slack SDK, Teams Bot, Twilio, Whisper/ASR, FastAPI/Pydantic | T01-DPIJ, T01-IPIJ, T01-SID, T01-DoSM, T01-IOH, T01-MTU | React XSS (CVE-2021-24033); FastAPI vuln (CVE-2023-27533); Twilio SDK (CVE-2022-36449) |
| (3) User Output | Renderers, formatting, TTS/visual output | React chat widgets, Slack/Teams cards, Polly/ElevenLabs, Markdown renderers | T01-EA, T01-SPL, T01-MIS, T01-IOH | Slack API auth bypass (CVE-2020-10753); Markdown injection (CVE-2022-21681) |
| (4) Application | Orchestration, session mgmt, APIs, business logic | LangChain, LlamaIndex, Semantic Kernel, FastAPI/Flask, Redis sessions, GraphQL APIs | T01-DPIJ, T01-IPIJ, T01-SID, T01-DoSM, T01-MTU, T01-IOH, T01-EA, T01-SPL, T01-MIS | Flask template injection (CVE-2019-8341); Redis RCE (CVE-2022-0543); GraphQL DoS (CVE-2020-15159) |
| (5) Agent/Plugin | Connectors, plugin registry, tool adapters | LangGraph Agents, OpenAI Functions, Zapier/n8n, custom OpenAPI tools | T01-IPIJ, T01-SID, T01-MTD, T01-EA, T01-VEW | n8n RCE (CVE-2023-37925); OpenAPI tooling parser injection (CVE-2021-32640) |
| (6) External Sources (App) | APIs, SaaS services, enterprise connectors | Salesforce, ServiceNow, Confluence, SharePoint APIs | T01-IPIJ, T01-MTD, T01-SID, T01-EA, T01-VEW, T01-DMP | Confluence RCE (CVE-2023-22515); SharePoint RCE (CVE-2023-29357) |
| (7) Input Handling | Validation, sanitization, PII detection, scanning | Pydantic, JSON Schema, Presidio, ClamAV | T01-DPIJ, T01-AIE, T01-SID, T01-LSID, T01-DoSM, T01-SPL, T01-VEW | ClamAV RCE (CVE-2023-20032); JSON Schema validator injection (GitHub advisories) |
| (8) Output Handling | Filters, moderation, redaction, grounding checks | Guardrails.ai, OpenAI Moderation, NeMo Guardrails, RAGAS | T01-LSID, T01-SID, T01-DoSM, T01-SPL, T01-IOH, T01-TDL, T01-MTU, T01-EA, T01-MIS | NeMo Guardrails Python deps RCE (via PyTorch CVEs) |
| (9) Model | LLM weights, embeddings, rerankers | GPT-4o, Claude, Llama-3, Mistral, Cohere reranker, BGE embeddings | T01-DPIJ, T01-IPIJ, T01-SCMP, T01-AIE, T01-DPFT, T01-RMP, T01-DMP, T01-SID, T01-MIMI, T01-TDL, T01-DoSM, T01-LSID, T01-SPL, T01-VEW, T01-MTU, T01-IOH, T01-MTR, T01-EA, T01-MIS | PyTorch vuln (CVE-2022-45907); TensorFlow overflow (CVE-2021-37678); Hugging Face sandbox escape (CVE-2023-6730) |
| (10) Model Storage Infrastructure | Registry, encrypted artifacts | MLflow, S3/GCS, Azure Blob, Vertex AI Registry | T01-DPFT, T01-SCMP, T01-MTR, T01-MTD | MLflow path traversal (CVE-2023-6836); AWS S3 bucket takeover misconfigs (CWE-based) |
| (11) Model Serving Infrastructure | GPU runtimes, inference servers, autoscaling | vLLM, NVIDIA Triton, TensorRT-LLM, Kubernetes GPU nodes | T01-SCMP, T01-MTU, T01-MTR, T01-DoSM | NVIDIA Triton RCE (CVE-2023-31036); Kubernetes privilege escalation (CVE-2023-3676); NVIDIA GPU DoS (CVE-2024-0146) |
| (12) Evaluation | Golden sets, drift/bias eval, safety harness | RAGAS, DeepEval, W&B, Evidently AI, Great Expectations | T01-AIE, T01-DMP, T01-LSID, T01-SID, T01-TDL, T01-DoSM, T01-MTU, T01-IOH, T01-MIS | Weights & Biases CLI vuln (GitHub advisories); Great Expectations YAML injection (potential CWE-74) |
| (13) Training & Tuning | Pipelines, fine-tuning, HPO | Kubeflow, SageMaker, Hugging Face PEFT, Optuna | T01-AIE, T01-MIS, T01-DPFT, T01-SCMP, T01-MTD | Kubeflow dashboard RCE (CVE-2021-31812); SageMaker Jupyter RCE (AWS advisory); Hugging Face PEFT vuln (CVE-2023-6730) |
| (14) Model Frameworks & Code | Frameworks, tokenizers, compilers | PyTorch, TensorFlow, Hugging Face, ONNX Runtime | T01-SCMP, T01-MTD, T01-VEW | TensorFlow buffer overflow (CVE-2021-37678); PyTorch vulnerability (CVE-2022-45907); ONNX Runtime DoS (CVE-2022-25883) |
| (15) Data Storage Infrastructure | Vector DBs, RDBMS, object stores | Weaviate, Pinecone, Milvus, Redis, Postgres, S3 | T01-RMP, T01-DMP, T01-DPFT, T01-SCMP, T01-SID, T01-MTD, T01-LSID | Redis RCE (CVE-2022-0543); PostgreSQL escalation (CVE-2023-2454); Milvus injection (CVE-2023-48022) |
| (16) Training Data | Raw corpora, labeled, synthetic | Chat logs, FAQs, Label Studio, synthetic Q&A | T01-MIMI, T01-TDL, T01-SID | Label Studio auth bypass (CVE-2021-36701) |
| (17) Data Filtering & Processing | ETL, cleaning, chunking, tagging | Airflow, dbt, Unstructured.io, spaCy, NLTK | T01-RMP, T01-DMP, T01-DPFT, T01-SID, T01-MIMI, T01-TDL, T01-VEW, T01-MIS | Apache Airflow RCE (CVE-2023-42793); dbt adapter injection (GitHub advisories) |
| (18) Data Sources | Internal KBs, CRM, telemetry | Confluence, Jira, Elastic, Splunk | T01-SID, T01-DMP, T01-VEW, T01-MIS | Confluence RCE (CVE-2023-22515); Jira auth bypass (CVE-2020-14181); ElasticSearch RCE (CVE-2015-1427); Splunk RCE (CVE-2022-32158) |
| (19) External Sources | Public datasets, 3rd party APIs/feeds | Wikipedia, Common Crawl, arXiv, News APIs | T01-MIMI, T01-SID, T01-DMP, T01-MIS | Dataset poisoning risks (no CVEs, CWE-driven); API poisoning (CWE-345: Insufficient Verification of Data Authenticity) |

AI Threat enumeration and Targeted CWEs

In this section, we map SAIF components to AI threats and give examples of vulnerability types (CWEs) that can be exploited.

| SAIF Component | Mapped Threats | Targeted CWEs |
|---|---|---|
| (2) User Input | T01-DPIJ, T01-IPIJ, T01-SID, T01-DoSM, T01-IOH, T01-MTU | CWE-116, CWE-1204, CWE-1389, CWE-20, CWE-200, CWE-359, CWE-400, CWE-522, CWE-74, CWE-75, CWE-770, CWE-787, CWE-79, CWE-94 |
| (3) User Output | T01-EA, T01-SPL, T01-MIS, T01-IOH | CWE-116, CWE-209, CWE-284, CWE-285, CWE-345, CWE-352, CWE-359, CWE-640, CWE-79, CWE-825 |
| (4) Application | T01-DPIJ, T01-IPIJ, T01-SID, T01-DoSM, T01-MTU, T01-IOH, T01-EA, T01-SPL, T01-MIS | CWE-116, CWE-1204, CWE-1389, CWE-20, CWE-200, CWE-209, CWE-284, CWE-285, CWE-345, CWE-352, CWE-359, CWE-400, CWE-522, CWE-640, CWE-74, CWE-75, CWE-770, CWE-787, CWE-79, CWE-825, CWE-94 |
| (5) Agent/Plugin | T01-IPIJ, T01-SID, T01-MTD, T01-EA, T01-VEW | CWE-1389, CWE-20, CWE-200, CWE-276, CWE-284, CWE-285, CWE-359, CWE-494, CWE-502, CWE-522, CWE-74, CWE-829, CWE-918, CWE-94 |
| (6) External Sources | T01-IPIJ, T01-MTD, T01-SID, T01-EA, T01-VEW, T01-DMP | CWE-1389, CWE-20, CWE-200, CWE-276, CWE-284, CWE-285, CWE-359, CWE-494, CWE-502, CWE-522, CWE-74, CWE-829, CWE-918, CWE-94 |
| (7) Input Handling | T01-DPIJ, T01-AIE, T01-SID, T01-LSID, T01-DoSM, T01-SPL, T01-VEW | CWE-117, CWE-1389, CWE-20, CWE-200, CWE-209, CWE-359, CWE-400, CWE-502, CWE-522, CWE-532, CWE-640, CWE-693, CWE-74, CWE-770, CWE-787, CWE-829, CWE-918 |
| (8) Output Handling | T01-LSID, T01-SID, T01-DoSM, T01-SPL, T01-IOH, T01-TDL, T01-MTU, T01-EA, T01-MIS | CWE-116, CWE-117, CWE-1204, CWE-200, CWE-201, CWE-209, CWE-284, CWE-285, CWE-345, CWE-352, CWE-359, CWE-400, CWE-522, CWE-532, CWE-640, CWE-75, CWE-770, CWE-787, CWE-79, CWE-825 |
| (9) Model | T01-DPIJ, T01-IPIJ, T01-SCMP, T01-AIE, T01-DPFT, T01-RMP, T01-DMP, T01-SID, T01-MIMI, T01-TDL, T01-DoSM, T01-LSID, T01-SPL, T01-VEW, T01-MTU, T01-IOH, T01-MTR, T01-EA, T01-MIS | CWE-116, CWE-117, CWE-119, CWE-1204, CWE-1389, CWE-20, CWE-200, CWE-201, CWE-203, CWE-209, CWE-276, CWE-284, CWE-285, CWE-345, CWE-352, CWE-359, CWE-400, CWE-494, CWE-502, CWE-522, CWE-532, CWE-640, CWE-693, CWE-74, CWE-75, CWE-770, CWE-787, CWE-79, CWE-825, CWE-829, CWE-830, CWE-918, CWE-94 |
| (10) Model Storage Infra | T01-DPFT, T01-SCMP, T01-MTR, T01-MTD | CWE-276, CWE-284, CWE-285, CWE-494, CWE-522, CWE-829, CWE-830 |
| (11) Model Serving Infra | T01-SCMP, T01-MTU, T01-MTR, T01-DoSM | CWE-1204, CWE-276, CWE-284, CWE-400, CWE-494, CWE-522, CWE-75, CWE-770, CWE-787, CWE-829 |
| (12) Evaluation | T01-AIE, T01-DMP, T01-LSID, T01-SID, T01-TDL, T01-DoSM, T01-MTU, T01-IOH, T01-MIS | CWE-116, CWE-117, CWE-1204, CWE-1389, CWE-20, CWE-200, CWE-201, CWE-345, CWE-352, CWE-359, CWE-400, CWE-494, CWE-522, CWE-532, CWE-693, CWE-74, CWE-75, CWE-770, CWE-787, CWE-79, CWE-825 |
| (13) Training & Tuning | T01-AIE, T01-MIS, T01-DPFT, T01-SCMP, T01-MTD | CWE-1389, CWE-20, CWE-276, CWE-285, CWE-345, CWE-352, CWE-494, CWE-693, CWE-825, CWE-829, CWE-830 |
| (14) Model Frameworks & Code | T01-SCMP, T01-MTD, T01-VEW | CWE-276, CWE-285, CWE-494, CWE-502, CWE-829, CWE-918 |
| (15) Data Storage Infra | T01-RMP, T01-DMP, T01-DPFT, T01-SCMP, T01-SID, T01-MTD, T01-LSID | CWE-117, CWE-119, CWE-20, CWE-200, CWE-276, CWE-285, CWE-359, CWE-494, CWE-522, CWE-532, CWE-74, CWE-829, CWE-830, CWE-94 |
| (16) Training Data | T01-MIMI, T01-TDL, T01-SID | CWE-200, CWE-201, CWE-203, CWE-359, CWE-522 |
| (17) Data Filtering & Processing | T01-RMP, T01-DMP, T01-DPFT, T01-SID, T01-MIMI, T01-TDL, T01-VEW, T01-MIS | CWE-119, CWE-20, CWE-200, CWE-201, CWE-203, CWE-345, CWE-352, CWE-359, CWE-494, CWE-502, CWE-522, CWE-74, CWE-825, CWE-829, CWE-830, CWE-918, CWE-94 |
| (18) Data Sources | T01-SID, T01-DMP, T01-VEW, T01-MIS | CWE-20, CWE-200, CWE-345, CWE-352, CWE-359, CWE-494, CWE-502, CWE-522, CWE-74, CWE-825, CWE-829, CWE-918 |
| (19) External Sources | T01-MIMI, T01-SID, T01-DMP, T01-MIS | CWE-20, CWE-200, CWE-203, CWE-345, CWE-352, CWE-359, CWE-494, CWE-522, CWE-74, CWE-825 |

AI Threat-to-Component-to-CWE Mapping and Remediation Guidance

In this section, we present a mapping between AI system components, associated AI threats (as defined in the guide's threat model), corresponding CWE categories, and remediation recommendations. Each mapping includes a rationale explaining how specific CWEs are exploited or exposed by those AI threats, providing a direct link between identified weaknesses and actionable fixes.

AI System Architectural Components & Data

Note: Component identifiers correspond to the SAIF numbering scheme illustrated in the threat model diagram within this guide.


(2) User Input

Summary: User Input is the front door of the system; every downstream component depends on it. Without strong input validation, filtering, and limits, it becomes the main vector for prompt injection, data leakage, DoS, and toxicity propagation.

Direct Prompt Injection (T01-DPIJ) & Indirect Prompt Injection (T01-IPIJ)

Mapped CWEs: CWE-20, CWE-74, CWE-94, CWE-707

Rationale: Maliciously crafted inputs (user prompts or embedded instructions) can override instructions, alter reasoning chains, or trigger unintended actions in connected tools.

Recommendations:

  • Apply strict input validation and canonicalization before passing content to the model.
  • Use prompt isolation or sandboxing (separate user and system instructions).
  • Enforce allowlist-based instruction and function patterns.
  • Perform adversarial prompt fuzzing and red-team testing.
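
The first two recommendations can be sketched as follows. This is a minimal illustration, assuming a chat-completions-style message list; the marker phrases are hypothetical examples and a pre-screen like this is a triage aid, not a complete defense against prompt injection:

```python
# Hypothetical override phrasings used for triage; not an exhaustive list.
INSTRUCTION_MARKERS = ("ignore previous instructions", "you are now", "system:")

def build_messages(system_policy: str, user_text: str) -> list[dict]:
    """Prompt isolation: keep trusted instructions and user content in
    separate roles so user text is never spliced into the system prompt."""
    return [
        {"role": "system", "content": system_policy},
        {"role": "user", "content": user_text},  # data channel, not instructions
    ]

def flag_injection_attempt(user_text: str) -> bool:
    """Cheap pre-screen: flag inputs carrying common override phrasing for
    logging and adversarial-testing triage."""
    lowered = user_text.lower()
    return any(marker in lowered for marker in INSTRUCTION_MARKERS)
```

Flagged inputs are good seeds for the adversarial prompt fuzzing and red-team testing recommended above.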

Denial of Service Model (T01-DoSM)

Mapped CWEs: CWE-400, CWE-770, CWE-787

Rationale: Oversized, malformed, or adversarial inputs can exhaust tokenization, GPU, or compute capacity, leading to degraded performance or service unavailability.

Recommendations:

  • Enforce maximum input size and token limits.
  • Apply rate-limits and per-user quotas at API gateways.
  • Use circuit breakers and autoscaling to mitigate load spikes.

Insecure Output Handling Triggered by Inputs (T01-IOH)

Mapped CWEs: CWE-116, CWE-79

Rationale: Malicious inputs may propagate into rendered outputs (e.g., HTML, Markdown, or JSON), enabling injection or cross-site scripting attacks.

Recommendations:

  • Sanitize and contextually encode all rendered outputs.
  • Separate data from control characters; use safe templating and rendering frameworks.
  • Enforce strict content-type validation before presentation.
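
For HTML rendering contexts, contextual encoding can be as simple as escaping everything by default; a sketch using the standard library (richer UIs would add an allowlist-based sanitizer on top):

```python
import html

def safe_render(model_output: str) -> str:
    """Encode model output for an HTML context so injected markup
    (e.g. <script> tags) is displayed as inert text, not executed."""
    return html.escape(model_output, quote=True)
```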

Model Toxicity / Unreliable Outputs (T01-MTU)

Mapped CWEs: CWE-707, CWE-345, CWE-1204

Rationale: Crafted or provocative user inputs can bias model behavior, steering it toward toxic, discriminatory, or ungrounded responses.

Recommendations:

  • Integrate toxicity and bias classifiers to pre-screen user prompts.
  • Use contextual and sentiment filters on incoming requests.
  • Escalate high-risk or policy-violating cases to human review workflows.

(3) User Output

Summary: The last mile to users and connected systems; without control, it's a vector for excessive agency, prompt leakage, misinformation, and unsafe rendering.

Excessive Agency (T01-EA)

Mapped CWEs: CWE-284, CWE-285

Rationale: Action-bearing model outputs (e.g., generated commands, API calls, workflow triggers) can execute privileged or irreversible operations without authorization or user oversight.

Recommendations:

  • Enforce least-privilege scopes for all actionable outputs.
  • Apply policy and authorization checks before rendering or executing UI-driven actions.
  • Maintain allowlists and require explicit human approvals for high-impact or sensitive actions.
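
The allowlist-plus-approval pattern above can be sketched as a small policy gate; the action names below are hypothetical:

```python
# Hypothetical action catalogs for illustration.
ALLOWED_ACTIONS = {"search_kb", "summarize", "delete_record"}
HIGH_IMPACT_ACTIONS = {"delete_record", "wire_transfer", "grant_access"}

def authorize_action(action: str, human_approved: bool = False) -> bool:
    """Allowlist every actionable output; high-impact actions additionally
    require an explicit human approval before execution."""
    if action not in ALLOWED_ACTIONS:
        return False          # deny by default: unknown actions never run
    if action in HIGH_IMPACT_ACTIONS:
        return human_approved  # human-in-the-loop gate
    return True
```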

Sensitive Prompt Leakage (T01-SPL)

Mapped CWEs: CWE-200, CWE-209, CWE-359, CWE-532

Rationale: Model outputs, error messages, or logs may inadvertently reveal hidden prompts, credentials, API keys, or personal information embedded in the conversation context.

Recommendations:

  • Redact secrets, PII, and system instructions prior to rendering or logging.
  • Use structured error wrappers; never expose raw stack traces or backend errors.
  • Segregate user-visible and operator logs; apply DLP scanning to prevent prompt or secret leakage.
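
A minimal redaction pass over outputs and logs might look like the following; the patterns are illustrative stand-ins for a real DLP engine such as the Presidio deployment mentioned in the tech-stack table:

```python
import re

# Illustrative patterns only; production redaction should use a DLP engine.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{16,}"),       # API-key-like tokens
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # US SSN shape
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),   # email addresses
]

def redact(text: str) -> str:
    """Mask secrets and PII before text reaches logs, error messages,
    or user-visible output."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```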

Misinformation (T01-MIS)

Mapped CWEs: CWE-345, CWE-1204

Rationale: Ungrounded, fabricated, or biased statements can appear credible when presented in the UI, eroding user trust or propagating false information.

Recommendations:

  • Require grounding and citation checks for high-risk or factual claims.
  • Integrate verification confidence scores and “needs review” flags for uncertain responses.
  • Route flagged outputs to human review or moderation pipelines.

Insecure Output Handling (T01-IOH)

Mapped CWEs: CWE-116, CWE-79, CWE-75

Rationale: Unsanitized model outputs rendered in rich text, HTML, or Markdown can lead to script execution, injection, or UI manipulation in downstream clients.

Recommendations:

  • Render outputs from structured formats (e.g., JSON, plain text) with context-aware encoding.
  • Sanitize HTML/Markdown through allowlisted elements and attributes.
  • Disable unsafe embeds, links, and inline scripts in all rendering environments.

(4) Application

Summary: The orchestration brain that manages sessions, APIs, and business logic. Weak validation, error handling, or access controls at this layer can cascade into systemic compromise across the entire application stack.

Prompt Injection (T01-DPIJ, T01-IPIJ)

Mapped CWEs: CWE-20, CWE-74, CWE-94

Rationale: Unvalidated or unescaped input injected into model orchestration logic or prompt templates can override instructions, bypass business rules, or trigger unintended system actions.

Recommendations:

  • Perform strict schema validation and canonicalization on all inputs.
  • Separate roles for user-authored, developer, and system instructions.
  • Introduce a safe interpreter or mediation layer between user input and model orchestration.
  • Conduct adversarial prompt-injection testing as part of QA.
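
The strict schema validation recommended above can be sketched with a fail-closed validator; the field names, modes, and limits below are hypothetical (in a FastAPI/Pydantic stack the same checks would live in a Pydantic model):

```python
from dataclasses import dataclass

MAX_QUERY_LEN = 2_000                       # illustrative limit
ALLOWED_MODES = {"chat", "search", "summarize"}

@dataclass(frozen=True)
class ChatRequest:
    user_id: str
    mode: str
    query: str

def validate_request(payload: dict) -> ChatRequest:
    """Fail-closed validation of orchestration inputs: unknown keys,
    wrong types, unlisted modes, and oversized queries are all rejected."""
    if set(payload) != {"user_id", "mode", "query"}:
        raise ValueError("unexpected or missing fields")
    if not all(isinstance(payload[k], str) for k in payload):
        raise ValueError("all fields must be strings")
    if payload["mode"] not in ALLOWED_MODES:
        raise ValueError("mode not allowlisted")
    if len(payload["query"]) > MAX_QUERY_LEN:
        raise ValueError("query too large")
    return ChatRequest(**payload)
```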

Sensitive Information Disclosure (T01-SID, T01-SPL)

Mapped CWEs: CWE-200, CWE-209, CWE-359, CWE-522

Rationale: Secrets, credentials, or internal configuration details may leak through logs, prompt contexts, or plugin responses, exposing sensitive data or business logic.

Recommendations:

  • Redact secrets and PII from logs, prompts, and API responses.
  • Enforce RBAC and scoped access to sensitive configuration data.
  • Implement safe, user-friendly error handling that hides stack traces and internal state.
  • Apply DLP scanning on logs and telemetry.

Denial of Service Model (T01-DoSM)

Mapped CWEs: CWE-400, CWE-770, CWE-787

Rationale: Excessive or malformed requests to the orchestration or inference service can saturate compute, memory, or token resources, leading to service unavailability.

Recommendations:

  • Apply rate-limiting and circuit breakers at API gateways and orchestration tiers.
  • Enforce input size, token, and format validation.
  • Implement workload isolation and quotas per tenant, API, or model instance.
  • Monitor runtime metrics to detect anomalous consumption patterns.

Model Toxicity / Misinformation (T01-MTU, T01-MIS)

Mapped CWEs: CWE-345, CWE-1204

Rationale: Models embedded in the application can generate harmful, biased, or false content when the orchestration lacks grounding, confidence thresholds, or moderation layers.

Recommendations:

  • Implement grounding and factual consistency checks using trusted data sources.
  • Integrate toxicity and bias filters in the inference pipeline.
  • Flag low-confidence or high-risk outputs for review before dissemination.
  • Apply continuous evaluation of model reliability and fairness metrics.

Insecure Output Handling (T01-IOH)

Mapped CWEs: CWE-79, CWE-116, CWE-75

Rationale: Improperly sanitized or encoded model outputs (HTML, Markdown, or JSON) rendered in dashboards or downstream clients can lead to injection, cross-site scripting, or data corruption.

Recommendations:

  • Apply contextual encoding and sanitization before rendering.
  • Strip or escape unsafe HTML/Markdown tags and attributes.
  • Use safe templating libraries or rendering frameworks.
  • Enforce output validation and content-type boundaries between services.

Excessive Agency (T01-EA)

Mapped CWEs: CWE-284, CWE-285

Rationale: Autonomous agents or model-driven APIs may perform privileged actions—such as initiating transactions or modifying files—without appropriate oversight or authorization.

Recommendations:

  • Enforce least-privilege access for model plugins, agents, and integrations.
  • Maintain allowlists for sensitive operations and external service calls.
  • Require secondary approvals or human-in-the-loop validation for high-impact actions.
  • Log and audit all agent-initiated operations for accountability.

(5) Agent / Plugin

Summary: The extended arms of the system; vulnerable to indirect prompt injection, weak secrets handling, tampering, excessive actions, and unsafe workflows.

Indirect Prompt Injection (T01-IPIJ)

Mapped CWEs: CWE-20, CWE-74, CWE-94

Rationale: Plugins or connected tools may receive crafted or hidden instructions embedded within user or system prompts that manipulate downstream components, alter intended behavior, or trigger unsafe code execution.

Recommendations: Enforce strict input/output schemas; escape or sanitize all parameters; prohibit dynamic code evaluation or direct command execution from model-generated content.

Sensitive Information Disclosure (T01-SID)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: Model, plugin, or connected service exposes confidential data such as credentials, tokens, or personal information through logs, prompts, or API responses due to insufficient data protection or contextual awareness.

Recommendations: Use scoped, short-lived credentials; redact sensitive fields in tool and model outputs; apply data minimization and need-to-know access controls.

Model Tampering / Disclosure (T01-MTD)

Mapped CWEs: CWE-276, CWE-285, CWE-494

Rationale: Model artifacts, weights, or configurations can be modified, replaced, or exfiltrated due to weak file permissions, missing integrity checks, or insecure deployment pipelines—allowing attackers to alter model behavior or leak intellectual property.

Recommendations: Enforce hardened file and storage permissions; validate model integrity via signed manifests; require digital signing and verification of all model artifacts before deployment.
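
The integrity-verification step can be sketched as a digest comparison against a signed manifest; this assumes the manifest's digests have already been signature-verified out of band:

```python
import hashlib
import hmac

def sha256_of(data: bytes) -> str:
    """Hex digest of a model artifact's bytes."""
    return hashlib.sha256(data).hexdigest()

def verify_artifact(artifact_bytes: bytes, manifest_digest: str) -> bool:
    """Compare an artifact against the digest recorded in a (separately
    signature-verified) manifest; deploy only on an exact match."""
    return hmac.compare_digest(sha256_of(artifact_bytes), manifest_digest)
```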

Excessive Agency (T01-EA)

Mapped CWEs: CWE-284, CWE-285

Rationale: Model or autonomous agent executes actions beyond its intended authority—such as invoking privileged APIs, modifying external systems, or performing unapproved transactions—due to insufficient access controls or unrestricted delegation.

Recommendations: Enforce per-action least privilege; implement policy gates for sensitive operations; require human-in-the-loop approval for high-risk or irreversible actions.

Vulnerable External Workflow (T01-VEW)

Mapped CWEs: CWE-829, CWE-918, CWE-502

Rationale: Model-integrated tools or external workflow components can be exploited through untrusted dependencies, SSRF vectors, or unsafe deserialization—allowing attackers to pivot into internal networks, exfiltrate data, or execute arbitrary code.

Recommendations: Maintain strict tool allowlists and egress proxy controls; enforce validation of content types and schema for external responses.
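
A basic egress guard for tool calls might look like this sketch; the allowlisted hosts are hypothetical, and rejecting raw IP literals is a simple defense against SSRF pivots to link-local or loopback addresses (CWE-918):

```python
import ipaddress
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.example-tool.com", "docs.example-tool.com"}  # hypothetical

def egress_allowed(url: str) -> bool:
    """Permit outbound tool calls only to allowlisted hosts over HTTPS,
    and never to raw IP literals (incl. 169.254.x.x, 127.x.x.x)."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    try:
        ipaddress.ip_address(parsed.hostname)
        return False  # raw IP literal: reject
    except ValueError:
        pass          # hostname, not an IP
    return parsed.hostname in ALLOWED_HOSTS
```

Note that hostname checks alone do not stop DNS-rebinding; a production deployment would also resolve and re-check the destination at the egress proxy.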


(6) External Sources

Summary: Bridges to the outside world; unverified data can introduce poisoned content, trigger unsafe actions, or spread misinformation.

Indirect Prompt Injection (T01-IPIJ)

Mapped CWEs: CWE-20, CWE-74, CWE-94

Rationale: Plugins or retrieval components may process crafted or malicious content from external sources (web pages, documents, APIs) that inject hidden instructions or alter model behavior through prompt manipulation.

Recommendations: Sanitize and normalize all retrieved external content; restrict accepted content types and formats; segregate and label retrieved data to prevent cross-context prompt injection.

Model Tampering/Disclosure (T01-MTD)

Mapped CWEs: CWE-276, CWE-285, CWE-494

Rationale: Model files, weights, or configurations can be modified or leaked through weak storage permissions, unverified updates, or insecure pipelines—allowing attackers to alter outputs, inject backdoors, or exfiltrate proprietary data.

Recommendations: Implement integrity and signature verification for all model artifacts; enforce least-privilege access and explicit change approvals; apply hardened storage permissions across training and deployment environments.

Sensitive Information Disclosure (T01-SID)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: AI models or connected tools may expose confidential data (e.g., tokens, credentials, personal identifiers) through logs, responses, or stored context due to insufficient redaction or access controls.

Recommendations: Mask sensitive fields in logs and outputs; use scoped OAuth credentials with minimal privileges; enforce data-loss-prevention (DLP) policies for prompt and response data flows.

Excessive Agency (T01-EA)

Mapped CWEs: CWE-284, CWE-285

Rationale: Model or agent autonomously performs privileged or unintended actions—such as calling sensitive APIs, modifying resources, or invoking external tools—without appropriate authorization or contextual policy validation.

Recommendations: Enforce RBAC and allowlists for data sources and actions; perform policy and safety checks before executing model-initiated operations; use sandboxed or isolated connectors to restrict external access.

Vulnerable External Workflow (T01-VEW)

Mapped CWEs: CWE-829, CWE-918, CWE-502

Rationale: Integrations or tools that interact with external workflows can be compromised via untrusted dependencies, SSRF, or unsafe deserialization, leading to unauthorized network access or remote code execution.

Recommendations: Enforce egress proxy and strict allowlists for outbound connections; validate and enforce safe content types; verify software supply chain integrity through signed releases and SBOM verification.

Data / Model Poisoning (T01-DMP)

Mapped CWEs: CWE-20, CWE-494, CWE-353

Rationale: Attackers inject malicious data or manipulate model artifacts during training, fine-tuning, or update pipelines, causing biased outputs, backdoors, or performance degradation.

Recommendations: Establish data provenance and reputation scoring mechanisms; perform adversarial sample and anomaly testing; apply cryptographic integrity checks on datasets and model artifacts throughout the pipeline.
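
The cryptographic integrity check over datasets can be sketched as an order-independent fingerprint recorded at ingestion and re-checked before each training run; the record shape is an illustrative assumption:

```python
import hashlib
import json

def dataset_fingerprint(records: list[dict]) -> str:
    """Order-independent fingerprint of a training dataset. Record it at
    ingestion and re-check before fine-tuning so silent row injection or
    edits (data poisoning) are detected."""
    digests = sorted(
        hashlib.sha256(json.dumps(r, sort_keys=True).encode()).hexdigest()
        for r in records
    )
    return hashlib.sha256("".join(digests).encode()).hexdigest()
```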


(7) Input Handling

Summary: The filter layer; weak parsing/schema enforcement lets adversarial inputs/injections slip through.

Prompt Injection (T01-DPIJ)

Mapped CWEs: CWE-20, CWE-74, CWE-94

Rationale: Malicious user or system input manipulates model prompts to override instructions, inject new goals, or trigger unintended actions in downstream tools or connected systems.

Recommendations: Enforce strict input schemas and strong typing; strip unsafe control sequences and escape characters; sandbox and isolate user inputs before prompt assembly.

Adversarial Input Evasion (T01-AIE)

Mapped CWEs: CWE-20, CWE-1384

Rationale: Attackers craft adversarial inputs (e.g., perturbed tokens, unicode tricks, or obfuscated payloads) to evade model detection or classification boundaries, resulting in mispredictions or bypassing safety filters.

Recommendations: Normalize and sanitize Unicode and encoding variations; conduct adversarial robustness testing; apply layered input validation and confidence thresholding.
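
Unicode normalization against homoglyph and zero-width tricks can be sketched with the standard library; NFKC folds fullwidth and compatibility forms to their canonical equivalents:

```python
import unicodedata

ZERO_WIDTH = "\u200b\u200c\u200d\ufeff"  # chars used to split trigger words

def normalize_input(text: str) -> str:
    """Fold encoding tricks before classification or filtering: NFKC maps
    fullwidth/compatibility forms to canonical characters, and zero-width
    characters are stripped."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)
```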

Sensitive Information Disclosure (T01-SID, T01-LSID, T01-SPL)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: Sensitive data (e.g., API keys, secrets, PII, or training data) is exposed during ingestion, inference, or logging due to unredacted inputs, verbose errors, or unsafe context retention.

Recommendations: Apply ingestion-time redaction for sensitive terms; mask or tokenize secrets in logs; sanitize logs, error traces, and tool responses to prevent data leakage.

Denial of Service Model (T01-DoSM)

Mapped CWEs: CWE-400, CWE-770, CWE-787

Rationale: Oversized or malformed inputs and unbounded request rates can exhaust GPU, memory, or CPU resources in model inference services, leading to degraded performance or service outages.

Recommendations: Enforce input size and rate quotas; validate buffer dimensions and tensor structures before inference execution.

Vulnerable External Workflow (T01-VEW)

Mapped CWEs: CWE-829, CWE-918, CWE-502

Rationale: External toolchains, webhooks, or retrieval flows can be exploited through untrusted dependencies, SSRF, or unsafe deserialization to access internal networks or execute arbitrary code.

Recommendations: Use domain-based allowlists with outbound proxy enforcement; validate and enforce safe content types for all retrieved or external resources.


(8) Output Handling

Summary: Safety gate before delivery; failure here leaks sensitive data, misinformation, and unsafe content.

Log/Storage Information Disclosure (T01-LSID)

Mapped CWEs: CWE-200, CWE-532, CWE-522

Rationale: Logs or persistent storage may capture raw model outputs, user prompts, or tokens that contain sensitive information. Without redaction, encryption, or access controls, these records can expose secrets, PII, or proprietary context.

Recommendations: Strip sensitive context from stored logs and outputs; enforce RBAC and least privilege for log access; use sanitized and generic error messages.

Sensitive Information Disclosure (T01-SID, T01-SPL, T01-TDL)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: Output layers may inadvertently reveal secrets, PII, or confidential training data through generated responses, summaries, or recalled examples.

Recommendations: Apply post-output DLP scanning; encrypt or mask sensitive fields before returning to clients; prevent recall or verbatim exposure of sensitive training data rows.

Denial of Service Model (T01-DoSM)

Mapped CWEs: CWE-400, CWE-770, CWE-787

Rationale: Excessively large or malformed outputs (e.g., runaway text generation, long JSON sequences) can overflow downstream buffers or consume significant rendering resources, impacting availability.

Recommendations: Cap output size and token limits; quarantine or truncate oversized responses; validate downstream buffer and rendering capacities.
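
The cap-and-truncate control can be sketched in a few lines; the character ceiling and truncation marker are illustrative choices.

```python
MAX_OUTPUT_CHARS = 20000  # illustrative ceiling

def truncate_output(text: str) -> tuple[str, bool]:
    """Cap response size before it reaches downstream buffers or renderers.
    Returns the (possibly truncated) text and a flag for telemetry."""
    if len(text) <= MAX_OUTPUT_CHARS:
        return text, False
    return text[:MAX_OUTPUT_CHARS] + "\n[output truncated]", True
```

The returned flag lets monitoring distinguish normal responses from runaway generations.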

Insecure Output Handling (T01-IOH)

Mapped CWEs: CWE-79, CWE-116, CWE-75

Rationale: Untrusted model outputs rendered as HTML, Markdown, or code without proper encoding can lead to injection attacks or content manipulation in client or downstream systems.

Recommendations: Use contextual output encoding and allowlisted sanitization routines; disable rich rendering for untrusted text or code blocks; enforce strict content-type boundaries.
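
Contextual encoding for the HTML case can be sketched with the standard library; the rich-rendering toggle defaulting to off reflects the "disable rich rendering for untrusted text" guidance. Function name and structure are illustrative.

```python
import html

def render_model_output(text: str, allow_markdown: bool = False) -> str:
    """Escape untrusted model output before embedding it in an HTML page.
    Rich rendering stays off unless explicitly enabled for trusted flows."""
    escaped = html.escape(text, quote=True)
    if not allow_markdown:
        return f"<pre>{escaped}</pre>"
    # Even when markdown is permitted, script-capable tags were escaped above
    return escaped
```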

Training Data Leakage (T01-TDL)

Mapped CWEs: CWE-201, CWE-359

Rationale: Models may emit verbatim snippets or memorized content from their training data, including personally identifiable or proprietary information.

Recommendations: Employ differential privacy during training; use verbatim and entropy-based leakage filters; redact prompt and output logs; restrict access to model telemetry or trace data.
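
A verbatim-leakage filter can be approximated with long n-gram overlap between outputs and protected training rows, as in this sketch. The 8-token window is an assumption; at scale, suffix arrays or Bloom filters replace the in-memory set.

```python
def ngrams(text: str, n: int = 8) -> set[str]:
    """All whitespace-tokenized n-grams of the text."""
    tokens = text.split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def leaks_training_text(output: str, corpus_ngrams: set[str], n: int = 8) -> bool:
    """Flag outputs sharing a long verbatim n-gram with protected training data."""
    return bool(ngrams(output, n) & corpus_ngrams)
```

Flagged outputs can be blocked, redacted, or routed for review depending on policy.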

Model Toxicity / Misinformation (T01-MTU, T01-MIS)

Mapped CWEs: CWE-345, CWE-1204

Rationale: Generated outputs may include harmful, biased, or false information due to unfiltered model behavior or insufficient grounding in verified sources.

Recommendations: Integrate toxicity and bias filters; require grounding and citations to trusted datasets; implement fallback responses when confidence is low or bias is detected.

Excessive Agency (T01-EA)

Mapped CWEs: CWE-284, CWE-285

Rationale: The model or its connected tools execute actions automatically (e.g., API calls, file writes, system changes) without explicit authorization or confirmation.

Recommendations: Restrict actions to allowlisted commands; apply authorization and policy checks before execution; require explicit human confirmation for high-impact operations.
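
A default-deny policy gate between model output and tool execution can be sketched as below; the action names and the confirmation flag are illustrative placeholders for an organization's own policy engine.

```python
# Illustrative policy: low-risk actions run freely, high-impact ones need a human
ALLOWED_ACTIONS = {"search_docs", "summarize"}
HIGH_IMPACT_ACTIONS = {"delete_file", "send_email"}

def authorize_action(action: str, human_confirmed: bool = False) -> bool:
    """Allow only allowlisted tool calls; high-impact ones additionally
    require explicit human confirmation."""
    if action in ALLOWED_ACTIONS:
        return True
    if action in HIGH_IMPACT_ACTIONS:
        return human_confirmed
    return False  # default-deny anything unrecognized
```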


(9) Model

Summary: The core intelligence; targeted by injection, poisoning, theft, inversion, DoS, and unsafe outputs.

Prompt Injection (T01-DPIJ, T01-IPIJ)

Mapped CWEs: CWE-20, CWE-74, CWE-94

Rationale: Crafted user or system inputs can override, manipulate, or insert instructions within model prompts—altering the models intended reasoning path or causing execution of untrusted actions.

Recommendations: Separate system, developer, and user prompts into isolated contexts; apply tokenizer-stage filtering and normalization; conduct adversarial training to harden against prompt manipulation.
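
Context separation is easiest to enforce structurally: keep system policy and user text in distinct message roles so the model API boundary, not string concatenation, carries the distinction. A minimal sketch using the common chat-message convention:

```python
def build_messages(system_policy: str, user_text: str) -> list[dict]:
    """Assemble a role-separated message list; user text is never spliced
    into the system message."""
    return [
        {"role": "system", "content": system_policy},
        {"role": "user", "content": user_text},
    ]
```

Tokenizer-stage filtering and normalization would be applied to `user_text` before this assembly step.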

Supply Chain / Data & Fine-tuning Poisoning (T01-SCMP, T01-DPFT, T01-RMP, T01-DMP)

Mapped CWEs: CWE-494, CWE-353, CWE-829

Rationale: Model training or fine-tuning data, dependencies, or weights can be poisoned or replaced through compromised datasets, malicious model checkpoints, or tampered packages in the supply chain.

Recommendations: Use digitally signed model weights and datasets; apply provenance and reputation scoring; sanitize fine-tuning data for adversarial patterns; maintain SBOMs for all model components.
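
Integrity verification at load time can be sketched with a digest check against a signed manifest; manifest distribution and signature validation are assumed to happen out of band.

```python
import hashlib

def verify_artifact(data: bytes, expected_sha256: str) -> None:
    """Refuse to load a model artifact whose SHA-256 digest does not match
    the value recorded in a signed manifest."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise RuntimeError(f"artifact digest mismatch: got {actual}")
```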

Adversarial Input Evasion (T01-AIE)

Mapped CWEs: CWE-20, CWE-1384

Rationale: Adversarially perturbed inputs exploit model weaknesses to evade detection or cause misclassification, often through subtle token-level or embedding-space manipulation.

Recommendations: Normalize inputs prior to tokenization; perform robustness and adversarial testing across datasets; monitor embedding distributions for drift or anomalies.

Sensitive Information Disclosure / Training Data Leakage (T01-SID, T01-TDL, T01-LSID, T01-SPL)

Mapped CWEs: CWE-200, CWE-359, CWE-201

Rationale: Model parameters or outputs may expose memorized training data, sensitive context, or private attributes through unfiltered responses or model inversion attempts.

Recommendations: Apply differential privacy during training (e.g., DP-SGD); block verbatim sequence recall; redact sensitive tokens in system prompts; restrict or sanitize inference-time logging.

Model Inversion / Membership Inference (T01-MIMI)

Mapped CWEs: CWE-203, CWE-359

Rationale: Attackers query the model to infer whether specific data records were used during training, or reconstruct sensitive training examples via inversion techniques.

Recommendations: Use DP-SGD or other noise-based privacy mechanisms; enforce rate limits and output randomization; conduct dedicated membership-inference red teaming to validate resilience.

Denial of Service Model (T01-DoSM)

Mapped CWEs: CWE-400, CWE-770, CWE-787

Rationale: Excessive model context lengths, complex prompt chains, or malformed inference payloads can overload GPU/CPU resources, leading to degraded performance or outages.

Recommendations: Cap model context and token limits; detect abnormal inference patterns or anomalies; harden serving buffers and apply per-request resource quotas.

Insecure Output Handling / Unsafe Integrations (T01-IOH, T01-VEW)

Mapped CWEs: CWE-79, CWE-116, CWE-829

Rationale: Model outputs may contain untrusted data or unsafe formatting passed to external systems, or integrations may process outputs without sanitization—leading to injection or workflow compromise.

Recommendations: Sanitize and encode all model outputs; restrict integrations to allowlisted tools and trusted domains; enforce policy and validation layers between model and tool execution.

Model Theft / Exfiltration (T01-MTR, T01-MTD)

Mapped CWEs: CWE-276, CWE-285, CWE-494

Rationale: Unauthorized access or exfiltration of model artifacts, weights, or parameters can lead to IP theft, cloning, or malicious redistribution of compromised versions.

Recommendations: Apply strict access controls to model repositories and serving endpoints; encrypt weights and checkpoints at rest; monitor for unauthorized exfiltration or replication.

Model Toxicity / Misinformation / Excessive Agency (T01-MTU, T01-MIS, T01-EA)

Mapped CWEs: CWE-345, CWE-1204, CWE-284

Rationale: Models may generate biased, harmful, or false information—or take autonomous actions based on toxic or deceptive outputs—causing reputational, ethical, or operational harm.

Recommendations: Integrate toxicity and bias post-filters; ground model outputs in verified sources; restrict actionable outputs via policy enforcement; require approvals for high-risk autonomous actions.


(10) Model Storage Infrastructure

Summary: Crown jewels at rest — must be encrypted, signed, and access-controlled.

Data/Prompt Fine-Tuning Poisoning (T01-DPFT)

Mapped CWEs: CWE-494, CWE-353

Rationale: Attackers may modify or replace stored training or fine-tuning datasets, prompt templates, or embeddings in model storage repositories—resulting in malicious model behavior or backdoored outputs.

Recommendations: Apply cryptographic signing and checksums to all stored artifacts; maintain read-only and versioned storage for model and dataset files; require cryptographic attestation for model load operations.

Supply Chain Model Poisoning (T01-SCMP)

Mapped CWEs: CWE-829, CWE-494

Rationale: Model dependencies, pre-trained weights, or third-party registries can be compromised, introducing malicious code or poisoned weights into the build and deployment pipelines.

Recommendations: Source models and dependencies only from trusted registries; verify lineage and digital signatures; pin dependency versions and verify integrity before loading or deployment.

Model Theft / Exfiltration (T01-MTR)

Mapped CWEs: CWE-276, CWE-284

Rationale: Unauthorized access or large-scale export of model artifacts, checkpoints, or container images can lead to theft of proprietary IP or replication of protected models.

Recommendations: Encrypt stored models and weights using KMS-managed keys; enforce least-privilege access for repositories and buckets; monitor for bulk download or anomalous access; harden default permissions and configurations.

Model Tampering / Disclosure (T01-MTD)

Mapped CWEs: CWE-276, CWE-285, CWE-494

Rationale: Stored models or weight files can be altered, replaced, or disclosed if access controls, integrity checks, or permissions are weak—allowing attackers to inject malicious behavior or leak proprietary data.

Recommendations: Use WORM (Write Once, Read Many) or immutable storage for production models; perform integrity verification on model load; restrict access to service accounts with strict RBAC and scoped tokens.


(11) Model Serving Infrastructure

Summary: Execution gateway; must resist poisoning, theft, DoS, and unsafe outputs.

Supply Chain Model Poisoning (T01-SCMP)

Mapped CWEs: CWE-494, CWE-353, CWE-829

Rationale: Model serving containers, preloaded weights, or dependencies may be replaced or tampered with during build or deployment, introducing malicious payloads or backdoored models into production pipelines.

Recommendations: Use signed and verified container images; validate checksums and digests for all model files; enforce SBOM-based provenance and signature verification; block deployment from untrusted or public registries.

Model Toxicity / Unreliable Outputs (T01-MTU)

Mapped CWEs: CWE-345, CWE-1204, CWE-75

Rationale: Deployed models may generate harmful, biased, or misleading content due to unmoderated outputs, missing grounding, or unreliable post-processing mechanisms.

Recommendations: Integrate moderation and toxicity filters into inference pipelines; perform grounding checks against trusted data sources; implement fallback or neutral responses when confidence is low or results are potentially unsafe.

Model Theft / Exfiltration (T01-MTR)

Mapped CWEs: CWE-276, CWE-284, CWE-285

Rationale: Insecure endpoints, weak authentication, or misconfigured storage permissions may allow adversaries to exfiltrate model weights, clone serving containers, or reconstruct models through inference scraping.

Recommendations: Enforce API rate limits and anomaly detection on inference endpoints; require mutual TLS (mTLS) and RBAC-based authorization; encrypt model weights at rest; harden file system permissions and disable anonymous or default service accounts.

Denial of Service Model (T01-DoSM)

Mapped CWEs: CWE-400, CWE-770, CWE-787

Rationale: Oversized, malformed, or high-rate inference requests can exhaust serving resources such as memory, CPU, or GPU queues—causing degraded availability or total service outages.

Recommendations: Cap input request sizes and token lengths; configure quotas and throttling at the API gateway; use circuit breakers and autoscaling for load protection; validate input buffers and parsers to prevent overflow or runaway generation.


(12) Evaluation

Summary: Where model quality and trustworthiness are validated; weak evaluation enables unsafe, biased, or manipulated outputs to pass undetected.

Adversarial Input Evasion (T01-AIE)

Mapped CWEs: CWE-20, CWE-116, CWE-1389

Rationale: Evaluation datasets and inputs can be crafted to evade detection or distort performance metrics, leading to false confidence in model robustness.

Recommendations: Normalize and validate evaluation inputs; perform adversarial testing under varied perturbations; apply outlier and embedding-space drift detection.

Data/Model Poisoning (T01-DMP)

Mapped CWEs: CWE-345, CWE-353, CWE-494

Rationale: Compromised datasets or poisoned models used during evaluation can skew metrics and conceal malicious alterations.

Recommendations: Validate datasets with cryptographic checksums and signatures; maintain golden reference baselines; verify model lineage before evaluation.

Log/Storage Information Disclosure (T01-LSID)

Mapped CWEs: CWE-117, CWE-200, CWE-532

Rationale: Logging of sensitive evaluation outputs, prompts, or internal metrics can expose confidential data or model behavior to unauthorized users.

Recommendations: Sanitize and minimize logged output; redact sensitive context or metadata; restrict access to evaluation logs and reports.

Sensitive Information Disclosure (T01-SID)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: Evaluation pipelines may process datasets containing private or regulated information that could leak via reports, dashboards, or telemetry.

Recommendations: Apply data masking and DLP filters in evaluation output; enforce least-privilege access; encrypt all evaluation artifacts and summaries at rest.

Training Data Leakage (T01-TDL)

Mapped CWEs: CWE-201, CWE-359

Rationale: Evaluation datasets overlapping with training data can cause inflated scores and unintentional exposure of memorized content.

Recommendations: De-duplicate evaluation data against training sets; implement entropy and verbatim leakage filters; isolate training and evaluation environments.
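
Hash-based de-duplication against the training set can be sketched as follows; normalizing before hashing ensures trivial whitespace or case differences still count as duplicates. Function names are illustrative.

```python
import hashlib

def fingerprint(example: str) -> str:
    """Normalize then hash an example so near-identical copies collide."""
    canonical = " ".join(example.lower().split())
    return hashlib.sha256(canonical.encode()).hexdigest()

def dedupe_eval(eval_set: list[str], train_set: list[str]) -> list[str]:
    """Drop evaluation examples that also appear in training data."""
    train_hashes = {fingerprint(x) for x in train_set}
    return [x for x in eval_set if fingerprint(x) not in train_hashes]
```

Exact-match hashing misses paraphrases; near-duplicate detection (e.g. MinHash) is the usual next step.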

Denial of Service Model (T01-DoSM)

Mapped CWEs: CWE-400, CWE-770, CWE-787

Rationale: Large or malformed evaluation inputs can overload inference services, exhausting compute resources or crashing evaluation pipelines.

Recommendations: Limit input and output sizes; apply quotas and circuit breakers on evaluation workloads; validate and sanitize input buffers.

Model Toxicity / Misinformation (T01-MTU, T01-MIS)

Mapped CWEs: CWE-345, CWE-1204

Rationale: Without toxicity, bias, or factual consistency tests, evaluation may miss unsafe, unreliable, or ungrounded model behaviors.

Recommendations: Include toxicity and bias detection in evaluation metrics; perform grounding verification against trusted sources; use human validation for high-impact outputs.

Insecure Output Handling (T01-IOH)

Mapped CWEs: CWE-79, CWE-74, CWE-75, CWE-693

Rationale: Unsafe rendering or display of model outputs in dashboards or visualization tools can lead to injection, cross-site scripting, or data corruption.

Recommendations: Apply contextual encoding for rendered outputs; sanitize HTML/Markdown before display; restrict rich content in evaluation interfaces.

Unsafe Evaluation Practices (T01-UEP)

Mapped CWEs: CWE-352, CWE-825

Rationale: Lack of test isolation or dependency validation in evaluation frameworks can lead to contaminated results or untrusted code execution.

Recommendations: Isolate evaluation from training environments; enforce CSRF protection in evaluation tools; validate external dependencies and ensure reproducible runs.


(13) Training & Tuning

Summary: Where knowledge is forged; poor data embeds lasting bias and backdoors.

Adversarial Input Evasion (T01-AIE)

Mapped CWEs: CWE-20, CWE-116

Rationale: Adversarial or malformed training inputs (e.g., mislabeled, perturbed, or poisoned samples) can distort model learning and weaken resilience against evasion or misclassification attacks.

Recommendations: Enforce strict data schemas and canonical normalization during ingestion; perform adversarial resilience testing on training data; deploy anomaly detection to flag abnormal patterns in preprocessing pipelines.

Misinformation (T01-MIS)

Mapped CWEs: CWE-345, CWE-200

Rationale: Training datasets or feedback loops can contain inaccurate, biased, or manipulated content that skews model reasoning and propagates false or unsafe knowledge into production models.

Recommendations: Validate datasets against trusted reference sources; integrate human oversight for labeling and feedback verification; implement training-time grounding and periodic data quality audits.

Data/Prompt Fine-Tuning Poisoning (T01-DPFT)

Mapped CWEs: CWE-353, CWE-494

Rationale: Attackers can inject poisoned examples or tampered prompt templates during fine-tuning or reinforcement learning phases, embedding persistent backdoors or bias.

Recommendations: Require cryptographically signed and versioned datasets; preserve immutable baselines for training runs; conduct adversarial and data integrity testing before deploying tuned models.

Supply Chain Model Poisoning (T01-SCMP)

Mapped CWEs: CWE-494, CWE-284, CWE-285

Rationale: Compromised third-party packages, pre-trained weights, or data pipelines may introduce malicious code or tainted components into the model training environment.

Recommendations: Use trusted registries for dependencies and pre-trained models; enforce signature verification and provenance checks; apply hardened configuration defaults and scoped access for all training assets.

Model Tampering / Disclosure (T01-MTD)

Mapped CWEs: CWE-276, CWE-285, CWE-359

Rationale: Insecure permissions or lack of encryption on model checkpoints and logs can allow unauthorized modification or exposure of sensitive model parameters and training data.

Recommendations: Encrypt model checkpoints, logs, and gradient data with strong key management (KMS); apply RBAC and access scoping to all storage locations; conduct regular permission audits and integrity checks across training infrastructure.


(14) Model Frameworks & Code

Summary: ML runtime backbone; supply chain or unsafe integrations taint the system.

Supply Chain Model Poisoning (T01-SCMP)

Mapped CWEs: CWE-494, CWE-353, CWE-829

Rationale: Compromised ML frameworks, pre-compiled binaries, or third-party libraries can introduce backdoors, poisoned dependencies, or malicious behavior into runtime environments and training pipelines.

Recommendations: Pin dependency versions and require signed packages; scan for known vulnerabilities and integrity mismatches; maintain comprehensive SBOMs for all model and runtime components.

Model Tampering / Disclosure (T01-MTD)

Mapped CWEs: CWE-276, CWE-284, CWE-285

Rationale: Weak runtime permissions, insecure service accounts, or lack of binary integrity validation can allow unauthorized modification or inspection of core ML frameworks, leading to altered inference logic or model theft.

Recommendations: Harden runtimes with restricted privileges; run services under least-privilege accounts; perform regular integrity and permission audits on framework binaries and configuration files.

Vulnerable External Workflow / Unsafe Integration (T01-VEW)

Mapped CWEs: CWE-94, CWE-95, CWE-918, CWE-502

Rationale: Unsafe plugin loading, dynamic evaluation, or insecure integrations within ML frameworks can enable remote code execution, SSRF, or deserialization exploits that compromise the serving environment.

Recommendations: Disable or sandbox dynamic eval and code-generation features; restrict plugin or module loading to trusted sources; isolate untrusted or experimental code in containers; harden deserialization routines and enforce strict content-type validation.


(15) Data Storage Infrastructure

Summary: Knowledge vault; poisoning/tampering/leaks here undermine integrity & confidentiality.

Runtime/Model/Data Poisoning (T01-RMP, T01-DMP, T01-DPFT, T01-SCMP)

Mapped CWEs: CWE-353, CWE-494, CWE-345

Rationale: Malicious or manipulated data, runtime parameters, or stored model artifacts can inject backdoors or bias into downstream inference and retraining workflows, compromising model integrity.

Recommendations: Perform integrity checks and cryptographic verification on stored artifacts; apply provenance and reputation scoring; maintain append-only or versioned storage; monitor for anomalies and poisoning indicators.

Sensitive Information Disclosure (T01-SID, T01-LSID)

Mapped CWEs: CWE-200, CWE-359, CWE-522, CWE-532

Rationale: Misconfigured databases, verbose logs, or shared storage buckets may expose credentials, tokens, or PII contained in datasets, checkpoints, or system logs.

Recommendations: Encrypt all sensitive data at rest using KMS-managed keys; enforce RBAC and access segmentation; sanitize and minimize logging of secrets or identifiers; monitor data-access patterns for anomalies.

Model/Data Tampering or Exfiltration (T01-MTD)

Mapped CWEs: CWE-276, CWE-284, CWE-285, CWE-922

Rationale: Weak storage permissions, shared access tokens, or lack of immutability controls can enable attackers to alter or exfiltrate stored model or dataset assets.

Recommendations: Disable public or overly broad ACLs; use per-tenant encryption keys; enforce least-privilege storage access; apply immutable or WORM storage for critical datasets and production models.

Denial of Service Storage (T01-DoSS)

Mapped CWEs: CWE-400, CWE-770, CWE-787

Rationale: Excessive data ingestion, unbounded file uploads, or malformed objects can exhaust storage capacity or crash parsers and metadata services, disrupting model training and access.

Recommendations: Enforce quotas and rate limits for data ingestion; validate and harden file parsers and buffer handling; apply throttling and back-pressure controls for high-volume writes or uploads.


(16) Training Data

Summary: Root of trust; compromise propagates to all downstream behavior.

Model Inversion / Membership Inference (T01-MIMI)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: Attackers can query models or inspect intermediate representations to infer whether specific data records were included in training, exposing sensitive personal or proprietary information.

Recommendations: Apply differential privacy (e.g., DP-SGD) to limit per-sample influence; enforce strict RBAC and isolation for raw training data; monitor inference activity to detect inversion or membership-inference patterns.

Training Data Leakage (T01-TDL)

Mapped CWEs: CWE-200, CWE-359, CWE-353

Rationale: Sensitive or secret data can be inadvertently exposed during dataset preparation, preprocessing, or ingestion, allowing leakage through logs, pipelines, or model memory.

Recommendations: Encrypt datasets at rest and in transit; scrub credentials or tokens from preprocessing pipelines; tokenize or mask sensitive fields prior to ingestion and model training.

Sensitive Information Disclosure (T01-SID)

Mapped CWEs: CWE-276, CWE-284, CWE-285

Rationale: Inadequate access control on raw or processed training datasets enables unauthorized viewing or extraction of confidential or regulated data.

Recommendations: Enforce least-privilege access; implement row- and column-level data-access policies; continuously audit and alert on all access to sensitive data stores.

Data Authenticity (T01-DAU)

Mapped CWEs: CWE-345, CWE-494

Rationale: Lack of dataset provenance or version control allows tampered, mislabeled, or malicious data to contaminate training, degrading model reliability and security.

Recommendations: Maintain signed and version-controlled datasets; apply provenance and reputation scoring for all data sources; perform golden-set cross-validation to detect data drift or contamination.


(17) Data Filtering & Processing

Summary: Gatekeeper stage; weak validation lets poisoned/sensitive data pass.

Runtime / Data Poisoning (T01-RMP, T01-DMP, T01-DPFT)

Mapped CWEs: CWE-353, CWE-494, CWE-345

Rationale: Compromised or tampered datasets entering preprocessing pipelines can introduce malicious bias, backdoors, or instability in downstream models if integrity validation is weak.

Recommendations: Require signed and versioned datasets; verify file hashes and checksums during ingestion; apply statistical drift and anomaly detection to identify poisoned or manipulated data.

Sensitive Information Disclosure (T01-SID, T01-TDL, T01-MIMI)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: Preprocessing or feature extraction may expose raw sensitive data—such as PII, credentials, or proprietary information—through intermediate files, logs, or feature stores.

Recommendations: Implement DLP scanning during preprocessing; mask or tokenize sensitive attributes before feature extraction; apply RBAC and access segmentation for all feature store operations.

Vulnerable External Workflow (T01-VEW)

Mapped CWEs: CWE-829, CWE-918, CWE-502

Rationale: Data processing scripts, plugins, or third-party connectors may invoke untrusted resources or deserialize unsafe content, enabling SSRF, RCE, or data exfiltration through external workflows.

Recommendations: Execute transformation jobs in sandboxed environments; apply outbound egress filtering and domain allowlists; prohibit unsafe deserialization and enforce strict content-type validation.

Misinformation (T01-MIS)

Mapped CWEs: CWE-345, CWE-353

Rationale: Preprocessing stages that fail to validate data sources or cross-check content may propagate incorrect or manipulated data into model training, resulting in biased or false learning outcomes.

Recommendations: Validate dataset sources through reputation and ground-truth scoring; perform cross-dataset consistency checks; require human review for data from high-risk or low-trust domains.

Denial of Service on Pipelines (T01-DoSP)

Mapped CWEs: CWE-400, CWE-770, CWE-787

Rationale: Excessive data volume, malformed records, or unbounded streaming inputs can overwhelm preprocessing pipelines, causing latency, storage exhaustion, or crashes.

Recommendations: Enforce data size quotas and schema validation; apply ingestion rate limits and back-pressure controls; monitor for pipeline anomalies and memory spikes in ETL workloads.
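
The rate-limit and back-pressure controls above are commonly implemented with a token bucket, sketched below: each record costs one token, tokens refill at a fixed rate, and a denied request signals the caller to buffer, retry, or shed load. Rates and capacities are illustrative.

```python
import time

class TokenBucket:
    """Simple back-pressure control for ingestion pipelines."""

    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self, cost: int = 1) -> bool:
        """Spend `cost` tokens if available; refill based on elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should buffer, retry, or shed load
```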


(18) Data Sources

Summary: Entry point of truth; without provenance checks, they introduce poisoned/unsafe content.

Sensitive Information Disclosure (T01-SID)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: Ingestion processes may inadvertently capture and store sensitive data such as PII, API keys, or confidential content without encryption or access control, leading to downstream exposure.

Recommendations: Implement DLP scanning at ingestion; enforce least-privilege credentials for ingestion pipelines; encrypt sensitive datasets in transit and at rest.

Data/Model Poisoning (T01-DMP)

Mapped CWEs: CWE-345, CWE-353, CWE-494

Rationale: Attackers can insert poisoned or manipulated data into ingestion sources, corrupting the models training corpus or runtime cache, leading to bias or hidden backdoors.

Recommendations: Enforce digital signatures and hash verification for ingested datasets; apply source reputation and provenance scoring; perform golden-set cross-validation to detect inconsistencies or anomalies.

Vulnerable External Workflow (T01-VEW)

Mapped CWEs: CWE-829, CWE-918, CWE-502

Rationale: Ingestion connectors or pipelines that pull data from external systems may process untrusted or malformed content, enabling SSRF, deserialization attacks, or malicious payload execution.

Recommendations: Use egress proxies and strict domain allowlists; reject unsafe data formats or content types; isolate ingestion connectors and third-party integrations in sandboxed environments.

Misinformation (T01-MIS)

Mapped CWEs: CWE-345, CWE-353

Rationale: Unverified or low-quality data sources may introduce false, biased, or adversarial information into training or analysis pipelines, degrading model accuracy and trustworthiness.

Recommendations: Apply reliability and reputation scoring for all data sources; cross-reference new data against ground-truth sets; perform continuous drift and consistency monitoring across ingestion pipelines.


(19) External Sources

Summary: Outside the trust boundary; major vectors for poisoning, leakage, and misinformation.

Model Inversion / Membership Inference (T01-MIMI)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: Adversaries may exploit external model endpoints or shared datasets to infer private training data or reconstruct sensitive inputs through repeated probing or correlation analysis.

Recommendations: Deploy privacy-preserving APIs with data minimization; implement throttling and anomaly detection on external access; apply k-anonymity and differential privacy where feasible.

Sensitive Information Disclosure (T01-SID)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: External data integrations or shared access credentials can leak secrets or confidential information through exposed endpoints or weak encryption.

Recommendations: Manage all credentials through secret managers with rotation policies; enforce TLS with mutual authentication for external data exchanges; restrict and log all token usage.

Data/Model Poisoning (T01-DMP)

Mapped CWEs: CWE-345, CWE-353, CWE-494

Rationale: External sources or third-party datasets may inject malicious data or corrupted models that contaminate pipelines, resulting in degraded accuracy or compromised behavior.

Recommendations: Require data signing and checksum verification from external providers; cross-validate new data with reference or golden sets; establish vendor trust and supply-chain integrity contracts.

Misinformation (T01-MIS)

Mapped CWEs: CWE-345, CWE-353

Rationale: External feeds and open data sources may provide low-reliability or adversarial content that misguides training or inference outputs, spreading false narratives or bias.

Recommendations: Assign reliability scores and reputation metrics to external sources; validate information against ground-truth datasets; require human review for high-impact or public-facing data feeds.