
2.2 Appendix E: AI Threats Mapping to AI Components Vulnerabilities (CVEs & CWEs)

AI Penetration Testing Framework: Scoping, CVE/CWE Mapping, and Threat Correlation

This appendix guides penetration testers on mapping discovered CVEs and CWEs in SAIF components of an AI architecture to AI-specific threats. CVEs (Common Vulnerabilities and Exposures) generally point to specific, documented vulnerabilities in the underlying technology stack, such as libraries, frameworks, or APIs used to build AI systems and applications. CWEs (Common Weakness Enumerations), on the other hand, describe classes of software design or implementation flaws that may lead to such vulnerabilities.

Step 1 — Scoping AI Penetration Tests Within the SAIF Architecture

Because the pen tests described here target a live AI system/Application, careful scoping is essential: testers must first identify which SAIF components and subcomponents are in scope, enumerate the exact technologies deployed for each, and use that inventory to prioritize CVE/CWE enumeration and threat simulations. In-scope items commonly include components owned or operated by the organization and directly involved in the request→response flow, for example, chat UIs, API backends (e.g., FastAPI), session/orchestration layers, model orchestration frameworks (e.g., LangChain or LlamaIndex), vector stores (Redis, Pinecone, Weaviate), ETL/data pipelines, model-serving endpoints, and internally managed connectors. Because these components can contain outdated, misconfigured, or otherwise exploitable dependencies, the first operational step is threat enumeration: map each in-scope SAIF component to its tech stack, identify relevant CVEs (and corresponding CWEs), and derive likely exploit paths. That mapping then drives focused validation with scanners, SCA tools, and proof-of-concept testing so testers can prioritize, reproduce, and demonstrate how conventional software flaws translate into AI-centric impacts.
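
As an illustration, the scoping inventory described above can be kept as a simple machine-readable structure that later drives CVE/CWE enumeration. This is a minimal sketch; the component names, owners, and tech stacks below are hypothetical examples, not a prescribed schema:

```python
# Hypothetical scoping inventory: map each in-scope SAIF component to its
# deployed tech stack so CVE/CWE enumeration can be prioritized.
# Component numbers follow the SAIF scheme used in this appendix.
SCOPE_INVENTORY = {
    "4-application": {
        "owner": "platform-team",
        "tech_stack": ["fastapi", "redis", "langchain"],
        "in_scope": True,
    },
    "15-data-storage": {
        "owner": "data-team",
        "tech_stack": ["weaviate", "postgres", "s3"],
        "in_scope": True,
    },
    "19-external-sources": {
        "owner": None,  # third party: out of scope for active exploitation
        "tech_stack": ["wikipedia", "news-apis"],
        "in_scope": False,
    },
}

def prioritized_targets(inventory):
    """Return the tech stacks of in-scope components — the starting point
    for CVE/CWE enumeration with SCA tools and scanners."""
    return {
        name: entry["tech_stack"]
        for name, entry in inventory.items()
        if entry["in_scope"]
    }
```

Feeding only the in-scope stacks to SCA tooling keeps the test focused on components the organization owns and operates.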

Step 2 — Threat Enumeration and CVE Exploit Path Mapping

The process of mapping threats to AI system vulnerabilities starts by identifying known vulnerabilities, expressed as CVEs, in AI systems/applications using software composition analysis (SCA) and runtime tools. SCA tools (e.g., Snyk, Trivy, Dependabot, OWASP Dependency-Check, and GitHub Advanced Security) flag vulnerable third-party software dependencies, while scanners such as Nessus and Nuclei can confirm active CVE exposures in APIs and services. Runtime telemetry and host inspection can also validate which CVEs are exploitable in live environments. These CVEs are then mapped to the AI-specific threats (i.e., T01-XX threats) outlined in this guide: for example, a FastAPI sanitization flaw (CVE-2022-36067) can be part of a prompt-injection vector (T01-DPIJ), and an Airflow ETL vulnerability (CVE-2022-40127) can lead to data poisoning (T01-DMP) in a RAG pipeline.
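
A traceability record linking scanner findings to AI threats can be as simple as a lookup table. The sketch below mirrors the two examples in the text; real mappings come from the tables later in this appendix, and the record fields shown are an illustrative assumption:

```python
# Illustrative CVE-to-AI-threat traceability records. The two sample rows
# mirror the examples in the text above; populate from the appendix tables.
CVE_TO_AI_THREATS = {
    "CVE-2022-36067": {"component": "(2) User Input / FastAPI stack",
                       "threats": ["T01-DPIJ"]},
    "CVE-2022-40127": {"component": "(17) Data Filtering / Airflow ETL",
                       "threats": ["T01-DMP"]},
}

def threats_for_findings(cve_ids):
    """Given CVE identifiers flagged by an SCA tool or scanner, return the
    set of AI-specific threats a tester should attempt to demonstrate."""
    threats = set()
    for cve in cve_ids:
        threats.update(CVE_TO_AI_THREATS.get(cve, {}).get("threats", []))
    return threats
```

The resulting threat set then drives which AI-specific attack simulations (prompt injection, poisoning, etc.) are attempted first.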

For each SAIF component in scope, testers review subcomponents, confirm deployed technologies, and run focused tests to find exploitable or unpatched libraries. These findings drive AI-specific attack simulations such as prompt injection, model inversion, data poisoning, or runtime DoS to reveal real application impact. Using the CVE exploit-path mapping table, testers can maintain traceability from vulnerability to AI impact. For instance, Redis in SAIF #4 (Application Layer) vulnerable to CVE-2022-0543 links to risks like data leakage (T01-SID), model disruption (T01-DoSM), and manipulation (T01-MTD). A single Redis compromise can escalate from infrastructure control to model tampering—compromising data integrity, availability, and trust.

Step 3 — AI Threat-to-CWE Mapping for Root Cause and Remediation

The final recommended step is to perform AI threat enumeration and CWE exploit-path mapping, transforming vulnerability-centric testing into design-level assurance. This appendix provides a Threat-to-SAIF-Component-to-CWE mapping, complementing the Threat-to-Test-Case mapping (AITG tests) presented earlier in this guide. Together, these enable testers to link AI-specific vulnerabilities—such as prompt injection, data leakage, or model poisoning—to their root causes, whether insecure design, implementation weakness, or misconfiguration. By classifying findings under CWE categories, testers connect penetration testing results to recognized software weakness patterns. This approach bridges the gap between patch management and secure architecture, guiding fixes that strengthen entire system layers rather than individual components. For example, CWE-20 (Improper Input Validation) reveals weak parsing logic; CWE-276 (Incorrect Default Permissions) highlights insecure cloud storage defaults; and CWE-345 (Insufficient Verification of Data Authenticity) exposes trust flaws in RAG ingestion pipelines.

During AITG testing across in-scope SAIF components, each failed test should identify the immediate issue and trace it to a corresponding CWE root cause. Reports should include both the weakness and an actionable recommendation—for instance, enforcing input validation, disabling public defaults, verifying dataset authenticity, or encrypting sensitive data. This shifts the tester's message from “how I broke it” to “how to fix and redesign it.” As systems evolve, testers can update the CVE and CWE mappings to reflect new vulnerabilities and use the AI Threats column as a living checklist for future red-team exercises. This evolving matrix supports continuous validation and resilience in AI-enabled systems. Once fixes are implemented, corresponding AITG tests should be re-run to verify closure, with findings prioritized by risk severity (Critical, High, Medium, Low) and resolved per SLA targets. This structured, CWE-driven approach ensures AI testing results are not just diagnostic but actionable, improving both software resilience and long-term AI system risk posture.

AI Threat enumeration and CVE exploit path mapping

In this section, we map SAIF components to AI threats and give examples of component-dependent tech-stack CVEs that can be exploited.

| SAIF Component (Number) | Sub-Components | Tech Stack (Chatbot + RAG) | Mapped Threats | Example CVEs in Tech Stack |
|---|---|---|---|---|
| (2) User Input | Text, voice, multimodal parsers | React/Next.js, Slack SDK, Teams Bot, Twilio, Whisper/ASR, FastAPI/Pydantic | T01-DPIJ, T01-IPIJ, T01-SID, T01-DoSM, T01-IOH, T01-MTU | React XSS (CVE-2021-24033); FastAPI vuln (CVE-2023-27533); Twilio SDK (CVE-2022-36449) |
| (3) User Output | Renderers, formatting, TTS/visual output | React chat widgets, Slack/Teams cards, Polly/ElevenLabs, Markdown renderers | T01-EA, T01-SPL, T01-MIS, T01-IOH | Slack API auth bypass (CVE-2020-10753); Markdown injection (CVE-2022-21681) |
| (4) Application | Orchestration, session mgmt, APIs, business logic | LangChain, LlamaIndex, Semantic Kernel, FastAPI/Flask, Redis sessions, GraphQL APIs | T01-DPIJ, T01-IPIJ, T01-SID, T01-DoSM, T01-MTU, T01-IOH, T01-EA, T01-SPL, T01-MIS | Flask template injection (CVE-2019-8341); Redis RCE (CVE-2022-0543); GraphQL DoS (CVE-2020-15159) |
| (5) Agent/Plugin | Connectors, plugin registry, tool adapters | LangGraph Agents, OpenAI Functions, Zapier/n8n, custom OpenAPI tools | T01-IPIJ, T01-SID, T01-MTD, T01-EA, T01-VEW | n8n RCE (CVE-2023-37925); OpenAPI tooling parser injection (CVE-2021-32640) |
| (6) External Sources (App) | APIs, SaaS services, enterprise connectors | Salesforce, ServiceNow, Confluence, SharePoint APIs | T01-IPIJ, T01-MTD, T01-SID, T01-EA, T01-VEW, T01-DMP | Confluence RCE (CVE-2023-22515); SharePoint RCE (CVE-2023-29357) |
| (7) Input Handling | Validation, sanitization, PII detection, scanning | Pydantic, JSON Schema, Presidio, ClamAV | T01-DPIJ, T01-AIE, T01-SID, T01-LSID, T01-DoSM, T01-SPL, T01-VEW | ClamAV RCE (CVE-2023-20032); JSON Schema validator injection (GitHub advisories) |
| (8) Output Handling | Filters, moderation, redaction, grounding checks | Guardrails.ai, OpenAI Moderation, NeMo Guardrails, RAGAS | T01-LSID, T01-SID, T01-DoSM, T01-SPL, T01-IOH, T01-TDL, T01-MTU, T01-EA, T01-MIS | NeMo Guardrails Python deps RCE (via PyTorch CVEs) |
| (9) Model | LLM weights, embeddings, rerankers | GPT-4o, Claude, Llama-3, Mistral, Cohere reranker, BGE embeddings | T01-DPIJ, T01-IPIJ, T01-SCMP, T01-AIE, T01-DPFT, T01-RMP, T01-DMP, T01-SID, T01-MIMI, T01-TDL, T01-DoSM, T01-LSID, T01-SPL, T01-VEW, T01-MTU, T01-IOH, T01-MTR, T01-EA, T01-MIS | PyTorch vuln (CVE-2022-45907); TensorFlow overflow (CVE-2021-37678); Hugging Face sandbox escape (CVE-2023-6730) |
| (10) Model Storage Infrastructure | Registry, encrypted artifacts | MLflow, S3/GCS, Azure Blob, Vertex AI Registry | T01-DPFT, T01-SCMP, T01-MTR, T01-MTD | MLflow path traversal (CVE-2023-6836); AWS S3 bucket takeover misconfigs (CWE-based) |
| (11) Model Serving Infrastructure | GPU runtimes, inference servers, autoscaling | vLLM, NVIDIA Triton, TensorRT-LLM, Kubernetes GPU nodes | T01-SCMP, T01-MTU, T01-MTR, T01-DoSM | NVIDIA Triton RCE (CVE-2023-31036); Kubernetes privilege escalation (CVE-2023-3676); NVIDIA GPU DoS (CVE-2024-0146) |
| (12) Evaluation | Golden sets, drift/bias eval, safety harness | RAGAS, DeepEval, W&B, Evidently AI, Great Expectations | T01-AIE, T01-DMP, T01-LSID, T01-SID, T01-TDL, T01-DoSM, T01-MTU, T01-IOH, T01-MIS | Weights & Biases CLI vuln (GitHub advisories); Great Expectations YAML injection (potential CWE-74) |
| (13) Training & Tuning | Pipelines, fine-tuning, HPO | Kubeflow, SageMaker, Hugging Face PEFT, Optuna | T01-AIE, T01-MIS, T01-DPFT, T01-SCMP, T01-MTD | Kubeflow dashboard RCE (CVE-2021-31812); SageMaker Jupyter RCE (AWS advisory); Hugging Face PEFT vuln (CVE-2023-6730) |
| (14) Model Frameworks & Code | Frameworks, tokenizers, compilers | PyTorch, TensorFlow, Hugging Face, ONNX Runtime | T01-SCMP, T01-MTD, T01-VEW | TensorFlow buffer overflow (CVE-2021-37678); PyTorch vulnerability (CVE-2022-45907); ONNX Runtime DoS (CVE-2022-25883) |
| (15) Data Storage Infrastructure | Vector DBs, RDBMS, object stores | Weaviate, Pinecone, Milvus, Redis, Postgres, S3 | T01-RMP, T01-DMP, T01-DPFT, T01-SCMP, T01-SID, T01-MTD, T01-LSID | Redis RCE (CVE-2022-0543); PostgreSQL escalation (CVE-2023-2454); Milvus injection (CVE-2023-48022) |
| (16) Training Data | Raw corpora, labeled, synthetic | Chat logs, FAQs, Label Studio, synthetic Q&A | T01-MIMI, T01-TDL, T01-SID | Label Studio auth bypass (CVE-2021-36701) |
| (17) Data Filtering & Processing | ETL, cleaning, chunking, tagging | Airflow, dbt, Unstructured.io, spaCy, NLTK | T01-RMP, T01-DMP, T01-DPFT, T01-SID, T01-MIMI, T01-TDL, T01-VEW, T01-MIS | Apache Airflow RCE (CVE-2023-42793); dbt adapter injection (GitHub advisories) |
| (18) Data Sources | Internal KBs, CRM, telemetry | Confluence, Jira, Elastic, Splunk | T01-SID, T01-DMP, T01-VEW, T01-MIS | Confluence RCE (CVE-2023-22515); Jira auth bypass (CVE-2020-14181); ElasticSearch RCE (CVE-2015-1427); Splunk RCE (CVE-2022-32158) |
| (19) External Sources | Public datasets, 3rd party APIs/feeds | Wikipedia, Common Crawl, arXiv, News APIs | T01-MIMI, T01-SID, T01-DMP, T01-MIS | Dataset poisoning risks (no CVEs, CWE-driven); API poisoning (CWE-345: Insufficient Verification of Data Authenticity) |

AI Threat enumeration and Targeted CWEs

In this section, we map SAIF components to AI threats and give examples of vulnerability types (CWEs) that can be exploited.

| SAIF Component | Mapped Threats | Targeted CWEs |
|---|---|---|
| (2) User Input | T01-DPIJ, T01-IPIJ, T01-SID, T01-DoSM, T01-IOH, T01-MTU | CWE-116, CWE-1204, CWE-1389, CWE-20, CWE-200, CWE-359, CWE-400, CWE-522, CWE-74, CWE-75, CWE-770, CWE-787, CWE-79, CWE-94 |
| (3) User Output | T01-EA, T01-SPL, T01-MIS, T01-IOH | CWE-116, CWE-209, CWE-284, CWE-285, CWE-345, CWE-352, CWE-359, CWE-640, CWE-79, CWE-825 |
| (4) Application | T01-DPIJ, T01-IPIJ, T01-SID, T01-DoSM, T01-MTU, T01-IOH, T01-EA, T01-SPL, T01-MIS | CWE-116, CWE-1204, CWE-1389, CWE-20, CWE-200, CWE-209, CWE-284, CWE-285, CWE-345, CWE-352, CWE-359, CWE-400, CWE-522, CWE-640, CWE-74, CWE-75, CWE-770, CWE-787, CWE-79, CWE-825, CWE-94 |
| (5) Agent/Plugin | T01-IPIJ, T01-SID, T01-MTD, T01-EA, T01-VEW | CWE-1389, CWE-20, CWE-200, CWE-276, CWE-284, CWE-285, CWE-359, CWE-494, CWE-502, CWE-522, CWE-74, CWE-829, CWE-918, CWE-94 |
| (6) External Sources | T01-IPIJ, T01-MTD, T01-SID, T01-EA, T01-VEW, T01-DMP | CWE-1389, CWE-20, CWE-200, CWE-276, CWE-284, CWE-285, CWE-359, CWE-494, CWE-502, CWE-522, CWE-74, CWE-829, CWE-918, CWE-94 |
| (7) Input Handling | T01-DPIJ, T01-AIE, T01-SID, T01-LSID, T01-DoSM, T01-SPL, T01-VEW | CWE-117, CWE-1389, CWE-20, CWE-200, CWE-209, CWE-359, CWE-400, CWE-502, CWE-522, CWE-532, CWE-640, CWE-693, CWE-74, CWE-770, CWE-787, CWE-829, CWE-918 |
| (8) Output Handling | T01-LSID, T01-SID, T01-DoSM, T01-SPL, T01-IOH, T01-TDL, T01-MTU, T01-EA, T01-MIS | CWE-116, CWE-117, CWE-1204, CWE-200, CWE-201, CWE-209, CWE-284, CWE-285, CWE-345, CWE-352, CWE-359, CWE-400, CWE-522, CWE-532, CWE-640, CWE-75, CWE-770, CWE-787, CWE-79, CWE-825 |
| (9) Model | T01-DPIJ, T01-IPIJ, T01-SCMP, T01-AIE, T01-DPFT, T01-RMP, T01-DMP, T01-SID, T01-MIMI, T01-TDL, T01-DoSM, T01-LSID, T01-SPL, T01-VEW, T01-MTU, T01-IOH, T01-MTR, T01-EA, T01-MIS | CWE-116, CWE-117, CWE-119, CWE-1204, CWE-1389, CWE-20, CWE-200, CWE-201, CWE-203, CWE-209, CWE-276, CWE-284, CWE-285, CWE-345, CWE-352, CWE-359, CWE-400, CWE-494, CWE-502, CWE-522, CWE-532, CWE-640, CWE-693, CWE-74, CWE-75, CWE-770, CWE-787, CWE-79, CWE-825, CWE-829, CWE-830, CWE-918, CWE-94 |
| (10) Model Storage Infra | T01-DPFT, T01-SCMP, T01-MTR, T01-MTD | CWE-276, CWE-284, CWE-285, CWE-494, CWE-522, CWE-829, CWE-830 |
| (11) Model Serving Infra | T01-SCMP, T01-MTU, T01-MTR, T01-DoSM | CWE-1204, CWE-276, CWE-284, CWE-400, CWE-494, CWE-522, CWE-75, CWE-770, CWE-787, CWE-829 |
| (12) Evaluation | T01-AIE, T01-DMP, T01-LSID, T01-SID, T01-TDL, T01-DoSM, T01-MTU, T01-IOH, T01-MIS | CWE-116, CWE-117, CWE-1204, CWE-1389, CWE-20, CWE-200, CWE-201, CWE-345, CWE-352, CWE-359, CWE-400, CWE-494, CWE-522, CWE-532, CWE-693, CWE-74, CWE-75, CWE-770, CWE-787, CWE-79, CWE-825 |
| (13) Training & Tuning | T01-AIE, T01-MIS, T01-DPFT, T01-SCMP, T01-MTD | CWE-1389, CWE-20, CWE-276, CWE-285, CWE-345, CWE-352, CWE-494, CWE-693, CWE-825, CWE-829, CWE-830 |
| (14) Model Frameworks & Code | T01-SCMP, T01-MTD, T01-VEW | CWE-276, CWE-285, CWE-494, CWE-502, CWE-829, CWE-918 |
| (15) Data Storage Infra | T01-RMP, T01-DMP, T01-DPFT, T01-SCMP, T01-SID, T01-MTD, T01-LSID | CWE-117, CWE-119, CWE-20, CWE-200, CWE-276, CWE-285, CWE-359, CWE-494, CWE-522, CWE-532, CWE-74, CWE-829, CWE-830, CWE-94 |
| (16) Training Data | T01-MIMI, T01-TDL, T01-SID | CWE-200, CWE-201, CWE-203, CWE-359, CWE-522 |
| (17) Data Filtering & Processing | T01-RMP, T01-DMP, T01-DPFT, T01-SID, T01-MIMI, T01-TDL, T01-VEW, T01-MIS | CWE-119, CWE-20, CWE-200, CWE-201, CWE-203, CWE-345, CWE-352, CWE-359, CWE-494, CWE-502, CWE-522, CWE-74, CWE-825, CWE-829, CWE-830, CWE-918, CWE-94 |
| (18) Data Sources | T01-SID, T01-DMP, T01-VEW, T01-MIS | CWE-20, CWE-200, CWE-345, CWE-352, CWE-359, CWE-494, CWE-502, CWE-522, CWE-74, CWE-825, CWE-829, CWE-918 |
| (19) External Sources | T01-MIMI, T01-SID, T01-DMP, T01-MIS | CWE-20, CWE-200, CWE-203, CWE-345, CWE-352, CWE-359, CWE-494, CWE-522, CWE-74, CWE-825 |

AI Threat-to-Component-to-CWE Mapping and Remediation Guidance

In this section, we present a mapping between AI system components, associated AI threats (as defined in the guide's threat model), corresponding CWE categories, and remediation recommendations. Each mapping includes a rationale explaining how specific CWEs are exploited or exposed by those AI threats, providing a direct link between identified weaknesses and actionable fixes.

AI System Architectural Components & Data

Note: Component identifiers correspond to the SAIF numbering scheme illustrated in the threat model diagram within this guide.


(2) User Input

Summary: User Input is the front door of the system; every downstream component depends on it. Without strong input validation, filtering, and limits, it becomes the main vector for prompt injection, data leakage, DoS, and toxicity propagation.

Direct Prompt Injection (T01-DPIJ) & Indirect Prompt Injection (T01-IPIJ)

Mapped CWEs: CWE-20, CWE-74, CWE-94, CWE-707

Rationale: Maliciously crafted inputs (user prompts or embedded instructions) can override instructions, alter reasoning chains, or trigger unintended actions in connected tools.

Recommendations:

  • Apply strict input validation and canonicalization before passing content to the model.
  • Use prompt isolation or sandboxing (separate user and system instructions).
  • Enforce allowlist-based instruction and function patterns.
  • Perform adversarial prompt fuzzing and red-team testing.
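
The first two recommendations can be sketched as follows. This is a minimal illustration, assuming a chat-completions-style message list; the marker phrases are hypothetical examples and a pre-screen like this is a triage aid, not a complete defense against prompt injection:

```python
# Hypothetical override phrasings used for triage; not an exhaustive list.
INSTRUCTION_MARKERS = ("ignore previous instructions", "you are now", "system:")

def build_messages(system_policy: str, user_text: str) -> list[dict]:
    """Prompt isolation: keep trusted instructions and user content in
    separate roles so user text is never spliced into the system prompt."""
    return [
        {"role": "system", "content": system_policy},
        {"role": "user", "content": user_text},  # data channel, not instructions
    ]

def flag_injection_attempt(user_text: str) -> bool:
    """Cheap pre-screen: flag inputs carrying common override phrasing for
    logging and adversarial-testing triage."""
    lowered = user_text.lower()
    return any(marker in lowered for marker in INSTRUCTION_MARKERS)
```

Flagged inputs are good seeds for the adversarial prompt fuzzing and red-team testing recommended above.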

Denial of Service Model (T01-DoSM)

Mapped CWEs: CWE-400, CWE-770, CWE-787

Rationale: Oversized, malformed, or adversarial inputs can exhaust tokenization, GPU, or compute capacity, leading to degraded performance or service unavailability.

Recommendations:

  • Enforce maximum input size and token limits.
  • Apply rate-limits and per-user quotas at API gateways.
  • Use circuit breakers and autoscaling to mitigate load spikes.

Insecure Output Handling Triggered by Inputs (T01-IOH)

Mapped CWEs: CWE-116, CWE-79

Rationale: Malicious inputs may propagate into rendered outputs (e.g., HTML, Markdown, or JSON), enabling injection or cross-site scripting attacks.

Recommendations:

  • Sanitize and contextually encode all rendered outputs.
  • Separate data from control characters; use safe templating and rendering frameworks.
  • Enforce strict content-type validation before presentation.
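
For HTML rendering contexts, contextual encoding can be as simple as escaping everything by default; a sketch using the standard library (richer UIs would add an allowlist-based sanitizer on top):

```python
import html

def safe_render(model_output: str) -> str:
    """Encode model output for an HTML context so injected markup
    (e.g. <script> tags) is displayed as inert text, not executed."""
    return html.escape(model_output, quote=True)
```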

Model Toxicity / Unreliable Outputs (T01-MTU)

Mapped CWEs: CWE-707, CWE-345, CWE-1204

Rationale: Crafted or provocative user inputs can bias model behavior, steering it toward toxic, discriminatory, or ungrounded responses.

Recommendations:

  • Integrate toxicity and bias classifiers to pre-screen user prompts.
  • Use contextual and sentiment filters on incoming requests.
  • Escalate high-risk or policy-violating cases to human review workflows.

(3) User Output

Summary: The last mile to users and connected systems; without control, it's a vector for excessive agency, prompt leakage, misinformation, and unsafe rendering.

Excessive Agency (T01-EA)

Mapped CWEs: CWE-284, CWE-285

Rationale: Action-bearing model outputs (e.g., generated commands, API calls, workflow triggers) can execute privileged or irreversible operations without authorization or user oversight.

Recommendations:

  • Enforce least-privilege scopes for all actionable outputs.
  • Apply policy and authorization checks before rendering or executing UI-driven actions.
  • Maintain allowlists and require explicit human approvals for high-impact or sensitive actions.
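
The allowlist-plus-approval pattern above can be sketched as a small policy gate; the action names below are hypothetical:

```python
# Hypothetical action catalogs for illustration.
ALLOWED_ACTIONS = {"search_kb", "summarize", "delete_record"}
HIGH_IMPACT_ACTIONS = {"delete_record", "wire_transfer", "grant_access"}

def authorize_action(action: str, human_approved: bool = False) -> bool:
    """Allowlist every actionable output; high-impact actions additionally
    require an explicit human approval before execution."""
    if action not in ALLOWED_ACTIONS:
        return False          # deny by default: unknown actions never run
    if action in HIGH_IMPACT_ACTIONS:
        return human_approved  # human-in-the-loop gate
    return True
```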

Sensitive Prompt Leakage (T01-SPL)

Mapped CWEs: CWE-200, CWE-209, CWE-359, CWE-532

Rationale: Model outputs, error messages, or logs may inadvertently reveal hidden prompts, credentials, API keys, or personal information embedded in the conversation context.

Recommendations:

  • Redact secrets, PII, and system instructions prior to rendering or logging.
  • Use structured error wrappers; never expose raw stack traces or backend errors.
  • Segregate user-visible and operator logs; apply DLP scanning to prevent prompt or secret leakage.
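
A minimal redaction pass over outputs and logs might look like the following; the patterns are illustrative stand-ins for a real DLP engine such as the Presidio deployment mentioned in the tech-stack table:

```python
import re

# Illustrative patterns only; production redaction should use a DLP engine.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{16,}"),       # API-key-like tokens
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # US SSN shape
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),   # email addresses
]

def redact(text: str) -> str:
    """Mask secrets and PII before text reaches logs, error messages,
    or user-visible output."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```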

Misinformation (T01-MIS)

Mapped CWEs: CWE-345, CWE-1204

Rationale: Ungrounded, fabricated, or biased statements can appear credible when presented in the UI, eroding user trust or propagating false information.

Recommendations:

  • Require grounding and citation checks for high-risk or factual claims.
  • Integrate verification confidence scores and “needs review” flags for uncertain responses.
  • Route flagged outputs to human review or moderation pipelines.

Insecure Output Handling (T01-IOH)

Mapped CWEs: CWE-116, CWE-79, CWE-75

Rationale: Unsanitized model outputs rendered in rich text, HTML, or Markdown can lead to script execution, injection, or UI manipulation in downstream clients.

Recommendations:

  • Render outputs from structured formats (e.g., JSON, plain text) with context-aware encoding.
  • Sanitize HTML/Markdown through allowlisted elements and attributes.
  • Disable unsafe embeds, links, and inline scripts in all rendering environments.

(4) Application

Summary: The orchestration brain that manages sessions, APIs, and business logic. Weak validation, error handling, or access controls at this layer can cascade into systemic compromise across the entire application stack.

Prompt Injection (T01-DPIJ, T01-IPIJ)

Mapped CWEs: CWE-20, CWE-74, CWE-94

Rationale: Unvalidated or unescaped input injected into model orchestration logic or prompt templates can override instructions, bypass business rules, or trigger unintended system actions.

Recommendations:

  • Perform strict schema validation and canonicalization on all inputs.
  • Separate roles for user-authored, developer, and system instructions.
  • Introduce a safe interpreter or mediation layer between user input and model orchestration.
  • Conduct adversarial prompt-injection testing as part of QA.
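
The strict schema validation recommended above can be sketched with a fail-closed validator; the field names, modes, and limits below are hypothetical (in a FastAPI/Pydantic stack the same checks would live in a Pydantic model):

```python
from dataclasses import dataclass

MAX_QUERY_LEN = 2_000                       # illustrative limit
ALLOWED_MODES = {"chat", "search", "summarize"}

@dataclass(frozen=True)
class ChatRequest:
    user_id: str
    mode: str
    query: str

def validate_request(payload: dict) -> ChatRequest:
    """Fail-closed validation of orchestration inputs: unknown keys,
    wrong types, unlisted modes, and oversized queries are all rejected."""
    if set(payload) != {"user_id", "mode", "query"}:
        raise ValueError("unexpected or missing fields")
    if not all(isinstance(payload[k], str) for k in payload):
        raise ValueError("all fields must be strings")
    if payload["mode"] not in ALLOWED_MODES:
        raise ValueError("mode not allowlisted")
    if len(payload["query"]) > MAX_QUERY_LEN:
        raise ValueError("query too large")
    return ChatRequest(**payload)
```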

Sensitive Information Disclosure (T01-SID, T01-SPL)

Mapped CWEs: CWE-200, CWE-209, CWE-359, CWE-522

Rationale: Secrets, credentials, or internal configuration details may leak through logs, prompt contexts, or plugin responses, exposing sensitive data or business logic.

Recommendations:

  • Redact secrets and PII from logs, prompts, and API responses.
  • Enforce RBAC and scoped access to sensitive configuration data.
  • Implement safe, user-friendly error handling that hides stack traces and internal state.
  • Apply DLP scanning on logs and telemetry.

Denial of Service Model (T01-DoSM)

Mapped CWEs: CWE-400, CWE-770, CWE-787

Rationale: Excessive or malformed requests to the orchestration or inference service can saturate compute, memory, or token resources, leading to service unavailability.

Recommendations:

  • Apply rate-limiting and circuit breakers at API gateways and orchestration tiers.
  • Enforce input size, token, and format validation.
  • Implement workload isolation and quotas per tenant, API, or model instance.
  • Monitor runtime metrics to detect anomalous consumption patterns.

Model Toxicity / Misinformation (T01-MTU, T01-MIS)

Mapped CWEs: CWE-345, CWE-1204

Rationale: Models embedded in the application can generate harmful, biased, or false content when the orchestration lacks grounding, confidence thresholds, or moderation layers.

Recommendations:

  • Implement grounding and factual consistency checks using trusted data sources.
  • Integrate toxicity and bias filters in the inference pipeline.
  • Flag low-confidence or high-risk outputs for review before dissemination.
  • Apply continuous evaluation of model reliability and fairness metrics.

Insecure Output Handling (T01-IOH)

Mapped CWEs: CWE-79, CWE-116, CWE-75

Rationale: Improperly sanitized or encoded model outputs (HTML, Markdown, or JSON) rendered in dashboards or downstream clients can lead to injection, cross-site scripting, or data corruption.

Recommendations:

  • Apply contextual encoding and sanitization before rendering.
  • Strip or escape unsafe HTML/Markdown tags and attributes.
  • Use safe templating libraries or rendering frameworks.
  • Enforce output validation and content-type boundaries between services.

Excessive Agency (T01-EA)

Mapped CWEs: CWE-284, CWE-285

Rationale: Autonomous agents or model-driven APIs may perform privileged actions—such as initiating transactions or modifying files—without appropriate oversight or authorization.

Recommendations:

  • Enforce least-privilege access for model plugins, agents, and integrations.
  • Maintain allowlists for sensitive operations and external service calls.
  • Require secondary approvals or human-in-the-loop validation for high-impact actions.
  • Log and audit all agent-initiated operations for accountability.

(5) Agent / Plugin

Summary: The extended arms of the system; vulnerable to indirect prompt injection, weak secrets handling, tampering, excessive actions, and unsafe workflows.

Indirect Prompt Injection (T01-IPIJ)

Mapped CWEs: CWE-20, CWE-74, CWE-94

Rationale: Plugins or connected tools may receive crafted or hidden instructions embedded within user or system prompts that manipulate downstream components, alter intended behavior, or trigger unsafe code execution.

Recommendations: Enforce strict input/output schemas; escape or sanitize all parameters; prohibit dynamic code evaluation or direct command execution from model-generated content.

Sensitive Information Disclosure (T01-SID)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: Model, plugin, or connected service exposes confidential data such as credentials, tokens, or personal information through logs, prompts, or API responses due to insufficient data protection or contextual awareness.

Recommendations: Use scoped, short-lived credentials; redact sensitive fields in tool and model outputs; apply data minimization and need-to-know access controls.

Model Tampering / Disclosure (T01-MTD)

Mapped CWEs: CWE-276, CWE-285, CWE-494

Rationale: Model artifacts, weights, or configurations can be modified, replaced, or exfiltrated due to weak file permissions, missing integrity checks, or insecure deployment pipelines—allowing attackers to alter model behavior or leak intellectual property.

Recommendations: Enforce hardened file and storage permissions; validate model integrity via signed manifests; require digital signing and verification of all model artifacts before deployment.
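
The integrity-verification step can be sketched as a digest comparison against a signed manifest; this assumes the manifest's digests have already been signature-verified out of band:

```python
import hashlib
import hmac

def sha256_of(data: bytes) -> str:
    """Hex digest of a model artifact's bytes."""
    return hashlib.sha256(data).hexdigest()

def verify_artifact(artifact_bytes: bytes, manifest_digest: str) -> bool:
    """Compare an artifact against the digest recorded in a (separately
    signature-verified) manifest; deploy only on an exact match."""
    return hmac.compare_digest(sha256_of(artifact_bytes), manifest_digest)
```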

Excessive Agency (T01-EA)

Mapped CWEs: CWE-284, CWE-285

Rationale: Model or autonomous agent executes actions beyond its intended authority—such as invoking privileged APIs, modifying external systems, or performing unapproved transactions—due to insufficient access controls or unrestricted delegation.

Recommendations: Enforce per-action least privilege; implement policy gates for sensitive operations; require human-in-the-loop approval for high-risk or irreversible actions.

Vulnerable External Workflow (T01-VEW)

Mapped CWEs: CWE-829, CWE-918, CWE-502

Rationale: Model-integrated tools or external workflow components can be exploited through untrusted dependencies, SSRF vectors, or unsafe deserialization—allowing attackers to pivot into internal networks, exfiltrate data, or execute arbitrary code.

Recommendations: Maintain strict tool allowlists and egress proxy controls; enforce validation of content types and schema for external responses.
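
A basic egress guard for tool calls might look like this sketch; the allowlisted hosts are hypothetical, and rejecting raw IP literals is a simple defense against SSRF pivots to link-local or loopback addresses (CWE-918):

```python
import ipaddress
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.example-tool.com", "docs.example-tool.com"}  # hypothetical

def egress_allowed(url: str) -> bool:
    """Permit outbound tool calls only to allowlisted hosts over HTTPS,
    and never to raw IP literals (incl. 169.254.x.x, 127.x.x.x)."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    try:
        ipaddress.ip_address(parsed.hostname)
        return False  # raw IP literal: reject
    except ValueError:
        pass          # hostname, not an IP
    return parsed.hostname in ALLOWED_HOSTS
```

Note that hostname checks alone do not stop DNS-rebinding; a production deployment would also resolve and re-check the destination at the egress proxy.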


(6) External Sources

Summary: Bridges to the outside world; unverified data can introduce poisoned content, trigger unsafe actions, or spread misinformation.

Indirect Prompt Injection (T01-IPIJ)

Mapped CWEs: CWE-20, CWE-74, CWE-94

Rationale: Plugins or retrieval components may process crafted or malicious content from external sources (web pages, documents, APIs) that inject hidden instructions or alter model behavior through prompt manipulation.

Recommendations: Sanitize and normalize all retrieved external content; restrict accepted content types and formats; segregate and label retrieved data to prevent cross-context prompt injection.

Model Tampering/Disclosure (T01-MTD)

Mapped CWEs: CWE-276, CWE-285, CWE-494

Rationale: Model files, weights, or configurations can be modified or leaked through weak storage permissions, unverified updates, or insecure pipelines—allowing attackers to alter outputs, inject backdoors, or exfiltrate proprietary data.

Recommendations: Implement integrity and signature verification for all model artifacts; enforce least-privilege access and explicit change approvals; apply hardened storage permissions across training and deployment environments.

Sensitive Information Disclosure (T01-SID)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: AI models or connected tools may expose confidential data (e.g., tokens, credentials, personal identifiers) through logs, responses, or stored context due to insufficient redaction or access controls.

Recommendations: Mask sensitive fields in logs and outputs; use scoped OAuth credentials with minimal privileges; enforce data-loss-prevention (DLP) policies for prompt and response data flows.

Excessive Agency (T01-EA)

Mapped CWEs: CWE-284, CWE-285

Rationale: Model or agent autonomously performs privileged or unintended actions—such as calling sensitive APIs, modifying resources, or invoking external tools—without appropriate authorization or contextual policy validation.

Recommendations: Enforce RBAC and allowlists for data sources and actions; perform policy and safety checks before executing model-initiated operations; use sandboxed or isolated connectors to restrict external access.

Vulnerable External Workflow (T01-VEW)

Mapped CWEs: CWE-829, CWE-918, CWE-502

Rationale: Integrations or tools that interact with external workflows can be compromised via untrusted dependencies, SSRF, or unsafe deserialization, leading to unauthorized network access or remote code execution.

Recommendations: Enforce egress proxy and strict allowlists for outbound connections; validate and enforce safe content types; verify software supply chain integrity through signed releases and SBOM verification.

Data / Model Poisoning (T01-DMP)

Mapped CWEs: CWE-20, CWE-494, CWE-353

Rationale: Attackers inject malicious data or manipulate model artifacts during training, fine-tuning, or update pipelines, causing biased outputs, backdoors, or performance degradation.

Recommendations: Establish data provenance and reputation scoring mechanisms; perform adversarial sample and anomaly testing; apply cryptographic integrity checks on datasets and model artifacts throughout the pipeline.
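
The cryptographic integrity check over datasets can be sketched as an order-independent fingerprint recorded at ingestion and re-checked before each training run; the record shape is an illustrative assumption:

```python
import hashlib
import json

def dataset_fingerprint(records: list[dict]) -> str:
    """Order-independent fingerprint of a training dataset. Record it at
    ingestion and re-check before fine-tuning so silent row injection or
    edits (data poisoning) are detected."""
    digests = sorted(
        hashlib.sha256(json.dumps(r, sort_keys=True).encode()).hexdigest()
        for r in records
    )
    return hashlib.sha256("".join(digests).encode()).hexdigest()
```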


(7) Input Handling

Summary: The filter layer; weak parsing/schema enforcement lets adversarial inputs/injections slip through.

Prompt Injection (T01-DPIJ)

Mapped CWEs: CWE-20, CWE-74, CWE-94

Rationale: Malicious user or system input manipulates model prompts to override instructions, inject new goals, or trigger unintended actions in downstream tools or connected systems.

Recommendations: Enforce strict input schemas and strong typing; strip unsafe control sequences and escape characters; sandbox and isolate user inputs before prompt assembly.

Adversarial Input Evasion (T01-AIE)

Mapped CWEs: CWE-20, CWE-1384

Rationale: Attackers craft adversarial inputs (e.g., perturbed tokens, unicode tricks, or obfuscated payloads) to evade model detection or classification boundaries, resulting in mispredictions or bypassing safety filters.

Recommendations: Normalize and sanitize Unicode and encoding variations; conduct adversarial robustness testing; apply layered input validation and confidence thresholding.
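
Unicode normalization against homoglyph and zero-width tricks can be sketched with the standard library; NFKC folds fullwidth and compatibility forms to their canonical equivalents:

```python
import unicodedata

ZERO_WIDTH = "\u200b\u200c\u200d\ufeff"  # chars used to split trigger words

def normalize_input(text: str) -> str:
    """Fold encoding tricks before classification or filtering: NFKC maps
    fullwidth/compatibility forms to canonical characters, and zero-width
    characters are stripped."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)
```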

Sensitive Information Disclosure (T01-SID, T01-LSID, T01-SPL)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: Sensitive data (e.g., API keys, secrets, PII, or training data) is exposed during ingestion, inference, or logging due to unredacted inputs, verbose errors, or unsafe context retention.

Recommendations: Apply ingestion-time redaction for sensitive terms; mask or tokenize secrets in logs; sanitize logs, error traces, and tool responses to prevent data leakage.

Denial of Service Model (T01-DoSM)

Mapped CWEs: CWE-400, CWE-770, CWE-787

Rationale: Oversized or malformed inputs and unbounded request rates can exhaust GPU, memory, or CPU resources in model inference services, leading to degraded performance or service outages.

Recommendations: Enforce input size and rate quotas; validate buffer dimensions and tensor structures before inference execution.

Vulnerable External Workflow (T01-VEW)

Mapped CWEs: CWE-829, CWE-918, CWE-502

Rationale: External toolchains, webhooks, or retrieval flows can be exploited through untrusted dependencies, SSRF, or unsafe deserialization to access internal networks or execute arbitrary code.

Recommendations: Use domain-based allowlists with outbound proxy enforcement; validate and enforce safe content types for all retrieved or external resources.


(8) Output Handling

Summary: Safety gate before delivery; failure here leaks sensitive data, misinformation, and unsafe content.

Log/Storage Information Disclosure (T01-LSID)

Mapped CWEs: CWE-200, CWE-532, CWE-522

Rationale: Logs or persistent storage may capture raw model outputs, user prompts, or tokens that contain sensitive information. Without redaction, encryption, or access controls, these records can expose secrets, PII, or proprietary context.

Recommendations: Strip sensitive context from stored logs and outputs; enforce RBAC and least privilege for log access; use sanitized and generic error messages.

Sensitive Information Disclosure (T01-SID, T01-SPL, T01-TDL)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: Output layers may inadvertently reveal secrets, PII, or confidential training data through generated responses, summaries, or recalled examples.

Recommendations: Apply post-output DLP scanning; encrypt or mask sensitive fields before returning to clients; prevent recall or verbatim exposure of sensitive training data rows.

Denial of Service Model (T01-DoSM)

Mapped CWEs: CWE-400, CWE-770, CWE-787

Rationale: Excessively large or malformed outputs (e.g., runaway text generation, long JSON sequences) can overflow downstream buffers or consume significant rendering resources, impacting availability.

Recommendations: Cap output size and token limits; quarantine or truncate oversized responses; validate downstream buffer and rendering capacities.
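
The cap-and-truncate control can be sketched in a few lines; the character ceiling and truncation marker are illustrative choices.

```python
MAX_OUTPUT_CHARS = 20000  # illustrative ceiling

def truncate_output(text: str) -> tuple[str, bool]:
    """Cap response size before it reaches downstream buffers or renderers.
    Returns the (possibly truncated) text and a flag for telemetry."""
    if len(text) <= MAX_OUTPUT_CHARS:
        return text, False
    return text[:MAX_OUTPUT_CHARS] + "\n[output truncated]", True
```

The returned flag lets monitoring distinguish normal responses from runaway generations.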

Insecure Output Handling (T01-IOH)

Mapped CWEs: CWE-79, CWE-116, CWE-75

Rationale: Untrusted model outputs rendered as HTML, Markdown, or code without proper encoding can lead to injection attacks or content manipulation in client or downstream systems.

Recommendations: Use contextual output encoding and allowlisted sanitization routines; disable rich rendering for untrusted text or code blocks; enforce strict content-type boundaries.
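
Contextual encoding for the HTML case can be sketched with the standard library; the rich-rendering toggle defaulting to off reflects the "disable rich rendering for untrusted text" guidance. Function name and structure are illustrative.

```python
import html

def render_model_output(text: str, allow_markdown: bool = False) -> str:
    """Escape untrusted model output before embedding it in an HTML page.
    Rich rendering stays off unless explicitly enabled for trusted flows."""
    escaped = html.escape(text, quote=True)
    if not allow_markdown:
        return f"<pre>{escaped}</pre>"
    # Even when markdown is permitted, script-capable tags were escaped above
    return escaped
```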

Training Data Leakage (T01-TDL)

Mapped CWEs: CWE-201, CWE-359

Rationale: Models may emit verbatim snippets or memorized content from their training data, including personally identifiable or proprietary information.

Recommendations: Employ differential privacy during training; use verbatim and entropy-based leakage filters; redact prompt and output logs; restrict access to model telemetry or trace data.
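
A verbatim-leakage filter can be approximated with long n-gram overlap between outputs and protected training rows, as in this sketch. The 8-token window is an assumption; at scale, suffix arrays or Bloom filters replace the in-memory set.

```python
def ngrams(text: str, n: int = 8) -> set[str]:
    """All whitespace-tokenized n-grams of the text."""
    tokens = text.split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def leaks_training_text(output: str, corpus_ngrams: set[str], n: int = 8) -> bool:
    """Flag outputs sharing a long verbatim n-gram with protected training data."""
    return bool(ngrams(output, n) & corpus_ngrams)
```

Flagged outputs can be blocked, redacted, or routed for review depending on policy.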

Model Toxicity / Misinformation (T01-MTU, T01-MIS)

Mapped CWEs: CWE-345, CWE-1204

Rationale: Generated outputs may include harmful, biased, or false information due to unfiltered model behavior or insufficient grounding in verified sources.

Recommendations: Integrate toxicity and bias filters; require grounding and citations to trusted datasets; implement fallback responses when confidence is low or bias is detected.

Excessive Agency (T01-EA)

Mapped CWEs: CWE-284, CWE-285

Rationale: The model or its connected tools execute actions automatically (e.g., API calls, file writes, system changes) without explicit authorization or confirmation.

Recommendations: Restrict actions to allowlisted commands; apply authorization and policy checks before execution; require explicit human confirmation for high-impact operations.
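
A default-deny policy gate between model output and tool execution can be sketched as below; the action names and the confirmation flag are illustrative placeholders for an organization's own policy engine.

```python
# Illustrative policy: low-risk actions run freely, high-impact ones need a human
ALLOWED_ACTIONS = {"search_docs", "summarize"}
HIGH_IMPACT_ACTIONS = {"delete_file", "send_email"}

def authorize_action(action: str, human_confirmed: bool = False) -> bool:
    """Allow only allowlisted tool calls; high-impact ones additionally
    require explicit human confirmation."""
    if action in ALLOWED_ACTIONS:
        return True
    if action in HIGH_IMPACT_ACTIONS:
        return human_confirmed
    return False  # default-deny anything unrecognized
```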


(9) Model

Summary: The core intelligence; targeted by injection, poisoning, theft, inversion, DoS, and unsafe outputs.

Prompt Injection (T01-DPIJ, T01-IPIJ)

Mapped CWEs: CWE-20, CWE-74, CWE-94

Rationale: Crafted user or system inputs can override, manipulate, or insert instructions within model prompts—altering the models intended reasoning path or causing execution of untrusted actions.

Recommendations: Separate system, developer, and user prompts into isolated contexts; apply tokenizer-stage filtering and normalization; conduct adversarial training to harden against prompt manipulation.
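
Context separation is easiest to enforce structurally: keep system policy and user text in distinct message roles so the model API boundary, not string concatenation, carries the distinction. A minimal sketch using the common chat-message convention:

```python
def build_messages(system_policy: str, user_text: str) -> list[dict]:
    """Assemble a role-separated message list; user text is never spliced
    into the system message."""
    return [
        {"role": "system", "content": system_policy},
        {"role": "user", "content": user_text},
    ]
```

Tokenizer-stage filtering and normalization would be applied to `user_text` before this assembly step.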

Supply Chain / Data & Fine-tuning Poisoning (T01-SCMP, T01-DPFT, T01-RMP, T01-DMP)

Mapped CWEs: CWE-494, CWE-353, CWE-829

Rationale: Model training or fine-tuning data, dependencies, or weights can be poisoned or replaced through compromised datasets, malicious model checkpoints, or tampered packages in the supply chain.

Recommendations: Use digitally signed model weights and datasets; apply provenance and reputation scoring; sanitize fine-tuning data for adversarial patterns; maintain SBOMs for all model components.
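
Integrity verification at load time can be sketched with a digest check against a signed manifest; manifest distribution and signature validation are assumed to happen out of band.

```python
import hashlib

def verify_artifact(data: bytes, expected_sha256: str) -> None:
    """Refuse to load a model artifact whose SHA-256 digest does not match
    the value recorded in a signed manifest."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise RuntimeError(f"artifact digest mismatch: got {actual}")
```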

Adversarial Input Evasion (T01-AIE)

Mapped CWEs: CWE-20, CWE-1384

Rationale: Adversarially perturbed inputs exploit model weaknesses to evade detection or cause misclassification, often through subtle token-level or embedding-space manipulation.

Recommendations: Normalize inputs prior to tokenization; perform robustness and adversarial testing across datasets; monitor embedding distributions for drift or anomalies.

Sensitive Information Disclosure / Training Data Leakage (T01-SID, T01-TDL, T01-LSID, T01-SPL)

Mapped CWEs: CWE-200, CWE-359, CWE-201

Rationale: Model parameters or outputs may expose memorized training data, sensitive context, or private attributes through unfiltered responses or model inversion attempts.

Recommendations: Apply differential privacy during training (e.g., DP-SGD); block verbatim sequence recall; redact sensitive tokens in system prompts; restrict or sanitize inference-time logging.

Model Inversion / Membership Inference (T01-MIMI)

Mapped CWEs: CWE-203, CWE-359

Rationale: Attackers query the model to infer whether specific data records were used during training, or reconstruct sensitive training examples via inversion techniques.

Recommendations: Use DP-SGD or other noise-based privacy mechanisms; enforce rate limits and output randomization; conduct dedicated membership-inference red teaming to validate resilience.

Denial of Service Model (T01-DoSM)

Mapped CWEs: CWE-400, CWE-770, CWE-787

Rationale: Excessive model context lengths, complex prompt chains, or malformed inference payloads can overload GPU/CPU resources, leading to degraded performance or outages.

Recommendations: Cap model context and token limits; detect abnormal inference patterns or anomalies; harden serving buffers and apply per-request resource quotas.

Insecure Output Handling / Unsafe Integrations (T01-IOH, T01-VEW)

Mapped CWEs: CWE-79, CWE-116, CWE-829

Rationale: Model outputs may contain untrusted data or unsafe formatting passed to external systems, or integrations may process outputs without sanitization—leading to injection or workflow compromise.

Recommendations: Sanitize and encode all model outputs; restrict integrations to allowlisted tools and trusted domains; enforce policy and validation layers between model and tool execution.

Model Theft / Exfiltration (T01-MTR, T01-MTD)

Mapped CWEs: CWE-276, CWE-285, CWE-494

Rationale: Unauthorized access or exfiltration of model artifacts, weights, or parameters can lead to IP theft, cloning, or malicious redistribution of compromised versions.

Recommendations: Apply strict access controls to model repositories and serving endpoints; encrypt weights and checkpoints at rest; monitor for unauthorized exfiltration or replication.

Model Toxicity / Misinformation / Excessive Agency (T01-MTU, T01-MIS, T01-EA)

Mapped CWEs: CWE-345, CWE-1204, CWE-284

Rationale: Models may generate biased, harmful, or false information—or take autonomous actions based on toxic or deceptive outputs—causing reputational, ethical, or operational harm.

Recommendations: Integrate toxicity and bias post-filters; ground model outputs in verified sources; restrict actionable outputs via policy enforcement; require approvals for high-risk autonomous actions.


(10) Model Storage Infrastructure

Summary: Crown jewels at rest — must be encrypted, signed, and access-controlled.

Data/Prompt Fine-Tuning Poisoning (T01-DPFT)

Mapped CWEs: CWE-494, CWE-353

Rationale: Attackers may modify or replace stored training or fine-tuning datasets, prompt templates, or embeddings in model storage repositories—resulting in malicious model behavior or backdoored outputs.

Recommendations: Apply cryptographic signing and checksums to all stored artifacts; maintain read-only and versioned storage for model and dataset files; require cryptographic attestation for model load operations.

Supply Chain Model Poisoning (T01-SCMP)

Mapped CWEs: CWE-829, CWE-494

Rationale: Model dependencies, pre-trained weights, or third-party registries can be compromised, introducing malicious code or poisoned weights into the build and deployment pipelines.

Recommendations: Source models and dependencies only from trusted registries; verify lineage and digital signatures; pin dependency versions and verify integrity before loading or deployment.

Model Theft / Exfiltration (T01-MTR)

Mapped CWEs: CWE-276, CWE-284

Rationale: Unauthorized access or large-scale export of model artifacts, checkpoints, or container images can lead to theft of proprietary IP or replication of protected models.

Recommendations: Encrypt stored models and weights using KMS-managed keys; enforce least-privilege access for repositories and buckets; monitor for bulk download or anomalous access; harden default permissions and configurations.

Model Tampering / Disclosure (T01-MTD)

Mapped CWEs: CWE-276, CWE-285, CWE-494

Rationale: Stored models or weight files can be altered, replaced, or disclosed if access controls, integrity checks, or permissions are weak—allowing attackers to inject malicious behavior or leak proprietary data.

Recommendations: Use WORM (Write Once, Read Many) or immutable storage for production models; perform integrity verification on model load; restrict access to service accounts with strict RBAC and scoped tokens.


(11) Model Serving Infrastructure

Summary: Execution gateway; must resist poisoning, theft, DoS, and unsafe outputs.

Supply Chain Model Poisoning (T01-SCMP)

Mapped CWEs: CWE-494, CWE-353, CWE-829

Rationale: Model serving containers, preloaded weights, or dependencies may be replaced or tampered with during build or deployment, introducing malicious payloads or backdoored models into production pipelines.

Recommendations: Use signed and verified container images; validate checksums and digests for all model files; enforce SBOM-based provenance and signature verification; block deployment from untrusted or public registries.

Model Toxicity / Unreliable Outputs (T01-MTU)

Mapped CWEs: CWE-345, CWE-1204, CWE-75

Rationale: Deployed models may generate harmful, biased, or misleading content due to unmoderated outputs, missing grounding, or unreliable post-processing mechanisms.

Recommendations: Integrate moderation and toxicity filters into inference pipelines; perform grounding checks against trusted data sources; implement fallback or neutral responses when confidence is low or results are potentially unsafe.

Model Theft / Exfiltration (T01-MTR)

Mapped CWEs: CWE-276, CWE-284, CWE-285

Rationale: Insecure endpoints, weak authentication, or misconfigured storage permissions may allow adversaries to exfiltrate model weights, clone serving containers, or reconstruct models through inference scraping.

Recommendations: Enforce API rate limits and anomaly detection on inference endpoints; require mutual TLS (mTLS) and RBAC-based authorization; encrypt model weights at rest; harden file system permissions and disable anonymous or default service accounts.

Denial of Service Model (T01-DoSM)

Mapped CWEs: CWE-400, CWE-770, CWE-787

Rationale: Oversized, malformed, or high-rate inference requests can exhaust serving resources such as memory, CPU, or GPU queues—causing degraded availability or total service outages.

Recommendations: Cap input request sizes and token lengths; configure quotas and throttling at the API gateway; use circuit breakers and autoscaling for load protection; validate input buffers and parsers to prevent overflow or runaway generation.


(12) Evaluation

Summary: Where model quality and trustworthiness are validated; weak evaluation enables unsafe, biased, or manipulated outputs to pass undetected.

Adversarial Input Evasion (T01-AIE)

Mapped CWEs: CWE-20, CWE-116, CWE-1389

Rationale: Evaluation datasets and inputs can be crafted to evade detection or distort performance metrics, leading to false confidence in model robustness.

Recommendations: Normalize and validate evaluation inputs; perform adversarial testing under varied perturbations; apply outlier and embedding-space drift detection.

Data/Model Poisoning (T01-DMP)

Mapped CWEs: CWE-345, CWE-353, CWE-494

Rationale: Compromised datasets or poisoned models used during evaluation can skew metrics and conceal malicious alterations.

Recommendations: Validate datasets with cryptographic checksums and signatures; maintain golden reference baselines; verify model lineage before evaluation.

Log/Storage Information Disclosure (T01-LSID)

Mapped CWEs: CWE-117, CWE-200, CWE-532

Rationale: Logging of sensitive evaluation outputs, prompts, or internal metrics can expose confidential data or model behavior to unauthorized users.

Recommendations: Sanitize and minimize logged output; redact sensitive context or metadata; restrict access to evaluation logs and reports.

Sensitive Information Disclosure (T01-SID)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: Evaluation pipelines may process datasets containing private or regulated information that could leak via reports, dashboards, or telemetry.

Recommendations: Apply data masking and DLP filters in evaluation output; enforce least-privilege access; encrypt all evaluation artifacts and summaries at rest.

Training Data Leakage (T01-TDL)

Mapped CWEs: CWE-201, CWE-359

Rationale: Evaluation datasets overlapping with training data can cause inflated scores and unintentional exposure of memorized content.

Recommendations: De-duplicate evaluation data against training sets; implement entropy and verbatim leakage filters; isolate training and evaluation environments.
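
Hash-based de-duplication against the training set can be sketched as follows; normalizing before hashing ensures trivial whitespace or case differences still count as duplicates. Function names are illustrative.

```python
import hashlib

def fingerprint(example: str) -> str:
    """Normalize then hash an example so near-identical copies collide."""
    canonical = " ".join(example.lower().split())
    return hashlib.sha256(canonical.encode()).hexdigest()

def dedupe_eval(eval_set: list[str], train_set: list[str]) -> list[str]:
    """Drop evaluation examples that also appear in training data."""
    train_hashes = {fingerprint(x) for x in train_set}
    return [x for x in eval_set if fingerprint(x) not in train_hashes]
```

Exact-match hashing misses paraphrases; near-duplicate detection (e.g. MinHash) is the usual next step.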

Denial of Service Model (T01-DoSM)

Mapped CWEs: CWE-400, CWE-770, CWE-787

Rationale: Large or malformed evaluation inputs can overload inference services, exhausting compute resources or crashing evaluation pipelines.

Recommendations: Limit input and output sizes; apply quotas and circuit breakers on evaluation workloads; validate and sanitize input buffers.

Model Toxicity / Misinformation (T01-MTU, T01-MIS)

Mapped CWEs: CWE-345, CWE-1204

Rationale: Without toxicity, bias, or factual consistency tests, evaluation may miss unsafe, unreliable, or ungrounded model behaviors.

Recommendations: Include toxicity and bias detection in evaluation metrics; perform grounding verification against trusted sources; use human validation for high-impact outputs.

Insecure Output Handling (T01-IOH)

Mapped CWEs: CWE-79, CWE-74, CWE-75, CWE-693

Rationale: Unsafe rendering or display of model outputs in dashboards or visualization tools can lead to injection, cross-site scripting, or data corruption.

Recommendations: Apply contextual encoding for rendered outputs; sanitize HTML/Markdown before display; restrict rich content in evaluation interfaces.

Unsafe Evaluation Practices (T01-UEP)

Mapped CWEs: CWE-352, CWE-825

Rationale: Lack of test isolation or dependency validation in evaluation frameworks can lead to contaminated results or untrusted code execution.

Recommendations: Isolate evaluation from training environments; enforce CSRF protection in evaluation tools; validate external dependencies and ensure reproducible runs.


(13) Training & Tuning

Summary: Where knowledge is forged; poor data embeds lasting bias and backdoors.

Adversarial Input Evasion (T01-AIE)

Mapped CWEs: CWE-20, CWE-116

Rationale: Adversarial or malformed training inputs (e.g., mislabeled, perturbed, or poisoned samples) can distort model learning and weaken resilience against evasion or misclassification attacks.

Recommendations: Enforce strict data schemas and canonical normalization during ingestion; perform adversarial resilience testing on training data; deploy anomaly detection to flag abnormal patterns in preprocessing pipelines.

Misinformation (T01-MIS)

Mapped CWEs: CWE-345, CWE-200

Rationale: Training datasets or feedback loops can contain inaccurate, biased, or manipulated content that skews model reasoning and propagates false or unsafe knowledge into production models.

Recommendations: Validate datasets against trusted reference sources; integrate human oversight for labeling and feedback verification; implement training-time grounding and periodic data quality audits.

Data/Prompt Fine-Tuning Poisoning (T01-DPFT)

Mapped CWEs: CWE-353, CWE-494

Rationale: Attackers can inject poisoned examples or tampered prompt templates during fine-tuning or reinforcement learning phases, embedding persistent backdoors or bias.

Recommendations: Require cryptographically signed and versioned datasets; preserve immutable baselines for training runs; conduct adversarial and data integrity testing before deploying tuned models.

Supply Chain Model Poisoning (T01-SCMP)

Mapped CWEs: CWE-494, CWE-284, CWE-285

Rationale: Compromised third-party packages, pre-trained weights, or data pipelines may introduce malicious code or tainted components into the model training environment.

Recommendations: Use trusted registries for dependencies and pre-trained models; enforce signature verification and provenance checks; apply hardened configuration defaults and scoped access for all training assets.

Model Tampering / Disclosure (T01-MTD)

Mapped CWEs: CWE-276, CWE-285, CWE-359

Rationale: Insecure permissions or lack of encryption on model checkpoints and logs can allow unauthorized modification or exposure of sensitive model parameters and training data.

Recommendations: Encrypt model checkpoints, logs, and gradient data with strong key management (KMS); apply RBAC and access scoping to all storage locations; conduct regular permission audits and integrity checks across training infrastructure.


(14) Model Frameworks & Code

Summary: ML runtime backbone; supply chain or unsafe integrations taint the system.

Supply Chain Model Poisoning (T01-SCMP)

Mapped CWEs: CWE-494, CWE-353, CWE-829

Rationale: Compromised ML frameworks, pre-compiled binaries, or third-party libraries can introduce backdoors, poisoned dependencies, or malicious behavior into runtime environments and training pipelines.

Recommendations: Pin dependency versions and require signed packages; scan for known vulnerabilities and integrity mismatches; maintain comprehensive SBOMs for all model and runtime components.

Model Tampering / Disclosure (T01-MTD)

Mapped CWEs: CWE-276, CWE-284, CWE-285

Rationale: Weak runtime permissions, insecure service accounts, or lack of binary integrity validation can allow unauthorized modification or inspection of core ML frameworks, leading to altered inference logic or model theft.

Recommendations: Harden runtimes with restricted privileges; run services under least-privilege accounts; perform regular integrity and permission audits on framework binaries and configuration files.

Vulnerable External Workflow / Unsafe Integration (T01-VEW)

Mapped CWEs: CWE-94, CWE-95, CWE-918, CWE-502

Rationale: Unsafe plugin loading, dynamic evaluation, or insecure integrations within ML frameworks can enable remote code execution, SSRF, or deserialization exploits that compromise the serving environment.

Recommendations: Disable or sandbox dynamic eval and code-generation features; restrict plugin or module loading to trusted sources; isolate untrusted or experimental code in containers; harden deserialization routines and enforce strict content-type validation.


(15) Data Storage Infrastructure

Summary: Knowledge vault; poisoning/tampering/leaks here undermine integrity & confidentiality.

Runtime/Model/Data Poisoning (T01-RMP, T01-DMP, T01-DPFT, T01-SCMP)

Mapped CWEs: CWE-353, CWE-494, CWE-345

Rationale: Malicious or manipulated data, runtime parameters, or stored model artifacts can inject backdoors or bias into downstream inference and retraining workflows, compromising model integrity.

Recommendations: Perform integrity checks and cryptographic verification on stored artifacts; apply provenance and reputation scoring; maintain append-only or versioned storage; monitor for anomalies and poisoning indicators.

Sensitive Information Disclosure (T01-SID, T01-LSID)

Mapped CWEs: CWE-200, CWE-359, CWE-522, CWE-532

Rationale: Misconfigured databases, verbose logs, or shared storage buckets may expose credentials, tokens, or PII contained in datasets, checkpoints, or system logs.

Recommendations: Encrypt all sensitive data at rest using KMS-managed keys; enforce RBAC and access segmentation; sanitize and minimize logging of secrets or identifiers; monitor data-access patterns for anomalies.

Model/Data Tampering or Exfiltration (T01-MTD)

Mapped CWEs: CWE-276, CWE-284, CWE-285, CWE-922

Rationale: Weak storage permissions, shared access tokens, or lack of immutability controls can enable attackers to alter or exfiltrate stored model or dataset assets.

Recommendations: Disable public or overly broad ACLs; use per-tenant encryption keys; enforce least-privilege storage access; apply immutable or WORM storage for critical datasets and production models.

Denial of Service Storage (T01-DoSS)

Mapped CWEs: CWE-400, CWE-770, CWE-787

Rationale: Excessive data ingestion, unbounded file uploads, or malformed objects can exhaust storage capacity or crash parsers and metadata services, disrupting model training and access.

Recommendations: Enforce quotas and rate limits for data ingestion; validate and harden file parsers and buffer handling; apply throttling and back-pressure controls for high-volume writes or uploads.


(16) Training Data

Summary: Root of trust; compromise propagates to all downstream behavior.

Model Inversion / Membership Inference (T01-MIMI)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: Attackers can query models or inspect intermediate representations to infer whether specific data records were included in training, exposing sensitive personal or proprietary information.

Recommendations: Apply differential privacy (e.g., DP-SGD) to limit per-sample influence; enforce strict RBAC and isolation for raw training data; monitor inference activity to detect inversion or membership-inference patterns.

Training Data Leakage (T01-TDL)

Mapped CWEs: CWE-200, CWE-359, CWE-353

Rationale: Sensitive or secret data can be inadvertently exposed during dataset preparation, preprocessing, or ingestion, allowing leakage through logs, pipelines, or model memory.

Recommendations: Encrypt datasets at rest and in transit; scrub credentials or tokens from preprocessing pipelines; tokenize or mask sensitive fields prior to ingestion and model training.

Sensitive Information Disclosure (T01-SID)

Mapped CWEs: CWE-276, CWE-284, CWE-285

Rationale: Inadequate access control on raw or processed training datasets enables unauthorized viewing or extraction of confidential or regulated data.

Recommendations: Enforce least-privilege access; implement row- and column-level data-access policies; continuously audit and alert on all access to sensitive data stores.

Data Authenticity (T01-DAU)

Mapped CWEs: CWE-345, CWE-494

Rationale: Lack of dataset provenance or version control allows tampered, mislabeled, or malicious data to contaminate training, degrading model reliability and security.

Recommendations: Maintain signed and version-controlled datasets; apply provenance and reputation scoring for all data sources; perform golden-set cross-validation to detect data drift or contamination.


(17) Data Filtering & Processing

Summary: Gatekeeper stage; weak validation lets poisoned/sensitive data pass.

Runtime / Data Poisoning (T01-RMP, T01-DMP, T01-DPFT)

Mapped CWEs: CWE-353, CWE-494, CWE-345

Rationale: Compromised or tampered datasets entering preprocessing pipelines can introduce malicious bias, backdoors, or instability in downstream models if integrity validation is weak.

Recommendations: Require signed and versioned datasets; verify file hashes and checksums during ingestion; apply statistical drift and anomaly detection to identify poisoned or manipulated data.

Sensitive Information Disclosure (T01-SID, T01-TDL, T01-MIMI)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: Preprocessing or feature extraction may expose raw sensitive data—such as PII, credentials, or proprietary information—through intermediate files, logs, or feature stores.

Recommendations: Implement DLP scanning during preprocessing; mask or tokenize sensitive attributes before feature extraction; apply RBAC and access segmentation for all feature store operations.

Vulnerable External Workflow (T01-VEW)

Mapped CWEs: CWE-829, CWE-918, CWE-502

Rationale: Data processing scripts, plugins, or third-party connectors may invoke untrusted resources or deserialize unsafe content, enabling SSRF, RCE, or data exfiltration through external workflows.

Recommendations: Execute transformation jobs in sandboxed environments; apply outbound egress filtering and domain allowlists; prohibit unsafe deserialization and enforce strict content-type validation.

Misinformation (T01-MIS)

Mapped CWEs: CWE-345, CWE-353

Rationale: Preprocessing stages that fail to validate data sources or cross-check content may propagate incorrect or manipulated data into model training, resulting in biased or false learning outcomes.

Recommendations: Validate dataset sources through reputation and ground-truth scoring; perform cross-dataset consistency checks; require human review for data from high-risk or low-trust domains.

Denial of Service on Pipelines (T01-DoSP)

Mapped CWEs: CWE-400, CWE-770, CWE-787

Rationale: Excessive data volume, malformed records, or unbounded streaming inputs can overwhelm preprocessing pipelines, causing latency, storage exhaustion, or crashes.

Recommendations: Enforce data size quotas and schema validation; apply ingestion rate limits and back-pressure controls; monitor for pipeline anomalies and memory spikes in ETL workloads.
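
The rate-limit and back-pressure controls above are commonly implemented with a token bucket, sketched below: each record costs one token, tokens refill at a fixed rate, and a denied request signals the caller to buffer, retry, or shed load. Rates and capacities are illustrative.

```python
import time

class TokenBucket:
    """Simple back-pressure control for ingestion pipelines."""

    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self, cost: int = 1) -> bool:
        """Spend `cost` tokens if available; refill based on elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should buffer, retry, or shed load
```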


(18) Data Sources

Summary: Entry point of truth; without provenance checks, they introduce poisoned/unsafe content.

Sensitive Information Disclosure (T01-SID)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: Ingestion processes may inadvertently capture and store sensitive data such as PII, API keys, or confidential content without encryption or access control, leading to downstream exposure.

Recommendations: Implement DLP scanning at ingestion; enforce least-privilege credentials for ingestion pipelines; encrypt sensitive datasets in transit and at rest.

Data/Model Poisoning (T01-DMP)

Mapped CWEs: CWE-345, CWE-353, CWE-494

Rationale: Attackers can insert poisoned or manipulated data into ingestion sources, corrupting the models training corpus or runtime cache, leading to bias or hidden backdoors.

Recommendations: Enforce digital signatures and hash verification for ingested datasets; apply source reputation and provenance scoring; perform golden-set cross-validation to detect inconsistencies or anomalies.

Vulnerable External Workflow (T01-VEW)

Mapped CWEs: CWE-829, CWE-918, CWE-502

Rationale: Ingestion connectors or pipelines that pull data from external systems may process untrusted or malformed content, enabling SSRF, deserialization attacks, or malicious payload execution.

Recommendations: Use egress proxies and strict domain allowlists; reject unsafe data formats or content types; isolate ingestion connectors and third-party integrations in sandboxed environments.

Misinformation (T01-MIS)

Mapped CWEs: CWE-345, CWE-353

Rationale: Unverified or low-quality data sources may introduce false, biased, or adversarial information into training or analysis pipelines, degrading model accuracy and trustworthiness.

Recommendations: Apply reliability and reputation scoring for all data sources; cross-reference new data against ground-truth sets; perform continuous drift and consistency monitoring across ingestion pipelines.


(19) External Sources

Summary: Outside the trust boundary; major vectors for poisoning, leakage, and misinformation.

Model Inversion / Membership Inference (T01-MIMI)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: Adversaries may exploit external model endpoints or shared datasets to infer private training data or reconstruct sensitive inputs through repeated probing or correlation analysis.

Recommendations: Deploy privacy-preserving APIs with data minimization; implement throttling and anomaly detection on external access; apply k-anonymity and differential privacy where feasible.

Sensitive Information Disclosure (T01-SID)

Mapped CWEs: CWE-200, CWE-359, CWE-522

Rationale: External data integrations or shared access credentials can leak secrets or confidential information through exposed endpoints or weak encryption.

Recommendations: Manage all credentials through secret managers with rotation policies; enforce TLS with mutual authentication for external data exchanges; restrict and log all token usage.

Data/Model Poisoning (T01-DMP)

Mapped CWEs: CWE-345, CWE-353, CWE-494

Rationale: External sources or third-party datasets may inject malicious data or corrupted models that contaminate pipelines, resulting in degraded accuracy or compromised behavior.

Recommendations: Require data signing and checksum verification from external providers; cross-validate new data with reference or golden sets; establish vendor trust and supply-chain integrity contracts.

Misinformation (T01-MIS)

Mapped CWEs: CWE-345, CWE-353

Rationale: External feeds and open data sources may provide low-reliability or adversarial content that misguides training or inference outputs, spreading false narratives or bias.

Recommendations: Assign reliability scores and reputation metrics to external sources; validate information against ground-truth datasets; require human review for high-impact or public-facing data feeds.