CalvinBackup/www-project-ai-testing-guide

mirror of https://github.com/OWASP/www-project-ai-testing-guide.git synced 2026-03-21 09:46:33 +00:00

Files

Marco Morana d8703cb1d0 Update 2.2_Appendix_E.md

Should be the last. correction. I hope

2025-10-15 13:57:41 -04:00

38 KiB

Raw Blame History

Appendix E: SAIF AI Threat Targeted Components & CVEs/CWEs

This appendix guides penetration testers on translating discovered CVEs and CWEs into AI-specific threats and concrete test cases mapped against the SAIF components of an AI architecture. CVEs generally point to vulnerabilities in the underlying technology stack — libraries, frameworks, APIs, and services that implement user interfaces, model layers, supporting infrastructure, or data sources. Because the pen tests described here target a live application, careful scoping is essential: testers must first identify which SAIF components and subcomponents are in scope, enumerate the exact technologies deployed for each, and use that inventory to prioritize CVE/CWE enumeration and threat simulations. In-scope items commonly include components owned or operated by the organization and directly involved in the request→response flow — for example, chat UIs, API backends (e.g., FastAPI), session/orchestration layers, model orchestration frameworks (e.g., LangChain or LlamaIndex), vector stores (Redis, Pinecone, Weaviate), ETL/data pipelines, model-serving endpoints, and internally managed connectors. Because these components can contain outdated, misconfigured, or otherwise exploitable dependencies, the first operational step is threat enumeration: map each in-scope SAIF component to its tech stack, identify relevant CVEs (and corresponding CWEs), and derive likely exploit paths. That mapping then drives focused validation with scanners, SCA tools, and proof-of-concept testing so testers can prioritize, reproduce, and demonstrate how conventional software flaws translate into AI-centric impacts.

To start, the tester performs Threat enumeration and mappping of CVE exploit paths across the in-scope technology stack. This begins with discovering known vulnerabilities using both SCA and runtime tools: software composition analyzers (Snyk, Trivy, Dependabot) reveal vulnerable dependencies and libraries, while network and host scanners (Nessus, Nuclei) validate active exposures in services and APIs. Runtime telemetry and host-level inspection add further evidence of exploitability in live environments where vulnerable components are installed and running. Identified CVEs are then translated into AI-specific risks using the AI Threats column: a web issue like a FastAPI sanitization flaw (CVE-2022-36067) becomes a direct prompt-injection vector (T01-DPIJ) when an LLM ingests tainted inputs, and an ETL or retrieval vulnerability such as CVE-2022-40127 can be leveraged to perform remote code execution or data corruption that manifests as data poisoning (T01-DMP) in a RAG pipeline. Mapping each CVE to the relevant AI threat converts a routine vulnerability finding into a concrete attack path, making it possible to explain and demonstrate the real impact on model behavior, data integrity, confidentiality, and availability.

For each SAIF component in scope, the tester inspects subcomponents to identify where injection, poisoning, or manipulation are possible, confirms the actual technologies deployed, and runs tests to discover vulnerable or unpatched libraries and CVEs. Those technical findings then drive simulations of AI-specific attacks for example prompt injection, model inversion and membership inference, data poisoning, and runtime DoS, so the tester can demonstrate real impact on the application and its model behavior. Pen test reports should use the “Threat enumeration and CVE exploit-path mappings” table to preserve traceability between vulnerabilities and AI impacts. The mapping lets a tester convert a conventional software finding into a concrete AI attack path and explain how exploitation affects data integrity, confidentiality, availability, or model trust. For example, Redis used in SAIF #4 Application Layer for session caching, API state management, and job queues was found vulnerable to CVE-2022-0543, which can lead to multiple AI-specific risks: data leakage (T01-SID), model disruption (T01-DoSM), and model manipulation (T01-MTD). In short, a single Redis compromise can escalate from infrastructure-level control to sensitive information exposure and altered AI behavior, undermining the system’s integrity and trust. Findings like this should clearly link the vulnerability to relevant CWEs, mapped AI threats, exploit paths, and reproducible validation steps so both security and AI teams can remediate effectively.

The second recommended step is to perform a Threat enumeration and CWE exploit-path mapping This step transforms vulnerability-centric testing into design-level assurance. By classifying findings under CWE categories, the pen tester bridges the gap between patch management and resilient AI architecture. CWE mapping clarifies attacker objectives, expands test coverage beyond isolated CVEs, and guides remediation that strengthens entire system layers rather than individual components. The CWE-based table reframes technical flaws as architectural weaknesses, for instance, CWE-20 (Improper Input Validation) exposes weak parsing logic, CWE-276 (Incorrect Default Permissions) reveals insecure defaults in data storage such as S3 buckets, and CWE-345 (Insufficient Verification of Data Authenticity) uncovers trust and integrity flaws in RAG ingestion. This approach helps testers not only find where AI applications break, but also understand why they break and how to redesign them to resist future exploitation.

Finally, the third step is to look at AI Threats, Targeted CWEs and Provide Recommendations to Fix Them in the Pen Testing Report. CWEs being targeted by a threat needs to be accompanied by secure design recommendations, such as enforcing schema validation, disabling default public access, verifying dataset authenticity, or encrypting sensitive data. This means pen testers can move from “here is how I broke it” to “here is how you should redesign it to prevent recurrence.” As pen testers revisit AI systems/application in scope for testing as these mighr change, they can update the CVE and CWE of newly discovered vulnerabilities and use the AI Threats column as a checklist for attack simulations in future red-team exercises. Over time, this evolving matrix becomes a living document that supports secure design, ongoing validation, and resilience in AI-enabled systems.

AI Threat enumeration and CVE exploit path mapping

In this section we provide a mapping of SAIF components to AI threats and examples of component dependent tech-stack CVEs that can be exploited

SAIF Component (Number)	Sub-Components	Tech Stack (Chatbot + RAG)	Mapped Threats	Example CVEs in Tech Stack
(2) User Input	Text, voice, multimodal parsers	React/Next.js, Slack SDK, Teams Bot, Twilio, Whisper/ASR, FastAPI/Pydantic	T01-DPIJ, T01-IPI J, T01-SID, T01-DoSM, T01-IOH, T01-MTU	React XSS (CVE-2021-24033); FastAPI vuln (CVE-2023-27533); Twilio SDK (CVE-2022-36449)
(3) User Output	Renderers, formatting, TTS/visual output	React chat widgets, Slack/Teams cards, Polly/ElevenLabs, Markdown renderers	T01-EA, T01-SPL, T01-MIS, T01-IOH	Slack API auth bypass (CVE-2020-10753); Markdown injection (CVE-2022-21681)
(4) Application	Orchestration, session mgmt, APIs, business logic	LangChain, LlamaIndex, Semantic Kernel, FastAPI/Flask, Redis sessions, GraphQL APIs	T01-DPIJ, T01-IPI J, T01-SID, T01-DoSM, T01-MTU, T01-IOH, T01-EA, T01-SPL, T01-MIS	Flask template injection (CVE-2019-8341); Redis RCE (CVE-2022-0543); GraphQL DoS (CVE-2020-15159)
(5) Agent/Plugin	Connectors, plugin registry, tool adapters	LangGraph Agents, OpenAI Functions, Zapier/n8n, custom OpenAPI tools	T01-IPI J, T01-SID, T01-MTD, T01-EA, T01-VEW	n8n RCE (CVE-2023-37925); OpenAPI tooling parser injection (CVE-2021-32640)
(6) External Sources (App)	APIs, SaaS services, enterprise connectors	Salesforce, ServiceNow, Confluence, SharePoint APIs	T01-IPI J, T01-MTD, T01-SID, T01-EA, T01-VEW, T01-DMP	Confluence RCE (CVE-2023-22515); SharePoint RCE (CVE-2023-29357)
(7) Input Handling	Validation, sanitization, PII detection, scanning	Pydantic, JSON Schema, Presidio, ClamAV	T01-DPIJ, T01-AIE, T01-SID, T01-LSID, T01-DoSM, T01-SPL, T01-VEW	ClamAV RCE (CVE-2023-20032); JSON Schema validator injection (GitHub advisories)
(8) Output Handling	Filters, moderation, redaction, grounding checks	Guardrails.ai, OpenAI Moderation, NeMo Guardrails, RAGAS	T01-LSID, T01-SID, T01-DoSM, T01-SPL, T01-IOH, T01-TDL, T01-MTU, T01-EA, T01-MIS	NeMo Guardrails Python deps RCE (via PyTorch CVEs)
(9) Model	LLM weights, embeddings, rerankers	GPT-4o, Claude, Llama-3, Mistral, Cohere reranker, BGE embeddings	T01-DPIJ, T01-IPI J, T01-SCMP, T01-AIE, T01-DPFT, T01-RMP, T01-DMP, T01-SID, T01-MIMI, T01-TDL, T01-DoSM, T01-LSID, T01-SPL, T01-VEW, T01-MTU, T01-IOH, T01-MTR, T01-EA, T01-MIS	PyTorch vuln (CVE-2022-45907); TensorFlow overflow (CVE-2021-37678); Hugging Face sandbox escape (CVE-2023-6730)
(10) Model Storage Infrastructure	Registry, encrypted artifacts	MLflow, S3/GCS, Azure Blob, Vertex AI Registry	T01-DPFT, T01-SCMP, T01-MTR, T01-MTD	MLflow path traversal (CVE-2023-6836); AWS S3 bucket takeover misconfigs (CWE-based)
(11) Model Serving Infrastructure	GPU runtimes, inference servers, autoscaling	vLLM, NVIDIA Triton, TensorRT-LLM, Kubernetes GPU nodes	T01-SCMP, T01-MTU, T01-MTR, T01-DoSM	NVIDIA Triton RCE (CVE-2023-31036); Kubernetes privilege escalation (CVE-2023-3676); NVIDIA GPU DoS (CVE-2024-0146)
(12) Evaluation	Golden sets, drift/bias eval, safety harness	RAGAS, DeepEval, W&B, Evidently AI, Great Expectations	T01-AIE, T01-DMP, T01-LSID, T01-SID, T01-TDL, T01-DoSM, T01-MTU, T01-IOH, T01-MIS	Weights & Biases CLI vuln (GitHub advisories); Great Expectations YAML injection (potential CWE-74)
(13) Training & Tuning	Pipelines, fine-tuning, HPO	Kubeflow, SageMaker, Hugging Face PEFT, Optuna	T01-AIE, T01-MIS, T01-DPFT, T01-SCMP, T01-MTD	Kubeflow dashboard RCE (CVE-2021-31812); SageMaker Jupyter RCE (AWS advisory); Hugging Face PEFT vuln (CVE-2023-6730)
(14) Model Frameworks & Code	Frameworks, tokenizers, compilers	PyTorch, TensorFlow, Hugging Face, ONNX Runtime	T01-SCMP, T01-MTD, T01-VEW	TensorFlow buffer overflow (CVE-2021-37678); PyTorch vulnerability (CVE-2022-45907); ONNX Runtime DoS (CVE-2022-25883)
(15) Data Storage Infrastructure	Vector DBs, RDBMS, object stores	Weaviate, Pinecone, Milvus, Redis, Postgres, S3	T01-RMP, T01-DMP, T01-DPFT, T01-SCMP, T01-SID, T01-MTD, T01-LSID	Redis RCE (CVE-2022-0543); PostgreSQL escalation (CVE-2023-2454); Milvus injection (CVE-2023-48022)
(16) Training Data	Raw corpora, labeled, synthetic	Chat logs, FAQs, Label Studio, synthetic Q&A	T01-MIMI, T01-TDL, T01-SID	Label Studio auth bypass (CVE-2021-36701)
(17) Data Filtering & Processing	ETL, cleaning, chunking, tagging	Airflow, dbt, Unstructured.io, spaCy, NLTK	T01-RMP, T01-DMP, T01-DPFT, T01-SID, T01-MIMI, T01-TDL, T01-VEW, T01-MIS	Apache Airflow RCE (CVE-2023-42793); dbt adapter injection (GitHub advisories)
(18) Data Sources	Internal KBs, CRM, telemetry	Confluence, Jira, Elastic, Splunk	T01-SID, T01-DMP, T01-VEW, T01-MIS	Confluence RCE (CVE-2023-22515); Jira auth bypass (CVE-2020-14181); ElasticSearch RCE (CVE-2015-1427); Splunk RCE (CVE-2022-32158)
(19) External Sources	Public datasets, 3rd party APIs/feeds	Wikipedia, Common Crawl, arXiv, News APIs	T01-MIMI, T01-SID, T01-DMP, T01-MIS	Dataset poisoning risks (no CVEs, CWE-driven); API poisoning (CWE-345: Insufficient Verification of Data Authenticity)

AI Threat enumeration and Targeted CWEs

In this section we provide a mapping of SAIF components to AI threats and examples of vulnerability types/CWEs that can be exploited

SAIF Component	Mapped Threats	Targeted CWEs
(2) User Input	T01-DPIJ, T01-IPI J, T01-SID, T01-DoSM, T01-IOH, T01-MTU	CWE-116, CWE-1204, CWE-1389, CWE-20, CWE-200, CWE-359, CWE-400, CWE-522, CWE-74, CWE-75, CWE-770, CWE-787, CWE-79, CWE-94
(3) User Output	T01-EA, T01-SPL, T01-MIS, T01-IOH	CWE-116, CWE-209, CWE-284, CWE-285, CWE-345, CWE-352, CWE-359, CWE-640, CWE-79, CWE-825
(4) Application	T01-DPIJ, T01-IPI J, T01-SID, T01-DoSM, T01-MTU, T01-IOH, T01-EA, T01-SPL, T01-MIS	CWE-116, CWE-1204, CWE-1389, CWE-20, CWE-200, CWE-209, CWE-284, CWE-285, CWE-345, CWE-352, CWE-359, CWE-400, CWE-522, CWE-640, CWE-74, CWE-75, CWE-770, CWE-787, CWE-79, CWE-825, CWE-94
(5) Agent/Plugin	T01-IPI J, T01-SID, T01-MTD, T01-EA, T01-VEW	CWE-1389, CWE-20, CWE-200, CWE-276, CWE-284, CWE-285, CWE-359, CWE-494, CWE-502, CWE-522, CWE-74, CWE-829, CWE-918, CWE-94
(6) External Sources	T01-IPI J, T01-MTD, T01-SID, T01-EA, T01-VEW, T01-DMP	CWE-1389, CWE-20, CWE-200, CWE-276, CWE-284, CWE-285, CWE-359, CWE-494, CWE-502, CWE-522, CWE-74, CWE-829, CWE-918, CWE-94
(7) Input Handling	T01-DPIJ, T01-AIE, T01-SID, T01-LSID, T01-DoSM, T01-SPL, T01-VEW	CWE-117, CWE-1389, CWE-20, CWE-200, CWE-209, CWE-359, CWE-400, CWE-502, CWE-522, CWE-532, CWE-640, CWE-693, CWE-74, CWE-770, CWE-787, CWE-829, CWE-918
(8) Output Handling	T01-LSID, T01-SID, T01-DoSM, T01-SPL, T01-IOH, T01-TDL, T01-MTU, T01-EA, T01-MIS	CWE-116, CWE-117, CWE-1204, CWE-200, CWE-201, CWE-209, CWE-284, CWE-285, CWE-345, CWE-352, CWE-359, CWE-400, CWE-522, CWE-532, CWE-640, CWE-75, CWE-770, CWE-787, CWE-79, CWE-825
(9) Model	T01-DPIJ, T01-IPI J, T01-SCMP, T01-AIE, T01-DPFT, T01-RMP, T01-DMP, T01-SID, T01-MIMI, T01-TDL, T01-DoSM, T01-LSID, T01-SPL, T01-VEW, T01-MTU, T01-IOH, T01-MTR, T01-EA, T01-MIS	CWE-116, CWE-117, CWE-119, CWE-1204, CWE-1389, CWE-20, CWE-200, CWE-201, CWE-203, CWE-209, CWE-276, CWE-284, CWE-285, CWE-345, CWE-352, CWE-359, CWE-400, CWE-494, CWE-502, CWE-522, CWE-532, CWE-640, CWE-693, CWE-74, CWE-75, CWE-770, CWE-787, CWE-79, CWE-825, CWE-829, CWE-830, CWE-918, CWE-94
(10) Model Storage Infra	T01-DPFT, T01-SCMP, T01-MTR, T01-MTD	CWE-276, CWE-284, CWE-285, CWE-494, CWE-522, CWE-829, CWE-830
(11) Model Serving Infra	T01-SCMP, T01-MTU, T01-MTR, T01-DoSM	CWE-1204, CWE-276, CWE-284, CWE-400, CWE-494, CWE-522, CWE-75, CWE-770, CWE-787, CWE-829
(12) Evaluation	T01-AIE, T01-DMP, T01-LSID, T01-SID, T01-TDL, T01-DoSM, T01-MTU, T01-IOH, T01-MIS	CWE-116, CWE-117, CWE-1204, CWE-1389, CWE-20, CWE-200, CWE-201, CWE-345, CWE-352, CWE-359, CWE-400, CWE-494, CWE-522, CWE-532, CWE-693, CWE-74, CWE-75, CWE-770, CWE-787, CWE-79, CWE-825
(13) Training & Tuning	T01-AIE, T01-MIS, T01-DPFT, T01-SCMP, T01-MTD	CWE-1389, CWE-20, CWE-276, CWE-285, CWE-345, CWE-352, CWE-494, CWE-693, CWE-825, CWE-829, CWE-830
(14) Model Frameworks & Code	T01-SCMP, T01-MTD, T01-VEW	CWE-276, CWE-285, CWE-494, CWE-502, CWE-829, CWE-918
(15) Data Storage Infra	T01-RMP, T01-DMP, T01-DPFT, T01-SCMP, T01-SID, T01-MTD, T01-LSID	CWE-117, CWE-119, CWE-20, CWE-200, CWE-276, CWE-285, CWE-359, CWE-494, CWE-522, CWE-532, CWE-74, CWE-829, CWE-830, CWE-94
(16) Training Data	T01-MIMI, T01-TDL, T01-SID	CWE-200, CWE-201, CWE-203, CWE-359, CWE-522
(17) Data Filtering & Processing	T01-RMP, T01-DMP, T01-DPFT, T01-SID, T01-MIMI, T01-TDL, T01-VEW, T01-MIS	CWE-119, CWE-20, CWE-200, CWE-201, CWE-203, CWE-345, CWE-352, CWE-359, CWE-494, CWE-502, CWE-522, CWE-74, CWE-825, CWE-829, CWE-830, CWE-918, CWE-94
(18) Data Sources	T01-SID, T01-DMP, T01-VEW, T01-MIS	CWE-20, CWE-200, CWE-345, CWE-352, CWE-359, CWE-494, CWE-502, CWE-522, CWE-74, CWE-825, CWE-829, CWE-918
(19) External Sources	T01-MIMI, T01-SID, T01-DMP, T01-MIS	CWE-20, CWE-200, CWE-203, CWE-345, CWE-352, CWE-359, CWE-494, CWE-522, CWE-74, CWE-825

AI Threats, Targeted CWEs and Recommendations to Fix Them

In this section we provide a mapping of SAIF components to threats, possibly targeted CWEs, the rationale for CWEs being targeted, and recommendations for fixing them.

(2) User Input
(3) User Output
(4) Application
(5) Agent / Plugin
(6) External Sources
(7) Input Handling
(8) Output Handling
(9) Model
(10) Model Storage Infrastructure
(11) Model Serving Infrastructure
(12) Evaluation
(13) Training & Tuning
(14) Model Frameworks & Code
(15) Data Storage Infrastructure
(16) Training Data
(17) Data Filtering & Processing
(18) Data Sources
(19) External Sources

(2) User Input

Summary: User Input is the front door of the system — every downstream component depends on it. Without strong input validation, filtering, and limits, it becomes the main vector for prompt injection, data leakage, DoS, and toxicity propagation.

Threats: T01-DPIJ, T01-IPI J, T01-SID, T01-DoSM, T01-IOH, T01-MTU

Targeted CWEs:
CWE-20, CWE-74, CWE-94, CWE-707, CWE-200, CWE-359, CWE-522, CWE-400, CWE-770, CWE-787, CWE-116, CWE-79

Direct Prompt Injection (T01-DPIJ) & Indirect Prompt Injection (T01-IPIJ)

Mapped CWEs: CWE-20, CWE-74, CWE-94, CWE-707
Rationale: Maliciously crafted inputs (user prompts or embedded instructions) can override instructions or trigger unintended actions.
Recommendations:

Apply strict input validation and canonicalization before passing content to the model.
Use prompt isolation/sandboxing (separate user and system instructions).
Enforce allowlist-based instruction patterns.
Test with adversarial prompt fuzzing.

Sensitive Information Disclosure (T01-SID)

Mapped CWEs: CWE-200, CWE-359, CWE-522
Rationale: Inputs may include secrets/PII that can be reflected in outputs or logs.
Recommendations:

Integrate DLP filters into input channels.
Mask/tokenize secrets and PII before forwarding to the model.
Restrict logging of raw inputs.

Denial of Service – Model (T01-DoSM)

Mapped CWEs: CWE-400, CWE-770, CWE-787
Rationale: Oversized or adversarial inputs can exhaust tokens/compute.
Recommendations:

Set input size and tokenization limits.
Apply rate-limits and per-user quotas.
Use circuit breakers/autoscaling.

Insecure Output Handling Triggered by Inputs (T01-IOH)

Mapped CWEs: CWE-116, CWE-79
Rationale: Malicious inputs may propagate to rendered outputs (e.g., XSS).
Recommendations:

Sanitize and encode outputs by context (HTML/MD/JSON).
Separate data from control characters; use safe rendering frameworks.

Model Toxicity / Unreliable Outputs (T01-MTU)

Mapped CWEs: CWE-707, CWE-345, CWE-1204
Rationale: Inputs can steer models toward toxic or unreliable content.
Recommendations:

Add toxicity/bias classifiers and context filters.
Escalate high-risk cases to human review.

(3) User Output

Summary: The last mile to users/connected systems; without control, it’s a vector for excessive agency, prompt leakage, misinformation, and unsafe rendering.

Threats: T01-EA, T01-SPL, T01-MIS, T01-IOH

Targeted CWEs:
CWE-284, CWE-285, CWE-200, CWE-209, CWE-359, CWE-532, CWE-116, CWE-79, CWE-75, CWE-345, CWE-1204

Excessive Agency (T01-EA)

Mapped CWEs: CWE-284, CWE-285
Rationale: Action-bearing outputs can trigger privileged operations without proper scoping.
Recommendations:

Enforce least-privilege scopes for action outputs.
Require policy checks before rendering actionable UI.
Use allowlists and out-of-band approvals for high-risk actions.

Sensitive Prompt Leakage (T01-SPL)

Mapped CWEs: CWE-200, CWE-209, CWE-359, CWE-532
Rationale: Hidden prompts/keys/PII can surface in responses, errors, or logs.
Recommendations:

Redact secrets/PII/system instructions before render/logging.
Wrap errors safely; never show raw tool/model errors.
Separate user-visible and operator logs with DLP.

Misinformation (T01-MIS)

Mapped CWEs: CWE-345, CWE-1204
Rationale: Ungrounded claims appear credible in UI.
Recommendations:

Require grounding/citations for high-risk claims.
Add verification metrics and “needs review” flags.

Insecure Output Handling (T01-IOH)

Mapped CWEs: CWE-116, CWE-79, CWE-75
Rationale: Unsanitized text can execute in rich renderers.
Recommendations:

Render from structured formats; encode per context.
Sanitize Markdown/HTML via allowlists; disable unsafe embeds.

(4) Application

Summary: Orchestration brain (sessions, APIs, business logic). Weak validation or access controls can cascade into systemic compromise.

Threats: T01-DPIJ, T01-IPI J, T01-SID, T01-DoSM, T01-MTU, T01-IOH, T01-EA, T01-SPL, T01-MIS

Targeted CWEs:
CWE-20, CWE-74, CWE-94, CWE-200, CWE-209, CWE-359, CWE-522, CWE-400, CWE-770, CWE-787, CWE-116, CWE-79, CWE-75, CWE-284, CWE-285, CWE-345, CWE-1204

Prompt Injection (T01-DPIJ, T01-IPIJ)

Mapped CWEs: CWE-20, CWE-74, CWE-94
Rationale: Unvalidated inputs into core instruction sets allow overrides.
Recommendations: Schema validation, role separation, safe interpreter layer.

Sensitive Information Disclosure (T01-SID, T01-SPL)

Mapped CWEs: CWE-200, CWE-209, CWE-359, CWE-522
Rationale: Secrets leak via logs/prompts/plugins.
Recommendations: Redact secrets, RBAC on sensitive data, safe error handling.

Denial of Service – Model (T01-DoSM)

Mapped CWEs: CWE-400, CWE-770, CWE-787
Recommendations: Rate-limit orchestration, circuit breakers, size checks.

Model Toxicity / Misinformation (T01-MTU, T01-MIS)

Mapped CWEs: CWE-345, CWE-1204
Recommendations: Grounding checks, toxicity/bias filters, confidence flags.

Insecure Output Handling (T01-IOH)

Mapped CWEs: CWE-79, CWE-116, CWE-75
Recommendations: Contextual encoding/sanitization; strip unsafe HTML/MD.

Excessive Agency (T01-EA)

Mapped CWEs: CWE-284, CWE-285
Recommendations: Least privilege, allowlists, secondary approvals.

(5) Agent / Plugin

Summary: Extended arms of the system; vulnerable to IPIJ, secrets handling, tampering, excessive actions, and unsafe workflows.

Threats: T01-IPI J, T01-SID, T01-MTD, T01-EA, T01-VEW

Targeted CWEs:
CWE-20, CWE-74, CWE-94, CWE-200, CWE-359, CWE-522, CWE-284, CWE-285, CWE-276, CWE-494, CWE-829, CWE-918, CWE-502

Indirect Prompt Injection (T01-IPIJ)

Mapped CWEs: CWE-20, CWE-74, CWE-94
Recommendations: Strict I/O schemas, escape parameters, forbid dynamic eval.

Sensitive Information Disclosure (T01-SID)

Mapped CWEs: CWE-200, CWE-359, CWE-522
Recommendations: Scoped credentials, redact tool responses, data minimization.

Model Tampering / Disclosure (T01-MTD)

Mapped CWEs: CWE-276, CWE-285, CWE-494
Recommendations: Hardened permissions, signed manifests, artifact signing.

Excessive Agency (T01-EA)

Mapped CWEs: CWE-284, CWE-285
Recommendations: Per-action least privilege, policy gates, human-in-the-loop.

Vulnerable External Workflow (T01-VEW)

Mapped CWEs: CWE-829, CWE-918, CWE-502
Recommendations: Tool allowlists, egress proxy, safe content types.

Operational Hardening (cross-cutting): Per-tool rate limits/timeouts; container isolation; telemetry; signed releases/SBOMs; tenant isolation for state.

(6) External Sources

Summary: Bridges to the outside world; unverified data can inject poison, trigger unsafe actions, or spread misinformation.

Threats: T01-IPI J, T01-MTD, T01-SID, T01-EA, T01-VEW, T01-DMP

Targeted CWEs:
CWE-20, CWE-74, CWE-94, CWE-200, CWE-359, CWE-522, CWE-276, CWE-284, CWE-285, CWE-494, CWE-829, CWE-918, CWE-502, CWE-353, CWE-345

Indirect Prompt Injection (T01-IPIJ)

Recommendations: Sanitize/normalize external content; restrict content types; segregate retrieved content.

Model Tampering/Disclosure (T01-MTD)

Recommendations: Integrity/signature checks; least-privilege access; explicit approvals; hardened storage permissions.

Sensitive Information Disclosure (T01-SID)

Recommendations: Mask sensitive fields; scoped OAuth; DLP policies.

Excessive Agency (T01-EA)

Recommendations: RBAC and allowlists for sources; policy checks before executing; sandboxed connectors.

Vulnerable External Workflow (T01-VEW)

Recommendations: Egress proxy + allowlists; safe content types; SBOM verification.

Data / Model Poisoning (T01-DMP)

Recommendations: Provenance/reputation scoring; adversarial sample testing; cryptographic integrity checks.

(7) Input Handling

Summary: The filter layer; weak parsing/schema enforcement lets adversarial inputs/injections slip through.

Threats: T01-DPIJ, T01-AIE, T01-SID, T01-LSID, T01-DoSM, T01-SPL, T01-VEW

Targeted CWEs:
CWE-20, CWE-74, CWE-94, CWE-200, CWE-359, CWE-522, CWE-532, CWE-209, CWE-400, CWE-770, CWE-787, CWE-79, CWE-116, CWE-75, CWE-918

Prompt Injection (T01-DPIJ)

Recommendations: Strict schemas and typing; strip unsafe control sequences; sandbox inputs.

Adversarial Input Evasion (T01-AIE)

Recommendations: Unicode normalization; adversarial testing; layered validation.

Sensitive Information Disclosure (T01-SID, T01-LSID, T01-SPL)

Recommendations: Ingestion-time redaction; masked logging; sanitize logs and errors.

Denial of Service – Model (T01-DoSM)

Recommendations: Input size/rate quotas; buffer validation.

Vulnerable External Workflow (T01-VEW)

Recommendations: Domain allowlists + proxy; content-type validation.

(8) Output Handling

Summary: Safety gate before delivery; failure here leaks sensitive data, misinformation, and unsafe content.

Threats: T01-LSID, T01-SID, T01-DoSM, T01-SPL, T01-IOH, T01-TDL, T01-MTU, T01-EA, T01-MIS

Targeted CWEs:
CWE-79, CWE-116, CWE-75, CWE-200, CWE-209, CWE-359, CWE-532, CWE-522, CWE-400, CWE-770, CWE-787, CWE-284, CWE-285, CWE-345, CWE-1204

Log/Storage Information Disclosure (T01-LSID)

Recommendations: Strip sensitive context; RBAC for logs; safe error messages.

Sensitive Information Disclosure (T01-SID, T01-SPL, T01-TDL)

Recommendations: Post-output DLP; encrypt/mask sensitive fields; prevent recall of sensitive training rows.

Denial of Service – Model (T01-DoSM)

Recommendations: Cap output size/tokens; quarantine oversized outputs; validate downstream buffers.

Insecure Output Handling (T01-IOH)

Recommendations: Contextual encoding; allowlist sanitizers; disable rich rendering for untrusted text.

Training Data Leakage (T01-TDL)

Recommendations: Differential privacy; verbatim/entropy filters; redact prompts; restrict logging.

Model Toxicity / Misinformation (T01-MTU, T01-MIS)

Recommendations: Toxicity/bias filters; grounding/citations; fallbacks.

Excessive Agency (T01-EA)

Recommendations: Allowlisted commands; authorization checks; explicit confirmation.

(9) Model

Summary: The core intelligence; targeted by injection, poisoning, theft, inversion, DoS, and unsafe outputs.

Threats:
T01-DPIJ, T01-IPI J, T01-SCMP, T01-AIE, T01-DPFT, T01-RMP, T01-DMP, T01-SID, T01-MIMI, T01-TDL, T01-DoSM, T01-LSID, T01-SPL, T01-VEW, T01-MTU, T01-IOH, T01-MTR, T01-EA, T01-MIS

Targeted CWEs:
CWE-20, CWE-74, CWE-94, CWE-200, CWE-209, CWE-359, CWE-522, CWE-532, CWE-276, CWE-284, CWE-285, CWE-400, CWE-770, CWE-787, CWE-918, CWE-502, CWE-494, CWE-345, CWE-353, CWE-1204, CWE-116, CWE-119, CWE-830, CWE-829, CWE-640, CWE-693, CWE-75, CWE-79

Prompt Injection (T01-DPIJ, T01-IPIJ)

Recommendations: Separate system/developer prompts; tokenizer-stage filtering; adversarial training.

Supply Chain / Data & Fine-tuning Poisoning (T01-SCMP, T01-DPFT, T01-RMP, T01-DMP)

Recommendations: Signed weights/datasets; provenance scoring; adversarial sanitation; SBOMs.

Adversarial Input Evasion (T01-AIE)

Recommendations: Normalize before tokenization; robustness testing; monitor embeddings.

Sensitive Information Disclosure / Training Data Leakage (T01-SID, T01-TDL, T01-LSID, T01-SPL)

Recommendations: DP in training; block verbatim sequences; redact system prompts; restrict logging.

Model Inversion / Membership Inference (T01-MIMI)

Recommendations: DP-SGD; rate limits/randomization; run MI red-teaming.

Denial of Service – Model (T01-DoSM)

Recommendations: Cap context; detect anomalies; harden serving buffers.

Insecure Output Handling / Unsafe Integrations (T01-IOH, T01-VEW)

Recommendations: Sanitize outputs; whitelist tools; enforce policy layers.

Model Theft / Exfiltration (T01-MTR, T01-MTD)

Recommendations: Access controls; encryption at rest; monitor for exfil.

Model Toxicity / Misinformation / Excessive Agency (T01-MTU, T01-MIS, T01-EA)

Recommendations: Toxicity/bias post-filters; grounding; restrict actionable outputs; approvals.

(10) Model Storage Infrastructure

Summary: Crown jewels at rest — must be encrypted, signed, and access-controlled.

Threats: T01-DPFT, T01-SCMP, T01-MTR, T01-MTD

Targeted CWEs:
CWE-276, CWE-284, CWE-285, CWE-200, CWE-359, CWE-522, CWE-494, CWE-353, CWE-922

Data/Prompt Fine-Tuning Poisoning (T01-DPFT)

Recommendations: Cryptographic signing + checksums; read-only versioned storage; attestation.

Supply Chain Model Poisoning (T01-SCMP)

Recommendations: Trusted registries; verify lineage; pin dependencies.

Model Theft / Exfiltration (T01-MTR)

Recommendations: Encrypt with KMS; least-privilege; monitor bulk downloads; harden defaults.

Model Tampering / Disclosure (T01-MTD)

Recommendations: WORM storage; integrity verification on load; restrict access to service accounts.

(11) Model Serving Infrastructure

Summary: Execution gateway; must resist poisoning, theft, DoS, and unsafe outputs.

Threats: T01-SCMP, T01-MTU, T01-MTR, T01-DoSM

Targeted CWEs:
CWE-276, CWE-284, CWE-285, CWE-400, CWE-770, CWE-787, CWE-494, CWE-353, CWE-345, CWE-1204, CWE-75

Supply Chain Model Poisoning (T01-SCMP)

Recommendations: Signed container images; checksums; SBOM-enforced provenance; block untrusted registries.

Model Toxicity / Unreliable Outputs (T01-MTU)

Recommendations: Moderation/toxicity filters; grounding checks; safe fallbacks.

Model Theft / Exfiltration (T01-MTR)

Recommendations: Rate limits/anomaly detection; mTLS + RBAC; encrypt weights; harden FS perms.

Denial of Service – Model (T01-DoSM)

Recommendations: Cap request size/tokens; quotas at gateway; circuit breakers/autoscaling; robust parsers.

(12) Evaluation

Summary: The safety lens; poison/bypass here yields false assurance.

Threats: T01-AIE, T01-DMP, T01-LSID, T01-SID, T01-TDL, T01-DoSM, T01-MTU, T01-IOH, T01-MIS

Targeted CWEs:
CWE-20, CWE-116, CWE-200, CWE-209, CWE-359, CWE-532, CWE-400, CWE-770, CWE-787, CWE-345, CWE-1204

Adversarial Input Evasion (T01-AIE)

Recommendations: Schema validation; normalization; adversarial red-teaming.

Data/Model Poisoning (T01-DMP)

Recommendations: Verify dataset provenance; cross-check baselines; ensemble evaluation.

Information Disclosure (T01-LSID, T01-SID, T01-TDL)

Recommendations: Sanitize logs; encrypt/ACL datasets; monitor for memorization leakage.

Denial of Service – Model (T01-DoSM)

Recommendations: Limit dataset size/runs; rate-limit jobs; fault isolation.

Model Toxicity / Unsafe Output / Misinformation (T01-MTU, T01-IOH, T01-MIS)

Recommendations: Include toxicity/factuality benchmarks; require grounding; scan for unsafe HTML/MD.

(13) Training & Tuning

Summary: Where knowledge is forged; poor data embeds lasting bias/backdoors.

Threats: T01-AIE, T01-MIS, T01-DPFT, T01-SCMP, T01-MTD

Targeted CWEs:
CWE-20, CWE-116, CWE-345, CWE-353, CWE-494, CWE-276, CWE-284, CWE-285, CWE-200, CWE-359

Adversarial Input Evasion (T01-AIE)

Recommendations: Enforce schemas + canonical normalization; adversarial resilience tests; anomaly detection in preprocessing.

Misinformation (T01-MIS)

Recommendations: Validate vs trusted sources; human oversight; training-time grounding.

Data/Prompt Fine-Tuning Poisoning (T01-DPFT)

Recommendations: Signed datasets; immutable baselines; adversarial testing pre-deploy.

Supply Chain Model Poisoning (T01-SCMP)

Recommendations: Trusted registries; signatures; hardened defaults and scoped access.

Model Tampering / Disclosure (T01-MTD)

Recommendations: Encrypt checkpoints/logs; RBAC; regular permission audits.

(14) Model Frameworks & Code

Summary: ML runtime backbone; supply chain or unsafe integrations taint the system.

Threats: T01-SCMP, T01-MTD, T01-VEW

Targeted CWEs:
CWE-94, CWE-95, CWE-829, CWE-494, CWE-353, CWE-276, CWE-284, CWE-285, CWE-918, CWE-502

Supply Chain Model Poisoning (T01-SCMP)

Recommendations: Pin versions; require signed packages; scan dependencies; maintain SBOMs.

Model Tampering / Disclosure (T01-MTD)

Recommendations: Harden runtimes; least-privilege service accounts; audit framework binaries.

Vulnerable External Workflow / Unsafe Integration (T01-VEW)

Recommendations: Disable/sandbox dynamic eval; restrict plugin loading; isolate untrusted code; harden deserialization.

(15) Data Storage Infrastructure

Summary: Knowledge vault; poisoning/tampering/leaks here undermine integrity & confidentiality.

Threats: T01-RMP, T01-DMP, T01-DPFT, T01-SCMP, T01-SID, T01-MTD, T01-LSID

Targeted CWEs:
CWE-276, CWE-284, CWE-285, CWE-200, CWE-359, CWE-522, CWE-532, CWE-400, CWE-770, CWE-787, CWE-494, CWE-353, CWE-345, CWE-922

Runtime/Model/Data Poisoning (T01-RMP, T01-DMP, T01-DPFT, T01-SCMP)

Recommendations: Integrity checks; provenance scoring; append-only/versioned stores; anomaly monitoring.

Sensitive Information Disclosure (T01-SID, T01-LSID)

Recommendations: Encrypt at rest + KMS; RBAC; sanitized logging; access monitoring.

Model/Data Tampering or Exfiltration (T01-MTD)

Recommendations: Disable public/broad ACLs; per-tenant keys; least-privilege; immutable storage for critical data.

Denial of Service – Storage

Recommendations: Quotas and rate limits; hardened parsers/buffers; ingestion throttling.

(16) Training Data

Summary: Root of trust; compromise propagates to all downstream behavior.

Threats: T01-MIMI, T01-TDL, T01-SID

Targeted CWEs:
CWE-200, CWE-359, CWE-522, CWE-345, CWE-353, CWE-494, CWE-276, CWE-284, CWE-285

Model Inversion / Membership Inference (T01-MIMI)

Recommendations: Differential privacy; strict RBAC on raw data; detect inversion patterns.

Training Data Leakage (T01-TDL)

Recommendations: Encrypt datasets; keep creds out of pipelines; tokenize sensitive fields pre-ingestion.

Sensitive Information Disclosure (T01-SID)

Recommendations: Least-privilege; row/column-level policies; audit all access.

Data Authenticity

Recommendations: Signed/versioned datasets; provenance scoring; golden-set cross-validation.

(17) Data Filtering & Processing

Summary: Gatekeeper stage; weak validation lets poisoned/sensitive data pass.

Threats: T01-RMP, T01-DMP, T01-DPFT, T01-SID, T01-MIMI, T01-TDL, T01-VEW, T01-MIS

Targeted CWEs:
CWE-20, CWE-116, CWE-200, CWE-359, CWE-345, CWE-353, CWE-494, CWE-276, CWE-284, CWE-285, CWE-400, CWE-770, CWE-787, CWE-829, CWE-918, CWE-502

Runtime / Data Poisoning (T01-RMP, T01-DMP, T01-DPFT)

Recommendations: Signed datasets; hash verification; drift detection.

Sensitive Information Disclosure (T01-SID, T01-TDL, T01-MIMI)

Recommendations: DLP in preprocessing; masking/tokenization; RBAC for feature stores.

Vulnerable External Workflow (T01-VEW)

Recommendations: Sandbox transforms; egress filtering; forbid unsafe deserialization.

Misinformation (T01-MIS)

Recommendations: Reputation/ground-truth validation; cross-dataset checks; human review for high-risk domains.

Denial of Service on Pipelines

Recommendations: Size quotas; ingestion rate limits; anomaly monitoring.

(18) Data Sources

Summary: Entry point of truth; without provenance checks, they introduce poisoned/unsafe content.

Threats: T01-SID, T01-DMP, T01-VEW, T01-MIS

Targeted CWEs:
CWE-200, CWE-359, CWE-522, CWE-345, CWE-353, CWE-494, CWE-829, CWE-918, CWE-502

Sensitive Information Disclosure (T01-SID)

Recommendations: DLP at ingestion; least-privilege credentials; encrypt sensitive datasets.

Data/Model Poisoning (T01-DMP)

Recommendations: Signature/hash checks; reputation scoring; golden-set cross-validation.

Vulnerable External Workflow (T01-VEW)

Recommendations: Proxy + allowlists; forbid unsafe formats; isolate connectors.

Misinformation (T01-MIS)

Recommendations: Reliability scoring; ground-truth cross-referencing; drift monitoring.

(19) External Sources

Summary: Outside the trust boundary; major vectors for poisoning, leakage, and misinformation.

Threats: T01-MIMI, T01-SID, T01-DMP, T01-MIS

Targeted CWEs:
CWE-200, CWE-359, CWE-522, CWE-345, CWE-353, CWE-494, CWE-918, CWE-829

Model Inversion / Membership Inference (T01-MIMI)

Recommendations: Privacy-preserving APIs; throttle/detect anomalies; k-anonymity/data minimization.

Sensitive Information Disclosure (T01-SID)

Recommendations: Secret managers; token rotation; TLS + mutual auth.

Data/Model Poisoning (T01-DMP)

Recommendations: Data signing/ checksums; cross-validate with references; vendor trust contracts.

Misinformation (T01-MIS)

Recommendations: Source reliability scores; ground-truth validation; human review for high-impact feeds.

38 KiB Raw Blame History Unescape Escape

Appendix E: SAIF AI Threat Targeted Components & CVEs/CWEs

AI Threat enumeration and CVE exploit path mapping

AI Threat enumeration and Targeted CWEs

AI Threats, Targeted CWEs and Recommendations to Fix Them

(2) User Input

Direct Prompt Injection (T01-DPIJ) & Indirect Prompt Injection (T01-IPIJ)

Sensitive Information Disclosure (T01-SID)

Denial of Service – Model (T01-DoSM)

Insecure Output Handling Triggered by Inputs (T01-IOH)

Model Toxicity / Unreliable Outputs (T01-MTU)

(3) User Output

Excessive Agency (T01-EA)

Sensitive Prompt Leakage (T01-SPL)

Misinformation (T01-MIS)

Insecure Output Handling (T01-IOH)

(4) Application

Prompt Injection (T01-DPIJ, T01-IPIJ)

Sensitive Information Disclosure (T01-SID, T01-SPL)

Denial of Service – Model (T01-DoSM)

Model Toxicity / Misinformation (T01-MTU, T01-MIS)

Insecure Output Handling (T01-IOH)

Excessive Agency (T01-EA)

(5) Agent / Plugin

Indirect Prompt Injection (T01-IPIJ)

Sensitive Information Disclosure (T01-SID)

Model Tampering / Disclosure (T01-MTD)

Excessive Agency (T01-EA)

Vulnerable External Workflow (T01-VEW)

(6) External Sources

Indirect Prompt Injection (T01-IPIJ)

Model Tampering/Disclosure (T01-MTD)

Sensitive Information Disclosure (T01-SID)

Excessive Agency (T01-EA)

Vulnerable External Workflow (T01-VEW)

Data / Model Poisoning (T01-DMP)

(7) Input Handling

Prompt Injection (T01-DPIJ)

Adversarial Input Evasion (T01-AIE)

Sensitive Information Disclosure (T01-SID, T01-LSID, T01-SPL)

Denial of Service – Model (T01-DoSM)

Vulnerable External Workflow (T01-VEW)

(8) Output Handling

Log/Storage Information Disclosure (T01-LSID)

Sensitive Information Disclosure (T01-SID, T01-SPL, T01-TDL)

Denial of Service – Model (T01-DoSM)

Insecure Output Handling (T01-IOH)

Training Data Leakage (T01-TDL)

Model Toxicity / Misinformation (T01-MTU, T01-MIS)

Excessive Agency (T01-EA)

(9) Model

Prompt Injection (T01-DPIJ, T01-IPIJ)

Supply Chain / Data & Fine-tuning Poisoning (T01-SCMP, T01-DPFT, T01-RMP, T01-DMP)

Adversarial Input Evasion (T01-AIE)

Sensitive Information Disclosure / Training Data Leakage (T01-SID, T01-TDL, T01-LSID, T01-SPL)

Model Inversion / Membership Inference (T01-MIMI)

Denial of Service – Model (T01-DoSM)

Insecure Output Handling / Unsafe Integrations (T01-IOH, T01-VEW)

Model Theft / Exfiltration (T01-MTR, T01-MTD)

Model Toxicity / Misinformation / Excessive Agency (T01-MTU, T01-MIS, T01-EA)

(10) Model Storage Infrastructure

Data/Prompt Fine-Tuning Poisoning (T01-DPFT)

Supply Chain Model Poisoning (T01-SCMP)

Model Theft / Exfiltration (T01-MTR)

Model Tampering / Disclosure (T01-MTD)

(11) Model Serving Infrastructure

Supply Chain Model Poisoning (T01-SCMP)

Model Toxicity / Unreliable Outputs (T01-MTU)

Model Theft / Exfiltration (T01-MTR)

Denial of Service – Model (T01-DoSM)

(12) Evaluation

Adversarial Input Evasion (T01-AIE)

Data/Model Poisoning (T01-DMP)

Information Disclosure (T01-LSID, T01-SID, T01-TDL)

Denial of Service – Model (T01-DoSM)

Model Toxicity / Unsafe Output / Misinformation (T01-MTU, T01-IOH, T01-MIS)

(13) Training & Tuning

Adversarial Input Evasion (T01-AIE)

Misinformation (T01-MIS)

Data/Prompt Fine-Tuning Poisoning (T01-DPFT)

38 KiB

Raw Blame History