mirror of
https://github.com/OWASP/www-project-ai-testing-guide.git
synced 2026-03-20 09:13:56 +00:00
# Appendix E: SAIF AI Threat Targeted Components & CVEs/CWEs
This appendix is intended to guide penetration testers by showing how common CVEs and CWEs map to AI-specific threats across the SAIF-defined components of an AI architecture. CVEs typically correspond to vulnerabilities in the technology stack (libraries, frameworks, APIs, and services) that implement user interfaces, the model layer, supporting infrastructure, or data sources. Because the AI pen tests in this guide are meant to be performed against an existing application, it is essential that testers perform careful scoping up front: identify which SAIF components and subcomponents are in scope, enumerate the actual technologies deployed for each, and use that inventory to prioritize CVE/CWE enumeration and threat simulations. For example, components directly involved in the AI application’s operation, such as the chat interface, FastAPI backend, model orchestration logic, and connected data stores, should be considered in scope for penetration testing. In contrast, external or third-party services not owned or controlled by the organization, such as vendor APIs or external data feeds, are typically out of scope, as they fall outside the AI application’s trust boundary and control. Once the in-scope and out-of-scope components are identified and agreed, the next step is to build or reference an inventory of the technology stack used to develop and operate those components. This inventory ensures that testing activities are precise and aligned with the actual implementation.
For example, in-scope components typically include those owned or managed by the organization and directly involved in the application’s request–response flow, such as the chat interface, API backends (e.g., FastAPI), session and orchestration layers, model orchestration frameworks (e.g., LangChain or LlamaIndex), vector databases (e.g., Redis, Pinecone, Weaviate), ETL and data processing pipelines, model-serving endpoints, and any internally managed connectors. Because the in-scope components may contain vulnerable libraries, misconfigurations, or exploitable services, the first step is threat enumeration and CVE exploit-path mapping: inventory known CVEs against the tech stack, identify likely attack paths, and prioritize those paths for validation with scanners and proof-of-concept testing.
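
The scoping inventory described above can be sketched as a simple data structure. This is a minimal Python sketch; the component names, technologies, and versions are hypothetical assumptions, not taken from a real engagement:

```python
from dataclasses import dataclass

@dataclass
class Component:
    """One SAIF component with its deployed technologies and scope decision."""
    saif_component: str
    technologies: list
    in_scope: bool
    rationale: str = ""

# Hypothetical inventory for a RAG chatbot engagement (all values illustrative)
inventory = [
    Component("Application Layer", ["FastAPI 0.78", "Redis 6.2"], True,
              "Org-owned; in the request-response flow"),
    Component("Model Layer", ["LangChain 0.0.200", "model-serving endpoint"], True,
              "Org-managed orchestration and serving"),
    Component("Data Layer", ["Weaviate 1.21", "Airflow 2.4 ETL"], True,
              "Feeds embeddings and retrieval indexes"),
    Component("External Services", ["third-party vendor API"], False,
              "Outside the application's trust boundary"),
]

# Only in-scope technologies drive CVE/CWE enumeration and threat simulation.
targets = [t for c in inventory if c.in_scope for t in c.technologies]
print(targets)
```

Keeping the scope decision and its rationale next to each technology makes the later CVE enumeration auditable: every scanned target traces back to an agreed scoping call.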
To help with this first step, we provide an example of **Threat enumeration and CVE exploit path mapping**. To begin, the pen tester performs threat enumeration and CVE exploit-path mapping across the in-scope technology stack. This involves identifying known vulnerabilities using tools such as software composition analysis (SCA) and runtime scanners. SCA tools like Snyk, Trivy, or Dependabot help reveal vulnerable dependencies and libraries, while scanners such as Nessus or Nuclei can validate active exposures in services and APIs. Runtime telemetry and host-level tools, including Falco or eBPF-based detectors, provide additional evidence of exploitability in live environments. For example, a Redis instance used in the data storage layer may expose CVE-2022-0543 (Lua sandbox escape), which could be exploited to poison embeddings and trigger runtime data poisoning (T01-RMP). Once these CVEs/vulnerabilities are identified, they can be mapped to AI-specific threats using the **AI Threats** column. For example, FastAPI sanitization weaknesses (`CVE-2022-36067`) might appear to be routine web vulnerabilities, but in the context of an LLM they translate to `T01-DPJI` (direct prompt injection). Similarly, `CVE-2022-40127` affects Apache Airflow, which an LLM or RAG-based application might use to orchestrate the flow of data from raw or external sources (APIs, databases, or files) into embeddings, training corpora, or retrieval indexes (such as Pinecone or Weaviate). This CVE could be exploited not just for remote code execution but also for `T01-DMP` (data poisoning), corrupting training or retrieval data. By mapping each CVE in the AI application’s tech stack to the specific AI threat it enables, the pen tester does more than record a vulnerable component: they show how that vulnerability can be weaponized against the AI system. This turns a routine CVE finding into a clear attack path that explains the practical impact on model behavior, data integrity, confidentiality, or availability.
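
The mapping described above can be sketched as a lookup table. The CVE/threat pairings come from the examples in the text; the record layout and function name are illustrative assumptions, and CWE classifications would be confirmed during triage:

```python
# Sketch of a CVE-to-AI-threat exploit-path map built from the examples above.
CVE_THREAT_MAP = {
    "CVE-2022-0543": {                    # Redis Lua sandbox escape
        "component": "Data storage (Redis)",
        "ai_threats": ["T01-RMP"],        # runtime data poisoning via embeddings
    },
    "CVE-2022-36067": {                   # FastAPI sanitization weakness
        "component": "API backend (FastAPI)",
        "ai_threats": ["T01-DPJI"],       # direct prompt injection
    },
    "CVE-2022-40127": {                   # Apache Airflow RCE in the ETL path
        "component": "Data pipeline (Airflow)",
        "ai_threats": ["T01-DMP"],        # poisoning training/retrieval data
    },
}

def exploit_paths(found_cves):
    """Given CVE IDs reported by SCA/scanners, return the AI attack paths."""
    return [(cve, m["component"], m["ai_threats"])
            for cve, m in CVE_THREAT_MAP.items() if cve in found_cves]
```

Feeding raw scanner output through `exploit_paths` immediately separates CVEs that enable an AI-specific threat from those that are only generic stack hygiene.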
The tester can follow a systematic AI pen-testing workflow: for each SAIF component in scope, they inspect subcomponents to identify where injection, poisoning, or manipulation is possible, confirm the actual technologies deployed, and run tests to discover vulnerable or unpatched libraries and CVEs. Those findings drive simulations of AI-specific attacks, such as prompt injection, model inversion and membership inference, data poisoning, and runtime DoS, to demonstrate real impact on the application. For example, exploiting a Weaviate plugin path-traversal vulnerability (CVE-2023-41267) could let an attacker inject poisoned vectors, producing T01-RMP runtime data poisoning where the chatbot serves attacker-controlled facts. A pen test report could leverage the table "Threat enumeration and CVE exploit path mappings" to maintain **traceability** of vulnerabilities and their impact. A finding might read: "Redis vulnerable to `CVE-2022-0543`," which maps to `CWE-94` (code injection) and aligns with AI Threat `T01-RMP` (runtime data poisoning). The impact statement would explain that this weakness allows the chatbot to output attacker-controlled responses.
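
The per-component workflow can be sketched as a loop. The `scan` callable, the `fake_scan` stub, and the record fields are hypothetical stand-ins; in practice this step would consume output from tools such as Trivy or Nuclei:

```python
# Sketch of the per-component assessment loop described above.
def assess_component(component, technologies, scan, threat_map):
    """Scan one in-scope SAIF component's stack and map hits to AI threats."""
    findings = []
    for tech in technologies:
        for cve in scan(tech):                    # stub for SCA/scanner results
            mapping = threat_map.get(cve, {})
            findings.append({
                "component": component,
                "technology": tech,
                "cve": cve,
                "ai_threats": mapping.get("ai_threats", ["unmapped: triage"]),
            })
    return findings

# Illustrative stub standing in for real scanner output.
fake_scan = lambda tech: ["CVE-2023-41267"] if "Weaviate" in tech else []
threat_map = {"CVE-2023-41267": {"ai_threats": ["T01-RMP"]}}
findings = assess_component("Data Layer", ["Weaviate 1.21", "Airflow 2.4"],
                            fake_scan, threat_map)
```

Anything the loop tags as "unmapped: triage" is a candidate for manual analysis: the CVE is real, but its AI-specific exploit path has not yet been established.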
A more detailed finding might read: "Redis, used in SAIF #4 (Application Layer) for session caching, API state management, and job queue orchestration, was found vulnerable to `CVE-2022-0543` (Lua sandbox escape). This vulnerability could allow remote code execution within the application’s runtime environment, potentially leading to session hijacking, data manipulation, or further compromise of other in-scope components." This CVE maps to three primary AI-specific threats under SAIF: `T01-SID` (Sensitive Information Disclosure), where attackers can extract cached API tokens or user session data; `T01-DoSM` (Denial of Service – Model), where cache corruption or overload disrupts model inference and orchestration; and `T01-MTD` (Model Tampering/Disclosure), where manipulation of stored orchestration state or metadata alters model behavior or exposes internal details. In essence, a single Redis compromise can cascade from infrastructure-level control to data leakage, service disruption, and model manipulation, undermining the integrity and trust of the AI application. This creates a clear chain from vulnerability to exploit to AI-specific risk, making the report resonate with both security engineers and AI/ML practitioners.
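
Such a traceability chain can be generated mechanically once findings are structured. The `format_finding` helper and its fields are hypothetical; the values reuse the Redis example above:

```python
# Minimal sketch of emitting the traceability chain a report finding needs:
# vulnerability -> weakness -> AI threats -> impact.
def format_finding(saif_component, tech, cve, cwe, ai_threats, impact):
    chain = " -> ".join([cve, cwe] + ai_threats)
    return (f"{tech} (SAIF {saif_component}) is vulnerable to {cve} ({cwe}). "
            f"Exploit chain: {chain}. Impact: {impact}")

finding = format_finding(
    "#4 Application Layer", "Redis", "CVE-2022-0543", "CWE-94",
    ["T01-SID", "T01-DoSM", "T01-MTD"],
    "session-token disclosure, inference disruption, and model tampering",
)
print(finding)
```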
The second recommended step is to conduct a **Threat enumeration and CWE exploit path mapping**. CWE-based enumeration is the bridge between patch-centric security and resilient AI system design. For pen testers, converting technical findings into CWE classes clarifies attacker goals, enables broader test coverage, and produces remediation guidance that hardens the architecture, not just one vulnerable library, against the class of attacks that threaten AI applications. The CWE-based table helps the pen tester frame these findings as **design weaknesses**, not just CVEs that need patching. For example, `CWE-20` (improper input validation) points to weak parsing logic, `CWE-276` (incorrect default permissions) highlights misconfigurations in data storage or S3 buckets, and `CWE-345` (insufficient verification of data authenticity) reveals systemic flaws in RAG ingestion.
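
The CWE framing can be sketched as a roll-up from individual CVE findings to weakness classes, so the report argues about architecture rather than single patches. The `group_by_weakness` helper and the `(cve, cwe)` pairing format are illustrative assumptions:

```python
from collections import defaultdict

# CWE design-weakness classes named in the text above.
CWE_CLASSES = {
    "CWE-20":  "Improper input validation (weak parsing logic)",
    "CWE-276": "Incorrect default permissions (exposed data stores / S3 buckets)",
    "CWE-345": "Insufficient verification of data authenticity (RAG ingestion)",
}

def group_by_weakness(findings):
    """findings: iterable of (cve, cwe) pairs -> {weakness class: [cves]}."""
    grouped = defaultdict(list)
    for cve, cwe in findings:
        grouped[CWE_CLASSES.get(cwe, cwe)].append(cve)
    return dict(grouped)
```

Several CVEs collapsing into one weakness class is itself a finding: it signals a systemic design flaw rather than a one-off patching gap.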
Finally, the third step is to cover **AI Threats, Targeted CWEs and Provide Recommendations to Fix Them** in the pen testing report. Each CWE targeted by a threat needs to be accompanied by secure design recommendations, such as enforcing schema validation, disabling default public access, verifying dataset authenticity, or encrypting sensitive data. This means pen testers can move from “here is how I broke it” to “here is how you should redesign it to prevent recurrence.”
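
One way to keep those recommendations consistent across findings is a CWE-to-remediation lookup. This is a minimal sketch; the wording is illustrative, and `CWE-311` (missing encryption of sensitive data) is an assumption added here to cover the encryption recommendation:

```python
# Pairing targeted CWEs with the secure-design fixes suggested above.
REMEDIATION = {
    "CWE-20":  "Enforce schema validation on all model-facing inputs",
    "CWE-276": "Disable default public access and apply least-privilege ACLs",
    "CWE-345": "Verify dataset provenance/signatures before RAG ingestion",
    "CWE-311": "Encrypt sensitive data at rest and in transit",
}

def recommend(targeted_cwes):
    """Map a finding's targeted CWEs to redesign guidance for the report."""
    return [REMEDIATION.get(c, f"{c}: add secure-design guidance")
            for c in targeted_cwes]
```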
As pen testers revisit the AI systems and applications in scope, which may change over time, they can update the CVE and CWE entries for newly discovered vulnerabilities and use the AI Threats column as a checklist for attack simulations in future red-team exercises. Over time, this evolving matrix becomes a living document that supports secure design, ongoing validation, and resilience in AI-enabled systems.
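
The living-matrix idea can be sketched as two small helpers; the matrix shape (`{cve: {"cwe": ..., "ai_threats": [...]}}`) and the function names are assumptions:

```python
# Keeping the mapping "living": merge newly discovered CVEs into the matrix
# and derive a red-team checklist from the AI Threats column.
def update_matrix(matrix, new_findings):
    """Fold newly discovered CVE/CWE/threat records into the matrix."""
    matrix.update(new_findings)
    return matrix

def redteam_checklist(matrix):
    """Unique AI threats to re-simulate in the next exercise."""
    return sorted({t for entry in matrix.values() for t in entry["ai_threats"]})
```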
## AI Threat enumeration and CVE exploit path mapping