Merge pull request #49 from mmorana1/patch-21

This commit is contained in:
Matteo Meucci
2025-10-22 08:47:27 +02:00
committed by GitHub

View File

@@ -70,9 +70,11 @@ In this section we provide a mapping of SAIF components to AI threats and exampl
| (18) Data Sources | T01-SID, T01-DMP, T01-VEW, T01-MIS | CWE-20, CWE-200, CWE-345, CWE-352, CWE-359, CWE-494, CWE-502, CWE-522, CWE-74, CWE-825, CWE-829, CWE-918 |
| (19) External Sources | T01-MIMI, T01-SID, T01-DMP, T01-MIS | CWE-20, CWE-200, CWE-203, CWE-345, CWE-352, CWE-359, CWE-494, CWE-522, CWE-74, CWE-825 |
**AI Threats, Targeted CWEs and Recommendations to Fix Them**
**AI Threat-to-Component-to-CWE Mapping and Remediation Guidance**
In this section we provide a mapping of SAIF components to threats, possibly targeted CWEs, the rationale for CWEs being targeted, and recommendations for fixing them.
In this section, we present a mapping between AI system components, associated AI threats (as defined in the guides threat model), corresponding CWE categories, and remediation recommendations. Each mapping includes the rationale explaining how specific CWEs are exploited or exposed by those AI threats, providing a direct link between identified weaknesses and actionable fixes.
AI System Architectural Components & Data (Note):
- [(2) User Input](#2-user-input)
- [(3) User Output](#3-user-output)
@@ -93,18 +95,25 @@ In this section we provide a mapping of SAIF components to threats, possibly tar
- [(18) Data Sources](#18-data-sources)
- [(19) External Sources](#19-external-sources)
Note: Component identifiers correspond to the SAIF numbering scheme illustrated in the threat model diagram within this guide.
---
## (2) User Input
**Summary:** User Input is the front door of the system — every downstream component depends on it. Without strong input validation, filtering, and limits, it becomes the main vector for prompt injection, data leakage, DoS, and toxicity propagation.
**Threats:** T01-DPIJ, T01-IPI J, T01-SID, T01-DoSM, T01-IOH, T01-MTU
**Targeted CWEs:**
CWE-20, CWE-74, CWE-94, CWE-707, CWE-200, CWE-359, CWE-522, CWE-400, CWE-770, CWE-787, CWE-116, CWE-79
### Direct Prompt Injection (T01-DPIJ) & Indirect Prompt Injection (T01-IPIJ)
**Mapped CWEs:** CWE-20, CWE-74, CWE-94, CWE-707
**Rationale:** Maliciously crafted inputs (user prompts or embedded instructions) can override instructions or trigger unintended actions.
**Recommendations:**
- Apply strict input validation and canonicalization before passing content to the model.
- Use prompt isolation/sandboxing (separate user and system instructions).
@@ -112,31 +121,45 @@ CWE-20, CWE-74, CWE-94, CWE-707, CWE-200, CWE-359, CWE-522, CWE-400, CWE-770, CW
- Test with adversarial prompt fuzzing.
### Sensitive Information Disclosure (T01-SID)
**Mapped CWEs:** CWE-200, CWE-359, CWE-522
**Rationale:** Inputs may include secrets/PII that can be reflected in outputs or logs.
**Recommendations:**
- Integrate DLP filters into input channels.
- Mask/tokenize secrets and PII before forwarding to the model.
- Restrict logging of raw inputs.
### Denial of Service Model (T01-DoSM)
**Mapped CWEs:** CWE-400, CWE-770, CWE-787
**Rationale:** Oversized or adversarial inputs can exhaust tokens/compute.
**Rationale:** Oversized or adversarial inputs can exhaust tokens/compute.
**Recommendations:**
- Set input size and tokenization limits.
- Apply rate-limits and per-user quotas.
- Use circuit breakers/autoscaling.
### Insecure Output Handling Triggered by Inputs (T01-IOH)
**Mapped CWEs:** CWE-116, CWE-79
**Rationale:** Malicious inputs may propagate to rendered outputs (e.g., XSS).
**Recommendations:**
- Sanitize and encode outputs by context (HTML/MD/JSON).
- Separate data from control characters; use safe rendering frameworks.
### Model Toxicity / Unreliable Outputs (T01-MTU)
**Mapped CWEs:** CWE-707, CWE-345, CWE-1204
**Rationale:** Inputs can steer models toward toxic or unreliable content.
**Recommendations:**
- Add toxicity/bias classifiers and context filters.
- Escalate high-risk cases to human review.
@@ -144,21 +167,29 @@ CWE-20, CWE-74, CWE-94, CWE-707, CWE-200, CWE-359, CWE-522, CWE-400, CWE-770, CW
---
## (3) User Output
**Summary:** The last mile to users/connected systems; without control, its a vector for excessive agency, prompt leakage, misinformation, and unsafe rendering.
**Threats:** T01-EA, T01-SPL, T01-MIS, T01-IOH
**Targeted CWEs:**
CWE-284, CWE-285, CWE-200, CWE-209, CWE-359, CWE-532, CWE-116, CWE-79, CWE-75, CWE-345, CWE-1204
### Excessive Agency (T01-EA)
**Mapped CWEs:** CWE-284, CWE-285
**Rationale:** Action-bearing outputs can trigger privileged operations without proper scoping.
**Recommendations:**
- Enforce least-privilege scopes for action outputs.
- Require policy checks before rendering actionable UI.
- Use allowlists and out-of-band approvals for high-risk actions.
### Sensitive Prompt Leakage (T01-SPL)
**Mapped CWEs:** CWE-200, CWE-209, CWE-359, CWE-532
**Rationale:** Hidden prompts/keys/PII can surface in responses, errors, or logs.
**Recommendations:**
- Redact secrets/PII/system instructions before render/logging.
@@ -166,15 +197,21 @@ CWE-284, CWE-285, CWE-200, CWE-209, CWE-359, CWE-532, CWE-116, CWE-79, CWE-75, C
- Separate user-visible and operator logs with DLP.
### Misinformation (T01-MIS)
**Mapped CWEs:** CWE-345, CWE-1204
**Rationale:** Ungrounded claims appear credible in UI.
**Recommendations:**
- Require grounding/citations for high-risk claims.
- Add verification metrics and “needs review” flags.
### Insecure Output Handling (T01-IOH)
**Mapped CWEs:** CWE-116, CWE-79, CWE-75
**Rationale:** Unsanitized text can execute in rich renderers.
**Recommendations:**
- Render from structured formats; encode per context.
- Sanitize Markdown/HTML via allowlists; disable unsafe embeds.
@@ -182,47 +219,70 @@ CWE-284, CWE-285, CWE-200, CWE-209, CWE-359, CWE-532, CWE-116, CWE-79, CWE-75, C
---
## (4) Application
**Summary:** Orchestration brain (sessions, APIs, business logic). Weak validation or access controls can cascade into systemic compromise.
**Threats:** T01-DPIJ, T01-IPI J, T01-SID, T01-DoSM, T01-MTU, T01-IOH, T01-EA, T01-SPL, T01-MIS
**Targeted CWEs:**
CWE-20, CWE-74, CWE-94, CWE-200, CWE-209, CWE-359, CWE-522, CWE-400, CWE-770, CWE-787, CWE-116, CWE-79, CWE-75, CWE-284, CWE-285, CWE-345, CWE-1204
### Prompt Injection (T01-DPIJ, T01-IPIJ)
**Mapped CWEs:** CWE-20, CWE-74, CWE-94
**Rationale:** Unvalidated inputs into core instruction sets allow overrides.
**Recommendations:** Schema validation, role separation, safe interpreter layer.
### Sensitive Information Disclosure (T01-SID, T01-SPL)
**Mapped CWEs:** CWE-200, CWE-209, CWE-359, CWE-522
**Rationale:** Secrets leak via logs/prompts/plugins.
**Recommendations:** Redact secrets, RBAC on sensitive data, safe error handling.
### Denial of Service Model (T01-DoSM)
**Mapped CWEs:** CWE-400, CWE-770, CWE-787
**Recommendations:** Rate-limit orchestration, circuit breakers, size checks.
### Model Toxicity / Misinformation (T01-MTU, T01-MIS)
**Mapped CWEs:** CWE-345, CWE-1204
**Recommendations:** Grounding checks, toxicity/bias filters, confidence flags.
### Insecure Output Handling (T01-IOH)
**Mapped CWEs:** CWE-79, CWE-116, CWE-75
**Recommendations:** Contextual encoding/sanitization; strip unsafe HTML/MD.
### Excessive Agency (T01-EA)
**Mapped CWEs:** CWE-284, CWE-285
**Recommendations:** Least privilege, allowlists, secondary approvals.
---
## (5) Agent / Plugin
**Summary:** Extended arms of the system; vulnerable to IPIJ, secrets handling, tampering, excessive actions, and unsafe workflows.
**Threats:** T01-IPI J, T01-SID, T01-MTD, T01-EA, T01-VEW
**Targeted CWEs:**
CWE-20, CWE-74, CWE-94, CWE-200, CWE-359, CWE-522, CWE-284, CWE-285, CWE-276, CWE-494, CWE-829, CWE-918, CWE-502
### Indirect Prompt Injection (T01-IPIJ)
**Mapped CWEs:** CWE-20, CWE-74, CWE-94
**Rationale:** Plugins may receive crafted instructions through user or system prompts that alter tool behavior or execute unsafe code.
**Recommendations:** Strict I/O schemas, escape parameters, forbid dynamic eval.
### Sensitive Information Disclosure (T01-SID)
@@ -246,8 +306,11 @@ CWE-20, CWE-74, CWE-94, CWE-200, CWE-359, CWE-522, CWE-284, CWE-285, CWE-276, CW
---
## (6) External Sources
**Summary:** Bridges to the outside world; unverified data can inject poison, trigger unsafe actions, or spread misinformation.
**Threats:** T01-IPI J, T01-MTD, T01-SID, T01-EA, T01-VEW, T01-DMP
**Targeted CWEs:**
CWE-20, CWE-74, CWE-94, CWE-200, CWE-359, CWE-522, CWE-276, CWE-284, CWE-285, CWE-494, CWE-829, CWE-918, CWE-502, CWE-353, CWE-345
@@ -272,8 +335,11 @@ CWE-20, CWE-74, CWE-94, CWE-200, CWE-359, CWE-522, CWE-276, CWE-284, CWE-285, CW
---
## (7) Input Handling
**Summary:** The filter layer; weak parsing/schema enforcement lets adversarial inputs/injections slip through.
**Threats:** T01-DPIJ, T01-AIE, T01-SID, T01-LSID, T01-DoSM, T01-SPL, T01-VEW
**Targeted CWEs:**
CWE-20, CWE-74, CWE-94, CWE-200, CWE-359, CWE-522, CWE-532, CWE-209, CWE-400, CWE-770, CWE-787, CWE-79, CWE-116, CWE-75, CWE-918
@@ -295,8 +361,11 @@ CWE-20, CWE-74, CWE-94, CWE-200, CWE-359, CWE-522, CWE-532, CWE-209, CWE-400, CW
---
## (8) Output Handling
**Summary:** Safety gate before delivery; failure here leaks sensitive data, misinformation, and unsafe content.
**Threats:** T01-LSID, T01-SID, T01-DoSM, T01-SPL, T01-IOH, T01-TDL, T01-MTU, T01-EA, T01-MIS
**Targeted CWEs:**
CWE-79, CWE-116, CWE-75, CWE-200, CWE-209, CWE-359, CWE-532, CWE-522, CWE-400, CWE-770, CWE-787, CWE-284, CWE-285, CWE-345, CWE-1204
@@ -324,9 +393,12 @@ CWE-79, CWE-116, CWE-75, CWE-200, CWE-209, CWE-359, CWE-532, CWE-522, CWE-400, C
---
## (9) Model
**Summary:** The core intelligence; targeted by injection, poisoning, theft, inversion, DoS, and unsafe outputs.
**Threats:**
T01-DPIJ, T01-IPI J, T01-SCMP, T01-AIE, T01-DPFT, T01-RMP, T01-DMP, T01-SID, T01-MIMI, T01-TDL, T01-DoSM, T01-LSID, T01-SPL, T01-VEW, T01-MTU, T01-IOH, T01-MTR, T01-EA, T01-MIS
**Targeted CWEs:**
CWE-20, CWE-74, CWE-94, CWE-200, CWE-209, CWE-359, CWE-522, CWE-532, CWE-276, CWE-284, CWE-285, CWE-400, CWE-770, CWE-787, CWE-918, CWE-502, CWE-494, CWE-345, CWE-353, CWE-1204, CWE-116, CWE-119, CWE-830, CWE-829, CWE-640, CWE-693, CWE-75, CWE-79
@@ -360,8 +432,11 @@ CWE-20, CWE-74, CWE-94, CWE-200, CWE-209, CWE-359, CWE-522, CWE-532, CWE-276, CW
---
## (10) Model Storage Infrastructure
**Summary:** Crown jewels at rest — must be encrypted, signed, and access-controlled.
**Threats:** T01-DPFT, T01-SCMP, T01-MTR, T01-MTD
**Targeted CWEs:**
CWE-276, CWE-284, CWE-285, CWE-200, CWE-359, CWE-522, CWE-494, CWE-353, CWE-922
@@ -380,8 +455,11 @@ CWE-276, CWE-284, CWE-285, CWE-200, CWE-359, CWE-522, CWE-494, CWE-353, CWE-922
---
## (11) Model Serving Infrastructure
**Summary:** Execution gateway; must resist poisoning, theft, DoS, and unsafe outputs.
**Threats:** T01-SCMP, T01-MTU, T01-MTR, T01-DoSM
**Targeted CWEs:**
CWE-276, CWE-284, CWE-285, CWE-400, CWE-770, CWE-787, CWE-494, CWE-353, CWE-345, CWE-1204, CWE-75
@@ -400,8 +478,11 @@ CWE-276, CWE-284, CWE-285, CWE-400, CWE-770, CWE-787, CWE-494, CWE-353, CWE-345,
---
## (12) Evaluation
**Summary:** The safety lens; poison/bypass here yields false assurance.
**Threats:** T01-AIE, T01-DMP, T01-LSID, T01-SID, T01-TDL, T01-DoSM, T01-MTU, T01-IOH, T01-MIS
**Targeted CWEs:**
CWE-20, CWE-116, CWE-200, CWE-209, CWE-359, CWE-532, CWE-400, CWE-770, CWE-787, CWE-345, CWE-1204
@@ -423,8 +504,11 @@ CWE-20, CWE-116, CWE-200, CWE-209, CWE-359, CWE-532, CWE-400, CWE-770, CWE-787,
---
## (13) Training & Tuning
**Summary:** Where knowledge is forged; poor data embeds lasting bias/backdoors.
**Threats:** T01-AIE, T01-MIS, T01-DPFT, T01-SCMP, T01-MTD
**Targeted CWEs:**
CWE-20, CWE-116, CWE-345, CWE-353, CWE-494, CWE-276, CWE-284, CWE-285, CWE-200, CWE-359
@@ -446,8 +530,11 @@ CWE-20, CWE-116, CWE-345, CWE-353, CWE-494, CWE-276, CWE-284, CWE-285, CWE-200,
---
## (14) Model Frameworks & Code
**Summary:** ML runtime backbone; supply chain or unsafe integrations taint the system.
**Threats:** T01-SCMP, T01-MTD, T01-VEW
**Targeted CWEs:**
CWE-94, CWE-95, CWE-829, CWE-494, CWE-353, CWE-276, CWE-284, CWE-285, CWE-918, CWE-502
@@ -463,8 +550,11 @@ CWE-94, CWE-95, CWE-829, CWE-494, CWE-353, CWE-276, CWE-284, CWE-285, CWE-918, C
---
## (15) Data Storage Infrastructure
**Summary:** Knowledge vault; poisoning/tampering/leaks here undermine integrity & confidentiality.
**Threats:** T01-RMP, T01-DMP, T01-DPFT, T01-SCMP, T01-SID, T01-MTD, T01-LSID
**Targeted CWEs:**
CWE-276, CWE-284, CWE-285, CWE-200, CWE-359, CWE-522, CWE-532, CWE-400, CWE-770, CWE-787, CWE-494, CWE-353, CWE-345, CWE-922
@@ -483,8 +573,11 @@ CWE-276, CWE-284, CWE-285, CWE-200, CWE-359, CWE-522, CWE-532, CWE-400, CWE-770,
---
## (16) Training Data
**Summary:** Root of trust; compromise propagates to all downstream behavior.
**Threats:** T01-MIMI, T01-TDL, T01-SID
**Targeted CWEs:**
CWE-200, CWE-359, CWE-522, CWE-345, CWE-353, CWE-494, CWE-276, CWE-284, CWE-285
@@ -503,8 +596,11 @@ CWE-200, CWE-359, CWE-522, CWE-345, CWE-353, CWE-494, CWE-276, CWE-284, CWE-285
---
## (17) Data Filtering & Processing
**Summary:** Gatekeeper stage; weak validation lets poisoned/sensitive data pass.
**Threats:** T01-RMP, T01-DMP, T01-DPFT, T01-SID, T01-MIMI, T01-TDL, T01-VEW, T01-MIS
**Targeted CWEs:**
CWE-20, CWE-116, CWE-200, CWE-359, CWE-345, CWE-353, CWE-494, CWE-276, CWE-284, CWE-285, CWE-400, CWE-770, CWE-787, CWE-829, CWE-918, CWE-502
@@ -526,8 +622,11 @@ CWE-20, CWE-116, CWE-200, CWE-359, CWE-345, CWE-353, CWE-494, CWE-276, CWE-284,
---
## (18) Data Sources
**Summary:** Entry point of truth; without provenance checks, they introduce poisoned/unsafe content.
**Threats:** T01-SID, T01-DMP, T01-VEW, T01-MIS
**Targeted CWEs:**
CWE-200, CWE-359, CWE-522, CWE-345, CWE-353, CWE-494, CWE-829, CWE-918, CWE-502
@@ -546,7 +645,9 @@ CWE-200, CWE-359, CWE-522, CWE-345, CWE-353, CWE-494, CWE-829, CWE-918, CWE-502
---
## (19) External Sources
**Summary:** Outside the trust boundary; major vectors for poisoning, leakage, and misinformation.
**Threats:** T01-MIMI, T01-SID, T01-DMP, T01-MIS
**Targeted CWEs:**
CWE-200, CWE-359, CWE-522, CWE-345, CWE-353, CWE-494, CWE-918, CWE-829