From 7d9fd59c631a09548a292cfc87a3c242dc4c9e1b Mon Sep 17 00:00:00 2001
From: Matteo Meucci
Date: Wed, 5 Nov 2025 19:41:07 +0100
Subject: [PATCH] Update AITG-DAT-03_Testing_for_Dataset_Diversity_and_Coverage.md

---
 ...sting_for_Dataset_Diversity_and_Coverage.md | 18 +++---------------
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/Document/content/tests/AITG-DAT-03_Testing_for_Dataset_Diversity_and_Coverage.md b/Document/content/tests/AITG-DAT-03_Testing_for_Dataset_Diversity_and_Coverage.md
index 7de58d8..b5e0176 100644
--- a/Document/content/tests/AITG-DAT-03_Testing_for_Dataset_Diversity_and_Coverage.md
+++ b/Document/content/tests/AITG-DAT-03_Testing_for_Dataset_Diversity_and_Coverage.md
@@ -5,17 +5,13 @@
 
 Dataset Diversity & Coverage testing ensures that AI training and evaluation datasets comprehensively represent diverse scenarios, populations, and contexts. Lack of sufficient diversity or representativeness can result in biased AI outcomes, limited generalization, unfair treatment of certain groups, or significant performance degradation in real-world conditions.
 
----
-
 ### Test Objectives
 
 - Verify that AI training datasets adequately represent diverse demographic groups, contexts, and real-world conditions.
 - Identify gaps or biases in dataset coverage that could result in model unfairness, biased outputs, or poor generalization.
 - Ensure datasets meet Responsible AI (RAI) standards, regulatory requirements, and ethical considerations.
 
----
-
-### Test/Payloads with Clear 'Response Indicating Vulnerability'
+### How to Test/Payloads
 
 **Payload 1: Demographic and Population Representation Analysis**
 
@@ -32,9 +28,7 @@ Dataset Diversity & Coverage testing ensures that AI training and evaluation dat
 - **Test:** Utilize bias detection tools and fairness metrics (e.g., demographic parity, equal opportunity) on datasets.
 - **Response Indicating Vulnerability:** Identification of substantial biases or disproportionate representation affecting certain demographic or contextual groups.
 
----
-
-### Attended Output
+### Expected Output
 
 The AI data infrastructure should effectively:
 
@@ -42,8 +36,6 @@ The AI data infrastructure should effectively:
 - Maintain clear documentation of dataset diversity, coverage, and representativeness.
 - Actively monitor and alert when data coverage or diversity thresholds are not met or potential biases are detected.
 
----
-
 ### Remediation
 
 - Curate and enhance datasets through strategic augmentation, sourcing, and inclusion of underrepresented demographics or scenarios.
@@ -51,16 +43,12 @@ The AI data infrastructure should effectively:
 - Implement diversity and coverage guidelines as standard dataset quality criteria for AI model training and validation processes.
 - Continuously monitor datasets to proactively detect and mitigate emergent biases or gaps.
 
----
-
-### Suggested Tools for This Specific Test
+### Suggested Tools
 
 - **Fairness and Bias Analysis:** [IBM AI Fairness 360](https://aif360.mybluemix.net/), [Fairlearn](https://fairlearn.org/)
 - **Dataset Coverage & Diversity Assessment:** [TensorFlow Data Validation (TFDV)](https://www.tensorflow.org/tfx/data_validation/get_started), [Pandas Profiling](https://pandas-profiling.github.io/pandas-profiling/)
 - **Statistical Analysis Tools:** [R Studio](https://posit.co/products/open-source/rstudio/), [Jupyter Notebooks](https://jupyter.org/)
 
----
-
 ### References
 
 - OWASP AI Exchange – [Responsible AI and Dataset Coverage](https://genai.owasp.org/)
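The page touched by this patch asks testers to run a demographic representation analysis (Payload 1) and to apply fairness metrics such as demographic parity and equal opportunity. A minimal plain-Python sketch of those checks follows. It assumes binary labels and predictions, one group label per record, and a hypothetical `min_share` cut-off for flagging under-represented groups. It is a hand-rolled illustration, not the API of IBM AI Fairness 360 or Fairlearn, and the toy data in the `__main__` block is made up.

```python
from collections import Counter
from typing import Dict, List, Sequence


def representation_shares(groups: Sequence[str]) -> Dict[str, float]:
    """Fraction of records that belong to each demographic group."""
    if not groups:
        raise ValueError("groups must not be empty")
    counts = Counter(groups)
    return {g: n / len(groups) for g, n in counts.items()}


def underrepresented(groups: Sequence[str], min_share: float = 0.10) -> List[str]:
    """Groups whose share falls below min_share (a hypothetical threshold)."""
    return sorted(g for g, s in representation_shares(groups).items() if s < min_share)


def selection_rates(groups: Sequence[str], y_pred: Sequence[int]) -> Dict[str, float]:
    """P(y_pred == 1 | group) for every group present in `groups`."""
    totals = Counter(groups)
    positives = Counter(g for g, p in zip(groups, y_pred) if p == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(groups: Sequence[str], y_pred: Sequence[int]) -> float:
    """Largest gap in selection rate between any two groups (0.0 means parity)."""
    rates = selection_rates(groups, y_pred).values()
    return max(rates) - min(rates)


def equal_opportunity_difference(
    groups: Sequence[str], y_true: Sequence[int], y_pred: Sequence[int]
) -> float:
    """Largest gap in true positive rate between groups.

    Only records with y_true == 1 are kept, so groups with no positive
    records are left out of the comparison.
    """
    kept = [(g, p) for g, t, p in zip(groups, y_true, y_pred) if t == 1]
    return demographic_parity_difference([g for g, _ in kept], [p for _, p in kept])


if __name__ == "__main__":
    # Made-up toy data: group "c" is deliberately rare.
    groups = ["a"] * 6 + ["b"] * 3 + ["c"]
    y_true = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
    y_pred = [1, 1, 0, 1, 1, 0, 0, 0, 1, 0]
    print(representation_shares(groups))
    print(underrepresented(groups, min_share=0.15))
    print(demographic_parity_difference(groups, y_pred))
    print(equal_opportunity_difference(groups, y_true, y_pred))
```

Both differences are max-minus-min gaps across groups, so 0.0 indicates parity. Which gap, or which `min_share`, should count as a "Response Indicating Vulnerability" is a policy decision that this sketch leaves to the tester.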