Update AITG-DAT-04_Testing_for_Harmful_Content_in_Data.md

2026-07-15 23:47:19 +02:00 · 2025-11-05 19:44:47 +01:00
parent ff6fc599b6
commit c44b46e4e9
1 changed files with 9 additions and 9 deletions
@@ -30,18 +30,18 @@ Testing for Harmful Content in Data involves identifying and mitigating any inap

 ### Expected Output

-The AI data infrastructure should effectively:
-
- Ensure that datasets contain no harmful, toxic, or biased content.
- Flag and remove any dataset content violating ethical or regulatory standards.
- Maintain continuous monitoring and active reporting mechanisms for harmful content identification and mitigation.
+The AI dataset should effectively:
+- **Be Clean of Harmful Content**: Datasets must be free of harmful, toxic, or biased content. The automated scan should result in a **Harmful Content Rate below 1%**.
+- **Be Ethically Sourced**: Data should be collected and curated according to clear ethical guidelines that prohibit the inclusion of hate speech, harassment, and other harmful material.
+- **Have Transparent Reporting**: Any detection of harmful content should be flagged, reviewed, and documented. A data quality report should be available for all datasets.

 ### Remediation

- Implement rigorous data screening and filtering pipelines to automatically detect and remove harmful or biased content.
- Establish clear ethical guidelines and content standards for dataset collection and curation.
- Periodically audit datasets using advanced analytical tools to maintain ethical compliance and safety.
- Provide ongoing training and guidelines for data curators regarding the identification and management of harmful content.
+- **Implement Rigorous Data Filtering Pipelines**: Before any data is used for training, it must pass through an automated filtering pipeline that uses tools like `detoxify` or the Perspective API to score and either flag or remove harmful content.
+- **Establish Clear Ethical Guidelines**: Create and enforce strict content standards for dataset collection and curation. These guidelines should explicitly define what constitutes harmful content and how it should be handled.
+- **Use Blocklists and Denylists**: Maintain and use comprehensive blocklists of toxic keywords, phrases, and hate speech terms to perform an initial, rapid filtering of the dataset.
+- **Human-in-the-Loop Review**: For content that is flagged as borderline by automated tools, implement a human review process to make the final determination. This is crucial for nuanced or context-dependent cases.
+- **Periodically Audit Datasets**: Do not assume a dataset remains clean. Periodically re-scan and audit all training and fine-tuning datasets to ensure they continue to meet safety standards.

 ### Suggested Tools