Update AITG-DAT-04_Testing_for_Harmful_Content_in_Data.md

Matteo Meucci
2025-11-05 19:44:47 +01:00
committed by GitHub
parent ff6fc599b6
commit c44b46e4e9
@@ -30,18 +30,18 @@ Testing for Harmful Content in Data involves identifying and mitigating any inap
### Expected Output
The AI data infrastructure should effectively:
- Ensure that datasets contain no harmful, toxic, or biased content.
- Flag and remove any dataset content violating ethical or regulatory standards.
- Maintain continuous monitoring and active reporting mechanisms for harmful content identification and mitigation.
The AI dataset should effectively:
- **Be Clean of Harmful Content**: Datasets should be free of harmful, toxic, or biased content. As a measurable target, the automated scan should report a **Harmful Content Rate below 1%**.
- **Be Ethically Sourced**: Data should be collected and curated according to clear ethical guidelines that prohibit the inclusion of hate speech, harassment, and other harmful material.
- **Have Transparent Reporting**: Any detection of harmful content should be flagged, reviewed, and documented. A data quality report should be available for all datasets.
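
The 1% target can be checked mechanically once every sample has a toxicity score. The following is a minimal sketch, not a definitive implementation. It assumes the Harmful Content Rate means the share of samples whose score meets or exceeds a threshold. That definition, the `0.5` threshold, and the function names are illustrative assumptions rather than part of the test specification.

```python
from typing import Iterable


def harmful_content_rate(scores: Iterable[float], threshold: float = 0.5) -> float:
    """Return the fraction of samples whose score is >= threshold.

    Assumption: each score is a toxicity probability in [0, 1] produced by
    whichever classifier the scan uses. The 0.5 default is illustrative.
    """
    values = list(scores)
    if not values:
        return 0.0  # an empty dataset has nothing to flag
    flagged = sum(1 for s in values if s >= threshold)
    return flagged / len(values)


def meets_target(scores: Iterable[float],
                 threshold: float = 0.5,
                 max_rate: float = 0.01) -> bool:
    """True when the rate is strictly below the target (1% by default)."""
    return harmful_content_rate(scores, threshold) < max_rate
```

A scan that flags 1 sample out of 200 gives a rate of 0.5% and passes. One that flags 1 out of 100 sits exactly at 1% and does not, because the target is "below 1%".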
### Remediation
- Implement rigorous data screening and filtering pipelines to automatically detect and remove harmful or biased content.
- Establish clear ethical guidelines and content standards for dataset collection and curation.
- Periodically audit datasets using advanced analytical tools to maintain ethical compliance and safety.
- Provide ongoing training and guidelines for data curators regarding the identification and management of harmful content.
- **Implement Rigorous Data Filtering Pipelines**: Before any data is used for training, it must pass through an automated filtering pipeline that uses tools like `detoxify` or the Perspective API to score and either flag or remove harmful content.
- **Establish Clear Ethical Guidelines**: Create and enforce strict content standards for dataset collection and curation. These guidelines should explicitly define what constitutes harmful content and how it should be handled.
- **Use Blocklists (Denylists)**: Maintain comprehensive blocklists of toxic keywords, phrases, and hate speech terms, and use them for an initial, rapid filtering pass over the dataset.
- **Add Human-in-the-Loop Review**: Route content that automated tools flag as borderline to human reviewers, who make the final determination. This is crucial for nuanced or context-dependent cases.
- **Periodically Audit Datasets**: Do not assume a dataset remains clean. Periodically re-scan and audit all training and fine-tuning datasets to ensure they continue to meet safety standards.
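
The filtering, blocklist, and human-review steps above can be combined into one triage pass. The sketch below is one possible arrangement under stated assumptions, not a prescribed pipeline. The blocklist terms, the `remove_at` and `review_at` thresholds, and the function names are hypothetical. The scorer is passed in as a plain callable so that any classifier can be plugged in, such as the `toxicity` field from `detoxify` or a Perspective API score.

```python
import re
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical placeholder terms; a real blocklist would be curated and maintained.
BLOCKLIST: Set[str] = {"badword", "slur_example"}


def hits_blocklist(text: str, blocklist: Set[str] = BLOCKLIST) -> bool:
    """Fast first pass: True if any lowercase token of the text is blocklisted."""
    tokens = set(re.findall(r"[a-z0-9_']+", text.lower()))
    return not tokens.isdisjoint(blocklist)


def triage(texts: Iterable[str],
           score_fn: Callable[[str], float],
           remove_at: float = 0.8,
           review_at: float = 0.4) -> Dict[str, List[str]]:
    """Sort texts into 'keep', 'review' (human-in-the-loop), and 'remove'.

    Assumption: score_fn returns a toxicity score in [0, 1]. Both thresholds
    are illustrative and would need tuning per dataset.
    """
    buckets: Dict[str, List[str]] = {"keep": [], "review": [], "remove": []}
    for text in texts:
        if hits_blocklist(text):
            buckets["remove"].append(text)  # blocklist hit, skip the scorer
            continue
        score = score_fn(text)
        if score >= remove_at:
            buckets["remove"].append(text)
        elif score >= review_at:
            buckets["review"].append(text)  # borderline, send to a human
        else:
            buckets["keep"].append(text)
    return buckets
```

Running the blocklist first means obvious cases never reach the slower classifier. Re-running `triage` on a schedule is one way to carry out the periodic audit.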
### Suggested Tools