mirror of
https://github.com/wiltodelta/remove-ai-watermarks.git
synced 2026-07-05 16:07:49 +02:00
feat(metadata): blank AI-generator tokens in AVIF/HEIF Exif meta-box items
Closes a documented coverage gap (P2#9): an AI Software/Make/Artist/ImageDescription token in an EXIF item (its TIFF bytes live in mdat/idat) survived remove_ai_metadata because the top-level box stripper and (absent pillow-heif) the PIL EXIF reader can't reach it.

New isobmff.blank_ai_exif_tokens finds EXIF TIFF blocks by their II/MM byte-order header, validates each with piexif (a coincidental II/MM run in pixels won't parse as a TIFF IFD, so it's ignored), and overwrites any AI_GENERATOR_TOKENS-bearing value with same-length spaces -- so box sizes and iloc offsets stay valid and the coded image is untouched (mirrors blank_ai_xmp_packets; no iinf/iloc surgery, no exiftool dep). Camera/editor EXIF without an AI token is preserved. Wired into remove_ai_metadata's ISOBMFF path.

Covers the realistic AI-generator-token case; xAI-signature-in-meta-box-EXIF (Grok is JPEG-only) stays out.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -41,7 +41,7 @@ Metadata detection for AVIF/HEIF/JPEG-XL relies on a binary scan for `C2PA_UUID`
**Meta-box XMP now handled (`isobmff.blank_ai_xmp_packets`, v0.6.9):** an AI-label XMP packet stored as a meta-box `mime` item (AVIF/HEIF) is blanked in place (overwritten with spaces of the same length, so `iloc` offsets and the coded image stay valid).
-**Still NOT built:** an `Exif` *item* inside the `meta` box (rare -- AI labels are XMP) needs full `iinf`/`iloc` surgery (offset rewrite) with corruption risk -- exiftool (R/W/C for HEIC/AVIF EXIF+XMP, verified on exiftool.org 2026-05-27) would do it but is a non-installed binary dep, so it stays a documented gap.
+**`Exif` item inside the `meta` box (AVIF/HEIF), now handled in place (2026-06-19):** an AI-generator token in an EXIF item (its TIFF bytes live in `mdat`/`idat`) is blanked by `isobmff.blank_ai_exif_tokens` -- it finds EXIF TIFF blocks by their II/MM byte-order header, validates each with **piexif** (a coincidental II/MM run in pixel data won't parse as a TIFF IFD, so it is ignored), and overwrites any `Software`/`Make`/`Artist`/`ImageDescription` value carrying an `AI_GENERATOR_TOKENS` token with spaces of the **same length**. Same-length means every box size and `iloc` offset stays valid and the coded image is untouched -- so it avoids the full `iinf`/`iloc` surgery (offset rewrite) that exiftool would need (exiftool is a non-installed binary dep, deliberately not used). It scrubs only the AI-token value; camera/editor EXIF is preserved. Wired into `remove_ai_metadata`'s ISOBMFF path after `blank_ai_xmp_packets`. Limitation: covers the AI-generator-token case (the realistic one); a future xAI-signature-in-meta-box-EXIF (Grok is JPEG-only today) is not separately handled. **Still NOT built:** Resemble PerTh audio detection (no presence/confidence flag exists).
**Audio watermark DETECTION (Resemble PerTh) was evaluated and NOT built (2026-05-26):** `resemble-perth`'s `PerthImplicitWatermarker.get_watermark()` returns a raw bit-array with **no presence/confidence flag** (clean audio decodes to arbitrary bits too), so reliably distinguishing watermarked-from-clean needs either Resemble's fixed payload or a confidence API -- neither is public, and there's no real Resemble sample to calibrate against. Same wall-class as the SynthID pixel detector: the decode exists, reliable presence-detection does not. (perth's top-level `PerthImplicitWatermarker` is also gated to None unless `librosa` is importable.)
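
For readers who want to see the same-length blanking idea from the `Exif` item change in this hunk, here is a minimal sketch. It is not the project's `isobmff.blank_ai_exif_tokens`. Assumptions, all hypothetical: the names `blank_ai_tokens`, `_blank_ifd0` and `AI_TOKENS` (a stand-in for `AI_GENERATOR_TOKENS`, whose real contents are not shown here); validation by a hand-rolled IFD0 bounds check with `struct` instead of piexif; matching on the full 4-byte TIFF header (`II*\0` / `MM\0*`) rather than `II`/`MM` alone; only IFD0 is walked; and the trailing NUL of each string is left in place.

```python
import struct

# Hypothetical stand-in for the project's AI_GENERATOR_TOKENS (lowercase bytes).
AI_TOKENS = (b"midjourney", b"dall-e", b"stable diffusion")

# IFD0 ASCII tags: ImageDescription, Make, Software, Artist.
ASCII_TAGS = {0x010E, 0x010F, 0x0131, 0x013B}


def _blank_ifd0(buf: bytearray, base: int, endian: str) -> None:
    """Blank AI-token strings in the IFD0 of a candidate TIFF block at `base`."""
    try:
        (ifd_off,) = struct.unpack_from(endian + "I", buf, base + 4)
        (count,) = struct.unpack_from(endian + "H", buf, base + ifd_off)
    except struct.error:
        return  # header points outside the buffer: not a real TIFF block
    entries = base + ifd_off + 2
    if ifd_off < 8 or count == 0 or entries + 12 * count > len(buf):
        return  # implausible IFD: treat as a coincidental byte run
    for i in range(count):
        entry = entries + 12 * i
        tag, typ, n, val = struct.unpack_from(endian + "HHII", buf, entry)
        if tag not in ASCII_TAGS or typ != 2:  # type 2 = ASCII
            continue
        # Values of 4 bytes or fewer are stored inline in the entry itself.
        pos = entry + 8 if n <= 4 else base + val
        if n == 0 or pos + n > len(buf):
            continue
        value = bytes(buf[pos:pos + n]).lower()
        if any(tok in value for tok in AI_TOKENS):
            # Overwrite in place with spaces; the byte count never changes,
            # so box sizes and iloc offsets elsewhere in the file stay valid.
            buf[pos:pos + n - 1] = b" " * (n - 1)


def blank_ai_tokens(data: bytes) -> bytes:
    """Return a same-length copy of `data` with AI-token EXIF strings blanked."""
    buf = bytearray(data)
    for magic, endian in ((b"II*\x00", "<"), (b"MM\x00*", ">")):
        start = buf.find(magic)
        while start != -1:
            _blank_ifd0(buf, start, endian)
            start = buf.find(magic, start + 1)
    return bytes(buf)
```

Calling `blank_ai_tokens(raw_bytes)` on a whole file's bytes returns a buffer of identical length, so it can be written back without touching any container structure. The design choice this illustrates is the trade in the note above: giving up the ability to delete the item buys freedom from any offset rewriting.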