mirror of
https://github.com/wiltodelta/remove-ai-watermarks.git
synced 2026-06-05 02:28:00 +02:00
31f0a82906
Provenance detection no longer relies on a fixed first-MB read.

In a streaming / non-faststart MP4 the C2PA manifest sits AFTER a
multi-megabyte mdat, beyond the 1 MB scan window, so it was missed.

- isobmff.scan_c2pa_region(path): a file-seeking top-level box walker
  that returns the payloads of uuid/jumb (provenance) boxes. It seeks
  past mdat by its declared size without reading it, so it works on
  multi-GB files. Returns b"" for non-ISOBMFF input or on read error.
  Mirrors the box-size encoding of the existing in-memory
  _iter_top_level_boxes (largesize / size==0).
- metadata.scan_head(path, size): the shared input for every
  C2PA/AIGC/IPTC byte scan, made of the first `size` bytes plus, for
  ISOBMFF, the late provenance-box payloads. Behavior-neutral
  (f.read(size)) for non-ISOBMFF inputs.
- Routed all six metadata scan sites (has_ai_metadata, aigc_label,
  iptc_ai_system, synthid_source, exif_generator XMP, get_ai_metadata
  soft-binding) and identify's head read through scan_head.

6 new tests: late box found by scan_c2pa_region / scan_head, the fixed
window provably misses it, non-ISOBMFF -> b"", front-placed (faststart)
regression.

The remaining gap stays documented: EXIF/XMP stored as items inside the
meta box (AVIF/HEIF stills) still needs meta-box surgery or exiftool.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
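The seeking walk and the head-plus-late-boxes scan described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the names `scan_c2pa_region_sketch`, `scan_head_sketch`, `PROVENANCE_TYPES` and `MAX_PAYLOAD` are hypothetical, the ISOBMFF check assumes the file starts with an `ftyp` box, and the uuid usertype is not compared against the C2PA UUID.

```python
import os
import struct
from typing import BinaryIO

# Top-level box types treated as provenance carriers (uuid/jumb, per the
# commit message). This sketch does NOT check the uuid usertype.
PROVENANCE_TYPES = {b"uuid", b"jumb"}

# Hypothetical safeguard: cap how much of one provenance box is read.
MAX_PAYLOAD = 16 * 1024 * 1024


def _walk_provenance(f: BinaryIO, file_size: int) -> bytes:
    """Collect payloads of top-level uuid/jumb boxes, seeking past all others."""
    chunks = []
    pos = 0
    while pos + 8 <= file_size:
        f.seek(pos)
        header = f.read(8)
        if len(header) < 8:
            break
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:  # 64-bit largesize follows the type field
            ext = f.read(8)
            if len(ext) < 8:
                break
            (size,) = struct.unpack(">Q", ext)
            header_len = 16
        elif size == 0:  # box extends to the end of the file
            size = file_size - pos
        if size < header_len or pos + size > file_size:
            break  # malformed or truncated box: stop walking
        if box_type in PROVENANCE_TYPES:
            # For uuid boxes the payload still starts with the 16-byte usertype.
            f.seek(pos + header_len)
            chunks.append(f.read(min(size - header_len, MAX_PAYLOAD)))
        pos += size  # jump over the box (e.g. mdat) without reading its body
    return b"".join(chunks)


def scan_c2pa_region_sketch(path: str) -> bytes:
    """Late provenance-box payloads, or b"" for non-ISOBMFF / read errors."""
    try:
        file_size = os.path.getsize(path)
        with open(path, "rb") as f:
            # Assumption: ISOBMFF input is recognised by a leading ftyp box.
            if f.read(8)[4:8] != b"ftyp":
                return b""
            return _walk_provenance(f, file_size)
    except OSError:
        return b""


def scan_head_sketch(path: str, size: int) -> bytes:
    """First `size` bytes plus any ISOBMFF provenance payloads."""
    with open(path, "rb") as f:
        head = f.read(size)
    # For non-ISOBMFF input the second part is b"", so this equals f.read(size).
    return head + scan_c2pa_region_sketch(path)
```

For a non-faststart file laid out as `ftyp`, a 2 MB `mdat`, then a `uuid` box, `open(path, "rb").read(1 << 20)` never reaches the manifest, while `scan_head_sketch(path, 1 << 20)` returns the first MB followed by the uuid payload. Because the walker only reads 8 or 16 header bytes per top-level box, its I/O cost grows with the number of boxes rather than the file size. A front-placed (faststart) box simply appears twice in the result, which is harmless for substring scans.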