remove-ai-watermarks

mirror of https://github.com/wiltodelta/remove-ai-watermarks.git synced 2026-07-21 23:20:51 +02:00

Files

T

Victor Kuznetsov 25a1acc53b Detect TC260 AIGC label in JPEG EXIF and late/attribute PNG XMP

A corpus audit surfaced China TC260 AIGC-labeled images that `identify`
missed. Three detection gaps in `aigc_label`, all fixed:

- raw-JSON `{"AIGC":{...}}` in JPEG EXIF (UserComment): brace-matched from
  the scan head with `json.raw_decode`, gated on a TC260 field like the
  PNG-chunk path. (Doubao-class output via that export surface.)
- XMP attribute form `TC260:AIGC="{...}"` (PicWish): folded into the
  element regex as a second alternation.
- TC260 XMP packet appended after a large `IDAT`, past the 1 MB scan
  window: `scan_head` now appends late PNG metadata chunks via
  `_png_late_metadata`, mirroring the existing ISOBMFF late-box scan.

Adds `scripts/corpus_gap_scan.py`: runs `identify` over a corpus, writes
the per-file report CSV, and flags `unknown` files that carry a known
marker in their metadata region (the audit that found these gaps).
Scanning only the metadata region — not the whole file — avoids the
random short-token collisions inside compressed PNG/JPEG streams.

On the local corpus this lifts 3 files from `unknown` to AI (China AIGC)
and leaves zero false gap candidates. Synthetic piexif/PngInfo fixtures
cover all three forms.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-30 11:44:53 -07:00

corpus_gap_scan.py

Detect TC260 AIGC label in JPEG EXIF and late/attribute PNG XMP

2026-05-30 11:44:53 -07:00

synthid_corpus.py

feat(corpus): read HEIC dimensions via macOS sips

2026-05-24 15:55:11 -07:00

synthid_pixel_probe.py

feat(probe): solid-fill SynthID carrier probe; corpus reconfirms no pixel detector

2026-05-24 16:35:39 -07:00

text_detection_benchmark.py

Detect text at native resolution (capped), fixing small-text recall on large images (#27 )

2026-05-29 12:28:30 -07:00