diff --git a/CLAUDE.md b/CLAUDE.md
index 397983a..8e4d3c8 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -61,7 +61,7 @@ Who embeds what, and whether it is locally detectable (so we know which gaps are
 - `invisible` pipeline processes at **native resolution by default** (`max_resolution=0`), matching the hosted raiw.cc backend (fal fast-sdxl, no pre-downscale). The old forced downscale-to-1024 -> upscale-back round-trip was the main quality loss (issue #10) and is gone; at strength ~0.05 SDXL img2img does not need the ~1024 downscale. `--max-resolution N` re-introduces an opt-in long-side cap purely to bound GPU/MPS memory on very large inputs (it reintroduces the lossy round-trip). For huge images that OOM at native, tile-based diffusion is still the proper long-term fix. **Concrete MPS data point (verified 2026-05-25 on a 1254x1254 gpt-image SDXL run, fp32, 20 GB MPS ceiling):** native res OOMs at the *UNet* step (peak ~17 GiB), not only the VAE decode, and the auto-fallback in `img2img_runner` reloads on CPU and finishes (slow, ~13 min) -- the output is still weight-identical and defeats SynthID, so "looks hung/crashed" on Mac is usually this CPU fallback, not a pipeline error. Adding `enable_vae_tiling()` alone does NOT prevent it (the peak is the UNet, not the VAE). The fast Mac workarounds are fp16 on MPS (roughly halves memory) or `--max-resolution` to cap the long side; neither is wired as the default. The native-vs-downscale decision lives in the pure helper `invisible_engine._target_size(w, h, max_resolution)` (returns `None` for native, a clamped target tuple otherwise) so it is unit-tested (`tests/test_invisible_engine.py::TestTargetSize`, the #10/#15 regression guard) without loading the model -- keep that logic in the helper, don't re-inline it.
 - Pyright first run is slow (2-3 min) due to ML deps (torch/diffusers/transformers stubs); full-project `uv run pyright` can stall for many minutes — scope it to changed files.
 - `ultralytics` monkey-patches `PIL.Image.open` and tries to autoload `pi_heif`. When `pi_heif` is missing, opening files raises `ModuleNotFoundError`, not `UnidentifiedImageError`. Code that opens user-supplied or unknown-format files should `except Exception`, not just `OSError`/`UnidentifiedImageError`.
-- Metadata detection for AVIF/HEIF/JPEG-XL relies on a binary scan for `C2PA_UUID` + `IPTC_AI_MARKERS`, plus EXIF `Software` / XMP `CreatorTool` generator tags via `metadata.exif_generator` (validated with synthesized AVIF/JPEG fixtures + an XMP raw-scan fixture). C2PA removal in those containers is implemented via `noai/isobmff.py` (top-level ``uuid`` / ``jumb`` box stripper, no re-encoding), which now also drops a top-level XMP ``uuid`` box that carries an AI label (matched by AI-marker content, not by the XMP UUID, so byte-order-robust) and covers MP4/MOV/M4V/M4A by content sniff. **Metadata boundary (researched 2026-05-26, deliberately NOT built — low yield / high risk):** EXIF/XMP stored as *items inside the ``meta`` box* (typical for AVIF/HEIF images) needs meta-box surgery (iinf/iloc edit + mdat splice) with corruption risk; WebM/Matroska is EBML (different container, would need a new parser); MP3 ID3 / WAV RIFF audio tags are low-yield (audio provenance is overwhelmingly oracle-only — SynthID/ElevenLabs/Resemble — or unmarked like Suno/Udio). For these, `remove_ai_metadata` raises a clear "not supported" error (`_UNSUPPORTED_CONTAINER_EXTS`) rather than crashing; detection via the `identify` byte scan still fires when the marker is in the first MB. (These are candidates for future work, not a hard ceiling — revisit when there's demand.)
+- Metadata detection for AVIF/HEIF/JPEG-XL relies on a binary scan for `C2PA_UUID` + `IPTC_AI_MARKERS`, plus EXIF `Software` / XMP `CreatorTool` generator tags via `metadata.exif_generator` (validated with synthesized AVIF/JPEG fixtures + an XMP raw-scan fixture).
C2PA removal in those containers is implemented via `noai/isobmff.py` (top-level ``uuid`` / ``jumb`` box stripper, no re-encoding), which now also drops a top-level XMP ``uuid`` box that carries an AI label (matched by AI-marker content, not by the XMP UUID, so byte-order-robust) and covers MP4/MOV/M4V/M4A by content sniff. **Non-ISOBMFF audio/video removal is via ffmpeg** (`_FFMPEG_STRIP_EXTS` -> `_strip_with_ffmpeg`): WebM/Matroska (EBML), MP3 (ID3), WAV/FLAC/OGG (RIFF/Vorbis) are stripped losslessly with `ffmpeg -map_metadata -1 -map_chapters -1 -c copy` (codec data untouched). Requires ffmpeg on PATH; raises `RuntimeError` if absent or if ffmpeg can't parse the file. Verified end-to-end (a real ffmpeg-made WAV/MP3 with a `title=Suno AI` tag -> tag gone, audio bytes preserved). **Still NOT built (deliberate):** EXIF/XMP stored as *items inside the ``meta`` box* (typical for AVIF/HEIF images) needs meta-box surgery (iinf/iloc edit + mdat splice) with corruption risk -- exiftool would do it but is a non-installed binary dep, so it stays a documented gap. **Audio watermark DETECTION (Resemble PerTh) was evaluated and NOT built (2026-05-26):** `resemble-perth`'s `PerthImplicitWatermarker.get_watermark()` returns a raw bit-array with **no presence/confidence flag** (clean audio decodes to arbitrary bits too), so reliably distinguishing watermarked-from-clean needs either Resemble's fixed payload or a confidence API -- neither is public, and there's no real Resemble sample to calibrate against. Same wall-class as the SynthID pixel detector: the decode exists, reliable presence-detection does not. (perth's top-level `PerthImplicitWatermarker` is also gated to None unless `librosa` is importable.)
 - **SynthID detection is metadata-only.** There is no reliable *local* detector of the SynthID *pixel* watermark — Google's decoder is proprietary, no public spec or API (only a waitlisted portal).
Authoritative confirmation: Google DeepMind's own paper "SynthID-Image: Image watermarking at internet scale" (Gowal et al., arXiv:2510.09263) states the verification service is restricted to "trusted testers" and does not release detector weights or a reproducible algorithm — so a local pixel detector is infeasible by design, not just unbuilt. https://arxiv.org/abs/2510.09263 We detect SynthID by its C2PA companion (`synthid_source` / `SYNTHID_C2PA_ISSUERS`), which is reliable while the manifest is intact but says nothing once C2PA is stripped. **Surface-dependent blind spot (verified 2026-05-24):** the same Google model emits different metadata per surface -- the Gemini *app* wraps outputs in Google C2PA, but the *API/playground* (AI Studio, Nano Banana / gemini-2.5-flash-image) emits the SynthID *pixel* watermark (confirmed via the Gemini-app oracle) + the visible sparkle but **no C2PA/IPTC at all**, so `synthid_source` returns None despite SynthID being present. Only the pixel oracle or the visible-sparkle detector catches those. (Meta AI is another surface mismatch: it writes the IPTC `digitalSourceType=trainedAlgorithmicMedia` marker, not C2PA and not SynthID.) Google→SynthID is long-standing; OpenAI→SynthID is confirmed by OpenAI's Help Center (ChatGPT/Codex/API "include both C2PA metadata and SynthID watermarks", updated 2026-05-21) but time-gated (pre-rollout OpenAI images carry C2PA without SynthID), so the OpenAI verdict is hedged "likely". Oracles: Gemini app "Verify with SynthID" (Google), openai.com/verify (OpenAI). The spectral phase-coherence approach from `github.com/aloshdenny/reverse-SynthID` was evaluated (May 2026) and **does not work for real-content detection**: on its own shipped codebook + validation set, watermarked and cleaned images were indistinguishable (conf within noise, cleaned often higher); it only fires on pure-black 1024x1024 reference images at exact resolution (the controlled case it was calibrated on). 
The README's "90% / conf=0.91" reproduces only in that lab condition. Do not build a production detector on it; if revisited, it is experimental/diagnostic only and needs a per-resolution, per-model reference corpus. A from-scratch gpt-image pilot (2026-05-24) confirmed this independently: 5 independent solid-black gpt-image outputs share a near-identical fixed signature (pairwise residual correlation **0.92**, avg-template retains 97% energy), so the watermark/carrier IS strongly present and consistent on flat content — but the carrier frequencies extracted from it do NOT discriminate real content (carrier-to-random ratio: cleaned 1.86 > watermarked 1.53; a non-gpt-image image scored highest at 3.67). The signature drowns in content texture. Net: a perfectly consistent solid-color signature still yields no real-content pixel detector with magnitude/carrier methods. A corpus discrimination test (2026-05-24, `scripts/synthid_pixel_probe.py`, raw zero-mean residual NCC) independently re-confirms this: at matched resolution, SynthID positives do NOT cluster apart from negatives (within-Gemini 0.07; at 1024 px pos-vs-neg >= pos-vs-pos). The only high correlations were near-duplicate *content* (5 ChatGPT renders of one prompt at ~0.92, while a distinct ChatGPT image scored ~0 against them) — content, not a carrier. The probe is solid-fills-only and EXPERIMENTAL/DIAGNOSTIC; do not use it on real content. **Correction (deeper re-examination 2026-05-25):** the carrier IS real on solid fills — the earlier "no carrier" was a *method* artifact of using spatial / FFT-magnitude NCC, which can't see it. The carrier is a fixed *phase* at specific low frequencies, so the right metric is **per-bin phase coherence**. 
On 8 white `gemini-2.5-flash-image` fills (generated via the reverse-SynthID trick: identity-edit prompt "Recreate this image exactly as it is" on a synthetic pure-white PNG — this bypasses the recitation block that rejects text prompts for pure colors), phase coherence at the white carriers `(0,±7..±12,±20..±23)` = **0.86** vs **0.31** random; single-image leave-one-out phase-match **+0.83** vs real photos **-0.24**. (Black `2.5-flash` fills clip to std≈0 — SynthID can't push values below 0, so no carrier in black; the repo's dark carriers come from nano-banana-pro.) **But it does not generalize:** (a) carriers are model-version + resolution + color specific — the repo's v4 codebook (built for `gemini-3.1-flash-image-preview` + `nano-banana-pro-preview`) scores ~0.527 on my 2.5-flash white fills, indistinguishable from negatives (~0.50), i.e. carriers shift across model versions and need a per-model codebook; (b) on real content (30 `2.5-flash` images) the carrier collapses — set phase coherence at carriers 0.37 ≈ random 0.42, and the repo's v4 detector gives content 0.518 ≈ negatives 0.504 (no separation; a faint +0.24 single-image lean is likely a brightness confound). Net: the spectral/phase approach is a real *controlled-fill* characterizer, NOT an arbitrary-real-content detector, and is brittle to model version. Metadata proxy + visible sparkle + online oracles remain the ceiling for real content.
 - **External AI-vs-real classifier models are out of scope (decided 2026-05-24).** Generic HuggingFace detectors (`Organika/sdxl-detector` Swin Transformer, `umm-maybe/AI-image-detector`, and fine-tunes) exist and report ~0.98 on their *own* SDXL-vs-real validation sets, but they are per-generator and the model cards themselves note degraded accuracy off-distribution; they are untested on gpt-image / Gemini Nano Banana (the metadata-stripped surfaces we care about), and our own light SDXL pass would likely defeat them the same way it defeats SynthID.
Detection here stays local + signal-based (metadata + visible sparkle); do not add a bundled classifier dependency.
 - **SynthID v2 vs default pipeline:** the SDXL-based default profile (since May 2026) defeats SynthID v2. **Verified end-to-end (May 2026):** local SDXL run on a Gemini 3 Pro output, checked via the Gemini app's "Verify with SynthID" feature, returned "no SynthID watermark detected". Also confirmed against **OpenAI's** SynthID (2026-05-23): a fresh ChatGPT/gpt-image output read "SynthID detected" on openai.com/verify before the local SDXL run and "SynthID not detected" after (corpus regression chain: pos `4ef377bd` -> cleaned `47188e88`). The same configuration is used in raiw-app production (`fal-ai/fast-sdxl/image-to-image`, strength 0.05, steps 50, guidance 7.5, no pre-downscale). fal's own `llms.txt` for `fast-sdxl` names the base checkpoint as `stabilityai/stable-diffusion-xl-base-1.0` (verified 2026-05-25) -- the exact checkpoint the local CLI defaults to (`DEFAULT_MODEL_ID`). So the local `invisible` default is weight-for-weight identical to prod; "fast-sdxl" is fal's optimized serving, not different weights. After the native-resolution fix the local pipeline matches prod on weights + strength + steps + guidance + resolution. SD-1.5 dreamshaper at 768 px was previously the default and does NOT defeat v2 — verified empirically against the same feature (strength 0.04, 0.10, and elastic warp α∈{5,8} all flagged positive). That SD-1.5 path was removed; only `default` (SDXL) and `ctrlregen` profiles remain. **Scope of the claim: defeating the SynthID verifier is NOT the same as forensic invisibility.** "Removing the Watermark Is Not Enough: Forensic Stealth in Generative-AI Watermark Removal" (arXiv:2605.09203, 2026-05) shows that six removal attacks across four families (UnMarker, CtrlRegen+, WatermarkAttacker, etc.) all leave forensic traces: independent detectors flag *removal-processed* images vs genuinely-clean ones at **>98% TPR at 1% FPR**.
So our SDXL pass makes the oracle read "SynthID not detected," but the output can still be classifiable as "an image that went through a removal pipeline." Do not over-claim "indistinguishable from a real photo." https://arxiv.org/abs/2605.09203
diff --git a/pyproject.toml b/pyproject.toml
index d64f4dd..2d62475 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "remove-ai-watermarks"
-version = "0.6.3"
+version = "0.6.4"
 description = "Remove visible and invisible AI watermarks from images (Gemini / Nano Banana, ChatGPT, Stable Diffusion)"
 readme = "README.md"
 requires-python = ">=3.10"
diff --git a/src/remove_ai_watermarks/__init__.py b/src/remove_ai_watermarks/__init__.py
index 2fb7892..c7509f1 100644
--- a/src/remove_ai_watermarks/__init__.py
+++ b/src/remove_ai_watermarks/__init__.py
@@ -1,3 +1,3 @@
 """Remove-AI-Watermarks: Unified tool for removing visible and invisible AI watermarks."""
 
-__version__ = "0.6.3"
+__version__ = "0.6.4"
diff --git a/src/remove_ai_watermarks/metadata.py b/src/remove_ai_watermarks/metadata.py
index 16f9cca..3456809 100644
--- a/src/remove_ai_watermarks/metadata.py
+++ b/src/remove_ai_watermarks/metadata.py
@@ -90,10 +90,10 @@ IPTC_AI_FIELD_MARKERS: tuple[bytes, ...] = (
 # (``ftyp``) is also accepted, so this is a fast-path hint, not the sole gate.
 _ISOBMFF_EXTS: frozenset[str] = frozenset({".avif", ".heif", ".heic", ".jxl", ".mp4", ".mov", ".m4v", ".m4a"})
 
-# Non-ISOBMFF audio/video we can DETECT (binary scan) but not strip at the
-# container level (EBML / framed / RIFF need re-encoding). remove_ai_metadata
-# fails clearly on these rather than crashing in the image path.
-_UNSUPPORTED_CONTAINER_EXTS: frozenset[str] = frozenset(
+# Non-ISOBMFF audio/video the ISOBMFF box walker can't reach (EBML / framed /
+# RIFF / Vorbis). remove_ai_metadata strips their container metadata losslessly
+# via ffmpeg (`-c copy`), so it needs ffmpeg on PATH for these.
+_FFMPEG_STRIP_EXTS: frozenset[str] = frozenset(
     {".webm", ".mkv", ".mka", ".mp3", ".wav", ".flac", ".ogg", ".oga", ".opus", ".aac"}
 )
 
@@ -487,6 +487,39 @@ def get_ai_metadata(image_path: Path) -> dict[str, str]:
     return result
 
 
+def _strip_with_ffmpeg(source_path: Path, output_path: Path) -> Path:
+    """Strip container metadata from a non-ISOBMFF audio/video file via ffmpeg.
+
+    Uses a lossless stream copy (``-c copy``), so codec data is untouched and only
+    container-level tags/chapters are dropped -- the metadata strip for WebM /
+    Matroska (EBML), MP3 (ID3), WAV / FLAC / OGG (RIFF / Vorbis comments) that the
+    ISOBMFF box walker cannot reach. Requires ffmpeg on PATH (raises if absent).
+    The output extension should match the source so ``-c copy`` can re-mux.
+    """
+    import shutil
+    import subprocess
+
+    ffmpeg = shutil.which("ffmpeg")
+    if ffmpeg is None:
+        raise RuntimeError(
+            f"ffmpeg is required to strip metadata from {source_path.suffix} files but was not found on "
+            "PATH; install ffmpeg (e.g. `brew install ffmpeg`) or re-encode the file with another tool"
+        )
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    cmd = [
+        ffmpeg, "-y", "-loglevel", "error",
+        "-i", str(source_path),
+        "-map_metadata", "-1", "-map_chapters", "-1",
+        "-c", "copy",
+        str(output_path),
+    ]
+    result = subprocess.run(cmd, capture_output=True, text=True, check=False)  # noqa: S603
+    if result.returncode != 0:
+        raise RuntimeError(f"ffmpeg failed to strip metadata from {source_path}: {result.stderr.strip()[:300]}")
+    logger.info("Stripped container metadata via ffmpeg -> %s", output_path)
+    return output_path
+
+
 def remove_ai_metadata(
     source_path: Path,
     output_path: Path | None = None,
@@ -530,15 +563,11 @@ def remove_ai_metadata(
         logger.info("Stripped %d AI-provenance box(es) → %s", stripped, output_path)
         return output_path
 
-    # Containers we can detect (via identify's byte scan) but cannot strip at the
-    # container level: non-ISOBMFF audio/video (Matroska/WebM are EBML; MP3 is
-    # framed; WAV is RIFF). Re-encoding them is out of scope, so fail clearly
-    # rather than crash in the PIL image path below.
-    if source_path.suffix.lower() in _UNSUPPORTED_CONTAINER_EXTS:
-        raise ValueError(
-            f"container-level metadata removal is not supported for {source_path.suffix} "
-            "(detection via `identify` still works); re-encode it with a media tool to strip metadata"
-        )
+    # Non-ISOBMFF audio/video (WebM/Matroska EBML, MP3 ID3, WAV/FLAC/OGG): the
+    # box walker can't reach these, so strip container metadata losslessly via
+    # ffmpeg (-c copy -- codec data untouched, only tags/chapters dropped).
+    if source_path.suffix.lower() in _FFMPEG_STRIP_EXTS:
+        return _strip_with_ffmpeg(source_path, output_path)
 
     # Read image and filter metadata
     with Image.open(source_path) as img:
diff --git a/tests/test_metadata.py b/tests/test_metadata.py
index 8904451..f2cf570 100644
--- a/tests/test_metadata.py
+++ b/tests/test_metadata.py
@@ -2,6 +2,8 @@
 
 from __future__ import annotations
 
+import shutil
+import subprocess
 from pathlib import Path
 
 import piexif
@@ -699,9 +701,35 @@ class TestIsobmffMetadataRemoval:
         remove_ai_metadata(src, out)
         assert out.read_bytes() == _MP4_FTYP + _MP4_MDAT
 
-    def test_unsupported_container_raises(self, tmp_path: Path):
+    def test_unparseable_audio_raises(self, tmp_path: Path):
+        # Garbage that ffmpeg can't parse must raise a clear error, not crash in
+        # the image path. (When ffmpeg is absent this still raises RuntimeError.)
         src = tmp_path / "audio.mp3"
-        src.write_bytes(b"ID3\x04\x00\x00\x00\x00\x00\x00 fake mp3 frames")
+        src.write_bytes(b"ID3\x04\x00\x00\x00\x00\x00\x00 not real mp3 frames")
         out = tmp_path / "out.mp3"
-        with pytest.raises(ValueError, match="not supported"):
+        with pytest.raises(RuntimeError):
             remove_ai_metadata(src, out)
+
+
+@pytest.mark.skipif(shutil.which("ffmpeg") is None, reason="ffmpeg not installed")
+class TestFfmpegMetadataStrip:
+    """Lossless container-metadata strip for non-ISOBMFF audio/video via ffmpeg."""
+
+    def _wav_with_tag(self, path: Path, tag: str = "Suno AI") -> None:
+        subprocess.run(  # noqa: S603
+            [
+                shutil.which("ffmpeg"), "-y", "-loglevel", "error",
+                "-f", "lavfi", "-i", "sine=frequency=440:duration=0.1",
+                "-metadata", f"title={tag}", str(path),
+            ],
+            check=True,
+        )
+
+    def test_strips_wav_title_metadata(self, tmp_path: Path):
+        src = tmp_path / "in.wav"
+        self._wav_with_tag(src, "Suno AI generated")
+        assert b"Suno AI generated" in src.read_bytes()  # tag is present pre-strip
+        out = tmp_path / "clean.wav"
+        remove_ai_metadata(src, out)
+        assert out.exists()
+        assert b"Suno AI generated" not in out.read_bytes()  # tag stripped, audio kept
diff --git a/uv.lock b/uv.lock
index 358702c..931c9af 100644
--- a/uv.lock
+++ b/uv.lock
@@ -2865,7 +2865,7 @@ wheels = [
 
 [[package]]
 name = "remove-ai-watermarks"
-version = "0.6.3"
+version = "0.6.4"
 source = { editable = "." }
 dependencies = [
     { name = "click" },