mirror of
https://github.com/wiltodelta/remove-ai-watermarks.git
synced 2026-06-05 02:28:00 +02:00
feat: detect soft-binding vendors, IPTC 2025.1, video/audio C2PA, TrustMark (v0.6.0)
Broadens metadata provenance coverage at the detection and container-strip level.

Detection:
- C2PA soft-binding `alg` -> forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, Steg.AI, Microsoft, ...) via C2PA_SOFT_BINDINGS + soft_binding_vendors_in(); names the watermark vendor even when the watermark itself can't be decoded.
- IPTC Photo Metadata 2025.1 AI-disclosure XMP fields (AISystemUsed etc.) via iptc_ai_system() + IPTC_AI_FIELD_MARKERS.
- Adobe TrustMark open keyless decoder (trustmark_detector.py, optional extra `trustmark`) -- the watermark behind Adobe Durable Content Credentials. Detects provenance, not AI origin, so it does not assert is_ai.

Removal / containers:
- isobmff.strip_c2pa_boxes now also drops a top-level XMP uuid box that carries an AI label (matched by AI-marker content, byte-order-robust; plain XMP kept).
- remove_ai_metadata routes MP4/MOV/M4V/M4A (and any ftyp-sniffed ISOBMFF) through the box stripper; raises a clear error for non-ISOBMFF audio/video (WebM/MP3/WAV) instead of crashing in the image path.

Tests: soft-binding scan, IPTC element/attribute/presence, MP4 + M4A detect/strip, ISOBMFF XMP surgical strip, content-sniff, unsupported-container guard, TrustMark absent-safety + identify integration.

ruff clean; pyright clean on all new modules.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -16,13 +16,13 @@ If this tool saves you time, consider [sponsoring its development](https://githu
|
||||
|
||||
- **Visible watermark removal** — Gemini / Nano Banana sparkle logo via reverse alpha blending (fast, offline, deterministic)
|
||||
- **Invisible watermark removal** — SynthID, StableSignature, TreeRing via diffusion-based regeneration (needs a local GPU, or run it with no setup on [raiw.cc](https://raiw.cc))
|
||||
- **AI metadata stripping** — EXIF, PNG text chunks, C2PA provenance manifests (PNG / JPEG / AVIF / HEIF / JPEG-XL), XMP DigitalSourceType
|
||||
- **AI metadata stripping** — EXIF, PNG text chunks, C2PA provenance manifests (PNG / JPEG / AVIF / HEIF / JPEG-XL, and **MP4 / MOV / M4V video** at the container level), XMP DigitalSourceType
|
||||
- **"Made with AI" label removal** — removes the metadata that triggers AI labels on Instagram, Facebook, X (Twitter)
|
||||
- **Analog Humanizer** — film grain and chromatic aberration to bypass AI image classifiers
|
||||
- **Smart Face Protection** — automatic extraction and blending of human faces to prevent AI distortion
|
||||
- **Batch processing** — process entire directories
|
||||
- **Detection** — three-stage NCC watermark detection with confidence scoring
|
||||
- **Provenance detection (`identify`)** — aggregate C2PA issuer, IPTC "Made with AI", embedded SD/ComfyUI params, EXIF/XMP generator tags, the SynthID metadata proxy, the visible sparkle, and the open SD/SDXL/FLUX invisible watermark into one origin-platform + watermark-inventory verdict (`--json` for machine output)
|
||||
- **Provenance detection (`identify`)** — aggregate C2PA issuer, the C2PA soft-binding forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, embedded SD/ComfyUI params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the SynthID metadata proxy, the visible sparkle, the open SD/SDXL/FLUX invisible watermark, and (with the `trustmark` extra) the open Adobe TrustMark watermark into one origin-platform + watermark-inventory verdict (`--json` for machine output)
|
||||
|
||||
## Examples
|
||||
|
||||
@@ -51,7 +51,7 @@ If this tool saves you time, consider [sponsoring its development](https://githu
|
||||
|
||||
> Visible watermarks (logo overlays) are currently used only by Google Gemini / Nano Banana. Other services rely on invisible watermarks and/or metadata. Our diffusion-based regeneration works against any invisible watermark in pixel or frequency domain.
|
||||
|
||||
> **Detection:** `remove-ai-watermarks identify <image>` reports the origin platform and watermark inventory for all the signals above — C2PA issuer, IPTC "Made with AI", the China TC260 AIGC label, embedded generation params, EXIF/XMP generator tags, the SynthID metadata proxy, the visible sparkle, and (with the `[detect]` extra) the open SD/SDXL/FLUX invisible watermark. The SynthID *pixel* watermark has no local decoder, so it is reported as a metadata proxy only.
|
||||
> **Detection:** `remove-ai-watermarks identify <image>` reports the origin platform and watermark inventory for all the signals above — C2PA issuer, the C2PA soft-binding forensic-watermark vendor (TrustMark / Digimarc / Imatag / ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, the China TC260 AIGC label, embedded generation params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the SynthID metadata proxy, the visible sparkle, and (with the `[detect]` / `[trustmark]` extras) the open SD/SDXL/FLUX and Adobe TrustMark invisible watermarks. SynthID and the proprietary soft-binding watermarks (Digimarc etc.) have no local decoder, so they are reported by metadata proxy / vendor name only.
|
||||
|
||||
## How it works
|
||||
|
||||
@@ -159,6 +159,14 @@ After installation the `remove-ai-watermarks` command is available system-wide.
|
||||
> ```bash
|
||||
> pip install -e ".[detect]" # or: uv pip install -e ".[detect]"
|
||||
> ```
|
||||
>
|
||||
> To also decode the open **Adobe TrustMark** watermark (behind Adobe Durable
|
||||
> Content Credentials), install the `trustmark` extra (pulls torch and downloads
|
||||
> model weights on first use):
|
||||
>
|
||||
> ```bash
|
||||
> pip install -e ".[trustmark]" # or: uv pip install -e ".[trustmark]"
|
||||
> ```
|
||||
|
||||
#### Invisible watermark removal
|
||||
|
||||
@@ -315,7 +323,8 @@ Watermarking and provenance for AI-generated content is now regulated in several
|
||||
| US (state) | CA AB 2655, TX SB 751, similar | TX SB 751 in force; **CA AB 2655 struck down** by a federal court (Aug 2025, Section 230 / First Amendment). | Content-specific (election deepfakes, sexual deepfakes). Not tool-specific. |
|
||||
| US (state) | CA AB 853 (amends the California AI Transparency Act) | Core provider duties operative **2 August 2026** (delayed from 1 January 2026); large platforms 1 January 2027; capture devices 1 January 2028. | Covered providers (1M+ monthly users) must embed a latent disclosure that is "permanent or extraordinarily difficult to remove" and offer a free detection tool. Removing that disclosure is what this tool does. |
|
||||
| South Korea | AI Framework Act (Basic Act on AI), Article 31 | In force since **January 2026** (one-year transition after promulgation). | Art. 31(3): AI output "difficult to distinguish from reality" must be labeled so users "clearly recognize" it; the draft Enforcement Decree accepts a machine-readable (invisible-watermark) label. Artistic/creative works get a presentation exception. |
|
||||
| China | Measures for Labeling AI-Generated Content (+ GB 45438-2025) | In force since **1 September 2025**. | Mandatory explicit (visible) + implicit (metadata) labels for AI content; tampering with or removing labels is prohibited. |
|
||||
| China | Measures for Labeling AI-Generated Content (+ GB 45438-2025) | In force since **1 September 2025**. | Mandatory explicit (visible) + implicit (metadata) labels across image / audio / video; tampering with, forging, or removing labels is prohibited. |
|
||||
| India | IT (Intermediary Guidelines and Digital Media Ethics Code) Amendment Rules, 2026 | In force since **20 February 2026** (notified 10 February 2026). | All "synthetically generated information" must be **prominently labelled** and carry **permanent metadata / a provenance identifier**; the rules expressly **prohibit modifying, suppressing, or removing** that label or metadata. Covers image, audio, and audio-visual content. |
|
||||
| UK | Online Safety Act 2023 / Ofcom guidance | In force, but **no statutory AI-provenance or watermarking obligation**. | Ofcom encourages watermarking / provenance metadata as voluntary "attribution measures"; platform duties, not user obligations. |
|
||||
|
||||
## Threat model
|
||||
@@ -340,7 +349,12 @@ This tool is intended for legitimate purposes such as:
|
||||
- Removing false-positive "Made with AI" labels from human-edited photographs.
|
||||
- Security research and watermark robustness study.
|
||||
|
||||
Removing AI provenance markers to misrepresent AI-generated content as human-created may violate the laws above, the DMCA, and platform terms of service. Users are solely responsible for ensuring their use complies with all applicable laws. The authors do not condone use of this tool for deception, fraud, or any activity that violates applicable laws or regulations.
|
||||
**Who bears the liability.** This is general-purpose software and is itself lawful to publish and run; legal responsibility attaches to the person who removes a marker and to how the result is then used, and the hinge is intent. Removing AI provenance to pass AI-generated content off as human-made, to commit fraud, to produce non-consensual deepfakes, or to conceal copyright infringement can expose the remover to liability. Two kinds of exposure are worth knowing:
|
||||
|
||||
- **The downstream act.** Deception, fraud, defamation, IP infringement, or breaking a platform's terms — judged by intent and harm, not by the act of editing metadata itself. In the US, the **DMCA (17 U.S.C. § 1202)** specifically bars removing "copyright management information" *with intent to conceal or enable infringement*.
|
||||
- **The removal itself.** Some jurisdictions penalise tampering with the label/metadata as such, regardless of downstream use — notably **China** (Labeling Measures) and **India** (IT Amendment Rules 2026), which expressly prohibit removing or suppressing the AI label and provenance metadata. The US **COPIED Act** would do the same if enacted.
|
||||
|
||||
Legitimate uses — publishing your own work, privacy (stripping metadata that leaks an account identifier), security / robustness research, or removing a false-positive "Made with AI" label from a human-edited photograph — are generally lawful. Users are solely responsible for ensuring their use complies with all applicable laws. The authors do not condone use of this tool for deception, fraud, or any activity that violates applicable laws or regulations. None of this is legal advice.
|
||||
|
||||
## License
|
||||
|
||||
|
||||
+17
-2
@@ -1,6 +1,6 @@
 [project]
 name = "remove-ai-watermarks"
-version = "0.5.6"
+version = "0.6.0"
 description = "Remove visible and invisible AI watermarks from images (Gemini / Nano Banana, ChatGPT, Stable Diffusion)"
 readme = "README.md"
 requires-python = ">=3.10"
@@ -36,6 +36,21 @@ gpu = [
|
||||
detect = [
|
||||
"invisible-watermark>=0.2.0",
|
||||
]
|
||||
# Adobe TrustMark decoder -- the open, keyless watermark behind Adobe Durable
|
||||
# Content Credentials (soft-binding alg ``com.adobe.trustmark.P``). Optional
|
||||
# because it pulls torch and downloads model weights on first use. identify()
|
||||
# guards the import and skips the TrustMark signal when absent.
|
||||
trustmark = [
|
||||
"trustmark>=0.8.0",
|
||||
]
|
||||
# Universal region eraser backend -- big-LaMa via onnxruntime (Carve/LaMa-ONNX,
|
||||
# Apache-2.0). CPU, no torch. Model (~200 MB) is downloaded on first use and
|
||||
# cached by huggingface_hub; it is never bundled in this repo. The default cv2
|
||||
# eraser backend needs none of this.
|
||||
lama = [
|
||||
"onnxruntime>=1.16.0",
|
||||
"huggingface-hub>=0.20.0",
|
||||
]
|
||||
dev = [
|
||||
"pytest>=8.0.0",
|
||||
"pytest-cov>=4.1.0",
|
||||
@@ -43,7 +58,7 @@ dev = [
     "pyright>=1.1.0",
     "invisible-watermark>=0.2.0",
 ]
-all = ["remove-ai-watermarks[gpu,detect,dev]"]
+all = ["remove-ai-watermarks[gpu,detect,trustmark,lama,dev]"]
|
||||
# diffusers 0.38.0 (security fix for GHSA-98h9-4798-4q5v) declares a dependency
|
||||
# on safetensors>=0.8.0rc0 — a pre-release. Allow pre-releases globally so the
|
||||
|
||||
@@ -1,3 +1,3 @@
 """Remove-AI-Watermarks: Unified tool for removing visible and invisible AI watermarks."""

-__version__ = "0.5.6"
+__version__ = "0.6.0"
@@ -26,13 +26,15 @@ from remove_ai_watermarks.metadata import (
|
||||
AI_METADATA_KEYS,
|
||||
AIGC_MARKERS,
|
||||
C2PA_UUID,
|
||||
IPTC_AI_FIELD_MARKERS,
|
||||
IPTC_AI_MARKERS,
|
||||
aigc_label,
|
||||
exif_generator,
|
||||
get_ai_metadata,
|
||||
iptc_ai_system,
|
||||
xai_signature,
|
||||
)
|
||||
from remove_ai_watermarks.noai.c2pa import extract_c2pa_info
|
||||
from remove_ai_watermarks.noai.c2pa import extract_c2pa_info, soft_binding_vendors_in
|
||||
from remove_ai_watermarks.noai.constants import C2PA_AI_TOOLS, C2PA_ISSUERS
|
||||
|
||||
if TYPE_CHECKING:
|
||||
@@ -162,6 +164,17 @@ def _invisible_watermark(image_path: Path) -> str | None:
|
||||
return detect_invisible_watermark(image_path)
|
||||
|
||||
|
||||
def _trustmark(image_path: Path) -> str | None:
|
||||
"""Adobe TrustMark scheme name or None.
|
||||
|
||||
Optional: needs the ``trustmark`` decoder (extra ``trustmark``). Returns None
|
||||
if it is not installed or no TrustMark watermark decodes.
|
||||
"""
|
||||
from remove_ai_watermarks.trustmark_detector import detect_trustmark
|
||||
|
||||
return detect_trustmark(image_path)
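
The "returns None if it is not installed" behaviour in the docstring above can be sketched as an optional-import guard. This is only an illustration of the pattern, not the project's `trustmark_detector` module, whose body is not shown in this diff. `load_optional` and `detect_with_optional` are hypothetical names, and the decoder call is passed in by the caller so that no `trustmark` API is assumed.

```python
from __future__ import annotations

import importlib
import importlib.util
from types import ModuleType
from typing import Callable


def load_optional(module_name: str) -> ModuleType | None:
    """Import ``module_name`` if it is installed, else return None (hypothetical helper)."""
    try:
        if importlib.util.find_spec(module_name) is None:
            return None
    except (ImportError, ValueError):
        # find_spec can raise for dotted names with a missing parent package.
        return None
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Installed but broken (e.g. a missing transitive dependency): treat as absent.
        return None


def detect_with_optional(
    module_name: str, decode: Callable[[ModuleType], str | None]
) -> str | None:
    """Run ``decode`` against the optional module, or return None when it is absent."""
    module = load_optional(module_name)
    if module is None:
        return None
    return decode(module)
```

With a guard of this shape, a caller such as `identify` can skip the signal when the extra is missing instead of failing on import.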
|
||||
|
||||
|
||||
def identify(image_path: Path, *, check_visible: bool = True, check_invisible: bool = True) -> ProvenanceReport:
|
||||
"""Identify an image's origin platform and watermark inventory.
|
||||
|
||||
@@ -213,6 +226,14 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
|
||||
if "OpenAI" in (" ".join(issuers) + synthid):
|
||||
caveats.append(_OPENAI_CAVEAT)
|
||||
|
||||
# ── C2PA soft-binding: a named forensic/third-party watermark vendor ─
|
||||
# (Adobe TrustMark, Digimarc, Imatag, ...). Present in the manifest even when
|
||||
# the watermark itself can't be decoded; it names the vendor whose watermark stamped the pixels.
|
||||
soft_binding = meta.get("soft_binding") or (", ".join(v) if (v := soft_binding_vendors_in(head)) else None)
|
||||
if soft_binding:
|
||||
signals.append(Signal("soft_binding", f"C2PA soft binding: {soft_binding}", "high"))
|
||||
watermarks.append(f"Forensic watermark soft binding ({soft_binding})")
|
||||
|
||||
# ── IPTC "Made with AI" (Meta etc.), only meaningful without C2PA ─
|
||||
iptc = any(m in head for m in IPTC_AI_MARKERS)
|
||||
if iptc and not has_c2pa:
|
||||
@@ -222,6 +243,18 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
|
||||
if platform is None:
|
||||
platform = "Made-with-AI tag (e.g. Meta AI); platform not specified"
|
||||
|
||||
# ── IPTC 2025.1 AI-disclosure fields (Iptc4xmpExt:AISystemUsed etc.) ─
|
||||
iptc_ai = any(m in head for m in IPTC_AI_FIELD_MARKERS)
|
||||
if iptc_ai:
|
||||
system = iptc_ai_system(image_path)
|
||||
named = bool(system) and system != "fields present"
|
||||
signals.append(
|
||||
Signal("iptc_ai_system", f"IPTC AI disclosure ({system})" if named else "IPTC AI disclosure fields", "high")
|
||||
)
|
||||
watermarks.append(f"IPTC 2025.1 AI disclosure ({system})" if named else "IPTC 2025.1 AI disclosure fields")
|
||||
if platform is None and named:
|
||||
platform = f"{system} (IPTC AISystemUsed)"
|
||||
|
||||
# ── China TC260 AIGC label (Doubao and other China-served gens) ──
|
||||
aigc = any(m in head for m in AIGC_MARKERS)
|
||||
if aigc:
|
||||
@@ -266,12 +299,29 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
|
||||
if platform is None:
|
||||
platform = f"{scheme} (open DWT-DCT watermark)"
|
||||
|
||||
# ── Adobe TrustMark invisible watermark (open decoder, no key) ───
|
||||
# The watermark behind Adobe Durable Content Credentials. Decoded locally,
|
||||
# but it binds provenance for human-authored content too, so it enriches the
|
||||
# watermark inventory without by itself asserting AI origin.
|
||||
if check_invisible and (tm_scheme := _trustmark(image_path)) is not None:
|
||||
signals.append(Signal("trustmark", tm_scheme, "high"))
|
||||
watermarks.append(f"Adobe TrustMark invisible watermark ({tm_scheme})")
|
||||
if platform is None:
|
||||
platform = "Adobe (TrustMark / Content Credentials)"
|
||||
|
||||
# ── Verdict so far (metadata + embedded watermark) ──────────────
|
||||
invisible_wm = any(s.name == "invisible_watermark" for s in signals)
|
||||
exif_gen = any(s.name == "exif_generator" for s in signals)
|
||||
xai_sig = any(s.name == "xai_signature" for s in signals)
|
||||
ai_from_metadata = bool(
|
||||
(has_c2pa and (c2pa_is_ai or synthid)) or iptc or aigc or local_keys or invisible_wm or exif_gen or xai_sig
|
||||
(has_c2pa and (c2pa_is_ai or synthid))
|
||||
or iptc
|
||||
or iptc_ai
|
||||
or aigc
|
||||
or local_keys
|
||||
or invisible_wm
|
||||
or exif_gen
|
||||
or xai_sig
|
||||
)
|
||||
|
||||
# ── Visible Gemini sparkle (fallback for stripped-metadata case) ─
|
||||
|
||||
@@ -74,6 +74,29 @@ IPTC_AI_MARKERS: tuple[bytes, ...] = (
|
||||
b"compositeWithTrainedAlgorithmicMedia",
|
||||
)
|
||||
|
||||
# IPTC Photo Metadata 2025.1 (published 2025-11-27) added explicit AI-disclosure
|
||||
# XMP properties in the Iptc4xmpExt namespace. Their mere presence is an AI
|
||||
# signal; ``AISystemUsed`` additionally carries the generator name. Property
|
||||
# tokens verified against the IPTC 2025.1 specification.
|
||||
IPTC_AI_FIELD_MARKERS: tuple[bytes, ...] = (
|
||||
b"AISystemUsed",
|
||||
b"AISystemVersionUsed",
|
||||
b"AIPromptInformation",
|
||||
b"AIPromptWriterName",
|
||||
)
|
||||
|
||||
# ISOBMFF containers whose AI-provenance boxes ``remove_ai_metadata`` strips at
|
||||
# the container level (image, video, audio -- all ISOBMFF). A content sniff
|
||||
# (``ftyp``) is also accepted, so this is a fast-path hint, not the sole gate.
|
||||
_ISOBMFF_EXTS: frozenset[str] = frozenset({".avif", ".heif", ".heic", ".jxl", ".mp4", ".mov", ".m4v", ".m4a"})
|
||||
|
||||
# Non-ISOBMFF audio/video we can DETECT (binary scan) but not strip at the
|
||||
# container level (EBML / framed / RIFF need re-encoding). remove_ai_metadata
|
||||
# fails clearly on these rather than crashing in the image path.
|
||||
_UNSUPPORTED_CONTAINER_EXTS: frozenset[str] = frozenset(
|
||||
{".webm", ".mkv", ".mka", ".mp3", ".wav", ".flac", ".ogg", ".oga", ".opus", ".aac"}
|
||||
)
|
||||
|
||||
# China's mandatory AI-content labeling (TC260, the national cybersecurity
|
||||
# standards committee). AI generators serving China embed an XMP block in the
|
||||
# TC260 namespace -- ``<TC260:AIGC>{"Label":"1",...}``. Doubao (ByteDance) uses
|
||||
@@ -155,6 +178,9 @@ def has_ai_metadata(image_path: Path) -> bool:
|
||||
return True
|
||||
if any(marker in data for marker in IPTC_AI_MARKERS):
|
||||
return True
|
||||
# IPTC 2025.1 AI-disclosure XMP properties (their presence flags AI content).
|
||||
if any(marker in data for marker in IPTC_AI_FIELD_MARKERS):
|
||||
return True
|
||||
# xAI / Grok: no C2PA/IPTC/XMP -- only the EXIF Signature + UUID-Artist pair.
|
||||
return xai_signature(image_path)
|
||||
|
||||
@@ -183,6 +209,26 @@ def aigc_label(image_path: Path) -> dict[str, str] | None:
|
||||
return {str(k): str(v) for k, v in parsed.items()} if isinstance(parsed, dict) else None
|
||||
|
||||
|
||||
def iptc_ai_system(image_path: Path) -> str | None:
|
||||
"""Return an IPTC 2025.1 AI-disclosure note if the file carries those XMP
|
||||
properties, else None.
|
||||
|
||||
IPTC Photo Metadata 2025.1 added ``Iptc4xmpExt`` AI-disclosure properties
|
||||
(see ``IPTC_AI_FIELD_MARKERS``); their presence alone flags AI content, and
|
||||
``AISystemUsed`` names the generator. Returns the ``AISystemUsed`` value when
|
||||
extractable, otherwise the literal ``"fields present"``. Container-agnostic
|
||||
raw-byte scan; handles both XMP element and attribute serializations.
|
||||
"""
|
||||
with open(image_path, "rb") as f:
|
||||
data = f.read(1024 * 1024)
|
||||
if not any(marker in data for marker in IPTC_AI_FIELD_MARKERS):
|
||||
return None
|
||||
match = re.search(rb"AISystemUsed[=:\s]*[\"'>]\s*([^<\"']{1,120})", data)
|
||||
if match and (value := match.group(1).decode("utf-8", "replace").strip()):
|
||||
return value
|
||||
return "fields present"
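
To make the "element and attribute serializations" remark in the docstring concrete, here is a small sketch that applies the same regular expression to in-memory bytes instead of a file. `ai_system_from_bytes` is a hypothetical helper written for this illustration, the XMP snippets are synthetic, and "ExampleGen 2" is a made-up generator name.

```python
from __future__ import annotations

import re

# Same pattern and marker tokens as iptc_ai_system() / IPTC_AI_FIELD_MARKERS above.
_AI_SYSTEM_RE = re.compile(rb"AISystemUsed[=:\s]*[\"'>]\s*([^<\"']{1,120})")
_FIELD_MARKERS = (
    b"AISystemUsed",
    b"AISystemVersionUsed",
    b"AIPromptInformation",
    b"AIPromptWriterName",
)


def ai_system_from_bytes(data: bytes) -> str | None:
    """Byte-level variant of the lookup: generator name, "fields present", or None."""
    if not any(marker in data for marker in _FIELD_MARKERS):
        return None
    match = _AI_SYSTEM_RE.search(data)
    if match and (value := match.group(1).decode("utf-8", "replace").strip()):
        return value
    return "fields present"


# Synthetic XMP fragments (illustrative only).
element_form = b"<Iptc4xmpExt:AISystemUsed>ExampleGen 2</Iptc4xmpExt:AISystemUsed>"
attribute_form = b'<rdf:Description Iptc4xmpExt:AISystemUsed="ExampleGen 2"/>'
presence_only = b"<Iptc4xmpExt:AIPromptInformation>a red bicycle</Iptc4xmpExt:AIPromptInformation>"
no_ai_fields = b"<dc:creator>Jane Doe</dc:creator>"
```

In the element form the character class matches the closing `>` of the opening tag. In the attribute form it matches the opening quote. The capture then runs up to the next `<` or quote in both cases.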
|
||||
|
||||
|
||||
def synthid_source(image_path: Path) -> str | None:
|
||||
"""Return the vendor name(s) if the image carries a SynthID pixel watermark.
|
||||
|
||||
@@ -380,7 +426,7 @@ def get_ai_metadata(image_path: Path) -> dict[str, str]:
|
||||
"""
|
||||
from PIL import Image
|
||||
|
||||
from remove_ai_watermarks.noai.c2pa import extract_c2pa_info, synthid_verdict
|
||||
from remove_ai_watermarks.noai.c2pa import extract_c2pa_info, soft_binding_vendors_in, synthid_verdict
|
||||
|
||||
result: dict[str, str] = {}
|
||||
|
||||
@@ -410,14 +456,21 @@ def get_ai_metadata(image_path: Path) -> dict[str, str]:
|
||||
"source_type",
|
||||
"actions",
|
||||
"synthid_watermark",
|
||||
"soft_binding",
|
||||
):
|
||||
if key in c2pa:
|
||||
result.setdefault(key, str(c2pa[key]))
|
||||
|
||||
# Non-PNG containers (JPEG/WebP/AVIF): extract_c2pa_info is PNG-only, so
|
||||
# fall back to the format-agnostic source check for the SynthID verdict.
|
||||
# Non-PNG containers (JPEG/WebP/AVIF/MP4): extract_c2pa_info is PNG-only, so
|
||||
# fall back to the format-agnostic source check for the SynthID verdict and
|
||||
# the soft-binding (forensic-watermark vendor) scan.
|
||||
if "synthid_watermark" not in result and (vendor := synthid_source(image_path)):
|
||||
result.setdefault("synthid_watermark", synthid_verdict(vendor))
|
||||
if "soft_binding" not in result:
|
||||
with open(image_path, "rb") as f:
|
||||
head = f.read(1024 * 1024)
|
||||
if vendors := soft_binding_vendors_in(head):
|
||||
result["soft_binding"] = ", ".join(vendors)
|
||||
|
||||
# China TC260 AI-content label (Doubao and other China-served generators).
|
||||
if aigc := aigc_label(image_path):
|
||||
@@ -427,6 +480,10 @@ def get_ai_metadata(image_path: Path) -> dict[str, str]:
|
||||
# xAI / Grok EXIF signature scheme (its only provenance signal).
|
||||
if xai_signature(image_path):
|
||||
result.setdefault("xai_signature", "xAI/Grok EXIF signature (Artist UUID + Signature blob)")
|
||||
|
||||
# IPTC 2025.1 AI-disclosure XMP fields (Iptc4xmpExt:AISystemUsed etc.).
|
||||
if system := iptc_ai_system(image_path):
|
||||
result.setdefault("ai_system", f"IPTC 2025.1 AI disclosure ({system})")
|
||||
return result
|
||||
|
||||
|
||||
@@ -455,19 +512,34 @@ def remove_ai_metadata(
|
||||
if output_path is None:
|
||||
output_path = source_path
|
||||
|
||||
# AVIF/HEIF/JPEG-XL: strip C2PA boxes at the container level without
|
||||
# re-encoding. Avoids needing PIL plugins (pillow-heif / pillow-jxl) and
|
||||
# preserves pixel data bit-for-bit.
|
||||
if source_path.suffix.lower() in (".avif", ".heif", ".heic", ".jxl"):
|
||||
from remove_ai_watermarks.noai.isobmff import strip_c2pa_boxes
|
||||
# ISOBMFF containers (AVIF/HEIF/JPEG-XL images, MP4/MOV/M4V video, M4A audio):
|
||||
# strip C2PA + AI-label boxes at the container level without re-encoding.
|
||||
# Avoids needing PIL plugins (pillow-heif / pillow-jxl) and preserves the
|
||||
# codestream bit-for-bit. MP4/MOV/M4A are ISOBMFF too, so the same top-level
|
||||
# uuid/jumb box walker applies. Route by suffix OR by an ``ftyp`` content
|
||||
# sniff, so a correctly-shaped container is handled whatever its extension.
|
||||
from remove_ai_watermarks.noai.isobmff import is_isobmff, strip_c2pa_boxes
|
||||
|
||||
with open(source_path, "rb") as f:
|
||||
head = f.read(12)
|
||||
if source_path.suffix.lower() in _ISOBMFF_EXTS or is_isobmff(head):
|
||||
data = source_path.read_bytes()
|
||||
cleaned, stripped = strip_c2pa_boxes(data)
|
||||
output_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
output_path.write_bytes(cleaned)
|
||||
logger.info("Stripped %d C2PA box(es) → %s", stripped, output_path)
|
||||
logger.info("Stripped %d AI-provenance box(es) → %s", stripped, output_path)
|
||||
return output_path
|
||||
|
||||
# Containers we can detect (via identify's byte scan) but cannot strip at the
|
||||
# container level: non-ISOBMFF audio/video (Matroska/WebM are EBML; MP3 is
|
||||
# framed; WAV is RIFF). Re-encoding them is out of scope, so fail clearly
|
||||
# rather than crash in the PIL image path below.
|
||||
if source_path.suffix.lower() in _UNSUPPORTED_CONTAINER_EXTS:
|
||||
raise ValueError(
|
||||
f"container-level metadata removal is not supported for {source_path.suffix} "
|
||||
"(detection via `identify` still works); re-encode it with a media tool to strip metadata"
|
||||
)
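
The routing order in this hunk (ISOBMFF by suffix or content sniff first, then the unsupported-container guard, then the image path) can be sketched as a pure function. This is a simplified illustration: `looks_like_isobmff` and `choose_route` are hypothetical names, and the sniff only checks for a leading `ftyp` box, whereas the project's `is_isobmff` (not shown in this diff) may accept more shapes.

```python
from __future__ import annotations

# Extension sets copied from _ISOBMFF_EXTS / _UNSUPPORTED_CONTAINER_EXTS in this commit.
ISOBMFF_EXTS = frozenset({".avif", ".heif", ".heic", ".jxl", ".mp4", ".mov", ".m4v", ".m4a"})
UNSUPPORTED_EXTS = frozenset(
    {".webm", ".mkv", ".mka", ".mp3", ".wav", ".flac", ".ogg", ".oga", ".opus", ".aac"}
)


def looks_like_isobmff(head: bytes) -> bool:
    """Simplified sniff (assumption): bytes 4..8 of the file spell ``ftyp``."""
    return len(head) >= 8 and head[4:8] == b"ftyp"


def choose_route(suffix: str, head: bytes) -> str:
    """Return "isobmff" or "image", or raise ValueError for unsupported containers."""
    suffix = suffix.lower()
    if suffix in ISOBMFF_EXTS or looks_like_isobmff(head):
        return "isobmff"
    if suffix in UNSUPPORTED_EXTS:
        raise ValueError(f"container-level metadata removal is not supported for {suffix}")
    return "image"


# First 12 bytes of a synthetic MP4: box size, "ftyp", major brand "isom".
mp4_head = b"\x00\x00\x00\x18ftypisom"
png_head = b"\x89PNG\r\n\x1a\n\x00\x00\x00\r"
```

Because the sniff runs before the unsupported-extension check, a misnamed file that is really ISOBMFF still goes through the box stripper.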
|
||||
|
||||
# Read image and filter metadata
|
||||
with Image.open(source_path) as img:
|
||||
img = img.copy()
|
||||
|
||||
@@ -28,6 +28,7 @@ from remove_ai_watermarks.noai.constants import (
|
||||
C2PA_CHUNK_TYPE,
|
||||
C2PA_ISSUERS,
|
||||
C2PA_SIGNATURES,
|
||||
C2PA_SOFT_BINDINGS,
|
||||
PNG_SIGNATURE,
|
||||
SYNTHID_C2PA_ISSUERS,
|
||||
)
|
||||
@@ -174,6 +175,18 @@ def synthid_vendors_in(buffer: bytes) -> list[str]:
|
||||
return sorted({name for sig, name in C2PA_ISSUERS.items() if sig in buffer and sig in SYNTHID_C2PA_ISSUERS})
|
||||
|
||||
|
||||
def soft_binding_vendors_in(buffer: bytes) -> list[str]:
|
||||
"""Return forensic-watermark vendor names whose C2PA soft-binding ``alg``
|
||||
identifier appears in ``buffer``.
|
||||
|
||||
A ``c2pa.soft-binding`` assertion names the watermark scheme that stamped the
|
||||
pixels (Adobe TrustMark, Digimarc, Imatag, Steg.AI, ...). Shared by the PNG
|
||||
caBX parser and the format-agnostic binary scan so both apply the same
|
||||
C2PA_SOFT_BINDINGS rule against their respective bytes.
|
||||
"""
|
||||
return sorted({name for sig, name in C2PA_SOFT_BINDINGS.items() if sig in buffer})
|
||||
|
||||
|
||||
def _parse_c2pa_chunk(chunk_data: bytes, c2pa_info: dict[str, Any]) -> None:
|
||||
"""Parse C2PA chunk data and populate info dictionary."""
|
||||
c2pa_info["c2pa_manifest"] = f"C2PA manifest ({len(chunk_data)} bytes)"
|
||||
@@ -238,6 +251,13 @@ def _parse_c2pa_chunk(chunk_data: bytes, c2pa_info: dict[str, Any]) -> None:
|
||||
c2pa_info["synthid_vendors"] = synthid_vendors
|
||||
c2pa_info["synthid_watermark"] = synthid_verdict(", ".join(synthid_vendors))
|
||||
|
||||
# Soft-binding: a forensic/third-party watermark vendor named in the
|
||||
# manifest (Adobe TrustMark, Digimarc, ...), independent of the issuer.
|
||||
soft_binding_vendors = soft_binding_vendors_in(chunk_data)
|
||||
if soft_binding_vendors:
|
||||
c2pa_info["soft_binding_vendors"] = soft_binding_vendors
|
||||
c2pa_info["soft_binding"] = ", ".join(soft_binding_vendors)
|
||||
|
||||
|
||||
def extract_c2pa_chunk(image_path: Path) -> bytes | None:
|
||||
"""
|
||||
|
||||
@@ -122,6 +122,26 @@ C2PA_AI_TOOLS = {
|
||||
b"Firefly": "Firefly",
|
||||
}
|
||||
|
||||
# C2PA ``c2pa.soft-binding`` algorithm identifiers -> the forensic-watermark
|
||||
# vendor that stamped the pixels. The manifest's ``alg`` field names the
|
||||
# watermark scheme even when the watermark itself cannot be decoded locally, so
|
||||
# a byte-scan for these (keyed on a distinctive prefix to catch all variants)
|
||||
# tells us a third-party forensic watermark is present and whose. Verified
|
||||
# against the official C2PA registry (github.com/c2pa-org/softbinding-algorithm-list).
|
||||
# Adobe TrustMark is additionally decodable locally (see ``trustmark_detector``);
|
||||
# the rest (Digimarc, Imatag, Steg.AI, etc.) are proprietary oracle-only decoders.
|
||||
C2PA_SOFT_BINDINGS = {
|
||||
b"com.adobe.trustmark": "Adobe TrustMark",
|
||||
b"com.digimarc": "Digimarc",
|
||||
b"com.imatag.lamark": "Imatag (Lamark)",
|
||||
b"ai.steg": "Steg.AI",
|
||||
b"com.microsoft.invismark": "Microsoft InvisMark",
|
||||
b"com.microsoft.wavmark": "Microsoft WavMark",
|
||||
b"com.verimatrix": "Verimatrix",
|
||||
b"com.nagra.nexguard": "NAGRA NexGuard",
|
||||
b"com.aiwatermark": "AIWatermark",
|
||||
}
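
A minimal sketch of the prefix byte-scan that the comment above describes, using a three-entry excerpt of the table. `vendors_in` is a hypothetical stand-in for `soft_binding_vendors_in`, and the sample buffers are synthetic bytes, not real C2PA manifests.

```python
from __future__ import annotations

# Excerpt of C2PA_SOFT_BINDINGS, for illustration only.
SOFT_BINDINGS_EXCERPT: dict[bytes, str] = {
    b"com.adobe.trustmark": "Adobe TrustMark",
    b"com.digimarc": "Digimarc",
    b"ai.steg": "Steg.AI",
}


def vendors_in(buffer: bytes, table: dict[bytes, str]) -> list[str]:
    """Return the sorted, de-duplicated vendor names whose prefix occurs in ``buffer``."""
    return sorted({name for prefix, name in table.items() if prefix in buffer})


# Synthetic buffers standing in for manifest bytes.
one_vendor = b"\x00c2pa.soft-binding\x00alg\x00com.adobe.trustmark.P\x00"
two_vendors = one_vendor + b"alg\x00com.digimarc.validate\x00"
no_vendor = b"\x00c2pa.actions\x00"
```

Keying on the prefix (`com.adobe.trustmark`) rather than the full identifier (`com.adobe.trustmark.P`) is what lets one entry cover every variant suffix. The trade-off is that any occurrence of the prefix in the scanned bytes counts as a hit.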
|
||||
|
||||
# Lowercased substrings that mark an AI generator when found in an EXIF
|
||||
# ``Software`` / XMP ``CreatorTool`` value. Conservative on purpose: plain
|
||||
# editors like "Adobe Photoshop" or "GIMP" must NOT match (no AI token), so only
|
||||
|
||||
@@ -23,13 +23,25 @@ from typing import TYPE_CHECKING
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Iterator
|
||||
|
||||
from remove_ai_watermarks.metadata import C2PA_UUID
|
||||
from remove_ai_watermarks.metadata import (
|
||||
AIGC_MARKERS,
|
||||
C2PA_UUID,
|
||||
IPTC_AI_FIELD_MARKERS,
|
||||
IPTC_AI_MARKERS,
|
||||
)
|
||||
|
||||
# Top-level box types that carry C2PA payload. ``uuid`` boxes are checked
|
||||
# against ``C2PA_UUID`` before being stripped; ``jumb`` boxes are always
|
||||
# stripped (JPEG-XL uses them exclusively for JUMBF).
|
||||
# Top-level box types that may carry AI provenance. ``uuid`` boxes are checked
|
||||
# against ``C2PA_UUID`` / AI-label markers before being stripped; ``jumb`` boxes
|
||||
# are always stripped (JPEG-XL uses them exclusively for JUMBF).
|
||||
C2PA_BOX_TYPES: frozenset[bytes] = frozenset({b"uuid", b"jumb"})
|
||||
|
||||
# AI-label byte markers (TC260 AIGC, IPTC "Made with AI", IPTC 2025.1 AI fields)
|
||||
# whose presence inside an XMP ``uuid`` box means the box carries an AI label.
|
||||
# Matching the payload rather than a fixed XMP UUID avoids the XMP-box UUID
|
||||
# byte-order ambiguity and stays surgical: only AI-bearing XMP is dropped, plain
|
||||
# XMP (copyright, camera info) is kept.
|
||||
_AI_LABEL_MARKERS: tuple[bytes, ...] = AIGC_MARKERS + IPTC_AI_MARKERS + IPTC_AI_FIELD_MARKERS
|
||||
|
||||
|
||||
def _iter_top_level_boxes(data: bytes) -> Iterator[tuple[int, int, bytes, int]]:
|
||||
"""Yield ``(start, end, type, payload_offset)`` for each top-level box.
|
||||
@@ -67,12 +79,22 @@ def is_isobmff(data: bytes) -> bool:
|
||||
|
||||
|
||||
def strip_c2pa_boxes(data: bytes) -> tuple[bytes, int]:
|
||||
"""Return ``(cleaned_bytes, stripped_count)``.
|
||||
"""Return ``(cleaned_bytes, stripped_count)`` with AI-provenance boxes removed.
|
||||
|
||||
Walks top-level boxes; drops any ``uuid`` box whose UUID equals
|
||||
``C2PA_UUID`` and any ``jumb`` box (JPEG-XL JUMBF container). All other
|
||||
boxes are emitted verbatim. If the input is not ISOBMFF-shaped, returns
|
||||
it unchanged.
|
||||
Walks top-level boxes and drops:
|
||||
- any ``uuid`` box whose UUID equals ``C2PA_UUID`` (a C2PA manifest);
|
||||
- any ``uuid`` box whose payload carries an AI-label marker (an XMP packet
|
||||
with a TC260 / IPTC / IPTC-2025.1 AI field -- caught by content, not by the
|
||||
XMP UUID, so it works regardless of the UUID's byte order, and leaves plain
|
||||
non-AI XMP intact);
|
||||
- any ``jumb`` box (JPEG-XL JUMBF container).
|
||||
|
||||
All other boxes (incl. ``mdat`` / codestream) are emitted verbatim, so pixel
|
||||
and audio data is preserved bit-for-bit. Non-ISOBMFF input is returned
|
||||
unchanged. Despite the name this also covers MP4/MOV/M4A video and audio
|
||||
(all ISOBMFF). NOTE: EXIF/XMP stored as *items inside the ``meta`` box*
|
||||
(typical for AVIF/HEIF images) is not removed -- that needs meta-box surgery
|
||||
and is a documented limitation.
|
||||
"""
|
||||
if not is_isobmff(data):
|
||||
return data, 0
|
||||
@@ -80,14 +102,15 @@ def strip_c2pa_boxes(data: bytes) -> tuple[bytes, int]:
     out = bytearray()
     stripped = 0
     for start, end, box_type, payload_off in _iter_top_level_boxes(data):
-        if box_type in C2PA_BOX_TYPES:
-            if box_type == b"uuid":
-                # uuid boxes carry the 16-byte UUID immediately after the type.
-                if payload_off + 16 <= end and data[payload_off : payload_off + 16] == C2PA_UUID:
-                    stripped += 1
-                    continue
-            else:  # b"jumb"
+        if box_type == b"uuid":
+            # uuid boxes carry the 16-byte UUID immediately after the type.
+            is_c2pa = payload_off + 16 <= end and data[payload_off : payload_off + 16] == C2PA_UUID
+            has_ai_label = any(marker in data[payload_off:end] for marker in _AI_LABEL_MARKERS)
+            if is_c2pa or has_ai_label:
+                stripped += 1
+                continue
+        elif box_type == b"jumb":
             stripped += 1
             continue
         out.extend(data[start:end])
     return bytes(out), stripped
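The walk-and-drop approach described in the `strip_c2pa_boxes` docstring can be illustrated in isolation. What follows is a minimal, self-contained sketch and not the project's code: `iter_top_level_boxes` and `drop_uuid_boxes` are hypothetical names, `DEMO_UUID` is a stand-in rather than the real C2PA UUID, and `DEMO_MARKERS` is an invented one-entry marker list.

```python
import struct
from typing import Iterator

DEMO_UUID = bytes(range(16))      # stand-in value, NOT the real C2PA UUID
DEMO_MARKERS = (b"TC260:AIGC",)   # hypothetical AI-label marker list


def iter_top_level_boxes(data: bytes) -> Iterator[tuple[int, int, bytes, int]]:
    """Yield (start, end, type, payload_offset) for each top-level box."""
    pos, total = 0, len(data)
    while pos + 8 <= total:
        (size,) = struct.unpack(">I", data[pos : pos + 4])
        box_type = data[pos + 4 : pos + 8]
        header = 8
        if size == 1:  # a 64-bit size field follows the type
            if pos + 16 > total:
                return
            (size,) = struct.unpack(">Q", data[pos + 8 : pos + 16])
            header = 16
        elif size == 0:  # box runs to the end of the data
            size = total - pos
        if size < header or pos + size > total:
            return  # malformed or truncated: stop walking
        yield pos, pos + size, box_type, pos + header
        pos += size


def drop_uuid_boxes(data: bytes) -> tuple[bytes, int]:
    """Drop uuid boxes matching DEMO_UUID or containing a demo marker."""
    out = bytearray()
    dropped = 0
    consumed = 0
    for start, end, box_type, payload_off in iter_top_level_boxes(data):
        consumed = end
        if box_type == b"uuid":
            payload = data[payload_off:end]
            if payload[:16] == DEMO_UUID or any(m in payload for m in DEMO_MARKERS):
                dropped += 1
                continue
        out += data[start:end]
    out += data[consumed:]  # keep any tail the walker could not parse
    return bytes(out), dropped
```

Because every retained box is copied byte-for-byte, the media payload (`mdat`) is left untouched in this sketch; only whole top-level boxes are ever removed.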
@@ -0,0 +1,72 @@
+"""Detect Adobe TrustMark invisible watermarks.
+
+TrustMark (github.com/adobe/trustmark, MIT) is the open, keyless image watermark
+behind Adobe "Durable Content Credentials": when a C2PA manifest is stripped, a
+TrustMark soft binding can still re-link the asset to its manifest in a
+repository. Unlike SynthID it has a PUBLIC decoder with no secret key, so a
+TrustMark-stamped image can be identified locally. Adobe's shipping products use
+Variant P (the ``com.adobe.trustmark.P`` soft-binding ``alg``); this wrapper
+loads that model.
+
+Optional dependency (extra: ``trustmark``); the model weights download on first
+use. ``detect_trustmark`` returns None when the package is absent. This detects
+provenance (Adobe Content Credentials), NOT AI generation as such -- TrustMark
+also marks human-authored content -- so callers should treat it as a watermark
+signal, not proof of AI origin.
+"""
+
+# trustmark ships no type stubs; relax untyped-library diagnostics for this thin
+# wrapper module only.
+# pyright: reportMissingTypeStubs=false, reportUnknownMemberType=false, reportUnknownVariableType=false, reportUnknownArgumentType=false, reportMissingImports=false
+
+from __future__ import annotations
+
+import logging
+from typing import TYPE_CHECKING, Any
+
+if TYPE_CHECKING:
+    from pathlib import Path
+
+log = logging.getLogger(__name__)
+
+# Adobe ships Variant P in production (com.adobe.trustmark.P).
+_MODEL_TYPE = "P"
+# Lazily constructed singleton -- model load + first-use download is expensive.
+_tm: Any = None
+
+
+def is_available() -> bool:
+    """True if the optional ``trustmark`` package is installed."""
+    import importlib.util
+
+    return importlib.util.find_spec("trustmark") is not None
+
+
+def _decoder() -> Any:
+    global _tm
+    if _tm is None:
+        from trustmark import TrustMark
+
+        _tm = TrustMark(verbose=False, model_type=_MODEL_TYPE)
+    return _tm
+
+
+def detect_trustmark(image_path: Path) -> str | None:
+    """Return a TrustMark scheme note if a TrustMark watermark is decoded, else None.
+
+    Returns e.g. ``"Adobe TrustMark (variant P, schema 0)"`` when the decoder
+    reports the watermark present, or None if it is absent, the optional
+    ``trustmark`` package is not installed, or the image cannot be read/decoded.
+    """
+    if not is_available():
+        return None
+    try:
+        from PIL import Image
+
+        with Image.open(image_path) as img:
+            cover = img.convert("RGB")
+        _wm_secret, wm_present, wm_schema = _decoder().decode(cover)
+    except Exception as exc:  # model download / decode failure / unreadable image
+        log.debug("TrustMark decode failed for %s: %s", image_path, exc)
+        return None
+    return f"Adobe TrustMark (variant {_MODEL_TYPE}, schema {wm_schema})" if wm_present else None
@@ -311,6 +311,32 @@ class TestIdentifyXaiSignature:
         assert any("xAI/Grok" in w for w in r.watermarks)


+class TestIdentifySoftBinding:
+    """A C2PA soft-binding alg names a forensic-watermark vendor in the inventory."""
+
+    def test_soft_binding_vendor_listed(self, tmp_path: Path):
+        p = tmp_path / "sb.jpg"
+        p.write_bytes(b"\xff\xd8\xff\xe1 c2pa jumb com.digimarc.validate.1 \xff\xd9")
+        r = identify(p, check_visible=False, check_invisible=False)
+        assert any("Digimarc" in w for w in r.watermarks)
+        assert any(s.name == "soft_binding" for s in r.signals)
+
+
+class TestIdentifyIptcAi:
+    """IPTC 2025.1 AISystemUsed drives an AI verdict + platform attribution."""
+
+    def test_iptc_ai_system_attributed(self, tmp_path: Path):
+        p = tmp_path / "iptc.jpg"
+        p.write_bytes(
+            b"\xff\xd8\xff\xe1<x:xmpmeta><Iptc4xmpExt:AISystemUsed>Google Gemini"
+            b"</Iptc4xmpExt:AISystemUsed></x:xmpmeta>\xff\xd9"
+        )
+        r = identify(p, check_visible=False, check_invisible=False)
+        assert r.is_ai_generated is True
+        assert r.platform is not None
+        assert "Gemini" in r.platform
+
+
 # ── Open invisible watermark (SD/SDXL/FLUX) integration ─────────────

 from remove_ai_watermarks.invisible_watermark import is_available as _wm_available  # noqa: E402
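The soft-binding lookup exercised by `TestIdentifySoftBinding` above amounts to scanning raw bytes for known `alg` identifiers. A minimal sketch of that idea follows; `DEMO_SOFT_BINDINGS` and `vendors_in` are hypothetical names, and the three table entries only mirror identifiers used in this commit's tests, so the project's real table and matching rules may differ.

```python
# Hypothetical subset of an alg-prefix -> vendor table (assumption, see lead-in).
DEMO_SOFT_BINDINGS: dict[bytes, str] = {
    b"com.adobe.trustmark": "Adobe TrustMark",
    b"com.digimarc.validate": "Digimarc",
    b"ai.steg.api": "Steg.AI",
}


def vendors_in(data: bytes) -> list[str]:
    """Return vendor names whose alg prefix occurs in ``data``, in table order."""
    found: list[str] = []
    for alg_prefix, vendor in DEMO_SOFT_BINDINGS.items():
        if alg_prefix in data and vendor not in found:
            found.append(vendor)
    return found
```

A plain substring scan like this names the vendor without parsing the manifest, which is why it still works when the watermark itself cannot be decoded.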
@@ -14,6 +14,7 @@ from remove_ai_watermarks.metadata import (
     exif_generator,
     get_ai_metadata,
     has_ai_metadata,
+    iptc_ai_system,
     remove_ai_metadata,
     synthid_source,
     xai_signature,
@@ -567,3 +568,137 @@ class TestAIGCRealSample:
     def test_doubao_detected_as_ai(self):
         assert has_ai_metadata(SAMPLES_DIR / "doubao-1.png")
         assert "aigc_label" in get_ai_metadata(SAMPLES_DIR / "doubao-1.png")
+
+
+class TestSoftBinding:
+    """C2PA soft-binding alg identifier -> forensic-watermark vendor name."""
+
+    def test_vendors_in_recognizes_known_algs(self):
+        from remove_ai_watermarks.noai.c2pa import soft_binding_vendors_in
+
+        assert soft_binding_vendors_in(b"...alg...com.adobe.trustmark.P...") == ["Adobe TrustMark"]
+        assert soft_binding_vendors_in(b"com.digimarc.validate.1") == ["Digimarc"]
+        assert soft_binding_vendors_in(b"ai.steg.api blah") == ["Steg.AI"]
+
+    def test_vendors_in_empty_when_absent(self):
+        from remove_ai_watermarks.noai.c2pa import soft_binding_vendors_in
+
+        assert soft_binding_vendors_in(b"no soft binding here") == []
+
+    def test_get_ai_metadata_surfaces_soft_binding(self, tmp_path: Path):
+        # Non-PNG binary-scan path: a manifest naming a soft-binding vendor.
+        p = tmp_path / "fake.jpg"
+        p.write_bytes(b"\xff\xd8\xff\xe1 c2pa jumb com.adobe.trustmark.P \xff\xd9")
+        assert get_ai_metadata(p).get("soft_binding") == "Adobe TrustMark"
+
+
+class TestIptcAiFields:
+    """IPTC 2025.1 AI-disclosure XMP properties (Iptc4xmpExt:AISystemUsed etc.)."""
+
+    def test_detects_ai_system_used_element_form(self, tmp_path: Path):
+        p = tmp_path / "iptc_ai.jpg"
+        p.write_bytes(
+            b"\xff\xd8\xff\xe1<x:xmpmeta><Iptc4xmpExt:AISystemUsed>ChatGPT DALL-E"
+            b"</Iptc4xmpExt:AISystemUsed></x:xmpmeta>\xff\xd9"
+        )
+        assert has_ai_metadata(p) is True
+        assert iptc_ai_system(p) == "ChatGPT DALL-E"
+        assert "ChatGPT DALL-E" in get_ai_metadata(p)["ai_system"]
+
+    def test_attribute_serialization(self, tmp_path: Path):
+        p = tmp_path / "attr.jpg"
+        p.write_bytes(b'\xff\xd8\xff\xe1 Iptc4xmpExt:AISystemUsed="Google Gemini" \xff\xd9')
+        assert iptc_ai_system(p) == "Google Gemini"
+
+    def test_present_without_value(self, tmp_path: Path):
+        # A disclosure field with no extractable value still flags presence.
+        p = tmp_path / "novalue.jpg"
+        p.write_bytes(b"\xff\xd8\xff\xe1 Iptc4xmpExt:AIPromptWriterName \xff\xd9")
+        assert iptc_ai_system(p) == "fields present"
+        assert has_ai_metadata(p) is True
+
+    def test_clean_image_none(self, tmp_clean_png: Path):
+        assert iptc_ai_system(tmp_clean_png) is None
+
+
+# Synthetic MP4 (ISOBMFF): ftyp + C2PA uuid box + mdat. Same box format as AVIF.
+_MP4_FTYP = b"\x00\x00\x00\x18ftypmp42\x00\x00\x00\x00mp42isom"
+_MP4_MDAT = b"\x00\x00\x00\x10mdat" + b"videodat"
+
+
+class TestVideoC2pa:
+    """C2PA in MP4 (ISOBMFF) -- detect + strip, reusing the image box walker."""
+
+    def test_detects_c2pa_in_mp4(self, tmp_path: Path):
+        from remove_ai_watermarks.metadata import C2PA_UUID
+
+        uuid_box = b"\x00\x00\x00\x20uuid" + C2PA_UUID + b"manifest"
+        p = tmp_path / "ai.mp4"
+        p.write_bytes(_MP4_FTYP + uuid_box + _MP4_MDAT)
+        assert has_ai_metadata(p) is True
+
+    def test_strips_c2pa_in_mp4(self, tmp_path: Path):
+        from remove_ai_watermarks.metadata import C2PA_UUID
+
+        uuid_box = b"\x00\x00\x00\x20uuid" + C2PA_UUID + b"manifest"
+        src = tmp_path / "in.mp4"
+        src.write_bytes(_MP4_FTYP + uuid_box + _MP4_MDAT)
+        out = tmp_path / "out.mp4"
+        remove_ai_metadata(src, out)
+        assert out.read_bytes() == _MP4_FTYP + _MP4_MDAT
+        assert has_ai_metadata(out) is False
+
+
+class TestIsobmffMetadataRemoval:
+    """Container-level AI-provenance stripping across ISOBMFF image/video/audio."""
+
+    def test_strips_ai_xmp_uuid_box(self):
+        # A uuid box carrying a TC260 AIGC label is dropped by content match,
+        # regardless of the (non-C2PA) XMP UUID's byte order.
+        from remove_ai_watermarks.noai.isobmff import strip_c2pa_boxes
+
+        xmp_uuid = bytes(range(16))  # arbitrary, not the C2PA UUID
+        payload = b'<x:xmpmeta><TC260:AIGC>{"Label":"1"}</TC260:AIGC></x:xmpmeta>'
+        box = (24 + len(payload)).to_bytes(4, "big") + b"uuid" + xmp_uuid + payload
+        cleaned, stripped = strip_c2pa_boxes(_MP4_FTYP + box + _MP4_MDAT)
+        assert stripped == 1
+        assert cleaned == _MP4_FTYP + _MP4_MDAT
+
+    def test_keeps_plain_non_ai_xmp(self):
+        # A uuid box with ordinary (non-AI) XMP must be preserved.
+        from remove_ai_watermarks.noai.isobmff import strip_c2pa_boxes
+
+        xmp_uuid = bytes(range(16))
+        payload = b"<x:xmpmeta><dc:rights>(c) me</dc:rights></x:xmpmeta>"
+        box = (24 + len(payload)).to_bytes(4, "big") + b"uuid" + xmp_uuid + payload
+        cleaned, stripped = strip_c2pa_boxes(_MP4_FTYP + box + _MP4_MDAT)
+        assert stripped == 0
+        assert cleaned == _MP4_FTYP + box + _MP4_MDAT
+
+    def test_m4a_c2pa_stripped(self, tmp_path: Path):
+        from remove_ai_watermarks.metadata import C2PA_UUID
+
+        uuid_box = b"\x00\x00\x00\x20uuid" + C2PA_UUID + b"manifest"
+        src = tmp_path / "voice.m4a"
+        src.write_bytes(_MP4_FTYP + uuid_box + _MP4_MDAT)
+        out = tmp_path / "clean.m4a"
+        remove_ai_metadata(src, out)
+        assert out.read_bytes() == _MP4_FTYP + _MP4_MDAT
+
+    def test_content_sniff_routes_unknown_suffix(self, tmp_path: Path):
+        # An ISOBMFF file with a non-standard extension is still box-stripped.
+        from remove_ai_watermarks.metadata import C2PA_UUID
+
+        uuid_box = b"\x00\x00\x00\x20uuid" + C2PA_UUID + b"manifest"
+        src = tmp_path / "mystery.bin"
+        src.write_bytes(_MP4_FTYP + uuid_box + _MP4_MDAT)
+        out = tmp_path / "out.bin"
+        remove_ai_metadata(src, out)
+        assert out.read_bytes() == _MP4_FTYP + _MP4_MDAT
+
+    def test_unsupported_container_raises(self, tmp_path: Path):
+        src = tmp_path / "audio.mp3"
+        src.write_bytes(b"ID3\x04\x00\x00\x00\x00\x00\x00 fake mp3 frames")
+        out = tmp_path / "out.mp3"
+        with pytest.raises(ValueError, match="not supported"):
+            remove_ai_metadata(src, out)
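The `TestIptcAiFields` cases above pin down three observable behaviours: an element-form value, an attribute-form value, and a presence-only fallback. A minimal sketch of a byte-level extractor that satisfies those cases is shown below. `extract_ai_system` and its regular expressions are hypothetical illustrations, not the project's `iptc_ai_system`, and the catch-all pattern for other AI fields is an assumption.

```python
from __future__ import annotations

import re

_FIELD = rb"Iptc4xmpExt:AISystemUsed"
# Element form: <Iptc4xmpExt:AISystemUsed>value</Iptc4xmpExt:AISystemUsed>
_ELEMENT = re.compile(rb"<" + _FIELD + rb"[^>]*>([^<]+)</" + _FIELD + rb">")
# Attribute form: Iptc4xmpExt:AISystemUsed="value"
_ATTRIBUTE = re.compile(_FIELD + rb'="([^"]*)"')
# Assumption: any other Iptc4xmpExt:AI* property counts as a disclosure field.
_ANY_AI_FIELD = re.compile(rb"Iptc4xmpExt:AI[A-Za-z]+")


def extract_ai_system(data: bytes) -> str | None:
    """Return the AISystemUsed value, "fields present", or None."""
    for pattern in (_ELEMENT, _ATTRIBUTE):
        match = pattern.search(data)
        if match and match.group(1).strip():
            return match.group(1).strip().decode("utf-8", "replace")
    if _ANY_AI_FIELD.search(data):
        return "fields present"
    return None
```

Scanning raw bytes rather than parsing XML keeps the sketch tolerant of XMP packets embedded in arbitrary containers, at the cost of ignoring namespace-prefix aliases.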
@@ -0,0 +1,36 @@
+"""Tests for the optional Adobe TrustMark detector.
+
+TrustMark is an optional dependency (extra ``trustmark``) that downloads model
+weights on first use, so the decode path is only exercised when it is installed
+(mirrors the imwatermark handling). The always-on test pins the graceful
+absent/error behaviour: detect must return None, never raise.
+"""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+import pytest
+
+from remove_ai_watermarks.trustmark_detector import detect_trustmark, is_available
+
+if TYPE_CHECKING:
+    from pathlib import Path
+
+
+def test_detect_never_raises(tmp_clean_png: Path):
+    # Whether or not trustmark is installed, a clean image must yield None
+    # (no watermark) without raising. When absent, the import guard returns None.
+    assert detect_trustmark(tmp_clean_png) is None
+
+
+def test_unreadable_file_returns_none(tmp_path: Path):
+    bad = tmp_path / "not_an_image.txt"
+    bad.write_bytes(b"not an image")
+    assert detect_trustmark(bad) is None
+
+
+@pytest.mark.skipif(not is_available(), reason="trustmark not installed")
+def test_clean_image_reports_no_watermark(tmp_clean_png: Path):
+    # With the decoder present, an un-watermarked image must report absent.
+    assert detect_trustmark(tmp_clean_png) is None