mirror of
https://github.com/wiltodelta/remove-ai-watermarks.git
synced 2026-07-05 07:57:50 +02:00
feat(identify): close 3 detector gaps found on the spaces corpus (06-05..06-11)
- AIGC: parse the bare ``AIGC{...}`` blob form (label glued to its JSON in a
JPEG APP segment near the JFIF header), and scan both raw-JSON forms in one
fall-through loop so a quoted ``"AIGC"`` later in an XMP packet no longer
shadows a real bare label earlier in the file (3 files read unknown before).
- Integrity clash rule 2: a camera device + an AI marker from the SAME C2PA
manifest (Google Pixel Magic Editor / Pixel Studio edit chain) is a legitimate
edit chain, not a contradiction. Fire only when the AI marker's source is
independent of the camera's manifest; pure cameras (Leica/Sony/Nikon) are
unaffected (2 Pixel files mis-flagged before).
- New c2pa_cloud_manifest detector: surface a C2PA 2.4 Durable Content
Credentials cloud-manifest reference (Adobe cai-manifests.adobe.com) as a
medium provenance signal when the embedded manifest is stripped. Provenance
only, never asserts is_ai (2 files read fully unknown before).
identify reuses its already-loaded scan head for the cloud check (no second
read). +7 tests; CLAUDE.md + README synced.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@@ -30,7 +30,7 @@ If this tool saves you time, consider [sponsoring its development](https://githu
 - **Text and face preservation (default)** — the default pipeline is a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Use `--pipeline sdxl` for plain SDXL img2img (lighter, no extra model download) on inputs without text or faces. Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness). The library does not ship a face-restore extra: every approach evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned) regenerated the face via SDXL and made the output look more AI-generated than the cleaned image. The cleaned controlnet output is the least-AI face state achievable without re-introducing SynthID.
 - **Batch processing** — process entire directories
 - **Detection** — three-stage NCC watermark detection with confidence scoring
-- **Provenance detection (`identify`)** — aggregate C2PA issuer, the C2PA soft-binding forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, embedded SD/ComfyUI params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the China TC260 AIGC label (XMP, PNG chunk, or EXIF), the HuggingFace `hf-job-id` job marker, the SynthID metadata proxy, the visible marks (Gemini sparkle plus the Doubao "豆包AI生成" / Jimeng "即梦AI" / Samsung Galaxy AI "Contenuti generati dall'AI" text marks), the open SD/SDXL/FLUX invisible watermark, and (with the `trustmark` extra) the open Adobe TrustMark watermark into one origin-platform + watermark-inventory verdict (`--json` for machine output)
+- **Provenance detection (`identify`)** — aggregate C2PA issuer, the C2PA soft-binding forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, embedded SD/ComfyUI params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the China TC260 AIGC label (XMP, PNG chunk, EXIF, or JPEG segment), the HuggingFace `hf-job-id` job marker, the SynthID metadata proxy, the C2PA cloud-manifest reference (Adobe Durable Content Credentials, when the embedded manifest is stripped), the visible marks (Gemini sparkle plus the Doubao "豆包AI生成" / Jimeng "即梦AI" / Samsung Galaxy AI "Contenuti generati dall'AI" text marks), the open SD/SDXL/FLUX invisible watermark, and (with the `trustmark` extra) the open Adobe TrustMark watermark into one origin-platform + watermark-inventory verdict (`--json` for machine output)

 ## Examples

@@ -62,7 +62,7 @@ If this tool saves you time, consider [sponsoring its development](https://githu

 > Visible overlays are used by Google Gemini / Nano Banana (sparkle logo), by ByteDance's Doubao ("豆包AI生成" corner text) and Jimeng / Dreamina ("★ 即梦AI" wordmark), and by Samsung Galaxy AI ("✦ Contenuti generati dall'AI" strip, bottom-left, locale-specific). All are removed on CPU by reverse-alpha against a captured alpha map (Jimeng and Samsung add a thin residual inpaint over the glyph footprint, since their marks re-rasterize per image). Other services rely on invisible watermarks and/or metadata; our diffusion-based regeneration works against any invisible watermark in pixel or frequency domain. For a visible mark from any other source (any position, any colour), use the universal `erase --region` command.

-> **Detection:** `remove-ai-watermarks identify <image>` reports the origin platform and watermark inventory for all the signals above — C2PA issuer, the C2PA soft-binding forensic-watermark vendor (TrustMark / Digimarc / Imatag / ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, the China TC260 AIGC label (XMP, PNG chunk, or EXIF), the HuggingFace `hf-job-id` job marker, embedded generation params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the SynthID metadata proxy, the visible marks (Gemini sparkle plus the Doubao "豆包AI生成" / Jimeng "即梦AI" / Samsung Galaxy AI "Contenuti generati dall'AI" text marks), and (with the `[detect]` / `[trustmark]` extras) the open SD/SDXL/FLUX and Adobe TrustMark invisible watermarks. SynthID and the proprietary soft-binding watermarks (Digimarc etc.) have no local decoder, so they are reported by metadata proxy / vendor name only.
+> **Detection:** `remove-ai-watermarks identify <image>` reports the origin platform and watermark inventory for all the signals above — C2PA issuer, the C2PA soft-binding forensic-watermark vendor (TrustMark / Digimarc / Imatag / ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, the China TC260 AIGC label (XMP, PNG chunk, EXIF, or JPEG segment), the HuggingFace `hf-job-id` job marker, embedded generation params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the SynthID metadata proxy, the C2PA cloud-manifest reference (Adobe Durable Content Credentials, when the embedded manifest is stripped), the visible marks (Gemini sparkle plus the Doubao "豆包AI生成" / Jimeng "即梦AI" / Samsung Galaxy AI "Contenuti generati dall'AI" text marks), and (with the `[detect]` / `[trustmark]` extras) the open SD/SDXL/FLUX and Adobe TrustMark invisible watermarks. SynthID and the proprietary soft-binding watermarks (Digimarc etc.) have no local decoder, so they are reported by metadata proxy / vendor name only.

 ## How it works

@@ -30,6 +30,7 @@ from remove_ai_watermarks.metadata import (
     IPTC_AI_FIELD_MARKERS,
     IPTC_AI_MARKERS,
     aigc_label,
+    c2pa_cloud_manifest_in,
     c2pa_marker_in,
     exif_generator,
     get_ai_metadata,
@@ -96,6 +97,13 @@ _HF_JOB_CAVEAT = (
     "generation) but names neither the model nor the content type, so it is a "
     "medium-confidence signal, not proof the pixels are AI-generated."
 )
+_C2PA_CLOUD_CAVEAT = (
+    "The embedded C2PA manifest is absent but an XMP provenance pointer to the "
+    "vendor's cloud manifest store survives, so the Content Credentials remain "
+    "recoverable server-side -- stripping the file no longer removes the provenance. "
+    "It marks Content Credentials, not AI origin: the cloud manifest may describe a "
+    "human edit, and reading it needs a network fetch this tool does not make."
+)
 _SAMSUNG_GENAI_CAVEAT = (
     "Samsung's genAIType marker shows a Galaxy AI editing tool (Generative Edit, "
     "Sketch to Image, ...) touched the image; it is an undocumented proprietary "
@@ -285,7 +293,12 @@ def _vendor_of(text: str | None) -> str | None:
 # chain like Adobe over a Gemini original) legitimately names several vendors in
 # one valid chain and must not read as spoofing. Families not listed here are each
 # their own independent source (EXIF/XMP generator, IPTC AISystemUsed, AIGC, ...).
-_CLASH_SOURCE: dict[str, str] = {"c2pa": "c2pa_manifest", "synthid": "c2pa_manifest"}
+# The single C2PA-manifest source shared by the issuer attribution and the SynthID
+# proxy (both inferred from the same embedded manifest). Rule 2 keys off it too:
+# the camera device label is read from this manifest, so an AI marker is a clash
+# only when its source differs from this (i.e. it is genuinely independent).
+_C2PA_MANIFEST_SOURCE = "c2pa_manifest"
+_CLASH_SOURCE: dict[str, str] = {"c2pa": _C2PA_MANIFEST_SOURCE, "synthid": _C2PA_MANIFEST_SOURCE}


 def _integrity_clashes(
@@ -326,7 +339,16 @@ def _integrity_clashes(
             + " -- one provenance set was likely spoofed, transplanted, or laundered."
         )

-    if camera_label and camera_has_ai_marker:
+    # Rule 2: a camera-capture C2PA device next to an AI-generation marker. Only
+    # an AI marker from a source INDEPENDENT of the camera's own C2PA manifest is
+    # a contradiction. A device that both captures and runs on-device generative
+    # AI (Google Pixel Magic Editor / Pixel Studio) records the capture and the
+    # AI edit in ONE manifest, so the AI vendor is named only from that same
+    # manifest (c2pa issuer + synthid proxy) -- a legitimate edit chain, not a
+    # spoof. An EXIF/XMP generator, IPTC field, TC260 AIGC label, or second
+    # manifest naming AI on a camera capture is the real laundering tell.
+    independent_ai_marker = any(grp != _C2PA_MANIFEST_SOURCE for grp in source.values())
+    if camera_label and camera_has_ai_marker and independent_ai_marker:
         vendors = ", ".join(sorted(set(ai_vendors.values()))) or "present"
         clashes.append(
             f"Camera-capture C2PA credentials ({camera_label}) coexist with AI-generation markers "
@@ -483,6 +505,21 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
     if c2pa_is_ai and (v := (_vendor_of(_attribute_platform(issuers, is_ai=True)) or _vendor_of(generator))):
         ai_vendor_claims["c2pa"] = v

+    # ── C2PA cloud-manifest reference (Durable Content Credentials) ─
+    # An XMP dcterms:provenance pointer to a vendor manifest store survives even
+    # when the embedded manifest is stripped, so the credentials stay recoverable
+    # server-side (C2PA 2.4). Provenance only -- it does NOT assert AI (the cloud
+    # manifest may describe a human edit), so it is excluded from ai_from_metadata
+    # and the clash vendors. Skip when an embedded manifest already attributed it.
+    if not has_c2pa and (cloud_vendor := c2pa_cloud_manifest_in(head)):
+        signals.append(Signal("c2pa_cloud", f"cloud manifest store: {cloud_vendor}", "medium"))
+        watermarks.append(
+            f"C2PA Durable Content Credentials (cloud manifest at {cloud_vendor}; embedded manifest absent)"
+        )
+        caveats.append(_C2PA_CLOUD_CAVEAT)
+        if platform is None:
+            platform = f"C2PA signer: {cloud_vendor} (cloud manifest)"
+
     # ── SynthID metadata proxy ──────────────────────────────────────
     # get_ai_metadata already sets synthid_watermark for both PNG (caBX parser)
     # and non-PNG (its own synthid_source fallback), so no extra scan is needed.
@@ -343,13 +343,16 @@ def aigc_label(image_path: Path) -> dict[str, str] | None:
       found by a container-agnostic raw-byte scan (PNG/JPEG/WebP alike); and
     - a raw-JSON ``{"AIGC":{...}}`` block with no namespace, as embedded in JPEG
       EXIF (UserComment) by some China-served generators, brace-matched from the
-      scan head.
+      scan head; and
+    - a bare ``AIGC{...}`` blob (the label glued straight to its JSON, no
+      ``"AIGC":`` key wrapper) embedded in a JPEG APP segment near the JFIF
+      header by some China-served generators.

     Returns the decoded JSON (e.g. ``{"Label": "1", "ContentProducer": ...}``)
-    or None. The generic forms (the PNG-chunk key ``AIGC`` and the bare
-    ``{"AIGC":...}`` object) are accepted only if they carry at least one known
-    TC260 field (``_TC260_FIELDS``); the namespaced XMP element is unambiguous,
-    so any JSON object is accepted.
+    or None. The generic forms (the PNG-chunk key ``AIGC``, the bare
+    ``{"AIGC":...}`` object, and the bare ``AIGC{...}`` blob) are accepted only
+    if they carry at least one known TC260 field (``_TC260_FIELDS``); the
+    namespaced XMP element is unambiguous, so any JSON object is accepted.
     """
     import html
     import json
@@ -393,24 +396,76 @@ def aigc_label(image_path: Path) -> dict[str, str] | None:
         body = match.group(1) if match.group(1) is not None else match.group(2)
         return _parse(html.unescape(body.decode("utf-8", "replace")), require_tc260_field=False)

-    # Raw-JSON {"AIGC":{...}} block (no namespace), as written into JPEG EXIF
-    # (UserComment) by some China-served generators -- the PNG-chunk and XMP
-    # paths above both miss it. The bytes pre-check keeps the common (no-AIGC)
-    # path off the full-buffer decode; raw_decode then brace-matches the inner
-    # object (respecting nested braces / quoted strings) and `_parse` applies the
-    # same dict coercion + TC260-field gate as the generic PNG-chunk path.
-    if b'"AIGC"' in data:
-        text = data.decode("latin-1")
-        brace = text.find("{", text.find('"AIGC"') + len('"AIGC"'))
-        if brace != -1:
-            try:
-                _, end = json.JSONDecoder().raw_decode(text, brace)
-            except ValueError:
-                return None
-            return _parse(text[brace:end], require_tc260_field=True)
+    # Generic raw-JSON forms the PNG-chunk and XMP paths above both miss, each
+    # gated on a TC260 field: the ``"AIGC":{...}`` key wrapper (as written into
+    # JPEG EXIF UserComment) and the bare ``AIGC{...}`` blob (the label glued
+    # straight to its JSON, no key wrapper, in a JPEG APP segment near the JFIF
+    # header). `raw_decode` brace-matches the inner object (respecting nested
+    # braces / quoted strings); `_parse` applies the same dict coercion + TC260
+    # gate as the PNG-chunk path. A non-matching hit (no TC260 field, or an
+    # undecodable brace) must FALL THROUGH to the next form, never short-circuit:
+    # a quoted ``"AIGC"`` can appear later in an XMP packet while the real label
+    # is a bare ``AIGC{...}`` blob earlier in the file, so an unconditional return
+    # on the quoted form would shadow the bare form.
+    text = data.decode("latin-1")
+    for needle in ('"AIGC"', "AIGC{"):
+        start = text.find(needle)
+        if start == -1:
+            continue
+        # First brace at/after the needle: the object brace for ``"AIGC":{`` and
+        # the glued brace (at start+4) for the bare ``AIGC{`` -- one search covers both.
+        brace = text.find("{", start)
+        if brace == -1:
+            continue
+        try:
+            _, end = json.JSONDecoder().raw_decode(text, brace)
+        except ValueError:
+            continue
+        if result := _parse(text[brace:end], require_tc260_field=True):
+            return result
     return None


+# C2PA "Durable Content Credentials" manifest repositories (C2PA 2.4). When the
+# embedded manifest is stripped, an XMP ``dcterms:provenance`` URL can still point
+# at the vendor's cloud manifest store, from which the credentials are recoverable
+# server-side via the file's soft binding. Host -> vendor label. Verified on real
+# files: Adobe's Content Authenticity cloud store.
+_C2PA_MANIFEST_REPOSITORIES: tuple[tuple[bytes, str], ...] = (
+    (b"cai-manifests.adobe.com", "Adobe Content Authenticity"),
+)
+
+
+def c2pa_cloud_manifest_in(data: bytes) -> str | None:
+    """Return a C2PA cloud-manifest vendor label if ``data`` carries an XMP
+    ``dcterms:provenance`` pointer to a known manifest repository, else None.
+
+    The shared byte-scan (mirroring ``soft_binding_vendors_in``), so a caller that
+    already holds the scan head (``identify``) reuses it instead of re-reading.
+    """
+    if b"dcterms:provenance" not in data:
+        return None
+    for host, vendor in _C2PA_MANIFEST_REPOSITORIES:
+        if host in data:
+            return vendor
+    return None
+
+
+def c2pa_cloud_manifest(image_path: Path) -> str | None:
+    """Return a C2PA cloud-manifest vendor label if the file carries only an XMP
+    ``dcterms:provenance`` pointer to a manifest repository (C2PA 2.4 Durable
+    Content Credentials), else None.
+
+    This fires on the laundering case where the *embedded* manifest was stripped
+    but the XMP cloud reference survives, so the Content Credentials remain
+    recoverable server-side. It is provenance, NOT an AI assertion: the cloud
+    manifest can describe a human edit as easily as an AI generation, and reading
+    its contents needs a network fetch we do not do. ``identify`` surfaces it as a
+    provenance signal without setting ``is_ai_generated``.
+    """
+    return c2pa_cloud_manifest_in(scan_head(image_path, _QUICK_SCAN_BYTES))
+
+
 def huggingface_job(image_path: Path) -> str | None:
     """Return the HuggingFace job id if the image carries an ``hf-job-id`` PNG
     text chunk, else None.
@@ -772,6 +772,31 @@ class TestIntegrityClashesHelper:
         # must NOT raise a clash.
         assert _integrity_clashes({}, "Leica (camera, C2PA capture)", camera_has_ai_marker=False) == []

+    def test_pixel_generative_edit_same_manifest_no_clash(self):
+        # A Google Pixel that BOTH captures and runs on-device generative AI
+        # (Magic Editor / Pixel Studio) records the capture and the AI edit in
+        # ONE C2PA manifest -- the AI vendor is named only from that same
+        # manifest (c2pa / synthid), independent of nothing. That is a legitimate
+        # edit chain, NOT a camera-vs-AI contradiction, so rule 2 must stay quiet.
+        assert (
+            _integrity_clashes(
+                {"c2pa": "Google", "synthid": "Google"},
+                "Google Pixel (camera, C2PA capture)",
+                camera_has_ai_marker=True,
+            )
+            == []
+        )
+
+    def test_camera_plus_independent_ai_marker_still_clashes(self):
+        # But a camera capture next to an AI marker from a genuinely INDEPENDENT
+        # source (EXIF/XMP generator, TC260 AIGC, ...) is still a laundering tell.
+        clashes = _integrity_clashes(
+            {"c2pa": "Google", "aigc": "China AIGC (TC260)"},
+            "Google Pixel (camera, C2PA capture)",
+            camera_has_ai_marker=True,
+        )
+        assert any("Camera-capture" in c for c in clashes)
+

 class TestIntegrityClashEndToEnd:
     def _c2pa_jpeg(self, tmp_path: Path, blob: bytes) -> Path:
@@ -806,6 +831,22 @@ class TestIntegrityClashEndToEnd:
         assert r.platform == "Google Pixel (camera, C2PA capture)"
         assert any("Camera-capture C2PA credentials" in c and "AI-generation markers" in c for c in r.integrity_clashes)

+    def test_pixel_generative_edit_no_clash(self, tmp_path: Path):
+        # A real Google Pixel generative edit (Magic Editor / Pixel Studio) signs
+        # ONE manifest carrying both the Pixel Camera capture and a Google
+        # Generative AI edit (trainedAlgorithmicMedia + "Applied imperceptible
+        # SynthID watermark"). The AI marker lives in the SAME manifest as the
+        # device, so it is an edit chain, not a camera-vs-AI contradiction.
+        path = self._c2pa_jpeg(
+            tmp_path,
+            b"Pixel Camera ... Created by Pixel Camera ... computationalCapture ... "
+            b"Created by Google Generative AI ... trainedAlgorithmicMedia ... "
+            b"Applied imperceptible SynthID watermark",
+        )
+        r = identify(path, check_visible=False, check_invisible=False)
+        assert r.is_ai_generated is True
+        assert r.integrity_clashes == []
+
     def test_clash_serializes_to_json(self, tmp_path: Path):
         path = self._c2pa_jpeg(tmp_path, b"OpenAI ... trainedAlgorithmicMedia ... TC260:AIGC label")
         r = identify(path, check_visible=False, check_invisible=False)
@@ -790,6 +790,42 @@ class TestAIGCLabel:
     def test_has_ai_metadata_detects_raw_json_exif_form(self, tmp_path: Path):
         assert has_ai_metadata(self._aigc_exif_jpeg(tmp_path))

+    def _aigc_bare_jpeg(self, tmp_path: Path, producer: str = "00119144030008867405X210002") -> Path:
+        """Some China-served generators glue the TC260 label straight to its JSON
+        as a bare ``AIGC{...}`` blob inside a JPEG APP segment (no ``"AIGC":``
+        key wrapper, no PNG chunk, no namespaced XMP) -- seen near the JFIF
+        header on real 2026-06 downloads."""
+        p = tmp_path / "aigc_bare.jpg"
+        Image.new("RGB", (32, 32)).save(p)
+        raw = p.read_bytes()
+        blob = b'AIGC{"Label":"1","ContentProducer":"' + producer.encode() + b'","ProduceID":"8F995586"}'
+        segment = b"\xff\xe9" + (len(blob) + 2).to_bytes(2, "big") + blob  # APP9
+        p.write_bytes(raw[:2] + segment + raw[2:])  # splice after SOI
+        return p
+
+    def test_parses_bare_aigc_jpeg_segment_form(self, tmp_path: Path):
+        from remove_ai_watermarks.metadata import aigc_label
+
+        info = aigc_label(self._aigc_bare_jpeg(tmp_path))
+        assert info is not None
+        assert info["Label"] == "1"
+        assert info["ContentProducer"] == "00119144030008867405X210002"
+
+    def test_has_ai_metadata_detects_bare_aigc_jpeg_form(self, tmp_path: Path):
+        assert has_ai_metadata(self._aigc_bare_jpeg(tmp_path))
+
+    def test_bare_aigc_without_tc260_field_ignored(self, tmp_path: Path):
+        """A bare ``AIGC{...}`` blob with no TC260 field must not false-positive."""
+        from remove_ai_watermarks.metadata import aigc_label
+
+        p = tmp_path / "bare_unrelated.jpg"
+        Image.new("RGB", (32, 32)).save(p)
+        raw = p.read_bytes()
+        blob = b'AIGC{"unrelated":"value"}'
+        segment = b"\xff\xe9" + (len(blob) + 2).to_bytes(2, "big") + blob
+        p.write_bytes(raw[:2] + segment + raw[2:])
+        assert aigc_label(p) is None
+
     def test_raw_json_without_tc260_field_ignored(self, tmp_path: Path):
         """A bare ``{"AIGC":{...}}`` object with no TC260 field must not fire."""
         import json
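
The two test helpers above build their fixture by splicing an APP9 segment directly after the JPEG SOI marker. That step can be factored into a small pure-Python function. This is a sketch only: `splice_app_segment` is a hypothetical helper and not part of the library or its tests.

```python
def splice_app_segment(jpeg: bytes, payload: bytes, marker: int = 0xE9) -> bytes:
    """Insert one APPn segment carrying ``payload`` right after the SOI marker."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    if not 0xE0 <= marker <= 0xEF:
        raise ValueError("marker must be in the APP0..APP15 range")
    length = len(payload) + 2  # the length field counts its own two bytes
    if length > 0xFFFF:
        raise ValueError("payload too large for a single segment")
    segment = bytes([0xFF, marker]) + length.to_bytes(2, "big") + payload
    return jpeg[:2] + segment + jpeg[2:]


# Usage on a minimal SOI + EOI byte string (not a decodable image).
out = splice_app_segment(b"\xff\xd8\xff\xd9", b'AIGC{"Label":"1"}')
print(out[:6].hex())
```

The default marker `0xE9` matches the APP9 segment the fixtures write; the range and size checks are additions for safety that the test helpers do not perform.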
@@ -1185,3 +1221,49 @@ class TestFfmpegMetadataStrip:
         remove_ai_metadata(src, out)
         assert out.exists()
         assert b"Suno AI generated" not in out.read_bytes()  # tag stripped, audio kept
+
+
+class TestC2paCloudManifest:
+    """C2PA 2.4 Durable Content Credentials: an XMP dcterms:provenance pointer to
+    a vendor cloud manifest store survives when the embedded manifest is stripped."""
+
+    def _cloud_png(self, tmp_path: Path, host: bytes = b"cai-manifests.adobe.com") -> Path:
+        xmp = (
+            b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?><x:xmpmeta xmlns:x="adobe:ns:meta/">'
+            b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
+            b'<rdf:Description rdf:about="" xmlns:dcterms="http://purl.org/dc/terms/" '
+            b'dcterms:provenance="https://' + host + b'/manifests/urn-c2pa-abc123"> </rdf:Description>'
+            b'</rdf:RDF></x:xmpmeta><?xpacket end="w"?>'
+        )
+        p = tmp_path / "cloud.png"
+        img = Image.new("RGB", (16, 16))
+        meta = PngInfo()
+        meta.add_itxt("XML:com.adobe.xmp", xmp.decode("latin-1"))
+        img.save(p, pnginfo=meta)
+        return p
+
+    def test_detects_adobe_cloud_manifest(self, tmp_path: Path):
+        from remove_ai_watermarks.metadata import c2pa_cloud_manifest
+
+        assert c2pa_cloud_manifest(self._cloud_png(tmp_path)) == "Adobe Content Authenticity"
+
+    def test_no_provenance_pointer_is_none(self, tmp_clean_png: Path):
+        from remove_ai_watermarks.metadata import c2pa_cloud_manifest
+
+        assert c2pa_cloud_manifest(tmp_clean_png) is None
+
+    def test_unknown_host_is_none(self, tmp_path: Path):
+        from remove_ai_watermarks.metadata import c2pa_cloud_manifest
+
+        # A dcterms:provenance pointer to an unrecognized host is not attributed.
+        assert c2pa_cloud_manifest(self._cloud_png(tmp_path, host=b"manifests.example.com")) is None
+
+    def test_cloud_manifest_does_not_assert_ai(self, tmp_path: Path):
+        # Provenance only -- a cloud manifest can describe a human edit, so the
+        # verdict must stay 'unknown', not 'AI-generated'.
+        from remove_ai_watermarks.identify import identify
+
+        r = identify(self._cloud_png(tmp_path), check_visible=False, check_invisible=False)
+        assert r.is_ai_generated is None
+        assert any("Durable Content Credentials" in w for w in r.watermarks)
+        assert any(s.name == "c2pa_cloud" for s in r.signals)