Mirror of https://github.com/wiltodelta/remove-ai-watermarks.git, synced 2026-06-06 02:53:54 +02:00
feat(metadata): detect China TC260 AIGC PNG chunk and HuggingFace hf-job-id
aigc_label now reads the TC260 label from a raw-JSON `AIGC` PNG tEXt chunk (as Doubao/ByteDance write it, with no namespaced XMP marker) in addition to the `<TC260:AIGC>` XMP block, via a shared _parse helper gated on a TC260 field so a generic AIGC key cannot false-positive.

New huggingface_job() reads the hf-job-id PNG chunk; identify surfaces it as a medium-confidence hf_job signal (parallel to the visible sparkle, never overriding a hard metadata verdict).

Both wired into has_ai_metadata/get_ai_metadata; the PNG save whitelist already strips them on removal.

Found by auditing 646 corpus originals: 28 AIGC and 3 hf-job files the library previously reported as Unknown.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
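As a rough illustration of the XMP path mentioned in the commit message, the sketch below pulls a `<TC260:AIGC>` block out of raw bytes and decodes its HTML-entity-encoded JSON. The regex and the `html.unescape` step follow the diff; the function name `extract_xmp_label` and the sample bytes are assumptions made for this example, and it is not the library's `aigc_label`.

```python
import html
import json
import re
from typing import Optional


def extract_xmp_label(data: bytes) -> Optional[dict]:
    """Return the JSON object inside a <TC260:AIGC> element, or None.

    Hypothetical helper: works on any byte buffer, so the container
    format (PNG, JPEG, WebP) does not matter.
    """
    match = re.search(rb"<TC260:AIGC>(.*?)</TC260:AIGC>", data, re.DOTALL)
    if not match:
        return None
    # XMP stores the JSON as entity-encoded text (&quot; instead of ").
    text = html.unescape(match.group(1).decode("utf-8", "replace"))
    try:
        parsed = json.loads(text)
    except ValueError:
        return None
    return parsed if isinstance(parsed, dict) else None


# Made-up sample bytes standing in for the head of an image file.
sample = b"junk<TC260:AIGC>{&quot;Label&quot;:&quot;1&quot;}</TC260:AIGC>junk"
label = extract_xmp_label(sample)
```

Because the element name is namespaced, any JSON object found inside it can reasonably be trusted, which is why no field check is applied on this path.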
@@ -25,7 +25,7 @@ If this tool saves you time, consider [sponsoring its development](https://githu
 - **Smart Face Protection** — automatic extraction and blending of human faces to prevent AI distortion
 - **Batch processing** — process entire directories
 - **Detection** — three-stage NCC watermark detection with confidence scoring
-- **Provenance detection (`identify`)** — aggregate C2PA issuer, the C2PA soft-binding forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, embedded SD/ComfyUI params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the SynthID metadata proxy, the visible sparkle, the open SD/SDXL/FLUX invisible watermark, and (with the `trustmark` extra) the open Adobe TrustMark watermark into one origin-platform + watermark-inventory verdict (`--json` for machine output)
+- **Provenance detection (`identify`)** — aggregate C2PA issuer, the C2PA soft-binding forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, embedded SD/ComfyUI params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the China TC260 AIGC label (XMP or PNG chunk), the HuggingFace `hf-job-id` job marker, the SynthID metadata proxy, the visible sparkle, the open SD/SDXL/FLUX invisible watermark, and (with the `trustmark` extra) the open Adobe TrustMark watermark into one origin-platform + watermark-inventory verdict (`--json` for machine output)
 
 ## Examples
 
@@ -48,13 +48,13 @@ If this tool saves you time, consider [sponsoring its development](https://githu
 | **xAI Grok (Aurora)** | — | — | ✅ EXIF signature scheme (no C2PA): `Signature:` blob + UUID `Artist` | Detected (`identify`); metadata strip |
 | **Midjourney** | — | — | ✅ EXIF + XMP (prompt, model, seed) | Metadata strip |
 | **Meta AI** | — | — | ✅ IPTC "Made with AI" (digitalSourceType) | Metadata strip (removes the label) |
-| **Doubao** (ByteDance) / China AIGC generators | ✅ "豆包AI生成" text strip (bottom-right) | — | ✅ TC260 `<TC260:AIGC>` XMP label (China's mandatory AI labeling) | Locate + mask + inpaint (cv2, CPU) + metadata strip |
+| **Doubao** (ByteDance) / China AIGC generators | ✅ "豆包AI生成" text strip (bottom-right) | — | ✅ TC260 AIGC label — `<TC260:AIGC>` XMP **or** `AIGC` PNG chunk (China's mandatory AI labeling) | Locate + mask + inpaint (cv2, CPU) + metadata strip |
 | **StableSignature** (Meta) | — | ✅ In-model watermark | — | Diffusion regeneration |
 | **TreeRing** | — | ✅ Latent space watermark | — | Diffusion regeneration |
 
 > Visible overlays are used by Google Gemini / Nano Banana (sparkle logo) and by Doubao / China AIGC generators (the mandated "...AI生成" corner text). Both are removed deterministically on CPU. Other services rely on invisible watermarks and/or metadata; our diffusion-based regeneration works against any invisible watermark in pixel or frequency domain. For a visible mark from any other source (any position, any colour), use the universal `erase --region` command.
 
-> **Detection:** `remove-ai-watermarks identify <image>` reports the origin platform and watermark inventory for all the signals above — C2PA issuer, the C2PA soft-binding forensic-watermark vendor (TrustMark / Digimarc / Imatag / ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, the China TC260 AIGC label, embedded generation params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the SynthID metadata proxy, the visible sparkle, and (with the `[detect]` / `[trustmark]` extras) the open SD/SDXL/FLUX and Adobe TrustMark invisible watermarks. SynthID and the proprietary soft-binding watermarks (Digimarc etc.) have no local decoder, so they are reported by metadata proxy / vendor name only.
+> **Detection:** `remove-ai-watermarks identify <image>` reports the origin platform and watermark inventory for all the signals above — C2PA issuer, the C2PA soft-binding forensic-watermark vendor (TrustMark / Digimarc / Imatag / ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, the China TC260 AIGC label (XMP or PNG chunk), the HuggingFace `hf-job-id` job marker, embedded generation params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the SynthID metadata proxy, the visible sparkle, and (with the `[detect]` / `[trustmark]` extras) the open SD/SDXL/FLUX and Adobe TrustMark invisible watermarks. SynthID and the proprietary soft-binding watermarks (Digimarc etc.) have no local decoder, so they are reported by metadata proxy / vendor name only.
 
 ## How it works
 
@@ -31,6 +31,7 @@ from remove_ai_watermarks.metadata import (
     aigc_label,
     exif_generator,
     get_ai_metadata,
+    huggingface_job,
     iptc_ai_system,
     scan_head,
     xai_signature,
@@ -89,6 +90,11 @@ _INVISIBLE_WM_CAVEAT = (
     "The open invisible watermark is fragile: it does not survive JPEG re-encoding "
     "or resizing, so it confirms origin only on a pristine (un-re-encoded) file."
 )
+_HF_JOB_CAVEAT = (
+    "The hf-job-id tag marks a HuggingFace-hosted job (commonly diffusion "
+    "generation) but names neither the model nor the content type, so it is a "
+    "medium-confidence signal, not proof the pixels are AI-generated."
+)
 
 
 @dataclass
@@ -423,9 +429,14 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
         ai_vendor_claims["iptc_ai_system"] = v
 
     # ── China TC260 AIGC label (Doubao and other China-served gens) ──
-    aigc = any(m in head for m in AIGC_MARKERS)
+    # Fire on either the namespaced byte marker (``TC260:AIGC`` / the TC260 ns
+    # URL, present in XMP and as a laundering tell even when the JSON payload is
+    # truncated) OR the parsed label, which additionally catches the raw-JSON
+    # PNG ``AIGC`` tEXt chunk that carries no namespaced marker at all.
+    aigc_data = aigc_label(image_path)
+    aigc = aigc_data is not None or any(m in head for m in AIGC_MARKERS)
     if aigc:
-        producer = (aigc_label(image_path) or {}).get("ContentProducer", "")
+        producer = (aigc_data or {}).get("ContentProducer", "")
         signals.append(Signal("aigc", f"TC260 AIGC label{f' (producer {producer})' if producer else ''}", "high"))
         watermarks.append("China AIGC label (TC260 standard)")
         if platform is None:
@@ -461,6 +472,18 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
             platform = "xAI (Grok / Aurora)"
         ai_vendor_claims["xai"] = "xAI"
 
+    # ── HuggingFace-hosted job marker (hf-job-id PNG text chunk) ─────
+    # Marks the hosting job, not a model -- medium confidence (commonly diffusion
+    # output). Like the visible sparkle, it lifts an otherwise-Unknown verdict to
+    # a tentative AI, but never overrides a high-confidence metadata signal.
+    hf_job = huggingface_job(image_path)
+    if hf_job:
+        signals.append(Signal("hf_job", f"HuggingFace job {hf_job}", "medium"))
+        watermarks.append("HuggingFace-hosted job (hf-job-id)")
+        caveats.append(_HF_JOB_CAVEAT)
+        if platform is None:
+            platform = "HuggingFace-hosted job (model not identified)"
+
     # ── Open invisible watermark (SD / SDXL / FLUX, dwtDct) ──────────
     # Public decoder, no key -- a definitive embedded signal on pristine files.
     if check_invisible and (scheme := _invisible_watermark(image_path)) is not None:
@@ -503,11 +526,12 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
             platform = "Google Gemini family (visible sparkle detected)"
 
     visible_only = any(s.name == "visible_sparkle" for s in signals) and not ai_from_metadata
+    hf_only = bool(hf_job) and not ai_from_metadata
 
     if ai_from_metadata:
         is_ai: bool | None = True
         confidence = "high"
-    elif visible_only:
+    elif visible_only or hf_only:
         is_ai = True
         confidence = "medium"
     else:
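The precedence in the hunk above can be summarised as a small pure function. This is only a sketch of the branching shown in the diff: the function name `verdict` and its parameters are made up for illustration, and the fallback return value is a placeholder because the `else` branch is cut off in the hunk.

```python
from typing import Optional, Tuple


def verdict(
    ai_from_metadata: bool,
    visible_sparkle: bool,
    hf_job: Optional[str],
) -> Tuple[Optional[bool], str]:
    """Hypothetical condensation of the is_ai / confidence decision."""
    visible_only = visible_sparkle and not ai_from_metadata
    hf_only = bool(hf_job) and not ai_from_metadata

    if ai_from_metadata:
        # Hard metadata evidence always wins.
        return True, "high"
    if visible_only or hf_only:
        # Soft signals only lift an otherwise-undecided file.
        return True, "medium"
    # Placeholder: the real fallback values are not visible in the diff.
    return None, "unknown"
```

Note that an `hf-job-id` hit next to hard metadata leaves the result at `high`, which matches the behaviour the commit message describes.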
@@ -108,6 +108,27 @@ AIGC_MARKERS: tuple[bytes, ...] = (
     b"TC260:AIGC",
 )
 
+# TC260 AIGC-label JSON fields (the standard's labeling object). Doubao writes
+# the same object as a PNG ``tEXt`` chunk keyed ``AIGC`` (raw JSON, not XMP), so
+# a JSON object carrying at least one of these is accepted as a valid TC260
+# label even when the namespaced XMP element is absent.
+_TC260_FIELDS: frozenset[str] = frozenset(
+    {
+        "Label",
+        "ContentProducer",
+        "ProduceID",
+        "ContentPropagator",
+        "PropagateID",
+        "ReservedCode1",
+        "ReservedCode2",
+    }
+)
+
+# HuggingFace-hosted GPU jobs (Jobs / Spaces) stamp generated PNGs with this
+# ``tEXt`` chunk key holding the job UUID. It marks the hosting job, not a
+# specific model -- a medium-confidence AI signal (commonly diffusion output).
+_HF_JOB_KEY: str = "hf-job-id"
+
 STANDARD_METADATA_KEYS: frozenset[str] = frozenset(
     [
         "Author",
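To make the purpose of the field set concrete, here is a standalone sketch of the gate it enables. The field names are copied from the hunk above; `parse_label` is a hypothetical stand-in for the commit's nested `_parse` helper, written without the surrounding module so it runs on its own.

```python
import json
from typing import Optional

# Field names copied from the _TC260_FIELDS set in the diff.
TC260_FIELDS = frozenset(
    {
        "Label",
        "ContentProducer",
        "ProduceID",
        "ContentPropagator",
        "PropagateID",
        "ReservedCode1",
        "ReservedCode2",
    }
)


def parse_label(text: str, *, require_tc260_field: bool) -> Optional[dict]:
    """Decode a JSON object; optionally reject it unless it has a TC260 field."""
    try:
        parsed = json.loads(text)
    except ValueError:
        return None
    if not isinstance(parsed, dict):
        return None
    fields = {str(k): str(v) for k, v in parsed.items()}
    # A chunk keyed "AIGC" could belong to any tool, so the caller can
    # demand at least one known TC260 field before trusting it.
    if require_tc260_field and not (TC260_FIELDS & fields.keys()):
        return None
    return fields
```

With the gate on, `{"unrelated": "value"}` is rejected while `{"Label": "1"}` is accepted; with the gate off (the namespaced XMP case) both are returned.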
@@ -202,31 +223,90 @@ def has_ai_metadata(image_path: Path) -> bool:
     # IPTC 2025.1 AI-disclosure XMP properties (their presence flags AI content).
     if any(marker in data for marker in IPTC_AI_FIELD_MARKERS):
         return True
+    # China TC260 AIGC label as a PNG text chunk (the byte scan above catches
+    # only the XMP form; the raw-JSON tEXt chunk needs the PIL-based parse).
+    if aigc_label(image_path):
+        return True
+    # HuggingFace-hosted job marker (hf-job-id PNG text chunk).
+    if huggingface_job(image_path):
+        return True
     # xAI / Grok: no C2PA/IPTC/XMP -- only the EXIF Signature + UUID-Artist pair.
     return xai_signature(image_path)
 
 
 def aigc_label(image_path: Path) -> dict[str, str] | None:
-    """Parse a China TC260 ``<TC260:AIGC>`` AI-labeling block, if present.
+    """Parse a China TC260 AI-labeling block, if present.
+
+    Two serializations are recognized:
+
+    - a PNG ``tEXt``/``iTXt`` chunk keyed ``AIGC`` carrying the raw JSON object
+      (as written by Doubao / ByteDance), read via PIL; and
+    - an XMP ``<TC260:AIGC>{...}</TC260:AIGC>`` block (HTML-entity encoded text),
+      found by a container-agnostic raw-byte scan (PNG/JPEG/WebP alike).
 
     Returns the decoded JSON (e.g. ``{"Label": "1", "ContentProducer": ...}``)
-    or None. The block is XMP text (HTML-entity encoded), so it is found by a
-    container-agnostic raw-byte scan and works for PNG/JPEG/WebP alike.
+    or None. The PNG-chunk key ``AIGC`` is generic, so a JSON object there is
+    accepted only if it carries at least one known TC260 field (``_TC260_FIELDS``);
+    the namespaced XMP element is unambiguous, so any JSON object is accepted.
     """
     import html
     import json
-    import re
+    from typing import cast
+
+    def _parse(text: str, *, require_tc260_field: bool) -> dict[str, str] | None:
+        try:
+            parsed = json.loads(text)
+        except ValueError:
+            return None
+        if not isinstance(parsed, dict):
+            return None
+        fields = {str(k): str(v) for k, v in cast("dict[object, object]", parsed).items()}
+        if require_tc260_field and not (_TC260_FIELDS & fields.keys()):
+            return None
+        return fields
+
+    # PNG tEXt chunk keyed "AIGC" with raw JSON (Doubao and other China gens).
+    # The key is generic, so require a TC260 field to avoid a false positive.
+    try:
+        from PIL import Image
+
+        with Image.open(image_path) as img:
+            value = img.info.get("AIGC")
+    except Exception as exc:
+        logger.debug("PIL could not open %s for AIGC chunk scan: %s", image_path, exc)
+        value = None
+    if isinstance(value, str) and (result := _parse(value, require_tc260_field=True)):
+        return result
 
+    # XMP <TC260:AIGC>{...}</TC260:AIGC> block (namespaced element, unambiguous).
     data = scan_head(image_path)
     match = re.search(rb"<TC260:AIGC>(.*?)</TC260:AIGC>", data, re.DOTALL)
     if not match:
         return None
-    raw = html.unescape(match.group(1).decode("utf-8", "replace"))
+    return _parse(html.unescape(match.group(1).decode("utf-8", "replace")), require_tc260_field=False)
+
+
+def huggingface_job(image_path: Path) -> str | None:
+    """Return the HuggingFace job id if the image carries an ``hf-job-id`` PNG
+    text chunk, else None.
+
+    HuggingFace-hosted GPU jobs (Jobs / Spaces) stamp generated PNGs with an
+    ``hf-job-id`` ``tEXt`` chunk holding the job's UUID. It identifies the
+    *hosting job*, not a specific model, and is most commonly seen on diffusion-
+    generation output -- a medium-confidence AI signal, not proof of AI pixels
+    on its own.
+    """
     try:
-        parsed = json.loads(raw)
-    except ValueError:
+        from PIL import Image
+
+        with Image.open(image_path) as img:
+            value = img.info.get(_HF_JOB_KEY)
+    except Exception as exc:
+        logger.debug("PIL could not open %s for hf-job-id scan: %s", image_path, exc)
         return None
-    return {str(k): str(v) for k, v in parsed.items()} if isinstance(parsed, dict) else None
+    if isinstance(value, str) and value.strip():
+        return value.strip()
+    return None
 
 
 def iptc_ai_system(image_path: Path) -> str | None:
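Both functions above rely on PIL exposing PNG text chunks through `img.info`. For readers curious what that lookup amounts to at the byte level, the following standard-library sketch builds a tiny PNG with one `tEXt` chunk and reads it back. It is an illustration of the chunk layout, not the approach the commit uses; the helper names are hypothetical, and compressed `zTXt` or international `iTXt` chunks are deliberately not handled.

```python
import struct
import zlib
from typing import Dict

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, payload: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def make_png_with_text(key: str, value: str) -> bytes:
    """Build a 1x1 grayscale PNG carrying a single tEXt chunk (demo only)."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one gray pixel
    text = key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"tEXt", text)
        + _chunk(b"IDAT", idat)
        + _chunk(b"IEND", b"")
    )


def read_text_chunks(png: bytes) -> Dict[str, str]:
    """Collect keyword -> text for every uncompressed tEXt chunk."""
    found: Dict[str, str] = {}
    if not png.startswith(PNG_SIGNATURE):
        return found
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        length, kind = struct.unpack(">I4s", png[pos : pos + 8])
        payload = png[pos + 8 : pos + 8 + length]
        if kind == b"tEXt" and b"\x00" in payload:
            keyword, _, text = payload.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        if kind == b"IEND":
            break
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return found


demo = make_png_with_text("hf-job-id", "ec8380a6-2091-423a-b835-209420f99ee1")
chunks = read_text_chunks(demo)
```

In practice PIL's `img.info` is the more robust choice, since it also covers the compressed and international text chunk types.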
@@ -500,6 +580,10 @@ def get_ai_metadata(image_path: Path) -> dict[str, str]:
     # IPTC 2025.1 AI-disclosure XMP fields (Iptc4xmpExt:AISystemUsed etc.).
     if system := iptc_ai_system(image_path):
         result.setdefault("ai_system", f"IPTC 2025.1 AI disclosure ({system})")
+
+    # HuggingFace-hosted job marker (hf-job-id PNG text chunk).
+    if job := huggingface_job(image_path):
+        result.setdefault("huggingface_job", f"HuggingFace-hosted job ({job})")
     return result
 
 
@@ -201,6 +201,78 @@ class TestIdentifyLocalParams:
         assert r.signals == []
 
 
+# ── China TC260 AIGC label as a PNG text chunk (Doubao) ─────────────
+
+
+class TestIdentifyAigcPngChunk:
+    """The raw-JSON ``AIGC`` PNG chunk (no namespaced XMP marker) is a high-
+    confidence AI verdict, same as the XMP form."""
+
+    def _aigc_chunk_png(self, tmp_path: Path) -> Path:
+        from PIL import Image
+        from PIL.PngImagePlugin import PngInfo
+
+        p = tmp_path / "doubao_chunk.png"
+        pnginfo = PngInfo()
+        pnginfo.add_text("AIGC", json.dumps({"Label": "1", "ContentProducer": "doubao"}))
+        Image.new("RGB", (32, 32)).save(p, pnginfo=pnginfo)
+        return p
+
+    def test_png_chunk_detected_high(self, tmp_path: Path):
+        r = identify(self._aigc_chunk_png(tmp_path), check_visible=False)
+        assert r.is_ai_generated is True
+        assert r.confidence == "high"
+        assert r.platform is not None
+        assert "AIGC" in r.platform
+        signal = next(s for s in r.signals if s.name == "aigc")
+        assert "doubao" in signal.detail
+
+
+# ── HuggingFace-hosted job marker (medium confidence) ───────────────
+
+
+class TestIdentifyHuggingFaceJob:
+    """The hf-job-id chunk lifts an otherwise-Unknown verdict to a tentative
+    (medium) AI, never overriding a high-confidence metadata signal."""
+
+    def _hf_png(self, tmp_path: Path) -> Path:
+        from PIL import Image
+        from PIL.PngImagePlugin import PngInfo
+
+        p = tmp_path / "hfjob.png"
+        pnginfo = PngInfo()
+        pnginfo.add_text("hf-job-id", "ec8380a6-2091-423a-b835-209420f99ee1")
+        Image.new("RGB", (32, 32)).save(p, pnginfo=pnginfo)
+        return p
+
+    def test_hf_job_promotes_to_medium(self, tmp_path: Path):
+        r = identify(self._hf_png(tmp_path), check_visible=False)
+        assert r.is_ai_generated is True
+        assert r.confidence == "medium"
+        assert r.platform is not None
+        assert "HuggingFace" in r.platform
+        signal = next(s for s in r.signals if s.name == "hf_job")
+        assert signal.confidence == "medium"
+
+    def test_hf_job_caveat_present(self, tmp_path: Path):
+        r = identify(self._hf_png(tmp_path), check_visible=False)
+        assert any("hf-job-id" in c for c in r.caveats)
+
+    def test_metadata_keeps_high_even_with_hf_job(self, tmp_png_with_ai_metadata: Path):
+        # A high-confidence metadata verdict is not downgraded by an hf-job hit.
+        from PIL import Image
+        from PIL.PngImagePlugin import PngInfo
+
+        img = Image.open(tmp_png_with_ai_metadata)
+        pnginfo = PngInfo()
+        for k, v in img.text.items():
+            pnginfo.add_text(k, v)
+        pnginfo.add_text("hf-job-id", "ec8380a6-2091-423a-b835-209420f99ee1")
+        img.save(tmp_png_with_ai_metadata, pnginfo=pnginfo)
+        r = identify(tmp_png_with_ai_metadata, check_visible=False)
+        assert r.confidence == "high"
+
+
 # ── Visible-sparkle fallback (mocked detector) ──────────────────────
 
 
@@ -554,6 +554,88 @@ class TestAIGCLabel:
         assert "aigc_label" in meta
         assert "TC260" in meta["aigc_label"]
 
+    def _aigc_chunk_png(self, tmp_path: Path, producer: str = "doubao") -> Path:
+        """Doubao writes the TC260 object as a PNG ``tEXt`` chunk keyed ``AIGC``
+        with raw JSON (no XMP, no namespaced marker)."""
+        import json
+
+        p = tmp_path / "doubao_chunk.png"
+        pnginfo = PngInfo()
+        pnginfo.add_text(
+            "AIGC",
+            json.dumps({"Label": "1", "ContentProducer": producer, "ProduceID": "abc123"}),
+        )
+        Image.new("RGB", (32, 32)).save(p, pnginfo=pnginfo)
+        return p
+
+    def test_parses_png_text_chunk_form(self, tmp_path: Path):
+        from remove_ai_watermarks.metadata import aigc_label
+
+        info = aigc_label(self._aigc_chunk_png(tmp_path))
+        assert info is not None
+        assert info["Label"] == "1"
+        assert info["ContentProducer"] == "doubao"
+
+    def test_png_chunk_without_tc260_field_ignored(self, tmp_path: Path):
+        """A generic ``AIGC`` chunk with no TC260 field must not false-positive."""
+        import json
+
+        from remove_ai_watermarks.metadata import aigc_label
+
+        p = tmp_path / "unrelated.png"
+        pnginfo = PngInfo()
+        pnginfo.add_text("AIGC", json.dumps({"unrelated": "value"}))
+        Image.new("RGB", (32, 32)).save(p, pnginfo=pnginfo)
+        assert aigc_label(p) is None
+
+    def test_has_ai_metadata_detects_png_chunk_form(self, tmp_path: Path):
+        assert has_ai_metadata(self._aigc_chunk_png(tmp_path))
+
+    def test_remove_strips_png_chunk_form(self, tmp_path: Path):
+        from remove_ai_watermarks.metadata import aigc_label, remove_ai_metadata
+
+        out = tmp_path / "clean.png"
+        remove_ai_metadata(self._aigc_chunk_png(tmp_path), out)
+        assert aigc_label(out) is None
+        assert not has_ai_metadata(out)
+
+
+class TestHuggingFaceJob:
+    """HuggingFace-hosted job marker (``hf-job-id`` PNG text chunk)."""
+
+    def _hf_png(self, tmp_path: Path, job_id: str = "ec8380a6-2091-423a-b835-209420f99ee1") -> Path:
+        p = tmp_path / "hfjob.png"
+        pnginfo = PngInfo()
+        pnginfo.add_text("hf-job-id", job_id)
+        Image.new("RGB", (32, 32)).save(p, pnginfo=pnginfo)
+        return p
+
+    def test_returns_job_id(self, tmp_path: Path):
+        from remove_ai_watermarks.metadata import huggingface_job
+
+        assert huggingface_job(self._hf_png(tmp_path)) == "ec8380a6-2091-423a-b835-209420f99ee1"
+
+    def test_none_when_absent(self, tmp_clean_png):
+        from remove_ai_watermarks.metadata import huggingface_job
+
+        assert huggingface_job(tmp_clean_png) is None
+
+    def test_has_ai_metadata_detects_hf_job(self, tmp_path: Path):
+        assert has_ai_metadata(self._hf_png(tmp_path))
+
+    def test_get_ai_metadata_surfaces_hf_job(self, tmp_path: Path):
+        meta = get_ai_metadata(self._hf_png(tmp_path))
+        assert "huggingface_job" in meta
+        assert "ec8380a6" in meta["huggingface_job"]
+
+    def test_remove_strips_hf_job(self, tmp_path: Path):
+        from remove_ai_watermarks.metadata import huggingface_job, remove_ai_metadata
+
+        out = tmp_path / "clean.png"
+        remove_ai_metadata(self._hf_png(tmp_path), out)
+        assert huggingface_job(out) is None
+        assert not has_ai_metadata(out)
+
 
 @pytest.mark.skipif(not (SAMPLES_DIR / "doubao-1.png").exists(), reason="doubao sample not present")
 class TestAIGCRealSample: