diff --git a/docs/watermarking-landscape.md b/docs/watermarking-landscape.md
index acdcb2e..bf56afb 100644
--- a/docs/watermarking-landscape.md
+++ b/docs/watermarking-landscape.md
@@ -6,7 +6,7 @@ Who embeds what, and whether it is locally detectable (so we know which gaps are
 fillable). See `identify.py` for what we read.
 - **Locally detectable (open decoder, no key/API):** Stable Diffusion / SDXL / FLUX via `imwatermark` DWT-DCT (now covered by `invisible_watermark.py`). FLUX uses the same library (`black-forest-labs/flux2` `src/flux2/watermark.py`, 48-bit `0b001010101111111010000111100111001111010100101110`); SDXL is the diffusers `WATERMARK_MESSAGE` (`0b101100111110110010010000011110111011000110011110`). Caveat: fragile to re-encoding.
-- **C2PA / IPTC (covered by the issuer/marker scan):** OpenAI, Google, Adobe Firefly, Microsoft (Designer + **Bing Image Creator** — collected 2026-05-24; Bing now runs Microsoft's own **MAI-Image** model, signs C2PA as "Microsoft", NOT OpenAI/DALL-E), and **Stability AI** (collected from Brand Studio / DreamStudio successor; signs C2PA as "Stability AI Ltd", no SynthID, no imwatermark on its current Stable Image model — issuer added to `C2PA_ISSUERS`). Still unsampled: Canva (its downloads are re-encoded design *exports* that strip C2PA, so a Canva "positive" is inconclusive — skipped), Getty, Shutterstock. Midjourney embeds NO C2PA and no invisible watermark (our `mj-*` sample carried only the IPTC tag).
+- **C2PA / IPTC (covered by the issuer/marker scan):** OpenAI, Google, Adobe Firefly, Microsoft (Designer + **Bing Image Creator** — collected 2026-05-24; Bing now runs Microsoft's own **MAI-Image** model, signs C2PA as "Microsoft", NOT OpenAI/DALL-E), **Stability AI** (collected from Brand Studio / DreamStudio successor; signs C2PA as "Stability AI Ltd", no SynthID, no imwatermark on its current Stable Image model — issuer added to `C2PA_ISSUERS`), and **Canva** (Magic Media signs C2PA as "Canva" + `trainedAlgorithmicMedia` with a generic `c2pa-rs` claim generator, no SynthID — issuer `b"Canva"` → "Canva (Magic Media)"; found on real production traffic 2026-06-19, which **disproved the earlier assumption** that Canva downloads are re-encoded exports that always strip C2PA). Still unsampled: Getty, Shutterstock. Midjourney embeds NO C2PA and no invisible watermark (our `mj-*` sample carried only the IPTC tag).
 **Samsung Galaxy AI** (Generative Edit / Sketch to Image / Portrait Studio on Galaxy S23 FE / S24 / S25, One UI 7+) signs C2PA as "Samsung Galaxy" with the standard `trainedAlgorithmicMedia` source type AND a proprietary `genAIType` marker; verified on real signed files 2026-05-29 (the standard scan catches the source type; `genAIType` additionally catches a Galaxy S24 file that omits it). It ALSO burns a **visible** localized wordmark into the pixels — a sparkle + "generated with AI" string in the bottom-LEFT corner (issue #37; the Italian "✦ Contenuti generati dall'AI" variant is calibrated) — removed by `samsung_engine.py` / `visible --mark samsung` (reverse-alpha, see the engine bullet); detection feeds `identify` as the medium `visible_samsung` signal. The string is locale-specific, so each locale needs its own captured alpha template.
@@ -14,7 +14,7 @@ Who embeds what, and whether it is locally detectable (so we know which gaps are
 **Black Forest Labs (FLUX)** API output signs C2PA: `claim_generator_info "Black Forest Labs API"` + a `c2pa.ai_generated_content` assertion + `trainedAlgorithmicMedia` (issuer `b"Black Forest Labs"` added to `C2PA_ISSUERS`, platform "Black Forest Labs (FLUX)").
-**ByteDance Volcano Engine (Volcengine)** — the cloud behind Doubao / Jimeng — signs its AI image output with a cert from `certificate_center@volcengine.com` + `trainedAlgorithmicMedia` (issuer `b"volcengine"` → "ByteDance (Volcano Engine)", platform "ByteDance (Doubao / Jimeng / Volcano Engine)"); note this is the C2PA-signed surface, distinct from the XMP/PNG TC260 `AIGC` label Doubao also uses. All three verified on real signed files 2026-05-29.
+**ByteDance Volcano Engine (Volcengine)** — the cloud behind Doubao / Jimeng — signs its AI image output with a cert from `certificate_center@volcengine.com` + `trainedAlgorithmicMedia` (issuer `b"volcengine"` → "ByteDance (Volcano Engine)", platform "ByteDance (Doubao / Jimeng / Volcano Engine)"); note this is the C2PA-signed surface, distinct from the XMP/PNG TC260 `AIGC` label Doubao also uses. All three verified on real signed files 2026-05-29. ByteDance's **international brand (BytePlus / Seedream / Seededit)** signs the SAME content as **"Byteplus Pte. Ltd."** — the bare `volcengine` needle missed it, so real BytePlus output was mis-attributed to "Adobe Firefly" (an incidental "Adobe XMP" toolkit string in the file's XMP, picked up by the fallback byte-scan once the clean manifest issuer matched nothing). Added issuer `b"Byteplus"` → org "BytePlus (ByteDance)" (platform resolves to the shared "ByteDance (Doubao / Jimeng / Volcano Engine)" label via the common `ByteDance` needle) so the clean manifest issuer attributes it directly; found on real production traffic 2026-06-19.
 - **EXIF/XMP generator tag (caught by `exif_generator`):** **Ideogram** writes EXIF `Make="Ideogram AI"` (collected 2026-05-24 — no C2PA, no SynthID, no imwatermark; the Make tag is the only signal).
 - **xAI / Grok — its own EXIF signature scheme, NOT C2PA (DETECTED by `metadata.xai_signature`, built 2026-05-26).**
diff --git a/src/remove_ai_watermarks/noai/constants.py b/src/remove_ai_watermarks/noai/constants.py
index f293d9c..764000f 100644
--- a/src/remove_ai_watermarks/noai/constants.py
+++ b/src/remove_ai_watermarks/noai/constants.py
@@ -122,6 +122,20 @@ C2PA_AI_VENDORS: tuple[C2paAiVendor, ...] = (
     C2paAiVendor(
         b"volcengine", "ByteDance (Volcano Engine)", "ByteDance (Doubao / Jimeng / Volcano Engine)", "ByteDance"
     ),
+    # ByteDance's international brand (BytePlus / Seedream / Seededit) signs its
+    # cert as "Byteplus Pte. Ltd." -- the bare ``volcengine`` needle misses it, so
+    # real BytePlus AI output was mis-attributed (an incidental "Adobe XMP" string
+    # in the file's XMP made it read "Adobe Firefly"). Adding the issuer means the
+    # clean manifest issuer matches "BytePlus (ByteDance)" directly. The platform
+    # string mirrors the volcengine row: both share the "ByteDance" needle, so the
+    # earlier row's label wins anyway -- they normalize together for clash
+    # detection. Verified on real signed files in production traffic, 2026-06-19.
+    C2paAiVendor(b"Byteplus", "BytePlus (ByteDance)", "ByteDance (Doubao / Jimeng / Volcano Engine)", "ByteDance"),
+    # Canva Magic Media signs AI-generated images as "Canva" with a generic
+    # c2pa-rs claim generator + trainedAlgorithmicMedia; without this entry the
+    # source read AI but no platform was attributed. Verified on real signed files
+    # in production traffic, 2026-06-19. Canva does not use SynthID.
+    C2paAiVendor(b"Canva", "Canva", "Canva (Magic Media)", "Canva"),
     # Truepic is a C2PA signing authority, not an AI generator: no platform label,
     # never asserts is_ai (the verdict comes from the digital-source-type).
     C2paAiVendor(b"Truepic", "Truepic", None, None),
diff --git a/tests/test_identify.py b/tests/test_identify.py
index 8d796f2..c02a3a0 100644
--- a/tests/test_identify.py
+++ b/tests/test_identify.py
@@ -66,6 +66,19 @@ class TestAttributePlatform:
         assert platform
         assert "Stability AI" in platform
 
+    def test_canva(self):
+        platform = _attribute_platform(["Canva"])
+        assert platform
+        assert "Canva" in platform
+
+    def test_byteplus_attributes_to_bytedance(self):
+        # ByteDance's intl brand signs as "Byteplus Pte. Ltd."; the registry maps
+        # it to the ByteDance platform (was mis-read as Adobe via an incidental
+        # "Adobe XMP" file string before the entry existed).
+        platform = _attribute_platform(["BytePlus (ByteDance)"])
+        assert platform
+        assert "ByteDance" in platform
+
     def test_empty_is_none(self):
         assert _attribute_platform([]) is None
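
The registry rows in this diff rely on ordered, first-match-wins needle lookups. The following is a minimal sketch of that idea, not the repository's implementation: `Vendor`, `VENDORS`, `match_issuer`, and `attribute_platform` are hypothetical stand-ins for `C2paAiVendor`, `C2PA_AI_VENDORS`, and `_attribute_platform`, the field names are guesses based on the four positional arguments, and case-sensitive substring matching is an assumption.

```python
from typing import NamedTuple, Optional


class Vendor(NamedTuple):
    # Hypothetical field names; the diff only shows four positional arguments.
    needle: bytes                    # substring searched for in the issuer bytes
    org: str                         # organization label reported for a match
    platform: Optional[str]          # platform label; None for signing authorities
    platform_needle: Optional[str]   # substring that maps an org label to a platform


# Illustrative subset of the rows touched by the diff, in the same order.
VENDORS = (
    Vendor(b"volcengine", "ByteDance (Volcano Engine)",
           "ByteDance (Doubao / Jimeng / Volcano Engine)", "ByteDance"),
    Vendor(b"Byteplus", "BytePlus (ByteDance)",
           "ByteDance (Doubao / Jimeng / Volcano Engine)", "ByteDance"),
    Vendor(b"Canva", "Canva", "Canva (Magic Media)", "Canva"),
    Vendor(b"Truepic", "Truepic", None, None),
)


def match_issuer(issuer: bytes) -> Optional[Vendor]:
    """Return the first row whose needle occurs in the issuer bytes."""
    for vendor in VENDORS:
        if vendor.needle in issuer:  # assumption: case-sensitive match
            return vendor
    return None


def attribute_platform(orgs: list[str]) -> Optional[str]:
    """Return the platform of the first row whose platform needle is in an org label."""
    for org in orgs:
        for vendor in VENDORS:
            if vendor.platform_needle and vendor.platform_needle in org:
                return vendor.platform
    return None
```

Under these assumptions, an issuer of `b"Byteplus Pte. Ltd."` matches the `Byteplus` row directly, and its org label "BytePlus (ByteDance)" resolves to the shared ByteDance platform string because the earlier `volcengine` row carries the same `ByteDance` platform needle. A Truepic issuer matches a row but yields no platform.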