From 58bdf51c597c26906ecca720ebab219b057f39c6 Mon Sep 17 00:00:00 2001
From: Victor Kuznetsov
Date: Fri, 29 May 2026 19:49:09 -0700
Subject: [PATCH] Visible-watermark registry: reverse-alpha-only Doubao + Gemini, exact native recovery (#28)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* fix(trustmark): gate detection on re-encode durability to kill false positives

TrustMark's wm_present flag is a BCH validity check that spuriously validates on a content-correlated fraction of un-watermarked images (AI textures trip it more often than camera photos do). On a 1343-image set all 20 raw detections were false, several of them on Gemini/OpenAI/Doubao output that cannot carry Adobe's watermark, decoding to random-byte secrets. A genuine TrustMark is a durable soft binding that survives re-encoding, so detect_trustmark now re-decodes after a mild JPEG round-trip and requires the same schema both times. Every observed false positive collapsed under this gate; the second decode runs only on the rare hit.

Co-Authored-By: Claude Opus 4.8 (1M context)

* feat(identify): Samsung Galaxy AI, FLUX, ByteDance C2PA; fix C2PA substring FP

Detection extensions verified on real signed files (2026-05-29):
- Samsung Galaxy AI: signer attribution via a new _SIGNER_C2PA_PLATFORM (Samsung Galaxy / ASUS Gallery), kept separate from the capture-camera _DEVICE_C2PA_PLATFORM so that a Galaxy AI edit (device cert + AI source type) does not trip the camera-vs-AI integrity clash. Plus metadata.samsung_genai: the proprietary genAIType marker in PhotoEditor_Re_Edit_Data, a medium-confidence AI-editing signal (samsung_only branch).
- Black Forest Labs (FLUX) and ByteDance Volcano Engine (Doubao/Jimeng) added as C2PA issuers + issuer->platform mappings.
- fix: C2PA presence required only the bare 4-byte 'c2pa' substring, which false-positives on compressed pixel data (a recompressed PNG IDAT re-flagged C2PA after its manifest was correctly stripped).
  New c2pa_marker_in() requires the JUMBF wrapper (jumb+c2pa) or the C2PA uuid box; applied in identify + metadata. Verified: all 535 real C2PA files carry jumb.

Co-Authored-By: Claude Opus 4.8 (1M context)

* fix(doubao): gate detection on text structure to cut ~95% of false positives (#23)

Coverage alone over-fired: any textured bottom-right corner cleared the threshold, so the detector false-positived on ~28% of arbitrary images. The real '豆包AI生成' mark is six glyphs in one row, so detect now also requires the text-structure signature (_glyph_structure): many connected components, no single dominant blob, and concentration in a thin horizontal band. False positives dropped 343 -> 17 across the corpus while keeping real-mark recall and the doubao-1.png sample. Also accept a no-op force kwarg for remover-interface symmetry.

Co-Authored-By: Claude Opus 4.8 (1M context)

* feat(samsung): add Samsung Galaxy AI visible-badge remover

New samsung_engine.py removes the bottom-left sparkle + localized 'AI-generated content' badge that Galaxy AI tools stamp. It mirrors the Doubao locate->mask->inpaint pattern, but bottom-left and with a dual-polarity top-hat mask (the badge is either light-on-dark or dark-on-light). Detection gates on a band + left-anchor signature (the Doubao CJK-component gate does not transfer: Latin badge letters connect into few blobs). Explicit-only: tuned on only a few real badges, with a ~4% FP floor, so it is not used in auto. Tests use synthetic byte-blob fixtures (real badges are user content and are not shipped).

Co-Authored-By: Claude Opus 4.8 (1M context)

* feat(visible): unified known-watermark registry + LaMa inpaint backend

watermark_registry.py is a single catalog of known visible marks, each tying {usual location, in_auto flag, recovery strategy, detect adapter, remove adapter}: gemini (reverse-alpha, exact), doubao, samsung.
cmd_visible is now registry-driven (best_auto_mark for --mark auto; mark_keys() feeds the CLI choices); the per-mark _run_doubao/_run_samsung helper branches are gone. Cross-engine confidences are not comparable, so the gemini adapter applies the corpus-validated 0.5 sparkle threshold for auto arbitration (its engine flag is loose and fired weakly, at ~0.36, on Doubao text, hijacking auto).

--backend auto|cv2|lama chooses background reconstruction for the mask-based marks; auto = LaMa when onnxruntime is present, else cv2. For LaMa the mask is the FILLED glyph bounding box (sparse glyph masks leave anti-aliased edges behind). cv2 stays the zero-dependency fallback.

Co-Authored-By: Claude Opus 4.8 (1M context)

* docs: watermark registry, Samsung/FLUX/ByteDance detection, LaMa backend, trustmark gate

Co-Authored-By: Claude Opus 4.8 (1M context)

* feat(doubao): exact reverse-alpha removal from captured alpha map

The Doubao '豆包AI生成' mark is a fixed semi-transparent white overlay, so given its alpha map the original pixels are recovered exactly: original = (wm - a*logo)/(1-a), with no inpaint hallucination. The alpha map + logo colour were solved from real black+gray Doubao captures on a controlled background: on black, captured = a*logo, and the black/gray pair solves for the alpha a per pixel without assuming the logo colour (a_max ~0.65, logo near-white); the white capture cross-validates (the mark vanishes to a flat fill). Bundled as assets/doubao_alpha.png + geometry constants.

remove_watermark_reverse_alpha applies it scaled to image width. That is exact only at the captured width, so the registry routes doubao through it only when reverse_alpha_available (width within the calibrated band) and the mark is detected, falling back to mask inpaint (cv2/LaMa) otherwise. A light residual inpaint cleans the sub-pixel rescaling error. Add captures at more resolutions to widen exact coverage.
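A minimal numerical sketch of the blend model above, assuming a single-channel float image and a scalar logo value (the engine itself uses a per-channel _ALPHA_LOGO_BGR). solve_alpha_from_captures and reverse_alpha are hypothetical names, not the module's API. The first shows why the black/gray capture pair cancels the unknown logo colour; the second inverts wm = a*logo + (1-a)*original. Because the demo uses synthetic floating-point data, it recovers the background to rounding error and says nothing about real 8-bit captures.

```python
import numpy as np


def solve_alpha_from_captures(black_cap: np.ndarray, gray_cap: np.ndarray, gray_level: float) -> np.ndarray:
    """Per-pixel alpha from the same overlay captured on black (0) and on flat gray.

    On black: black_cap = a * logo
    On gray:  gray_cap  = a * logo + (1 - a) * gray_level
    The difference (1 - a) * gray_level no longer depends on the logo colour.
    """
    diff = gray_cap.astype(np.float64) - black_cap.astype(np.float64)
    return np.clip(1.0 - diff / gray_level, 0.0, 1.0)


def reverse_alpha(wm: np.ndarray, alpha: np.ndarray, logo: float, max_alpha: float = 0.99) -> np.ndarray:
    """Invert wm = a*logo + (1-a)*original, i.e. original = (wm - a*logo) / (1 - a)."""
    a = np.clip(alpha, 0.0, max_alpha)  # guard the division where a approaches 1
    original = (wm.astype(np.float64) - a * logo) / (1.0 - a)
    return np.clip(original, 0.0, 255.0)


rng = np.random.default_rng(0)
true_alpha = rng.uniform(0.0, 0.65, size=(20, 120))  # a_max ~0.65, as in the commit message
logo = 253.0                                          # near-white logo value
background = rng.uniform(0.0, 255.0, size=(20, 120))  # the "unknown" original pixels

# Simulated calibration captures: the overlay on black and on flat gray 128.
black_cap = true_alpha * logo
gray_cap = true_alpha * logo + (1.0 - true_alpha) * 128.0
alpha = solve_alpha_from_captures(black_cap, gray_cap, 128.0)

# Watermark the background, then invert the blend with the solved alpha.
watermarked = true_alpha * logo + (1.0 - true_alpha) * background
recovered = reverse_alpha(watermarked, alpha, logo)
print(float(np.abs(recovered - background).max()))
```

The gray level only needs to be non-zero and known; any flat background other than black gives a second equation that eliminates the logo colour.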
Co-Authored-By: Claude Opus 4.8 (1M context)

* refactor(visible): reverse-alpha only -- drop inpaint removal + heuristic detection

Per the principle that we only remove/detect what we can do exactly, the visible-mark path is now reverse-alpha only:
- Doubao detect is reverse-alpha-consistent: it matches the bundled alpha glyph silhouette against the corner via TM_CCOEFF_NORMED (DETECT_NCC_THRESHOLD 0.4), keying on the '豆包AI生成' SHAPE, not coverage/structure heuristics. FP 7/1243 (0.6%). Removes the cv2 inpaint path + the _glyph_structure gate.
- Registry is reverse-alpha only: dropped the cv2/LaMa backend (_glyph_remove, _lama_box_inpaint, default_backend, --backend) and the Samsung entry. Doubao outside the alpha resolution band is skipped, never inpainted.
- Removed samsung_engine.py + tests + --mark samsung (no alpha map captured; Samsung C2PA/genAIType metadata detection in identify is unaffected).
- The universal erase --region (cv2/LaMa) is unchanged: arbitrary-region inpainting stays a user-directed tool, separate from the known-mark registry.

Co-Authored-By: Claude Opus 4.8 (1M context)

* feat(doubao): NCC sub-pixel alignment -> reverse-alpha at any resolution

A pure width-scale of the captured alpha map is only sub-pixel-accurate at the captured width and leaves a faint ghost elsewhere. remove_watermark_reverse_alpha now registers the alpha glyph to the actual mark via a TM_CCOEFF_NORMED scale+position search (_aligned_alpha_map) before inverting the blend, so the single 2048 capture works at any resolution; verified clean on the 1773x2364 (3:4) corpus size, the biggest coverage gap (23 files). reverse_alpha_available is now just 'asset present' (no width band); the registry still gates removal on detect, so a clean corner is never touched. Drops the _ALPHA_WIDTH_TOLERANCE gate.
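The scale+position search can be illustrated with a brute-force NumPy equivalent of cv2.matchTemplate in TM_CCOEFF_NORMED mode (zero-mean normalized cross-correlation). This is a sketch, not _aligned_alpha_map: every name below is hypothetical, nearest-neighbour resizing is a simplification of a proper interpolated resize, and the real search ranges (_ALPHA_ALIGN_SEARCH) are not reproduced.

```python
from typing import Optional, Sequence, Tuple

import numpy as np


def zncc(window: np.ndarray, tmpl: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two equal-shape arrays, in [-1, 1]."""
    w = window - window.mean()
    t = tmpl - tmpl.mean()
    denom = np.sqrt(float((w * w).sum()) * float((t * t).sum()))
    return float((w * t).sum()) / denom if denom > 0 else 0.0


def match_template_zncc(image: np.ndarray, tmpl: np.ndarray) -> np.ndarray:
    """Score map over every valid top-left (y, x), like TM_CCOEFF_NORMED."""
    th, tw = tmpl.shape
    out_h, out_w = image.shape[0] - th + 1, image.shape[1] - tw + 1
    scores = np.empty((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            scores[y, x] = zncc(image[y:y + th, x:x + tw], tmpl)
    return scores


def resize_nearest(img: np.ndarray, scale: float) -> np.ndarray:
    """Nearest-neighbour rescale of a 2-D array."""
    h, w = img.shape
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    ys = np.minimum((np.arange(nh) / scale).astype(int), h - 1)
    xs = np.minimum((np.arange(nw) / scale).astype(int), w - 1)
    return img[np.ix_(ys, xs)]


def align_template(
    region: np.ndarray, tmpl: np.ndarray, scales: Sequence[float]
) -> Tuple[float, Optional[float], Optional[Tuple[int, int]]]:
    """Return (best_score, best_scale, (x, y)) over a scale + position search."""
    best: Tuple[float, Optional[float], Optional[Tuple[int, int]]] = (-2.0, None, None)
    for s in scales:
        t = resize_nearest(tmpl, s)
        if t.shape[0] > region.shape[0] or t.shape[1] > region.shape[1]:
            continue  # template larger than the search region at this scale
        scores = match_template_zncc(region, t)
        y, x = np.unravel_index(int(np.argmax(scores)), scores.shape)
        if scores[y, x] > best[0]:
            best = (float(scores[y, x]), s, (int(x), int(y)))
    return best


rng = np.random.default_rng(1)
glyphs = rng.random((8, 24))         # stand-in for the alpha glyph silhouette
corner = rng.random((30, 60)) * 0.1  # faint background noise
corner[5:13, 10:34] = glyphs         # paste the mark at x=10, y=5, scale 1.0
score, scale, (x, y) = align_template(corner, glyphs, scales=[0.75, 1.0, 1.25])
print(score, scale, (x, y))
```

A detector built this way would compare the best score against a threshold (the commit uses DETECT_NCC_THRESHOLD 0.4 for the Doubao silhouette). The nested loop is fine for a small corner crop but is exactly the work cv2.matchTemplate exists to speed up.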
Co-Authored-By: Claude Opus 4.8 (1M context)

* fix(doubao): keep native recovery exact -- fixed geometry at captured width

Integer-pixel NCC alignment landed ~1px off at the captured width, degrading the otherwise-exact native reverse-alpha (synthetic recovery error 0.94 -> 1.39). remove_watermark_reverse_alpha now uses exact width-relative geometry within _ALPHA_NATIVE_BAND of the captured width and the NCC search only outside it, which gives the best of both: native back to 0.94, other resolutions still aligned.

Co-Authored-By: Claude Opus 4.8 (1M context)

* fix(doubao): harden alignment -- try fixed+aligned, keep least residual (56/56)

On a faint/busy-background mark the NCC alignment peak can wander a few px off the true mark and leave a residual (2/56 real corpus files). Off the captured width, remove_watermark_reverse_alpha now builds BOTH the fixed-geometry and the NCC-aligned alpha map, applies each, and keeps whichever leaves the least residual mark (re-detect confidence on the bare reverse-alpha result): geometry wins on faint marks, alignment on clear ones, with no magic threshold. The real-file round-trip now removes 56/56 detected Doubao marks cleanly across every corpus resolution (was 54).

Co-Authored-By: Claude Opus 4.8 (1M context)

* perf(doubao): skip residual inpaint at native width for exact recovery

At the captured width the fixed-geometry reverse-alpha is pixel-exact, so inpainting over it only replaced exactly-recovered interior pixels with a cv2 hallucination. It measured worse on a textured background: native error vs. the true background was 1.6 with reverse-alpha only vs. 2.6 with the old always-on full-footprint inpaint. Native now returns the bare recovery untouched; off-native, where NCC alignment is only sub-pixel-approximate, the footprint inpaint stays to clean the seam. Real round-trip still 56/56 across all corpus resolutions; negatives 0/60; Gemini unaffected. Add test_native_returns_exact_reverse_alpha_no_inpaint as the regression guard.
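The native/off-native routing described in the last three fixes can be sketched as plain control flow, with the image operations injected as callables. Everything here is hypothetical: recover, Recovery, and the placement/residual/inpaint callables are illustrative stand-ins for the engine's internals, and NATIVE_BAND is a placeholder because the commit does not state the value of _ALPHA_NATIVE_BAND.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

import numpy as np

NATIVE_WIDTH = 2048  # width of the single Doubao capture, per the commit message
NATIVE_BAND = 0.02   # ASSUMPTION: placeholder relative tolerance, not the real _ALPHA_NATIVE_BAND

# A placement applies reverse-alpha with one alpha-map placement (e.g. "fixed" or "aligned").
Placement = Callable[[np.ndarray], np.ndarray]


@dataclass
class Recovery:
    image: np.ndarray
    placement: str
    residual: float
    inpainted: bool


def recover(
    image: np.ndarray,
    placements: dict[str, Placement],
    residual: Callable[[np.ndarray], float],
    inpaint_seam: Callable[[np.ndarray], np.ndarray],
) -> Recovery:
    """Route one detected mark through fixed or best-of-both reverse-alpha (sketch)."""
    width = image.shape[1]
    if abs(width - NATIVE_WIDTH) <= NATIVE_BAND * NATIVE_WIDTH:
        # Native width: fixed geometry is exact, so return the bare recovery with no inpaint.
        out = placements["fixed"](image)
        return Recovery(out, "fixed", residual(out), inpainted=False)
    # Off native: apply every placement and keep the one whose result re-detects weakest.
    scored = []
    for name, apply in placements.items():
        out = apply(image)
        scored.append((residual(out), name, out))
    best_residual, best_name, best_out = min(scored, key=lambda item: item[0])
    # Alignment is only sub-pixel-approximate here, so clean the seam over the footprint.
    return Recovery(inpaint_seam(best_out), best_name, best_residual, inpainted=True)


# Toy stand-ins: on this input "aligned" leaves less residual than "fixed".
def toy_fixed(im: np.ndarray) -> np.ndarray:
    return im + 1.0


def toy_aligned(im: np.ndarray) -> np.ndarray:
    return im + 2.0


def toy_residual(im: np.ndarray) -> float:
    return abs(float(im.mean()) - 2.0)


def toy_inpaint(im: np.ndarray) -> np.ndarray:
    return np.full_like(im, 7.0)


toy_placements = {"fixed": toy_fixed, "aligned": toy_aligned}
native = recover(np.zeros((4, 2048)), toy_placements, toy_residual, toy_inpaint)
off = recover(np.zeros((4, 1773)), toy_placements, toy_residual, toy_inpaint)
print(native.placement, native.inpainted)  # fixed False
print(off.placement, off.inpainted)        # aligned True
```

Choosing by measured residual rather than by a confidence threshold on the alignment itself mirrors the "no magic threshold" reasoning in the harden-alignment fix.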
Sync CLAUDE.md + README (the table cell and prose described the pre-NCC "skipped off native / cv2-LaMa" behavior, now stale). Gitignore the session scheduled_tasks.lock, and add the text-protection research note.

Co-Authored-By: Claude Opus 4.8 (1M context)

---------

Co-authored-by: Claude Opus 4.8 (1M context)
---
 .gitignore                                 |   1 +
 CLAUDE.md                                  |  15 +-
 README.md                                  |  18 +-
 docs/text-protection-research.md           | 138 +++++++++
 .../assets/doubao_alpha.png                | Bin 0 -> 8182 bytes
 src/remove_ai_watermarks/cli.py            | 159 +++-------
 src/remove_ai_watermarks/doubao_engine.py  | 284 +++++++++++++-----
 src/remove_ai_watermarks/identify.py       |  72 ++++-
 src/remove_ai_watermarks/metadata.py       |  58 +++-
 src/remove_ai_watermarks/noai/constants.py |   8 +
 .../trustmark_detector.py                  |  51 +++-
 .../watermark_registry.py                  | 202 +++++++++++++
 tests/test_doubao_engine.py                | 161 +++++++---
 tests/test_identify.py                     |  56 ++++
 tests/test_metadata.py                     |  68 +++++
 tests/test_trustmark_detector.py           |  53 ++++
 tests/test_watermark_registry.py           |  70 +++++
 17 files changed, 1148 insertions(+), 266 deletions(-)
 create mode 100644 docs/text-protection-research.md
 create mode 100644 src/remove_ai_watermarks/assets/doubao_alpha.png
 create mode 100644 src/remove_ai_watermarks/watermark_registry.py
 create mode 100644 tests/test_watermark_registry.py

diff --git a/.gitignore b/.gitignore
index 7fdaba1..523ddb7 100644
--- a/.gitignore
+++ b/.gitignore
@@ -34,6 +34,7 @@ yolov8n.pt
 
 # Claude Code local settings
 .claude/settings.local.json
+.claude/scheduled_tasks.lock
 
 # Doubao watermark calibration (local only; ship only the derived alpha-map asset).
 # Synthetic seeds + raw Doubao captures are regenerable and not committed.
diff --git a/CLAUDE.md b/CLAUDE.md
index 6915995..2352269 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -5,7 +5,7 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r
 ## How to run
 
 - `uv run remove-ai-watermarks all -o `
-- `uv run remove-ai-watermarks visible -o ` — visible-mark removal, CPU, no GPU. `--mark auto` (default) routes between the Gemini sparkle and the Doubao "豆包AI生成" text strip by detector confidence; `--mark gemini` / `--mark doubao` force one.
+- `uv run remove-ai-watermarks visible -o ` — known-visible-mark removal, CPU, no GPU. **Reverse-alpha only**: every mark is removed by inverting its captured alpha map (exact pixel recovery, no inpaint). `--mark auto` (default) picks the strongest detected of the Gemini sparkle and the Doubao "豆包AI生成" text strip; `--mark gemini` / `--mark doubao` force one. For arbitrary logos/objects use `erase`.
 - `uv run remove-ai-watermarks erase --region x,y,w,h -o ` — universal region eraser (any logo/object, any position). `--backend cv2` (default, no deps) or `--backend lama` (big-LaMa via onnxruntime, extra `lama`); `--region` is repeatable.
 - `uv run remove-ai-watermarks identify ` — provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector
 - `uv run remove-ai-watermarks metadata --check` — inspect AI metadata (C2PA, EXIF, PNG chunks)
@@ -33,19 +33,22 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r
 - `noai/c2pa.py` — PNG chunk parser; use `extract_c2pa_chunk(path)` to get raw caBX payload, `has_c2pa_metadata(path)` to detect. Do not reimplement chunk parsing.
`extract_c2pa_info(path)` sets `synthid_watermark`/`synthid_vendors` when the manifest is signed by a SynthID-using vendor, and `soft_binding`/`soft_binding_vendors` when a `c2pa.soft-binding` `alg` names a forensic-watermark vendor (`soft_binding_vendors_in(buffer)` is the shared byte-scan, used by both the PNG parser and the non-PNG binary path). - `noai/constants.py` — PNG_SIGNATURE, C2PA_CHUNK_TYPE, C2PA_SIGNATURES, C2PA_ISSUERS, `SYNTHID_C2PA_ISSUERS` (issuers that pair SynthID with C2PA: Google, OpenAI), and `C2PA_SOFT_BINDINGS` (soft-binding `alg` prefix → forensic-watermark vendor: Adobe TrustMark, Digimarc, Imatag, Steg.AI, Microsoft, ...). Add a new issuer/binding here, not inline. - `metadata.py` — `scan_head(path, size=1MB)` is the shared input for every C2PA/AIGC/IPTC byte scan: first `size` bytes plus, for ISOBMFF, the late provenance-box payloads from `isobmff.scan_c2pa_region` (catches a manifest after a large `mdat`); behavior-neutral (`f.read(size)`) for non-ISOBMFF. Use it instead of `open().read(1MB)` for any new marker scan. `synthid_source(path)` returns the vendor name(s) if the C2PA manifest implies a SynthID pixel watermark, else None. Format-agnostic: PNG via the caBX parser, JPEG/WebP/AVIF/HEIF/JXL via a binary scan (C2PA marker + SynthID issuer + AI-source marker). `get_ai_metadata` surfaces the verdict, and `metadata --check` prints it as a callout. Both `get_ai_metadata` and `has_ai_metadata` guard the PIL open with `except Exception` (HEIC/unknown formats raise non-OSError) and fall through to the binary scan. `xai_signature(path)` detects xAI/Grok's EXIF-only scheme (`ImageDescription` = `Signature: ` + UUID `Artist`); it feeds `has_ai_metadata`, `get_ai_metadata` (key `xai_signature`), and `identify`. 
`iptc_ai_system(path)` detects the IPTC Photo Metadata 2025.1 AI-disclosure XMP properties (`IPTC_AI_FIELD_MARKERS` = `AISystemUsed`/`AISystemVersionUsed`/`AIPromptInformation`/`AIPromptWriterName`) and returns the `AISystemUsed` generator name (or `"fields present"`). `remove_ai_metadata` routes **ISOBMFF video** (`.mp4`/`.mov`/`.m4v`) through the same `isobmff.strip_c2pa_boxes` as AVIF/HEIF (MP4 is ISOBMFF), and `_scrub_ai_exif` removes the xAI signature + AI-generator EXIF tags on JPEG output. -- `identify.py` — `identify(path)` aggregates every locally-readable signal (C2PA issuer→platform, C2PA soft-binding forensic-watermark vendor, IPTC "Made with AI" + IPTC 2025.1 `AISystemUsed`, embedded SD/ComfyUI params, SynthID proxy, xAI/Grok EXIF signature via `metadata.xai_signature`, the China TC260 AIGC label via `metadata.aigc_label`, the HuggingFace `hf-job-id` job marker via `metadata.huggingface_job`, visible Gemini sparkle, open invisible watermark, Adobe TrustMark via `trustmark_detector`) into one `ProvenanceReport`. `is_ai_generated` is True or None (never asserted False — stripped metadata is not proof of clean origin). The `hf_job` and visible-sparkle signals are **medium** confidence: each lifts an otherwise-Unknown verdict to a tentative AI (`hf_only` / `visible_only`, parallel branches) but is excluded from the high-confidence `ai_from_metadata` set, so neither overrides a hard metadata signal. Visible-sparkle is promoted only at confidence ≥ `_SPARKLE_THRESHOLD` (0.5; corpus-tuned to separate Gemini sparkles ≥0.56 from non-sparkle ≤0.49). The cv2 dependency lives in `gemini_engine.detect_sparkle_confidence`, not here. **C2PA platform attribution is device-token-first, issuer-scan fallback** (`_device_platform` scans manifest bytes for `_DEVICE_C2PA_PLATFORM` tokens, then `_attribute_platform`/`_ISSUER_PLATFORM`). 
**Why, verified on real signed files 2026-05-26:** the old issuer-only byte-scan matched ANY issuer substring anywhere, so multi-entity manifests mis-attributed -- Leica→"Truepic" (a signing authority in the trust chain), Nikon→"Adobe Firefly" (XMP-toolkit "Adobe" + the sample's "Adobe_MAX" name), Pixel→"Google (Gemini)" ("Google LLC" cert org), Truepic→"Google". A distinctive device token wins instead. **Token distinctiveness is load-bearing:** bare `b"Truepic"` mis-fires (it appears in unrelated trust chains -- it mis-attributed the OpenAI `chatgpt-1.png` fixture), so the token is the specific `b"Truepic_Lens"` from the Lens SDK claim generator; likewise `b"Pixel Camera"` (cert CN) not bare `b"Pixel"`. `_DEVICE_C2PA_PLATFORM` lists ONLY tokens **verified against a real C2PA file**: Leica (`lc_c2pa`/`Leica Camera`), Nikon (`NIKON`), Pixel (`Pixel Camera` -- from a real Pixel 10 Pro file attached to c2pa-rs issue #1609/#1554), Sony (`sony.sig`/`sony.cert` -- Sony's own C2PA assertion namespace, verified on a real Sony PXW-Z300 file; NOT bare "Sony" which is a common EXIF Make), Truepic (`Truepic_Lens`). Canon/Samsung/Bria have **no public direct-download C2PA sample** (checked exhaustively: GitHub issue/PR attachments, contentcredentials gallery, HF datasets -- all upload-to-verify or token-gated; Canon's only public file was a self-signed hobbyist CR3, not factory), so they stay unmapped until a real file is captured (same fixture discipline as Grok/Doubao). The Sony sample is video (MP4) -- our ISOBMFF C2PA path detects it; Sony Alpha stills likely share the `sony.*` namespace but are not separately verified. Camera C2PA marks capture authenticity, not AI (Pixel carries `computationalCapture`, not `trainedAlgorithmicMedia`), so these never set `is_ai` -- that stays driven by digital-source-type. `c2pa.cbor_text_after` (now public) is best-effort for the `generator` detail string only and can be None when the manifest keys it `claim_generator_info` (Pixel). 
**Issuer→generator mapping is `is_ai`-gated** (`_attribute_platform(issuers, is_ai=c2pa_is_ai)`): a specific AI-generator platform is named only when the digital-source-type is `trainedAlgorithmicMedia`; on a non-AI source an issuer substring is treated as incidental (an "Adobe XMP" toolkit string in an *unmapped* Canon/Sony capture would otherwise mislabel it "Adobe Firefly"), so it degrades to the neutral "C2PA signer: X" label. Real Firefly/OpenAI/Google output carries the AI source-type, so it is unaffected (verified: chatgpt-1.png→OpenAI, firefly-1.png→Adobe Firefly still attribute). `_attribute_platform` defaults `is_ai=True` so the mapping stays unit-testable in isolation. Add device tokens to `_DEVICE_C2PA_PLATFORM`, generator/issuer platforms to `_ISSUER_PLATFORM`, not inline. For non-PNG containers (JPEG/WebP/AVIF/HEIF/JXL) the caBX parser returns nothing, so issuer (`_issuers_in`) and generator (`_ai_tools_in`, reusing `C2PA_AI_TOOLS`) are recovered by binary-scanning the first MB. EXIF `Software` / `Make` / `Artist` / `ImageDescription` and XMP `CreatorTool` generator tags are read by `metadata.exif_generator` (PIL+piexif for any format PIL opens incl. AVIF, plus a container-agnostic XMP raw-byte scan that also covers HEIF/JXL), matched against `AI_GENERATOR_TOKENS` so ordinary editors (plain "Adobe Photoshop") and real-camera `Make` ("Apple"/"Canon") are not flagged. **Ideogram tags its output with EXIF `Make="Ideogram AI"`** (verified on a real download 2026-05-24) — that's why `Make` is read. **Integrity-clash detection** (`_integrity_clashes`, surfaced as `ProvenanceReport.integrity_clashes`, printed in red by `identify` and serialized to `--json`): contradictions between independent generator stamps are a laundering/spoofing tell. Two rules: (1) two or more distinct AI-origin vendors named by independent signals (e.g. 
C2PA OpenAI + EXIF `Make="Ideogram AI"`), and (2) a camera-capture C2PA device (`_DEVICE_C2PA_PLATFORM`) coexisting with any AI-generation marker. Vendor normalization is `_vendor_of` over `_AI_VENDOR_TOKENS` (so a C2PA "Google (Gemini)" issuer and a SynthID-Google proxy agree, while different vendors clash). **High-precision by design:** only hard generator stamps feed it (C2PA-issuer when source is AI, SynthID, EXIF/XMP generator, IPTC `AISystemUsed`, xAI, AIGC); the fuzzy visible sparkle and the open invisible watermark are **excluded** (the latter can be a by-product of our own SDXL removal pass). The c2pa vendor is classified from the issuer attribution / generator, NOT the resolved `platform` (a camera label like "Google Pixel" would mis-normalize to "Google"). All real single-origin fixtures (chatgpt/firefly/doubao/grok/mj) verified to produce **zero** clashes (false-positive guard in `test_identify.py::TestRealSamplesHaveNoClash`). +- `identify.py` — `identify(path)` aggregates every locally-readable signal (C2PA issuer→platform, C2PA soft-binding forensic-watermark vendor, IPTC "Made with AI" + IPTC 2025.1 `AISystemUsed`, embedded SD/ComfyUI params, SynthID proxy, xAI/Grok EXIF signature via `metadata.xai_signature`, the China TC260 AIGC label via `metadata.aigc_label`, the HuggingFace `hf-job-id` job marker via `metadata.huggingface_job`, the Samsung Galaxy AI editing marker via `metadata.samsung_genai`, visible Gemini sparkle, open invisible watermark, Adobe TrustMark via `trustmark_detector`) into one `ProvenanceReport`. `is_ai_generated` is True or None (never asserted False — stripped metadata is not proof of clean origin). The `hf_job`, visible-sparkle, and Samsung `samsung_genai` signals are **medium** confidence: each lifts an otherwise-Unknown verdict to a tentative AI (`hf_only` / `visible_only` / `samsung_only`, parallel branches) but is excluded from the high-confidence `ai_from_metadata` set, so none overrides a hard metadata signal. 
Visible-sparkle is promoted only at confidence ≥ `_SPARKLE_THRESHOLD` (0.5; corpus-tuned to separate Gemini sparkles ≥0.56 from non-sparkle ≤0.49). The cv2 dependency lives in `gemini_engine.detect_sparkle_confidence`, not here. **C2PA platform attribution is device-token-first, issuer-scan fallback** (`_device_platform` scans manifest bytes for `_DEVICE_C2PA_PLATFORM` tokens, then `_attribute_platform`/`_ISSUER_PLATFORM`). **Why, verified on real signed files 2026-05-26:** the old issuer-only byte-scan matched ANY issuer substring anywhere, so multi-entity manifests mis-attributed -- Leica→"Truepic" (a signing authority in the trust chain), Nikon→"Adobe Firefly" (XMP-toolkit "Adobe" + the sample's "Adobe_MAX" name), Pixel→"Google (Gemini)" ("Google LLC" cert org), Truepic→"Google". A distinctive device token wins instead. **Token distinctiveness is load-bearing:** bare `b"Truepic"` mis-fires (it appears in unrelated trust chains -- it mis-attributed the OpenAI `chatgpt-1.png` fixture), so the token is the specific `b"Truepic_Lens"` from the Lens SDK claim generator; likewise `b"Pixel Camera"` (cert CN) not bare `b"Pixel"`. `_DEVICE_C2PA_PLATFORM` lists ONLY tokens **verified against a real C2PA file**: Leica (`lc_c2pa`/`Leica Camera`), Nikon (`NIKON`), Pixel (`Pixel Camera` -- from a real Pixel 10 Pro file attached to c2pa-rs issue #1609/#1554), Sony (`sony.sig`/`sony.cert` -- Sony's own C2PA assertion namespace, verified on a real Sony PXW-Z300 file; NOT bare "Sony" which is a common EXIF Make), Truepic (`Truepic_Lens`). Canon/Bria have **no public direct-download C2PA sample** (checked exhaustively: GitHub issue/PR attachments, contentcredentials gallery, HF datasets -- all upload-to-verify or token-gated; Canon's only public file was a self-signed hobbyist CR3, not factory), so they stay unmapped until a real file is captured (same fixture discipline as Grok/Doubao). 
The Sony sample is video (MP4) -- our ISOBMFF C2PA path detects it; Sony Alpha stills likely share the `sony.*` namespace but are not separately verified. **Samsung Galaxy + ASUS Gallery live in a separate `_SIGNER_C2PA_PLATFORM` (scanned after `_device_platform`, before the issuer fallback), NOT in `_DEVICE_C2PA_PLATFORM`** — verified on real signed files 2026-05-29. Reason: a Galaxy phone stamps BOTH its device cert AND a `trainedAlgorithmicMedia`/genAIType AI marker on a Generative-Edit image, so treating it as a "genuine camera capture" would false-fire integrity-clash rule 2 on every Galaxy AI edit. The signer tokens (`b"Samsung Galaxy"` cert org — distinct from the EXIF `SM-xxxx` model string on ordinary Samsung photos; `b"com.asus.gallery"` claim generator) only resolve the platform label; the AI verdict still comes from the source-type / genAIType. ASUS Gallery is a C2PA-signed edit with no AI marker, so it attributes the platform without asserting `is_ai`. **Samsung's `genAIType` (in the proprietary `PhotoEditor_Re_Edit_Data` JSON) is an undocumented Galaxy-AI editing marker** (`metadata.samsung_genai`, gated on the `PhotoEditor_Re_Edit_Data` container; non-zero value = AI tool used, values {1,5} observed): medium-confidence because the field has no public spec (verified 2026-05-29: absent from C2PA spec + Samsung docs), but it co-occurred with `trainedAlgorithmicMedia` in 3/3 verified files that record a source-type and was the SOLE AI marker on a Galaxy S24 file that omits the source type. Camera C2PA marks capture authenticity, not AI (Pixel carries `computationalCapture`, not `trainedAlgorithmicMedia`), so these never set `is_ai` -- that stays driven by digital-source-type. `c2pa.cbor_text_after` (now public) is best-effort for the `generator` detail string only and can be None when the manifest keys it `claim_generator_info` (Pixel). 
**Issuer→generator mapping is `is_ai`-gated** (`_attribute_platform(issuers, is_ai=c2pa_is_ai)`): a specific AI-generator platform is named only when the digital-source-type is `trainedAlgorithmicMedia`; on a non-AI source an issuer substring is treated as incidental (an "Adobe XMP" toolkit string in an *unmapped* Canon/Sony capture would otherwise mislabel it "Adobe Firefly"), so it degrades to the neutral "C2PA signer: X" label. Real Firefly/OpenAI/Google output carries the AI source-type, so it is unaffected (verified: chatgpt-1.png→OpenAI, firefly-1.png→Adobe Firefly still attribute). `_attribute_platform` defaults `is_ai=True` so the mapping stays unit-testable in isolation. Add capture-camera tokens to `_DEVICE_C2PA_PLATFORM`, editing-app/AI-device signer tokens to `_SIGNER_C2PA_PLATFORM`, generator/issuer platforms to `_ISSUER_PLATFORM`, not inline. For non-PNG containers (JPEG/WebP/AVIF/HEIF/JXL) the caBX parser returns nothing, so issuer (`_issuers_in`) and generator (`_ai_tools_in`, reusing `C2PA_AI_TOOLS`) are recovered by binary-scanning the first MB. EXIF `Software` / `Make` / `Artist` / `ImageDescription` and XMP `CreatorTool` generator tags are read by `metadata.exif_generator` (PIL+piexif for any format PIL opens incl. AVIF, plus a container-agnostic XMP raw-byte scan that also covers HEIF/JXL), matched against `AI_GENERATOR_TOKENS` so ordinary editors (plain "Adobe Photoshop") and real-camera `Make` ("Apple"/"Canon") are not flagged. **Ideogram tags its output with EXIF `Make="Ideogram AI"`** (verified on a real download 2026-05-24) — that's why `Make` is read. **Integrity-clash detection** (`_integrity_clashes`, surfaced as `ProvenanceReport.integrity_clashes`, printed in red by `identify` and serialized to `--json`): contradictions between independent generator stamps are a laundering/spoofing tell. Two rules: (1) two or more distinct AI-origin vendors named by independent signals (e.g. 
C2PA OpenAI + EXIF `Make="Ideogram AI"`), and (2) a camera-capture C2PA device (`_DEVICE_C2PA_PLATFORM`) coexisting with any AI-generation marker. Vendor normalization is `_vendor_of` over `_AI_VENDOR_TOKENS` (so a C2PA "Google (Gemini)" issuer and a SynthID-Google proxy agree, while different vendors clash). **High-precision by design:** only hard generator stamps feed it (C2PA-issuer when source is AI, SynthID, EXIF/XMP generator, IPTC `AISystemUsed`, xAI, AIGC); the fuzzy visible sparkle and the open invisible watermark are **excluded** (the latter can be a by-product of our own SDXL removal pass). The c2pa vendor is classified from the issuer attribution / generator, NOT the resolved `platform` (a camera label like "Google Pixel" would mis-normalize to "Google"). All real single-origin fixtures (chatgpt/firefly/doubao/grok/mj) verified to produce **zero** clashes (false-positive guard in `test_identify.py::TestRealSamplesHaveNoClash`). +- `watermark_registry.py` — **single catalog of known visible watermarks**, the unified "find known marks in their usual places, recognize, remove" entry. **Reverse-alpha only by policy**: a mark is listed only once a real alpha map has been captured for it, and removal inverts that map (`original = (wm - a*logo)/(1-a)`, exact recovery) — no inpaint/heuristic removal here (arbitrary-region inpainting lives in `region_eraser`/`erase`). Each `KnownMark` ties a key to {usual `location`, `in_auto` flag, `recovery` (="reverse-alpha"), a `detect` adapter → uniform `MarkDetection`, a `remove` adapter}. Entries today: `gemini` (bottom-right sparkle) and `doubao` (bottom-right "豆包AI生成"). `detect_marks` scans all; `best_auto_mark` picks the highest-confidence detection. 
**Cross-engine confidences aren't directly comparable**, so the gemini adapter applies the corpus-validated 0.5 sparkle threshold (`_GEMINI_AUTO_MIN_CONF`) for its `detected` flag — otherwise the gemini engine's loose internal threshold weakly fires (~0.36) on the Doubao text and hijacks `auto`. `cli.cmd_visible` is registry-driven: `--mark auto` → `best_auto_mark`, `--mark ` → that mark; `--mark` choices come from `mark_keys()`. `_doubao_remove` applies reverse-alpha only when the mark is detected AND `reverse_alpha_available` (the bundled alpha-map asset is present; there is no resolution band); otherwise removal is **skipped** (never inpainted). Add a new visible mark = one `KnownMark` entry + its engine (with a captured alpha map); do not re-add per-mark `if` branches in the CLI. - `gemini_engine.py` — visible Gemini-sparkle remover/detector (cv2/numpy, no GPU). `detect_sparkle_confidence(path)` is the file-level entry point used by `identify.py`. -- `doubao_engine.py` — visible Doubao "豆包AI生成" remover/detector (cv2/numpy, no GPU). `DoubaoEngine.locate` anchors a bottom-right box by **geometry** (mark scales with image WIDTH, fractions in module constants; no bundled template), `extract_mask` pulls the light low-saturation glyphs with a **polarity-aware white top-hat** (brighter-than-blurred-local-bg, so white-paper documents are left untouched instead of smeared), `detect` thresholds glyph coverage (`DETECT_MIN_COVERAGE` 0.16 separates real marks ≥0.20 from corner noise, which stays ≤0.06 on large images but can spike to ~0.15 on tiny ones), `remove_watermark` inpaints (cv2 Telea/NS) and **bails when coverage > `MAX_INPAINT_COVERAGE` 0.50** (dense-text background → would smear). Wired into `visible --mark` via `cli._run_doubao_if_selected`. **Logo is near-white (~253), not the gray some third-party tools assume.** Best on photo/illustration backgrounds; high-contrast edges leave faint residue (cv2-inpaint limit).
Clean per-pixel reverse-alpha (Gemini-style) needs a **black-background capture** (`alpha = capture/255`), not more content images -- content-image distillation was tried and fails; see "Doubao clean-reverse-alpha distillation" below. +- `doubao_engine.py` — visible Doubao "豆包AI生成" remover/detector (cv2/numpy, no GPU), **reverse-alpha only**. `DoubaoEngine.locate` anchors a bottom-right box by **geometry** (mark scales with image WIDTH), `extract_mask` pulls the light low-saturation glyphs (the detection candidate). `detect` is **reverse-alpha-consistent**: it matches the bundled alpha glyph silhouette (`assets/doubao_alpha.png`, the exact shape we invert) against the candidate via zero-mean normalized correlation (`_template_match_score`, cv2 `TM_CCOEFF_NORMED`), gated at `DETECT_NCC_THRESHOLD` 0.4 over a small `DETECT_MIN_COVERAGE` floor. Keying on glyph SHAPE (not coverage/structure heuristics) fixed #23: corpus FP fell to 7/1243 (0.6%); old coverage-only fired on ~28%. **Removal is exact reverse-alpha** (`remove_watermark_reverse_alpha`): `original = (wm - a*logo)/(1-a)` from the bundled alpha map + `_ALPHA_LOGO_BGR` (near-white ~253) + `_ALPHA_*_FRAC` geometry. The alpha map + logo were **solved from real black+gray Doubao captures** (`data/doubao_capture/captures/`, gitignored): on black `captured = a*logo`, the black/gray pair solves `a` per-pixel without assuming the logo colour (white capture cross-validates: mark → flat fill). 
The single captured alpha map (at width 2048) **generalizes to any resolution**: at (near) the captured width (`_ALPHA_NATIVE_BAND` of `_ALPHA_NATIVE_WIDTH`) `_fixed_alpha_map` places it by exact width-relative geometry (pixel-exact recovery, ~0.9 mean error — the whole point of reverse-alpha); off that width it **tries BOTH placements -- fixed geometry AND `_aligned_alpha_map`'s `TM_CCOEFF_NORMED` scale+position search (`_ALPHA_ALIGN_SEARCH`) -- and keeps whichever leaves the least residual mark** (re-`detect` confidence on the bare reverse-alpha). On a faint/busy-background mark the NCC peak wanders a few px and geometry wins; on a clear mark alignment wins -- no magic threshold, it just picks the better removal. Verified **56/56 real detected-Doubao removed clean across all corpus resolutions** (2048 fixed 27/27, 1773 22/22, plus 1185/1187/1535/1672); a single fixed-vs-aligned choice left 2/56 busy-background residuals, try-both fixed them. `reverse_alpha_available` is just "asset present"; the registry still gates removal on `detect` so a clean corner is never touched. **Residual inpaint is off-native-only:** at the captured width the fixed-geometry recovery is exact, so it is returned untouched -- inpainting over exactly-recovered interior pixels only swaps them for a cv2 hallucination (measured worse, native textured-bg error vs true bg **1.6 reverse-alpha-only vs 2.6 with the old always-on full-footprint inpaint**; regression-guarded by `test_native_returns_exact_reverse_alpha_no_inpaint`). Off-native the NCC alignment is only sub-pixel-approximate, so the interior is no longer exact and a residual inpaint over the glyph footprint cleans the seam (costs nothing there and reliably clears the mark). The shipped third-party `_refs/zhengsuanfa_doubao_alpha_120x20.png` is NOT a usable alpha (≈0.85 everywhere → blacks out on inversion; wrong resolution/version), verified 2026-05-29. 
There is no inpaint-based removal here (removed 2026-05-29; arbitrary-region inpainting is `region_eraser`/`erase`). - `region_eraser.py` — universal region eraser (`erase` CLI). `erase(image, boxes=|mask=, backend=)`: `boxes_to_mask` → `cv2.inpaint` (`cv2` backend, default, no deps) or big-LaMa via onnxruntime (`lama` backend, extra `lama`, `Carve/LaMa-ONNX` Apache-2.0 model downloaded on first use, never bundled). `erase_lama` crops a padded region around the mask, runs LaMa at its fixed 512² input, pastes only masked pixels back (untouched areas stay pixel-exact). Lazy `_get_lama_session` singleton; `lama_available()` guards the optional import. **LaMa-ONNX costs ~3.5-4 GB peak RAM and ~5-6 s/call on CPU** (FFC working set, not arena — `enable_cpu_mem_arena=False` does not help), so it does NOT fit a minimal droplet; the cv2 backend (tens of MB, ~30 ms) does. LaMa quality at low RAM = serverless/GPU, mirroring how raiw.cc offloads SDXL to fal. - `invisible_watermark.py` — `detect_invisible_watermark(path)` decodes the OPEN DWT-DCT watermarks (public decoder, no key) embedded by Stable Diffusion / SDXL / FLUX via the `imwatermark` library. Known fixed patterns (verified against upstream source) live in `_BITS_48` (SDXL 48-bit, FLUX.2 48-bit) and `_SD1_STRING` ("StableDiffusionV1", SD 1.x/2.x). Optional dep (extra `detect`); returns None when absent. The `detect` extra pulls **torch** transitively (invisible-watermark declares torch a hard dep, and `WatermarkDecoder` eagerly imports `rivaGan` -> `torch` at import time), so detection needs torch present even though dwtDct runs CPU-only on cv2/numpy/pywavelets — no GPU and no separate `gpu` extra required. **Unlike SynthID this is locally detectable**, but the watermark is fragile (does not survive JPEG re-encode/resize — verified gone after JPEG q90), so it confirms origin only on pristine files. Add new known patterns here. The file carries a top-of-module pyright pragma because imwatermark/cv2 ship no type stubs. 
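The capture-then-invert scheme described in the `doubao_engine.py` notes (solve `a` per pixel from a black/gray capture pair, then `original = (wm - a*logo)/(1-a)`) can be sketched in numpy as below. This is a minimal illustration, not the engine's code: the function names `solve_alpha_from_captures` and `reverse_alpha` are hypothetical, the arrays are assumed to be single-channel float images already pixel-aligned with each other, and the gray capture is assumed to sit on a known uniform level `gray_level`.

```python
import numpy as np


def solve_alpha_from_captures(black: np.ndarray, gray: np.ndarray, gray_level: float):
    """Per-pixel alpha and logo intensity from a black and a uniform-gray capture.

    Assumed composite model: captured = (1 - a) * background + a * logo.
      on black: black = a * logo
      on gray:  gray  = (1 - a) * gray_level + a * logo
    so gray - black = (1 - a) * gray_level, which does not depend on the logo.
    """
    black = black.astype(np.float64)
    gray = gray.astype(np.float64)
    alpha = np.clip(1.0 - (gray - black) / gray_level, 0.0, 1.0)
    # logo = black / a where the mark is present; left at 0 where a is ~0
    logo = np.divide(black, alpha, out=np.zeros_like(black), where=alpha > 1e-3)
    return alpha, logo


def reverse_alpha(watermarked: np.ndarray, alpha: np.ndarray, logo: np.ndarray,
                  max_alpha: float = 0.99) -> np.ndarray:
    """Invert the composite: original = (wm - a * logo) / (1 - a)."""
    wm = watermarked.astype(np.float64)
    a = np.clip(alpha, 0.0, max_alpha)  # guard against the 1 / (1 - a) blow-up
    original = (wm - a * logo) / (1.0 - a)
    return np.clip(np.rint(original), 0, 255).astype(np.uint8)


# Usage on synthetic data: a 0.6-alpha near-white (253) strip over random pixels.
rng = np.random.default_rng(0)
alpha_true = np.zeros((20, 120))
alpha_true[5:15, 10:110] = 0.6
black_cap = alpha_true * 253.0
gray_cap = (1 - alpha_true) * 128.0 + alpha_true * 253.0
alpha, logo = solve_alpha_from_captures(black_cap, gray_cap, gray_level=128.0)

background = rng.integers(0, 256, size=(20, 120)).astype(np.float64)
watermarked = (1 - alpha_true) * background + alpha_true * 253.0
recovered = reverse_alpha(watermarked, alpha, logo)
```

Because the alpha map is measured rather than guessed, the inversion recovers the background exactly on this synthetic example; on real captures, quantization and alignment error bound how exact the recovery can be.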
-- `trustmark_detector.py` — `detect_trustmark(path)` decodes the OPEN, keyless **Adobe TrustMark** watermark (the soft binding behind Adobe Durable Content Credentials, `alg` `com.adobe.trustmark.P`) via the optional `trustmark` package (extra `trustmark`; pulls torch, downloads model weights on first use). Mirrors `invisible_watermark.py` (lazy singleton, top-of-module pyright pragma, returns None when absent). It detects *provenance*, not AI origin as such (TrustMark also marks human-authored content), so `identify` lists it as a watermark without setting `is_ai_generated`. Other soft-binding vendors (Digimarc/Imatag/Steg.AI/...) have no public decoder — they are only *named* via the `C2PA_SOFT_BINDINGS` scan, not decoded. +- `trustmark_detector.py` — `detect_trustmark(path)` decodes the OPEN, keyless **Adobe TrustMark** watermark (the soft binding behind Adobe Durable Content Credentials, `alg` `com.adobe.trustmark.P`) via the optional `trustmark` package (extra `trustmark`; pulls torch, downloads model weights on first use). Mirrors `invisible_watermark.py` (lazy singleton, top-of-module pyright pragma, returns None when absent). It detects *provenance*, not AI origin as such (TrustMark also marks human-authored content), so `identify` lists it as a watermark without setting `is_ai_generated`. Other soft-binding vendors (Digimarc/Imatag/Steg.AI/...) have no public decoder — they are only *named* via the `C2PA_SOFT_BINDINGS` scan, not decoded. **False-positive gate (added 2026-05-29):** TrustMark's `wm_present` is a BCH error-correction validity flag that spuriously validates on a content-correlated fraction of un-watermarked images — AI-generated textures trip it far more than camera photos (verified 2026-05-29 on real files: it fires on Gemini/OpenAI/Doubao output that *cannot* carry Adobe's watermark, with a random-bytes decoded secret, while signal-free camera photos did not trip it). 
A genuine TrustMark is a *durable* soft binding engineered to survive re-encoding, so `detect_trustmark` re-decodes after a mild JPEG round-trip (`_survives_reencode`, `_REENCODE_QUALITY` 95) and requires the same schema both times; every observed false positive collapsed (none survived even q95), so the gate is the durability property the watermark guarantees. The second decode runs only on the rare initial hit, so the cost is negligible. Do NOT remove the gate to "catch more" — a lone TrustMark hit without it is almost always content noise. - `text_protector.py` — text-region protection for the `invisible` SDXL img2img pass (issue #21: CJK/small text deforms at watermark-removal strengths). `is_available()` gates on `cv2.dnn.TextDetectionModel_DB`; `TextProtector.detect_text_boxes(bgr)` runs the **PP-OCRv3 DB** ONNX detector (~2.4 MB, Apache-2.0, opencv_zoo, returns rotated quad polygons) — downloaded+cached to `~/.cache/remove-ai-watermarks` on first use via atomic temp-rename, never bundled, **no torch (cv2.dnn only)**. **Detection is script-agnostic** (DB segments text *regions*, not characters), so Latin / Cyrillic / CJK / Hangul / Arabic / digits all detect identically — language was never the recall lever, **resolution was**. `_detection_input_size(h, w)` (pure, unit-tested) detects at the **native long side capped at `_DET_MAX_LONG_SIDE` (1536), never upscaled**: the old fixed 736 downscaled large canvases so small text fell below the detector and was missed (issue #14, e.g. ~16 px text on a 2048 image). `scripts/text_detection_benchmark.py` measures recall across scripts × sizes × canvas: the cap fix lifts overall hit-rate 0.91 → 1.00 (worst cell 2048/16 px: 0.06 → 1.00) at ~100 ms CPU. Very large canvases with tiny text may still need tiling (documented limit, not built). `build_change_map(boxes, h, w, preserve=0.9, feather=15)` paints a Differential-Diffusion change map. 
**Polarity (verified empirically):** white(1.0)=PRESERVE original pixels, black(0.0)=MAX change; map is black bg + `preserve` inside text polygons, Gaussian-feathered edges, clipped to [0,1]. `preserve` stays below a hard 1.0 freeze by default so text still scrubs lightly (SynthID survives cropping). Wired into `watermark_remover._run_differential` via the community `pipeline_stable_diffusion_xl_differential_img2img` (loaded with `custom_revision="0.38.0"` — HF resolves the **PyPI** version string, not the `v0.38.0` git tag); gated to the SDXL `DEFAULT_MODEL_ID` only (`_can_protect_text`), falls back to plain img2img otherwise. **Autonomous by default** (`protect_text=True` in `invisible_engine`/`watermark_remover`, mirroring `protect_faces`): the detector runs per image and `_run_differential` falls back to plain img2img when **no boxes** are found, so text-free inputs pay only the cheap cv2 detection (no differential-pipeline load). CLI exposes a single off-switch `--no-protect-text` on `invisible`/`all` (passed as `protect_text=not no_protect_text`); the unavailable-model case logs at debug, not warning, since it is now the default path. The diff pipeline upcasts the VAE to fp32 internally, so do **not** add `upcast_vae()`/`enable_attention_slicing` (both produced NaN/black on fp16 MPS). `build_change_map` is unit-tested without any model download (`tests/test_text_protector.py`). - `face_protector.py` — YOLO detect + soft-blend pattern; mirror this for any "protect region during diffusion" features - `image_io.py` — Unicode-safe cv2 IO (issue #17). `imread(path, flags=None)` / `imwrite(path, img)` wrap `np.fromfile`+`cv2.imdecode` / `cv2.imencode`+`tofile` so non-ASCII paths work on Windows -- bare `cv2.imread`/`cv2.imwrite` use the platform ANSI code-page API there and fail (empty decode + `can't open/read file`) on Chinese/Cyrillic/accented filenames. 
`imread` keeps `cv2.imread` semantics (defaults to `IMREAD_COLOR`, returns `None` on missing/empty/undecodable). **Every cv2 file read/write in the package routes through here; do not call `cv2.imread`/`cv2.imwrite` directly.** macOS/Linux already accept UTF-8 paths, so it is behavior-neutral there (the bug only reproduces on Windows). cv2/numpy are imported lazily inside the functions, so the module is cheap to import in a bare env. ### Doubao clean-reverse-alpha distillation (re-investigated 2026-05-29) -**Conclusion: pure reverse-alpha distilled from content images does NOT work, and the blocker is the WRONG kind of data, not too little of it.** The earlier framing ("need ~5-8 PRISTINE same-resolution originals") is obsolete -- `data/spaces/originals/` holds plenty. Curate them with `DoubaoEngine.detect` + an NCC filter against a clean glyph template, keeping only marks at offset ≈ (0,0): that yields e.g. **15 pixel-aligned 2048² marks** (sub-pixel drift, not the ±50 px the old lossy/mixed-res scrapes had), plus 1086x1448 / 1792x2400 clusters. With those, LaMa-clean `O` + weighted-LS (and per-pixel I-on-O regression) for `α` (+ logo colour) was tried end-to-end and **still leaves a persistent ghost outline.** +**RESOLVED 2026-05-29: black+gray Doubao captures were obtained and the exact reverse-alpha is built** (`doubao_engine.remove_watermark_reverse_alpha`, `assets/doubao_alpha.png`; see the `doubao_engine.py` bullet above). The captures (`data/doubao_capture/captures/`, gitignored) confirmed the alpha-composite model: on black `captured = a*logo`, the black/gray pair solves `a` per-pixel (`a_max≈0.65`, logo near-white), and the white capture cross-validates. A single 2048 capture suffices: sub-pixel NCC alignment (`_aligned_alpha_map`) registers the alpha glyph to the real mark, so it works at any resolution (verified on the 1773×2364 3:4 corpus size), not just 2048. 
The notes below (the failed content-image distillation) are retained as the record of why captures were necessary. + +**Conclusion (historical): pure reverse-alpha distilled from content images does NOT work, and the blocker is the WRONG kind of data, not too little of it.** The earlier framing ("need ~5-8 PRISTINE same-resolution originals") is obsolete -- `data/spaces/originals/` holds plenty. Curate them with `DoubaoEngine.detect` + an NCC filter against a clean glyph template, keeping only marks at offset ≈ (0,0): that yields e.g. **15 pixel-aligned 2048² marks** (sub-pixel drift, not the ±50 px the old lossy/mixed-res scrapes had), plus 1086x1448 / 1792x2400 clusters. With those, LaMa-clean `O` + weighted-LS (and per-pixel I-on-O regression) for `α` (+ logo colour) was tried end-to-end and **still leaves a persistent ghost outline.** Diagnosed why, empirically (cached stacks, `/tmp/doubao_distill`): (1) the mark is a clean white overlay with **no dark halo** -- over glyph pixels ~54% are brighter than the clean bg, only ~4% darker -- so the white-logo model `I=(1-α)O+α·255` is correct; (2) but content backgrounds are almost never dark *under* the mark (median darkest available bg over glyph pixels = **58/255**; only ~13% of mark pixels are ever observed on a bg < 40), so on bright backgrounds the equation is ill-conditioned and `α` is unidentifiable; (3) LaMa's `O` is a plausible **hallucination**, not the true pre-mark background, which compounds the error, and per-pixel regression on ~15 obs overfits into colour noise. @@ -57,7 +60,7 @@ Diagnosed why, empirically (cached stacks, `/tmp/doubao_distill`): (1) the mark Who embeds what, and whether it is locally detectable (so we know which gaps are fillable). See `identify.py` for what we read. - **Locally detectable (open decoder, no key/API):** Stable Diffusion / SDXL / FLUX via `imwatermark` DWT-DCT (now covered by `invisible_watermark.py`). 
FLUX uses the same library (`black-forest-labs/flux2` `src/flux2/watermark.py`, 48-bit `0b001010101111111010000111100111001111010100101110`); SDXL is the diffusers `WATERMARK_MESSAGE` (`0b101100111110110010010000011110111011000110011110`). Caveat: fragile to re-encoding. -- **C2PA / IPTC (covered by the issuer/marker scan):** OpenAI, Google, Adobe Firefly, Microsoft (Designer + **Bing Image Creator** — collected 2026-05-24; Bing now runs Microsoft's own **MAI-Image** model, signs C2PA as "Microsoft", NOT OpenAI/DALL-E), and **Stability AI** (collected from Brand Studio / DreamStudio successor; signs C2PA as "Stability AI Ltd", no SynthID, no imwatermark on its current Stable Image model — issuer added to `C2PA_ISSUERS`). Still unsampled: Canva (its downloads are re-encoded design *exports* that strip C2PA, so a Canva "positive" is inconclusive — skipped), Getty, Shutterstock. Midjourney embeds NO C2PA and no invisible watermark (our `mj-*` sample carried only the IPTC tag). +- **C2PA / IPTC (covered by the issuer/marker scan):** OpenAI, Google, Adobe Firefly, Microsoft (Designer + **Bing Image Creator** — collected 2026-05-24; Bing now runs Microsoft's own **MAI-Image** model, signs C2PA as "Microsoft", NOT OpenAI/DALL-E), and **Stability AI** (collected from Brand Studio / DreamStudio successor; signs C2PA as "Stability AI Ltd", no SynthID, no imwatermark on its current Stable Image model — issuer added to `C2PA_ISSUERS`). Still unsampled: Canva (its downloads are re-encoded design *exports* that strip C2PA, so a Canva "positive" is inconclusive — skipped), Getty, Shutterstock. Midjourney embeds NO C2PA and no invisible watermark (our `mj-*` sample carried only the IPTC tag). 
**Samsung Galaxy AI** (Generative Edit / Sketch to Image / Portrait Studio on Galaxy S23 FE / S24 / S25, One UI 7+) signs C2PA as "Samsung Galaxy" with the standard `trainedAlgorithmicMedia` source type AND a proprietary `genAIType` marker; verified on real signed files 2026-05-29 (the standard scan catches the source type; `genAIType` additionally catches a Galaxy S24 file that omits it). **ASUS Gallery** also signs edited photos as C2PA (`com.asus.gallery`) but with no AI source type — a signer, not an AI marker. **Black Forest Labs (FLUX)** API output signs C2PA: `claim_generator_info "Black Forest Labs API"` + a `c2pa.ai_generated_content` assertion + `trainedAlgorithmicMedia` (issuer `b"Black Forest Labs"` added to `C2PA_ISSUERS`, platform "Black Forest Labs (FLUX)"). **ByteDance Volcano Engine (Volcengine)** — the cloud behind Doubao / Jimeng — signs its AI image output with a cert from `certificate_center@volcengine.com` + `trainedAlgorithmicMedia` (issuer `b"volcengine"` → "ByteDance (Volcano Engine)", platform "ByteDance (Doubao / Jimeng / Volcano Engine)"); note this is the C2PA-signed surface, distinct from the XMP/PNG TC260 `AIGC` label Doubao also uses. All three verified on real signed files 2026-05-29. - **EXIF/XMP generator tag (caught by `exif_generator`):** **Ideogram** writes EXIF `Make="Ideogram AI"` (collected 2026-05-24 — no C2PA, no SynthID, no imwatermark; the Make tag is the only signal). - **xAI / Grok — its own EXIF signature scheme, NOT C2PA (DETECTED by `metadata.xai_signature`, built 2026-05-26).** Grok JPEG downloads (Aurora model) carry **no C2PA, no XMP, no SynthID, no IPTC** — only EXIF `Artist` = a UUID and EXIF `ImageDescription` = `Signature: ` (a crypto signature, unverifiable locally without xAI's public key). This empirically kills the earlier unverified "xAI signs C2PA as xAI" lead — xAI is not even a C2PA member. 
`exif_generator` misses it (neither field holds an `AI_GENERATOR_TOKENS` token), so a dedicated detector `xai_signature(path)` matches the pair (`ImageDescription ~ ^Signature: [A-Za-z0-9+/=]{64,}` AND UUID `Artist`); wired into `has_ai_metadata`, `get_ai_metadata` (key `xai_signature`), and `identify` (signal `xai_signature`, platform "xAI (Grok / Aurora)"). **Format confirmed stable across n=3 genuine generations:** exactly three EXIF tags (`Artist`, `ExifOffset`, `ImageDescription`), `Signature:` prefix constant, base64 payload 300-1004 chars. Two capture facts: (a) the `Artist` UUID **equals the public image id** in the asset URL (`https://imagine-public.x.ai/imagine-public/images/.jpg`), so it is NOT a private per-user secret — only the `Signature` blob is; (b) the Grok web-UI image is a re-encoded **WebP with no signature** — the EXIF survives only in the *original* JPEG (download button or that public tokenless URL), which is why screenshots / re-encodes are metadata-stripped. A real fixture `data/samples/grok-1.jpg` plus **synthetic** JPEG fixtures (fake UUID + fake `Signature:` blob) cover the detector; never add a real Grok image carrying private content (the repo is public). **Stripped on removal too:** `remove_ai_metadata` now calls `_scrub_ai_exif` on the JPEG EXIF, which deletes the xAI Signature+UUID-Artist pair **and** any `Software`/`Make`/`Artist`/`ImageDescription` tag holding an `AI_GENERATOR_TOKENS` token (so Ideogram's `Make="Ideogram AI"` is scrubbed too), while keeping genuine camera/editor EXIF. The shared `_is_xai_signature_pair` helper (module-level compiled regexes) is the single source of truth for the pattern, used by both `xai_signature` and `_scrub_ai_exif`. (AVIF/HEIF/JXL still strip only C2PA boxes via `isobmff`, not EXIF — unchanged.) 
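The xAI pair check described above (an `ImageDescription` matching `^Signature: [A-Za-z0-9+/=]{64,}` AND a UUID `Artist`) can be sketched as a pure function over already-extracted EXIF values. The signature regex comes from these notes; the function name `looks_like_xai_signature_pair`, the bytes/NUL handling, and the canonical 8-4-4-4-12 hex UUID pattern are assumptions for illustration, not the module's actual `_is_xai_signature_pair`.

```python
from __future__ import annotations

import re

# From the notes: "Signature: " followed by a base64 payload of 64+ chars.
_SIGNATURE_RE = re.compile(r"^Signature: [A-Za-z0-9+/=]{64,}")
# Assumed UUID form: canonical 8-4-4-4-12 hex groups.
_UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.IGNORECASE
)


def _as_text(value: str | bytes | None) -> str:
    """EXIF ASCII tags often arrive as NUL-terminated bytes; normalize to str."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        value = value.decode("ascii", errors="ignore")
    return value.strip("\x00 \t\r\n")


def looks_like_xai_signature_pair(image_description: str | bytes | None,
                                  artist: str | bytes | None) -> bool:
    """True only when BOTH tags match; either one alone is not enough."""
    description = _as_text(image_description)
    artist_text = _as_text(artist)
    return bool(_SIGNATURE_RE.match(description)) and bool(_UUID_RE.match(artist_text))


# Usage with a fake blob and fake UUID (never real Grok content):
fake_sig = "Signature: " + "A" * 80
print(looks_like_xai_signature_pair(fake_sig, "123e4567-e89b-12d3-a456-426614174000"))
```

Requiring both tags keeps a lone UUID `Artist` (or a stray "Signature:" caption) from triggering a detection.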
- **China TC260 AIGC label (caught by `AIGC_MARKERS` / `metadata.aigc_label`, surfaced by `identify` as the `aigc` signal):** China-served generators embed an XMP `{"Label":"1","ContentProducer":...}` block — China's mandatory AI-content labeling (TC260 namespace `tc260.org.cn/ns/AIGC`). **Doubao** (ByteDance) uses it (verified on the real #13 sample 2026-05-25; `ContentProducer` `001191110102MACQD9K64010000`, no C2PA/SynthID/imwatermark — the XMP block is the only signal; GitHub attachment upload did NOT strip it). The same standard is mandatory for Jimeng/Kling/Qwen/Ernie etc., so the one marker covers the whole China-AIGC-labeled ecosystem. `aigc_label` reads **two serializations** through a shared `_parse` helper: the HTML-entity-encoded XMP `` block (container-agnostic raw-byte scan, any JSON object accepted) **and** a raw-JSON PNG `AIGC` tEXt chunk — Doubao also writes the label this way, with no namespaced marker at all (confirmed on the corpus 2026-05-28, `ContentProducer="doubao"`). The PNG-chunk path is gated on at least one TC260 field (`_TC260_FIELDS`) so a generic `AIGC` key cannot false-positive. In `identify`, `aigc` fires on the parsed label **or** the `AIGC_MARKERS` byte scan (the latter preserves the laundering-tell case where the JSON payload is truncated). diff --git a/README.md b/README.md index aab15fe..8080038 100644 --- a/README.md +++ b/README.md @@ -17,7 +17,7 @@ If this tool saves you time, consider [sponsoring its development](https://githu ## Features -- **Visible watermark removal** — Gemini / Nano Banana sparkle logo (reverse alpha blending) and the Doubao "豆包AI生成" text strip (locate + mask + inpaint); fast, offline, deterministic, no GPU. `visible --mark auto` picks the right one +- **Visible watermark removal** — a registry of known marks in their usual places: the Gemini / Nano Banana sparkle and the Doubao "豆包AI生成" text strip. 
Each is removed by **exact reverse-alpha blending** against a captured alpha map (`original = (wm − α·logo)/(1−α)`), recovering the true pixels rather than inpainting a guess. Fast, offline, no GPU. `visible --mark auto` finds and removes the strongest detected mark. (For arbitrary logos/objects, see `erase`.)
 - **Universal region eraser (`erase`)** — remove any logo / watermark / object inside boxes you specify, regardless of position or colour. Default cv2 inpainting (CPU, instant); optional big-LaMa via onnxruntime (`lama` extra) for higher quality
 - **Invisible watermark removal** — SynthID, StableSignature, TreeRing via diffusion-based regeneration (needs a local GPU, or run it with no setup on [raiw.cc](https://raiw.cc))
 - **AI metadata stripping** — EXIF, PNG text chunks, C2PA provenance manifests (PNG / JPEG / AVIF / HEIF / JPEG-XL, **MP4 / MOV / M4V / M4A** at the container level, and **WebM / MP3 / WAV / FLAC / OGG** losslessly via ffmpeg), XMP DigitalSourceType
@@ -49,7 +49,9 @@ If this tool saves you time, consider [sponsoring its development](https://githu
 | **xAI Grok (Aurora)** | — | — | ✅ EXIF signature scheme (no C2PA): `Signature:` blob + UUID `Artist` | Detected (`identify`); metadata strip |
 | **Midjourney** | — | — | ✅ EXIF + XMP (prompt, model, seed) | Metadata strip |
 | **Meta AI** | — | — | ✅ IPTC "Made with AI" (digitalSourceType) | Metadata strip (removes the label) |
-| **Doubao** (ByteDance) / China AIGC generators | ✅ "豆包AI生成" text strip (bottom-right) | — | ✅ TC260 AIGC label — `` XMP **or** `AIGC` PNG chunk (China's mandatory AI labeling) | Locate + mask + inpaint (cv2, CPU) + metadata strip |
+| **Doubao** (ByteDance) / China AIGC generators | ✅ "豆包AI生成" text strip (bottom-right) | — | ✅ TC260 AIGC label (`` XMP **or** `AIGC` PNG chunk) **+ C2PA** signed by ByteDance Volcano Engine (`volcengine`) | Exact reverse-alpha (captured α map): pixel-exact at native width, NCC-aligned at other resolutions, + metadata strip |
+| **Samsung Galaxy AI** (Generative Edit, Sketch to Image, ...) | — | — | ✅ C2PA (signer "Samsung Galaxy") + `trainedAlgorithmicMedia` / proprietary `genAIType` marker | Detected (`identify`) + metadata strip |
+| **Black Forest Labs** (FLUX API) | — | — | ✅ C2PA (`Black Forest Labs API` + `c2pa.ai_generated_content` + `trainedAlgorithmicMedia`) | Metadata strip |
 | **StableSignature** (Meta) | — | ✅ In-model watermark | — | Diffusion regeneration |
 | **TreeRing** | — | ✅ Latent space watermark | — | Diffusion regeneration |

@@ -79,9 +81,9 @@ A three-stage NCC (Normalized Cross-Correlation) detector finds the watermark po

 ### Removing the Doubao "豆包AI生成" text watermark

-Doubao (ByteDance) stamps every output with a light, semi-transparent "豆包AI生成" text strip in the bottom-right corner — the visible AIGC label mandated by China's TC260 standard. Unlike the fixed-size Gemini sparkle, it is a text strip that scales with image width, so we anchor a generous bottom-right box by geometry, extract the light low-saturation glyph pixels with a polarity-aware white top-hat mask, and inpaint them (cv2 Telea/NS). The mask is background-relative, so it leaves white-paper documents untouched instead of smearing their text. On dense-text backgrounds where the mask would explode, removal is skipped rather than guessed.
+Doubao (ByteDance) stamps every output with a light, semi-transparent "豆包AI生成" text strip in the bottom-right corner — the visible AIGC label mandated by China's TC260 standard. It is a fixed semi-transparent white overlay, so — like the Gemini sparkle — it is removed by **exact reverse-alpha blending**: `original = (watermarked - α·logo) / (1 - α)`, recovering the true pixels instead of hallucinating them. The α map and logo colour were solved from controlled black + gray captures (on black, `captured = α·logo`; the black/gray pair solves α per-pixel).
At the captured width the placement is exact, so the recovery is returned untouched (inpainting over exactly-recovered pixels only degrades them). The single capture generalizes to any resolution: off the captured width an NCC scale-and-position search registers the α template to the actual mark, and a light residual inpaint cleans the sub-pixel seam there. Detection is consistent with removal: it matches the same alpha glyph silhouette against the corner (normalized correlation), so it keys on the actual "豆包AI生成" shape, not on textured corners. -**Speed**: ~0.03s per image. No GPU needed. Best on photo / illustration backgrounds; on high-contrast edges a faint residue can remain (use `erase --backend lama` for neural-quality fill). +**Speed**: ~0.05s, no GPU needed. Reverse-alpha at the captured resolution recovers the true background pixels exactly. ### Universal region eraser @@ -237,9 +239,9 @@ remove-ai-watermarks batch ./images/ --mode all # of a clean origin. Add --json for machine-readable output. remove-ai-watermarks identify image.png -# Visible watermark only — fast, offline, CPU. --mark auto (default) picks -# between the Gemini sparkle and the Doubao "豆包AI生成" text strip; force one -# with --mark gemini / --mark doubao. +# Visible watermark only — fast, offline, CPU. --mark auto (default) finds the +# strongest known mark (Gemini sparkle / Doubao "豆包AI生成" text); force one +# with --mark gemini / doubao. Removed by exact reverse-alpha (true-pixel recovery). remove-ai-watermarks visible image.png -o clean.png # Erase arbitrary region(s) — universal, any logo/watermark/object, any position. @@ -329,7 +331,7 @@ Tracked but not yet implemented: - **Real non-PNG C2PA fixtures**. SynthID-source detection for JPEG / WebP / AVIF is currently covered only by synthetic byte blobs; replace with real vendor-emitted files to ground the binary-scan path. - **Maintenance debt**. 
Strict pyright is now clean across `src/` (0 errors): pure-logic files are fully typed, the cv2 / torch / diffusers boundary files carry a documented per-file relax pragma, and a local `typings/piexif` stub covers piexif. Remaining: full-project `pyright` (no path) still OOMs node on this ML-heavy repo, so it must be scoped to `src/`; narrowing the boundary pragmas back toward full strict (as upstream stubs improve) is the long tail. (`uv-secure` is already clean since `idna` was bumped to 3.16.) - **AVIF / HEIF `Exif` item inside the `meta` box**. An AI-label *XMP* packet in a `meta`-box item is now blanked in place (v0.6.9), but EXIF stored as a `meta`-box `Exif` *item* is still not removed — it needs full `iinf`/`iloc` surgery (offset rewrite, corruption risk) or `exiftool` (a non-bundled binary dependency). Low priority: the AI labels we target are XMP, not EXIF, so an EXIF-only meta-box case is rare. -- **More C2PA device signers**. Leica, Nikon, Google Pixel, Sony, and Truepic are mapped (each verified against a real signed file). Canon and Samsung Galaxy (AI-edit) are deferred until a real signed sample surfaces — no public direct-download C2PA file exists for them today (upload-to-verify / news-agency-licensed only). +- **More C2PA device signers**. Leica, Nikon, Google Pixel, Sony, and Truepic capture cameras are mapped (each verified against a real signed file); **Samsung Galaxy AI**, **Black Forest Labs (FLUX)**, and **ByteDance Volcano Engine** (Doubao / Jimeng) are now attributed too (verified on real signed files). Canon is still deferred until a real signed sample surfaces — no public direct-download C2PA file exists for it today (upload-to-verify / news-agency-licensed only). - **Resemble PerTh audio detection** — evaluated, not feasible with the public API: `get_watermark()` returns a raw bit array with no presence/confidence flag, so watermarked vs. clean audio can't be reliably separated without Resemble's fixed payload or a confidence service. 
Same wall as the SynthID pixel detector. - **Video pipeline (`noai-video`)**: per-frame inpainting and tracking for Sora 2 dynamic logo, Veo 3.1 badge, Kling, Runway. Separate package, not folded into this repo. diff --git a/docs/text-protection-research.md b/docs/text-protection-research.md new file mode 100644 index 0000000..9c54428 --- /dev/null +++ b/docs/text-protection-research.md @@ -0,0 +1,138 @@ +# Text protection research: crisp text under a "watermark removed everywhere" constraint + +Date: 2026-05-29. Source: a deep-research run (104 agents, 5 search angles, sources +fetched and 3-vote adversarially verified). Not committed automatically — saved as a +research note for the next session. + +## The constraint that frames everything + +The invisible watermark (Google SynthID) must be removed **everywhere, including inside +text regions**. Therefore any technique that keeps or composites the **original +(watermarked) text pixels** is disqualified — the text must be *regenerated / freshly +synthesized* enough to scrub the watermark, yet rendered crisply. This single rule is the +filter applied to every candidate below. + +## Problem recap + +The `invisible` pipeline is SDXL base 1.0 img2img at low strength (~0.05) to defeat +SynthID with minimal visible change. Text is protected via Differential Diffusion with a +per-pixel change map (`preserve` ~0.9) driven by the PP-OCRv3 DB detector +(`text_protector.py`). Large text survives; **small text (sub ~8 px strokes) softens or +garbles** (issue #14, confirmed on real content). 
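The ~8 px limit in the recap can be made concrete with a toy numpy model: a pixel change map with one vertical text stroke at `preserve` 0.9 on a black (max-change) background, reduced 8x to latent resolution by averaging each 8x8 block. Block averaging is an assumption chosen for clarity; the real pipeline resizes the map with its own interpolation mode, but any 8x reduction makes a stroke narrower than one latent cell share that cell with max-change background. The helper names are hypothetical.

```python
import numpy as np


def change_map_with_stroke(size: int, stroke_x: int, stroke_w: int, preserve: float) -> np.ndarray:
    """Pixel change map: 0.0 = max change everywhere, `preserve` inside one vertical stroke."""
    cmap = np.zeros((size, size), dtype=np.float64)
    cmap[:, stroke_x : stroke_x + stroke_w] = preserve
    return cmap


def area_downsample(cmap: np.ndarray, factor: int = 8) -> np.ndarray:
    """Average each factor x factor block (one latent cell) into a single value."""
    h, w = cmap.shape
    assert h % factor == 0 and w % factor == 0, "map must tile evenly into latent cells"
    return cmap.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


# Stroke starts on a cell boundary (x=16), so only stroke width dilutes the weight.
for width in (2, 4, 8, 16):
    latent = area_downsample(change_map_with_stroke(64, 16, width, preserve=0.9))
    print(f"{width:2d} px stroke -> max latent preserve {latent.max():.3f}")
```

With the stroke aligned to a cell, a 2 px stroke keeps only 0.9 × 2/8 = 0.225 of its preserve weight at latent resolution and a 4 px stroke 0.45, while strokes of 8 px or more keep the full 0.9; a stroke straddling a cell boundary is diluted further. Even `preserve` 1.0 would leave a 2 px stroke at 0.25, which is why the recap treats this as resolution-bound rather than a tuning problem.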
+ +## Executive summary + +The fine-text softening is an **architectural consequence of latent-space processing, not +a tuning problem**: SDXL's 4-channel VAE (~48x compression) discards high-frequency signal +on encode, and Differential Diffusion blends in latent space with the change map +downsampled by 8x, so any stroke under ~8 px sits inside one latent cell and cannot be +preserved or edited cleanly **regardless of `preserve`** (the Differential Diffusion +authors state this limit explicitly). Two structurally sound directions keep the +"watermark removed everywhere" guarantee because they **synthesize fresh glyph pixels** +rather than compositing originals: (1) glyph/text-conditioned diffusion re-render of +detected text (AnyText2, EasyText), and (2) a two-stage architecture — global scrub, then +a dedicated text-restoration / text-aware super-resolution pass over detected regions +(TIGER, TextSR, TeReDiff/TAIR). **EasyText** and **TextSR** are the most promising for this +CJK-first pipeline (both multilingual via DiT/ByT5, both regenerate from glyph or +character-shape priors). The deepest fix — a 16-channel (SD3/FLUX) VAE — materially reduces +the softening but means switching the base model, not a drop-in VAE swap. + +## Constraint reconciliation (important) + +The generic research "quick win: bump `preserve` toward 1.0" is **invalid under our hard +constraint**: raising `preserve` freezes the text region, so SynthID there is **not +scrubbed**. Likewise, pixel paste-back of the original text is disqualified. The only +constraint-compatible quick win is **higher resolution / tiled diffusion** (strokes span +more latent cells, less VAE softening, while the text is still fully regenerated and thus +scrubbed). The real answer is **regenerate text crisply**, not freeze it. + +## Findings (with confidence and sources) + +### Finding 1 — confidence: high + +**Claim.** The small-text softening is an architectural latent-space limit, not a tuning issue. 
SDXL's VAE compressively encodes (losing exact color and fine detail on every round-trip), and Differential Diffusion blends in latent space with the change map downsampled to latent resolution (8x), so the method explicitly caps edit/preserve granularity at ~8 px under SD settings. Text strokes below one latent cell cannot be cleanly preserved even at preserve ~0.9.
+
+**Evidence.** Differential Diffusion's paper states a "cap on the resolution of the change map ... can limit the ability to precisely edit small objects (less than 8 pixels for Stable-Diffusion's settings)"; the official SDXL pipeline downsamples the map by `vae_scale_factor=8` and blends `latents = original*mask + latents*(1-mask)` in latent space. The VAE encode is "compressive ... exact color qualities and exact visual fine-details are lost." arXiv:2512.05198 confirms "resizing the pixel mask to latent resolution discards fine structure ... downsamples by 1/8" and that linear latent blending "cannot be pixel-equivalent." Higher compression = more high-frequency loss (arXiv:2305.02541).
+
+**Sources.** https://onlinelibrary.wiley.com/doi/10.1111/cgf.70040 · https://differential-diffusion.github.io/ · https://github.com/exx8/differential-diffusion · https://arxiv.org/abs/2512.05198 · https://omriavrahami.com/blended-latent-diffusion-page/ · https://arxiv.org/pdf/2305.02541
+
+### Finding 2 (confidence: low; do not build on it yet)
+
+**Claim.** Pixel-space differential / blended-latent variants exist as a research direction, but the specific full-resolution-mask solution (PELC/DecFormer, arXiv:2512.05198) was NOT verified to deliver its claimed seam/edge improvements.
+
+**Evidence.** arXiv:2512.05198 argues linear latent blending is not pixel-equivalent and proposes decoder-equivariant compositing; PixPerfect (arXiv:2512.03247) does pixel-space refinement of chromatic shifts at edit boundaries.
But the specific PELC full-resolution-mask and DecFormer "53% error reduction" claims were **refuted on adversarial vote (0-3 and 1-2)**. Treat pixel-equivalent latent compositing as an emerging idea to watch, not a production fix.
+
+**Sources.** https://arxiv.org/abs/2512.05198 · https://arxiv.org/abs/2512.03247
+
+### Finding 3 (confidence: high)
+
+**Claim.** Glyph/text-conditioned diffusion can re-render detected text as freshly synthesized pixels (not copied), which inherently scrubs any watermark in the text region while rendering glyphs crisply. AnyText/AnyText2 inject text-rendering into a pretrained T2I model and support generation AND editing of existing scene images; multilingual including CJK and English.
+
+**Evidence.** AnyText2 "enables precise control over multilingual text attributes in natural scene image generation and editing" (WriteNet+AttnX); +3.3% (Chinese) / +9.3% (English) accuracy over AnyText v1. AnyText "can be plugged into existing diffusion models ... for rendering or editing text" and synthesizes text latent features through diffusion (fresh pixels), supporting zh/en/ja/ko/ar/bn/hi. **Caveat:** both are SD1.5-based, so NOT a drop-in into the SDXL scrub (separate base model); AnyText's own limitation: "the inpainting manner ... impedes editing quality on small text," and it ranks weak on STRICT (EMNLP 2025), so small-text crispness is not guaranteed.
+
+**Sources.** https://github.com/tyxsspa/AnyText2 · https://arxiv.org/abs/2411.15245 · https://arxiv.org/abs/2311.03054
+
+### Finding 4 (confidence: high)
+
+**Claim.** EasyText is a strong glyph-conditioned re-render candidate: built on the FLUX-dev DiT framework with LoRA tuning, it renders compact per-character glyph patches (64px-high adaptive for alphabetic, 64x64 for logographic) concatenated in latent space and supports 10+ languages including Chinese, Japanese, Korean, Thai, Vietnamese, Greek, and Latin.
+
+**Evidence.** AAAI 2025 + arXiv:2505.24417: "implemented based on the open-source FLUX-dev framework with LoRA-based parameter-efficient tuning," VAE and text encoder frozen, two-stage 512->1024 training. Glyph conditioning via "64-pixel-high images ... adaptive widths for alphabetic; fixed 64x64 for logographic," VAE-encoded and concatenated with denoised latents, "less than one-tenth the spatial size of layout-matching methods." FLUX-based (16-channel VAE, DiT) also sidesteps the SDXL 4-channel wall. Fresh-pixel generation preserves the watermark-removal guarantee. Cyrillic/Arabic crispness not separately benchmarked.
+
+**Sources.** https://arxiv.org/html/2505.24417 · https://ojs.aaai.org/index.php/AAAI/article/view/37697
+
+### Finding 5 (confidence: high)
+
+**Claim.** A two-stage "global watermark scrub then text-restoration pass" architecture is validated by recent literature, and the restoration stage can synthesize glyph pixels from priors (no original-pixel reintroduction). TIGER reconstructs stroke geometry then injects it as guidance into full-image super-resolution; TextSR uses a detector + multilingual OCR to regenerate text from character-shape priors; TeReDiff/TAIR couples a jointly-trained text-spotter with diffusion.
+
+**Evidence.** TIGER (arXiv:2510.21590): "a diffusion-based local text refiner ... reconstructing fine-grained stroke geometry ... injected as conditional guidance into the subsequent full-image restoration." TextSR (arXiv:2505.23119, Google): "leverages a text detector ... then employs OCR to extract multilingual text," regenerating from "multilingual character-to-shape diffusion priors" that "produce character shapes solely based on text prompts, even without visual input", i.e. fresh pixels. TAIR/TeReDiff (ICLR 2026): standard restoration "frequently generates plausible but incorrect textures"; TeReDiff feeds text-spotter outputs back as prompts.
**Caveat:** TIGER orders text-first then global (reverse of scrub-then-text); these target degraded-input super-resolution, not watermark removal, so the SynthID-scrub of the restoration stage must be verified empirically (the stages are themselves diffusion-based, so fresh-pixel = no SynthID is plausible but unproven here).
+
+**Sources.** https://arxiv.org/html/2510.21590v1 · https://arxiv.org/html/2505.23119v1 · https://cvlab-kaist.github.io/TAIR/ · https://arxiv.org/abs/2506.09993
+
+### Finding 6 (confidence: high)
+
+**Claim.** Switching to a 16-channel VAE (SD3/FLUX class) materially reduces small-text/latent softening vs SDXL's 4-channel VAE, but it requires switching the base model; it is not a drop-in latent swap into an SDXL UNet img2img pipeline. RAE approaches are DiT-native and likewise not drop-in.
+
+**Evidence.** SD3/FLUX moved from 4-channel (48x) to 16-channel (12x) VAEs specifically to preserve fine detail (diffusers Discussion #8713; madebyollin VAE notes; arXiv:2305.02541). RAE (arXiv:2510.11690) "should be the new default for diffusion transformer training" but produces high-dimensional latents needing a DiT wide-DDT head, so it is NOT compatible with an SDXL 4-channel UNet. EasyText shows the practical path: adopt a FLUX-DiT base rather than retrofit SDXL. The VAE upgrade couples to a base-model migration.
+
+**Sources.** https://arxiv.org/abs/2510.11690 · https://arxiv.org/pdf/2305.02541 · https://arxiv.org/html/2505.24417
+
+## Recommendation
+
+Under the hard constraint, the correct architecture is **not "protect text during the
+scrub" (Differential Diffusion)** but **"scrub everywhere, then restore text crisply by
+regeneration"**:
+
+1. Global SDXL scrub with text protection OFF (text region is scrubbed too).
+2. On detected text regions, a **glyph-conditioned restoration** that re-renders the same
+   glyphs as fresh pixels (no original reused).
+
+This is the only path that delivers both "watermark everywhere" and crisp text.
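The two steps above can be sketched as a small orchestration function that makes the constraint explicit. Every output pixel comes either from the global scrub or from a fresh re-render. The original image is only *read*, to find boxes and recognize strings, and is never copied. This is a hypothetical skeleton. `scrub`, `detect_text`, `recognize`, and `rerender` are placeholder callables standing in for the SDXL scrub, the PP-OCRv3 detector, the OCR-recognition step the pipeline does not have yet, and a glyph-conditioned model such as TextSR or EasyText.

```python
from typing import Callable, Sequence

import numpy as np

Box = tuple[int, int, int, int]  # (x, y, w, h) in pixels


def scrub_then_restore(
    image: np.ndarray,
    scrub: Callable[[np.ndarray], np.ndarray],
    detect_text: Callable[[np.ndarray], Sequence[Box]],
    recognize: Callable[[np.ndarray], str],
    rerender: Callable[[np.ndarray, str], np.ndarray],
) -> np.ndarray:
    """Scrub everywhere, then regenerate text regions as fresh pixels (hypothetical)."""
    # Step 1: global scrub with text protection OFF, so text regions are scrubbed too.
    scrubbed = scrub(image)
    out = scrubbed.copy()
    for x, y, w, h in detect_text(image):
        # The original is only read here, to learn *what* the text says.
        text = recognize(image[y : y + h, x : x + w])
        # Step 2: re-render conditioned on the scrubbed crop, never on original pixels.
        fresh = rerender(scrubbed[y : y + h, x : x + w], text)
        if fresh.shape != (h, w) + image.shape[2:]:
            raise ValueError("re-rendered patch must match the text box")
        out[y : y + h, x : x + w] = fresh
    return out


# Usage with stand-in callables: "scrub" inverts the image, the "re-render" paints 7s.
img = np.arange(36, dtype=np.uint8).reshape(6, 6)
result = scrub_then_restore(
    img,
    scrub=lambda a: 255 - a,
    detect_text=lambda a: [(1, 2, 3, 2)],
    recognize=lambda crop: "AI",
    rerender=lambda crop, text: np.full(crop.shape, 7, dtype=crop.dtype),
)
```

Passing the recognized string separately from the crop reflects the note's point that detection alone is not enough: the restorer must know *what* to re-render.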
+
+**Top-2 to prototype:**
+- **TextSR**: detector + multilingual OCR + character-shape diffusion priors; closest to
+  the existing detector-driven pipeline.
+- **EasyText**: FLUX-DiT glyph re-render, multilingual incl. CJK; also gets the 16-channel
+  VAE for free.
+
+**Honest costs / unknowns:** this is a re-architecture, not a quick fix. It needs a new
+**OCR-recognition** step (we currently only detect text; we must know *what* to re-render).
+Models are FLUX/DiT-class (heavy) -> serverless GPU. Maturity is research-grade; CJK is
+covered, Cyrillic/Arabic crispness is not separately benchmarked -> a prototype must
+measure real fidelity. The restoration stage being diffusion-based makes "fresh pixels =
+no SynthID" plausible but **must be verified empirically** (run the SynthID oracle on the
+restored output).
+
+**Constraint-compatible quick win to try first:** run the global scrub at **higher
+resolution / tiled** so strokes exceed the latent cell: less softening, full scrub, no
+freezing. Cheap to test; quantify recall/quality vs cost.
+
+**Do not pursue:** raising `preserve` toward 1.0 or pixel paste-back (both leave original
+watermarked pixels in text); PELC/DecFormer pixel-equivalent latent compositing (refuted,
+not production-ready).
+
+## Provenance
+
+Deep-research workflow run `wf_118b9a03-3eb` (2026-05-29). Findings adversarially verified
+(2/3 refutes required to kill a claim). This note records research only; no code change is
+implied until a prototype validates fidelity and the SynthID-scrub guarantee on the
+restored output.
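The next file in this patch, `assets/doubao_alpha.png`, is the alpha template that the Doubao engine inverts. According to the `doubao_engine.py` comments, it was solved from a black capture and a gray capture. On black the capture is `a*logo`. On a uniform gray level `g` it is `a*logo + (1-a)*g`. Their difference is therefore `(1-a)*g`, and alpha falls out without knowing the logo colour. Removal then computes `original = (wm - a*logo)/(1-a)`.

The minimal NumPy sketch below runs both steps on synthetic data. Assumptions:

- The function names are hypothetical, not the engine's API.
- The gray level is assumed known and uniform.
- The 0.25 floor on `1 - a` mirrors the clip in `_apply_reverse_alpha`.

```python
import numpy as np


def solve_alpha(on_black: np.ndarray, on_gray: np.ndarray, gray_level: float) -> np.ndarray:
    """Per-pixel alpha from black and gray captures of the same overlay.

    on_black = a*logo and on_gray = a*logo + (1-a)*gray_level, so
    on_gray - on_black = (1-a)*gray_level: the logo colour cancels out.
    """
    diff = on_gray.astype(np.float64) - on_black.astype(np.float64)
    alpha = 1.0 - diff / float(gray_level)
    # One estimate per channel; average them and clamp to a valid alpha.
    return np.clip(alpha.mean(axis=2), 0.0, 1.0)


def reverse_alpha(watermarked: np.ndarray, alpha: np.ndarray, logo_bgr) -> np.ndarray:
    """Invert the blend: original = (wm - a*logo) / (1 - a)."""
    a = np.clip(alpha, 0.0, 1.0)[:, :, None]
    logo = np.asarray(logo_bgr, dtype=np.float64)
    # Floor the denominator so near-opaque pixels do not blow up.
    denom = np.clip(1.0 - a, 0.25, 1.0)
    out = (watermarked.astype(np.float64) - a * logo) / denom
    return np.clip(np.round(out), 0, 255).astype(np.uint8)


# Synthetic check: blend a known overlay onto a random background, then recover it.
rng = np.random.default_rng(0)
h, w = 24, 40
background = rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8)
alpha_true = np.zeros((h, w))
alpha_true[8:16, 10:30] = 0.5
logo = (252.0, 255.0, 255.0)  # same value as the engine's _ALPHA_LOGO_BGR


def blend(base) -> np.ndarray:
    """Composite the overlay onto `base` (a scalar or an HxWx3 array)."""
    a = alpha_true[:, :, None]
    return a * np.asarray(logo) + (1.0 - a) * base


alpha_est = solve_alpha(blend(0.0), blend(128.0), gray_level=128.0)
watermarked = np.round(blend(background.astype(np.float64))).astype(np.uint8)
recovered = reverse_alpha(watermarked, alpha_est, logo)
print("max abs error:", np.abs(recovered.astype(int) - background.astype(int)).max())
```

The only residual error comes from rounding the watermarked image to 8 bits. Dividing by `1 - a` amplifies that rounding error, which is why the inversion is exact in principle but degrades where alpha is high.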
diff --git a/src/remove_ai_watermarks/assets/doubao_alpha.png b/src/remove_ai_watermarks/assets/doubao_alpha.png new file mode 100644 index 0000000000000000000000000000000000000000..53ffc33c03a1f83e1ad07ec293c7f41930572a3b GIT binary patch literal 8182 zcmZ8`cQ~8v8@9x#ks@a7NQzQ3)M^o=;U#9Yw6;d+q9|(bHb@bxHnr)Xs`joGTDzsS zYLvz(B}L6I`o6#8_`dI-=Xn0Q&-=W_d7amBN1Gb!on_`{rlFxZ`;R`#oQ4LVL;crd zq<&~D1HpPUG$PIaptLO>0lM0ml2g8iE&V##4&J@SU7C=0z7%Y>kXRZWz@@?w1sguk zg)V@_qRkY)sPB*{(j_ixLB%Bj;gBn$KMlDI5_vv2p=iVJ>z#$8bf3+~N7Z(+6@ zyrDSjyYZ@_HYQ&^L(wHz>({vYJ-R9=13{&QbmDSzR~+W;W9rq`tCP%RyZYx_+`TxQ zfgDkwnXNEv%MjPuW1`PRboOk=q1cjGkaVbS(rWzhYX5>d=bfw%0gqez^taDATzIA4vyGd}I@?G{YUJ7YWyKFHM{=RH|3Z6yQi{$zewwQ0)<-%#&ukquDj z4@cw}pCc*-%Ht!wZ+S=4@5>WICh=w5YO~xg?d*f7d`wE^R}p82+;kozBgap^x0Uw$ z3;Gx@HBBD7FSa~P)Sgh#td3!G-QQbh;x?&%e{6qXdTvzX9@%NAw?CfDtb_>5k_Unq zvEb0&fhZHK*Zzl}P|TIN6z)mR*z=N!EZ{Tp9tVvAd}Eezl1Ym7bv=(8YY=&)L^y8R zhCh2hs<{u?l{!0BA3a6!5WlHE`D`2TG_%1a1{9^T+)U4#YYj|RK02OJVYk<+F$7`Q zD@jbWItk+0EV;!?eWSiH{Ay`Z#OScBE5@v?f9BMEma7dP7mi8ROufwl7^S@3OU&SOw_D)$!X;;aQrEMMpzJVAx+@Q?(<5?DCKo`EhkHl*byEv@>UeNAWnsM5@Fg#<}X892cB^Q?Zvu%+;4F&5q$4T?0k zqVd~M*CMtPLdf9o`$kUCd1lchEIADu4j82@nKbGfYgC6mBk{qVp*Q5%5l=ICwfd0MxbXn9UbYe$L>dEI z5j?WTzvAr86pjcM>|xY;1G&w9=EMW=Y3xb*PsarF+mI#Cht|&blT!b<*G@ zkZjBEHiq)osxQv@Gs;8T#`xIPRJ8m=Xd9L&yWzLD8@JxK%4)JgwKlY-?#}qV|=OULK>QR z(k(C>fXU&Cl24_~3-V|AYNpYsf*FWtIx{3}pS|cAZ?n$yb<5UlB$oz?R*gg2DHM_~ zuk=n!{8*~sa(yqzZ=?Kc zIvfmb!+|345%k5mOCoalZiguw=MdiceK@B|FccHzG^*%}Z|xhl?^){zjVPjm7XLA0y!rgBm+OOR(j+GmipSBhq6i6PoK}(VW}p9Ddr>2n zrf%kpP?%t3XJC1-wWSq9Ju)d1einpqKn;X8jaQmO;eGF~eRQEYu%VO*F@dFH`6Vh2 zX=zYY?Fk=NKa)@(y?I>EIz(3?#D0g?jN@l~=d<+RUD9WExP> zq@M3Coy6E0rR%4Nc~9w2J6Z~w!j|*&(TvuiWKn;-m6b+(152?RFSj_{KaC2=*b5X4 zd{U~uR#1?UfkRaF^3gG%x{=E@v&}70yX#Z2jkJ7b!Zmj^X;puhyD>#b_ao|unkfoC z7wc2({^=21G0ymmdJ2X{lK9xsN}i6tFR#d^Rlxxu>iOve007MH5dIEjK{!||)sF%{ z1>uEB11>p%5xaLgcpGT>p8R`#=jFBYp=Kc2WM@Q2m&@|o7d1>21N9|S-=*rra1x)k ze1`=D6(ajGM{io}RW_4@dTOQ!y=9OZtueGcJMo6nSUVqPc(uA 
z8o#>l>tX58h3;%7I8Qgfpmu?^{QE#PxIVSRr=zg7Vrf#GG=N*aHjj>#S7Z+`mOzD2 zK0;ZvQmdG!Wp<0koLs@crO&@KsYlWy#D#V!_t#c7D|qe`Jv_DxI5kIvK)s}zsKKgO z(kmVa-Dxad^N(}0qtm1Sw;UdKWLTGG?xf2S6_~+5Z}6zDXd;sX@8{?b?UwO*1NiCf zp~et&LA=n7hk4yNE__wK@{tcq{pYgj=t@zcor>D37bf=CfW!+e4@N%>-ZPZ1-%Q$g z8(YJaN@7fmQya7O+X(<*#mXbgI33DCUCwOgzQ4m@D`{1|b$k8x zXk2*o(pjR~q5%C0VM2rL`c@Esma;bTh!)9c`KK|w)%wJZF;3I8`LTbTgu79a8cy;Xyt@6RtF&T08wgVg z`R&2GZq36TWwd(~bNqLbPzfrjQt~u<`RJ|D-*0O2EAXY{hP-fXW5lL*px&kbRW;^A4`WT;#XtnD62VAE5knFU=+HgyxU&$0Tw%21wWhq0Sbb-zOVyMO;s#4v%lPMF4DGXS#tl9 z+bvFX+Jm0AW6>WRXQi;B5hTc@M0#%8Au;uY4geX;MoY%@V@w&drxF99VKXOxju4IZ;PW-ed9 zwVA>)(REfYl@2!l-1joQi_woo~f zxSyS;{p_?jO z?DoMM-sj|-=*&XG+*vioyuKXx9k`$QKwH6lF4cRzf|lxrGRr5n+7SEcQwAwTLkdsY zjkMrbl@NB-RED0vXUb@j238$-7i5hHH0wuxTSE5C*X+CuvIC`4h@_g#cBF{6qa;Y z#BHxv3|UUnl}l6G1LW3CmSn)#REKK|dE0WT-CY(aqUo?mQF-Oh*|I}2W?EZ(Ry$;f zi>nj8c$G9F26CXx{^Wy0FH?vcSl`We@IDxPUSzf?U$xv-Dp5ihHmd&al0yB%i9UMD zc67gHR$r*iqxI+ZNqpYZ0I_s}O_TG*AS1VT7+EWUCHXpXi5{pkp+?TdROBE=8et4% z1fbb%t(v%GwMwu+r~dBDnjzi)7krM9ku7NPU`!}*9beTeRS8e&xCwU*3QVsS2B z4z~S8aypt+^!}P`Uhbf6n)!DP%a0{apUZ|97v~GK@+zq2EQ#e>j}3zg)x2IO=YI#S z{Am3miAW`1xbHv5qq@8O(aeF_U90l3FwCp-Y+0Fsa8M+G(BfF(`hR5^0UHR4j#-cu z$Vnu=Us$Kgk&lGc%O*26rQOx}DAHXVw#Rgh_*V-xa=?)#1Pg2S@ zhs0vb1iYrHobg+)R?Ae6N7zzGy`-urocKHM|6zOo;O+5=jakfoa`H4rw?(~^fsRyT z9jqrpRSIzlot~$Zx3hrvqclJkpPUCu7s-9&u@LrptyZL#i}404MgH_IR$eTdz9Q2^ z#)Z_}Su`6_Z7Z{^>K-i3j&n#zLyX)68R+Il-BPIObxK}93z6%FX zH<9UWb?yWE!(UY(U=iEP{P@$@bIx_CQ53hU_1i*2UBl?DpmdPRhG&{+`=sJj6s@D@ z4W~5#+*xChBy(z!{Db(SCr1JxSm~I(?6V9NZ*RQOV z*u`DWzsxz46`(!5NSVmBJDdz!70&(&LYxgsKlQ;zR6 z^PK-limXtLCm?9emn7law^;2(LfnaKeaz3Ra^$j~h8U(!z7d5dEygVk1Xl+slvdYA z75BpUn!!{$AO=oUe$_&+>fgP|1%I*GYc91j4MT_=U5ge*gSutnsp>^#TZA5Z-{{MJfQ^!Lc4>Ixsj6 z>)!5<7ygUxOk^qg7hOau>Hp_UXwAQnmK7two2-ihs@OkTGd2T{=!q1WP15y45mYm? 
zul{Fc%%T^crES58T_~2B{h}{onSFypO#`+@TiZa>ye4cYu z)1q5n!LKR%Lt#x~>0g)lbf@*!iKo03I~|0b9l#>3a=&50nZMeo5(MM?JXmy^1%9|E z%mrz|gq>TGp(bLu_XVpFsVnu8WgLuSM07c~u7IYl?$+G3Zwc;7C5t1?!c$6fcu$L} zYl{PQ(qy!FXc>nZbvb&Knmbt1KJkSry%Jh~eB_y@4{}DF2#cqdA?d3@KJ2s>pT^bK zI7EO&88C7zETE|M*qq|_#cR}ze3hpbV*SWOzKEF^CMvUbs+x!{xFur61L|1h zGsxgbUC4f{_Ezwi54UyD(RPH&Lo@9D1njGBJ{A>USX@8&Y_8zH^2-*WOWi#@U5w?3 zEZ|s-m4rXx9xDvb2m>NSVO(N(y2d7A@L%pn5J#@Rz)4>lIRE7x5g&MyEmI0EeMkBZ zYVOsBx2a0~C(-aq(;l1sKY^M<`JqN~k6ib4Po$PZSjDaid4A`A%b3dVJ|Y zQJ=PZg@S+E-Okgtk~miXYbf>=I?i`4{<`w7&KGRiz2F&gnEDRDI-|0wbuFt&QT4vh zDF|a^xu2#R@&qJ0|IOk2J&}+q?FlKZ9rL$Oq3Wtl2U0;oUbaq%+uJYL4&!2mvGE(s z5at)l#7o~s;C)KWtDW{PRGVa=LqPxuNaNtI#>L)u{wa3)G5E8%{ljQJUnrQHgZQvQ zVx#!=sl^3&PxXGgl_mVN@~Yh)R|KDmOf+L7*^;dV-Bns9u}-7yXmr5i$!k{!Pk7UY zoTE3Zt$;$vM*9#xE_{q5XCs6<$r4f7U7}=rQIA#h;`$MC(NW%x<)#2A|r`a|#JGEW-GSNd>{nZc`lS-yjubD|tyZiY4a3YO>kT$f3C%NxnwohtXrJNgdDrz!2r zm!jLpyI1Uq)_b+2ar7Q~eB7#a-7La3a$rTVj-EiG0{0v8pUEGmwAG&Ty5k-w8k-Q? zrE29|Td7E2;^U1X^p zS->h9r6h_RCj9M?#NP9^hs&L;4Df{6)xeaPBR#m2expy5K>@AH(=X|egeYl$CIPM} zQs;aseZ@$@{qQo5m9+kEx8e-{gSXcZ7b{T3{o*J`+diC+i)4(t7+d4{>G}rOSqRI6 zi8u4RG z>9427bmX?|8X_tC-DZ;9BTLq^g*{g|N7&2G&on*$R}s`w+-(OByypCzr7UK)e;s`3 za(21miJp7YMWMO?3#R*N@2NeL6An9Fo!Sp;+_Q8;2L6{jZXs8BvW22b5q@_NSOWU*kDki(z-6tB&-=6+boRuk*YGq{`(w{wD|Tkr#+ zM%5qtYPRE8v{QA*a!-2QK`AudR(?^iBBI)^QPrX+fJjPBYM-f4}E+hE2~ zm8P9w;%=A_MLK`~x;HP>Kj>$;{!_P*IN_6D39eGi71PJ!M}+L7Ig3Ru_pfaSp55Pz zrC4oG2G{N^UXe(qiP)TX9>e57bm>GHSKXZ+jG8F&rF$MQGkC0hXf}tBq*`tqXiq z2r>CM@nSd5;(W93T_}vRmroni$*vs}iLZ4p9{;|)rVr}JFJmygC<_wE=@;> zK4zru-Lt4^#~v=DR_c@}Avzq@3{vNYaFhT~;g8{FX!+ROq< zNW?QK$j;(Vu{DUJEMJbIEBVf*%q+RUY{czI+WHAhg%URm=*A;1F@oU!X?KwH%7RcI zh$$3LJ%?d)EJt;RHroApW;7=F;LF@}SQg8M#-22->dp;-vZ!+F@$Q(-#GUjkYH;$$ z!>BJ2XJAaKu(ofYbQ=0QKvh_h>mQA=>Kyy(u)1ZrB_O;c>&D|TWZo2|qvO>F$cYUlP>IuU$mLvG;U!Ky2ZTU6hXym^K=^Wf8tmGgzWv&DdEoS35$A%0SnD+!6GZYw}IqB#Hd=;3(j9|~bTzoYA&ib5WFq4o>{ zas11?T=w2Y9=_+lXP=Mn^*7S#v(qEc4hS6()NeOe+3eyfmKZai#6}}wEO%p3+G%vf 
zPkv$THRVrA>-Q1Gc1YILZ!e|om&OV-@O1u&AI@SMQ-5IG@RFQ5MN?-52wZG+X=EV- z>k}g-_eX)2Kg7?{31MLUc?$C>>Qy%kKIXyMJdib$f=YlgI}+M(kw4@(64y`aqRT6U zCQ~4h!F!Ak=DCd)K``Z4IcIpYPcpep06jJm1had7hoVZWAB=~efKTE!cbZqrGCbRN zpdX;pcXalX^h(vfH3WfBg+sryXXW$ajRc5~^}A)XURqDG{OUq-y7r*hYqrCfqbJ2%IHVNtoFxWZwsP~piNtBYu)4VKRMD9#HZIUl+{&o zUkE33G-Q&!cdD~tH_K79!%$}0Fx`uKMV?%PDH6)ZF}bt(os|ryF0S z_kr9g&0oP$wHvl3A3wbHrIU}4M~B8{o+xS*=VXSSo)C+dnOE-Cq$Bqg5{n4wDNd2O5Gl6z)*>`9`KrfZ{8mO>~e(S0Og<_y1 zx}oI$D6QGwux8ERC54ZY+{!>ltRt$z>XGlyyA;)ptmjt*&Nu%m__X$Ucv(m_GHW(y z-%%9gLsjsUjF>2Iz z7RO$one2d)K6#v>AcJ7A{^hHbwwJD9R0s<#LjdXW2XOPb!a;EvEXG2QUs6;IgH7C~ z)Znx{4o%1QGx@{xE;Uw&&xki6L@DZf(i(LEyFh(#k(P&of>Vys+zE_kNG^8TbbWCY zTu|lL$=6Z`R5HfcC-gjyE?*}G2g_3_n>0DcBS7{DeN*} bool: - """Run the Doubao text-strip removal path when it is the selected mark. - - Returns True when this path handled the image (caller should stop). In - ``auto`` mode the Doubao detector competes with the Gemini detector and wins - only when it is both positive and at least as confident. - """ - from remove_ai_watermarks.doubao_engine import DoubaoEngine - - doubao = DoubaoEngine() - d_det = doubao.detect(image) - - if mark == "auto": - g_det = gemini_engine.detect_watermark(image) - use_doubao = d_det.detected and d_det.confidence >= g_det.confidence - console.print( - f" [dim]Mark auto:[/] gemini={g_det.confidence:.2f} doubao={d_det.confidence:.2f} " - f"-> {'doubao' if use_doubao else 'gemini'}" - ) - else: - use_doubao = mark == "doubao" - - if not use_doubao: - return False - - if detect and not d_det.detected and d_det.confidence < detect_threshold: - console.print( - f" [yellow]⚠[/] Doubao mark not detected [dim](coverage {d_det.coverage:.1%}). 
" - f"Use --no-detect to force.[/]" - ) - raise SystemExit(0) - - method: Literal["telea", "ns"] = "ns" if inpaint_method == "ns" else "telea" - t0 = time.monotonic() - with console.status("[cyan]Removing Doubao watermark…[/]"): - result = doubao.remove_watermark(image, inpaint_method=method) - elapsed = time.monotonic() - t0 - - output.parent.mkdir(parents=True, exist_ok=True) - _write_bgr_with_alpha(output, result, alpha, clear_region=d_det.region) - - if strip_metadata: - try: - from remove_ai_watermarks.metadata import remove_ai_metadata - - remove_ai_metadata(output, output) - except Exception as e: - if ctx.obj.get("verbose"): - console.print(f" [yellow]⚠[/] Failed to strip metadata: {e}") - - size_kb = output.stat().st_size / 1024 - console.print(f" [green]✓[/] Doubao mark removed → {output} [dim]({size_kb:.0f} KB, {elapsed:.2f}s)[/]") - return True - - # ── Main group ─────────────────────────────────────────────────────── @@ -238,9 +172,10 @@ def main(ctx: click.Context, verbose: bool) -> None: @click.option("--detect-threshold", type=float, default=0.25, help="Detection confidence threshold.") @click.option( "--mark", - type=click.Choice(["auto", "gemini", "doubao"]), + type=click.Choice(["auto", *watermark_registry.mark_keys()]), default="auto", - help="Which visible mark to target. auto picks the stronger of the two detectors.", + help="Which known visible mark to target (auto picks the strongest detected). " + "All marks are removed by exact reverse-alpha against a captured alpha map.", ) @click.option("--strip-metadata/--keep-metadata", default=True, help="Strip AI metadata from output.") @click.pass_context @@ -256,13 +191,14 @@ def cmd_visible( mark: str, strip_metadata: bool, ) -> None: - """Remove a visible AI watermark from an image. + """Remove a known visible AI watermark from an image. - Targets the Gemini sparkle logo (reverse alpha blending) or the Doubao - "豆包AI生成" text strip (locate -> mask -> inpaint). Fast, deterministic, - offline. 
``--mark auto`` picks whichever detector fires stronger. + Finds a known mark in its usual place (Gemini sparkle / Doubao text) via the + watermark registry and removes it by exact reverse-alpha against a captured + alpha map -- recovering the true pixels, not an inpaint guess. ``--mark auto`` + picks the strongest detected mark. For arbitrary logos/objects, use ``erase``. """ - from remove_ai_watermarks.gemini_engine import GeminiEngine + from remove_ai_watermarks import watermark_registry as registry _banner() source = _validate_image(source) @@ -270,8 +206,6 @@ def cmd_visible( if output is None: output = source.with_stem(source.stem + "_clean") - engine = GeminiEngine() - # Load image (preserving any alpha channel separately) image, alpha = _read_bgr_and_alpha(source) if image is None: @@ -281,45 +215,44 @@ def cmd_visible( h, w = image.shape[:2] console.print(f" [dim]Input:[/] {source.name} ({w}x{h})") - # Resolve which visible mark to target, then run the Doubao path if chosen. - if _run_doubao_if_selected( - ctx, image, alpha, output, mark, engine, detect, detect_threshold, inpaint_method, strip_metadata - ): - return - - # Detection (we always detect softly, to find dynamic region for inpainting) - with console.status("[cyan]Detecting watermark…[/]"): - det = engine.detect_watermark(image) - - if detect: - if det.detected: - console.print( - f" [green]✓[/] Watermark detected " - f"[dim](confidence: {det.confidence:.1%}, " - f"spatial: {det.spatial_score:.3f}, " - f"gradient: {det.gradient_score:.3f})[/]" - ) - else: - console.print(f" [yellow]⚠[/] Watermark not detected [dim](confidence: {det.confidence:.1%})[/]") - if det.confidence < detect_threshold: - console.print(" [dim]Skipping. Use --no-detect to force removal.[/]") + # Resolve the target mark from the known-watermark registry. ``auto`` scans + # every in-auto mark in its usual place and picks the strongest; an explicit + # ``--mark `` targets that one (the user asserts its presence). 
+ if mark == "auto": + best = registry.best_auto_mark(image) + if best is None: + console.print(" [yellow]⚠[/] No known visible mark detected (gemini / doubao).") + if detect: + console.print(" [dim]Skipping. Use --mark --no-detect to force.[/]") raise SystemExit(0) + target = "gemini" # forced (no-detect): fall back to the default mark + else: + target = best.key + console.print(f" [dim]Mark auto:[/] {best.label} [dim]({best.location}, conf {best.confidence:.2f})[/]") + else: + target = mark - # Removal + chosen = registry.get_mark(target) + det = chosen.detect(image) + if detect and not det.detected: + console.print( + f" [yellow]⚠[/] {chosen.label} not detected " + f"[dim](conf {det.confidence:.2f}). Use --no-detect to force.[/]" + ) + raise SystemExit(0) + if det.detected: + console.print(f" [green]✓[/] {chosen.label} detected [dim]({chosen.location}, conf {det.confidence:.2f})[/]") + + method: Literal["telea", "ns"] = "ns" if inpaint_method == "ns" else "telea" t0 = time.monotonic() - region: tuple[int, int, int, int] | None = None - with console.status("[cyan]Removing watermark…[/]"): - result = engine.remove_watermark(image) - - if inpaint: - region = _watermark_region(det, w, h) - result = engine.inpaint_residual( - result, - region, - strength=inpaint_strength, - method=inpaint_method, - ) - + with console.status(f"[cyan]Removing {chosen.label}… ({chosen.recovery})[/]"): + result, region = chosen.remove( + image, + inpaint_method=method, + inpaint=inpaint, + inpaint_strength=inpaint_strength, + force=not detect, + ) elapsed = time.monotonic() - t0 # Save (preserves transparency by clearing alpha in the watermark region) diff --git a/src/remove_ai_watermarks/doubao_engine.py b/src/remove_ai_watermarks/doubao_engine.py index f4d8b21..e80a5ca 100644 --- a/src/remove_ai_watermarks/doubao_engine.py +++ b/src/remove_ai_watermarks/doubao_engine.py @@ -1,29 +1,24 @@ """Doubao visible watermark removal engine. 
Doubao (ByteDance) stamps every generated image with a visible "豆包AI生成" -(Doubao AI generated) text strip in the bottom-right corner. This is the -explicit AIGC label mandated by China's TC260 standard, rendered as a -near-white / light-gray, low-saturation text overlay. +(Doubao AI generated) text strip in the bottom-right corner -- the explicit AIGC +label mandated by China's TC260 standard, a near-white semi-transparent overlay. -Unlike the Gemini sparkle (a fixed square logo removed by reverse alpha -blending against a captured alpha map), the Doubao mark is a text strip whose -exact alpha map we do not yet have. This engine therefore removes it by: +Like the Gemini sparkle, it is a fixed overlay, so it is removed by **exact +reverse-alpha blending** against a captured alpha map (``remove_watermark_reverse_alpha``): +``original = (wm - a*logo)/(1-a)`` -- recovering the true pixels, not an inpaint +guess. The alpha map + logo colour were solved from black+gray Doubao captures +(see data/doubao_capture/ and the reverse-alpha section below) and bundled as +``assets/doubao_alpha.png``. - locate -> mask -> inpaint +Detection (``detect``) is reverse-alpha-consistent: it matches that same alpha +glyph silhouette against the corner via normalized correlation, so it keys on +the actual "豆包AI生成" shape rather than coverage/structure heuristics. -1. Locate: the mark scales with image WIDTH and sits in the bottom-right at a - fixed margin, so we anchor a generous box there (geometry only -- no bundled - template). Constants below are derived from measured Doubao output. -2. Mask: within the box, extract the light, low-saturation glyph pixels with a - polarity-aware rule (the mark is brighter than dark backgrounds and a - distinct off-white gray against light backgrounds). -3. Inpaint: cv2 inpainting (TELEA / NS) reconstructs the covered pixels. - -This is fast, offline, deterministic, and needs no GPU. 
A future upgrade path -is per-pixel reverse alpha blending once a Doubao alpha map is captured on a -controlled black background (see data/doubao_capture/), which would recover the -true pixels instead of hallucinating them -- the same approach as the Gemini -engine. +``locate`` (geometry box, scales with image WIDTH) and ``extract_mask`` (the +candidate glyph mask the detector correlates) remain; there is no inpaint-based +removal here -- arbitrary-region inpainting lives in ``region_eraser`` / the +``erase`` command. Fast, offline, no GPU. """ # cv2/numpy boundary: third-party libs ship no usable element types; relax the @@ -33,7 +28,7 @@ from __future__ import annotations import logging from dataclasses import dataclass -from typing import TYPE_CHECKING, Any, Literal +from typing import TYPE_CHECKING, Any import cv2 import numpy as np @@ -66,17 +61,63 @@ MAX_SATURATION = 55 # max channel spread to count a pixel as "grayish" LOGO_MIN_LUMA = 150 # glyphs are at least this bright in absolute terms TOPHAT_DELTA = 12 # glyph must exceed the local background by this many levels -# Detection: a genuine label fills a meaningful fraction of the box. Measured -# coverage is >=0.20 on real Doubao outputs; random/textured corners stay <=0.06 -# on large images but can spike to ~0.15 on tiny ones (small box -> high variance), -# so the threshold sits above that spike and below the real-mark floor. -DETECT_MIN_COVERAGE = 0.16 +# Detection is reverse-alpha-consistent: the mark is recognized by matching the +# bundled alpha-template glyph silhouette (assets/doubao_alpha.png -- the exact +# shape we invert) against the extracted candidate mask via zero-mean normalized +# correlation (cv2 TM_CCOEFF_NORMED). It keys on the actual "豆包AI生成" glyph +# SHAPE, not on coverage/structure heuristics, so a merely-textured corner does +# not fire (the old coverage detector false-positived on ~28% of images; #23). 
+# Corpus-tuned: real marks score median ~0.61, arbitrary corners <=0.17 (p99); +# threshold 0.4 -> false positives 7/1243 (0.6%). A small coverage floor skips +# the template match on a near-empty candidate box. +DETECT_MIN_COVERAGE = 0.04 +DETECT_NCC_THRESHOLD = 0.4 -# Safety: a text strip fills a modest slice of the (generous) box. When the box -# is over a dense-text / document background the mask explodes and cv2 inpainting -# would smear the real content. Above this coverage we refuse to inpaint and -# leave the image untouched -- that hard case needs the neural path, not a guess. -MAX_INPAINT_COVERAGE = 0.50 +# ── Reverse-alpha (exact recovery, Gemini-style) ───────────────────── +# The Doubao mark is a fixed semi-transparent white overlay, so given its alpha +# map the original pixels are recovered exactly: original = (wm - a*logo)/(1-a). +# The alpha map + logo colour were solved from black+gray Doubao captures on a +# controlled background (data/doubao_capture/): on black, captured = a*logo, and +# the black/gray pair solves a per-pixel WITHOUT assuming the logo colour. The +# bundled asset (assets/doubao_alpha.png) is the alpha template (a*255) at the +# captured width. The mark scales with image WIDTH, but a pure width-scale is +# only sub-pixel-accurate at the captured width and ghosts elsewhere, so removal +# does NOT trust fixed geometry: `_aligned_alpha_map` registers the template to +# the actual mark by a TM_CCOEFF_NORMED scale+position search, which makes the +# single capture work at any resolution (verified clean on 1773x2364). Verified +# 2026-05-29: white-capture cross-check -> mark vanishes to a flat fill; clean on +# doubao-1.png (2048) and the 3:4 portrait corpus size. 
+_ALPHA_NATIVE_WIDTH = 2048 +_ALPHA_LOGO_BGR: tuple[float, float, float] = (252.0, 255.0, 255.0) +_ALPHA_WIDTH_FRAC = 0.1572 # glyph width / image width -- the alignment scale seed +_ALPHA_HEIGHT_FRAC = 0.0347 +# Margins (of image WIDTH) of the captured mark -- the geometry record / where to +# seed; alignment refines the actual position, so these are not load-bearing. +_ALPHA_MARGIN_RIGHT_FRAC = 0.0166 +_ALPHA_MARGIN_BOTTOM_FRAC = 0.0195 +# Alignment scale search (np.linspace args) around the width-scaled glyph size. +_ALPHA_ALIGN_SEARCH = (0.88, 1.12, 13) +# At (near) the captured width the fixed geometry is pixel-exact, so we use it +# directly there -- NCC alignment is integer-pixel and would land ~1px off, +# degrading the otherwise-exact native recovery. Off this band, alignment wins. +_ALPHA_NATIVE_BAND = 0.03 +_alpha_template_cache: NDArray[Any] | None = None + + +def _alpha_template() -> NDArray[Any] | None: + """Lazily load the bundled Doubao alpha template (float [0,1]), or None.""" + global _alpha_template_cache + if _alpha_template_cache is None: + from pathlib import Path + + from remove_ai_watermarks import image_io + + path = Path(__file__).parent / "assets" / "doubao_alpha.png" + img = image_io.imread(str(path), cv2.IMREAD_GRAYSCALE) + if img is None: + return None + _alpha_template_cache = img.astype(np.float32) / 255.0 + return _alpha_template_cache @dataclass(frozen=True) @@ -104,6 +145,39 @@ class DoubaoDetection: coverage: float = 0.0 # fraction of the box occupied by glyph pixels +_silhouette_cache: NDArray[Any] | None = None + + +def _glyph_silhouette() -> NDArray[Any] | None: + """Binary "豆包AI生成" silhouette (255 = glyph) from the bundled alpha map, + used as the detection template. 
None if the alpha asset is missing.""" + global _silhouette_cache + if _silhouette_cache is None: + at = _alpha_template() + if at is None: + return None + _silhouette_cache = (at > 0.15).astype(np.uint8) * 255 + return _silhouette_cache + + +def _template_match_score(box_mask: NDArray[Any], image_width: int) -> float: + """Zero-mean normalized correlation of the alpha-template glyph silhouette + (scaled to the mark's expected size) against the candidate ``box_mask``. + + TM_CCOEFF_NORMED keys on glyph SHAPE, not coverage, so a dense textured + corner does not score highly -- only the actual "豆包AI生成" shape does. + """ + sil = _glyph_silhouette() + if sil is None or box_mask.size == 0: + return 0.0 + gw = min(box_mask.shape[1] - 1, max(8, int(_ALPHA_WIDTH_FRAC * image_width))) + gh = min(box_mask.shape[0] - 1, max(4, int(_ALPHA_HEIGHT_FRAC * image_width))) + if gw < 8 or gh < 4: + return 0.0 + template = cv2.resize(sil, (gw, gh), interpolation=cv2.INTER_NEAREST) + return float(cv2.matchTemplate(box_mask, template, cv2.TM_CCOEFF_NORMED).max()) + + class DoubaoEngine: """Remove the visible Doubao "豆包AI生成" watermark (locate -> mask -> inpaint).""" @@ -176,10 +250,12 @@ class DoubaoEngine: # ── Detect ──────────────────────────────────────────────────────── def detect(self, image: NDArray[Any]) -> DoubaoDetection: - """Detect the visible Doubao mark by glyph coverage in the corner box. + """Detect the visible Doubao mark by matching the alpha-template glyph + silhouette against the corner candidate (TM_CCOEFF_NORMED). - Heuristic: a genuine label fills a meaningful fraction of the box with - text-like glyph pixels. Coverage maps to a confidence score. + Keys on the "豆包AI生成" SHAPE, not coverage, so a textured corner does + not fire. ``confidence`` is the correlation score; ``detected`` is it + clearing ``DETECT_NCC_THRESHOLD``. 
""" det = DoubaoDetection() if image is None or image.size == 0: @@ -191,53 +267,113 @@ class DoubaoEngine: coverage = float((box > 0).sum()) / float(max(1, bw * bh)) det.region = loc.bbox det.coverage = coverage - # Map coverage to a 0-1 confidence: ~0.06 (noise floor) -> 0, ~0.26 -> 1. - det.confidence = float(max(0.0, min(1.0, (coverage - 0.06) / 0.20))) - det.detected = coverage >= DETECT_MIN_COVERAGE - logger.debug("Doubao detect: coverage=%.3f conf=%.3f", coverage, det.confidence) + if coverage >= DETECT_MIN_COVERAGE: + score = _template_match_score(box, image.shape[1]) + det.confidence = score + det.detected = score >= DETECT_NCC_THRESHOLD + logger.debug("Doubao detect: coverage=%.3f ncc=%.2f detected=%s", coverage, score, det.detected) return det - # ── Remove ──────────────────────────────────────────────────────── + # ── Reverse-alpha (exact recovery) ──────────────────────────────── - def remove_watermark( - self, - image: NDArray[Any], - *, - inpaint_method: Literal["telea", "ns"] = "telea", - inpaint_radius: int = 6, - dilate: int = 3, - ) -> NDArray[Any]: - """Remove the visible Doubao watermark by inpainting the glyph mask. + def reverse_alpha_available(self, image: NDArray[Any]) -> bool: + """True if the bundled alpha map is loadable. Sub-pixel NCC alignment + (see ``_aligned_alpha_map``) places it on the actual mark at ANY + resolution, so there is no width gate -- the caller still gates on + ``detect`` so a clean corner is never touched.""" + return image is not None and image.size > 0 and _alpha_template() is not None - Returns an unmodified copy when no glyph pixels are found (so we never - smear a clean corner). ``dilate`` grows the mask to cover anti-aliased - glyph edges before inpainting. 
- """ - if image is None or image.size == 0: - return image + def _fixed_alpha_map(self, image: NDArray[Any]) -> tuple[NDArray[Any], tuple[int, int, int, int]] | None: + """Place the template by fixed width-relative geometry -- pixel-exact at + the captured width (used there instead of integer-pixel NCC alignment).""" + at = _alpha_template() + if at is None: + return None + h, w = image.shape[:2] + gw, gh = max(1, int(_ALPHA_WIDTH_FRAC * w)), max(1, int(_ALPHA_HEIGHT_FRAC * w)) + ax = max(0, w - int(_ALPHA_MARGIN_RIGHT_FRAC * w) - gw) + ay = max(0, h - int(_ALPHA_MARGIN_BOTTOM_FRAC * w) - gh) + amap = np.zeros((h, w), np.float32) + amap[ay : ay + gh, ax : ax + gw] = cv2.resize(at, (gw, gh), interpolation=cv2.INTER_LINEAR) + return amap, (ax, ay, gw, gh) + + def _aligned_alpha_map(self, image: NDArray[Any]) -> tuple[NDArray[Any], tuple[int, int, int, int]] | None: + """Build a full-image alpha map with the captured template registered to + the actual mark via a TM_CCOEFF_NORMED scale + position search, so one + capture also works off the captured width (width-scaling alone leaves ghosts).
+ Returns ``(alpha_map, glyph_bbox)`` or None.""" + at = _alpha_template() + sil = _glyph_silhouette() + if at is None or sil is None: + return None + h, w = image.shape[:2] loc = self.locate(image) - mask = self.extract_mask(image, loc) - if not mask.any(): - logger.debug("Doubao remove: no glyph pixels found; returning copy") + bx, by, bw, bh = loc.bbox + box_mask = self.extract_mask(image, loc)[by : by + bh, bx : bx + bw] + expected = _ALPHA_WIDTH_FRAC * w + best: tuple[float, int, int, int, int] | None = None + for scale in np.linspace(*_ALPHA_ALIGN_SEARCH): + gw, gh = int(expected * scale), int(_ALPHA_HEIGHT_FRAC * w * scale) + if gw < 8 or gh < 4 or gw >= bw or gh >= bh: + continue + t = cv2.resize(sil, (gw, gh), interpolation=cv2.INTER_NEAREST) + _, score, _, top_left = cv2.minMaxLoc(cv2.matchTemplate(box_mask, t, cv2.TM_CCOEFF_NORMED)) + if best is None or score > best[0]: + best = (score, gw, gh, top_left[0], top_left[1]) + if best is None: + return None + _, gw, gh, ox, oy = best + ax, ay = bx + ox, by + oy + amap = np.zeros((h, w), np.float32) + amap[ay : ay + gh, ax : ax + gw] = cv2.resize(at, (gw, gh), interpolation=cv2.INTER_LINEAR) + return amap, (ax, ay, gw, gh) + + def _apply_reverse_alpha(self, image: NDArray[Any], amap: NDArray[Any]) -> NDArray[Any]: + """Invert the alpha blend with ``amap``: ``original = (wm - a*logo)/(1-a)``.""" + a3 = np.clip(amap, 0.0, 1.0)[:, :, None] + logo = np.array(_ALPHA_LOGO_BGR, np.float32) + return np.clip((image.astype(np.float32) - a3 * logo) / np.clip(1.0 - a3, 0.25, 1.0), 0, 255).astype(np.uint8) + + def remove_watermark_reverse_alpha(self, image: NDArray[Any], *, residual_inpaint: bool = True) -> NDArray[Any]: + """Recover the original pixels by inverting the alpha blend + ``original = (wm - a*logo)/(1-a)``. 
+ + Placement: at (or near) the captured width the fixed geometry is pixel-exact, + so the recovery is returned UNTOUCHED -- inpainting over exactly-recovered + interior pixels only swaps them for a cv2 hallucination (measured worse on + textured backgrounds: at native width the error vs the true background is + 1.6 with reverse-alpha only vs 2.6 with a full-footprint inpaint). Off-native, + NCC alignment registers the template to the real mark, but only to the nearest + pixel and scale step, so the interior recovery is no longer exact and the seam + can re-trip the detector. There we try BOTH placements and keep whichever + leaves the lowest residual detection score (on a faint or busy-background mark + the NCC peak can wander a few px and geometry wins; on a clear mark alignment + wins) -- no magic threshold, it just picks the better removal -- then a + residual inpaint over the glyph footprint cleans the seam (the interior is + approximate anyway, so inpainting it costs nothing and reliably clears the mark). + Call only when :meth:`reverse_alpha_available` and the mark is detected.
+ """ + at_native = abs(image.shape[1] / _ALPHA_NATIVE_WIDTH - 1.0) <= _ALPHA_NATIVE_BAND + if at_native: + amap = self._fixed_alpha_map(image) + return self._apply_reverse_alpha(image, amap[0]) if amap is not None else image.copy() + maps = [c for c in (self._fixed_alpha_map(image), self._aligned_alpha_map(image)) if c is not None] + if not maps: return image.copy() - - x, y, bw, bh = loc.bbox - coverage = float((mask[y : y + bh, x : x + bw] > 0).sum()) / float(max(1, bw * bh)) - if coverage > MAX_INPAINT_COVERAGE: - logger.warning( - "Doubao remove: box coverage %.2f exceeds %.2f (dense-text/document " - "background); leaving image untouched to avoid smearing content", - coverage, - MAX_INPAINT_COVERAGE, - ) + best_out: NDArray[Any] | None = None + best_amap: NDArray[Any] | None = None + best_residual = float("inf") + for amap, _region in maps: + out = self._apply_reverse_alpha(image, amap) + residual = self.detect(out).confidence + if residual < best_residual: + best_residual, best_out, best_amap = residual, out, amap + if best_out is None or best_amap is None: # pragma: no cover - maps is non-empty return image.copy() - - if dilate > 0: - k = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2 * dilate + 1, 2 * dilate + 1)) - mask = cv2.dilate(mask, k) - - flag = cv2.INPAINT_TELEA if inpaint_method == "telea" else cv2.INPAINT_NS - return cv2.inpaint(image, mask, inpaint_radius, flag) + if residual_inpaint: + rm = cv2.dilate((best_amap > 0.10).astype(np.uint8) * 255, np.ones((3, 3), np.uint8)) + best_out = cv2.inpaint(best_out, rm, 3, cv2.INPAINT_TELEA) + return best_out def load_image_bgr(path: str | Path) -> NDArray[Any]: diff --git a/src/remove_ai_watermarks/identify.py b/src/remove_ai_watermarks/identify.py index 51cc055..2b54ba6 100644 --- a/src/remove_ai_watermarks/identify.py +++ b/src/remove_ai_watermarks/identify.py @@ -25,14 +25,15 @@ from typing import TYPE_CHECKING from remove_ai_watermarks.metadata import ( AI_METADATA_KEYS, AIGC_MARKERS, - C2PA_UUID, 
IPTC_AI_FIELD_MARKERS, IPTC_AI_MARKERS, aigc_label, + c2pa_marker_in, exif_generator, get_ai_metadata, huggingface_job, iptc_ai_system, + samsung_genai, scan_head, xai_signature, ) @@ -65,6 +66,8 @@ _ISSUER_PLATFORM: tuple[tuple[str, str], ...] = ( ("OpenAI", "OpenAI (ChatGPT / gpt-image / DALL-E / Sora)"), ("Google", "Google (Gemini / Imagen)"), ("Stability AI", "Stability AI (Stable Image / DreamStudio)"), + ("Black Forest Labs", "Black Forest Labs (FLUX)"), + ("ByteDance", "ByteDance (Doubao / Jimeng / Volcano Engine)"), ) # PNG-text / EXIF keys that indicate a local diffusion pipeline (vs. a hosted @@ -95,6 +98,12 @@ _HF_JOB_CAVEAT = ( "generation) but names neither the model nor the content type, so it is a " "medium-confidence signal, not proof the pixels are AI-generated." ) +_SAMSUNG_GENAI_CAVEAT = ( + "Samsung's genAIType marker shows a Galaxy AI editing tool (Generative Edit, " + "Sketch to Image, ...) touched the image; it is an undocumented proprietary " + "field, so it is a medium-confidence signal of AI editing, not proof the " + "whole image is AI-generated." +) @dataclass @@ -151,7 +160,9 @@ def _ai_tools_in(data: bytes) -> list[str]: # assert is_ai on their own (the verdict still comes from the digital-source-type: # the Pixel sample carries `computationalCapture`, not `trainedAlgorithmicMedia`). # Only tokens verified against a real signed file are listed (Leica, Nikon, -# Truepic, Google Pixel); add Sony/Canon/Samsung/Bria as real samples are captured. +# Sony, Truepic, Google Pixel); add Canon/Bria as real samples are captured. +# Samsung Galaxy is an AI-capable editing device, not a pure-capture camera, so +# it lives in `_SIGNER_C2PA_PLATFORM` below (it must not feed the camera clash). _DEVICE_C2PA_PLATFORM: tuple[tuple[bytes, str], ...] 
= ( (b"lc_c2pa", "Leica (camera, C2PA capture)"), (b"Leica Camera", "Leica (camera, C2PA capture)"), @@ -177,6 +188,32 @@ def _device_platform(head: bytes) -> str | None: return None +# C2PA signers that are an editing app or AI-capable device rather than a +# verified-capture camera. Unlike `_DEVICE_C2PA_PLATFORM`, these do NOT feed the +# camera-vs-AI integrity clash (rule 2 in `_integrity_clashes`): a Galaxy phone +# legitimately stamps BOTH its device credentials AND a `trainedAlgorithmicMedia` +# source type on a Generative-Edit image, so treating it as a "genuine camera +# capture" would false-flag every Galaxy AI edit. They only resolve the platform +# label; the AI verdict still comes from the digital-source-type / genAIType. +# Tokens verified against real signed files (2026-05-29): +# Samsung Galaxy -- cert org on Galaxy S23 FE / S24 / S25 C2PA JPEGs/PNGs +# (distinct from the EXIF "SM-xxxx" model string on ordinary Samsung photos). +# com.asus.gallery -- ASUS Gallery claim_generator (a C2PA-signed edit, no AI +# source type or genAIType on the samples, so it never asserts is_ai). +_SIGNER_C2PA_PLATFORM: tuple[tuple[bytes, str], ...] = ( + (b"Samsung Galaxy", "Samsung Galaxy (C2PA)"), + (b"com.asus.gallery", "ASUS Gallery (C2PA signer)"), +) + + +def _signer_platform(head: bytes) -> str | None: + """Map a C2PA editing-app / AI-capable-device signer token to a platform.""" + for token, platform in _SIGNER_C2PA_PLATFORM: + if token in head: + return platform + return None + + def _attribute_platform(issuers: list[str], *, is_ai: bool = True) -> str | None: """Map a set of C2PA issuer names to a human-readable generating platform. @@ -353,9 +390,10 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b # neither is a trustworthy "the generator stamped its identity" claim. 
ai_vendor_claims: dict[str, str] = {} camera_label = _device_platform(head) + signer_label = _signer_platform(head) # ── C2PA Content Credentials ──────────────────────────────────── - has_c2pa = bool(info) or b"c2pa" in head.lower() or C2PA_UUID in head + has_c2pa = bool(info) or c2pa_marker_in(head) issuers = [info["issuer"]] if info.get("issuer") else _issuers_in(head) c2pa_is_ai = "trainedAlgorithmicMedia" in info.get("source_type", "") or any( m in head for m in (b"trainedAlgorithmicMedia", b"compositeWithTrainedAlgorithmicMedia") @@ -370,10 +408,11 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b or (", ".join(tools) if (tools := _ai_tools_in(head)) else None) ) # Platform: a distinctive device/camera token in the manifest wins (it is the - # signer/producer), with the issuer byte-scan only as fallback. The issuer - # scan alone mis-attributed real samples (Leica->Truepic timestamp authority, - # Nikon->Adobe namespace, Pixel->Google Gemini) -- the device scan fixes that. - platform = (camera_label or _attribute_platform(issuers, is_ai=c2pa_is_ai)) if has_c2pa else None + # signer/producer), then an editing-app/AI-device signer (Samsung Galaxy, + # ASUS Gallery), with the issuer byte-scan only as fallback. The issuer scan + # alone mis-attributed real samples (Leica->Truepic timestamp authority, + # Nikon->Adobe namespace, Pixel->Google Gemini) -- the token scans fix that. 
+ platform = (camera_label or signer_label or _attribute_platform(issuers, is_ai=c2pa_is_ai)) if has_c2pa else None if has_c2pa: detail = ", ".join(filter(None, [", ".join(issuers), generator, info.get("source_type")])) signals.append(Signal("c2pa", detail or "C2PA manifest present", "high")) @@ -484,6 +523,22 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b if platform is None: platform = "HuggingFace-hosted job (model not identified)" + # ── Samsung Galaxy AI editing marker (genAIType) ───────────────── + # Galaxy AI tools stamp a proprietary genAIType in PhotoEditor_Re_Edit_Data. + # Medium confidence: it co-occurs with the C2PA trainedAlgorithmicMedia type + # on Galaxy files that record one, and is the SOLE AI marker on a Galaxy S24 + # sample that omits the source type -- so it lifts an otherwise-Unknown + # verdict, but the field is undocumented, so it never overrides a high- + # confidence signal. The platform is usually already "Samsung Galaxy" via the + # signer-token scan; the fallback covers a future file without the cert org. + samsung_genai_type = samsung_genai(image_path) + if samsung_genai_type is not None: + signals.append(Signal("samsung_genai", f"Samsung genAIType={samsung_genai_type}", "medium")) + watermarks.append("Samsung Galaxy AI editing marker (genAIType)") + caveats.append(_SAMSUNG_GENAI_CAVEAT) + if platform is None: + platform = "Samsung Galaxy (Galaxy AI editing)" + # ── Open invisible watermark (SD / SDXL / FLUX, dwtDct) ────────── # Public decoder, no key -- a definitive embedded signal on pristine files. 
if check_invisible and (scheme := _invisible_watermark(image_path)) is not None: @@ -527,11 +582,12 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b visible_only = any(s.name == "visible_sparkle" for s in signals) and not ai_from_metadata hf_only = bool(hf_job) and not ai_from_metadata + samsung_only = samsung_genai_type is not None and not ai_from_metadata if ai_from_metadata: is_ai: bool | None = True confidence = "high" - elif visible_only or hf_only: + elif visible_only or hf_only or samsung_only: is_ai = True confidence = "medium" else: diff --git a/src/remove_ai_watermarks/metadata.py b/src/remove_ai_watermarks/metadata.py index 9f678c3..83ffe16 100644 --- a/src/remove_ai_watermarks/metadata.py +++ b/src/remove_ai_watermarks/metadata.py @@ -65,6 +65,22 @@ AI_KEYWORDS: tuple[str, ...] = ( # Reference: https://spec.c2pa.org/specifications/specifications/2.1/specs/C2PA_Specification.html C2PA_UUID: bytes = bytes.fromhex("d8fec3d61b0e483c92975828877ec481") + +def c2pa_marker_in(data: bytes) -> bool: + """True if ``data`` carries a real C2PA manifest marker, not just an + incidental 4-byte ``c2pa`` substring. + + A bare ``c2pa`` byte match false-positives on compressed pixel data -- a + recompressed PNG IDAT (or any large binary) can contain the bytes ``c2pa`` + by chance (verified 2026-05-29: 4 cleaned PNGs re-flagged this way after + their manifest was correctly stripped). Every real manifest is JUMBF-wrapped + (the ``jumb`` box FourCC accompanies the ``c2pa`` content type) or uses the + standalone C2PA ``uuid`` box in ISOBMFF, so we require one of those: the + joint ``jumb`` + ``c2pa`` match has negligible random-collision probability. + """ + return C2PA_UUID in data or (b"jumb" in data and b"c2pa" in data.lower()) + + # IPTC ``digitalSourceType`` values (IPTC 2025.1) that flag AI provenance. # Used by Instagram, Facebook, X (Twitter) to show "Made with AI" labels. IPTC_AI_MARKERS: tuple[bytes, ...] 
= ( @@ -213,9 +229,7 @@ def has_ai_metadata(image_path: Path) -> bool: # Binary scan covers C2PA (PNG caBX, JPEG APP11, AVIF/HEIF/JXL uuid boxes) # and IPTC AI markers in XMP. First 512KB (plus late ISOBMFF provenance boxes). data = scan_head(image_path, 512 * 1024) - if b"c2pa" in data.lower() or b"C2PA" in data: - return True - if C2PA_UUID in data: + if c2pa_marker_in(data): return True if any(marker in data for marker in AIGC_MARKERS): return True @@ -310,6 +324,39 @@ def huggingface_job(image_path: Path) -> str | None: return None +# Samsung Galaxy AI editing marker. Galaxy AI tools (Generative Edit, Sketch to +# Image, Portrait Studio, Drawing Assist, ...) record their re-edit data as a +# proprietary ``PhotoEditor_Re_Edit_Data`` JSON that carries a ``genAIType`` +# field; a non-zero value flags that a generative-AI tool produced or altered +# the pixels. The field is undocumented by Samsung (verified 2026-05-29: absent +# from the C2PA spec and Samsung's public docs/forums), so detection is +# empirical -- on real Galaxy S23/S24/S25 files it co-occurs with the C2PA +# ``trainedAlgorithmicMedia`` source type (3/3 of the verified files that record +# that type), and on a Galaxy S24 sample it is the *only* AI marker (the C2PA +# source type was absent there). Medium confidence: it signals Galaxy AI editing +# without proving the whole image is AI-generated. Gated on the Samsung editor +# container marker being present, so a stray ``genAIType`` token in an +# unrelated file is not matched. +_SAMSUNG_GENAI_RE = re.compile(rb'genAIType"\s*:\s*(-?\d+)') +_SAMSUNG_EDITOR_MARKER = b"PhotoEditor_Re_Edit_Data" + + +def samsung_genai(image_path: Path) -> int | None: + """Return Samsung's non-zero ``genAIType`` value if the image carries the + Galaxy AI editing marker, else None. + + See the comment above ``_SAMSUNG_GENAI_RE``: detection is empirical and + requires the ``PhotoEditor_Re_Edit_Data`` container to be present, so a + stray ``genAIType`` token in a file without it cannot false-positive.
+ """ + head = scan_head(image_path, 512 * 1024) + if _SAMSUNG_EDITOR_MARKER not in head: + return None + m = _SAMSUNG_GENAI_RE.search(head) + if m is None: + return None + return int(m.group(1)) or None + + def iptc_ai_system(image_path: Path) -> str | None: """Return an IPTC 2025.1 AI-disclosure note if the file carries those XMP properties, else None. @@ -360,7 +407,7 @@ def synthid_source(image_path: Path) -> str | None: # C2PA manifest where the PNG parser can't reach it. Binary-scan for the # same signal: a C2PA manifest from a SynthID-using issuer on AI content. data = scan_head(image_path) - has_c2pa = b"c2pa" in data.lower() or C2PA_UUID in data + has_c2pa = c2pa_marker_in(data) # Matches both "trainedAlgorithmicMedia" and "compositeWithTrainedAlgorithmicMedia". ai_source = b"trainedAlgorithmicMedia" in data or b"TrainedAlgorithmicMedia" in data if not (has_c2pa and ai_source): @@ -585,6 +632,9 @@ def get_ai_metadata(image_path: Path) -> dict[str, str]: # HuggingFace-hosted job marker (hf-job-id PNG text chunk). if job := huggingface_job(image_path): result.setdefault("huggingface_job", f"HuggingFace-hosted job ({job})") + # Samsung Galaxy AI editing marker (genAIType in PhotoEditor_Re_Edit_Data). + if (genai := samsung_genai(image_path)) is not None: + result.setdefault("samsung_genai", f"Samsung Galaxy AI editing marker (genAIType={genai})") return result diff --git a/src/remove_ai_watermarks/noai/constants.py b/src/remove_ai_watermarks/noai/constants.py index 424fe0b..d477581 100644 --- a/src/remove_ai_watermarks/noai/constants.py +++ b/src/remove_ai_watermarks/noai/constants.py @@ -88,6 +88,14 @@ C2PA_ISSUERS = { # Stability AI signs C2PA as "Stability AI" (cert org "Stability AI Ltd"). # Verified on a live Brand Studio (DreamStudio successor) output, 2026-05-24. b"Stability AI": "Stability AI", + # Black Forest Labs (FLUX) API output: claim_generator_info "Black Forest + # Labs API" + a c2pa.ai_generated_content assertion + trainedAlgorithmicMedia. 
+ # Verified on a real signed FLUX JPEG, 2026-05-29. + b"Black Forest Labs": "Black Forest Labs", + # ByteDance's Volcano Engine (Volcengine) signs its AI image output with a + # cert from certificate_center@volcengine.com -- the platform behind Doubao / + # Jimeng. Verified on two real signed JPEGs, 2026-05-29. + b"volcengine": "ByteDance (Volcano Engine)", } # C2PA issuers whose signed outputs also carry an invisible SynthID pixel diff --git a/src/remove_ai_watermarks/trustmark_detector.py b/src/remove_ai_watermarks/trustmark_detector.py index dd34d13..d3249bc 100644 --- a/src/remove_ai_watermarks/trustmark_detector.py +++ b/src/remove_ai_watermarks/trustmark_detector.py @@ -51,12 +51,31 @@ def _decoder() -> Any: return _tm +# JPEG quality for the false-positive durability gate (see detect_trustmark). +# Deliberately mild: a genuine TrustMark survives far harsher, while every +# observed false positive collapsed even at this quality. +_REENCODE_QUALITY = 95 + + def detect_trustmark(image_path: Path) -> str | None: - """Return a TrustMark scheme note if a TrustMark watermark is decoded, else None. + """Return a TrustMark scheme note if a *durable* TrustMark watermark is + decoded, else None. Returns e.g. ``"Adobe TrustMark (variant P, schema 0)"`` when the decoder - reports the watermark present, or None if it is absent, the optional - ``trustmark`` package is not installed, or the image cannot be read/decoded. + reports the watermark present AND it survives a mild JPEG re-encode, or None + if it is absent, the optional ``trustmark`` package is not installed, or the + image cannot be read/decoded. 
+ + **False-positive gate.** TrustMark's ``wm_present`` flag is a BCH + error-correction validity check, which spuriously validates on a small + fraction of un-watermarked images -- content-correlated, so AI-generated + textures trip it more often than camera photos (verified 2026-05-29 on real + files: the false "detections" were on Gemini / OpenAI / Doubao output that + cannot carry Adobe's watermark, and decoded a random-bytes secret). A genuine + TrustMark is a *durable* soft binding engineered to survive re-encoding (that + is its entire purpose once C2PA is stripped), so we re-decode after a mild + JPEG round-trip and require the same schema both times. Every observed false + positive collapsed under this gate. """ if not is_available(): return None @@ -65,8 +84,30 @@ def detect_trustmark(image_path: Path) -> str | None: with Image.open(image_path) as img: cover = img.convert("RGB") - _wm_secret, wm_present, wm_schema = _decoder().decode(cover) + decoder = _decoder() + _wm_secret, wm_present, wm_schema = decoder.decode(cover) + if not wm_present: + return None + if not _survives_reencode(decoder, cover, wm_schema): + log.debug("TrustMark decode for %s did not survive re-encode; treating as false positive", image_path) + return None except Exception as exc: # model download / decode failure / unreadable image log.debug("TrustMark decode failed for %s: %s", image_path, exc) return None - return f"Adobe TrustMark (variant {_MODEL_TYPE}, schema {wm_schema})" if wm_present else None + return f"Adobe TrustMark (variant {_MODEL_TYPE}, schema {wm_schema})" + + +def _survives_reencode(decoder: Any, cover: Any, schema: int) -> bool: + """True if the watermark re-decodes with the same schema after a mild JPEG + round-trip -- the durability a genuine TrustMark guarantees, which a BCH + false positive (content noise) does not.""" + import io + + from PIL import Image + + buffer = io.BytesIO() + cover.save(buffer, "JPEG", quality=_REENCODE_QUALITY) + buffer.seek(0) + with 
Image.open(buffer) as reencoded: + _secret, present, reencoded_schema = decoder.decode(reencoded.convert("RGB")) + return bool(present) and reencoded_schema == schema diff --git a/src/remove_ai_watermarks/watermark_registry.py b/src/remove_ai_watermarks/watermark_registry.py new file mode 100644 index 0000000..7fb130d --- /dev/null +++ b/src/remove_ai_watermarks/watermark_registry.py @@ -0,0 +1,202 @@ +"""Registry of known visible watermarks. + +A single catalog that ties each known visible mark to (a) where it usually sits, +(b) how to recognize it there, and (c) how to remove it. One pass over the +registry detects every known mark in its usual place and removes the ones +present. + +**Reverse-alpha only.** A known mark is a fixed semi-transparent overlay, so it +is removed by inverting the alpha blend against a captured alpha map +(``original = (wm - a*logo)/(1-a)``) -- exact recovery of the true pixels, not an +inpaint guess. Detection is consistent with that: each mark is recognized by +matching its known shape/template (the thing we invert), not by heuristics. A +mark is therefore listed here only once a real alpha map has been captured for +it; everything else (arbitrary logos/objects) is the user-directed +``erase --region`` tool, not this catalog. + +Entries: + - ``gemini`` -- Google Gemini / Nano Banana sparkle, bottom-right. + - ``doubao`` -- ByteDance Doubao "豆包AI生成" text strip, bottom-right. +""" + +from __future__ import annotations + +from dataclasses import dataclass +from typing import TYPE_CHECKING, Any, Literal + +if TYPE_CHECKING: + from collections.abc import Callable + + from numpy.typing import NDArray + +# cv2 method for the Gemini reverse-alpha edge-residual cleanup (not a standalone +# remover): "ns" / "telea". 
+InpaintMethod = Literal["telea", "ns"] +Region = tuple[int, int, int, int] + + +@dataclass(frozen=True) +class MarkDetection: + """Uniform detection result for a known mark (across heterogeneous engines).""" + + key: str + label: str + location: str + detected: bool + confidence: float + region: Region + + +@dataclass(frozen=True) +class KnownMark: + """A known visible watermark: where it lives, how to find and remove it.""" + + key: str + label: str + location: str # usual place, human-readable ("bottom-right") + in_auto: bool # participate in `--mark auto` scanning + recovery: str # removal strategy (all reverse-alpha today) + _detect: Callable[[NDArray[Any]], MarkDetection] + _remove: Callable[..., tuple[NDArray[Any], Region | None]] + + def detect(self, image: NDArray[Any]) -> MarkDetection: + return self._detect(image) + + def remove( + self, + image: NDArray[Any], + *, + inpaint_method: InpaintMethod = "ns", + inpaint: bool = True, + inpaint_strength: float = 0.85, + force: bool = False, + ) -> tuple[NDArray[Any], Region | None]: + """Remove this mark by reverse-alpha; returns ``(result, cleared_region)`` + (region for clearing alpha on save, or None if nothing was removed). + + ``inpaint`` / ``inpaint_strength`` / ``inpaint_method`` tune the Gemini + reverse-alpha edge-residual cleanup only. ``force`` removes at the mark's + usual location even without a positive detection (the ``--no-detect`` path). + """ + return self._remove(image, inpaint_method, inpaint, inpaint_strength, force) + + +# Gemini-sparkle confidence above which the registry treats it as a confident +# detection for arbitration. Matches identify's corpus-validated sparkle +# threshold (0.5): the gemini engine's own detect flag uses a looser internal +# threshold and weakly fires (~0.36) on unrelated bottom-right text (e.g. the +# Doubao mark), which would otherwise let it hijack `--mark auto`. 0.5 gives 0 +# false positives on the corpus. 
+_GEMINI_AUTO_MIN_CONF = 0.5 + +# ── Engine adapters (lazy singletons; engines are cv2-only, no model load) ── + +_engines: dict[str, Any] = {} + + +def _engine(key: str) -> Any: + if key not in _engines: + if key == "gemini": + from remove_ai_watermarks.gemini_engine import GeminiEngine + + _engines[key] = GeminiEngine() + elif key == "doubao": + from remove_ai_watermarks.doubao_engine import DoubaoEngine + + _engines[key] = DoubaoEngine() + else: # pragma: no cover - guarded by the registry keys + raise KeyError(key) + return _engines[key] + + +def _gemini_detect(image: NDArray[Any]) -> MarkDetection: + d = _engine("gemini").detect_watermark(image) + detected = bool(d.detected) and d.confidence >= _GEMINI_AUTO_MIN_CONF + return MarkDetection("gemini", "Google Gemini sparkle", "bottom-right", detected, d.confidence, d.region) + + +def _gemini_remove( + image: NDArray[Any], inpaint_method: InpaintMethod, inpaint: bool, strength: float, force: bool +) -> tuple[NDArray[Any], Region | None]: + engine = _engine("gemini") + det = engine.detect_watermark(image) + if not det.detected: + if not force: + return image.copy(), None + # Forced (--no-detect): remove at the default sparkle slot for the size. + from remove_ai_watermarks.gemini_engine import get_watermark_config + + h, w = image.shape[:2] + cfg = get_watermark_config(w, h) + px, py = cfg.get_position(w, h) + region = (px, py, cfg.logo_size, cfg.logo_size) + result = engine.remove_watermark_custom(image, region) + if inpaint: + result = engine.inpaint_residual(result, region, strength=strength, method=inpaint_method) + return result, region + result = engine.remove_watermark(image) + # Reverse-alpha leaves a faint residual at the sparkle edge; the engine's + # own residual inpaint cleans that seam (part of its reverse-alpha pipeline). 
+ if inpaint: + result = engine.inpaint_residual(result, det.region, strength=strength, method=inpaint_method) + return result, det.region + + +def _doubao_detect(image: NDArray[Any]) -> MarkDetection: + d = _engine("doubao").detect(image) + return MarkDetection("doubao", "Doubao 豆包AI生成 text", "bottom-right", d.detected, d.confidence, d.region) + + +def _doubao_remove( + image: NDArray[Any], _inpaint_method: InpaintMethod, _inpaint: bool, _strength: float, force: bool +) -> tuple[NDArray[Any], Region | None]: + # Reverse-alpha only: apply when the mark is detected (or forced via the + # --no-detect path). NCC alignment registers the captured alpha map at any + # resolution; off the captured width the engine adds only a seam-cleanup + # inpaint over the glyph footprint (see remove_watermark_reverse_alpha). + engine = _engine("doubao") + det = engine.detect(image) + if (det.detected or force) and engine.reverse_alpha_available(image): + return engine.remove_watermark_reverse_alpha(image), (det.region if det.detected else None) + return image.copy(), None + + +_REGISTRY: tuple[KnownMark, ...] = ( + KnownMark("gemini", "Google Gemini sparkle", "bottom-right", True, "reverse-alpha", _gemini_detect, _gemini_remove), + KnownMark( + "doubao", "Doubao 豆包AI生成 text", "bottom-right", True, "reverse-alpha", _doubao_detect, _doubao_remove + ), +) + + +def known_marks() -> tuple[KnownMark, ...]: + """All registered known visible watermarks.""" + return _REGISTRY + + +def mark_keys() -> list[str]: + """Keys of all registered marks (for CLI choices).""" + return [m.key for m in _REGISTRY] + + +def get_mark(key: str) -> KnownMark: + """Look up a known mark by key (raises KeyError if unknown).""" + for m in _REGISTRY: + if m.key == key: + return m + raise KeyError(key) + + +def detect_marks(image: NDArray[Any], *, include_explicit: bool = True) -> list[MarkDetection]: + """Detect every known mark in its usual place. + + Returns one MarkDetection per scanned mark (``detected`` flags which fired).
+ ``include_explicit=False`` scans only the ``in_auto`` marks -- the set used + by ``--mark auto``. + """ + return [m.detect(image) for m in _REGISTRY if include_explicit or m.in_auto] + + +def best_auto_mark(image: NDArray[Any]) -> MarkDetection | None: + """The highest-confidence detected ``in_auto`` mark, or None if none fired.""" + fired = [d for d in detect_marks(image, include_explicit=False) if d.detected] + return max(fired, key=lambda d: d.confidence) if fired else None diff --git a/tests/test_doubao_engine.py b/tests/test_doubao_engine.py index 6682990..8d27c58 100644 --- a/tests/test_doubao_engine.py +++ b/tests/test_doubao_engine.py @@ -1,4 +1,4 @@ -"""Tests for the Doubao visible-watermark engine.""" +"""Tests for the Doubao visible-watermark engine (reverse-alpha only).""" from __future__ import annotations @@ -8,91 +8,156 @@ import cv2 import numpy as np import pytest -from remove_ai_watermarks.doubao_engine import DoubaoEngine, load_image_bgr +from remove_ai_watermarks.doubao_engine import ( + _ALPHA_HEIGHT_FRAC, + _ALPHA_LOGO_BGR, + _ALPHA_MARGIN_BOTTOM_FRAC, + _ALPHA_MARGIN_RIGHT_FRAC, + _ALPHA_NATIVE_WIDTH, + _ALPHA_WIDTH_FRAC, + DETECT_NCC_THRESHOLD, + DoubaoEngine, + _alpha_template, + _glyph_silhouette, + _template_match_score, + load_image_bgr, +) SAMPLE = Path(__file__).resolve().parents[1] / "data" / "samples" / "doubao-1.png" -# ── Locate ────────────────────────────────────────────────────────── - - class TestLocate: def test_box_anchored_bottom_right(self): eng = DoubaoEngine() img = np.zeros((2048, 2048, 3), np.uint8) loc = eng.locate(img) - # right and bottom edges sit close to the image corner (within margins) assert 2048 - (loc.x + loc.w) < int(2048 * 0.03) assert 2048 - (loc.y + loc.h) < int(2048 * 0.03) - assert loc.is_fallback # geometry anchor, no bundled template yet def test_box_scales_with_width(self): eng = DoubaoEngine() small = eng.locate(np.zeros((1024, 1024, 3), np.uint8)) large = eng.locate(np.zeros((2048, 2048, 3), 
np.uint8)) - # width-relative geometry: 2x wider image -> ~2x wider box assert large.w == pytest.approx(small.w * 2, rel=0.1) -# ── Detect + remove on the real sample ────────────────────────────── +# ── Detection: alpha-template NCC ─────────────────────────────────── + + +class TestDetect: + def test_clean_gradient_not_detected(self): + eng = DoubaoEngine() + ramp = np.tile(np.linspace(0, 255, 1024, dtype=np.uint8), (1024, 1)) + img = cv2.cvtColor(ramp, cv2.COLOR_GRAY2BGR) + assert not eng.detect(img).detected + + def test_solid_blob_corner_not_detected(self): + """A bright blob is not the glyph shape -> low correlation, not detected.""" + eng = DoubaoEngine() + img = np.zeros((1024, 1024, 3), np.uint8) + x, y, bw, bh = eng.locate(img).bbox + img[y + bh // 4 : y + bh * 3 // 4, x : x + bw // 2] = 200 + assert not eng.detect(img).detected + + def test_silhouette_loads(self): + sil = _glyph_silhouette() + assert sil is not None + assert set(np.unique(sil)).issubset({0, 255}) + + def test_match_score_shape_sensitive(self): + """The glyph silhouette correlates with itself, not with a filled block.""" + sil = _glyph_silhouette() + h, w = sil.shape + # box that contains the silhouette -> high score + box = np.zeros((h + 8, int(w / _ALPHA_WIDTH_FRAC * 0.2) + w), np.uint8) + box[4 : 4 + h, 4 : 4 + w] = sil + assert _template_match_score(box, _ALPHA_NATIVE_WIDTH) >= DETECT_NCC_THRESHOLD + # a uniformly filled box has no glyph structure -> low score + solid = np.full_like(box, 255) + assert _template_match_score(solid, _ALPHA_NATIVE_WIDTH) < DETECT_NCC_THRESHOLD @pytest.mark.skipif(not SAMPLE.exists(), reason="sample image not present") class TestRealSample: def test_detects_watermark(self): - eng = DoubaoEngine() - det = eng.detect(load_image_bgr(SAMPLE)) + det = DoubaoEngine().detect(load_image_bgr(SAMPLE)) assert det.detected - assert det.confidence > 0.0 - assert det.coverage > 0.04 + assert det.confidence >= DETECT_NCC_THRESHOLD - def 
test_remove_reduces_glyph_coverage(self): + def test_reverse_alpha_removes_mark(self): eng = DoubaoEngine() img = load_image_bgr(SAMPLE) - before = eng.detect(img).coverage - out = eng.remove_watermark(img) - after = eng.detect(out).coverage - # the inpaint should clear most glyph pixels from the corner box - assert after < before * 0.5 + assert eng.reverse_alpha_available(img) # sample is at the captured width + out = eng.remove_watermark_reverse_alpha(img) + assert not eng.detect(out).detected # mark gone after recovery - def test_pixels_outside_box_untouched(self): + def test_far_region_untouched(self): eng = DoubaoEngine() img = load_image_bgr(SAMPLE) - out = eng.remove_watermark(img) - # top-left quadrant is far from the bottom-right mark: must be identical + out = eng.remove_watermark_reverse_alpha(img) h, w = img.shape[:2] assert np.array_equal(img[: h // 2, : w // 2], out[: h // 2, : w // 2]) -# ── Negative + safety guard ───────────────────────────────────────── +# ── Reverse-alpha (exact recovery) ────────────────────────────────── -class TestNegativeAndGuard: - def test_clean_image_not_detected(self): +class TestReverseAlpha: + def test_alpha_asset_loads(self): + at = _alpha_template() + assert at is not None + assert at.dtype.kind == "f" + assert float(at.min()) >= 0.0 + assert float(at.max()) <= 1.0 + + def test_available_whenever_asset_present(self): + # NCC alignment generalizes to any resolution, so availability is just + # "asset loadable" (any non-empty image); the caller gates on detect. 
eng = DoubaoEngine() - # smooth gradient, no watermark - ramp = np.tile(np.linspace(0, 255, 1024, dtype=np.uint8), (1024, 1)) - img = cv2.cvtColor(ramp, cv2.COLOR_GRAY2BGR) - det = eng.detect(img) - assert not det.detected + assert eng.reverse_alpha_available(np.zeros((1024, 1024, 3), np.uint8)) + assert eng.reverse_alpha_available(np.zeros((1773, 1535, 3), np.uint8)) + assert not eng.reverse_alpha_available(np.zeros((0, 0, 3), np.uint8)) - def test_clean_image_returned_unchanged(self): - eng = DoubaoEngine() - ramp = np.tile(np.linspace(0, 255, 1024, dtype=np.uint8), (1024, 1)) - img = cv2.cvtColor(ramp, cv2.COLOR_GRAY2BGR) - out = eng.remove_watermark(img) - assert np.array_equal(img, out) + @staticmethod + def _compose(w: int, h: int, bg: float = 100.0): + """Composite the real alpha (scaled to width ``w``) onto a flat bg. + Returns ``(watermarked_uint8, mark_bool_mask)``.""" + img = np.full((h, w, 3), bg, np.float32) + at = _alpha_template() + gw, gh = int(_ALPHA_WIDTH_FRAC * w), int(_ALPHA_HEIGHT_FRAC * w) + ax = w - int(_ALPHA_MARGIN_RIGHT_FRAC * w) - gw + ay = h - int(_ALPHA_MARGIN_BOTTOM_FRAC * w) - gh + amap = np.zeros((h, w), np.float32) + amap[ay : ay + gh, ax : ax + gw] = cv2.resize(at, (gw, gh)) + a3 = amap[:, :, None] + wm = (a3 * np.array(_ALPHA_LOGO_BGR, np.float32) + (1 - a3) * img).clip(0, 255).astype(np.uint8) + return wm, amap > 0.2 - def test_document_background_guard(self): - """A dense high-frequency corner (document-like) trips the coverage - guard, so the image is left untouched rather than smeared.""" + def test_native_returns_exact_reverse_alpha_no_inpaint(self): + """At native width the recovery is exact, so it must be returned untouched + -- inpainting over exactly-recovered interior pixels degrades quality + (regression: native textured error 1.6 reverse-alpha-only vs 2.6 with the + old full-footprint inpaint). 
The output must equal pure reverse-alpha.""" eng = DoubaoEngine() - rng = np.random.default_rng(0) - img = np.full((1024, 1024, 3), 255, np.uint8) - # fill the bottom-right box area with random grayish text-like noise - loc = eng.locate(img) - x, y, bw, bh = loc.bbox - noise = rng.integers(150, 246, size=(bh, bw), dtype=np.uint8) - img[y : y + bh, x : x + bw] = noise[:, :, None] - out = eng.remove_watermark(img) - assert np.array_equal(img, out) + wm, _mark = self._compose(_ALPHA_NATIVE_WIDTH, _ALPHA_NATIVE_WIDTH) + out = eng.remove_watermark_reverse_alpha(wm) + amap = eng._fixed_alpha_map(wm) + assert amap is not None + expected = eng._apply_reverse_alpha(wm, amap[0]) + assert np.array_equal(out, expected) # no inpaint touched the recovery + + @pytest.mark.parametrize( + ("w", "h", "max_err"), + [ + (_ALPHA_NATIVE_WIDTH, _ALPHA_NATIVE_WIDTH, 5.0), # native 1:1 -> fixed geometry, ~exact + (1773, 2364, 8.0), # 3:4 portrait -> NCC alignment generalizes the single capture + ], + ) + def test_recovers_flat_background(self, w, h, max_err): + """Recovers the flat background at native (fixed geometry, exact) AND a + non-native resolution (NCC alignment generalizes the single capture).""" + eng = DoubaoEngine() + wm, mark = self._compose(w, h) + assert float(np.abs(wm.astype(np.float32)[mark] - 100.0).mean()) > 15 # mark visible + out = eng.remove_watermark_reverse_alpha(wm).astype(np.float32) + assert float(np.abs(out[mark] - 100.0).mean()) < max_err diff --git a/tests/test_identify.py b/tests/test_identify.py index 019e761..17991e9 100644 --- a/tests/test_identify.py +++ b/tests/test_identify.py @@ -113,6 +113,18 @@ class TestIdentifyNonPng: r = identify(path, check_visible=False) assert any("SynthID" in w for w in r.watermarks) + def test_black_forest_labs_flux_attributed(self, tmp_path: Path): + path = self._c2pa_jpeg(tmp_path, b"Black Forest Labs API ... 
trainedAlgorithmicMedia") + r = identify(path, check_visible=False, check_invisible=False) + assert r.is_ai_generated is True + assert r.platform == "Black Forest Labs (FLUX)" + + def test_bytedance_volcengine_attributed(self, tmp_path: Path): + path = self._c2pa_jpeg(tmp_path, b"certificate_center@volcengine.com ... trainedAlgorithmicMedia") + r = identify(path, check_visible=False, check_invisible=False) + assert r.is_ai_generated is True + assert "ByteDance" in (r.platform or "") + def test_stability_ai_issuer_attributed_no_synthid(self, tmp_path: Path): path = self._c2pa_jpeg(tmp_path, b"Stability AI ... trainedAlgorithmicMedia") r = identify(path, check_visible=False) @@ -132,6 +144,50 @@ class TestIdentifyNonPng: assert not any("SynthID" in w for w in r.watermarks) +class TestIdentifySamsungGalaxy: + """Samsung Galaxy / ASUS Gallery C2PA signers (verified on real signed files + 2026-05-29; synthetic byte blobs here since the originals are private). + + Galaxy AI edits stamp BOTH the device cert AND an AI source-type / genAIType, + so the signer attribution must NOT trip the camera-vs-AI integrity clash. + """ + + def _jpeg(self, tmp_path: Path, name: str, blob: bytes) -> Path: + path = tmp_path / name + path.write_bytes(b"\xff\xd8\xff\xe1jumbc2pa" + blob + b"\xff\xd9") + return path + + def test_galaxy_trained_source_is_high_ai(self, tmp_path: Path): + path = self._jpeg(tmp_path, "s25.jpg", b"Samsung Galaxy Galaxy S25 c2pa-rs trainedAlgorithmicMedia") + r = identify(path, check_visible=False, check_invisible=False) + assert r.is_ai_generated is True + assert r.confidence == "high" + assert r.platform == "Samsung Galaxy (C2PA)" + assert r.integrity_clashes == [] # device cert + AI source-type is legitimate, not a clash + + def test_galaxy_genai_only_is_medium_ai(self, tmp_path: Path): + # The Galaxy S24 case: no trainedAlgorithmicMedia, genAIType is the only + # AI marker -- previously missed, now a medium-confidence verdict. 
+ path = self._jpeg( + tmp_path, "s24.jpg", b'Samsung Galaxy Galaxy S24 c2pa-rs PhotoEditor_Re_Edit_Data{"genAIType":1}' + ) + r = identify(path, check_visible=False, check_invisible=False) + assert r.is_ai_generated is True + assert r.confidence == "medium" + assert r.platform == "Samsung Galaxy (C2PA)" + assert any(s.name == "samsung_genai" for s in r.signals) + assert r.integrity_clashes == [] + + def test_asus_gallery_signer_not_ai(self, tmp_path: Path): + # ASUS Gallery signs edited photos; no AI source-type or genAIType, so the + # platform is attributed but the verdict stays unknown. + path = self._jpeg(tmp_path, "asus.jpg", b"/com.asus.gallery/3.8.0.98 c2pa-rs no ai marker") + r = identify(path, check_visible=False, check_invisible=False) + assert r.is_ai_generated is None + assert r.platform == "ASUS Gallery (C2PA signer)" + assert any("C2PA" in w for w in r.watermarks) + + # ── End-to-end verdicts on real fixtures ──────────────────────────── diff --git a/tests/test_metadata.py b/tests/test_metadata.py index 3e3c8b7..436ab3b 100644 --- a/tests/test_metadata.py +++ b/tests/test_metadata.py @@ -12,12 +12,15 @@ from PIL import Image from PIL.PngImagePlugin import PngInfo from remove_ai_watermarks.metadata import ( + C2PA_UUID, _is_ai_key, + c2pa_marker_in, exif_generator, get_ai_metadata, has_ai_metadata, iptc_ai_system, remove_ai_metadata, + samsung_genai, synthid_source, xai_signature, ) @@ -135,6 +138,71 @@ class TestHasAiMetadata: assert has_ai_metadata(path) +class TestC2paMarkerIn: + """The C2PA presence check requires a JUMBF wrapper or the C2PA uuid box, so + a bare 4-byte ``c2pa`` substring (e.g. 
random compressed pixel data) does not + false-positive -- the regression behind 4 cleaned PNGs re-flagging C2PA.""" + + def test_jumbf_wrapped_c2pa_detected(self): + assert c2pa_marker_in(b"....jumbc2pa....manifest....") is True + + def test_c2pa_uuid_box_detected(self): + assert c2pa_marker_in(b"\x00\x00\x00\x18uuid" + C2PA_UUID + b"payload") is True + + def test_bare_c2pa_substring_not_detected(self): + # The exact false positive: "c2pa" appears in noise but no JUMBF/uuid box. + assert c2pa_marker_in(b"\x9c\xc3\xa7B1\x11c2pa\x80b\x804\xc5\xf9random idat") is False + + def test_jumb_without_c2pa_not_detected(self): + assert c2pa_marker_in(b"some jumb box but no manifest label") is False + + def test_empty_not_detected(self): + assert c2pa_marker_in(b"") is False + + +class TestSamsungGenai: + """Samsung Galaxy AI editing marker (genAIType in PhotoEditor_Re_Edit_Data). + + Synthetic byte blobs -- real Galaxy files are user content and not shipped + (public repo), same discipline as the Grok/Doubao fixtures. 
+ """ + + @staticmethod + def _samsung_jpeg(tmp_path: Path, name: str, payload: bytes) -> Path: + path = tmp_path / name + path.write_bytes(b"\xff\xd8\xff\xe1" + payload + b"\xff\xd9") + return path + + def test_nonzero_genai_type_detected(self, tmp_path: Path): + p = self._samsung_jpeg( + tmp_path, "galaxy.jpg", b'PhotoEditor_Re_Edit_Data{"connectorType":"srvg","genAIType":1}' + ) + assert samsung_genai(p) == 1 + + def test_other_nonzero_value_detected(self, tmp_path: Path): + p = self._samsung_jpeg(tmp_path, "galaxy5.jpg", b'PhotoEditor_Re_Edit_Data{"genAIType":5}') + assert samsung_genai(p) == 5 + + def test_zero_genai_type_is_none(self, tmp_path: Path): + """genAIType:0 means no generative AI was used -- not a positive signal.""" + p = self._samsung_jpeg(tmp_path, "edit.jpg", b'PhotoEditor_Re_Edit_Data{"genAIType":0}') + assert samsung_genai(p) is None + + def test_genai_without_editor_container_ignored(self, tmp_path: Path): + """An incidental genAIType token outside Samsung's editor JSON is ignored.""" + p = self._samsung_jpeg(tmp_path, "stray.jpg", b'some other blob "genAIType":1 elsewhere') + assert samsung_genai(p) is None + + def test_clean_image_is_none(self, tmp_clean_png): + assert samsung_genai(tmp_clean_png) is None + + def test_surfaced_in_get_ai_metadata(self, tmp_path: Path): + p = self._samsung_jpeg(tmp_path, "galaxy.jpg", b'PhotoEditor_Re_Edit_Data{"genAIType":1}') + meta = get_ai_metadata(p) + assert "samsung_genai" in meta + assert "genAIType=1" in meta["samsung_genai"] + + class TestGetAiMetadata: """Tests for extracting AI metadata.""" diff --git a/tests/test_trustmark_detector.py b/tests/test_trustmark_detector.py index d1697ff..f2fec3d 100644 --- a/tests/test_trustmark_detector.py +++ b/tests/test_trustmark_detector.py @@ -12,12 +12,28 @@ from typing import TYPE_CHECKING import pytest +from remove_ai_watermarks import trustmark_detector from remove_ai_watermarks.trustmark_detector import detect_trustmark, is_available if TYPE_CHECKING: 
from pathlib import Path +class _FakeDecoder: + """A TrustMark decoder whose successive ``decode`` calls return scripted + ``(secret, present, schema)`` tuples -- the first for the original image, the + second for the re-encoded copy used by the false-positive durability gate.""" + + def __init__(self, *results: tuple[bytes, bool, int]): + self._results = list(results) + self.calls = 0 + + def decode(self, _img: object) -> tuple[bytes, bool, int]: + result = self._results[min(self.calls, len(self._results) - 1)] + self.calls += 1 + return result + + def test_detect_never_raises(tmp_clean_png: Path): # Whether or not trustmark is installed, a clean image must yield None # (no watermark) without raising. When absent, the import guard returns None. @@ -34,3 +50,40 @@ def test_unreadable_file_returns_none(tmp_path: Path): def test_clean_image_reports_no_watermark(tmp_clean_png: Path): # With the decoder present, an un-watermarked image must report absent. assert detect_trustmark(tmp_clean_png) is None + + +class TestFalsePositiveGate: + """The re-encode durability gate keeps real (durable) TrustMarks and drops + BCH false positives that collapse under a mild JPEG round-trip.""" + + @pytest.fixture(autouse=True) + def _force_available(self, monkeypatch: pytest.MonkeyPatch): + monkeypatch.setattr(trustmark_detector, "is_available", lambda: True) + + def _patch_decoder(self, monkeypatch: pytest.MonkeyPatch, decoder: _FakeDecoder) -> None: + monkeypatch.setattr(trustmark_detector, "_decoder", lambda: decoder) + + def test_durable_watermark_survives_and_is_reported(self, monkeypatch, tmp_clean_png: Path): + decoder = _FakeDecoder((b"secret", True, 2), (b"secret", True, 2)) + self._patch_decoder(monkeypatch, decoder) + result = detect_trustmark(tmp_clean_png) + assert result == "Adobe TrustMark (variant P, schema 2)" + assert decoder.calls == 2 # original + re-encode + + def test_false_positive_collapsing_on_reencode_is_dropped(self, monkeypatch, tmp_clean_png: Path): + # 
Present on the original, absent after re-encode -> content-noise FP. + decoder = _FakeDecoder((b"\x00\x01", True, 3), (b"", False, -1)) + self._patch_decoder(monkeypatch, decoder) + assert detect_trustmark(tmp_clean_png) is None + + def test_schema_drift_on_reencode_is_dropped(self, monkeypatch, tmp_clean_png: Path): + # Present both times but the schema changes -> not a stable watermark. + decoder = _FakeDecoder((b"\x00", True, 2), (b"\x00", True, 3)) + self._patch_decoder(monkeypatch, decoder) + assert detect_trustmark(tmp_clean_png) is None + + def test_absent_skips_reencode(self, monkeypatch, tmp_clean_png: Path): + decoder = _FakeDecoder((b"", False, -1)) + self._patch_decoder(monkeypatch, decoder) + assert detect_trustmark(tmp_clean_png) is None + assert decoder.calls == 1 # no second decode when the first is absent diff --git a/tests/test_watermark_registry.py b/tests/test_watermark_registry.py new file mode 100644 index 0000000..884c873 --- /dev/null +++ b/tests/test_watermark_registry.py @@ -0,0 +1,70 @@ +"""Tests for the known-visible-watermark registry (reverse-alpha only).""" + +from __future__ import annotations + +from pathlib import Path + +import numpy as np +import pytest + +from remove_ai_watermarks import watermark_registry as reg + +DOUBAO_SAMPLE = Path(__file__).resolve().parents[1] / "data" / "samples" / "doubao-1.png" + + +class TestCatalog: + def test_keys(self): + assert reg.mark_keys() == ["gemini", "doubao"] + + def test_all_in_auto(self): + assert all(m.in_auto for m in reg.known_marks()) + + def test_recovery_is_reverse_alpha(self): + # Every catalogued mark is removed by exact reverse-alpha (no inpaint). 
+ assert all(m.recovery == "reverse-alpha" for m in reg.known_marks()) + + def test_locations(self): + by_key = {m.key: m for m in reg.known_marks()} + assert by_key["gemini"].location == "bottom-right" + assert by_key["doubao"].location == "bottom-right" + + def test_get_mark_unknown_raises(self): + with pytest.raises(KeyError): + reg.get_mark("nope") + + +class TestScan: + def test_detect_marks_scans_all(self): + img = np.zeros((256, 256, 3), np.uint8) + keys = {d.key for d in reg.detect_marks(img)} + assert keys == {"gemini", "doubao"} + + def test_blank_image_no_auto_mark(self): + assert reg.best_auto_mark(np.zeros((256, 256, 3), np.uint8)) is None + + +@pytest.mark.skipif(not DOUBAO_SAMPLE.exists(), reason="doubao sample not present") +class TestRealSample: + def test_doubao_sample_wins_auto(self): + from remove_ai_watermarks.image_io import imread + + best = reg.best_auto_mark(imread(DOUBAO_SAMPLE)) + assert best is not None + assert best.key == "doubao" + + def test_doubao_remove_returns_region(self): + from remove_ai_watermarks.image_io import imread + + img = imread(DOUBAO_SAMPLE) # sample carries the mark -> reverse-alpha applies + result, region = reg.get_mark("doubao").remove(img) + assert region is not None + assert result.shape == img.shape + + +class TestReverseAlphaOnly: + def test_doubao_undetected_is_skipped(self): + # Blank image: detect does not fire, so no reverse-alpha and no inpaint fallback; image untouched. + img = np.zeros((512, 512, 3), np.uint8) + result, region = reg.get_mark("doubao").remove(img) + assert region is None + assert np.array_equal(result, img)
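The Doubao tests build their fixtures with the forward blend `alpha * logo + (1 - alpha) * bg` (see `_compose`), and reverse-alpha recovery inverts that blend per channel. Below is a minimal stdlib sketch for a single pixel. It assumes the alpha value and logo color at that pixel are already known, for example from an aligned alpha template. The helper names and the `MAX_ALPHA` cap are hypothetical illustrations and are not the engine's `_apply_reverse_alpha`.

```python
# Hypothetical sketch: reverse-alpha recovery for one pixel. The names and the
# MAX_ALPHA cap are illustrative, not the engine's _apply_reverse_alpha.

# Assumed forward model, as in the test helper _compose:
#   observed = alpha * logo + (1 - alpha) * background
MAX_ALPHA = 0.99  # assumption: near-opaque pixels are too lossy to invert


def composite(background: float, alpha: float, logo: float) -> float:
    """Forward blend of one channel value (mirrors the test's _compose)."""
    return alpha * logo + (1.0 - alpha) * background


def reverse_alpha(observed: float, alpha: float, logo: float) -> float:
    """Solve the forward blend for the background channel value."""
    if alpha <= 0.0:
        return float(observed)  # outside the mark footprint: left untouched
    a = min(alpha, MAX_ALPHA)
    background = (observed - a * logo) / (1.0 - a)
    return min(255.0, max(0.0, background))  # clamp to the 8-bit range


def reverse_alpha_pixel(
    pixel: tuple[float, ...], alpha: float, logo_bgr: tuple[float, ...]
) -> tuple[int, ...]:
    """Apply reverse_alpha to every channel and round back to integers."""
    return tuple(round(reverse_alpha(v, alpha, c)) for v, c in zip(pixel, logo_bgr))


if __name__ == "__main__":
    blended = composite(100.0, 0.5, 255.0)  # 177.5
    print(reverse_alpha(blended, 0.5, 255.0))  # exact inverse: 100.0
    print(reverse_alpha_pixel((178, 178, 178), 0.5, (255, 255, 255)))  # (101, 101, 101)
```

Because the observed value is stored as an 8-bit integer, a quantization error `e` in it becomes `e / (1 - alpha)` in the recovered value. The inversion is therefore exact only up to rounding, and it degrades as alpha approaches 1.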