mirror of
https://github.com/wiltodelta/remove-ai-watermarks.git
synced 2026-06-05 10:38:00 +02:00
Visible-watermark registry: reverse-alpha-only Doubao + Gemini, exact native recovery (#28)
* fix(trustmark): gate detection on re-encode durability to kill false positives

TrustMark's wm_present flag is a BCH validity check that spuriously validates on a
content-correlated fraction of un-watermarked images (AI textures trip it more than
camera photos). On a 1343-image set all 20 raw detections were false, several on
Gemini/OpenAI/Doubao output that cannot carry Adobe's watermark, with random-bytes
secrets. A genuine TrustMark is a durable soft binding that survives re-encoding, so
detect_trustmark now re-decodes after a mild JPEG round-trip and requires the same
schema both times. Every observed false positive collapsed under this gate; the second
decode runs only on the rare hit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(identify): Samsung Galaxy AI, FLUX, ByteDance C2PA; fix C2PA substring FP

Detection extensions verified on real signed files (2026-05-29):

- Samsung Galaxy AI: signer attribution via a new _SIGNER_C2PA_PLATFORM (Samsung
  Galaxy / ASUS Gallery) kept separate from the capture-camera _DEVICE_C2PA_PLATFORM
  so a Galaxy AI edit (device cert + AI source type) does not trip the camera-vs-AI
  integrity clash. Plus metadata.samsung_genai: the proprietary genAIType marker in
  PhotoEditor_Re_Edit_Data, a medium-confidence AI-editing signal (samsung_only
  branch).
- Black Forest Labs (FLUX) and ByteDance Volcano Engine (Doubao/Jimeng) added as C2PA
  issuers + issuer->platform mappings.
- fix: C2PA presence required only the bare 4-byte 'c2pa' substring, which
  false-positives on compressed pixel data (a recompressed PNG IDAT re-flagged C2PA
  after its manifest was correctly stripped). New c2pa_marker_in() requires the JUMBF
  wrapper (jumb+c2pa) or the C2PA uuid box; applied in identify + metadata. Verified:
  all 535 real C2PA files carry jumb.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(doubao): gate detection on text structure to cut ~95% of false positives (#23)

Coverage alone over-fired: any textured bottom-right corner cleared the threshold, so
the detector false-positived on ~28% of arbitrary images. The real '豆包AI生成' mark is
six glyphs in one row, so detect now also requires the text-structure signature
(_glyph_structure): many connected components, no single dominant blob, concentration
in a thin horizontal band. False positives dropped 343 -> 17 across the corpus while
keeping real-mark recall and the doubao-1.png sample. Also accept a no-op force kwarg
for remover-interface symmetry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(samsung): add Samsung Galaxy AI visible-badge remover

New samsung_engine.py removes the bottom-left sparkle + localized 'AI-generated
content' badge that Galaxy AI tools stamp. Mirrors the Doubao locate->mask->inpaint
pattern but bottom-left, with a dual-polarity top-hat mask (the badge is light-on-dark
or dark-on-light). Detection gates on a band + left-anchor signature (the Doubao
CJK-component gate does not transfer: Latin badge letters connect into few blobs).
Explicit-only -- tuned on few real badges with a ~4% FP floor, so it is not used in
auto. Synthetic byte-blob fixtures (real badges are user content, not shipped).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(visible): unified known-watermark registry + LaMa inpaint backend

watermark_registry.py is a single catalog of known visible marks, each tying {usual
location, in_auto flag, recovery strategy, detect adapter, remove adapter}: gemini
(reverse-alpha, exact), doubao, samsung. cmd_visible is now registry-driven
(best_auto_mark for --mark auto; mark_keys() feeds the CLI choices) -- the per-mark
_run_doubao/_run_samsung helper branches are gone.

Cross-engine confidences are not comparable, so the gemini adapter applies the
corpus-validated 0.5 sparkle threshold for auto arbitration (its engine flag is loose
and weakly fired ~0.36 on Doubao text, hijacking auto).

--backend auto|cv2|lama chooses background reconstruction for the mask-based marks;
auto = LaMa when onnxruntime is present, else cv2. For LaMa the mask is the FILLED
glyph bounding box (sparse glyph masks leave anti-aliased edges behind). cv2 stays the
zero-dependency fallback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: watermark registry, Samsung/FLUX/ByteDance detection, LaMa backend, trustmark gate

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(doubao): exact reverse-alpha removal from captured alpha map

The Doubao '豆包AI生成' mark is a fixed semi-transparent white overlay, so given its
alpha map the original pixels are recovered exactly:
original = (wm - a*logo)/(1-a) -- no inpaint hallucination.

The alpha map + logo colour were solved from real black+gray Doubao captures on a
controlled background: on black captured = a*logo, and the black/gray pair solves a
per-pixel without assuming the logo colour (a_max~0.65, logo near-white); the white
capture cross-validates (mark vanishes to a flat fill). Bundled as
assets/doubao_alpha.png + geometry constants.

remove_watermark_reverse_alpha applies it scaled to image width; exact at the captured
width, so the registry routes doubao through it only when reverse_alpha_available
(width within the calibrated band) and the mark is detected, falling back to mask
inpaint (cv2/LaMa) otherwise. A light residual inpaint cleans the sub-pixel rescaling
error. Add captures at more resolutions to widen exact coverage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(visible): reverse-alpha only -- drop inpaint removal + heuristic detection

Per the principle that we only remove/detect what we can do exactly, the visible-mark
path is now reverse-alpha only:

- Doubao detect is reverse-alpha-consistent: match the bundled alpha glyph silhouette
  against the corner via TM_CCOEFF_NORMED (DETECT_NCC_THRESHOLD 0.4) -- keys on the
  '豆包AI生成' SHAPE, not coverage/structure heuristics. FP 7/1243 (0.6%). Removes the
  cv2 inpaint path + the _glyph_structure gate.
- Registry is reverse-alpha only: dropped the cv2/LaMa backend (_glyph_remove,
  _lama_box_inpaint, default_backend, --backend) and the Samsung entry. Doubao outside
  the alpha resolution band is skipped, never inpainted.
- Removed samsung_engine.py + tests + --mark samsung (no alpha map captured; Samsung
  C2PA/genAIType metadata detection in identify is unaffected).
- The universal erase --region (cv2/LaMa) is unchanged -- arbitrary-region inpainting
  stays a user-directed tool, separate from the known-mark registry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(doubao): NCC sub-pixel alignment -> reverse-alpha at any resolution

A pure width-scale of the captured alpha map is only sub-pixel-accurate at the
captured width and leaves a faint ghost elsewhere. remove_watermark_reverse_alpha now
registers the alpha glyph to the actual mark via a TM_CCOEFF_NORMED scale+position
search (_aligned_alpha_map) before inverting the blend, so the single 2048 capture
works at any resolution -- verified clean on the 1773x2364 (3:4) corpus size, the
biggest coverage gap (23 files). reverse_alpha_available is now just 'asset present'
(no width band); the registry still gates removal on detect so a clean corner is never
touched. Drops the _ALPHA_WIDTH_TOLERANCE gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(doubao): keep native recovery exact -- fixed geometry at captured width

Integer-pixel NCC alignment landed ~1px off at the captured width, degrading the
otherwise-exact native reverse-alpha (synthetic recovery error 0.94 -> 1.39).
remove_watermark_reverse_alpha now uses exact width-relative geometry within
_ALPHA_NATIVE_BAND of the captured width and the NCC search only off it -- best of
both: native back to 0.94, other resolutions still aligned.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(doubao): harden alignment -- try fixed+aligned, keep least residual (56/56)

On a faint/busy-background mark the NCC alignment peak can wander a few px off the
true mark and leave a residual (2/56 real corpus files). Off the captured width,
remove_watermark_reverse_alpha now builds BOTH the fixed-geometry and the NCC-aligned
alpha map, applies each, and keeps whichever leaves the least residual mark (re-detect
confidence on the bare reverse-alpha) -- geometry wins on faint marks, alignment on
clear ones, no magic threshold. Real-file round-trip now removes 56/56 detected Doubao
clean across every corpus resolution (was 54).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* perf(doubao): skip residual inpaint at native width for exact recovery

At the captured width the fixed-geometry reverse-alpha is pixel-exact, so inpainting
over it only replaced exactly-recovered interior pixels with a cv2 hallucination --
measured worse on a textured background (native error vs true bg 1.6
reverse-alpha-only vs 2.6 with the old always-on full-footprint inpaint). Native now
returns the bare recovery untouched; off-native, where NCC alignment is only
sub-pixel-approximate, the footprint inpaint stays to clean the seam. Real round-trip
still 56/56 across all corpus resolutions; negatives 0/60, Gemini unaffected.

Add test_native_returns_exact_reverse_alpha_no_inpaint as the regression guard. Sync
CLAUDE.md + README (the table cell and prose described the pre-NCC "skipped off native
/ cv2-LaMa" behavior, now stale). Gitignore the session scheduled_tasks.lock, and add
the text-protection research note.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
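
The calibration step in the `feat(doubao): exact reverse-alpha removal` entry above (solving the alpha map from a black and a gray capture) can be sketched as follows. This is an illustrative reconstruction of the arithmetic, not the repository's code: `solve_alpha_and_logo`, its parameters, and the assumption that both captures are normalized to [0, 1] on perfectly uniform backgrounds are all hypothetical.

```python
import numpy as np


def solve_alpha_and_logo(on_black, on_gray, gray_level, eps=1e-6):
    """Solve per-pixel alpha and logo colour from two captures (hypothetical helper).

    Assumes the blend model captured = alpha*logo + (1 - alpha)*background,
    with values in [0, 1] and a uniform gray background of value gray_level.
    """
    on_black = np.asarray(on_black, dtype=np.float64)
    on_gray = np.asarray(on_gray, dtype=np.float64)
    # The alpha*logo term cancels in the difference of the two captures,
    # leaving (1 - alpha) * gray_level, so alpha needs no logo-colour guess.
    alpha = np.clip(1.0 - (on_gray - on_black) / gray_level, 0.0, 1.0)
    # On black, captured = alpha*logo; recover the logo where alpha is usable.
    logo = np.where(alpha > eps, on_black / np.maximum(alpha, eps), 0.0)
    return alpha, logo
```

A third capture on white, as the commit message describes, would serve as an independent check on the solved values rather than as an input to the solve.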
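
The shape-based detection described in the commit message (matching the alpha glyph silhouette against the corner by normalized correlation) can be illustrated with a small NumPy sketch. It computes the zero-mean normalized cross-correlation, which to my understanding is the quantity OpenCV's `TM_CCOEFF_NORMED` mode reports; the function names and the brute-force position search are assumptions for illustration, and the real detector also searches over scale.

```python
import numpy as np


def ncc_score(patch, template):
    """Zero-mean normalized cross-correlation of two equal-shape arrays."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    # A flat patch or template has no shape to correlate; score it as 0.
    return float((p * t).sum() / denom) if denom > 0 else 0.0


def best_match(region, template):
    """Slide the template over the region; return (best score, (row, col))."""
    th, tw = template.shape
    best_score, best_pos = -1.0, (0, 0)
    for y in range(region.shape[0] - th + 1):
        for x in range(region.shape[1] - tw + 1):
            score = ncc_score(region[y:y + th, x:x + tw], template)
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_score, best_pos
```

Because the score is invariant to brightness offset and contrast scaling, a faint overlay on a uniform background still scores close to 1, while a flat corner scores 0 and stays below a threshold such as the 0.4 quoted above.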
@@ -34,6 +34,7 @@ yolov8n.pt
 
 # Claude Code local settings
 .claude/settings.local.json
+.claude/scheduled_tasks.lock
 
 # Doubao watermark calibration (local only; ship only the derived alpha-map asset).
 # Synthetic seeds + raw Doubao captures are regenerable and not committed.
@@ -17,7 +17,7 @@ If this tool saves you time, consider [sponsoring its development](https://githu
 
 ## Features
 
-- **Visible watermark removal** — Gemini / Nano Banana sparkle logo (reverse alpha blending) and the Doubao "豆包AI生成" text strip (locate + mask + inpaint); fast, offline, deterministic, no GPU. `visible --mark auto` picks the right one
+- **Visible watermark removal** — a registry of known marks in their usual places: the Gemini / Nano Banana sparkle and the Doubao "豆包AI生成" text strip. Each is removed by **exact reverse-alpha blending** against a captured alpha map (`original = (wm − α·logo)/(1−α)`), recovering the true pixels rather than inpainting a guess. Fast, offline, no GPU. `visible --mark auto` finds and removes the strongest detected mark. (For arbitrary logos/objects, see `erase`.)
 - **Universal region eraser (`erase`)** — remove any logo / watermark / object inside boxes you specify, regardless of position or colour. Default cv2 inpainting (CPU, instant); optional big-LaMa via onnxruntime (`lama` extra) for higher quality
 - **Invisible watermark removal** — SynthID, StableSignature, TreeRing via diffusion-based regeneration (needs a local GPU, or run it with no setup on [raiw.cc](https://raiw.cc))
 - **AI metadata stripping** — EXIF, PNG text chunks, C2PA provenance manifests (PNG / JPEG / AVIF / HEIF / JPEG-XL, **MP4 / MOV / M4V / M4A** at the container level, and **WebM / MP3 / WAV / FLAC / OGG** losslessly via ffmpeg), XMP DigitalSourceType
@@ -49,7 +49,9 @@ If this tool saves you time, consider [sponsoring its development](https://githu
 | **xAI Grok (Aurora)** | — | — | ✅ EXIF signature scheme (no C2PA): `Signature:` blob + UUID `Artist` | Detected (`identify`); metadata strip |
 | **Midjourney** | — | — | ✅ EXIF + XMP (prompt, model, seed) | Metadata strip |
 | **Meta AI** | — | — | ✅ IPTC "Made with AI" (digitalSourceType) | Metadata strip (removes the label) |
-| **Doubao** (ByteDance) / China AIGC generators | ✅ "豆包AI生成" text strip (bottom-right) | — | ✅ TC260 AIGC label — `<TC260:AIGC>` XMP **or** `AIGC` PNG chunk (China's mandatory AI labeling) | Locate + mask + inpaint (cv2, CPU) + metadata strip |
+| **Doubao** (ByteDance) / China AIGC generators | ✅ "豆包AI生成" text strip (bottom-right) | — | ✅ TC260 AIGC label (`<TC260:AIGC>` XMP **or** `AIGC` PNG chunk) **+ C2PA** signed by ByteDance Volcano Engine (`volcengine`) | Exact reverse-alpha (captured α map): pixel-exact at native width, NCC-aligned at other resolutions, + metadata strip |
+| **Samsung Galaxy AI** (Generative Edit, Sketch to Image, ...) | — | — | ✅ C2PA (signer "Samsung Galaxy") + `trainedAlgorithmicMedia` / proprietary `genAIType` marker | Detected (`identify`) + metadata strip |
+| **Black Forest Labs** (FLUX API) | — | — | ✅ C2PA (`Black Forest Labs API` + `c2pa.ai_generated_content` + `trainedAlgorithmicMedia`) | Metadata strip |
 | **StableSignature** (Meta) | — | ✅ In-model watermark | — | Diffusion regeneration |
 | **TreeRing** | — | ✅ Latent space watermark | — | Diffusion regeneration |
 
@@ -79,9 +81,9 @@ A three-stage NCC (Normalized Cross-Correlation) detector finds the watermark po
 
 ### Removing the Doubao "豆包AI生成" text watermark
 
-Doubao (ByteDance) stamps every output with a light, semi-transparent "豆包AI生成" text strip in the bottom-right corner — the visible AIGC label mandated by China's TC260 standard. Unlike the fixed-size Gemini sparkle, it is a text strip that scales with image width, so we anchor a generous bottom-right box by geometry, extract the light low-saturation glyph pixels with a polarity-aware white top-hat mask, and inpaint them (cv2 Telea/NS). The mask is background-relative, so it leaves white-paper documents untouched instead of smearing their text. On dense-text backgrounds where the mask would explode, removal is skipped rather than guessed.
+Doubao (ByteDance) stamps every output with a light, semi-transparent "豆包AI生成" text strip in the bottom-right corner — the visible AIGC label mandated by China's TC260 standard. It is a fixed semi-transparent white overlay, so — like the Gemini sparkle — it is removed by **exact reverse-alpha blending**: `original = (watermarked - α·logo) / (1 - α)`, recovering the true pixels instead of hallucinating them. The α map and logo colour were solved from controlled black + gray captures (on black, `captured = α·logo`; the black/gray pair solves α per-pixel). At the captured width the placement is exact, so the recovery is returned untouched (inpainting over exactly-recovered pixels only degrades them). The single capture generalizes to any resolution: off the captured width an NCC scale-and-position search registers the α template to the actual mark, and a light residual inpaint cleans the sub-pixel seam there. Detection is consistent with removal: it matches the same alpha glyph silhouette against the corner (normalized correlation), so it keys on the actual "豆包AI生成" shape, not on textured corners.
 
-**Speed**: ~0.03s per image. No GPU needed. Best on photo / illustration backgrounds; on high-contrast edges a faint residue can remain (use `erase --backend lama` for neural-quality fill).
+**Speed**: ~0.05s, no GPU needed. Reverse-alpha at the captured resolution recovers the true background pixels exactly.
 
 ### Universal region eraser
 
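
The reverse-alpha formula quoted in the README text above can be written out as a short NumPy sketch. It assumes float images in [0, 1], a 2-D alpha map already placed at the mark's position, and a scalar or per-channel logo colour; `reverse_alpha` and `max_alpha` are hypothetical names for illustration, not the project's `remove_watermark_reverse_alpha`.

```python
import numpy as np


def reverse_alpha(watermarked, alpha, logo, max_alpha=0.99):
    """Invert wm = alpha*logo + (1 - alpha)*original for the original pixels."""
    wm = np.asarray(watermarked, dtype=np.float64)
    # Cap alpha below 1 so the division stays finite on near-opaque pixels.
    a = np.clip(np.asarray(alpha, dtype=np.float64), 0.0, max_alpha)
    if wm.ndim == 3 and a.ndim == 2:
        a = a[..., None]  # broadcast one alpha map over the colour channels
    original = (wm - a * logo) / (1.0 - a)
    return np.clip(original, 0.0, 1.0)
```

Where alpha is zero the expression reduces to the input pixel, so pixels outside the mark pass through unchanged. On 8-bit images the recovery is exact only up to quantization, and the rounding error is amplified by 1/(1 - α) where the mark is most opaque.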
@@ -237,9 +239,9 @@ remove-ai-watermarks batch ./images/ --mode all
 # of a clean origin. Add --json for machine-readable output.
 remove-ai-watermarks identify image.png
 
-# Visible watermark only — fast, offline, CPU. --mark auto (default) picks
-# between the Gemini sparkle and the Doubao "豆包AI生成" text strip; force one
-# with --mark gemini / --mark doubao.
+# Visible watermark only — fast, offline, CPU. --mark auto (default) finds the
+# strongest known mark (Gemini sparkle / Doubao "豆包AI生成" text); force one
+# with --mark gemini / doubao. Removed by exact reverse-alpha (true-pixel recovery).
 remove-ai-watermarks visible image.png -o clean.png
 
 # Erase arbitrary region(s) — universal, any logo/watermark/object, any position.
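
The `--mark auto` behaviour described in the usage comments above (find the strongest known mark) can be sketched as a small registry lookup. Only the names `best_auto_mark`, `in_auto`, and the per-mark thresholds (0.5 for the Gemini sparkle, 0.4 for Doubao) come from the commit message; the `Mark` dataclass, its fields, and the selection rule below are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Mark:
    """One registry entry (hypothetical shape)."""
    key: str
    in_auto: bool
    detect: Callable[[object], float]  # adapter returning a confidence in [0, 1]
    threshold: float                   # per-mark gate applied before arbitration


def best_auto_mark(image: object, registry: Iterable[Mark]) -> Optional[str]:
    """Return the key of the strongest mark that clears its own threshold."""
    best_key, best_conf = None, 0.0
    for mark in registry:
        if not mark.in_auto:
            continue  # explicit-only marks never compete in auto mode
        conf = mark.detect(image)
        # Gate each engine on its own threshold first, because raw
        # confidences from different engines are not directly comparable.
        if conf >= mark.threshold and conf > best_conf:
            best_key, best_conf = mark.key, conf
    return best_key
```

With a per-mark gate, a weak Gemini response of about 0.36 on Doubao text is rejected before it can outrank the Doubao detector, which is the failure the commit message calls hijacking auto.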
@@ -329,7 +331,7 @@ Tracked but not yet implemented:
 - **Real non-PNG C2PA fixtures**. SynthID-source detection for JPEG / WebP / AVIF is currently covered only by synthetic byte blobs; replace with real vendor-emitted files to ground the binary-scan path.
 - **Maintenance debt**. Strict pyright is now clean across `src/` (0 errors): pure-logic files are fully typed, the cv2 / torch / diffusers boundary files carry a documented per-file relax pragma, and a local `typings/piexif` stub covers piexif. Remaining: full-project `pyright` (no path) still OOMs node on this ML-heavy repo, so it must be scoped to `src/`; narrowing the boundary pragmas back toward full strict (as upstream stubs improve) is the long tail. (`uv-secure` is already clean since `idna` was bumped to 3.16.)
 - **AVIF / HEIF `Exif` item inside the `meta` box**. An AI-label *XMP* packet in a `meta`-box item is now blanked in place (v0.6.9), but EXIF stored as a `meta`-box `Exif` *item* is still not removed — it needs full `iinf`/`iloc` surgery (offset rewrite, corruption risk) or `exiftool` (a non-bundled binary dependency). Low priority: the AI labels we target are XMP, not EXIF, so an EXIF-only meta-box case is rare.
-- **More C2PA device signers**. Leica, Nikon, Google Pixel, Sony, and Truepic are mapped (each verified against a real signed file). Canon and Samsung Galaxy (AI-edit) are deferred until a real signed sample surfaces — no public direct-download C2PA file exists for them today (upload-to-verify / news-agency-licensed only).
+- **More C2PA device signers**. Leica, Nikon, Google Pixel, Sony, and Truepic capture cameras are mapped (each verified against a real signed file); **Samsung Galaxy AI**, **Black Forest Labs (FLUX)**, and **ByteDance Volcano Engine** (Doubao / Jimeng) are now attributed too (verified on real signed files). Canon is still deferred until a real signed sample surfaces — no public direct-download C2PA file exists for it today (upload-to-verify / news-agency-licensed only).
 - **Resemble PerTh audio detection** — evaluated, not feasible with the public API: `get_watermark()` returns a raw bit array with no presence/confidence flag, so watermarked vs. clean audio can't be reliably separated without Resemble's fixed payload or a confidence service. Same wall as the SynthID pixel detector.
 - **Video pipeline (`noai-video`)**: per-frame inpainting and tracking for Sora 2 dynamic logo, Veo 3.1 badge, Kling, Runway. Separate package, not folded into this repo.
 
@@ -0,0 +1,138 @@
+# Text protection research: crisp text under a "watermark removed everywhere" constraint
+
+Date: 2026-05-29. Source: a deep-research run (104 agents, 5 search angles, sources
+fetched and 3-vote adversarially verified). Not committed automatically — saved as a
+research note for the next session.
+
+## The constraint that frames everything
+
+The invisible watermark (Google SynthID) must be removed **everywhere, including inside
+text regions**. Therefore any technique that keeps or composites the **original
+(watermarked) text pixels** is disqualified — the text must be *regenerated / freshly
+synthesized* enough to scrub the watermark, yet rendered crisply. This single rule is the
+filter applied to every candidate below.
+
+## Problem recap
+
+The `invisible` pipeline is SDXL base 1.0 img2img at low strength (~0.05) to defeat
+SynthID with minimal visible change. Text is protected via Differential Diffusion with a
+per-pixel change map (`preserve` ~0.9) driven by the PP-OCRv3 DB detector
+(`text_protector.py`). Large text survives; **small text (sub ~8 px strokes) softens or
+garbles** (issue #14, confirmed on real content).
+
+## Executive summary
+
+The fine-text softening is an **architectural consequence of latent-space processing, not
+a tuning problem**: SDXL's 4-channel VAE (~48x compression) discards high-frequency signal
+on encode, and Differential Diffusion blends in latent space with the change map
+downsampled by 8x, so any stroke under ~8 px sits inside one latent cell and cannot be
+preserved or edited cleanly **regardless of `preserve`** (the Differential Diffusion
+authors state this limit explicitly). Two structurally sound directions keep the
+"watermark removed everywhere" guarantee because they **synthesize fresh glyph pixels**
+rather than compositing originals: (1) glyph/text-conditioned diffusion re-render of
+detected text (AnyText2, EasyText), and (2) a two-stage architecture — global scrub, then
+a dedicated text-restoration / text-aware super-resolution pass over detected regions
+(TIGER, TextSR, TeReDiff/TAIR). **EasyText** and **TextSR** are the most promising for this
+CJK-first pipeline (both multilingual via DiT/ByT5, both regenerate from glyph or
+character-shape priors). The deepest fix — a 16-channel (SD3/FLUX) VAE — materially reduces
+the softening but means switching the base model, not a drop-in VAE swap.
+
+## Constraint reconciliation (important)
+
+The generic research "quick win: bump `preserve` toward 1.0" is **invalid under our hard
+constraint**: raising `preserve` freezes the text region, so SynthID there is **not
+scrubbed**. Likewise, pixel paste-back of the original text is disqualified. The only
+constraint-compatible quick win is **higher resolution / tiled diffusion** (strokes span
+more latent cells, less VAE softening, while the text is still fully regenerated and thus
+scrubbed). The real answer is **regenerate text crisply**, not freeze it.
+
+## Findings (with confidence and sources)
+
+### Finding 1 — confidence: high
+
+**Claim.** The small-text softening is an architectural latent-space limit, not a tuning issue. SDXL's VAE compressively encodes (losing exact color and fine detail on every round-trip), and Differential Diffusion blends in latent space with the change map downsampled to latent resolution (8x), so the method explicitly caps edit/preserve granularity at ~8 px under SD settings. Text strokes below one latent cell cannot be cleanly preserved even at preserve ~0.9.
+
+**Evidence.** Differential Diffusion's paper states a "cap on the resolution of the change map ... can limit the ability to precisely edit small objects (less than 8 pixels for Stable-Diffusion's settings)"; the official SDXL pipeline downsamples the map by `vae_scale_factor=8` and blends `latents = original*mask + latents*(1-mask)` in latent space. The VAE encode is "compressive ... exact color qualities and exact visual fine-details are lost." arXiv:2512.05198 confirms "resizing the pixel mask to latent resolution discards fine structure ... downsamples by 1/8" and that linear latent blending "cannot be pixel-equivalent." Higher compression = more high-frequency loss (arXiv:2305.02541).
+
+**Sources.** https://onlinelibrary.wiley.com/doi/10.1111/cgf.70040 · https://differential-diffusion.github.io/ · https://github.com/exx8/differential-diffusion · https://arxiv.org/abs/2512.05198 · https://omriavrahami.com/blended-latent-diffusion-page/ · https://arxiv.org/pdf/2305.02541
+
+### Finding 2 — confidence: low (do not build on it yet)
+
+**Claim.** Pixel-space differential / blended-latent variants exist as a research direction, but the specific full-resolution-mask solution (PELC/DecFormer, arXiv:2512.05198) was NOT verified to deliver its claimed seam/edge improvements.
+
+**Evidence.** arXiv:2512.05198 argues linear latent blending is not pixel-equivalent and proposes decoder-equivariant compositing; PixPerfect (arXiv:2512.03247) does pixel-space refinement of chromatic shifts at edit boundaries. But the specific PELC full-resolution-mask and DecFormer "53% error reduction" claims were **refuted on adversarial vote (0-3 and 1-2)**. Treat pixel-equivalent latent compositing as an emerging idea to watch, not a production fix.
+
+**Sources.** https://arxiv.org/abs/2512.05198 · https://arxiv.org/abs/2512.03247
+
+### Finding 3 — confidence: high
+
+**Claim.** Glyph/text-conditioned diffusion can re-render detected text as freshly synthesized pixels (not copied), which inherently scrubs any watermark in the text region while rendering glyphs crisply. AnyText/AnyText2 inject text-rendering into a pretrained T2I model and support generation AND editing of existing scene images; multilingual including CJK and English.
+
+**Evidence.** AnyText2 "enables precise control over multilingual text attributes in natural scene image generation and editing" (WriteNet+AttnX); +3.3% (Chinese) / +9.3% (English) accuracy over AnyText v1. AnyText "can be plugged into existing diffusion models ... for rendering or editing text" and synthesizes text latent features through diffusion (fresh pixels), supporting zh/en/ja/ko/ar/bn/hi. **Caveat:** both are SD1.5-based, so NOT a drop-in into the SDXL scrub (separate base model); AnyText's own limitation: "the inpainting manner ... impedes editing quality on small text," and it ranks weak on STRICT (EMNLP 2025) — small-text crispness not guaranteed.
+
+**Sources.** https://github.com/tyxsspa/AnyText2 · https://arxiv.org/abs/2411.15245 · https://arxiv.org/abs/2311.03054
+
+### Finding 4 — confidence: high
+
+**Claim.** EasyText is a strong glyph-conditioned re-render candidate: built on the FLUX-dev DiT framework with LoRA tuning, renders compact per-character glyph patches (64px-high adaptive for alphabetic, 64x64 for logographic) concatenated in latent space, supports 10+ languages including Chinese, Japanese, Korean, Thai, Vietnamese, Greek, and Latin.
+
+**Evidence.** AAAI 2025 + arXiv:2505.24417: "implemented based on the open-source FLUX-dev framework with LoRA-based parameter-efficient tuning," VAE and text encoder frozen, two-stage 512->1024 training. Glyph conditioning via "64-pixel-high images ... adaptive widths for alphabetic; fixed 64x64 for logographic," VAE-encoded and concatenated with denoised latents, "less than one-tenth the spatial size of layout-matching methods." FLUX-based (16-channel VAE, DiT) also sidesteps the SDXL 4-channel wall. Fresh-pixel generation preserves the watermark-removal guarantee. Cyrillic/Arabic crispness not separately benchmarked.
+
+**Sources.** https://arxiv.org/html/2505.24417 · https://ojs.aaai.org/index.php/AAAI/article/view/37697
+
+### Finding 5 — confidence: high
+
+**Claim.** A two-stage "global watermark scrub then text-restoration pass" architecture is validated by recent literature, and the restoration stage can synthesize glyph pixels from priors (no original-pixel reintroduction). TIGER reconstructs stroke geometry then injects it as guidance into full-image super-resolution; TextSR uses a detector + multilingual OCR to regenerate text from character-shape priors; TeReDiff/TAIR couples a jointly-trained text-spotter with diffusion.
+
+**Evidence.** TIGER (arXiv:2510.21590): "a diffusion-based local text refiner ... reconstructing fine-grained stroke geometry ... injected as conditional guidance into the subsequent full-image restoration." TextSR (arXiv:2505.23119, Google): "leverages a text detector ... then employs OCR to extract multilingual text," regenerating from "multilingual character-to-shape diffusion priors" that "produce character shapes solely based on text prompts, even without visual input" — fresh pixels. TAIR/TeReDiff (ICLR 2026): standard restoration "frequently generates plausible but incorrect textures"; TeReDiff feeds text-spotter outputs back as prompts. **Caveat:** TIGER orders text-first then global (reverse of scrub-then-text); these target degraded-input super-resolution, not watermark removal, so the SynthID-scrub of the restoration stage must be verified empirically (the stages are themselves diffusion-based, so fresh-pixel = no SynthID is plausible but unproven here).
+
+**Sources.** https://arxiv.org/html/2510.21590v1 · https://arxiv.org/html/2505.23119v1 · https://cvlab-kaist.github.io/TAIR/ · https://arxiv.org/abs/2506.09993
+
+### Finding 6 — confidence: high
+
+**Claim.** Switching to a 16-channel VAE (SD3/FLUX class) materially reduces small-text/latent softening vs SDXL's 4-channel VAE, but it requires switching the base model — not a drop-in latent swap into an SDXL UNet img2img pipeline. RAE approaches are DiT-native and likewise not drop-in.
+
+**Evidence.** SD3/FLUX moved from 4-channel (48x) to 16-channel (12x) VAEs specifically to preserve fine detail (diffusers Discussion #8713; madebyollin VAE notes; arXiv:2305.02541). RAE (arXiv:2510.11690) "should be the new default for diffusion transformer training" but produces high-dimensional latents needing a DiT wide-DDT head — NOT compatible with an SDXL 4-channel UNet. EasyText shows the practical path: adopt a FLUX-DiT base rather than retrofit SDXL. The VAE upgrade couples to a base-model migration.
+
+**Sources.** https://arxiv.org/abs/2510.11690 · https://arxiv.org/pdf/2305.02541 · https://arxiv.org/html/2505.24417
+
+## Recommendation
+
+Under the hard constraint, the correct architecture is **not "protect text during the
+scrub" (Differential Diffusion)** but **"scrub everywhere, then restore text crisply by
+regeneration"**:
+
+1. Global SDXL scrub with text protection OFF (text region is scrubbed too).
+2. On detected text regions, a **glyph-conditioned restoration** that re-renders the same
+   glyphs as fresh pixels (no original reused).
+
+This is the only path that delivers both "watermark everywhere" and crisp text.
+
+**Top-2 to prototype:**
+- **TextSR** — detector + multilingual OCR + character-shape diffusion priors; closest to
+  the existing detector-driven pipeline.
+- **EasyText** — FLUX-DiT glyph re-render, multilingual incl. CJK; also gets the 16-channel
+  VAE for free.
+
+**Honest costs / unknowns:** this is a re-architecture, not a quick fix. It needs a new
+**OCR-recognition** step (we currently only detect text; we must know *what* to re-render).
+Models are FLUX/DiT-class (heavy) -> serverless GPU. Maturity is research-grade; CJK is
+covered, Cyrillic/Arabic crispness is not separately benchmarked -> a prototype must
+measure real fidelity. The restoration stage being diffusion-based makes "fresh pixels =
+no SynthID" plausible but **must be verified empirically** (run the SynthID oracle on the
+restored output).
+
+**Constraint-compatible quick win to try first:** run the global scrub at **higher
+resolution / tiled** so strokes exceed the latent cell — less softening, full scrub, no
+freezing. Cheap to test; quantify recall/quality vs cost.
+
+**Do not pursue:** raising `preserve` toward 1.0 or pixel paste-back (both leave original
+watermarked pixels in text); PELC/DecFormer pixel-equivalent latent compositing (refuted,
+not production-ready).
+
+## Provenance
+
+Deep-research workflow run `wf_118b9a03-3eb` (2026-05-29). Findings adversarially verified
+(2/3 refutes required to kill a claim). This note records research only; no code change is
+implied until a prototype validates fidelity and the SynthID-scrub guarantee on the
+restored output.
Binary file not shown (after: 8.0 KiB).
+46
-113
@@ -20,12 +20,12 @@ from rich.panel import Panel
|
||||
from rich.progress import BarColumn, Progress, SpinnerColumn, TextColumn, TimeElapsedColumn
|
||||
from rich.table import Table
|
||||
|
||||
from remove_ai_watermarks import __version__
|
||||
from remove_ai_watermarks import __version__, watermark_registry
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from numpy.typing import NDArray
|
||||
|
||||
from remove_ai_watermarks.gemini_engine import DetectionResult, GeminiEngine
|
||||
from remove_ai_watermarks.gemini_engine import DetectionResult
|
||||
|
||||
console = Console()
|
||||
|
||||
@@ -133,72 +133,6 @@ def _write_bgr_with_alpha(
|
||||
image_io.imwrite(path, bgra)
|
||||
|
||||
|
||||
def _run_doubao_if_selected(
|
||||
ctx: click.Context,
|
||||
image: NDArray[Any],
|
||||
alpha: NDArray[Any] | None,
|
||||
output: Path,
|
||||
mark: str,
|
||||
gemini_engine: GeminiEngine,
|
||||
detect: bool,
|
||||
detect_threshold: float,
|
||||
inpaint_method: str,
|
||||
strip_metadata: bool,
|
||||
) -> bool:
|
||||
"""Run the Doubao text-strip removal path when it is the selected mark.
|
||||
|
||||
Returns True when this path handled the image (caller should stop). In
|
||||
``auto`` mode the Doubao detector competes with the Gemini detector and wins
|
||||
only when it is both positive and at least as confident.
|
||||
"""
|
||||
from remove_ai_watermarks.doubao_engine import DoubaoEngine
|
||||
|
||||
doubao = DoubaoEngine()
|
||||
d_det = doubao.detect(image)
|
||||
|
||||
if mark == "auto":
|
||||
g_det = gemini_engine.detect_watermark(image)
|
||||
use_doubao = d_det.detected and d_det.confidence >= g_det.confidence
|
||||
console.print(
|
||||
f" [dim]Mark auto:[/] gemini={g_det.confidence:.2f} doubao={d_det.confidence:.2f} "
|
||||
f"-> {'doubao' if use_doubao else 'gemini'}"
|
||||
)
|
||||
else:
|
||||
use_doubao = mark == "doubao"
|
||||
|
||||
if not use_doubao:
|
||||
return False
|
||||
|
||||
if detect and not d_det.detected and d_det.confidence < detect_threshold:
|
||||
console.print(
|
||||
f" [yellow]⚠[/] Doubao mark not detected [dim](coverage {d_det.coverage:.1%}). "
|
||||
f"Use --no-detect to force.[/]"
|
||||
)
|
||||
raise SystemExit(0)
|
||||
|
||||
method: Literal["telea", "ns"] = "ns" if inpaint_method == "ns" else "telea"
|
||||
t0 = time.monotonic()
|
||||
with console.status("[cyan]Removing Doubao watermark…[/]"):
|
||||
result = doubao.remove_watermark(image, inpaint_method=method)
|
||||
elapsed = time.monotonic() - t0
|
||||
|
||||
output.parent.mkdir(parents=True, exist_ok=True)
|
||||
_write_bgr_with_alpha(output, result, alpha, clear_region=d_det.region)
|
||||
|
||||
if strip_metadata:
|
||||
try:
|
||||
from remove_ai_watermarks.metadata import remove_ai_metadata
|
||||
|
||||
remove_ai_metadata(output, output)
|
||||
except Exception as e:
|
||||
if ctx.obj.get("verbose"):
|
||||
console.print(f" [yellow]⚠[/] Failed to strip metadata: {e}")
|
||||
|
||||
size_kb = output.stat().st_size / 1024
|
||||
console.print(f" [green]✓[/] Doubao mark removed → {output} [dim]({size_kb:.0f} KB, {elapsed:.2f}s)[/]")
|
||||
return True
|
||||
|
||||
|
||||
# ── Main group ───────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@@ -238,9 +172,10 @@ def main(ctx: click.Context, verbose: bool) -> None:
|
||||
@click.option("--detect-threshold", type=float, default=0.25, help="Detection confidence threshold.")
|
||||
@click.option(
|
||||
"--mark",
|
||||
type=click.Choice(["auto", "gemini", "doubao"]),
|
||||
type=click.Choice(["auto", *watermark_registry.mark_keys()]),
|
||||
default="auto",
|
||||
help="Which visible mark to target. auto picks the stronger of the two detectors.",
|
||||
help="Which known visible mark to target (auto picks the strongest detected). "
|
||||
"All marks are removed by exact reverse-alpha against a captured alpha map.",
|
||||
)
|
||||
@click.option("--strip-metadata/--keep-metadata", default=True, help="Strip AI metadata from output.")
|
||||
@click.pass_context
|
||||
@@ -256,13 +191,14 @@ def cmd_visible(
|
||||
mark: str,
|
||||
strip_metadata: bool,
|
||||
) -> None:
|
||||
"""Remove a visible AI watermark from an image.
|
||||
"""Remove a known visible AI watermark from an image.
|
||||
|
||||
Targets the Gemini sparkle logo (reverse alpha blending) or the Doubao
|
||||
"豆包AI生成" text strip (locate -> mask -> inpaint). Fast, deterministic,
|
||||
offline. ``--mark auto`` picks whichever detector fires stronger.
|
||||
Finds a known mark in its usual place (Gemini sparkle / Doubao text) via the
|
||||
watermark registry and removes it by exact reverse-alpha against a captured
|
||||
alpha map -- recovering the true pixels, not an inpaint guess. ``--mark auto``
|
||||
picks the strongest detected mark. For arbitrary logos/objects, use ``erase``.
|
||||
"""
|
||||
from remove_ai_watermarks.gemini_engine import GeminiEngine
|
||||
from remove_ai_watermarks import watermark_registry as registry
|
||||
|
||||
_banner()
|
||||
source = _validate_image(source)
|
||||
@@ -270,8 +206,6 @@ def cmd_visible(
|
||||
if output is None:
|
||||
output = source.with_stem(source.stem + "_clean")
|
||||
|
||||
engine = GeminiEngine()
|
||||
|
||||
# Load image (preserving any alpha channel separately)
|
||||
image, alpha = _read_bgr_and_alpha(source)
|
||||
if image is None:
|
||||
@@ -281,45 +215,44 @@ def cmd_visible(
|
||||
h, w = image.shape[:2]
|
||||
console.print(f" [dim]Input:[/] {source.name} ({w}x{h})")
|
||||
|
||||
# Resolve which visible mark to target, then run the Doubao path if chosen.
|
||||
if _run_doubao_if_selected(
|
||||
ctx, image, alpha, output, mark, engine, detect, detect_threshold, inpaint_method, strip_metadata
|
||||
):
|
||||
return
|
||||
|
||||
# Detection (we always detect softly, to find dynamic region for inpainting)
|
||||
with console.status("[cyan]Detecting watermark…[/]"):
|
||||
det = engine.detect_watermark(image)
|
||||
|
||||
if detect:
|
||||
if det.detected:
|
||||
console.print(
|
||||
f" [green]✓[/] Watermark detected "
|
||||
f"[dim](confidence: {det.confidence:.1%}, "
|
||||
f"spatial: {det.spatial_score:.3f}, "
|
||||
f"gradient: {det.gradient_score:.3f})[/]"
|
||||
)
|
||||
else:
|
||||
console.print(f" [yellow]⚠[/] Watermark not detected [dim](confidence: {det.confidence:.1%})[/]")
|
||||
if det.confidence < detect_threshold:
|
||||
console.print(" [dim]Skipping. Use --no-detect to force removal.[/]")
|
||||
# Resolve the target mark from the known-watermark registry. ``auto`` scans
|
||||
# every in-auto mark in its usual place and picks the strongest; an explicit
|
||||
# ``--mark <key>`` targets that one (the user asserts its presence).
|
||||
if mark == "auto":
|
||||
best = registry.best_auto_mark(image)
|
||||
if best is None:
|
||||
console.print(" [yellow]⚠[/] No known visible mark detected (gemini / doubao).")
|
||||
if detect:
|
||||
console.print(" [dim]Skipping. Use --mark <name> --no-detect to force.[/]")
|
||||
raise SystemExit(0)
|
||||
target = "gemini" # forced (no-detect): fall back to the default mark
|
||||
else:
|
||||
target = best.key
|
||||
console.print(f" [dim]Mark auto:[/] {best.label} [dim]({best.location}, conf {best.confidence:.2f})[/]")
|
||||
else:
|
||||
target = mark
|
||||
|
||||
# Removal
|
||||
chosen = registry.get_mark(target)
|
||||
det = chosen.detect(image)
|
||||
if detect and not det.detected:
|
||||
console.print(
|
||||
f" [yellow]⚠[/] {chosen.label} not detected "
|
||||
f"[dim](conf {det.confidence:.2f}). Use --no-detect to force.[/]"
|
||||
)
|
||||
raise SystemExit(0)
|
||||
if det.detected:
|
||||
console.print(f" [green]✓[/] {chosen.label} detected [dim]({chosen.location}, conf {det.confidence:.2f})[/]")
|
||||
|
||||
method: Literal["telea", "ns"] = "ns" if inpaint_method == "ns" else "telea"
|
||||
t0 = time.monotonic()
|
||||
region: tuple[int, int, int, int] | None = None
|
||||
with console.status("[cyan]Removing watermark…[/]"):
|
||||
result = engine.remove_watermark(image)
|
||||
|
||||
if inpaint:
|
||||
region = _watermark_region(det, w, h)
|
||||
result = engine.inpaint_residual(
|
||||
result,
|
||||
region,
|
||||
strength=inpaint_strength,
|
||||
method=inpaint_method,
|
||||
)
|
||||
|
||||
with console.status(f"[cyan]Removing {chosen.label}… ({chosen.recovery})[/]"):
|
||||
result, region = chosen.remove(
|
||||
image,
|
||||
inpaint_method=method,
|
||||
inpaint=inpaint,
|
||||
inpaint_strength=inpaint_strength,
|
||||
force=not detect,
|
||||
)
|
||||
elapsed = time.monotonic() - t0
|
||||
|
||||
# Save (preserves transparency by clearing alpha in the watermark region)
|
||||
|
||||
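
The `watermark_registry` module itself is not shown in this diff, so the following is only a sketch of the selection behaviour the CLI changes above rely on: `auto` scans every in-auto mark and keeps the strongest positive detection. The `Mark`, `Detection` and `AutoHit` types and this `best_auto_mark` signature are assumptions made for illustration; the real registry may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass(frozen=True)
class Detection:
    detected: bool
    confidence: float


@dataclass(frozen=True)
class Mark:
    """Hypothetical registry entry: a key, a label and a detector callable."""

    key: str
    label: str
    detect: Callable[[Any], Detection]
    in_auto: bool = True


@dataclass(frozen=True)
class AutoHit:
    key: str
    label: str
    confidence: float


def best_auto_mark(image: Any, marks: Sequence[Mark]) -> AutoHit | None:
    """Return the strongest positively detected in-auto mark, or None."""
    best: AutoHit | None = None
    for mark in marks:
        if not mark.in_auto:
            continue
        det = mark.detect(image)
        # Only positive detections compete; ties keep the earlier mark.
        if det.detected and (best is None or det.confidence > best.confidence):
            best = AutoHit(mark.key, mark.label, det.confidence)
    return best
```

Returning `None` when nothing fires is what lets the caller choose between skipping (detection on) and falling back to a default mark (detection off).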
@@ -1,29 +1,24 @@
|
||||
"""Doubao visible watermark removal engine.
|
||||
|
||||
Doubao (ByteDance) stamps every generated image with a visible "豆包AI生成"
|
||||
(Doubao AI generated) text strip in the bottom-right corner. This is the
|
||||
explicit AIGC label mandated by China's TC260 standard, rendered as a
|
||||
near-white / light-gray, low-saturation text overlay.
|
||||
(Doubao AI generated) text strip in the bottom-right corner -- the explicit AIGC
|
||||
label mandated by China's TC260 standard, a near-white semi-transparent overlay.
|
||||
|
||||
Unlike the Gemini sparkle (a fixed square logo removed by reverse alpha
|
||||
blending against a captured alpha map), the Doubao mark is a text strip whose
|
||||
exact alpha map we do not yet have. This engine therefore removes it by:
|
||||
Like the Gemini sparkle, it is a fixed overlay, so it is removed by **exact
|
||||
reverse-alpha blending** against a captured alpha map (``remove_watermark_reverse_alpha``):
|
||||
``original = (wm - a*logo)/(1-a)`` -- recovering the true pixels, not an inpaint
|
||||
guess. The alpha map + logo colour were solved from black+gray Doubao captures
|
||||
(see data/doubao_capture/ and the reverse-alpha section below) and bundled as
|
||||
``assets/doubao_alpha.png``.
|
||||
|
||||
locate -> mask -> inpaint
|
||||
Detection (``detect``) is reverse-alpha-consistent: it matches that same alpha
|
||||
glyph silhouette against the corner via normalized correlation, so it keys on
|
||||
the actual "豆包AI生成" shape rather than coverage/structure heuristics.
|
||||
|
||||
1. Locate: the mark scales with image WIDTH and sits in the bottom-right at a
|
||||
fixed margin, so we anchor a generous box there (geometry only -- no bundled
|
||||
template). Constants below are derived from measured Doubao output.
|
||||
2. Mask: within the box, extract the light, low-saturation glyph pixels with a
|
||||
polarity-aware rule (the mark is brighter than dark backgrounds and a
|
||||
distinct off-white gray against light backgrounds).
|
||||
3. Inpaint: cv2 inpainting (TELEA / NS) reconstructs the covered pixels.
|
||||
|
||||
This is fast, offline, deterministic, and needs no GPU. A future upgrade path
|
||||
is per-pixel reverse alpha blending once a Doubao alpha map is captured on a
|
||||
controlled black background (see data/doubao_capture/), which would recover the
|
||||
true pixels instead of hallucinating them -- the same approach as the Gemini
|
||||
engine.
|
||||
``locate`` (geometry box, scales with image WIDTH) and ``extract_mask`` (the
|
||||
candidate glyph mask the detector correlates) remain; there is no inpaint-based
|
||||
removal here -- arbitrary-region inpainting lives in ``region_eraser`` / the
|
||||
``erase`` command. Fast, offline, no GPU.
|
||||
"""
|
||||
|
||||
# cv2/numpy boundary: third-party libs ship no usable element types; relax the
|
||||
@@ -33,7 +28,7 @@ from __future__ import annotations
|
||||
|
||||
import logging
|
||||
from dataclasses import dataclass
|
||||
from typing import TYPE_CHECKING, Any, Literal
|
||||
from typing import TYPE_CHECKING, Any
|
||||
|
||||
import cv2
|
||||
import numpy as np
|
||||
@@ -66,17 +61,63 @@ MAX_SATURATION = 55 # max channel spread to count a pixel as "grayish"
|
||||
LOGO_MIN_LUMA = 150 # glyphs are at least this bright in absolute terms
|
||||
TOPHAT_DELTA = 12 # glyph must exceed the local background by this many levels
|
||||
|
||||
# Detection: a genuine label fills a meaningful fraction of the box. Measured
|
||||
# coverage is >=0.20 on real Doubao outputs; random/textured corners stay <=0.06
|
||||
# on large images but can spike to ~0.15 on tiny ones (small box -> high variance),
|
||||
# so the threshold sits above that spike and below the real-mark floor.
|
||||
DETECT_MIN_COVERAGE = 0.16
|
||||
# Detection is reverse-alpha-consistent: the mark is recognized by matching the
|
||||
# bundled alpha-template glyph silhouette (assets/doubao_alpha.png -- the exact
|
||||
# shape we invert) against the extracted candidate mask via zero-mean normalized
|
||||
# correlation (cv2 TM_CCOEFF_NORMED). It keys on the actual "豆包AI生成" glyph
|
||||
# SHAPE, not on coverage/structure heuristics, so a merely-textured corner does
|
||||
# not fire (the old coverage detector false-positived on ~28% of images; #23).
|
||||
# Corpus-tuned: real marks score median ~0.61, arbitrary corners <=0.17 (p99);
|
||||
# threshold 0.4 -> false positives 7/1243 (0.6%). A small coverage floor skips
|
||||
# the template match on a near-empty candidate box.
|
||||
DETECT_MIN_COVERAGE = 0.04
|
||||
DETECT_NCC_THRESHOLD = 0.4
|
||||
|
||||
# Safety: a text strip fills a modest slice of the (generous) box. When the box
|
||||
# is over a dense-text / document background the mask explodes and cv2 inpainting
|
||||
# would smear the real content. Above this coverage we refuse to inpaint and
|
||||
# leave the image untouched -- that hard case needs the neural path, not a guess.
|
||||
MAX_INPAINT_COVERAGE = 0.50
|
||||
# ── Reverse-alpha (exact recovery, Gemini-style) ─────────────────────
|
||||
# The Doubao mark is a fixed semi-transparent white overlay, so given its alpha
|
||||
# map the original pixels are recovered exactly: original = (wm - a*logo)/(1-a).
|
||||
# The alpha map + logo colour were solved from black+gray Doubao captures on a
|
||||
# controlled background (data/doubao_capture/): on black, captured = a*logo, and
|
||||
# the black/gray pair solves a per-pixel WITHOUT assuming the logo colour. The
|
||||
# bundled asset (assets/doubao_alpha.png) is the alpha template (a*255) at the
|
||||
# captured width. The mark scales with image WIDTH, but a pure width-scale is
|
||||
# only sub-pixel-accurate at the captured width and ghosts elsewhere, so removal
|
||||
# does NOT trust fixed geometry: `_aligned_alpha_map` registers the template to
|
||||
# the actual mark by a TM_CCOEFF_NORMED scale+position search, which makes the
|
||||
# single capture work at any resolution (verified clean on 1773x2364). Verified
|
||||
# 2026-05-29: white-capture cross-check -> mark vanishes to a flat fill; clean on
|
||||
# doubao-1.png (2048) and the 3:4 portrait corpus size.
|
||||
_ALPHA_NATIVE_WIDTH = 2048
|
||||
_ALPHA_LOGO_BGR: tuple[float, float, float] = (252.0, 255.0, 255.0)
|
||||
_ALPHA_WIDTH_FRAC = 0.1572 # glyph width / image width -- the alignment scale seed
|
||||
_ALPHA_HEIGHT_FRAC = 0.0347
|
||||
# Margins (of image WIDTH) of the captured mark -- the geometry record / where to
|
||||
# seed; alignment refines the actual position, so these are not load-bearing.
|
||||
_ALPHA_MARGIN_RIGHT_FRAC = 0.0166
|
||||
_ALPHA_MARGIN_BOTTOM_FRAC = 0.0195
|
||||
# Alignment scale search (np.linspace args) around the width-scaled glyph size.
|
||||
_ALPHA_ALIGN_SEARCH = (0.88, 1.12, 13)
|
||||
# At (near) the captured width the fixed geometry is pixel-exact, so we use it
|
||||
# directly there -- NCC alignment is integer-pixel and would land ~1px off,
|
||||
# degrading the otherwise-exact native recovery. Off this band, alignment wins.
|
||||
_ALPHA_NATIVE_BAND = 0.03
|
||||
_alpha_template_cache: NDArray[Any] | None = None
|
||||
|
||||
|
||||
def _alpha_template() -> NDArray[Any] | None:
|
||||
"""Lazily load the bundled Doubao alpha template (float [0,1]), or None."""
|
||||
global _alpha_template_cache
|
||||
if _alpha_template_cache is None:
|
||||
from pathlib import Path
|
||||
|
||||
from remove_ai_watermarks import image_io
|
||||
|
||||
path = Path(__file__).parent / "assets" / "doubao_alpha.png"
|
||||
img = image_io.imread(str(path), cv2.IMREAD_GRAYSCALE)
|
||||
if img is None:
|
||||
return None
|
||||
_alpha_template_cache = img.astype(np.float32) / 255.0
|
||||
return _alpha_template_cache
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
@@ -104,6 +145,39 @@ class DoubaoDetection:
|
||||
coverage: float = 0.0 # fraction of the box occupied by glyph pixels
|
||||
|
||||
|
||||
_silhouette_cache: NDArray[Any] | None = None
|
||||
|
||||
|
||||
def _glyph_silhouette() -> NDArray[Any] | None:
|
||||
"""Binary "豆包AI生成" silhouette (255 = glyph) from the bundled alpha map,
|
||||
used as the detection template. None if the alpha asset is missing."""
|
||||
global _silhouette_cache
|
||||
if _silhouette_cache is None:
|
||||
at = _alpha_template()
|
||||
if at is None:
|
||||
return None
|
||||
_silhouette_cache = (at > 0.15).astype(np.uint8) * 255
|
||||
return _silhouette_cache
|
||||
|
||||
|
||||
def _template_match_score(box_mask: NDArray[Any], image_width: int) -> float:
|
||||
"""Zero-mean normalized correlation of the alpha-template glyph silhouette
|
||||
(scaled to the mark's expected size) against the candidate ``box_mask``.
|
||||
|
||||
TM_CCOEFF_NORMED keys on glyph SHAPE, not coverage, so a dense textured
|
||||
corner does not score highly -- only the actual "豆包AI生成" shape does.
|
||||
"""
|
||||
sil = _glyph_silhouette()
|
||||
if sil is None or box_mask.size == 0:
|
||||
return 0.0
|
||||
gw = min(box_mask.shape[1] - 1, max(8, int(_ALPHA_WIDTH_FRAC * image_width)))
|
||||
gh = min(box_mask.shape[0] - 1, max(4, int(_ALPHA_HEIGHT_FRAC * image_width)))
|
||||
if gw < 8 or gh < 4:
|
||||
return 0.0
|
||||
template = cv2.resize(sil, (gw, gh), interpolation=cv2.INTER_NEAREST)
|
||||
return float(cv2.matchTemplate(box_mask, template, cv2.TM_CCOEFF_NORMED).max())
|
||||
|
||||
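
To illustrate why a zero-mean normalized correlation keys on shape rather than coverage, here is a NumPy-only sketch of the score at a single, already-aligned position. The engine itself uses `cv2.matchTemplate` with `TM_CCOEFF_NORMED`, which evaluates the same quantity at every offset; `zero_mean_ncc` and the toy arrays in the usage note are illustrative, not repository code.

```python
import numpy as np


def zero_mean_ncc(patch: np.ndarray, template: np.ndarray) -> float:
    """Zero-mean normalized correlation of two equally sized arrays, in [-1, 1].

    Returns 0.0 by convention when either input is flat (zero variance).
    """
    if patch.shape != template.shape:
        raise ValueError("patch and template must have the same shape")
    p = patch.astype(np.float64) - patch.mean()
    t = template.astype(np.float64) - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    if denom == 0.0:
        return 0.0
    return float((p * t).sum() / denom)
```

A fully filled candidate box has 100% coverage but zero variance, so it scores 0.0 against any glyph template, while the template against itself scores 1.0. That is the property the new detector depends on: dense texture alone cannot reach the threshold.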
|
||||
class DoubaoEngine:
|
||||
"""Remove the visible Doubao "豆包AI生成" watermark (locate -> mask -> inpaint)."""
|
||||
|
||||
@@ -176,10 +250,12 @@ class DoubaoEngine:
|
||||
# ── Detect ────────────────────────────────────────────────────────
|
||||
|
||||
def detect(self, image: NDArray[Any]) -> DoubaoDetection:
|
||||
"""Detect the visible Doubao mark by glyph coverage in the corner box.
|
||||
"""Detect the visible Doubao mark by matching the alpha-template glyph
|
||||
silhouette against the corner candidate (TM_CCOEFF_NORMED).
|
||||
|
||||
Heuristic: a genuine label fills a meaningful fraction of the box with
|
||||
text-like glyph pixels. Coverage maps to a confidence score.
|
||||
Keys on the "豆包AI生成" SHAPE, not coverage, so a textured corner does
|
||||
not fire. ``confidence`` is the correlation score; ``detected`` is it
|
||||
clearing ``DETECT_NCC_THRESHOLD``.
|
||||
"""
|
||||
det = DoubaoDetection()
|
||||
if image is None or image.size == 0:
|
||||
@@ -191,53 +267,113 @@ class DoubaoEngine:
|
||||
coverage = float((box > 0).sum()) / float(max(1, bw * bh))
|
||||
det.region = loc.bbox
|
||||
det.coverage = coverage
|
||||
# Map coverage to a 0-1 confidence: ~0.06 (noise floor) -> 0, ~0.26 -> 1.
|
||||
det.confidence = float(max(0.0, min(1.0, (coverage - 0.06) / 0.20)))
|
||||
det.detected = coverage >= DETECT_MIN_COVERAGE
|
||||
logger.debug("Doubao detect: coverage=%.3f conf=%.3f", coverage, det.confidence)
|
||||
if coverage >= DETECT_MIN_COVERAGE:
|
||||
score = _template_match_score(box, image.shape[1])
|
||||
det.confidence = score
|
||||
det.detected = score >= DETECT_NCC_THRESHOLD
|
||||
logger.debug("Doubao detect: coverage=%.3f ncc=%.2f detected=%s", coverage, score, det.detected)
|
||||
return det
|
||||
|
||||
# ── Remove ────────────────────────────────────────────────────────
|
||||
# ── Reverse-alpha (exact recovery) ────────────────────────────────
|
||||
|
||||
def remove_watermark(
|
||||
self,
|
||||
image: NDArray[Any],
|
||||
*,
|
||||
inpaint_method: Literal["telea", "ns"] = "telea",
|
||||
inpaint_radius: int = 6,
|
||||
dilate: int = 3,
|
||||
) -> NDArray[Any]:
|
||||
"""Remove the visible Doubao watermark by inpainting the glyph mask.
|
||||
def reverse_alpha_available(self, image: NDArray[Any]) -> bool:
|
||||
"""True if the bundled alpha map is loadable. NCC alignment
|
||||
(see ``_aligned_alpha_map``) places it on the actual mark at ANY
|
||||
resolution, so there is no width gate -- the caller still gates on
|
||||
``detect`` so a clean corner is never touched."""
|
||||
return image is not None and image.size > 0 and _alpha_template() is not None
|
||||
|
||||
Returns an unmodified copy when no glyph pixels are found (so we never
|
||||
smear a clean corner). ``dilate`` grows the mask to cover anti-aliased
|
||||
glyph edges before inpainting.
|
||||
"""
|
||||
if image is None or image.size == 0:
|
||||
return image
|
||||
def _fixed_alpha_map(self, image: NDArray[Any]) -> tuple[NDArray[Any], tuple[int, int, int, int]] | None:
|
||||
"""Place the template by fixed width-relative geometry -- pixel-exact at
|
||||
the captured width (used there instead of integer-pixel NCC alignment)."""
|
||||
at = _alpha_template()
|
||||
if at is None:
|
||||
return None
|
||||
h, w = image.shape[:2]
|
||||
gw, gh = max(1, int(_ALPHA_WIDTH_FRAC * w)), max(1, int(_ALPHA_HEIGHT_FRAC * w))
|
||||
ax = max(0, w - int(_ALPHA_MARGIN_RIGHT_FRAC * w) - gw)
|
||||
ay = max(0, h - int(_ALPHA_MARGIN_BOTTOM_FRAC * w) - gh)
|
||||
amap = np.zeros((h, w), np.float32)
|
||||
amap[ay : ay + gh, ax : ax + gw] = cv2.resize(at, (gw, gh), interpolation=cv2.INTER_LINEAR)
|
||||
return amap, (ax, ay, gw, gh)
|
||||
|
||||
def _aligned_alpha_map(self, image: NDArray[Any]) -> tuple[NDArray[Any], tuple[int, int, int, int]] | None:
|
||||
"""Build a full-image alpha map with the captured template registered to
|
||||
the actual mark via a TM_CCOEFF_NORMED scale + position search -- so the
|
||||
single capture works off the captured width (a pure width-scale ghosts).
|
||||
Returns ``(alpha_map, glyph_bbox)`` or None."""
|
||||
at = _alpha_template()
|
||||
sil = _glyph_silhouette()
|
||||
if at is None or sil is None:
|
||||
return None
|
||||
h, w = image.shape[:2]
|
||||
loc = self.locate(image)
|
||||
mask = self.extract_mask(image, loc)
|
||||
if not mask.any():
|
||||
logger.debug("Doubao remove: no glyph pixels found; returning copy")
|
||||
bx, by, bw, bh = loc.bbox
|
||||
box_mask = self.extract_mask(image, loc)[by : by + bh, bx : bx + bw]
|
||||
expected = _ALPHA_WIDTH_FRAC * w
|
||||
best: tuple[float, int, int, int, int] | None = None
|
||||
for scale in np.linspace(*_ALPHA_ALIGN_SEARCH):
|
||||
gw, gh = int(expected * scale), int(_ALPHA_HEIGHT_FRAC * w * scale)
|
||||
if gw < 8 or gh < 4 or gw >= bw or gh >= bh:
|
||||
continue
|
||||
t = cv2.resize(sil, (gw, gh), interpolation=cv2.INTER_NEAREST)
|
||||
_, score, _, top_left = cv2.minMaxLoc(cv2.matchTemplate(box_mask, t, cv2.TM_CCOEFF_NORMED))
|
||||
if best is None or score > best[0]:
|
||||
best = (score, gw, gh, top_left[0], top_left[1])
|
||||
if best is None:
|
||||
return None
|
||||
_, gw, gh, ox, oy = best
|
||||
ax, ay = bx + ox, by + oy
|
||||
amap = np.zeros((h, w), np.float32)
|
||||
amap[ay : ay + gh, ax : ax + gw] = cv2.resize(at, (gw, gh), interpolation=cv2.INTER_LINEAR)
|
||||
return amap, (ax, ay, gw, gh)
|
||||
|
||||
def _apply_reverse_alpha(self, image: NDArray[Any], amap: NDArray[Any]) -> NDArray[Any]:
|
||||
"""Invert the alpha blend with ``amap``: ``original = (wm - a*logo)/(1-a)``."""
|
||||
a3 = np.clip(amap, 0.0, 1.0)[:, :, None]
|
||||
logo = np.array(_ALPHA_LOGO_BGR, np.float32)
|
||||
return np.clip((image.astype(np.float32) - a3 * logo) / np.clip(1.0 - a3, 0.25, 1.0), 0, 255).astype(np.uint8)
|
||||
|
||||
def remove_watermark_reverse_alpha(self, image: NDArray[Any], *, residual_inpaint: bool = True) -> NDArray[Any]:
|
||||
"""Recover the original pixels by inverting the alpha blend
|
||||
``original = (wm - a*logo)/(1-a)``.
|
||||
|
||||
Placement: at (near) the captured width the fixed geometry is pixel-exact,
|
||||
so the recovery is returned UNTOUCHED -- inpainting over exactly-recovered
|
||||
interior pixels only swaps them for a cv2 hallucination (measured worse on
|
||||
textured backgrounds: native error vs true bg 1.6 reverse-alpha-only vs
|
||||
2.6 with full-footprint inpaint). Off-native, NCC alignment registers the
|
||||
template to the real mark; the alignment is only sub-pixel-approximate, so
|
||||
the interior recovery is no longer exact and the seam can re-trip the
|
||||
detector. There we try BOTH placements and keep whichever leaves the least
|
||||
residual mark (on a faint/busy-background mark the NCC peak can wander a
|
||||
few px, where geometry wins; on a clear mark alignment wins) -- no magic
|
||||
threshold, it just picks the better removal -- then a residual inpaint over
|
||||
the glyph footprint cleans the seam (the interior is approximate anyway, so
|
||||
inpaint there costs nothing and reliably clears the mark).
|
||||
Call only when :meth:`reverse_alpha_available` and the mark is detected.
|
||||
"""
|
||||
at_native = abs(image.shape[1] / _ALPHA_NATIVE_WIDTH - 1.0) <= _ALPHA_NATIVE_BAND
|
||||
if at_native:
|
||||
amap = self._fixed_alpha_map(image)
|
||||
return self._apply_reverse_alpha(image, amap[0]) if amap is not None else image.copy()
|
||||
maps = [c for c in (self._fixed_alpha_map(image), self._aligned_alpha_map(image)) if c is not None]
|
||||
if not maps:
|
||||
return image.copy()
|
||||
|
||||
x, y, bw, bh = loc.bbox
|
||||
coverage = float((mask[y : y + bh, x : x + bw] > 0).sum()) / float(max(1, bw * bh))
|
||||
if coverage > MAX_INPAINT_COVERAGE:
|
||||
logger.warning(
|
||||
"Doubao remove: box coverage %.2f exceeds %.2f (dense-text/document "
|
||||
"background); leaving image untouched to avoid smearing content",
|
||||
coverage,
|
||||
MAX_INPAINT_COVERAGE,
|
||||
)
|
||||
best_out: NDArray[Any] | None = None
|
||||
best_amap: NDArray[Any] | None = None
|
||||
best_residual = float("inf")
|
||||
for amap, _region in maps:
|
||||
out = self._apply_reverse_alpha(image, amap)
|
||||
residual = self.detect(out).confidence
|
||||
if residual < best_residual:
|
||||
best_residual, best_out, best_amap = residual, out, amap
|
||||
if best_out is None or best_amap is None: # pragma: no cover - maps is non-empty
|
||||
return image.copy()
|
||||
|
||||
if dilate > 0:
|
||||
k = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2 * dilate + 1, 2 * dilate + 1))
|
||||
mask = cv2.dilate(mask, k)
|
||||
|
||||
flag = cv2.INPAINT_TELEA if inpaint_method == "telea" else cv2.INPAINT_NS
|
||||
return cv2.inpaint(image, mask, inpaint_radius, flag)
|
||||
if residual_inpaint:
|
||||
rm = cv2.dilate((best_amap > 0.10).astype(np.uint8) * 255, np.ones((3, 3), np.uint8))
|
||||
best_out = cv2.inpaint(best_out, rm, 3, cv2.INPAINT_TELEA)
|
||||
return best_out
|
||||
|
||||
|
||||
def load_image_bgr(path: str | Path) -> NDArray[Any]:
|
||||
|
||||
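
The reverse-alpha recovery described in the `doubao_engine` changes above can be sketched in NumPy. The blend model is `wm = a*logo + (1-a)*original`, so on a black capture `wm = a*logo`, and subtracting the black capture from a gray one leaves `(1-a)*gray`, which gives `a` without knowing the logo colour. The function names, the `min_alpha` cut-off and the float64 arithmetic are assumptions for illustration; the bundled asset was produced by the author's own capture process, which this diff does not show.

```python
import numpy as np


def solve_alpha(on_black: np.ndarray, on_gray: np.ndarray, gray_level: float) -> np.ndarray:
    """Per-pixel alpha from a black and a flat-gray capture (H x W x 3 inputs)."""
    diff = on_gray.astype(np.float64) - on_black.astype(np.float64)
    alpha = 1.0 - diff / gray_level  # diff == (1 - a) * gray_level
    return np.clip(alpha.mean(axis=-1), 0.0, 1.0)


def estimate_logo(on_black: np.ndarray, alpha: np.ndarray, min_alpha: float = 0.2) -> np.ndarray:
    """Mean logo colour from the black capture, using only well-covered pixels."""
    mask = alpha >= min_alpha
    if not mask.any():
        raise ValueError("no pixel has alpha >= min_alpha")
    return (on_black[mask].astype(np.float64) / alpha[mask][:, None]).mean(axis=0)


def reverse_alpha(
    wm: np.ndarray, alpha: np.ndarray, logo: np.ndarray, min_denominator: float = 0.25
) -> np.ndarray:
    """Invert the blend: original = (wm - a*logo) / (1 - a).

    The denominator is floored (0.25, as in the diff) so near-opaque pixels
    do not blow up; those pixels are then no longer recovered exactly.
    """
    a = np.clip(alpha, 0.0, 1.0)[..., None]
    out = (wm.astype(np.float64) - a * logo) / np.clip(1.0 - a, min_denominator, 1.0)
    return np.clip(out, 0.0, 255.0)
```

On synthetic data blended with alpha at or below 0.75 the round trip is exact up to floating-point error; on real 8-bit images, quantization and compression add error on top, which is why the diff still measures residual error against the true background.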
@@ -25,14 +25,15 @@ from typing import TYPE_CHECKING
|
||||
from remove_ai_watermarks.metadata import (
|
||||
AI_METADATA_KEYS,
|
||||
AIGC_MARKERS,
|
||||
C2PA_UUID,
|
||||
IPTC_AI_FIELD_MARKERS,
|
||||
IPTC_AI_MARKERS,
|
||||
aigc_label,
|
||||
c2pa_marker_in,
|
||||
exif_generator,
|
||||
get_ai_metadata,
|
||||
huggingface_job,
|
||||
iptc_ai_system,
|
||||
samsung_genai,
|
||||
scan_head,
|
||||
xai_signature,
|
||||
)
|
||||
@@ -65,6 +66,8 @@ _ISSUER_PLATFORM: tuple[tuple[str, str], ...] = (
|
||||
("OpenAI", "OpenAI (ChatGPT / gpt-image / DALL-E / Sora)"),
|
||||
("Google", "Google (Gemini / Imagen)"),
|
||||
("Stability AI", "Stability AI (Stable Image / DreamStudio)"),
|
||||
("Black Forest Labs", "Black Forest Labs (FLUX)"),
|
||||
("ByteDance", "ByteDance (Doubao / Jimeng / Volcano Engine)"),
|
||||
)
|
||||
|
||||
# PNG-text / EXIF keys that indicate a local diffusion pipeline (vs. a hosted
|
||||
@@ -95,6 +98,12 @@ _HF_JOB_CAVEAT = (
|
||||
"generation) but names neither the model nor the content type, so it is a "
|
||||
"medium-confidence signal, not proof the pixels are AI-generated."
|
||||
)
|
||||
_SAMSUNG_GENAI_CAVEAT = (
|
||||
"Samsung's genAIType marker shows a Galaxy AI editing tool (Generative Edit, "
|
||||
"Sketch to Image, ...) touched the image; it is an undocumented proprietary "
|
||||
"field, so it is a medium-confidence signal of AI editing, not proof the "
|
||||
"whole image is AI-generated."
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
@@ -151,7 +160,9 @@ def _ai_tools_in(data: bytes) -> list[str]:
|
||||
# assert is_ai on their own (the verdict still comes from the digital-source-type:
|
||||
# the Pixel sample carries `computationalCapture`, not `trainedAlgorithmicMedia`).
|
||||
# Only tokens verified against a real signed file are listed (Leica, Nikon,
|
||||
# Truepic, Google Pixel); add Sony/Canon/Samsung/Bria as real samples are captured.
|
||||
# Sony, Truepic, Google Pixel); add Canon/Bria as real samples are captured.
|
||||
# Samsung Galaxy is an AI-capable editing device, not a pure-capture camera, so
|
||||
# it lives in `_SIGNER_C2PA_PLATFORM` below (it must not feed the camera clash).
|
||||
_DEVICE_C2PA_PLATFORM: tuple[tuple[bytes, str], ...] = (
|
||||
(b"lc_c2pa", "Leica (camera, C2PA capture)"),
|
||||
(b"Leica Camera", "Leica (camera, C2PA capture)"),
|
||||
@@ -177,6 +188,32 @@ def _device_platform(head: bytes) -> str | None:
|
||||
return None
|
||||
|
||||
|
||||
# C2PA signers that are an editing app or AI-capable device rather than a
|
||||
# verified-capture camera. Unlike `_DEVICE_C2PA_PLATFORM`, these do NOT feed the
|
||||
# camera-vs-AI integrity clash (rule 2 in `_integrity_clashes`): a Galaxy phone
|
||||
# legitimately stamps BOTH its device credentials AND a `trainedAlgorithmicMedia`
|
||||
# source type on a Generative-Edit image, so treating it as a "genuine camera
|
||||
# capture" would false-flag every Galaxy AI edit. They only resolve the platform
|
||||
# label; the AI verdict still comes from the digital-source-type / genAIType.
|
||||
# Tokens verified against real signed files (2026-05-29):
|
||||
# Samsung Galaxy -- cert org on Galaxy S23 FE / S24 / S25 C2PA JPEGs/PNGs
|
||||
# (distinct from the EXIF "SM-xxxx" model string on ordinary Samsung photos).
|
||||
# com.asus.gallery -- ASUS Gallery claim_generator (a C2PA-signed edit, no AI
|
||||
# source type or genAIType on the samples, so it never asserts is_ai).
|
||||
_SIGNER_C2PA_PLATFORM: tuple[tuple[bytes, str], ...] = (
|
||||
(b"Samsung Galaxy", "Samsung Galaxy (C2PA)"),
|
||||
(b"com.asus.gallery", "ASUS Gallery (C2PA signer)"),
|
||||
)
|
||||
|
||||
|
||||
def _signer_platform(head: bytes) -> str | None:
|
||||
"""Map a C2PA editing-app / AI-capable-device signer token to a platform."""
|
||||
for token, platform in _SIGNER_C2PA_PLATFORM:
|
||||
if token in head:
|
||||
return platform
|
||||
return None
|
||||
|
||||
|
||||
def _attribute_platform(issuers: list[str], *, is_ai: bool = True) -> str | None:
|
||||
"""Map a set of C2PA issuer names to a human-readable generating platform.
|
||||
|
||||
@@ -353,9 +390,10 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
     # neither is a trustworthy "the generator stamped its identity" claim.
     ai_vendor_claims: dict[str, str] = {}
     camera_label = _device_platform(head)
+    signer_label = _signer_platform(head)

     # ── C2PA Content Credentials ────────────────────────────────────
-    has_c2pa = bool(info) or b"c2pa" in head.lower() or C2PA_UUID in head
+    has_c2pa = bool(info) or c2pa_marker_in(head)
     issuers = [info["issuer"]] if info.get("issuer") else _issuers_in(head)
     c2pa_is_ai = "trainedAlgorithmicMedia" in info.get("source_type", "") or any(
         m in head for m in (b"trainedAlgorithmicMedia", b"compositeWithTrainedAlgorithmicMedia")
@@ -370,10 +408,11 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
         or (", ".join(tools) if (tools := _ai_tools_in(head)) else None)
     )
     # Platform: a distinctive device/camera token in the manifest wins (it is the
-    # signer/producer), with the issuer byte-scan only as fallback. The issuer
-    # scan alone mis-attributed real samples (Leica->Truepic timestamp authority,
-    # Nikon->Adobe namespace, Pixel->Google Gemini) -- the device scan fixes that.
-    platform = (camera_label or _attribute_platform(issuers, is_ai=c2pa_is_ai)) if has_c2pa else None
+    # signer/producer), then an editing-app/AI-device signer (Samsung Galaxy,
+    # ASUS Gallery), with the issuer byte-scan only as fallback. The issuer scan
+    # alone mis-attributed real samples (Leica->Truepic timestamp authority,
+    # Nikon->Adobe namespace, Pixel->Google Gemini) -- the token scans fix that.
+    platform = (camera_label or signer_label or _attribute_platform(issuers, is_ai=c2pa_is_ai)) if has_c2pa else None
     if has_c2pa:
         detail = ", ".join(filter(None, [", ".join(issuers), generator, info.get("source_type")]))
         signals.append(Signal("c2pa", detail or "C2PA manifest present", "high"))
@@ -484,6 +523,22 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
         if platform is None:
             platform = "HuggingFace-hosted job (model not identified)"

+    # ── Samsung Galaxy AI editing marker (genAIType) ─────────────────
+    # Galaxy AI tools stamp a proprietary genAIType in PhotoEditor_Re_Edit_Data.
+    # Medium confidence: it co-occurs with the C2PA trainedAlgorithmicMedia type
+    # on Galaxy files that record one, and is the SOLE AI marker on a Galaxy S24
+    # sample that omits the source type -- so it lifts an otherwise-Unknown
+    # verdict, but the field is undocumented, so it never overrides a high-
+    # confidence signal. The platform is usually already "Samsung Galaxy" via the
+    # signer-token scan; the fallback covers a future file without the cert org.
+    samsung_genai_type = samsung_genai(image_path)
+    if samsung_genai_type is not None:
+        signals.append(Signal("samsung_genai", f"Samsung genAIType={samsung_genai_type}", "medium"))
+        watermarks.append("Samsung Galaxy AI editing marker (genAIType)")
+        caveats.append(_SAMSUNG_GENAI_CAVEAT)
+        if platform is None:
+            platform = "Samsung Galaxy (Galaxy AI editing)"
+
     # ── Open invisible watermark (SD / SDXL / FLUX, dwtDct) ──────────
     # Public decoder, no key -- a definitive embedded signal on pristine files.
     if check_invisible and (scheme := _invisible_watermark(image_path)) is not None:
@@ -527,11 +582,12 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b

     visible_only = any(s.name == "visible_sparkle" for s in signals) and not ai_from_metadata
     hf_only = bool(hf_job) and not ai_from_metadata
+    samsung_only = samsung_genai_type is not None and not ai_from_metadata

     if ai_from_metadata:
         is_ai: bool | None = True
         confidence = "high"
-    elif visible_only or hf_only:
+    elif visible_only or hf_only or samsung_only:
         is_ai = True
         confidence = "medium"
     else:
@@ -65,6 +65,22 @@ AI_KEYWORDS: tuple[str, ...] = (
 # Reference: https://spec.c2pa.org/specifications/specifications/2.1/specs/C2PA_Specification.html
 C2PA_UUID: bytes = bytes.fromhex("d8fec3d61b0e483c92975828877ec481")

+
+def c2pa_marker_in(data: bytes) -> bool:
+    """True if ``data`` carries a real C2PA manifest marker, not just an
+    incidental 4-byte ``c2pa`` substring.
+
+    A bare ``c2pa`` byte match false-positives on compressed pixel data -- a
+    recompressed PNG IDAT (or any large binary) can contain the bytes ``c2pa``
+    by chance (verified 2026-05-29: 4 cleaned PNGs re-flagged this way after
+    their manifest was correctly stripped). Every real manifest is JUMBF-wrapped
+    (the ``jumb`` box FourCC accompanies the ``c2pa`` content type) or uses the
+    standalone C2PA ``uuid`` box in ISOBMFF, so we require one of those: the
+    joint ``jumb`` + ``c2pa`` match has negligible random-collision probability.
+    """
+    return C2PA_UUID in data or (b"jumb" in data and b"c2pa" in data.lower())
+
+
 # IPTC ``digitalSourceType`` values (IPTC 2025.1) that flag AI provenance.
 # Used by Instagram, Facebook, X (Twitter) to show "Made with AI" labels.
 IPTC_AI_MARKERS: tuple[bytes, ...] = (
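To make the difference between the old bare-substring check and `c2pa_marker_in` concrete, the sketch below runs both on two synthetic byte strings. The function bodies are copied from the diff; the byte blobs are made up for illustration and are not real PNG or JPEG files.

```python
C2PA_UUID = bytes.fromhex("d8fec3d61b0e483c92975828877ec481")


def bare_match(data: bytes) -> bool:
    """The previous check: any 'c2pa' substring (or the uuid) counts."""
    return b"c2pa" in data.lower() or C2PA_UUID in data


def c2pa_marker_in(data: bytes) -> bool:
    """The new check: the uuid box, or 'jumb' together with 'c2pa'."""
    return C2PA_UUID in data or (b"jumb" in data and b"c2pa" in data.lower())


# Synthetic "compressed pixel data" that happens to contain the four bytes.
noise = b"\x89PNG\r\n\x1a\nIDAT\x78\x9c\x01c2pa\xfe\x00"
# Synthetic JUMBF-style wrapper, same shape as the test fixtures in this PR.
wrapped = b"\xff\xd8\xff\xe1jumbc2pa" + b"manifest bytes" + b"\xff\xd9"

print(bare_match(noise), c2pa_marker_in(noise))      # prints "True False"
print(bare_match(wrapped), c2pa_marker_in(wrapped))  # prints "True True"
```

The stricter check still accepts the standalone uuid form, so `c2pa_marker_in(C2PA_UUID)` is true even without a `jumb` box.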
@@ -213,9 +229,7 @@ def has_ai_metadata(image_path: Path) -> bool:
     # Binary scan covers C2PA (PNG caBX, JPEG APP11, AVIF/HEIF/JXL uuid boxes)
     # and IPTC AI markers in XMP. First 512KB (plus late ISOBMFF provenance boxes).
     data = scan_head(image_path, 512 * 1024)
-    if b"c2pa" in data.lower() or b"C2PA" in data:
-        return True
-    if C2PA_UUID in data:
+    if c2pa_marker_in(data):
         return True
     if any(marker in data for marker in AIGC_MARKERS):
         return True
@@ -310,6 +324,39 @@ def huggingface_job(image_path: Path) -> str | None:
     return None


+# Samsung Galaxy AI editing marker. Galaxy AI tools (Generative Edit, Sketch to
+# Image, Portrait Studio, Drawing Assist, ...) record their re-edit data as a
+# proprietary ``PhotoEditor_Re_Edit_Data`` JSON that carries a ``genAIType``
+# field; a non-zero value flags that a generative-AI tool produced or altered
+# the pixels. The field is undocumented by Samsung (verified 2026-05-29: absent
+# from the C2PA spec and Samsung's public docs/forums), so detection is
+# empirical -- on real Galaxy S23/S24/S25 files it co-occurs with the C2PA
+# ``trainedAlgorithmicMedia`` source type (3/3 of the verified files that record
+# that type), and on a Galaxy S24 sample it is the *only* AI marker (the C2PA
+# source type was absent there). Medium confidence: it signals Galaxy AI editing
+# without proving the whole image is AI-generated. Scoped to the Samsung editor
+# container to avoid matching a stray ``genAIType`` token elsewhere.
+_SAMSUNG_GENAI_RE = re.compile(rb'genAIType"\s*:\s*(-?\d+)')
+_SAMSUNG_EDITOR_MARKER = b"PhotoEditor_Re_Edit_Data"
+
+
+def samsung_genai(image_path: Path) -> int | None:
+    """Return Samsung's non-zero ``genAIType`` value if the image carries the
+    Galaxy AI editing marker, else None.
+
+    See the module note above ``_SAMSUNG_GENAI_RE``: detection is empirical and
+    gated on the ``PhotoEditor_Re_Edit_Data`` container so an incidental
+    ``genAIType`` token cannot false-positive.
+    """
+    head = scan_head(image_path, 512 * 1024)
+    if _SAMSUNG_EDITOR_MARKER not in head:
+        return None
+    m = _SAMSUNG_GENAI_RE.search(head)
+    if m is None:
+        return None
+    return int(m.group(1)) or None
+
+
 def iptc_ai_system(image_path: Path) -> str | None:
     """Return an IPTC 2025.1 AI-disclosure note if the file carries those XMP
     properties, else None.
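The gating in `samsung_genai` (container marker first, then the regex, then "zero means no marker") can be tried on raw bytes without touching the file system. The sketch below reuses the regex and marker from the diff; `genai_type_from_bytes` is a hypothetical name, and the byte strings are synthetic blobs in the style of the PR's tests.

```python
from __future__ import annotations

import re

_GENAI_RE = re.compile(rb'genAIType"\s*:\s*(-?\d+)')
_EDITOR_MARKER = b"PhotoEditor_Re_Edit_Data"


def genai_type_from_bytes(head: bytes) -> int | None:
    """Same logic as samsung_genai(), applied to bytes already in memory."""
    if _EDITOR_MARKER not in head:
        return None  # a stray genAIType token outside the Samsung container is ignored
    m = _GENAI_RE.search(head)
    if m is None:
        return None
    return int(m.group(1)) or None  # 0 collapses to None: no generative edit recorded


edited = b'Samsung Galaxy PhotoEditor_Re_Edit_Data{"genAIType":1}'
untouched = b'Samsung Galaxy PhotoEditor_Re_Edit_Data{"genAIType": 0}'
stray = b'some other app {"genAIType":1}'

print(genai_type_from_bytes(edited))     # prints "1"
print(genai_type_from_bytes(untouched))  # prints "None"
print(genai_type_from_bytes(stray))      # prints "None"
```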
@@ -360,7 +407,7 @@ def synthid_source(image_path: Path) -> str | None:
     # C2PA manifest where the PNG parser can't reach it. Binary-scan for the
     # same signal: a C2PA manifest from a SynthID-using issuer on AI content.
     data = scan_head(image_path)
-    has_c2pa = b"c2pa" in data.lower() or C2PA_UUID in data
+    has_c2pa = c2pa_marker_in(data)
     # Matches both "trainedAlgorithmicMedia" and "compositeWithTrainedAlgorithmicMedia".
     ai_source = b"trainedAlgorithmicMedia" in data or b"TrainedAlgorithmicMedia" in data
     if not (has_c2pa and ai_source):
@@ -585,6 +632,9 @@ def get_ai_metadata(image_path: Path) -> dict[str, str]:
     # HuggingFace-hosted job marker (hf-job-id PNG text chunk).
     if job := huggingface_job(image_path):
         result.setdefault("huggingface_job", f"HuggingFace-hosted job ({job})")
+    # Samsung Galaxy AI editing marker (genAIType in PhotoEditor_Re_Edit_Data).
+    if (genai := samsung_genai(image_path)) is not None:
+        result.setdefault("samsung_genai", f"Samsung Galaxy AI editing marker (genAIType={genai})")
     return result
@@ -88,6 +88,14 @@ C2PA_ISSUERS = {
     # Stability AI signs C2PA as "Stability AI" (cert org "Stability AI Ltd").
     # Verified on a live Brand Studio (DreamStudio successor) output, 2026-05-24.
     b"Stability AI": "Stability AI",
+    # Black Forest Labs (FLUX) API output: claim_generator_info "Black Forest
+    # Labs API" + a c2pa.ai_generated_content assertion + trainedAlgorithmicMedia.
+    # Verified on a real signed FLUX JPEG, 2026-05-29.
+    b"Black Forest Labs": "Black Forest Labs",
+    # ByteDance's Volcano Engine (Volcengine) signs its AI image output with a
+    # cert from certificate_center@volcengine.com -- the platform behind Doubao /
+    # Jimeng. Verified on two real signed JPEGs, 2026-05-29.
+    b"volcengine": "ByteDance (Volcano Engine)",
 }

 # C2PA issuers whose signed outputs also carry an invisible SynthID pixel
@@ -51,12 +51,31 @@ def _decoder() -> Any:
     return _tm


+# JPEG quality for the false-positive durability gate (see detect_trustmark).
+# Deliberately mild: a genuine TrustMark survives far harsher, while every
+# observed false positive collapsed even at this quality.
+_REENCODE_QUALITY = 95
+
+
 def detect_trustmark(image_path: Path) -> str | None:
-    """Return a TrustMark scheme note if a TrustMark watermark is decoded, else None.
+    """Return a TrustMark scheme note if a *durable* TrustMark watermark is
+    decoded, else None.

     Returns e.g. ``"Adobe TrustMark (variant P, schema 0)"`` when the decoder
-    reports the watermark present, or None if it is absent, the optional
-    ``trustmark`` package is not installed, or the image cannot be read/decoded.
+    reports the watermark present AND it survives a mild JPEG re-encode, or None
+    if it is absent, the optional ``trustmark`` package is not installed, or the
+    image cannot be read/decoded.
+
+    **False-positive gate.** TrustMark's ``wm_present`` flag is a BCH
+    error-correction validity check, which spuriously validates on a small
+    fraction of un-watermarked images -- content-correlated, so AI-generated
+    textures trip it more often than camera photos (verified 2026-05-29 on real
+    files: the false "detections" were on Gemini / OpenAI / Doubao output that
+    cannot carry Adobe's watermark, and decoded a random-bytes secret). A genuine
+    TrustMark is a *durable* soft binding engineered to survive re-encoding (that
+    is its entire purpose once C2PA is stripped), so we re-decode after a mild
+    JPEG round-trip and require the same schema both times. Every observed false
+    positive collapsed under this gate.
     """
     if not is_available():
         return None
@@ -65,8 +84,30 @@ def detect_trustmark(image_path: Path) -> str | None:

         with Image.open(image_path) as img:
             cover = img.convert("RGB")
-        _wm_secret, wm_present, wm_schema = _decoder().decode(cover)
+        decoder = _decoder()
+        _wm_secret, wm_present, wm_schema = decoder.decode(cover)
+        if not wm_present:
+            return None
+        if not _survives_reencode(decoder, cover, wm_schema):
+            log.debug("TrustMark decode for %s did not survive re-encode; treating as false positive", image_path)
+            return None
     except Exception as exc:  # model download / decode failure / unreadable image
         log.debug("TrustMark decode failed for %s: %s", image_path, exc)
         return None
-    return f"Adobe TrustMark (variant {_MODEL_TYPE}, schema {wm_schema})" if wm_present else None
+    return f"Adobe TrustMark (variant {_MODEL_TYPE}, schema {wm_schema})"
+
+
+def _survives_reencode(decoder: Any, cover: Any, schema: int) -> bool:
+    """True if the watermark re-decodes with the same schema after a mild JPEG
+    round-trip -- the durability a genuine TrustMark guarantees, which a BCH
+    false positive (content noise) does not."""
+    import io
+
+    from PIL import Image
+
+    buffer = io.BytesIO()
+    cover.save(buffer, "JPEG", quality=_REENCODE_QUALITY)
+    buffer.seek(0)
+    with Image.open(buffer) as reencoded:
+        _secret, present, reencoded_schema = decoder.decode(reencoded.convert("RGB"))
+    return bool(present) and reencoded_schema == schema
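The control flow of the durability gate in `detect_trustmark` (decode, bail out early on the common "absent" case, re-encode, decode again, require the same schema) is independent of TrustMark and Pillow. The sketch below shows that flow with stub callables; `durable_schema`, `fake_decode` and `fake_reencode` are hypothetical names, and the stubs only imitate the behaviour described in the docstring rather than any real decoder.

```python
from __future__ import annotations

from typing import Callable, NamedTuple


class Decoded(NamedTuple):
    secret: str
    present: bool
    schema: int


def durable_schema(
    decode: Callable[[object], Decoded],
    reencode: Callable[[object], object],
    image: object,
) -> int | None:
    """Return the schema only if it decodes both before and after re-encoding."""
    first = decode(image)
    if not first.present:
        return None  # common case: the second decode never runs
    second = decode(reencode(image))
    if not (second.present and second.schema == first.schema):
        return None  # did not survive the round trip: treat as a false positive
    return first.schema


# Stubs: an "image" is a (kind, generation) pair and re-encoding bumps the
# generation. A real mark decodes at every generation; content noise is
# assumed to validate by chance only on the original pixels.
def fake_reencode(image):
    kind, generation = image
    return (kind, generation + 1)


def fake_decode(image):
    kind, generation = image
    if kind == "watermarked":
        return Decoded("payload", True, 0)
    if kind == "noise" and generation == 0:
        return Decoded("random-bytes", True, 2)
    return Decoded("", False, -1)


for kind in ("watermarked", "noise", "clean"):
    print(kind, durable_schema(fake_decode, fake_reencode, (kind, 0)))
```

In this model only the `watermarked` input yields a schema; the `noise` input passes the first decode and is rejected by the second.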
@@ -0,0 +1,202 @@
+"""Registry of known visible watermarks.
+
+A single catalog that ties each known visible mark to (a) where it usually sits,
+(b) how to recognize it there, and (c) how to remove it. One pass over the
+registry detects every known mark in its usual place and removes the ones
+present.
+
+**Reverse-alpha only.** A known mark is a fixed semi-transparent overlay, so it
+is removed by inverting the alpha blend against a captured alpha map
+(``original = (wm - a*logo)/(1-a)``) -- exact recovery of the true pixels, not an
+inpaint guess. Detection is consistent with that: each mark is recognized by
+matching its known shape/template (the thing we invert), not by heuristics. A
+mark is therefore listed here only once a real alpha map has been captured for
+it; everything else (arbitrary logos/objects) is the user-directed
+``erase --region`` tool, not this catalog.
+
+Entries:
+- ``gemini`` -- Google Gemini / Nano Banana sparkle, bottom-right.
+- ``doubao`` -- ByteDance Doubao "豆包AI生成" text strip, bottom-right.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import TYPE_CHECKING, Any, Literal
+
+if TYPE_CHECKING:
+    from collections.abc import Callable
+
+    from numpy.typing import NDArray
+
+# cv2 method for the Gemini reverse-alpha edge-residual cleanup (not a standalone
+# remover): "ns" / "telea".
+InpaintMethod = Literal["telea", "ns"]
+Region = tuple[int, int, int, int]
+
+
+@dataclass(frozen=True)
+class MarkDetection:
+    """Uniform detection result for a known mark (across heterogeneous engines)."""
+
+    key: str
+    label: str
+    location: str
+    detected: bool
+    confidence: float
+    region: Region
+
+
+@dataclass(frozen=True)
+class KnownMark:
+    """A known visible watermark: where it lives, how to find and remove it."""
+
+    key: str
+    label: str
+    location: str  # usual place, human-readable ("bottom-right")
+    in_auto: bool  # participate in `--mark auto` scanning
+    recovery: str  # removal strategy (all reverse-alpha today)
+    _detect: Callable[[NDArray[Any]], MarkDetection]
+    _remove: Callable[..., tuple[NDArray[Any], Region | None]]
+
+    def detect(self, image: NDArray[Any]) -> MarkDetection:
+        return self._detect(image)
+
+    def remove(
+        self,
+        image: NDArray[Any],
+        *,
+        inpaint_method: InpaintMethod = "ns",
+        inpaint: bool = True,
+        inpaint_strength: float = 0.85,
+        force: bool = False,
+    ) -> tuple[NDArray[Any], Region | None]:
+        """Remove this mark by reverse-alpha; returns ``(result, cleared_region)``
+        (region for clearing alpha on save, or None if nothing was removed).
+
+        ``inpaint`` / ``inpaint_strength`` / ``inpaint_method`` tune the Gemini
+        reverse-alpha edge-residual cleanup only. ``force`` removes at the mark's
+        usual location even without a positive detection (the ``--no-detect`` path).
+        """
+        return self._remove(image, inpaint_method, inpaint, inpaint_strength, force)
+
+
+# Gemini-sparkle confidence above which the registry treats it as a confident
+# detection for arbitration. Matches identify's corpus-validated sparkle
+# threshold (0.5): the gemini engine's own detect flag uses a looser internal
+# threshold and weakly fires (~0.36) on unrelated bottom-right text (e.g. the
+# Doubao mark), which would otherwise let it hijack `--mark auto`. 0.5 gives 0
+# false positives on the corpus.
+_GEMINI_AUTO_MIN_CONF = 0.5
+
+# ── Engine adapters (lazy singletons; engines are cv2-only, no model load) ──
+
+_engines: dict[str, Any] = {}
+
+
+def _engine(key: str) -> Any:
+    if key not in _engines:
+        if key == "gemini":
+            from remove_ai_watermarks.gemini_engine import GeminiEngine
+
+            _engines[key] = GeminiEngine()
+        elif key == "doubao":
+            from remove_ai_watermarks.doubao_engine import DoubaoEngine
+
+            _engines[key] = DoubaoEngine()
+        else:  # pragma: no cover - guarded by the registry keys
+            raise KeyError(key)
+    return _engines[key]
+
+
+def _gemini_detect(image: NDArray[Any]) -> MarkDetection:
+    d = _engine("gemini").detect_watermark(image)
+    detected = bool(d.detected) and d.confidence >= _GEMINI_AUTO_MIN_CONF
+    return MarkDetection("gemini", "Google Gemini sparkle", "bottom-right", detected, d.confidence, d.region)
+
+
+def _gemini_remove(
+    image: NDArray[Any], inpaint_method: InpaintMethod, inpaint: bool, strength: float, force: bool
+) -> tuple[NDArray[Any], Region | None]:
+    engine = _engine("gemini")
+    det = engine.detect_watermark(image)
+    if not det.detected:
+        if not force:
+            return image.copy(), None
+        # Forced (--no-detect): remove at the default sparkle slot for the size.
+        from remove_ai_watermarks.gemini_engine import get_watermark_config
+
+        h, w = image.shape[:2]
+        cfg = get_watermark_config(w, h)
+        px, py = cfg.get_position(w, h)
+        region = (px, py, cfg.logo_size, cfg.logo_size)
+        result = engine.remove_watermark_custom(image, region)
+        if inpaint:
+            result = engine.inpaint_residual(result, region, strength=strength, method=inpaint_method)
+        return result, region
+    result = engine.remove_watermark(image)
+    # Reverse-alpha leaves a faint residual at the sparkle edge; the engine's
+    # own residual inpaint cleans that seam (part of its reverse-alpha pipeline).
+    if inpaint:
+        result = engine.inpaint_residual(result, det.region, strength=strength, method=inpaint_method)
+    return result, det.region
+
+
+def _doubao_detect(image: NDArray[Any]) -> MarkDetection:
+    d = _engine("doubao").detect(image)
+    return MarkDetection("doubao", "Doubao 豆包AI生成 text", "bottom-right", d.detected, d.confidence, d.region)
+
+
+def _doubao_remove(
+    image: NDArray[Any], _inpaint_method: InpaintMethod, _inpaint: bool, _strength: float, force: bool
+) -> tuple[NDArray[Any], Region | None]:
+    # Reverse-alpha only: apply when the mark is present AND the resolution is in
+    # the alpha map's calibrated band. Outside it we do NOT inpaint (no
+    # hallucination) -- removal is skipped until a capture for that resolution.
+    engine = _engine("doubao")
+    det = engine.detect(image)
+    if (det.detected or force) and engine.reverse_alpha_available(image):
+        return engine.remove_watermark_reverse_alpha(image), (det.region if det.detected else None)
+    return image.copy(), None
+
+
+_REGISTRY: tuple[KnownMark, ...] = (
+    KnownMark("gemini", "Google Gemini sparkle", "bottom-right", True, "reverse-alpha", _gemini_detect, _gemini_remove),
+    KnownMark(
+        "doubao", "Doubao 豆包AI生成 text", "bottom-right", True, "reverse-alpha", _doubao_detect, _doubao_remove
+    ),
+)
+
+
+def known_marks() -> tuple[KnownMark, ...]:
+    """All registered known visible watermarks."""
+    return _REGISTRY
+
+
+def mark_keys() -> list[str]:
+    """Keys of all registered marks (for CLI choices)."""
+    return [m.key for m in _REGISTRY]
+
+
+def get_mark(key: str) -> KnownMark:
+    """Look up a known mark by key (raises KeyError if unknown)."""
+    for m in _REGISTRY:
+        if m.key == key:
+            return m
+    raise KeyError(key)
+
+
+def detect_marks(image: NDArray[Any], *, include_explicit: bool = True) -> list[MarkDetection]:
+    """Detect every known mark in its usual place.
+
+    Returns one MarkDetection per scanned mark (``detected`` flags which fired).
+    ``include_explicit=False`` scans only the ``in_auto`` marks -- the set used
+    by ``--mark auto``.
+    """
+    return [m.detect(image) for m in _REGISTRY if include_explicit or m.in_auto]
+
+
+def best_auto_mark(image: NDArray[Any]) -> MarkDetection | None:
+    """The highest-confidence detected ``in_auto`` mark, or None if none fired."""
+    fired = [d for d in detect_marks(image, include_explicit=False) if d.detected]
+    return max(fired, key=lambda d: d.confidence) if fired else None
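The arbitration in `best_auto_mark`, combined with the 0.5 floor applied in `_gemini_detect`, can be illustrated without cv2 or the engines. In this sketch `Detection`, `gate_gemini` and `best` are simplified hypothetical stand-ins for the registry types, the 0.36 value is the weak Gemini response quoted in the registry comment, and the Doubao confidence of 0.62 is an invented example number.

```python
from __future__ import annotations

from dataclasses import dataclass

GEMINI_AUTO_MIN_CONF = 0.5  # same floor as _GEMINI_AUTO_MIN_CONF in the registry


@dataclass(frozen=True)
class Detection:
    key: str
    detected: bool
    confidence: float


def gate_gemini(raw_detected: bool, confidence: float) -> Detection:
    """The engine's loose flag only counts when confidence clears the floor."""
    return Detection("gemini", raw_detected and confidence >= GEMINI_AUTO_MIN_CONF, confidence)


def best(detections: list[Detection]) -> Detection | None:
    """Highest-confidence detection that actually fired, or None."""
    fired = [d for d in detections if d.detected]
    return max(fired, key=lambda d: d.confidence) if fired else None


# A Doubao-marked image: the Gemini engine fires weakly on the corner text.
on_doubao_image = [gate_gemini(True, 0.36), Detection("doubao", True, 0.62)]
winner = best(on_doubao_image)
print(winner.key if winner else None)  # prints "doubao"

# A clean image where only the weak Gemini response is present: nothing fires.
print(best([gate_gemini(True, 0.36), Detection("doubao", False, 0.1)]))  # prints "None"
```

The second case is where the floor matters most in this model: without it, the weak 0.36 response would be the only fired detection and would be selected.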
+113
-48
@@ -1,4 +1,4 @@
-"""Tests for the Doubao visible-watermark engine."""
+"""Tests for the Doubao visible-watermark engine (reverse-alpha only)."""

 from __future__ import annotations

@@ -8,91 +8,156 @@ import cv2
|
||||
import numpy as np
|
||||
import pytest
|
||||
|
||||
from remove_ai_watermarks.doubao_engine import DoubaoEngine, load_image_bgr
|
||||
from remove_ai_watermarks.doubao_engine import (
|
||||
_ALPHA_HEIGHT_FRAC,
|
||||
_ALPHA_LOGO_BGR,
|
||||
_ALPHA_MARGIN_BOTTOM_FRAC,
|
||||
_ALPHA_MARGIN_RIGHT_FRAC,
|
||||
_ALPHA_NATIVE_WIDTH,
|
||||
_ALPHA_WIDTH_FRAC,
|
||||
DETECT_NCC_THRESHOLD,
|
||||
DoubaoEngine,
|
||||
_alpha_template,
|
||||
_glyph_silhouette,
|
||||
_template_match_score,
|
||||
load_image_bgr,
|
||||
)
|
||||
|
||||
SAMPLE = Path(__file__).resolve().parents[1] / "data" / "samples" / "doubao-1.png"
|
||||
|
||||
|
||||
# ── Locate ──────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestLocate:
|
||||
def test_box_anchored_bottom_right(self):
|
||||
eng = DoubaoEngine()
|
||||
img = np.zeros((2048, 2048, 3), np.uint8)
|
||||
loc = eng.locate(img)
|
||||
# right and bottom edges sit close to the image corner (within margins)
|
||||
assert 2048 - (loc.x + loc.w) < int(2048 * 0.03)
|
||||
assert 2048 - (loc.y + loc.h) < int(2048 * 0.03)
|
||||
assert loc.is_fallback # geometry anchor, no bundled template yet
|
||||
|
||||
def test_box_scales_with_width(self):
|
||||
eng = DoubaoEngine()
|
||||
small = eng.locate(np.zeros((1024, 1024, 3), np.uint8))
|
||||
large = eng.locate(np.zeros((2048, 2048, 3), np.uint8))
|
||||
# width-relative geometry: 2x wider image -> ~2x wider box
|
||||
assert large.w == pytest.approx(small.w * 2, rel=0.1)
|
||||
|
||||
|
||||
# ── Detect + remove on the real sample ──────────────────────────────
|
||||
# ── Detection: alpha-template NCC ───────────────────────────────────
|
||||
|
||||
|
||||
class TestDetect:
|
||||
def test_clean_gradient_not_detected(self):
|
||||
eng = DoubaoEngine()
|
||||
ramp = np.tile(np.linspace(0, 255, 1024, dtype=np.uint8), (1024, 1))
|
||||
img = cv2.cvtColor(ramp, cv2.COLOR_GRAY2BGR)
|
||||
assert not eng.detect(img).detected
|
||||
|
||||
def test_solid_blob_corner_not_detected(self):
|
||||
"""A bright blob is not the glyph shape -> low correlation, not detected."""
|
||||
eng = DoubaoEngine()
|
||||
img = np.zeros((1024, 1024, 3), np.uint8)
|
||||
x, y, bw, bh = eng.locate(img).bbox
|
||||
img[y + bh // 4 : y + bh * 3 // 4, x : x + bw // 2] = 200
|
||||
assert not eng.detect(img).detected
|
||||
|
||||
def test_silhouette_loads(self):
|
||||
sil = _glyph_silhouette()
|
||||
assert sil is not None
|
||||
assert set(np.unique(sil)).issubset({0, 255})
|
||||
|
||||
def test_match_score_shape_sensitive(self):
|
||||
"""The glyph silhouette correlates with itself, not with a filled block."""
|
||||
sil = _glyph_silhouette()
|
||||
h, w = sil.shape
|
||||
# box that contains the silhouette -> high score
|
||||
box = np.zeros((h + 8, int(w / _ALPHA_WIDTH_FRAC * 0.2) + w), np.uint8)
|
||||
box[4 : 4 + h, 4 : 4 + w] = sil
|
||||
assert _template_match_score(box, _ALPHA_NATIVE_WIDTH) >= DETECT_NCC_THRESHOLD
|
||||
# a uniformly filled box has no glyph structure -> low score
|
||||
solid = np.full_like(box, 255)
|
||||
assert _template_match_score(solid, _ALPHA_NATIVE_WIDTH) < DETECT_NCC_THRESHOLD
|
||||
|
||||
|
||||
@pytest.mark.skipif(not SAMPLE.exists(), reason="sample image not present")
|
||||
class TestRealSample:
|
||||
def test_detects_watermark(self):
|
||||
eng = DoubaoEngine()
|
||||
det = eng.detect(load_image_bgr(SAMPLE))
|
||||
det = DoubaoEngine().detect(load_image_bgr(SAMPLE))
|
||||
assert det.detected
|
||||
assert det.confidence > 0.0
|
||||
assert det.coverage > 0.04
|
||||
assert det.confidence >= DETECT_NCC_THRESHOLD
|
||||
|
||||
def test_remove_reduces_glyph_coverage(self):
|
||||
def test_reverse_alpha_removes_mark(self):
|
||||
eng = DoubaoEngine()
|
||||
img = load_image_bgr(SAMPLE)
|
||||
before = eng.detect(img).coverage
|
||||
out = eng.remove_watermark(img)
|
||||
after = eng.detect(out).coverage
|
||||
# the inpaint should clear most glyph pixels from the corner box
|
||||
assert after < before * 0.5
|
||||
assert eng.reverse_alpha_available(img) # sample is at the captured width
|
||||
out = eng.remove_watermark_reverse_alpha(img)
|
||||
assert not eng.detect(out).detected # mark gone after recovery
|
||||
|
||||
def test_pixels_outside_box_untouched(self):
|
||||
def test_far_region_untouched(self):
|
||||
eng = DoubaoEngine()
|
||||
img = load_image_bgr(SAMPLE)
|
||||
out = eng.remove_watermark(img)
|
||||
# top-left quadrant is far from the bottom-right mark: must be identical
|
||||
out = eng.remove_watermark_reverse_alpha(img)
|
||||
h, w = img.shape[:2]
|
||||
assert np.array_equal(img[: h // 2, : w // 2], out[: h // 2, : w // 2])
|
||||
|
||||
|
||||
# ── Negative + safety guard ─────────────────────────────────────────
|
||||
# ── Reverse-alpha (exact recovery) ──────────────────────────────────
|
||||
|
||||
|
||||
class TestNegativeAndGuard:
|
||||
def test_clean_image_not_detected(self):
|
||||
class TestReverseAlpha:
|
||||
def test_alpha_asset_loads(self):
|
||||
at = _alpha_template()
|
||||
assert at is not None
|
||||
assert at.dtype.kind == "f"
|
||||
assert float(at.min()) >= 0.0
|
||||
assert float(at.max()) <= 1.0
|
||||
|
||||
def test_available_whenever_asset_present(self):
|
||||
# NCC alignment generalizes to any resolution, so availability is just
|
||||
# "asset loadable" (any non-empty image); the caller gates on detect.
|
||||
eng = DoubaoEngine()
|
||||
# smooth gradient, no watermark
|
||||
ramp = np.tile(np.linspace(0, 255, 1024, dtype=np.uint8), (1024, 1))
|
||||
img = cv2.cvtColor(ramp, cv2.COLOR_GRAY2BGR)
|
||||
det = eng.detect(img)
|
||||
assert not det.detected
|
||||
assert eng.reverse_alpha_available(np.zeros((1024, 1024, 3), np.uint8))
|
||||
assert eng.reverse_alpha_available(np.zeros((1773, 1535, 3), np.uint8))
|
||||
assert not eng.reverse_alpha_available(np.zeros((0, 0, 3), np.uint8))
|
||||
|
||||
def test_clean_image_returned_unchanged(self):
|
||||
eng = DoubaoEngine()
|
||||
ramp = np.tile(np.linspace(0, 255, 1024, dtype=np.uint8), (1024, 1))
|
||||
img = cv2.cvtColor(ramp, cv2.COLOR_GRAY2BGR)
|
||||
out = eng.remove_watermark(img)
|
||||
assert np.array_equal(img, out)
|
||||
@staticmethod
|
||||
def _compose(w: int, h: int, bg: float = 100.0):
|
||||
"""Composite the real alpha (scaled to width ``w``) onto a flat bg.
|
||||
Returns ``(watermarked_uint8, mark_bool_mask)``."""
|
||||
img = np.full((h, w, 3), bg, np.float32)
|
||||
at = _alpha_template()
|
||||
gw, gh = int(_ALPHA_WIDTH_FRAC * w), int(_ALPHA_HEIGHT_FRAC * w)
|
||||
ax = w - int(_ALPHA_MARGIN_RIGHT_FRAC * w) - gw
|
||||
ay = h - int(_ALPHA_MARGIN_BOTTOM_FRAC * w) - gh
|
||||
amap = np.zeros((h, w), np.float32)
|
||||
amap[ay : ay + gh, ax : ax + gw] = cv2.resize(at, (gw, gh))
|
||||
a3 = amap[:, :, None]
|
||||
wm = (a3 * np.array(_ALPHA_LOGO_BGR, np.float32) + (1 - a3) * img).clip(0, 255).astype(np.uint8)
|
||||
return wm, amap > 0.2
|
||||
|
||||
def test_document_background_guard(self):
|
||||
"""A dense high-frequency corner (document-like) trips the coverage
|
||||
guard, so the image is left untouched rather than smeared."""
|
||||
def test_native_returns_exact_reverse_alpha_no_inpaint(self):
|
||||
"""At native width the recovery is exact, so it must be returned untouched
|
||||
-- inpainting over exactly-recovered interior pixels degrades quality
|
||||
(regression: native textured error 1.6 reverse-alpha-only vs 2.6 with the
|
||||
old full-footprint inpaint). The output must equal pure reverse-alpha."""
|
||||
eng = DoubaoEngine()
|
||||
rng = np.random.default_rng(0)
|
||||
img = np.full((1024, 1024, 3), 255, np.uint8)
|
||||
# fill the bottom-right box area with random grayish text-like noise
|
||||
loc = eng.locate(img)
|
||||
x, y, bw, bh = loc.bbox
|
||||
noise = rng.integers(150, 246, size=(bh, bw), dtype=np.uint8)
|
||||
img[y : y + bh, x : x + bw] = noise[:, :, None]
|
||||
out = eng.remove_watermark(img)
|
||||
assert np.array_equal(img, out)
|
||||
wm, _mark = self._compose(_ALPHA_NATIVE_WIDTH, _ALPHA_NATIVE_WIDTH)
|
||||
out = eng.remove_watermark_reverse_alpha(wm)
|
||||
amap = eng._fixed_alpha_map(wm)
|
||||
assert amap is not None
|
||||
expected = eng._apply_reverse_alpha(wm, amap[0])
|
||||
assert np.array_equal(out, expected) # no inpaint touched the recovery
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
("w", "h", "max_err"),
|
||||
[
|
||||
(_ALPHA_NATIVE_WIDTH, _ALPHA_NATIVE_WIDTH, 5.0), # native 1:1 -> fixed geometry, ~exact
|
||||
(1773, 2364, 8.0), # 3:4 portrait -> NCC alignment generalizes the single capture
|
||||
],
|
||||
)
|
||||
def test_recovers_flat_background(self, w, h, max_err):
|
||||
"""Recovers the flat background at native (fixed geometry, exact) AND a
|
||||
non-native resolution (NCC alignment generalizes the single capture)."""
|
||||
eng = DoubaoEngine()
|
||||
wm, mark = self._compose(w, h)
|
||||
assert float(np.abs(wm.astype(np.float32)[mark] - 100.0).mean()) > 15 # mark visible
|
||||
out = eng.remove_watermark_reverse_alpha(wm).astype(np.float32)
|
||||
assert float(np.abs(out[mark] - 100.0).mean()) < max_err
|
||||
|
||||
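The composite-then-recover round trip exercised by `_compose` and `test_recovers_flat_background` follows directly from the blend formula in the registry docstring, `original = (wm - a*logo)/(1-a)`. The per-pixel sketch below is a plain-Python illustration under stated assumptions: the function names are hypothetical, the logo value 255 is illustrative (the real `_ALPHA_LOGO_BGR` is not shown in this diff), and the error bound only accounts for 8-bit rounding, not for alpha-map resizing or misalignment.

```python
def composite(original: float, logo: float, alpha: float) -> float:
    """Forward blend: what the watermarking step writes into the pixel."""
    return alpha * logo + (1.0 - alpha) * original


def reverse_alpha(watermarked: float, logo: float, alpha: float) -> float:
    """Invert the blend; undefined where the overlay is fully opaque."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1); a fully opaque pixel is unrecoverable")
    return (watermarked - alpha * logo) / (1.0 - alpha)


background, logo = 100.0, 255.0  # flat background as in the tests; logo value is illustrative
for alpha in (0.0, 0.2, 0.35, 0.6):
    stored = round(composite(background, logo, alpha))  # what an 8-bit file keeps
    recovered = reverse_alpha(stored, logo, alpha)
    # Rounding moves the stored value by at most 0.5, which the division by
    # (1 - alpha) amplifies, so denser parts of the mark recover less precisely.
    bound = 0.5 / (1.0 - alpha)
    assert abs(recovered - background) <= bound + 1e-9
    print(f"alpha={alpha:.2f} stored={stored} recovered={recovered:.3f} bound={bound:.3f}")
```

Without the rounding step the inversion is exact up to floating-point error, which is the sense in which the recovery is described as exact rather than an inpaint guess.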
@@ -113,6 +113,18 @@ class TestIdentifyNonPng:
         r = identify(path, check_visible=False)
         assert any("SynthID" in w for w in r.watermarks)

+    def test_black_forest_labs_flux_attributed(self, tmp_path: Path):
+        path = self._c2pa_jpeg(tmp_path, b"Black Forest Labs API ... trainedAlgorithmicMedia")
+        r = identify(path, check_visible=False, check_invisible=False)
+        assert r.is_ai_generated is True
+        assert r.platform == "Black Forest Labs (FLUX)"
+
+    def test_bytedance_volcengine_attributed(self, tmp_path: Path):
+        path = self._c2pa_jpeg(tmp_path, b"certificate_center@volcengine.com ... trainedAlgorithmicMedia")
+        r = identify(path, check_visible=False, check_invisible=False)
+        assert r.is_ai_generated is True
+        assert "ByteDance" in (r.platform or "")
+
     def test_stability_ai_issuer_attributed_no_synthid(self, tmp_path: Path):
         path = self._c2pa_jpeg(tmp_path, b"Stability AI ... trainedAlgorithmicMedia")
         r = identify(path, check_visible=False)
@@ -132,6 +144,50 @@ class TestIdentifyNonPng:
|
||||
assert not any("SynthID" in w for w in r.watermarks)
|
||||
|
||||
|
||||
class TestIdentifySamsungGalaxy:
|
||||
"""Samsung Galaxy / ASUS Gallery C2PA signers (verified on real signed files
|
||||
2026-05-29; synthetic byte blobs here since the originals are private).
|
||||
|
||||
Galaxy AI edits stamp BOTH the device cert AND an AI source-type / genAIType,
|
||||
so the signer attribution must NOT trip the camera-vs-AI integrity clash.
|
||||
"""
|
||||
|
||||
def _jpeg(self, tmp_path: Path, name: str, blob: bytes) -> Path:
|
||||
path = tmp_path / name
|
||||
path.write_bytes(b"\xff\xd8\xff\xe1jumbc2pa" + blob + b"\xff\xd9")
|
||||
return path
|
||||
|
||||
def test_galaxy_trained_source_is_high_ai(self, tmp_path: Path):
|
||||
path = self._jpeg(tmp_path, "s25.jpg", b"Samsung Galaxy Galaxy S25 c2pa-rs trainedAlgorithmicMedia")
|
||||
r = identify(path, check_visible=False, check_invisible=False)
|
||||
assert r.is_ai_generated is True
|
||||
assert r.confidence == "high"
|
||||
assert r.platform == "Samsung Galaxy (C2PA)"
|
||||
assert r.integrity_clashes == [] # device cert + AI source-type is legitimate, not a clash
|
||||
|
||||
def test_galaxy_genai_only_is_medium_ai(self, tmp_path: Path):
|
||||
# The Galaxy S24 case: no trainedAlgorithmicMedia, genAIType is the only
|
||||
# AI marker -- previously missed, now a medium-confidence verdict.
|
||||
path = self._jpeg(
|
||||
tmp_path, "s24.jpg", b'Samsung Galaxy Galaxy S24 c2pa-rs PhotoEditor_Re_Edit_Data{"genAIType":1}'
|
||||
)
|
||||
r = identify(path, check_visible=False, check_invisible=False)
|
||||
assert r.is_ai_generated is True
|
||||
assert r.confidence == "medium"
|
||||
assert r.platform == "Samsung Galaxy (C2PA)"
|
||||
assert any(s.name == "samsung_genai" for s in r.signals)
|
||||
assert r.integrity_clashes == []
|
||||
|
||||
def test_asus_gallery_signer_not_ai(self, tmp_path: Path):
|
||||
# ASUS Gallery signs edited photos; no AI source-type or genAIType, so the
|
||||
# platform is attributed but the verdict stays unknown.
|
||||
path = self._jpeg(tmp_path, "asus.jpg", b"/com.asus.gallery/3.8.0.98 c2pa-rs no ai marker")
|
||||
r = identify(path, check_visible=False, check_invisible=False)
|
||||
assert r.is_ai_generated is None
|
||||
assert r.platform == "ASUS Gallery (C2PA signer)"
|
||||
assert any("C2PA" in w for w in r.watermarks)
|
||||
|
||||
|
||||
# ── End-to-end verdicts on real fixtures ────────────────────────────
|
||||
|
||||
|
||||
|
||||
@@ -12,12 +12,15 @@ from PIL import Image
from PIL.PngImagePlugin import PngInfo

from remove_ai_watermarks.metadata import (
    C2PA_UUID,
    _is_ai_key,
    c2pa_marker_in,
    exif_generator,
    get_ai_metadata,
    has_ai_metadata,
    iptc_ai_system,
    remove_ai_metadata,
    samsung_genai,
    synthid_source,
    xai_signature,
)
@@ -135,6 +138,71 @@ class TestHasAiMetadata:
        assert has_ai_metadata(path)


class TestC2paMarkerIn:
    """The C2PA presence check requires a JUMBF wrapper or the C2PA uuid box, so
    a bare 4-byte ``c2pa`` substring (e.g. random compressed pixel data) does not
    false-positive -- the regression behind 4 cleaned PNGs re-flagging C2PA."""

    def test_jumbf_wrapped_c2pa_detected(self):
        assert c2pa_marker_in(b"....jumbc2pa....manifest....") is True

    def test_c2pa_uuid_box_detected(self):
        assert c2pa_marker_in(b"\x00\x00\x00\x18uuid" + C2PA_UUID + b"payload") is True

    def test_bare_c2pa_substring_not_detected(self):
        # The exact false positive: "c2pa" appears in noise but no JUMBF/uuid box.
        assert c2pa_marker_in(b"\x9c\xc3\xa7B1\x11c2pa\x80b\x804\xc5\xf9random idat") is False

    def test_jumb_without_c2pa_not_detected(self):
        assert c2pa_marker_in(b"some jumb box but no manifest label") is False

    def test_empty_not_detected(self):
        assert c2pa_marker_in(b"") is False


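A presence check consistent with the `TestC2paMarkerIn` cases could look roughly like the sketch below. The function name `c2pa_marker_sketch` and the UUID constant are assumptions for illustration; the project's `c2pa_marker_in` and `C2PA_UUID` may be implemented differently.

```python
# Assumption: the 16-byte extended type of the C2PA `uuid` box; the project's
# C2PA_UUID constant may be defined differently.
ASSUMED_C2PA_UUID = bytes.fromhex("d8fec3d61b0e483c92975828877ec481")


def c2pa_marker_sketch(data: bytes) -> bool:
    """Return True only when a C2PA label appears alongside a container marker."""
    # A bare b"c2pa" is not enough: compressed pixel data can contain it by chance.
    has_jumbf_manifest = b"jumb" in data and b"c2pa" in data
    has_uuid_box = (b"uuid" + ASSUMED_C2PA_UUID) in data
    return has_jumbf_manifest or has_uuid_box
```

This is still a byte-level heuristic rather than a structural JUMBF parse. It only raises the bar from one 4-byte token to two co-occurring tokens or a 20-byte sequence.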
class TestSamsungGenai:
    """Samsung Galaxy AI editing marker (genAIType in PhotoEditor_Re_Edit_Data).

    Synthetic byte blobs -- real Galaxy files are user content and not shipped
    (public repo), same discipline as the Grok/Doubao fixtures.
    """

    @staticmethod
    def _samsung_jpeg(tmp_path: Path, name: str, payload: bytes) -> Path:
        path = tmp_path / name
        path.write_bytes(b"\xff\xd8\xff\xe1" + payload + b"\xff\xd9")
        return path

    def test_nonzero_genai_type_detected(self, tmp_path: Path):
        p = self._samsung_jpeg(
            tmp_path, "galaxy.jpg", b'PhotoEditor_Re_Edit_Data{"connectorType":"srvg","genAIType":1}'
        )
        assert samsung_genai(p) == 1

    def test_other_nonzero_value_detected(self, tmp_path: Path):
        p = self._samsung_jpeg(tmp_path, "galaxy5.jpg", b'PhotoEditor_Re_Edit_Data{"genAIType":5}')
        assert samsung_genai(p) == 5

    def test_zero_genai_type_is_none(self, tmp_path: Path):
        """genAIType:0 means no generative AI was used -- not a positive signal."""
        p = self._samsung_jpeg(tmp_path, "edit.jpg", b'PhotoEditor_Re_Edit_Data{"genAIType":0}')
        assert samsung_genai(p) is None

    def test_genai_without_editor_container_ignored(self, tmp_path: Path):
        """An incidental genAIType token outside Samsung's editor JSON is ignored."""
        p = self._samsung_jpeg(tmp_path, "stray.jpg", b'some other blob "genAIType":1 elsewhere')
        assert samsung_genai(p) is None

    def test_clean_image_is_none(self, tmp_clean_png):
        assert samsung_genai(tmp_clean_png) is None

    def test_surfaced_in_get_ai_metadata(self, tmp_path: Path):
        p = self._samsung_jpeg(tmp_path, "galaxy.jpg", b'PhotoEditor_Re_Edit_Data{"genAIType":1}')
        meta = get_ai_metadata(p)
        assert "samsung_genai" in meta
        assert "genAIType=1" in meta["samsung_genai"]


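The behavior pinned down by `TestSamsungGenai` (a nonzero `genAIType` counts only inside the editor container, and zero is not a signal) can be sketched on raw bytes as follows. `samsung_genai_sketch` is a hypothetical stand-in that takes bytes rather than a path; the real `samsung_genai` may parse the file differently.

```python
from __future__ import annotations

import re

_EDITOR_KEY = b"PhotoEditor_Re_Edit_Data"
_GENAI_RE = re.compile(rb'"genAIType"\s*:\s*(\d+)')


def samsung_genai_sketch(data: bytes) -> int | None:
    """Return the nonzero genAIType found after the editor key, else None."""
    idx = data.find(_EDITOR_KEY)
    if idx < 0:
        return None  # a stray genAIType token outside the editor blob is ignored
    match = _GENAI_RE.search(data, idx)
    if match is None:
        return None
    value = int(match.group(1))
    return value or None  # genAIType 0 means no generative AI was used
```

Anchoring the search at the editor key keeps an incidental `genAIType` token elsewhere in the file from producing a positive.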
class TestGetAiMetadata:
    """Tests for extracting AI metadata."""


@@ -12,12 +12,28 @@ from typing import TYPE_CHECKING

import pytest

from remove_ai_watermarks import trustmark_detector
from remove_ai_watermarks.trustmark_detector import detect_trustmark, is_available

if TYPE_CHECKING:
    from pathlib import Path


class _FakeDecoder:
    """A TrustMark decoder whose successive ``decode`` calls return scripted
    ``(secret, present, schema)`` tuples -- the first for the original image, the
    second for the re-encoded copy used by the false-positive durability gate."""

    def __init__(self, *results: tuple[bytes, bool, int]):
        self._results = list(results)
        self.calls = 0

    def decode(self, _img: object) -> tuple[bytes, bool, int]:
        result = self._results[min(self.calls, len(self._results) - 1)]
        self.calls += 1
        return result


def test_detect_never_raises(tmp_clean_png: Path):
    # Whether or not trustmark is installed, a clean image must yield None
    # (no watermark) without raising. When absent, the import guard returns None.
@@ -34,3 +50,40 @@ def test_unreadable_file_returns_none(tmp_path: Path):
def test_clean_image_reports_no_watermark(tmp_clean_png: Path):
    # With the decoder present, an un-watermarked image must report absent.
    assert detect_trustmark(tmp_clean_png) is None


class TestFalsePositiveGate:
    """The re-encode durability gate keeps real (durable) TrustMarks and drops
    BCH false positives that collapse under a mild JPEG round-trip."""

    @pytest.fixture(autouse=True)
    def _force_available(self, monkeypatch: pytest.MonkeyPatch):
        monkeypatch.setattr(trustmark_detector, "is_available", lambda: True)

    def _patch_decoder(self, monkeypatch: pytest.MonkeyPatch, decoder: _FakeDecoder) -> None:
        monkeypatch.setattr(trustmark_detector, "_decoder", lambda: decoder)

    def test_durable_watermark_survives_and_is_reported(self, monkeypatch, tmp_clean_png: Path):
        decoder = _FakeDecoder((b"secret", True, 2), (b"secret", True, 2))
        self._patch_decoder(monkeypatch, decoder)
        result = detect_trustmark(tmp_clean_png)
        assert result == "Adobe TrustMark (variant P, schema 2)"
        assert decoder.calls == 2  # original + re-encode

    def test_false_positive_collapsing_on_reencode_is_dropped(self, monkeypatch, tmp_clean_png: Path):
        # Present on the original, absent after re-encode -> content-noise FP.
        decoder = _FakeDecoder((b"\x00\x01", True, 3), (b"", False, -1))
        self._patch_decoder(monkeypatch, decoder)
        assert detect_trustmark(tmp_clean_png) is None

    def test_schema_drift_on_reencode_is_dropped(self, monkeypatch, tmp_clean_png: Path):
        # Present both times but the schema changes -> not a stable watermark.
        decoder = _FakeDecoder((b"\x00", True, 2), (b"\x00", True, 3))
        self._patch_decoder(monkeypatch, decoder)
        assert detect_trustmark(tmp_clean_png) is None

    def test_absent_skips_reencode(self, monkeypatch, tmp_clean_png: Path):
        decoder = _FakeDecoder((b"", False, -1))
        self._patch_decoder(monkeypatch, decoder)
        assert detect_trustmark(tmp_clean_png) is None
        assert decoder.calls == 1  # no second decode when the first is absent

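The gate logic these tests describe can be sketched independently of TrustMark. In this sketch `durable_schema`, `decode`, and `reencode` are hypothetical names: `decode` returns a `(secret, present, schema)` tuple like `_FakeDecoder.decode`, and `reencode` stands in for the mild JPEG round-trip. It is an assumption-laden outline, not the body of `detect_trustmark`.

```python
from __future__ import annotations

from typing import Callable

Decoded = tuple[bytes, bool, int]  # (secret, present, schema)


def durable_schema(
    image: object,
    decode: Callable[[object], Decoded],
    reencode: Callable[[object], object],
) -> int | None:
    """Return the schema only if the detection survives re-encoding unchanged."""
    _secret, present, schema = decode(image)
    if not present:
        return None  # common case: skip the second decode entirely
    _secret2, present2, schema2 = decode(reencode(image))
    if not present2 or schema2 != schema:
        return None  # collapsed or drifted on re-encode: treat as a false positive
    return schema
```

Because the second decode runs only after a first hit, the extra cost is paid on the rare positive rather than on every image.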
@@ -0,0 +1,70 @@
"""Tests for the known-visible-watermark registry (reverse-alpha only)."""

from __future__ import annotations

from pathlib import Path

import numpy as np
import pytest

from remove_ai_watermarks import watermark_registry as reg

DOUBAO_SAMPLE = Path(__file__).resolve().parents[1] / "data" / "samples" / "doubao-1.png"


class TestCatalog:
    def test_keys(self):
        assert reg.mark_keys() == ["gemini", "doubao"]

    def test_all_in_auto(self):
        assert all(m.in_auto for m in reg.known_marks())

    def test_recovery_is_reverse_alpha(self):
        # Every catalogued mark is removed by exact reverse-alpha (no inpaint).
        assert all(m.recovery == "reverse-alpha" for m in reg.known_marks())

    def test_locations(self):
        by_key = {m.key: m for m in reg.known_marks()}
        assert by_key["gemini"].location == "bottom-right"
        assert by_key["doubao"].location == "bottom-right"

    def test_get_mark_unknown_raises(self):
        with pytest.raises(KeyError):
            reg.get_mark("nope")


class TestScan:
    def test_detect_marks_scans_all(self):
        img = np.zeros((256, 256, 3), np.uint8)
        keys = {d.key for d in reg.detect_marks(img)}
        assert keys == {"gemini", "doubao"}

    def test_blank_image_no_auto_mark(self):
        assert reg.best_auto_mark(np.zeros((256, 256, 3), np.uint8)) is None


@pytest.mark.skipif(not DOUBAO_SAMPLE.exists(), reason="doubao sample not present")
class TestRealSample:
    def test_doubao_sample_wins_auto(self):
        from remove_ai_watermarks.image_io import imread

        best = reg.best_auto_mark(imread(DOUBAO_SAMPLE))
        assert best is not None
        assert best.key == "doubao"

    def test_doubao_remove_returns_region(self):
        from remove_ai_watermarks.image_io import imread

        img = imread(DOUBAO_SAMPLE)  # 2048 wide -> reverse-alpha applies
        result, region = reg.get_mark("doubao").remove(img)
        assert region is not None
        assert result.shape == img.shape


class TestReverseAlphaOnly:
    def test_doubao_off_resolution_is_skipped(self):
        # No alpha capture for this width -> no inpaint fallback, image untouched.
        img = np.zeros((512, 512, 3), np.uint8)
        result, region = reg.get_mark("doubao").remove(img)
        assert region is None
        assert np.array_equal(result, img)