Visible-watermark registry: reverse-alpha-only Doubao + Gemini, exact native recovery (#28)

* fix(trustmark): gate detection on re-encode durability to kill false positives

TrustMark's wm_present flag is a BCH validity check that spuriously
validates on a content-correlated fraction of un-watermarked images
(AI textures trip it more than camera photos). On a 1343-image set all
20 raw detections were false, several on Gemini/OpenAI/Doubao output that
cannot carry Adobe's watermark, and the decoded secrets were random bytes.

A genuine TrustMark is a durable soft binding that survives re-encoding,
so detect_trustmark now re-decodes after a mild JPEG round-trip and
requires the same schema both times. Every observed false positive
collapsed under this gate; the second decode runs only on the rare hit.
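
The gate described above can be sketched as follows. This is a minimal illustration, not the project's `detect_trustmark`: the `Decoded` type and the `decode` / `reencode` callables are hypothetical stand-ins for the TrustMark decoder and the mild JPEG round-trip.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Decoded:
    """Hypothetical decode result: validity flag plus the decoded schema."""

    present: bool
    schema: Optional[str] = None


def durable_detect(
    image: bytes,
    decode: Callable[[bytes], Decoded],
    reencode: Callable[[bytes], bytes],
) -> bool:
    """Report a watermark only if it survives a re-encode with the same schema."""
    first = decode(image)
    if not first.present:
        # Common case: no raw hit, so the second decode never runs.
        return False
    second = decode(reencode(image))
    return second.present and second.schema == first.schema
```

Because the second decode sits behind the first positive, the extra cost is paid only on the rare raw hit.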

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(identify): Samsung Galaxy AI, FLUX, ByteDance C2PA; fix C2PA substring FP

Detection extensions verified on real signed files (2026-05-29):

- Samsung Galaxy AI: signer attribution via a new _SIGNER_C2PA_PLATFORM
  (Samsung Galaxy / ASUS Gallery) kept separate from the capture-camera
  _DEVICE_C2PA_PLATFORM so a Galaxy AI edit (device cert + AI source type)
  does not trip the camera-vs-AI integrity clash. Plus metadata.samsung_genai:
  the proprietary genAIType marker in PhotoEditor_Re_Edit_Data, a medium-
  confidence AI-editing signal (samsung_only branch).
- Black Forest Labs (FLUX) and ByteDance Volcano Engine (Doubao/Jimeng)
  added as C2PA issuers + issuer->platform mappings.
- fix: C2PA presence required only the bare 4-byte 'c2pa' substring, which
  false-positives on compressed pixel data (a recompressed PNG IDAT re-flagged
  C2PA after its manifest was correctly stripped). New c2pa_marker_in() requires
  the JUMBF wrapper (jumb+c2pa) or the C2PA uuid box; applied in identify +
  metadata. Verified: all 535 real C2PA files carry jumb.
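
A rough sketch of the stricter presence check described in the last bullet. It is not the project's `c2pa_marker_in`: the function name, the search window, and the UUID constant are assumptions for illustration, and the UUID in particular should be checked against the C2PA specification before use.

```python
# Assumption: 16-byte identifier of the C2PA 'uuid' box in BMFF containers.
C2PA_BMFF_UUID = bytes.fromhex("d8fec3d61b0e483c92975828877ec481")
# Assumption: how many bytes after a 'jumb' box type the 'c2pa' label may sit.
JUMBF_WINDOW = 64


def c2pa_marker_sketch(data: bytes) -> bool:
    """True only for a JUMBF-wrapped 'c2pa' label or the C2PA uuid box."""
    if C2PA_BMFF_UUID in data:
        return True
    start = data.find(b"jumb")
    while start != -1:
        # Require 'c2pa' close to the JUMBF superbox, not anywhere in the file.
        if b"c2pa" in data[start + 4 : start + 4 + JUMBF_WINDOW]:
            return True
        start = data.find(b"jumb", start + 1)
    return False
```

A bare `c2pa` that happens to occur inside compressed pixel data no longer counts, which is the false positive the fix targets.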

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(doubao): gate detection on text structure to cut ~95% of false positives (#23)

Coverage alone over-fired: any textured bottom-right corner cleared the
threshold, so the detector false-positived on ~28% of arbitrary images.
The real '豆包AI生成' mark is six glyphs in one row, so detect now also
requires the text-structure signature (_glyph_structure): many connected
components, no single dominant blob, concentration in a thin horizontal
band. False positives dropped 343 -> 17 across the corpus while keeping
real-mark recall and the doubao-1.png sample. Also accept a no-op force
kwarg for remover-interface symmetry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(samsung): add Samsung Galaxy AI visible-badge remover

New samsung_engine.py removes the bottom-left sparkle + localized
'AI-generated content' badge that Galaxy AI tools stamp. Mirrors the
Doubao locate->mask->inpaint pattern but bottom-left, with a dual-polarity
top-hat mask (the badge is light-on-dark or dark-on-light). Detection gates
on a band + left-anchor signature (the Doubao CJK-component gate does not
transfer: Latin badge letters connect into few blobs). Explicit-only --
tuned on few real badges with a ~4% FP floor, so it is not used in auto.
Synthetic byte-blob fixtures (real badges are user content, not shipped).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(visible): unified known-watermark registry + LaMa inpaint backend

watermark_registry.py is a single catalog of known visible marks, each
tying {usual location, in_auto flag, recovery strategy, detect adapter,
remove adapter}: gemini (reverse-alpha, exact), doubao, samsung. cmd_visible
is now registry-driven (best_auto_mark for --mark auto; mark_keys() feeds the
CLI choices) -- the per-mark _run_doubao/_run_samsung helper branches are gone.

Cross-engine confidences are not comparable, so the gemini adapter applies the
corpus-validated 0.5 sparkle threshold for auto arbitration (its engine flag is
loose and weakly fired ~0.36 on Doubao text, hijacking auto).
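
A minimal sketch of registry-driven auto arbitration under these rules. `MarkEntry` and `best_auto_mark_sketch` are hypothetical names, and every threshold except the 0.5 sparkle value quoted above is illustrative. Each mark is gated by its own floor first, since raw confidences from different engines are not comparable.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Sequence


@dataclass(frozen=True)
class MarkEntry:
    """One known visible mark (hypothetical shape, for illustration only)."""

    key: str
    location: str
    in_auto: bool
    auto_threshold: float  # per-mark floor applied before arbitration
    detect: Callable[[Mapping[str, float]], float]  # image -> confidence


def best_auto_mark_sketch(
    image: Mapping[str, float], registry: Sequence[MarkEntry]
) -> Optional[str]:
    """Return the key of the strongest mark that clears its own threshold."""
    best_key: Optional[str] = None
    best_conf = 0.0
    for entry in registry:
        if not entry.in_auto:
            continue
        conf = entry.detect(image)
        if conf < entry.auto_threshold:
            continue  # a weak cross-engine hit cannot hijack auto
        if conf > best_conf:
            best_key, best_conf = entry.key, conf
    return best_key
```

With a 0.5 floor on the gemini entry, a 0.36 sparkle score on Doubao text is dropped before the comparison.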

--backend auto|cv2|lama chooses background reconstruction for the mask-based
marks; auto = LaMa when onnxruntime is present, else cv2. For LaMa the mask is
the FILLED glyph bounding box (sparse glyph masks leave anti-aliased edges
behind). cv2 stays the zero-dependency fallback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: watermark registry, Samsung/FLUX/ByteDance detection, LaMa backend, trustmark gate

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(doubao): exact reverse-alpha removal from captured alpha map

The Doubao '豆包AI生成' mark is a fixed semi-transparent white overlay, so
given its alpha map the original pixels are recovered exactly:
original = (wm - a*logo)/(1-a) -- no inpaint hallucination.

The alpha map + logo colour were solved from real black+gray Doubao captures
on a controlled background: on black captured = a*logo, and the black/gray pair
solves a per-pixel without assuming the logo colour (a_max~0.65, logo near-white);
the white capture cross-validates (mark vanishes to a flat fill). Bundled as
assets/doubao_alpha.png + geometry constants.
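
The calibration and inversion above reduce to three per-pixel formulas. The sketch below works on single float values for clarity; the function names are hypothetical, and real captures would be 8-bit arrays that need clipping and a guard near a = 1.

```python
def solve_alpha(on_black: float, on_gray: float, gray: float) -> float:
    """Alpha from a black/gray capture pair, without knowing the logo colour.

    on_black = a*logo and on_gray = a*logo + (1 - a)*gray,
    so on_gray - on_black = (1 - a)*gray.
    """
    return 1.0 - (on_gray - on_black) / gray


def solve_logo(on_black: float, alpha: float) -> float:
    """Logo colour from the black capture, where captured = a*logo."""
    return on_black / alpha if alpha > 0 else 0.0


def reverse_alpha(wm: float, alpha: float, logo: float) -> float:
    """Invert the blend: original = (wm - a*logo) / (1 - a)."""
    return (wm - alpha * logo) / (1.0 - alpha)
```

For example, with a = 0.65, logo = 250 and a gray level of 128, the black capture reads 162.5 and the gray capture 207.3; `solve_alpha` returns 0.65 and `solve_logo` returns 250 (up to float rounding).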

remove_watermark_reverse_alpha applies it scaled to image width; exact at the
captured width, so the registry routes doubao through it only when
reverse_alpha_available (width within the calibrated band) and the mark is
detected, falling back to mask inpaint (cv2/LaMa) otherwise. A light residual
inpaint cleans the sub-pixel rescaling error. Add captures at more resolutions
to widen exact coverage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(visible): reverse-alpha only -- drop inpaint removal + heuristic detection

Per the principle that we only remove/detect what we can do exactly, the
visible-mark path is now reverse-alpha only:

- Doubao detect is reverse-alpha-consistent: match the bundled alpha glyph
  silhouette against the corner via TM_CCOEFF_NORMED (DETECT_NCC_THRESHOLD 0.4)
  -- keys on the '豆包AI生成' SHAPE, not coverage/structure heuristics. FP
  7/1243 (0.6%). Removes the cv2 inpaint path + the _glyph_structure gate.
- Registry is reverse-alpha only: dropped the cv2/LaMa backend (_glyph_remove,
  _lama_box_inpaint, default_backend, --backend) and the Samsung entry. Doubao
  outside the alpha resolution band is skipped, never inpainted.
- Removed samsung_engine.py + tests + --mark samsung (no alpha map captured;
  Samsung C2PA/genAIType metadata detection in identify is unaffected).
- The universal erase --region (cv2/LaMa) is unchanged -- arbitrary-region
  inpainting stays a user-directed tool, separate from the known-mark registry.
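
For reference, the score that OpenCV's TM_CCOEFF_NORMED computes at each window position is a zero-mean normalized correlation. The pure-Python sketch below evaluates it for one already-aligned patch; the function name is hypothetical, and the real detector searches over positions with `cv2.matchTemplate` rather than scoring a single patch.

```python
import math
from typing import Sequence

DETECT_NCC_THRESHOLD = 0.4  # value quoted in the commit message


def zero_mean_ncc(
    template: Sequence[Sequence[float]], patch: Sequence[Sequence[float]]
) -> float:
    """Zero-mean normalized cross-correlation of two equal-size 2-D arrays."""
    t = [v for row in template for v in row]
    p = [v for row in patch for v in row]
    if len(t) != len(p):
        raise ValueError("template and patch must have the same size")
    mean_t = sum(t) / len(t)
    mean_p = sum(p) / len(p)
    num = sum((a - mean_t) * (b - mean_p) for a, b in zip(t, p))
    den = math.sqrt(
        sum((a - mean_t) ** 2 for a in t) * sum((b - mean_p) ** 2 for b in p)
    )
    # A flat patch has zero variance; treat it as "no match".
    return num / den if den > 0 else 0.0
```

Because both inputs are mean-subtracted and normalized, the score follows the glyph shape rather than overall brightness or how much of the corner is covered.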

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(doubao): NCC sub-pixel alignment -> reverse-alpha at any resolution

A pure width-scale of the captured alpha map is only sub-pixel-accurate at the
captured width and leaves a faint ghost elsewhere. remove_watermark_reverse_alpha
now registers the alpha glyph to the actual mark via a TM_CCOEFF_NORMED
scale+position search (_aligned_alpha_map) before inverting the blend, so the
single 2048 capture works at any resolution -- verified clean on the 1773x2364
(3:4) corpus size, the biggest coverage gap (23 files).

reverse_alpha_available is now just 'asset present' (no width band); the registry
still gates removal on detect so a clean corner is never touched. Drops the
_ALPHA_WIDTH_TOLERANCE gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(doubao): keep native recovery exact -- fixed geometry at captured width

Integer-pixel NCC alignment landed ~1px off at the captured width, degrading the
otherwise-exact native reverse-alpha (synthetic recovery error 0.94 -> 1.39).
remove_watermark_reverse_alpha now uses exact width-relative geometry within
_ALPHA_NATIVE_BAND of the captured width and the NCC search only off it -- best
of both: native back to 0.94, other resolutions still aligned.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(doubao): harden alignment -- try fixed+aligned, keep least residual (56/56)

On a faint/busy-background mark the NCC alignment peak can wander a few px off
the true mark and leave a residual (2/56 real corpus files). Off the captured
width, remove_watermark_reverse_alpha now builds BOTH the fixed-geometry and the
NCC-aligned alpha map, applies each, and keeps whichever leaves the least
residual mark (re-detect confidence on the bare reverse-alpha) -- geometry wins
on faint marks, alignment on clear ones, no magic threshold. Real-file round-trip
now cleanly removes 56/56 detected Doubao marks across every corpus resolution (was 54).
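
The "try both, keep the least residual" selection can be sketched generically. The names here are hypothetical: `candidates` maps a label to a removal function (for instance fixed geometry and NCC-aligned), and `residual` stands in for the re-detect confidence on the result.

```python
from typing import Callable, Dict, Tuple, TypeVar

Img = TypeVar("Img")


def keep_least_residual(
    image: Img,
    candidates: Dict[str, Callable[[Img], Img]],
    residual: Callable[[Img], float],
) -> Tuple[str, Img]:
    """Apply every candidate removal and keep the one with the lowest residual."""
    results = {name: remove(image) for name, remove in candidates.items()}
    best = min(results, key=lambda name: residual(results[name]))
    return best, results[best]
```

No threshold is involved: the two results are only compared with each other, so the choice adapts per image.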

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* perf(doubao): skip residual inpaint at native width for exact recovery

At the captured width the fixed-geometry reverse-alpha is pixel-exact, so
inpainting over it only replaced exactly-recovered interior pixels with a
cv2 hallucination -- measured worse on a textured background (native error
against the true background: 1.6 with reverse-alpha only vs 2.6 with the old
always-on full-footprint inpaint). Native now returns the bare recovery untouched;
off-native, where NCC alignment is only sub-pixel-approximate, the footprint
inpaint stays to clean the seam. Real round-trip still 56/56 across all
corpus resolutions; negatives 0/60, Gemini unaffected.

Add test_native_returns_exact_reverse_alpha_no_inpaint as the regression
guard. Sync CLAUDE.md + README (the table cell and prose described the
pre-NCC "skipped off native / cv2-LaMa" behavior, now stale). Gitignore the
session scheduled_tasks.lock, and add the text-protection research note.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Victor Kuznetsov
2026-05-29 19:49:09 -07:00
committed by GitHub
parent ef6fdaeeec
commit 58bdf51c59
17 changed files with 1148 additions and 266 deletions
+1
@@ -34,6 +34,7 @@ yolov8n.pt
# Claude Code local settings
.claude/settings.local.json
.claude/scheduled_tasks.lock
# Doubao watermark calibration (local only; ship only the derived alpha-map asset).
# Synthetic seeds + raw Doubao captures are regenerable and not committed.
+9 -6
File diff suppressed because one or more lines are too long
+10 -8
@@ -17,7 +17,7 @@ If this tool saves you time, consider [sponsoring its development](https://githu
## Features
- **Visible watermark removal** — Gemini / Nano Banana sparkle logo (reverse alpha blending) and the Doubao "豆包AI生成" text strip (locate + mask + inpaint); fast, offline, deterministic, no GPU. `visible --mark auto` picks the right one
- **Visible watermark removal** — a registry of known marks in their usual places: the Gemini / Nano Banana sparkle and the Doubao "豆包AI生成" text strip. Each is removed by **exact reverse-alpha blending** against a captured alpha map (`original = (wm − α·logo)/(1−α)`), recovering the true pixels rather than inpainting a guess. Fast, offline, no GPU. `visible --mark auto` finds and removes the strongest detected mark. (For arbitrary logos/objects, see `erase`.)
- **Universal region eraser (`erase`)** — remove any logo / watermark / object inside boxes you specify, regardless of position or colour. Default cv2 inpainting (CPU, instant); optional big-LaMa via onnxruntime (`lama` extra) for higher quality
- **Invisible watermark removal** — SynthID, StableSignature, TreeRing via diffusion-based regeneration (needs a local GPU, or run it with no setup on [raiw.cc](https://raiw.cc))
- **AI metadata stripping** — EXIF, PNG text chunks, C2PA provenance manifests (PNG / JPEG / AVIF / HEIF / JPEG-XL, **MP4 / MOV / M4V / M4A** at the container level, and **WebM / MP3 / WAV / FLAC / OGG** losslessly via ffmpeg), XMP DigitalSourceType
@@ -49,7 +49,9 @@ If this tool saves you time, consider [sponsoring its development](https://githu
| **xAI Grok (Aurora)** | — | — | ✅ EXIF signature scheme (no C2PA): `Signature:` blob + UUID `Artist` | Detected (`identify`); metadata strip |
| **Midjourney** | — | — | ✅ EXIF + XMP (prompt, model, seed) | Metadata strip |
| **Meta AI** | — | — | ✅ IPTC "Made with AI" (digitalSourceType) | Metadata strip (removes the label) |
| **Doubao** (ByteDance) / China AIGC generators | ✅ "豆包AI生成" text strip (bottom-right) | — | ✅ TC260 AIGC label `<TC260:AIGC>` XMP **or** `AIGC` PNG chunk (China's mandatory AI labeling) | Locate + mask + inpaint (cv2, CPU) + metadata strip |
| **Doubao** (ByteDance) / China AIGC generators | ✅ "豆包AI生成" text strip (bottom-right) | — | ✅ TC260 AIGC label (`<TC260:AIGC>` XMP **or** `AIGC` PNG chunk) **+ C2PA** signed by ByteDance Volcano Engine (`volcengine`) | Exact reverse-alpha (captured α map): pixel-exact at native width, NCC-aligned at other resolutions, + metadata strip |
| **Samsung Galaxy AI** (Generative Edit, Sketch to Image, ...) | — | — | ✅ C2PA (signer "Samsung Galaxy") + `trainedAlgorithmicMedia` / proprietary `genAIType` marker | Detected (`identify`) + metadata strip |
| **Black Forest Labs** (FLUX API) | — | — | ✅ C2PA (`Black Forest Labs API` + `c2pa.ai_generated_content` + `trainedAlgorithmicMedia`) | Metadata strip |
| **StableSignature** (Meta) | — | ✅ In-model watermark | — | Diffusion regeneration |
| **TreeRing** | — | ✅ Latent space watermark | — | Diffusion regeneration |
@@ -79,9 +81,9 @@ A three-stage NCC (Normalized Cross-Correlation) detector finds the watermark po
### Removing the Doubao "豆包AI生成" text watermark
Doubao (ByteDance) stamps every output with a light, semi-transparent "豆包AI生成" text strip in the bottom-right corner — the visible AIGC label mandated by China's TC260 standard. Unlike the fixed-size Gemini sparkle, it is a text strip that scales with image width, so we anchor a generous bottom-right box by geometry, extract the light low-saturation glyph pixels with a polarity-aware white top-hat mask, and inpaint them (cv2 Telea/NS). The mask is background-relative, so it leaves white-paper documents untouched instead of smearing their text. On dense-text backgrounds where the mask would explode, removal is skipped rather than guessed.
Doubao (ByteDance) stamps every output with a light, semi-transparent "豆包AI生成" text strip in the bottom-right corner — the visible AIGC label mandated by China's TC260 standard. It is a fixed semi-transparent white overlay, so, like the Gemini sparkle, it is removed by **exact reverse-alpha blending**: `original = (watermarked - α·logo) / (1 - α)`, recovering the true pixels instead of hallucinating them. The α map and logo colour were solved from controlled black + gray captures (on black, `captured = α·logo`; the black/gray pair solves α per-pixel). At the captured width the placement is exact, so the recovery is returned untouched (inpainting over exactly-recovered pixels only degrades them). The single capture generalizes to any resolution: off the captured width an NCC scale-and-position search registers the α template to the actual mark, and a light residual inpaint cleans the sub-pixel seam there. Detection is consistent with removal: it matches the same alpha glyph silhouette against the corner (normalized correlation), so it keys on the actual "豆包AI生成" shape, not on textured corners.
**Speed**: ~0.03s per image. No GPU needed. Best on photo / illustration backgrounds; on high-contrast edges a faint residue can remain (use `erase --backend lama` for neural-quality fill).
**Speed**: ~0.05s, no GPU needed. Reverse-alpha at the captured resolution recovers the true background pixels exactly.
### Universal region eraser
@@ -237,9 +239,9 @@ remove-ai-watermarks batch ./images/ --mode all
# of a clean origin. Add --json for machine-readable output.
remove-ai-watermarks identify image.png
# Visible watermark only — fast, offline, CPU. --mark auto (default) picks
# between the Gemini sparkle and the Doubao "豆包AI生成" text strip; force one
# with --mark gemini / --mark doubao.
# Visible watermark only — fast, offline, CPU. --mark auto (default) finds the
# strongest known mark (Gemini sparkle / Doubao "豆包AI生成" text); force one
# with --mark gemini / doubao. Removed by exact reverse-alpha (true-pixel recovery).
remove-ai-watermarks visible image.png -o clean.png
# Erase arbitrary region(s) — universal, any logo/watermark/object, any position.
@@ -329,7 +331,7 @@ Tracked but not yet implemented:
- **Real non-PNG C2PA fixtures**. SynthID-source detection for JPEG / WebP / AVIF is currently covered only by synthetic byte blobs; replace with real vendor-emitted files to ground the binary-scan path.
- **Maintenance debt**. Strict pyright is now clean across `src/` (0 errors): pure-logic files are fully typed, the cv2 / torch / diffusers boundary files carry a documented per-file relax pragma, and a local `typings/piexif` stub covers piexif. Remaining: full-project `pyright` (no path) still OOMs node on this ML-heavy repo, so it must be scoped to `src/`; narrowing the boundary pragmas back toward full strict (as upstream stubs improve) is the long tail. (`uv-secure` is already clean since `idna` was bumped to 3.16.)
- **AVIF / HEIF `Exif` item inside the `meta` box**. An AI-label *XMP* packet in a `meta`-box item is now blanked in place (v0.6.9), but EXIF stored as a `meta`-box `Exif` *item* is still not removed — it needs full `iinf`/`iloc` surgery (offset rewrite, corruption risk) or `exiftool` (a non-bundled binary dependency). Low priority: the AI labels we target are XMP, not EXIF, so an EXIF-only meta-box case is rare.
- **More C2PA device signers**. Leica, Nikon, Google Pixel, Sony, and Truepic are mapped (each verified against a real signed file). Canon and Samsung Galaxy (AI-edit) are deferred until a real signed sample surfaces — no public direct-download C2PA file exists for them today (upload-to-verify / news-agency-licensed only).
- **More C2PA device signers**. Leica, Nikon, Google Pixel, Sony, and Truepic capture cameras are mapped (each verified against a real signed file); **Samsung Galaxy AI**, **Black Forest Labs (FLUX)**, and **ByteDance Volcano Engine** (Doubao / Jimeng) are now attributed too (verified on real signed files). Canon is still deferred until a real signed sample surfaces — no public direct-download C2PA file exists for it today (upload-to-verify / news-agency-licensed only).
- **Resemble PerTh audio detection** — evaluated, not feasible with the public API: `get_watermark()` returns a raw bit array with no presence/confidence flag, so watermarked vs. clean audio can't be reliably separated without Resemble's fixed payload or a confidence service. Same wall as the SynthID pixel detector.
- **Video pipeline (`noai-video`)**: per-frame inpainting and tracking for Sora 2 dynamic logo, Veo 3.1 badge, Kling, Runway. Separate package, not folded into this repo.
+138
@@ -0,0 +1,138 @@
# Text protection research: crisp text under a "watermark removed everywhere" constraint
Date: 2026-05-29. Source: a deep-research run (104 agents, 5 search angles, sources
fetched and 3-vote adversarially verified). Not committed automatically — saved as a
research note for the next session.
## The constraint that frames everything
The invisible watermark (Google SynthID) must be removed **everywhere, including inside
text regions**. Therefore any technique that keeps or composites the **original
(watermarked) text pixels** is disqualified — the text must be *regenerated / freshly
synthesized* enough to scrub the watermark, yet rendered crisply. This single rule is the
filter applied to every candidate below.
## Problem recap
The `invisible` pipeline is SDXL base 1.0 img2img at low strength (~0.05) to defeat
SynthID with minimal visible change. Text is protected via Differential Diffusion with a
per-pixel change map (`preserve` ~0.9) driven by the PP-OCRv3 DB detector
(`text_protector.py`). Large text survives; **small text (sub ~8 px strokes) softens or
garbles** (issue #14, confirmed on real content).
## Executive summary
The fine-text softening is an **architectural consequence of latent-space processing, not
a tuning problem**: SDXL's 4-channel VAE (~48x compression) discards high-frequency signal
on encode, and Differential Diffusion blends in latent space with the change map
downsampled by 8x, so any stroke under ~8 px sits inside one latent cell and cannot be
preserved or edited cleanly **regardless of `preserve`** (the Differential Diffusion
authors state this limit explicitly). Two structurally sound directions keep the
"watermark removed everywhere" guarantee because they **synthesize fresh glyph pixels**
rather than compositing originals: (1) glyph/text-conditioned diffusion re-render of
detected text (AnyText2, EasyText), and (2) a two-stage architecture — global scrub, then
a dedicated text-restoration / text-aware super-resolution pass over detected regions
(TIGER, TextSR, TeReDiff/TAIR). **EasyText** and **TextSR** are the most promising for this
CJK-first pipeline (both multilingual via DiT/ByT5, both regenerate from glyph or
character-shape priors). The deepest fix — a 16-channel (SD3/FLUX) VAE — materially reduces
the softening but means switching the base model, not a drop-in VAE swap.
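
The compression figures quoted above follow from simple arithmetic. The sketch below assumes an RGB input and an 8x spatial downsample per axis; the function names are illustrative.

```python
def vae_compression(
    latent_channels: int, spatial_factor: int = 8, image_channels: int = 3
) -> float:
    """Ratio of pixel values to latent values for one latent cell."""
    return (spatial_factor**2 * image_channels) / latent_channels


def latent_cells_spanned(stroke_px: float, spatial_factor: int = 8) -> float:
    """How many latent cells a stroke of the given pixel width covers."""
    return stroke_px / spatial_factor
```

`vae_compression(4)` gives 48.0 for the SDXL-class VAE and `vae_compression(16)` gives 12.0 for the SD3/FLUX class. A 3 px stroke covers `latent_cells_spanned(3)` = 0.375 of one cell, which is why it cannot be addressed individually in latent space.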
## Constraint reconciliation (important)
The generic research "quick win: bump `preserve` toward 1.0" is **invalid under our hard
constraint**: raising `preserve` freezes the text region, so SynthID there is **not
scrubbed**. Likewise, pixel paste-back of the original text is disqualified. The only
constraint-compatible quick win is **higher resolution / tiled diffusion** (strokes span
more latent cells, less VAE softening, while the text is still fully regenerated and thus
scrubbed). The real answer is **regenerate text crisply**, not freeze it.
## Findings (with confidence and sources)
### Finding 1 — confidence: high
**Claim.** The small-text softening is an architectural latent-space limit, not a tuning issue. SDXL's VAE compressively encodes (losing exact color and fine detail on every round-trip), and Differential Diffusion blends in latent space with the change map downsampled to latent resolution (8x), so the method explicitly caps edit/preserve granularity at ~8 px under SD settings. Text strokes below one latent cell cannot be cleanly preserved even at preserve ~0.9.
**Evidence.** Differential Diffusion's paper states a "cap on the resolution of the change map ... can limit the ability to precisely edit small objects (less than 8 pixels for Stable-Diffusion's settings)"; the official SDXL pipeline downsamples the map by `vae_scale_factor=8` and blends `latents = original*mask + latents*(1-mask)` in latent space. The VAE encode is "compressive ... exact color qualities and exact visual fine-details are lost." arXiv:2512.05198 confirms "resizing the pixel mask to latent resolution discards fine structure ... downsamples by 1/8" and that linear latent blending "cannot be pixel-equivalent." Higher compression = more high-frequency loss (arXiv:2305.02541).
**Sources.** https://onlinelibrary.wiley.com/doi/10.1111/cgf.70040 · https://differential-diffusion.github.io/ · https://github.com/exx8/differential-diffusion · https://arxiv.org/abs/2512.05198 · https://omriavrahami.com/blended-latent-diffusion-page/ · https://arxiv.org/pdf/2305.02541
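
A one-dimensional toy version of the mechanism in Finding 1. It assumes the change map is average-pooled to latent resolution (the real pipeline's interpolation mode may differ) and uses the blend formula quoted in the evidence; the function names are hypothetical.

```python
from typing import List


def downsample_mask(mask: List[float], factor: int = 8) -> List[float]:
    """Average-pool a 1-D pixel mask down to latent resolution."""
    return [sum(mask[i : i + factor]) / factor for i in range(0, len(mask), factor)]


def blend(
    original: List[float], regenerated: List[float], mask: List[float]
) -> List[float]:
    """latents = original*mask + latents*(1 - mask), element by element."""
    return [o * m + r * (1.0 - m) for o, r, m in zip(original, regenerated, mask)]


# A 3 px stroke protected at preserve = 0.9 inside a 16 px row.
pixel_mask = [0.0, 0.0, 0.9, 0.9, 0.9] + [0.0] * 11
latent_mask = downsample_mask(pixel_mask)
```

Here `latent_mask` has two cells: the first holds about 0.34 and the second 0.0. The stroke's 0.9 protection is diluted across its whole latent cell, so roughly two thirds of that cell is regenerated regardless of the `preserve` setting.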
### Finding 2 — confidence: low (do not build on it yet)
**Claim.** Pixel-space differential / blended-latent variants exist as a research direction, but the specific full-resolution-mask solution (PELC/DecFormer, arXiv:2512.05198) was NOT verified to deliver its claimed seam/edge improvements.
**Evidence.** arXiv:2512.05198 argues linear latent blending is not pixel-equivalent and proposes decoder-equivariant compositing; PixPerfect (arXiv:2512.03247) does pixel-space refinement of chromatic shifts at edit boundaries. But the specific PELC full-resolution-mask and DecFormer "53% error reduction" claims were **refuted on adversarial vote (0-3 and 1-2)**. Treat pixel-equivalent latent compositing as an emerging idea to watch, not a production fix.
**Sources.** https://arxiv.org/abs/2512.05198 · https://arxiv.org/abs/2512.03247
### Finding 3 — confidence: high
**Claim.** Glyph/text-conditioned diffusion can re-render detected text as freshly synthesized pixels (not copied), which inherently scrubs any watermark in the text region while rendering glyphs crisply. AnyText/AnyText2 inject text-rendering into a pretrained T2I model and support generation AND editing of existing scene images; multilingual including CJK and English.
**Evidence.** AnyText2 "enables precise control over multilingual text attributes in natural scene image generation and editing" (WriteNet+AttnX); +3.3% (Chinese) / +9.3% (English) accuracy over AnyText v1. AnyText "can be plugged into existing diffusion models ... for rendering or editing text" and synthesizes text latent features through diffusion (fresh pixels), supporting zh/en/ja/ko/ar/bn/hi. **Caveat:** both are SD1.5-based, so NOT a drop-in into the SDXL scrub (separate base model); AnyText's own limitation: "the inpainting manner ... impedes editing quality on small text," and it ranks weak on STRICT (EMNLP 2025) — small-text crispness not guaranteed.
**Sources.** https://github.com/tyxsspa/AnyText2 · https://arxiv.org/abs/2411.15245 · https://arxiv.org/abs/2311.03054
### Finding 4 — confidence: high
**Claim.** EasyText is a strong glyph-conditioned re-render candidate: built on the FLUX-dev DiT framework with LoRA tuning, renders compact per-character glyph patches (64px-high adaptive for alphabetic, 64x64 for logographic) concatenated in latent space, supports 10+ languages including Chinese, Japanese, Korean, Thai, Vietnamese, Greek, and Latin.
**Evidence.** AAAI 2025 + arXiv:2505.24417: "implemented based on the open-source FLUX-dev framework with LoRA-based parameter-efficient tuning," VAE and text encoder frozen, two-stage 512->1024 training. Glyph conditioning via "64-pixel-high images ... adaptive widths for alphabetic; fixed 64x64 for logographic," VAE-encoded and concatenated with denoised latents, "less than one-tenth the spatial size of layout-matching methods." FLUX-based (16-channel VAE, DiT) also sidesteps the SDXL 4-channel wall. Fresh-pixel generation preserves the watermark-removal guarantee. Cyrillic/Arabic crispness not separately benchmarked.
**Sources.** https://arxiv.org/html/2505.24417 · https://ojs.aaai.org/index.php/AAAI/article/view/37697
### Finding 5 — confidence: high
**Claim.** A two-stage "global watermark scrub then text-restoration pass" architecture is validated by recent literature, and the restoration stage can synthesize glyph pixels from priors (no original-pixel reintroduction). TIGER reconstructs stroke geometry then injects it as guidance into full-image super-resolution; TextSR uses a detector + multilingual OCR to regenerate text from character-shape priors; TeReDiff/TAIR couples a jointly-trained text-spotter with diffusion.
**Evidence.** TIGER (arXiv:2510.21590): "a diffusion-based local text refiner ... reconstructing fine-grained stroke geometry ... injected as conditional guidance into the subsequent full-image restoration." TextSR (arXiv:2505.23119, Google): "leverages a text detector ... then employs OCR to extract multilingual text," regenerating from "multilingual character-to-shape diffusion priors" that "produce character shapes solely based on text prompts, even without visual input" — fresh pixels. TAIR/TeReDiff (ICLR 2026): standard restoration "frequently generates plausible but incorrect textures"; TeReDiff feeds text-spotter outputs back as prompts. **Caveat:** TIGER orders text-first then global (reverse of scrub-then-text); these target degraded-input super-resolution, not watermark removal, so the SynthID-scrub of the restoration stage must be verified empirically (the stages are themselves diffusion-based, so fresh-pixel = no SynthID is plausible but unproven here).
**Sources.** https://arxiv.org/html/2510.21590v1 · https://arxiv.org/html/2505.23119v1 · https://cvlab-kaist.github.io/TAIR/ · https://arxiv.org/abs/2506.09993
### Finding 6 — confidence: high
**Claim.** Switching to a 16-channel VAE (SD3/FLUX class) materially reduces small-text/latent softening vs SDXL's 4-channel VAE, but it requires switching the base model — not a drop-in latent swap into an SDXL UNet img2img pipeline. RAE approaches are DiT-native and likewise not drop-in.
**Evidence.** SD3/FLUX moved from 4-channel (48x) to 16-channel (12x) VAEs specifically to preserve fine detail (diffusers Discussion #8713; madebyollin VAE notes; arXiv:2305.02541). RAE (arXiv:2510.11690) "should be the new default for diffusion transformer training" but produces high-dimensional latents needing a DiT wide-DDT head — NOT compatible with an SDXL 4-channel UNet. EasyText shows the practical path: adopt a FLUX-DiT base rather than retrofit SDXL. The VAE upgrade couples to a base-model migration.
**Sources.** https://arxiv.org/abs/2510.11690 · https://arxiv.org/pdf/2305.02541 · https://arxiv.org/html/2505.24417
## Recommendation
Under the hard constraint, the correct architecture is **not "protect text during the
scrub" (Differential Diffusion)** but **"scrub everywhere, then restore text crisply by
regeneration"**:
1. Global SDXL scrub with text protection OFF (text region is scrubbed too).
2. On detected text regions, a **glyph-conditioned restoration** that re-renders the same
glyphs as fresh pixels (no original reused).
This is the only path that delivers both "watermark removed everywhere" and crisp text.
**Top-2 to prototype:**
- **TextSR** — detector + multilingual OCR + character-shape diffusion priors; closest to
the existing detector-driven pipeline.
- **EasyText** — FLUX-DiT glyph re-render, multilingual incl. CJK; also gets the 16-channel
VAE for free.
**Honest costs / unknowns:** this is a re-architecture, not a quick fix. It needs a new
**OCR-recognition** step (we currently only detect text; we must know *what* to re-render).
Models are FLUX/DiT-class (heavy) -> serverless GPU. Maturity is research-grade; CJK is
covered, Cyrillic/Arabic crispness is not separately benchmarked -> a prototype must
measure real fidelity. The restoration stage being diffusion-based makes "fresh pixels =
no SynthID" plausible but **must be verified empirically** (run the SynthID oracle on the
restored output).
**Constraint-compatible quick win to try first:** run the global scrub at **higher
resolution / tiled** so strokes exceed the latent cell — less softening, full scrub, no
freezing. Cheap to test; quantify recall/quality vs cost.
**Do not pursue:** raising `preserve` toward 1.0 or pixel paste-back (both leave original
watermarked pixels in text); PELC/DecFormer pixel-equivalent latent compositing (refuted,
not production-ready).
## Provenance
Deep-research workflow run `wf_118b9a03-3eb` (2026-05-29). Findings adversarially verified
(2/3 refutes required to kill a claim). This note records research only; no code change is
implied until a prototype validates fidelity and the SynthID-scrub guarantee on the
restored output.
Binary file not shown (size: 8.0 KiB).

+46 -113
@@ -20,12 +20,12 @@ from rich.panel import Panel
from rich.progress import BarColumn, Progress, SpinnerColumn, TextColumn, TimeElapsedColumn
from rich.table import Table
from remove_ai_watermarks import __version__
from remove_ai_watermarks import __version__, watermark_registry
if TYPE_CHECKING:
    from numpy.typing import NDArray
    from remove_ai_watermarks.gemini_engine import DetectionResult, GeminiEngine
    from remove_ai_watermarks.gemini_engine import DetectionResult
console = Console()
@@ -133,72 +133,6 @@ def _write_bgr_with_alpha(
    image_io.imwrite(path, bgra)
def _run_doubao_if_selected(
    ctx: click.Context,
    image: NDArray[Any],
    alpha: NDArray[Any] | None,
    output: Path,
    mark: str,
    gemini_engine: GeminiEngine,
    detect: bool,
    detect_threshold: float,
    inpaint_method: str,
    strip_metadata: bool,
) -> bool:
    """Run the Doubao text-strip removal path when it is the selected mark.
    Returns True when this path handled the image (caller should stop). In
    ``auto`` mode the Doubao detector competes with the Gemini detector and wins
    only when it is both positive and at least as confident.
    """
    from remove_ai_watermarks.doubao_engine import DoubaoEngine
    doubao = DoubaoEngine()
    d_det = doubao.detect(image)
    if mark == "auto":
        g_det = gemini_engine.detect_watermark(image)
        use_doubao = d_det.detected and d_det.confidence >= g_det.confidence
        console.print(
            f" [dim]Mark auto:[/] gemini={g_det.confidence:.2f} doubao={d_det.confidence:.2f} "
            f"-> {'doubao' if use_doubao else 'gemini'}"
        )
    else:
        use_doubao = mark == "doubao"
    if not use_doubao:
        return False
    if detect and not d_det.detected and d_det.confidence < detect_threshold:
        console.print(
            f" [yellow]⚠[/] Doubao mark not detected [dim](coverage {d_det.coverage:.1%}). "
            f"Use --no-detect to force.[/]"
        )
        raise SystemExit(0)
    method: Literal["telea", "ns"] = "ns" if inpaint_method == "ns" else "telea"
    t0 = time.monotonic()
    with console.status("[cyan]Removing Doubao watermark…[/]"):
        result = doubao.remove_watermark(image, inpaint_method=method)
    elapsed = time.monotonic() - t0
    output.parent.mkdir(parents=True, exist_ok=True)
    _write_bgr_with_alpha(output, result, alpha, clear_region=d_det.region)
    if strip_metadata:
        try:
            from remove_ai_watermarks.metadata import remove_ai_metadata
            remove_ai_metadata(output, output)
        except Exception as e:
            if ctx.obj.get("verbose"):
                console.print(f" [yellow]⚠[/] Failed to strip metadata: {e}")
    size_kb = output.stat().st_size / 1024
    console.print(f" [green]✓[/] Doubao mark removed → {output} [dim]({size_kb:.0f} KB, {elapsed:.2f}s)[/]")
    return True
# ── Main group ───────────────────────────────────────────────────────
@@ -238,9 +172,10 @@ def main(ctx: click.Context, verbose: bool) -> None:
@click.option("--detect-threshold", type=float, default=0.25, help="Detection confidence threshold.")
@click.option(
"--mark",
type=click.Choice(["auto", "gemini", "doubao"]),
type=click.Choice(["auto", *watermark_registry.mark_keys()]),
default="auto",
help="Which visible mark to target. auto picks the stronger of the two detectors.",
help="Which known visible mark to target (auto picks the strongest detected). "
"All marks are removed by exact reverse-alpha against a captured alpha map.",
)
@click.option("--strip-metadata/--keep-metadata", default=True, help="Strip AI metadata from output.")
@click.pass_context
@@ -256,13 +191,14 @@ def cmd_visible(
mark: str,
strip_metadata: bool,
) -> None:
"""Remove a visible AI watermark from an image.
"""Remove a known visible AI watermark from an image.
Targets the Gemini sparkle logo (reverse alpha blending) or the Doubao
"豆包AI生成" text strip (locate -> mask -> inpaint). Fast, deterministic,
offline. ``--mark auto`` picks whichever detector fires stronger.
Finds a known mark in its usual place (Gemini sparkle / Doubao text) via the
watermark registry and removes it by exact reverse-alpha against a captured
alpha map -- recovering the true pixels, not an inpaint guess. ``--mark auto``
picks the strongest detected mark. For arbitrary logos/objects, use ``erase``.
"""
from remove_ai_watermarks.gemini_engine import GeminiEngine
from remove_ai_watermarks import watermark_registry as registry
_banner()
source = _validate_image(source)
@@ -270,8 +206,6 @@ def cmd_visible(
if output is None:
output = source.with_stem(source.stem + "_clean")
engine = GeminiEngine()
# Load image (preserving any alpha channel separately)
image, alpha = _read_bgr_and_alpha(source)
if image is None:
@@ -281,45 +215,44 @@ def cmd_visible(
h, w = image.shape[:2]
console.print(f" [dim]Input:[/] {source.name} ({w}x{h})")
# Resolve which visible mark to target, then run the Doubao path if chosen.
if _run_doubao_if_selected(
ctx, image, alpha, output, mark, engine, detect, detect_threshold, inpaint_method, strip_metadata
):
return
# Detection (we always detect softly, to find dynamic region for inpainting)
with console.status("[cyan]Detecting watermark…[/]"):
det = engine.detect_watermark(image)
if detect:
if det.detected:
console.print(
f" [green]✓[/] Watermark detected "
f"[dim](confidence: {det.confidence:.1%}, "
f"spatial: {det.spatial_score:.3f}, "
f"gradient: {det.gradient_score:.3f})[/]"
)
else:
console.print(f" [yellow]⚠[/] Watermark not detected [dim](confidence: {det.confidence:.1%})[/]")
if det.confidence < detect_threshold:
console.print(" [dim]Skipping. Use --no-detect to force removal.[/]")
# Resolve the target mark from the known-watermark registry. ``auto`` scans
# every in-auto mark in its usual place and picks the strongest; an explicit
# ``--mark <key>`` targets that one (the user asserts its presence).
if mark == "auto":
best = registry.best_auto_mark(image)
if best is None:
console.print(" [yellow]⚠[/] No known visible mark detected (gemini / doubao).")
if detect:
console.print(" [dim]Skipping. Use --mark <name> --no-detect to force.[/]")
raise SystemExit(0)
target = "gemini" # forced (no-detect): fall back to the default mark
else:
target = best.key
console.print(f" [dim]Mark auto:[/] {best.label} [dim]({best.location}, conf {best.confidence:.2f})[/]")
else:
target = mark
# Removal
chosen = registry.get_mark(target)
det = chosen.detect(image)
if detect and not det.detected:
console.print(
f" [yellow]⚠[/] {chosen.label} not detected "
f"[dim](conf {det.confidence:.2f}). Use --no-detect to force.[/]"
)
raise SystemExit(0)
if det.detected:
console.print(f" [green]✓[/] {chosen.label} detected [dim]({chosen.location}, conf {det.confidence:.2f})[/]")
method: Literal["telea", "ns"] = "ns" if inpaint_method == "ns" else "telea"
t0 = time.monotonic()
region: tuple[int, int, int, int] | None = None
with console.status("[cyan]Removing watermark…[/]"):
result = engine.remove_watermark(image)
if inpaint:
region = _watermark_region(det, w, h)
result = engine.inpaint_residual(
result,
region,
strength=inpaint_strength,
method=inpaint_method,
)
with console.status(f"[cyan]Removing {chosen.label}… ({chosen.recovery})[/]"):
result, region = chosen.remove(
image,
inpaint_method=method,
inpaint=inpaint,
inpaint_strength=inpaint_strength,
force=not detect,
)
elapsed = time.monotonic() - t0
# Save (preserves transparency by clearing alpha in the watermark region)
+210 -74
@@ -1,29 +1,24 @@
"""Doubao visible watermark removal engine.
Doubao (ByteDance) stamps every generated image with a visible "豆包AI生成"
(Doubao AI generated) text strip in the bottom-right corner. This is the
explicit AIGC label mandated by China's TC260 standard, rendered as a
near-white / light-gray, low-saturation text overlay.
(Doubao AI generated) text strip in the bottom-right corner -- the explicit AIGC
label mandated by China's TC260 standard, a near-white semi-transparent overlay.
Unlike the Gemini sparkle (a fixed square logo removed by reverse alpha
blending against a captured alpha map), the Doubao mark is a text strip whose
exact alpha map we do not yet have. This engine therefore removes it by:
Like the Gemini sparkle, it is a fixed overlay, so it is removed by **exact
reverse-alpha blending** against a captured alpha map (``remove_watermark_reverse_alpha``):
``original = (wm - a*logo)/(1-a)`` -- recovering the true pixels, not an inpaint
guess. The alpha map + logo colour were solved from black+gray Doubao captures
(see data/doubao_capture/ and the reverse-alpha section below) and bundled as
``assets/doubao_alpha.png``.
locate -> mask -> inpaint
Detection (``detect``) is reverse-alpha-consistent: it matches that same alpha
glyph silhouette against the corner via normalized correlation, so it keys on
the actual "豆包AI生成" shape rather than coverage/structure heuristics.
1. Locate: the mark scales with image WIDTH and sits in the bottom-right at a
fixed margin, so we anchor a generous box there (geometry only -- no bundled
template). Constants below are derived from measured Doubao output.
2. Mask: within the box, extract the light, low-saturation glyph pixels with a
polarity-aware rule (the mark is brighter than dark backgrounds and a
distinct off-white gray against light backgrounds).
3. Inpaint: cv2 inpainting (TELEA / NS) reconstructs the covered pixels.
This is fast, offline, deterministic, and needs no GPU. A future upgrade path
is per-pixel reverse alpha blending once a Doubao alpha map is captured on a
controlled black background (see data/doubao_capture/), which would recover the
true pixels instead of hallucinating them -- the same approach as the Gemini
engine.
``locate`` (geometry box, scales with image WIDTH) and ``extract_mask`` (the
candidate glyph mask the detector correlates) remain; there is no inpaint-based
removal here -- arbitrary-region inpainting lives in ``region_eraser`` / the
``erase`` command. Fast, offline, no GPU.
"""
# cv2/numpy boundary: third-party libs ship no usable element types; relax the
@@ -33,7 +28,7 @@ from __future__ import annotations
import logging
from dataclasses import dataclass
from typing import TYPE_CHECKING, Any, Literal
from typing import TYPE_CHECKING, Any
import cv2
import numpy as np
@@ -66,17 +61,63 @@ MAX_SATURATION = 55 # max channel spread to count a pixel as "grayish"
LOGO_MIN_LUMA = 150 # glyphs are at least this bright in absolute terms
TOPHAT_DELTA = 12 # glyph must exceed the local background by this many levels
# Detection: a genuine label fills a meaningful fraction of the box. Measured
# coverage is >=0.20 on real Doubao outputs; random/textured corners stay <=0.06
# on large images but can spike to ~0.15 on tiny ones (small box -> high variance),
# so the threshold sits above that spike and below the real-mark floor.
DETECT_MIN_COVERAGE = 0.16
# Detection is reverse-alpha-consistent: the mark is recognized by matching the
# bundled alpha-template glyph silhouette (assets/doubao_alpha.png -- the exact
# shape we invert) against the extracted candidate mask via zero-mean normalized
# correlation (cv2 TM_CCOEFF_NORMED). It keys on the actual "豆包AI生成" glyph
# SHAPE, not on coverage/structure heuristics, so a merely-textured corner does
# not fire (the old coverage detector false-positived on ~28% of images; #23).
# Corpus-tuned: real marks score median ~0.61, arbitrary corners <=0.17 (p99);
# threshold 0.4 -> false positives 7/1243 (0.6%). A small coverage floor skips
# the template match on a near-empty candidate box.
DETECT_MIN_COVERAGE = 0.04
DETECT_NCC_THRESHOLD = 0.4
# Safety: a text strip fills a modest slice of the (generous) box. When the box
# is over a dense-text / document background the mask explodes and cv2 inpainting
# would smear the real content. Above this coverage we refuse to inpaint and
# leave the image untouched -- that hard case needs the neural path, not a guess.
MAX_INPAINT_COVERAGE = 0.50
# ── Reverse-alpha (exact recovery, Gemini-style) ─────────────────────
# The Doubao mark is a fixed semi-transparent white overlay, so given its alpha
# map the original pixels are recovered exactly: original = (wm - a*logo)/(1-a).
# The alpha map + logo colour were solved from black+gray Doubao captures on a
# controlled background (data/doubao_capture/): on black, captured = a*logo, and
# the black/gray pair solves a per-pixel WITHOUT assuming the logo colour. The
# bundled asset (assets/doubao_alpha.png) is the alpha template (a*255) at the
# captured width. The mark scales with image WIDTH, but a pure width-scale is
# only sub-pixel-accurate at the captured width and ghosts elsewhere, so removal
# does NOT trust fixed geometry: `_aligned_alpha_map` registers the template to
# the actual mark by a TM_CCOEFF_NORMED scale+position search, which makes the
# single capture work at any resolution (verified clean on 1773x2364). Verified
# 2026-05-29: white-capture cross-check -> mark vanishes to a flat fill; clean on
# doubao-1.png (2048) and the 3:4 portrait corpus size.
_ALPHA_NATIVE_WIDTH = 2048
_ALPHA_LOGO_BGR: tuple[float, float, float] = (252.0, 255.0, 255.0)
_ALPHA_WIDTH_FRAC = 0.1572 # glyph width / image width -- the alignment scale seed
_ALPHA_HEIGHT_FRAC = 0.0347
# Margins (of image WIDTH) of the captured mark -- the geometry record / where to
# seed; alignment refines the actual position, so these are not load-bearing.
_ALPHA_MARGIN_RIGHT_FRAC = 0.0166
_ALPHA_MARGIN_BOTTOM_FRAC = 0.0195
# Alignment scale search (np.linspace args) around the width-scaled glyph size.
_ALPHA_ALIGN_SEARCH = (0.88, 1.12, 13)
# At (near) the captured width the fixed geometry is pixel-exact, so we use it
# directly there -- NCC alignment is integer-pixel and would land ~1px off,
# degrading the otherwise-exact native recovery. Off this band, alignment wins.
_ALPHA_NATIVE_BAND = 0.03
_alpha_template_cache: NDArray[Any] | None = None
def _alpha_template() -> NDArray[Any] | None:
"""Lazily load the bundled Doubao alpha template (float [0,1]), or None."""
global _alpha_template_cache
if _alpha_template_cache is None:
from pathlib import Path
from remove_ai_watermarks import image_io
path = Path(__file__).parent / "assets" / "doubao_alpha.png"
img = image_io.imread(str(path), cv2.IMREAD_GRAYSCALE)
if img is None:
return None
_alpha_template_cache = img.astype(np.float32) / 255.0
return _alpha_template_cache
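The reverse-alpha comment above says the black/gray capture pair solves the per-pixel alpha without assuming the logo colour. A small worked sketch of that algebra follows, on one channel with synthetic data. The function names, the gray level of 128, and the `min_alpha` cutoff are assumptions made for illustration; the actual capture script is not part of this diff.

```python
import numpy as np


def solve_alpha(cap_black: np.ndarray, cap_gray: np.ndarray, gray_level: float) -> np.ndarray:
    """Per-pixel alpha from a black/gray capture pair (single channel).

    Blend model: captured = a*logo + (1 - a)*background.
      on black: cap_black = a*logo
      on gray:  cap_gray  = a*logo + (1 - a)*gray_level
    Subtracting cancels the unknown logo term:
      a = 1 - (cap_gray - cap_black) / gray_level
    """
    diff = cap_gray.astype(np.float64) - cap_black.astype(np.float64)
    return np.clip(1.0 - diff / gray_level, 0.0, 1.0)


def solve_logo(cap_black: np.ndarray, alpha: np.ndarray, min_alpha: float = 0.05) -> float:
    """Logo intensity from the black capture, averaged where alpha is well-conditioned."""
    solid = alpha >= min_alpha
    return float((cap_black[solid] / alpha[solid]).mean())


# Synthetic ground truth (hypothetical values, no rounding or compression noise).
true_alpha = np.array([[0.0, 0.2], [0.5, 0.8]])
true_logo, gray_level = 252.0, 128.0
cap_black = true_alpha * true_logo
cap_gray = true_alpha * true_logo + (1.0 - true_alpha) * gray_level

alpha = solve_alpha(cap_black, cap_gray, gray_level)
logo = solve_logo(cap_black, alpha)
```

On real captures the two images carry quantization and codec noise, so the solved alpha is an estimate rather than the exact values this noise-free example returns.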
@dataclass(frozen=True)
@@ -104,6 +145,39 @@ class DoubaoDetection:
coverage: float = 0.0 # fraction of the box occupied by glyph pixels
_silhouette_cache: NDArray[Any] | None = None
def _glyph_silhouette() -> NDArray[Any] | None:
"""Binary "豆包AI生成" silhouette (255 = glyph) from the bundled alpha map,
used as the detection template. None if the alpha asset is missing."""
global _silhouette_cache
if _silhouette_cache is None:
at = _alpha_template()
if at is None:
return None
_silhouette_cache = (at > 0.15).astype(np.uint8) * 255
return _silhouette_cache
def _template_match_score(box_mask: NDArray[Any], image_width: int) -> float:
"""Zero-mean normalized correlation of the alpha-template glyph silhouette
(scaled to the mark's expected size) against the candidate ``box_mask``.
TM_CCOEFF_NORMED keys on glyph SHAPE, not coverage, so a dense textured
corner does not score highly -- only the actual "豆包AI生成" shape does.
"""
sil = _glyph_silhouette()
if sil is None or box_mask.size == 0:
return 0.0
gw = min(box_mask.shape[1] - 1, max(8, int(_ALPHA_WIDTH_FRAC * image_width)))
gh = min(box_mask.shape[0] - 1, max(4, int(_ALPHA_HEIGHT_FRAC * image_width)))
if gw < 8 or gh < 4:
return 0.0
template = cv2.resize(sil, (gw, gh), interpolation=cv2.INTER_NEAREST)
return float(cv2.matchTemplate(box_mask, template, cv2.TM_CCOEFF_NORMED).max())
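To see why a zero-mean normalized correlation keys on shape rather than coverage, here is a NumPy-only sketch of the score at a single offset. It follows the standard `TM_CCOEFF_NORMED` formula (both the window and the template are mean-subtracted before correlating), but `zero_mean_ncc` and the toy glyph are hypothetical, and returning 0.0 for a flat window is a choice of this sketch, not a statement about OpenCV's behaviour.

```python
import numpy as np


def zero_mean_ncc(window: np.ndarray, template: np.ndarray) -> float:
    """Zero-mean normalized correlation of two same-sized arrays, in [-1, 1]."""
    w = window.astype(np.float64) - window.mean()
    t = template.astype(np.float64) - template.mean()
    denom = np.sqrt((w * w).sum() * (t * t).sum())
    if denom == 0.0:
        # A constant window (or template) has no shape to correlate with.
        return 0.0
    return float((w * t).sum() / denom)


# Toy binary "glyph" silhouette (255 = glyph), standing in for the real template.
template = np.zeros((6, 12), np.uint8)
template[1:5, 2:4] = 255
template[2:4, 6:10] = 255

same_shape = zero_mean_ncc(template, template)                 # identical shape
dense_mask = zero_mean_ncc(np.full_like(template, 255), template)  # 100% coverage, no shape
inverted = zero_mean_ncc(255 - template, template)             # exact negative of the shape
```

A fully covered box scores zero because mean subtraction removes everything that is uniform, which is the property the coverage-based detector lacked.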
class DoubaoEngine:
"""Remove the visible Doubao "豆包AI生成" watermark by exact reverse-alpha blending."""
@@ -176,10 +250,12 @@ class DoubaoEngine:
# ── Detect ────────────────────────────────────────────────────────
def detect(self, image: NDArray[Any]) -> DoubaoDetection:
"""Detect the visible Doubao mark by glyph coverage in the corner box.
"""Detect the visible Doubao mark by matching the alpha-template glyph
silhouette against the corner candidate (TM_CCOEFF_NORMED).
Heuristic: a genuine label fills a meaningful fraction of the box with
text-like glyph pixels. Coverage maps to a confidence score.
Keys on the "豆包AI生成" SHAPE, not coverage, so a textured corner does
not fire. ``confidence`` is the correlation score; ``detected`` is it
clearing ``DETECT_NCC_THRESHOLD``.
"""
det = DoubaoDetection()
if image is None or image.size == 0:
@@ -191,53 +267,113 @@ class DoubaoEngine:
coverage = float((box > 0).sum()) / float(max(1, bw * bh))
det.region = loc.bbox
det.coverage = coverage
# Map coverage to a 0-1 confidence: ~0.06 (noise floor) -> 0, ~0.26 -> 1.
det.confidence = float(max(0.0, min(1.0, (coverage - 0.06) / 0.20)))
det.detected = coverage >= DETECT_MIN_COVERAGE
logger.debug("Doubao detect: coverage=%.3f conf=%.3f", coverage, det.confidence)
if coverage >= DETECT_MIN_COVERAGE:
score = _template_match_score(box, image.shape[1])
det.confidence = score
det.detected = score >= DETECT_NCC_THRESHOLD
logger.debug("Doubao detect: coverage=%.3f ncc=%.2f detected=%s", coverage, score, det.detected)
return det
# ── Remove ────────────────────────────────────────────────────────
# ── Reverse-alpha (exact recovery) ────────────────────────────────
def remove_watermark(
self,
image: NDArray[Any],
*,
inpaint_method: Literal["telea", "ns"] = "telea",
inpaint_radius: int = 6,
dilate: int = 3,
) -> NDArray[Any]:
"""Remove the visible Doubao watermark by inpainting the glyph mask.
def reverse_alpha_available(self, image: NDArray[Any]) -> bool:
"""True if the bundled alpha map is loadable. NCC alignment
(see ``_aligned_alpha_map``) registers it to the actual mark at ANY
resolution, so there is no width gate -- the caller still gates on
``detect`` so a clean corner is never touched."""
return image is not None and image.size > 0 and _alpha_template() is not None
Returns an unmodified copy when no glyph pixels are found (so we never
smear a clean corner). ``dilate`` grows the mask to cover anti-aliased
glyph edges before inpainting.
"""
if image is None or image.size == 0:
return image
def _fixed_alpha_map(self, image: NDArray[Any]) -> tuple[NDArray[Any], tuple[int, int, int, int]] | None:
"""Place the template by fixed width-relative geometry -- pixel-exact at
the captured width (used there instead of integer-pixel NCC alignment)."""
at = _alpha_template()
if at is None:
return None
h, w = image.shape[:2]
gw, gh = max(1, int(_ALPHA_WIDTH_FRAC * w)), max(1, int(_ALPHA_HEIGHT_FRAC * w))
ax = max(0, w - int(_ALPHA_MARGIN_RIGHT_FRAC * w) - gw)
ay = max(0, h - int(_ALPHA_MARGIN_BOTTOM_FRAC * w) - gh)
amap = np.zeros((h, w), np.float32)
amap[ay : ay + gh, ax : ax + gw] = cv2.resize(at, (gw, gh), interpolation=cv2.INTER_LINEAR)
return amap, (ax, ay, gw, gh)
def _aligned_alpha_map(self, image: NDArray[Any]) -> tuple[NDArray[Any], tuple[int, int, int, int]] | None:
"""Build a full-image alpha map with the captured template registered to
the actual mark via a TM_CCOEFF_NORMED scale + position search -- so the
single capture works off the captured width (a pure width-scale ghosts).
Returns ``(alpha_map, glyph_bbox)`` or None."""
at = _alpha_template()
sil = _glyph_silhouette()
if at is None or sil is None:
return None
h, w = image.shape[:2]
loc = self.locate(image)
mask = self.extract_mask(image, loc)
if not mask.any():
logger.debug("Doubao remove: no glyph pixels found; returning copy")
bx, by, bw, bh = loc.bbox
box_mask = self.extract_mask(image, loc)[by : by + bh, bx : bx + bw]
expected = _ALPHA_WIDTH_FRAC * w
best: tuple[float, int, int, int, int] | None = None
for scale in np.linspace(*_ALPHA_ALIGN_SEARCH):
gw, gh = int(expected * scale), int(_ALPHA_HEIGHT_FRAC * w * scale)
if gw < 8 or gh < 4 or gw >= bw or gh >= bh:
continue
t = cv2.resize(sil, (gw, gh), interpolation=cv2.INTER_NEAREST)
_, score, _, top_left = cv2.minMaxLoc(cv2.matchTemplate(box_mask, t, cv2.TM_CCOEFF_NORMED))
if best is None or score > best[0]:
best = (score, gw, gh, top_left[0], top_left[1])
if best is None:
return None
_, gw, gh, ox, oy = best
ax, ay = bx + ox, by + oy
amap = np.zeros((h, w), np.float32)
amap[ay : ay + gh, ax : ax + gw] = cv2.resize(at, (gw, gh), interpolation=cv2.INTER_LINEAR)
return amap, (ax, ay, gw, gh)
def _apply_reverse_alpha(self, image: NDArray[Any], amap: NDArray[Any]) -> NDArray[Any]:
"""Invert the alpha blend with ``amap``: ``original = (wm - a*logo)/(1-a)``."""
a3 = np.clip(amap, 0.0, 1.0)[:, :, None]
logo = np.array(_ALPHA_LOGO_BGR, np.float32)
return np.clip((image.astype(np.float32) - a3 * logo) / np.clip(1.0 - a3, 0.25, 1.0), 0, 255).astype(np.uint8)
def remove_watermark_reverse_alpha(self, image: NDArray[Any], *, residual_inpaint: bool = True) -> NDArray[Any]:
"""Recover the original pixels by inverting the alpha blend
``original = (wm - a*logo)/(1-a)``.
Placement: at (near) the captured width the fixed geometry is pixel-exact,
so the recovery is returned UNTOUCHED -- inpainting over exactly-recovered
interior pixels only swaps them for a cv2 hallucination (measured worse on
textured backgrounds: native error vs true bg 1.6 reverse-alpha-only vs
2.6 with full-footprint inpaint). Off-native, NCC alignment registers the
template to the real mark; the alignment is integer-pixel (approximate at the sub-pixel level), so
the interior recovery is no longer exact and the seam can re-trip the
detector. There we try BOTH placements and keep whichever leaves the least
residual mark (on a faint/busy-background mark the NCC peak can wander a
few px, where geometry wins; on a clear mark alignment wins) -- no magic
threshold, it just picks the better removal -- then a residual inpaint over
the glyph footprint cleans the seam (the interior is approximate anyway, so
inpaint there costs nothing and reliably clears the mark).
Call only when :meth:`reverse_alpha_available` and the mark is detected.
"""
at_native = abs(image.shape[1] / _ALPHA_NATIVE_WIDTH - 1.0) <= _ALPHA_NATIVE_BAND
if at_native:
amap = self._fixed_alpha_map(image)
return self._apply_reverse_alpha(image, amap[0]) if amap is not None else image.copy()
maps = [c for c in (self._fixed_alpha_map(image), self._aligned_alpha_map(image)) if c is not None]
if not maps:
return image.copy()
x, y, bw, bh = loc.bbox
coverage = float((mask[y : y + bh, x : x + bw] > 0).sum()) / float(max(1, bw * bh))
if coverage > MAX_INPAINT_COVERAGE:
logger.warning(
"Doubao remove: box coverage %.2f exceeds %.2f (dense-text/document "
"background); leaving image untouched to avoid smearing content",
coverage,
MAX_INPAINT_COVERAGE,
)
best_out: NDArray[Any] | None = None
best_amap: NDArray[Any] | None = None
best_residual = float("inf")
for amap, _region in maps:
out = self._apply_reverse_alpha(image, amap)
residual = self.detect(out).confidence
if residual < best_residual:
best_residual, best_out, best_amap = residual, out, amap
if best_out is None or best_amap is None: # pragma: no cover - maps is non-empty
return image.copy()
if dilate > 0:
k = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2 * dilate + 1, 2 * dilate + 1))
mask = cv2.dilate(mask, k)
flag = cv2.INPAINT_TELEA if inpaint_method == "telea" else cv2.INPAINT_NS
return cv2.inpaint(image, mask, inpaint_radius, flag)
if residual_inpaint:
rm = cv2.dilate((best_amap > 0.10).astype(np.uint8) * 255, np.ones((3, 3), np.uint8))
best_out = cv2.inpaint(best_out, rm, 3, cv2.INPAINT_TELEA)
return best_out
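The inversion `original = (wm - a*logo)/(1-a)` and the native-width band can be checked on synthetic data. The sketch below reuses the constants from this diff (logo colour, the 0.25 denominator floor, the 2048 px native width, and the 0.03 band); the `blend`, `reverse_alpha`, and `at_native` helpers and the random test image are hypothetical stand-ins, not the engine's methods.

```python
import numpy as np

LOGO_BGR = np.array([252.0, 255.0, 255.0], np.float32)  # value of _ALPHA_LOGO_BGR


def blend(original: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Forward model: stamp the logo over `original` with per-pixel alpha."""
    a3 = alpha[:, :, None]
    mixed = a3 * LOGO_BGR + (1.0 - a3) * original
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)


def reverse_alpha(marked: np.ndarray, alpha: np.ndarray, min_denominator: float = 0.25) -> np.ndarray:
    """Invert the blend; the denominator floor limits noise blow-up where alpha is high."""
    a3 = np.clip(alpha, 0.0, 1.0)[:, :, None]
    out = (marked.astype(np.float32) - a3 * LOGO_BGR) / np.clip(1.0 - a3, min_denominator, 1.0)
    return np.clip(out, 0, 255).astype(np.uint8)


def at_native(width: int, native_width: int = 2048, band: float = 0.03) -> bool:
    """True when fixed geometry is used directly instead of NCC alignment."""
    return abs(width / native_width - 1.0) <= band


rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
alpha = rng.uniform(0.0, 0.5, size=(8, 8)).astype(np.float32)

marked = blend(original, alpha)
recovered = reverse_alpha(marked, alpha)
max_err = int(np.abs(recovered.astype(int) - original.astype(int)).max())
```

With alpha at or below 0.5 the only loss is 8-bit quantization, so the recovered pixels stay within two levels of the original. Where alpha exceeds 0.75 the floor takes over and the inversion is no longer exact.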
def load_image_bgr(path: str | Path) -> NDArray[Any]:
+64 -8
@@ -25,14 +25,15 @@ from typing import TYPE_CHECKING
from remove_ai_watermarks.metadata import (
AI_METADATA_KEYS,
AIGC_MARKERS,
C2PA_UUID,
IPTC_AI_FIELD_MARKERS,
IPTC_AI_MARKERS,
aigc_label,
c2pa_marker_in,
exif_generator,
get_ai_metadata,
huggingface_job,
iptc_ai_system,
samsung_genai,
scan_head,
xai_signature,
)
@@ -65,6 +66,8 @@ _ISSUER_PLATFORM: tuple[tuple[str, str], ...] = (
("OpenAI", "OpenAI (ChatGPT / gpt-image / DALL-E / Sora)"),
("Google", "Google (Gemini / Imagen)"),
("Stability AI", "Stability AI (Stable Image / DreamStudio)"),
("Black Forest Labs", "Black Forest Labs (FLUX)"),
("ByteDance", "ByteDance (Doubao / Jimeng / Volcano Engine)"),
)
# PNG-text / EXIF keys that indicate a local diffusion pipeline (vs. a hosted
@@ -95,6 +98,12 @@ _HF_JOB_CAVEAT = (
"generation) but names neither the model nor the content type, so it is a "
"medium-confidence signal, not proof the pixels are AI-generated."
)
_SAMSUNG_GENAI_CAVEAT = (
"Samsung's genAIType marker shows a Galaxy AI editing tool (Generative Edit, "
"Sketch to Image, ...) touched the image; it is an undocumented proprietary "
"field, so it is a medium-confidence signal of AI editing, not proof the "
"whole image is AI-generated."
)
@dataclass
@@ -151,7 +160,9 @@ def _ai_tools_in(data: bytes) -> list[str]:
# assert is_ai on their own (the verdict still comes from the digital-source-type:
# the Pixel sample carries `computationalCapture`, not `trainedAlgorithmicMedia`).
# Only tokens verified against a real signed file are listed (Leica, Nikon,
# Truepic, Google Pixel); add Sony/Canon/Samsung/Bria as real samples are captured.
# Sony, Truepic, Google Pixel); add Canon/Bria as real samples are captured.
# Samsung Galaxy is an AI-capable editing device, not a pure-capture camera, so
# it lives in `_SIGNER_C2PA_PLATFORM` below (it must not feed the camera clash).
_DEVICE_C2PA_PLATFORM: tuple[tuple[bytes, str], ...] = (
(b"lc_c2pa", "Leica (camera, C2PA capture)"),
(b"Leica Camera", "Leica (camera, C2PA capture)"),
@@ -177,6 +188,32 @@ def _device_platform(head: bytes) -> str | None:
return None
# C2PA signers that are an editing app or AI-capable device rather than a
# verified-capture camera. Unlike `_DEVICE_C2PA_PLATFORM`, these do NOT feed the
# camera-vs-AI integrity clash (rule 2 in `_integrity_clashes`): a Galaxy phone
# legitimately stamps BOTH its device credentials AND a `trainedAlgorithmicMedia`
# source type on a Generative-Edit image, so treating it as a "genuine camera
# capture" would false-flag every Galaxy AI edit. They only resolve the platform
# label; the AI verdict still comes from the digital-source-type / genAIType.
# Tokens verified against real signed files (2026-05-29):
# Samsung Galaxy -- cert org on Galaxy S23 FE / S24 / S25 C2PA JPEGs/PNGs
# (distinct from the EXIF "SM-xxxx" model string on ordinary Samsung photos).
# com.asus.gallery -- ASUS Gallery claim_generator (a C2PA-signed edit, no AI
# source type or genAIType on the samples, so it never asserts is_ai).
_SIGNER_C2PA_PLATFORM: tuple[tuple[bytes, str], ...] = (
(b"Samsung Galaxy", "Samsung Galaxy (C2PA)"),
(b"com.asus.gallery", "ASUS Gallery (C2PA signer)"),
)
def _signer_platform(head: bytes) -> str | None:
"""Map a C2PA editing-app / AI-capable-device signer token to a platform."""
for token, platform in _SIGNER_C2PA_PLATFORM:
if token in head:
return platform
return None
def _attribute_platform(issuers: list[str], *, is_ai: bool = True) -> str | None:
"""Map a set of C2PA issuer names to a human-readable generating platform.
@@ -353,9 +390,10 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
# neither is a trustworthy "the generator stamped its identity" claim.
ai_vendor_claims: dict[str, str] = {}
camera_label = _device_platform(head)
signer_label = _signer_platform(head)
# ── C2PA Content Credentials ────────────────────────────────────
has_c2pa = bool(info) or b"c2pa" in head.lower() or C2PA_UUID in head
has_c2pa = bool(info) or c2pa_marker_in(head)
issuers = [info["issuer"]] if info.get("issuer") else _issuers_in(head)
c2pa_is_ai = "trainedAlgorithmicMedia" in info.get("source_type", "") or any(
m in head for m in (b"trainedAlgorithmicMedia", b"compositeWithTrainedAlgorithmicMedia")
@@ -370,10 +408,11 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
or (", ".join(tools) if (tools := _ai_tools_in(head)) else None)
)
# Platform: a distinctive device/camera token in the manifest wins (it is the
# signer/producer), with the issuer byte-scan only as fallback. The issuer
# scan alone mis-attributed real samples (Leica->Truepic timestamp authority,
# Nikon->Adobe namespace, Pixel->Google Gemini) -- the device scan fixes that.
platform = (camera_label or _attribute_platform(issuers, is_ai=c2pa_is_ai)) if has_c2pa else None
# signer/producer), then an editing-app/AI-device signer (Samsung Galaxy,
# ASUS Gallery), with the issuer byte-scan only as fallback. The issuer scan
# alone mis-attributed real samples (Leica->Truepic timestamp authority,
# Nikon->Adobe namespace, Pixel->Google Gemini) -- the token scans fix that.
platform = (camera_label or signer_label or _attribute_platform(issuers, is_ai=c2pa_is_ai)) if has_c2pa else None
if has_c2pa:
detail = ", ".join(filter(None, [", ".join(issuers), generator, info.get("source_type")]))
signals.append(Signal("c2pa", detail or "C2PA manifest present", "high"))
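The platform precedence in this hunk (camera token, then editing-app/AI-device signer token, then issuer fallback) can be sketched in isolation. The token tables below copy entries visible in this diff; `first_match`, `resolve_platform`, and the sample byte strings are hypothetical, and the issuer fallback is passed in as a plain string instead of calling `_attribute_platform`.

```python
from typing import Optional, Sequence, Tuple

TokenTable = Sequence[Tuple[bytes, str]]

DEVICE_TOKENS: TokenTable = (
    (b"lc_c2pa", "Leica (camera, C2PA capture)"),
    (b"Leica Camera", "Leica (camera, C2PA capture)"),
)
SIGNER_TOKENS: TokenTable = (
    (b"Samsung Galaxy", "Samsung Galaxy (C2PA)"),
    (b"com.asus.gallery", "ASUS Gallery (C2PA signer)"),
)


def first_match(head: bytes, table: TokenTable) -> Optional[str]:
    """Return the platform of the first token found in `head`, else None."""
    for token, platform in table:
        if token in head:
            return platform
    return None


def resolve_platform(head: bytes, issuer_platform: Optional[str], has_c2pa: bool) -> Optional[str]:
    """Camera token wins, then signer token, then the issuer-derived fallback."""
    if not has_c2pa:
        return None
    return first_match(head, DEVICE_TOKENS) or first_match(head, SIGNER_TOKENS) or issuer_platform


# Made-up manifest fragments, not real file contents.
galaxy = resolve_platform(b"...Samsung Galaxy...Google...", "Google (Gemini / Imagen)", True)
leica = resolve_platform(b"...Leica Camera...Samsung Galaxy...", None, True)
issuer_only = resolve_platform(b"...OpenAI...", "OpenAI (ChatGPT / gpt-image / DALL-E / Sora)", True)
no_manifest = resolve_platform(b"...Samsung Galaxy...", None, False)
```

Keeping the signer table separate from the device table means only the device lookup has to feed the camera-vs-AI clash check, which is how the diff avoids false-flagging Galaxy AI edits.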
@@ -484,6 +523,22 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
if platform is None:
platform = "HuggingFace-hosted job (model not identified)"
# ── Samsung Galaxy AI editing marker (genAIType) ─────────────────
# Galaxy AI tools stamp a proprietary genAIType in PhotoEditor_Re_Edit_Data.
# Medium confidence: it co-occurs with the C2PA trainedAlgorithmicMedia type
# on Galaxy files that record one, and is the SOLE AI marker on a Galaxy S24
# sample that omits the source type -- so it lifts an otherwise-Unknown
# verdict, but the field is undocumented, so it never overrides a high-
# confidence signal. The platform is usually already "Samsung Galaxy" via the
# signer-token scan; the fallback covers a future file without the cert org.
samsung_genai_type = samsung_genai(image_path)
if samsung_genai_type is not None:
signals.append(Signal("samsung_genai", f"Samsung genAIType={samsung_genai_type}", "medium"))
watermarks.append("Samsung Galaxy AI editing marker (genAIType)")
caveats.append(_SAMSUNG_GENAI_CAVEAT)
if platform is None:
platform = "Samsung Galaxy (Galaxy AI editing)"
# ── Open invisible watermark (SD / SDXL / FLUX, dwtDct) ──────────
# Public decoder, no key -- a definitive embedded signal on pristine files.
if check_invisible and (scheme := _invisible_watermark(image_path)) is not None:
@@ -527,11 +582,12 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
visible_only = any(s.name == "visible_sparkle" for s in signals) and not ai_from_metadata
hf_only = bool(hf_job) and not ai_from_metadata
samsung_only = samsung_genai_type is not None and not ai_from_metadata
if ai_from_metadata:
is_ai: bool | None = True
confidence = "high"
elif visible_only or hf_only:
elif visible_only or hf_only or samsung_only:
is_ai = True
confidence = "medium"
else:
+54 -4
@@ -65,6 +65,22 @@ AI_KEYWORDS: tuple[str, ...] = (
# Reference: https://spec.c2pa.org/specifications/specifications/2.1/specs/C2PA_Specification.html
C2PA_UUID: bytes = bytes.fromhex("d8fec3d61b0e483c92975828877ec481")
def c2pa_marker_in(data: bytes) -> bool:
"""True if ``data`` carries a real C2PA manifest marker, not just an
incidental 4-byte ``c2pa`` substring.
A bare ``c2pa`` byte match false-positives on compressed pixel data -- a
recompressed PNG IDAT (or any large binary) can contain the bytes ``c2pa``
by chance (verified 2026-05-29: 4 cleaned PNGs re-flagged this way after
their manifest was correctly stripped). Every real manifest is JUMBF-wrapped
(the ``jumb`` box FourCC accompanies the ``c2pa`` content type) or uses the
standalone C2PA ``uuid`` box in ISOBMFF, so we require one of those: the
joint ``jumb`` + ``c2pa`` match has negligible random-collision probability.
"""
return C2PA_UUID in data or (b"jumb" in data and b"c2pa" in data.lower())
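A short sketch of the difference between the bare substring test and the wrapped-marker test, plus a rough estimate of how often each pattern would appear by chance. The estimate assumes uniformly random, independent bytes in a 512 KiB window, which compressed data only approximates. `bare_match`, `expected_chance_hits`, and the sample byte strings are hypothetical, and the `jumb` sample is a simplified stand-in, not a valid JUMBF box.

```python
C2PA_UUID = bytes.fromhex("d8fec3d61b0e483c92975828877ec481")


def bare_match(data: bytes) -> bool:
    """The old test: any case-insensitive 4-byte 'c2pa' occurrence."""
    return b"c2pa" in data.lower()


def c2pa_marker_in(data: bytes) -> bool:
    """The new test: C2PA uuid box, or 'jumb' together with 'c2pa'."""
    return C2PA_UUID in data or (b"jumb" in data and b"c2pa" in data.lower())


def expected_chance_hits(n_bytes: int, pattern_len: int, variants: int = 1) -> float:
    """Expected chance occurrences of a pattern in uniformly random bytes."""
    return (n_bytes - pattern_len + 1) * variants / 256 ** pattern_len


pixel_noise = b"\x89PNG\r\n\x1a\n" + b"IDAT" + b"\x8f\x01c2pa\xfe\x10"
wrapped = b"\x00\x00\x00\x28jumb\x00\x00\x00\x20jumdc2pa\x00\x11\x00\x10"
uuid_only = b"ftypavif" + C2PA_UUID

window = 512 * 1024
# Case-insensitive 'c2pa' has 3 letters, so 2**3 = 8 byte-level variants.
bare_rate = expected_chance_hits(window, 4, variants=8)
jumb_rate = expected_chance_hits(window, 4)
joint_rate = bare_rate * jumb_rate  # both patterns present by chance
```

Under that randomness assumption the bare match fires on roughly one file in a thousand, while the joint requirement drops to about one in ten million.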
# IPTC ``digitalSourceType`` values (IPTC 2025.1) that flag AI provenance.
# Used by Instagram, Facebook, X (Twitter) to show "Made with AI" labels.
IPTC_AI_MARKERS: tuple[bytes, ...] = (
@@ -213,9 +229,7 @@ def has_ai_metadata(image_path: Path) -> bool:
# Binary scan covers C2PA (PNG caBX, JPEG APP11, AVIF/HEIF/JXL uuid boxes)
# and IPTC AI markers in XMP. First 512KB (plus late ISOBMFF provenance boxes).
data = scan_head(image_path, 512 * 1024)
if b"c2pa" in data.lower() or b"C2PA" in data:
return True
if C2PA_UUID in data:
if c2pa_marker_in(data):
return True
if any(marker in data for marker in AIGC_MARKERS):
return True
@@ -310,6 +324,39 @@ def huggingface_job(image_path: Path) -> str | None:
return None
# Samsung Galaxy AI editing marker. Galaxy AI tools (Generative Edit, Sketch to
# Image, Portrait Studio, Drawing Assist, ...) record their re-edit data as a
# proprietary ``PhotoEditor_Re_Edit_Data`` JSON that carries a ``genAIType``
# field; a non-zero value flags that a generative-AI tool produced or altered
# the pixels. The field is undocumented by Samsung (verified 2026-05-29: absent
# from the C2PA spec and Samsung's public docs/forums), so detection is
# empirical -- on real Galaxy S23/S24/S25 files it co-occurs with the C2PA
# ``trainedAlgorithmicMedia`` source type (3/3 of the verified files that record
# that type), and on a Galaxy S24 sample it is the *only* AI marker (the C2PA
# source type was absent there). Medium confidence: it signals Galaxy AI editing
# without proving the whole image is AI-generated. Scoped to the Samsung editor
# container to avoid matching a stray ``genAIType`` token elsewhere.
_SAMSUNG_GENAI_RE = re.compile(rb'genAIType"\s*:\s*(-?\d+)')
_SAMSUNG_EDITOR_MARKER = b"PhotoEditor_Re_Edit_Data"
def samsung_genai(image_path: Path) -> int | None:
"""Return Samsung's non-zero ``genAIType`` value if the image carries the
Galaxy AI editing marker, else None.
See the module note above ``_SAMSUNG_GENAI_RE``: detection is empirical and
gated on the ``PhotoEditor_Re_Edit_Data`` container so an incidental
``genAIType`` token cannot false-positive.
"""
head = scan_head(image_path, 512 * 1024)
if _SAMSUNG_EDITOR_MARKER not in head:
return None
m = _SAMSUNG_GENAI_RE.search(head)
if m is None:
return None
return int(m.group(1)) or None
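The container gate and the zero-value rule can be exercised without a file on disk. The sketch below applies the same regex and marker to an in-memory byte string; `genai_from_bytes` is a hypothetical helper for illustration and skips the `scan_head` read that the real function performs.

```python
import re
from typing import Optional

GENAI_RE = re.compile(rb'genAIType"\s*:\s*(-?\d+)')
EDITOR_MARKER = b"PhotoEditor_Re_Edit_Data"


def genai_from_bytes(head: bytes) -> Optional[int]:
    """Return the non-zero genAIType value, gated on the editor container."""
    if EDITOR_MARKER not in head:
        return None  # a stray genAIType token outside the container is ignored
    m = GENAI_RE.search(head)
    if m is None:
        return None
    return int(m.group(1)) or None  # 0 means "no generative AI" -> None


edited = b'PhotoEditor_Re_Edit_Data{"connectorType":"srvg","genAIType":1}'
plain = b'PhotoEditor_Re_Edit_Data{"genAIType":0}'
stray = b'some other blob "genAIType":1 elsewhere'

print(genai_from_bytes(edited), genai_from_bytes(plain), genai_from_bytes(stray))
```

Note that the gate is file-level: the marker only has to appear somewhere in the scanned head, not immediately before the matched `genAIType` field.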
def iptc_ai_system(image_path: Path) -> str | None:
"""Return an IPTC 2025.1 AI-disclosure note if the file carries those XMP
properties, else None.
@@ -360,7 +407,7 @@ def synthid_source(image_path: Path) -> str | None:
# C2PA manifest where the PNG parser can't reach it. Binary-scan for the
# same signal: a C2PA manifest from a SynthID-using issuer on AI content.
data = scan_head(image_path)
has_c2pa = b"c2pa" in data.lower() or C2PA_UUID in data
has_c2pa = c2pa_marker_in(data)
# Matches both "trainedAlgorithmicMedia" and "compositeWithTrainedAlgorithmicMedia".
ai_source = b"trainedAlgorithmicMedia" in data or b"TrainedAlgorithmicMedia" in data
if not (has_c2pa and ai_source):
@@ -585,6 +632,9 @@ def get_ai_metadata(image_path: Path) -> dict[str, str]:
# HuggingFace-hosted job marker (hf-job-id PNG text chunk).
if job := huggingface_job(image_path):
result.setdefault("huggingface_job", f"HuggingFace-hosted job ({job})")
# Samsung Galaxy AI editing marker (genAIType in PhotoEditor_Re_Edit_Data).
if (genai := samsung_genai(image_path)) is not None:
result.setdefault("samsung_genai", f"Samsung Galaxy AI editing marker (genAIType={genai})")
return result
@@ -88,6 +88,14 @@ C2PA_ISSUERS = {
# Stability AI signs C2PA as "Stability AI" (cert org "Stability AI Ltd").
# Verified on a live Brand Studio (DreamStudio successor) output, 2026-05-24.
b"Stability AI": "Stability AI",
# Black Forest Labs (FLUX) API output: claim_generator_info "Black Forest
# Labs API" + a c2pa.ai_generated_content assertion + trainedAlgorithmicMedia.
# Verified on a real signed FLUX JPEG, 2026-05-29.
b"Black Forest Labs": "Black Forest Labs",
# ByteDance's Volcano Engine (Volcengine) signs its AI image output with a
# cert from certificate_center@volcengine.com -- the platform behind Doubao /
# Jimeng. Verified on two real signed JPEGs, 2026-05-29.
b"volcengine": "ByteDance (Volcano Engine)",
}
# C2PA issuers whose signed outputs also carry an invisible SynthID pixel
+46 -5
@@ -51,12 +51,31 @@ def _decoder() -> Any:
return _tm
# JPEG quality for the false-positive durability gate (see detect_trustmark).
# Deliberately mild: a genuine TrustMark survives far harsher compression,
# while every observed false positive collapsed even at this quality.
_REENCODE_QUALITY = 95
def detect_trustmark(image_path: Path) -> str | None:
"""Return a TrustMark scheme note if a TrustMark watermark is decoded, else None.
"""Return a TrustMark scheme note if a *durable* TrustMark watermark is
decoded, else None.
Returns e.g. ``"Adobe TrustMark (variant P, schema 0)"`` when the decoder
reports the watermark present, or None if it is absent, the optional
``trustmark`` package is not installed, or the image cannot be read/decoded.
reports the watermark present AND it survives a mild JPEG re-encode, or None
if it is absent, the optional ``trustmark`` package is not installed, or the
image cannot be read/decoded.
**False-positive gate.** TrustMark's ``wm_present`` flag is a BCH
error-correction validity check, which spuriously validates on a small
fraction of un-watermarked images -- content-correlated, so AI-generated
textures trip it more often than camera photos (verified 2026-05-29 on real
files: the false "detections" were on Gemini / OpenAI / Doubao output that
cannot carry Adobe's watermark, and decoded a random-bytes secret). A genuine
TrustMark is a *durable* soft binding engineered to survive re-encoding (that
is its entire purpose once C2PA is stripped), so we re-decode after a mild
JPEG round-trip and require the same schema both times. Every observed false
positive collapsed under this gate.
"""
if not is_available():
return None
@@ -65,8 +84,30 @@ def detect_trustmark(image_path: Path) -> str | None:
with Image.open(image_path) as img:
cover = img.convert("RGB")
_wm_secret, wm_present, wm_schema = _decoder().decode(cover)
decoder = _decoder()
_wm_secret, wm_present, wm_schema = decoder.decode(cover)
if not wm_present:
return None
if not _survives_reencode(decoder, cover, wm_schema):
log.debug("TrustMark decode for %s did not survive re-encode; treating as false positive", image_path)
return None
except Exception as exc: # model download / decode failure / unreadable image
log.debug("TrustMark decode failed for %s: %s", image_path, exc)
return None
return f"Adobe TrustMark (variant {_MODEL_TYPE}, schema {wm_schema})" if wm_present else None
return f"Adobe TrustMark (variant {_MODEL_TYPE}, schema {wm_schema})"
def _survives_reencode(decoder: Any, cover: Any, schema: int) -> bool:
"""True if the watermark re-decodes with the same schema after a mild JPEG
round-trip -- the durability a genuine TrustMark guarantees, which a BCH
false positive (content noise) does not."""
import io
from PIL import Image
buffer = io.BytesIO()
cover.save(buffer, "JPEG", quality=_REENCODE_QUALITY)
buffer.seek(0)
with Image.open(buffer) as reencoded:
_secret, present, reencoded_schema = decoder.decode(reencoded.convert("RGB"))
return bool(present) and reencoded_schema == schema
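The control flow of the durability gate can be shown independently of Pillow and the TrustMark model. In the sketch below the decoder and the re-encode step are passed in as callables; `gated_schema` and `ScriptedDecoder` are hypothetical names used only for illustration, and the identity function stands in for the JPEG round-trip.

```python
from typing import Any, Callable, Optional, Tuple

Decode = Callable[[Any], Tuple[bytes, bool, int]]


def gated_schema(decode: Decode, reencode: Callable[[Any], Any], image: Any) -> Optional[int]:
    """Return the schema only if it is present and unchanged after re-encoding."""
    _secret, present, schema = decode(image)
    if not present:
        return None  # first decode absent: the second decode never runs
    _secret2, present2, schema2 = decode(reencode(image))
    if present2 and schema2 == schema:
        return schema
    return None  # collapsed or drifted under re-encode: treat as false positive


class ScriptedDecoder:
    """Hypothetical stand-in decoder that replays scripted results in order."""

    def __init__(self, *results: Tuple[bytes, bool, int]) -> None:
        self._results = list(results)
        self.calls = 0

    def __call__(self, _image: Any) -> Tuple[bytes, bool, int]:
        result = self._results[min(self.calls, len(self._results) - 1)]
        self.calls += 1
        return result


def identity(image: Any) -> Any:
    """Placeholder for the mild JPEG round-trip."""
    return image


durable = ScriptedDecoder((b"secret", True, 2), (b"secret", True, 2))
print(gated_schema(durable, identity, object()))
```

Because the second decode only runs after a positive first decode, the extra cost is paid only on the rare hit, which matches the intent stated in the commit message.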
@@ -0,0 +1,202 @@
"""Registry of known visible watermarks.
A single catalog that ties each known visible mark to (a) where it usually sits,
(b) how to recognize it there, and (c) how to remove it. One pass over the
registry detects every known mark in its usual place and removes the ones
present.
**Reverse-alpha only.** A known mark is a fixed semi-transparent overlay, so it
is removed by inverting the alpha blend against a captured alpha map
(``original = (wm - a*logo)/(1-a)``) -- exact recovery of the true pixels, not an
inpaint guess. Detection is consistent with that: each mark is recognized by
matching its known shape/template (the thing we invert), not by heuristics. A
mark is therefore listed here only once a real alpha map has been captured for
it; everything else (arbitrary logos/objects) is the user-directed
``erase --region`` tool, not this catalog.
Entries:
- ``gemini`` -- Google Gemini / Nano Banana sparkle, bottom-right.
- ``doubao`` -- ByteDance Doubao "豆包AI生成" text strip, bottom-right.
"""
from __future__ import annotations
from dataclasses import dataclass
from typing import TYPE_CHECKING, Any, Literal
if TYPE_CHECKING:
from collections.abc import Callable
from numpy.typing import NDArray
# cv2 method for the Gemini reverse-alpha edge-residual cleanup (not a standalone
# remover): "ns" / "telea".
InpaintMethod = Literal["telea", "ns"]
Region = tuple[int, int, int, int]
@dataclass(frozen=True)
class MarkDetection:
"""Uniform detection result for a known mark (across heterogeneous engines)."""
key: str
label: str
location: str
detected: bool
confidence: float
region: Region
@dataclass(frozen=True)
class KnownMark:
"""A known visible watermark: where it lives, how to find and remove it."""
key: str
label: str
location: str # usual place, human-readable ("bottom-right")
in_auto: bool # participate in `--mark auto` scanning
recovery: str # removal strategy (all reverse-alpha today)
_detect: Callable[[NDArray[Any]], MarkDetection]
_remove: Callable[..., tuple[NDArray[Any], Region | None]]
def detect(self, image: NDArray[Any]) -> MarkDetection:
return self._detect(image)
def remove(
self,
image: NDArray[Any],
*,
inpaint_method: InpaintMethod = "ns",
inpaint: bool = True,
inpaint_strength: float = 0.85,
force: bool = False,
) -> tuple[NDArray[Any], Region | None]:
"""Remove this mark by reverse-alpha; returns ``(result, cleared_region)``
(region for clearing alpha on save, or None if nothing was removed).
``inpaint`` / ``inpaint_strength`` / ``inpaint_method`` tune the Gemini
reverse-alpha edge-residual cleanup only. ``force`` removes at the mark's
usual location even without a positive detection (the ``--no-detect`` path).
"""
return self._remove(image, inpaint_method, inpaint, inpaint_strength, force)
# Gemini-sparkle confidence above which the registry treats it as a confident
# detection for arbitration. Matches identify's corpus-validated sparkle
# threshold (0.5): the gemini engine's own detect flag uses a looser internal
# threshold and weakly fires (~0.36) on unrelated bottom-right text (e.g. the
# Doubao mark), which would otherwise let it hijack `--mark auto`. 0.5 gives 0
# false positives on the corpus.
_GEMINI_AUTO_MIN_CONF = 0.5
# ── Engine adapters (lazy singletons; engines are cv2-only, no model load) ──
_engines: dict[str, Any] = {}
def _engine(key: str) -> Any:
if key not in _engines:
if key == "gemini":
from remove_ai_watermarks.gemini_engine import GeminiEngine
_engines[key] = GeminiEngine()
elif key == "doubao":
from remove_ai_watermarks.doubao_engine import DoubaoEngine
_engines[key] = DoubaoEngine()
else: # pragma: no cover - guarded by the registry keys
raise KeyError(key)
return _engines[key]
def _gemini_detect(image: NDArray[Any]) -> MarkDetection:
d = _engine("gemini").detect_watermark(image)
detected = bool(d.detected) and d.confidence >= _GEMINI_AUTO_MIN_CONF
return MarkDetection("gemini", "Google Gemini sparkle", "bottom-right", detected, d.confidence, d.region)
def _gemini_remove(
image: NDArray[Any], inpaint_method: InpaintMethod, inpaint: bool, strength: float, force: bool
) -> tuple[NDArray[Any], Region | None]:
engine = _engine("gemini")
det = engine.detect_watermark(image)
if not det.detected:
if not force:
return image.copy(), None
# Forced (--no-detect): remove at the default sparkle slot for the size.
from remove_ai_watermarks.gemini_engine import get_watermark_config
h, w = image.shape[:2]
cfg = get_watermark_config(w, h)
px, py = cfg.get_position(w, h)
region = (px, py, cfg.logo_size, cfg.logo_size)
result = engine.remove_watermark_custom(image, region)
if inpaint:
result = engine.inpaint_residual(result, region, strength=strength, method=inpaint_method)
return result, region
result = engine.remove_watermark(image)
# Reverse-alpha leaves a faint residual at the sparkle edge; the engine's
# own residual inpaint cleans that seam (part of its reverse-alpha pipeline).
if inpaint:
result = engine.inpaint_residual(result, det.region, strength=strength, method=inpaint_method)
return result, det.region
def _doubao_detect(image: NDArray[Any]) -> MarkDetection:
d = _engine("doubao").detect(image)
return MarkDetection("doubao", "Doubao 豆包AI生成 text", "bottom-right", d.detected, d.confidence, d.region)
def _doubao_remove(
image: NDArray[Any], _inpaint_method: InpaintMethod, _inpaint: bool, _strength: float, force: bool
) -> tuple[NDArray[Any], Region | None]:
# Reverse-alpha only: apply when the mark is present (or removal is forced) AND
# the resolution is in the alpha map's calibrated band. Outside it we do NOT
# inpaint (no hallucination) -- removal is skipped until an alpha map has been
# captured for that resolution.
engine = _engine("doubao")
det = engine.detect(image)
if (det.detected or force) and engine.reverse_alpha_available(image):
return engine.remove_watermark_reverse_alpha(image), (det.region if det.detected else None)
return image.copy(), None
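The reverse-alpha formula from the registry docstring, ``original = (wm - a*logo)/(1-a)``, can be checked on a single channel value. This is a minimal sketch with made-up numbers (the logo colour and alpha below are hypothetical, not the captured Doubao or Gemini values), and it works on scalars rather than on the engines' image arrays.

```python
def blend(original: float, logo: float, alpha: float) -> float:
    """Forward model: composite a semi-transparent logo over a pixel."""
    return alpha * logo + (1.0 - alpha) * original


def reverse_alpha(watermarked: float, logo: float, alpha: float) -> float:
    """Invert the blend: original = (wm - a*logo) / (1 - a)."""
    if alpha >= 1.0:
        raise ValueError("alpha of 1 hides the original completely")
    return (watermarked - alpha * logo) / (1.0 - alpha)


# Hypothetical values for illustration only.
original, logo, alpha = 100.0, 255.0, 0.35

wm = blend(original, logo, alpha)
recovered = reverse_alpha(wm, logo, alpha)

# An 8-bit image stores round(wm), so recovery carries a small rounding error
# bounded by 0.5 / (1 - alpha).
quantized = reverse_alpha(round(wm), logo, alpha)

print(recovered, quantized)
```

The rounding bound is one reason the recovery tests in this change compare against a small error tolerance instead of requiring exact equality.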
_REGISTRY: tuple[KnownMark, ...] = (
KnownMark("gemini", "Google Gemini sparkle", "bottom-right", True, "reverse-alpha", _gemini_detect, _gemini_remove),
KnownMark(
"doubao", "Doubao 豆包AI生成 text", "bottom-right", True, "reverse-alpha", _doubao_detect, _doubao_remove
),
)
def known_marks() -> tuple[KnownMark, ...]:
"""All registered known visible watermarks."""
return _REGISTRY
def mark_keys() -> list[str]:
"""Keys of all registered marks (for CLI choices)."""
return [m.key for m in _REGISTRY]
def get_mark(key: str) -> KnownMark:
"""Look up a known mark by key (raises KeyError if unknown)."""
for m in _REGISTRY:
if m.key == key:
return m
raise KeyError(key)
def detect_marks(image: NDArray[Any], *, include_explicit: bool = True) -> list[MarkDetection]:
"""Detect every known mark in its usual place.
Returns one MarkDetection per scanned mark (``detected`` flags which fired).
``include_explicit=False`` scans only the ``in_auto`` marks -- the set used
by ``--mark auto``.
"""
return [m.detect(image) for m in _REGISTRY if include_explicit or m.in_auto]
def best_auto_mark(image: NDArray[Any]) -> MarkDetection | None:
"""The highest-confidence detected ``in_auto`` mark, or None if none fired."""
fired = [d for d in detect_marks(image, include_explicit=False) if d.detected]
return max(fired, key=lambda d: d.confidence) if fired else None
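The arbitration described above (gate the Gemini confidence at 0.5, then pick the highest-confidence mark that fired) can be sketched with plain dataclasses. `Detection`, `gate_gemini`, and `best` are simplified stand-ins for `MarkDetection`, `_gemini_detect`, and `best_auto_mark`; the 0.36 score comes from the threshold comment, while the Doubao score of 0.8 is a made-up value.

```python
from dataclasses import dataclass
from typing import List, Optional

GEMINI_AUTO_MIN_CONF = 0.5  # same value as _GEMINI_AUTO_MIN_CONF


@dataclass(frozen=True)
class Detection:
    key: str
    detected: bool
    confidence: float


def gate_gemini(raw_detected: bool, confidence: float) -> Detection:
    """Apply the registry threshold on top of the engine's looser flag."""
    return Detection("gemini", raw_detected and confidence >= GEMINI_AUTO_MIN_CONF, confidence)


def best(detections: List[Detection]) -> Optional[Detection]:
    """Highest-confidence detection that fired, or None if none fired."""
    fired = [d for d in detections if d.detected]
    return max(fired, key=lambda d: d.confidence) if fired else None


# The Gemini engine weakly fires on the Doubao text strip (~0.36).
weak_gemini = gate_gemini(True, 0.36)
doubao = Detection("doubao", True, 0.8)  # hypothetical score

winner = best([weak_gemini, doubao])
print(winner.key if winner else None)
```

Without the gate, the weak Gemini hit would still count as fired; it would lose on confidence in this example, but it would win outright on any image where the Doubao detector stayed silent.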
+113 -48
@@ -1,4 +1,4 @@
"""Tests for the Doubao visible-watermark engine."""
"""Tests for the Doubao visible-watermark engine (reverse-alpha only)."""
from __future__ import annotations
@@ -8,91 +8,156 @@ import cv2
import numpy as np
import pytest
from remove_ai_watermarks.doubao_engine import DoubaoEngine, load_image_bgr
from remove_ai_watermarks.doubao_engine import (
_ALPHA_HEIGHT_FRAC,
_ALPHA_LOGO_BGR,
_ALPHA_MARGIN_BOTTOM_FRAC,
_ALPHA_MARGIN_RIGHT_FRAC,
_ALPHA_NATIVE_WIDTH,
_ALPHA_WIDTH_FRAC,
DETECT_NCC_THRESHOLD,
DoubaoEngine,
_alpha_template,
_glyph_silhouette,
_template_match_score,
load_image_bgr,
)
SAMPLE = Path(__file__).resolve().parents[1] / "data" / "samples" / "doubao-1.png"
# ── Locate ──────────────────────────────────────────────────────────
class TestLocate:
def test_box_anchored_bottom_right(self):
eng = DoubaoEngine()
img = np.zeros((2048, 2048, 3), np.uint8)
loc = eng.locate(img)
# right and bottom edges sit close to the image corner (within margins)
assert 2048 - (loc.x + loc.w) < int(2048 * 0.03)
assert 2048 - (loc.y + loc.h) < int(2048 * 0.03)
assert loc.is_fallback # geometry anchor, no bundled template yet
def test_box_scales_with_width(self):
eng = DoubaoEngine()
small = eng.locate(np.zeros((1024, 1024, 3), np.uint8))
large = eng.locate(np.zeros((2048, 2048, 3), np.uint8))
# width-relative geometry: 2x wider image -> ~2x wider box
assert large.w == pytest.approx(small.w * 2, rel=0.1)
# ── Detect + remove on the real sample ──────────────────────────────
# ── Detection: alpha-template NCC ───────────────────────────────────
class TestDetect:
def test_clean_gradient_not_detected(self):
eng = DoubaoEngine()
ramp = np.tile(np.linspace(0, 255, 1024, dtype=np.uint8), (1024, 1))
img = cv2.cvtColor(ramp, cv2.COLOR_GRAY2BGR)
assert not eng.detect(img).detected
def test_solid_blob_corner_not_detected(self):
"""A bright blob is not the glyph shape -> low correlation, not detected."""
eng = DoubaoEngine()
img = np.zeros((1024, 1024, 3), np.uint8)
x, y, bw, bh = eng.locate(img).bbox
img[y + bh // 4 : y + bh * 3 // 4, x : x + bw // 2] = 200
assert not eng.detect(img).detected
def test_silhouette_loads(self):
sil = _glyph_silhouette()
assert sil is not None
assert set(np.unique(sil)).issubset({0, 255})
def test_match_score_shape_sensitive(self):
"""The glyph silhouette correlates with itself, not with a filled block."""
sil = _glyph_silhouette()
h, w = sil.shape
# box that contains the silhouette -> high score
box = np.zeros((h + 8, int(w / _ALPHA_WIDTH_FRAC * 0.2) + w), np.uint8)
box[4 : 4 + h, 4 : 4 + w] = sil
assert _template_match_score(box, _ALPHA_NATIVE_WIDTH) >= DETECT_NCC_THRESHOLD
# a uniformly filled box has no glyph structure -> low score
solid = np.full_like(box, 255)
assert _template_match_score(solid, _ALPHA_NATIVE_WIDTH) < DETECT_NCC_THRESHOLD
@pytest.mark.skipif(not SAMPLE.exists(), reason="sample image not present")
class TestRealSample:
def test_detects_watermark(self):
eng = DoubaoEngine()
det = eng.detect(load_image_bgr(SAMPLE))
det = DoubaoEngine().detect(load_image_bgr(SAMPLE))
assert det.detected
assert det.confidence > 0.0
assert det.coverage > 0.04
assert det.confidence >= DETECT_NCC_THRESHOLD
def test_remove_reduces_glyph_coverage(self):
def test_reverse_alpha_removes_mark(self):
eng = DoubaoEngine()
img = load_image_bgr(SAMPLE)
before = eng.detect(img).coverage
out = eng.remove_watermark(img)
after = eng.detect(out).coverage
# the inpaint should clear most glyph pixels from the corner box
assert after < before * 0.5
assert eng.reverse_alpha_available(img) # sample is at the captured width
out = eng.remove_watermark_reverse_alpha(img)
assert not eng.detect(out).detected # mark gone after recovery
def test_pixels_outside_box_untouched(self):
def test_far_region_untouched(self):
eng = DoubaoEngine()
img = load_image_bgr(SAMPLE)
out = eng.remove_watermark(img)
# top-left quadrant is far from the bottom-right mark: must be identical
out = eng.remove_watermark_reverse_alpha(img)
h, w = img.shape[:2]
assert np.array_equal(img[: h // 2, : w // 2], out[: h // 2, : w // 2])
# ── Negative + safety guard ─────────────────────────────────────────
# ── Reverse-alpha (exact recovery) ──────────────────────────────────
class TestNegativeAndGuard:
def test_clean_image_not_detected(self):
class TestReverseAlpha:
def test_alpha_asset_loads(self):
at = _alpha_template()
assert at is not None
assert at.dtype.kind == "f"
assert float(at.min()) >= 0.0
assert float(at.max()) <= 1.0
def test_available_whenever_asset_present(self):
# NCC alignment generalizes to any resolution, so availability is just
# "asset loadable" (any non-empty image); the caller gates on detect.
eng = DoubaoEngine()
# smooth gradient, no watermark
ramp = np.tile(np.linspace(0, 255, 1024, dtype=np.uint8), (1024, 1))
img = cv2.cvtColor(ramp, cv2.COLOR_GRAY2BGR)
det = eng.detect(img)
assert not det.detected
assert eng.reverse_alpha_available(np.zeros((1024, 1024, 3), np.uint8))
assert eng.reverse_alpha_available(np.zeros((1773, 1535, 3), np.uint8))
assert not eng.reverse_alpha_available(np.zeros((0, 0, 3), np.uint8))
def test_clean_image_returned_unchanged(self):
eng = DoubaoEngine()
ramp = np.tile(np.linspace(0, 255, 1024, dtype=np.uint8), (1024, 1))
img = cv2.cvtColor(ramp, cv2.COLOR_GRAY2BGR)
out = eng.remove_watermark(img)
assert np.array_equal(img, out)
@staticmethod
def _compose(w: int, h: int, bg: float = 100.0):
"""Composite the real alpha (scaled to width ``w``) onto a flat bg.
Returns ``(watermarked_uint8, mark_bool_mask)``."""
img = np.full((h, w, 3), bg, np.float32)
at = _alpha_template()
gw, gh = int(_ALPHA_WIDTH_FRAC * w), int(_ALPHA_HEIGHT_FRAC * w)
ax = w - int(_ALPHA_MARGIN_RIGHT_FRAC * w) - gw
ay = h - int(_ALPHA_MARGIN_BOTTOM_FRAC * w) - gh
amap = np.zeros((h, w), np.float32)
amap[ay : ay + gh, ax : ax + gw] = cv2.resize(at, (gw, gh))
a3 = amap[:, :, None]
wm = (a3 * np.array(_ALPHA_LOGO_BGR, np.float32) + (1 - a3) * img).clip(0, 255).astype(np.uint8)
return wm, amap > 0.2
def test_document_background_guard(self):
"""A dense high-frequency corner (document-like) trips the coverage
guard, so the image is left untouched rather than smeared."""
def test_native_returns_exact_reverse_alpha_no_inpaint(self):
"""At native width the recovery is exact, so it must be returned untouched
-- inpainting over exactly-recovered interior pixels degrades quality
(regression: native textured error 1.6 reverse-alpha-only vs 2.6 with the
old full-footprint inpaint). The output must equal pure reverse-alpha."""
eng = DoubaoEngine()
rng = np.random.default_rng(0)
img = np.full((1024, 1024, 3), 255, np.uint8)
# fill the bottom-right box area with random grayish text-like noise
loc = eng.locate(img)
x, y, bw, bh = loc.bbox
noise = rng.integers(150, 246, size=(bh, bw), dtype=np.uint8)
img[y : y + bh, x : x + bw] = noise[:, :, None]
out = eng.remove_watermark(img)
assert np.array_equal(img, out)
wm, _mark = self._compose(_ALPHA_NATIVE_WIDTH, _ALPHA_NATIVE_WIDTH)
out = eng.remove_watermark_reverse_alpha(wm)
amap = eng._fixed_alpha_map(wm)
assert amap is not None
expected = eng._apply_reverse_alpha(wm, amap[0])
assert np.array_equal(out, expected) # no inpaint touched the recovery
@pytest.mark.parametrize(
("w", "h", "max_err"),
[
(_ALPHA_NATIVE_WIDTH, _ALPHA_NATIVE_WIDTH, 5.0), # native 1:1 -> fixed geometry, ~exact
(1773, 2364, 8.0), # 3:4 portrait -> NCC alignment generalizes the single capture
],
)
def test_recovers_flat_background(self, w, h, max_err):
"""Recovers the flat background at native (fixed geometry, exact) AND a
non-native resolution (NCC alignment generalizes the single capture)."""
eng = DoubaoEngine()
wm, mark = self._compose(w, h)
assert float(np.abs(wm.astype(np.float32)[mark] - 100.0).mean()) > 15 # mark visible
out = eng.remove_watermark_reverse_alpha(wm).astype(np.float32)
assert float(np.abs(out[mark] - 100.0).mean()) < max_err
+56
@@ -113,6 +113,18 @@ class TestIdentifyNonPng:
r = identify(path, check_visible=False)
assert any("SynthID" in w for w in r.watermarks)
def test_black_forest_labs_flux_attributed(self, tmp_path: Path):
path = self._c2pa_jpeg(tmp_path, b"Black Forest Labs API ... trainedAlgorithmicMedia")
r = identify(path, check_visible=False, check_invisible=False)
assert r.is_ai_generated is True
assert r.platform == "Black Forest Labs (FLUX)"
def test_bytedance_volcengine_attributed(self, tmp_path: Path):
path = self._c2pa_jpeg(tmp_path, b"certificate_center@volcengine.com ... trainedAlgorithmicMedia")
r = identify(path, check_visible=False, check_invisible=False)
assert r.is_ai_generated is True
assert "ByteDance" in (r.platform or "")
def test_stability_ai_issuer_attributed_no_synthid(self, tmp_path: Path):
path = self._c2pa_jpeg(tmp_path, b"Stability AI ... trainedAlgorithmicMedia")
r = identify(path, check_visible=False)
@@ -132,6 +144,50 @@ class TestIdentifyNonPng:
assert not any("SynthID" in w for w in r.watermarks)
class TestIdentifySamsungGalaxy:
"""Samsung Galaxy / ASUS Gallery C2PA signers (verified on real signed files
2026-05-29; synthetic byte blobs here since the originals are private).
Galaxy AI edits stamp BOTH the device cert AND an AI source-type / genAIType,
so the signer attribution must NOT trip the camera-vs-AI integrity clash.
"""
def _jpeg(self, tmp_path: Path, name: str, blob: bytes) -> Path:
path = tmp_path / name
path.write_bytes(b"\xff\xd8\xff\xe1jumbc2pa" + blob + b"\xff\xd9")
return path
def test_galaxy_trained_source_is_high_ai(self, tmp_path: Path):
path = self._jpeg(tmp_path, "s25.jpg", b"Samsung Galaxy Galaxy S25 c2pa-rs trainedAlgorithmicMedia")
r = identify(path, check_visible=False, check_invisible=False)
assert r.is_ai_generated is True
assert r.confidence == "high"
assert r.platform == "Samsung Galaxy (C2PA)"
assert r.integrity_clashes == [] # device cert + AI source-type is legitimate, not a clash
def test_galaxy_genai_only_is_medium_ai(self, tmp_path: Path):
# The Galaxy S24 case: no trainedAlgorithmicMedia, genAIType is the only
# AI marker -- previously missed, now a medium-confidence verdict.
path = self._jpeg(
tmp_path, "s24.jpg", b'Samsung Galaxy Galaxy S24 c2pa-rs PhotoEditor_Re_Edit_Data{"genAIType":1}'
)
r = identify(path, check_visible=False, check_invisible=False)
assert r.is_ai_generated is True
assert r.confidence == "medium"
assert r.platform == "Samsung Galaxy (C2PA)"
assert any(s.name == "samsung_genai" for s in r.signals)
assert r.integrity_clashes == []
def test_asus_gallery_signer_not_ai(self, tmp_path: Path):
# ASUS Gallery signs edited photos; no AI source-type or genAIType, so the
# platform is attributed but the verdict stays unknown.
path = self._jpeg(tmp_path, "asus.jpg", b"/com.asus.gallery/3.8.0.98 c2pa-rs no ai marker")
r = identify(path, check_visible=False, check_invisible=False)
assert r.is_ai_generated is None
assert r.platform == "ASUS Gallery (C2PA signer)"
assert any("C2PA" in w for w in r.watermarks)
# ── End-to-end verdicts on real fixtures ────────────────────────────
+68
@@ -12,12 +12,15 @@ from PIL import Image
from PIL.PngImagePlugin import PngInfo
from remove_ai_watermarks.metadata import (
C2PA_UUID,
_is_ai_key,
c2pa_marker_in,
exif_generator,
get_ai_metadata,
has_ai_metadata,
iptc_ai_system,
remove_ai_metadata,
samsung_genai,
synthid_source,
xai_signature,
)
@@ -135,6 +138,71 @@ class TestHasAiMetadata:
assert has_ai_metadata(path)
class TestC2paMarkerIn:
"""The C2PA presence check requires a JUMBF wrapper or the C2PA uuid box, so
a bare 4-byte ``c2pa`` substring (e.g. random compressed pixel data) does not
false-positive -- the regression behind 4 cleaned PNGs re-flagging C2PA."""
def test_jumbf_wrapped_c2pa_detected(self):
assert c2pa_marker_in(b"....jumbc2pa....manifest....") is True
def test_c2pa_uuid_box_detected(self):
assert c2pa_marker_in(b"\x00\x00\x00\x18uuid" + C2PA_UUID + b"payload") is True
def test_bare_c2pa_substring_not_detected(self):
# The exact false positive: "c2pa" appears in noise but no JUMBF/uuid box.
assert c2pa_marker_in(b"\x9c\xc3\xa7B1\x11c2pa\x80b\x804\xc5\xf9random idat") is False
def test_jumb_without_c2pa_not_detected(self):
assert c2pa_marker_in(b"some jumb box but no manifest label") is False
def test_empty_not_detected(self):
assert c2pa_marker_in(b"") is False
class TestSamsungGenai:
"""Samsung Galaxy AI editing marker (genAIType in PhotoEditor_Re_Edit_Data).
Synthetic byte blobs -- real Galaxy files are user content and not shipped
(public repo), same discipline as the Grok/Doubao fixtures.
"""
@staticmethod
def _samsung_jpeg(tmp_path: Path, name: str, payload: bytes) -> Path:
path = tmp_path / name
path.write_bytes(b"\xff\xd8\xff\xe1" + payload + b"\xff\xd9")
return path
def test_nonzero_genai_type_detected(self, tmp_path: Path):
p = self._samsung_jpeg(
tmp_path, "galaxy.jpg", b'PhotoEditor_Re_Edit_Data{"connectorType":"srvg","genAIType":1}'
)
assert samsung_genai(p) == 1
def test_other_nonzero_value_detected(self, tmp_path: Path):
p = self._samsung_jpeg(tmp_path, "galaxy5.jpg", b'PhotoEditor_Re_Edit_Data{"genAIType":5}')
assert samsung_genai(p) == 5
def test_zero_genai_type_is_none(self, tmp_path: Path):
"""genAIType:0 means no generative AI was used -- not a positive signal."""
p = self._samsung_jpeg(tmp_path, "edit.jpg", b'PhotoEditor_Re_Edit_Data{"genAIType":0}')
assert samsung_genai(p) is None
def test_genai_without_editor_container_ignored(self, tmp_path: Path):
"""An incidental genAIType token outside Samsung's editor JSON is ignored."""
p = self._samsung_jpeg(tmp_path, "stray.jpg", b'some other blob "genAIType":1 elsewhere')
assert samsung_genai(p) is None
def test_clean_image_is_none(self, tmp_clean_png):
assert samsung_genai(tmp_clean_png) is None
def test_surfaced_in_get_ai_metadata(self, tmp_path: Path):
p = self._samsung_jpeg(tmp_path, "galaxy.jpg", b'PhotoEditor_Re_Edit_Data{"genAIType":1}')
meta = get_ai_metadata(p)
assert "samsung_genai" in meta
assert "genAIType=1" in meta["samsung_genai"]
class TestGetAiMetadata:
"""Tests for extracting AI metadata."""
+53
@@ -12,12 +12,28 @@ from typing import TYPE_CHECKING
import pytest
from remove_ai_watermarks import trustmark_detector
from remove_ai_watermarks.trustmark_detector import detect_trustmark, is_available
if TYPE_CHECKING:
from pathlib import Path
class _FakeDecoder:
"""A TrustMark decoder whose successive ``decode`` calls return scripted
``(secret, present, schema)`` tuples -- the first for the original image, the
second for the re-encoded copy used by the false-positive durability gate."""
def __init__(self, *results: tuple[bytes, bool, int]):
self._results = list(results)
self.calls = 0
def decode(self, _img: object) -> tuple[bytes, bool, int]:
result = self._results[min(self.calls, len(self._results) - 1)]
self.calls += 1
return result
def test_detect_never_raises(tmp_clean_png: Path):
# Whether or not trustmark is installed, a clean image must yield None
# (no watermark) without raising. When absent, the import guard returns None.
@@ -34,3 +50,40 @@ def test_unreadable_file_returns_none(tmp_path: Path):
def test_clean_image_reports_no_watermark(tmp_clean_png: Path):
# With the decoder present, an un-watermarked image must report absent.
assert detect_trustmark(tmp_clean_png) is None
class TestFalsePositiveGate:
"""The re-encode durability gate keeps real (durable) TrustMarks and drops
BCH false positives that collapse under a mild JPEG round-trip."""
@pytest.fixture(autouse=True)
def _force_available(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.setattr(trustmark_detector, "is_available", lambda: True)
def _patch_decoder(self, monkeypatch: pytest.MonkeyPatch, decoder: _FakeDecoder) -> None:
monkeypatch.setattr(trustmark_detector, "_decoder", lambda: decoder)
def test_durable_watermark_survives_and_is_reported(self, monkeypatch, tmp_clean_png: Path):
decoder = _FakeDecoder((b"secret", True, 2), (b"secret", True, 2))
self._patch_decoder(monkeypatch, decoder)
result = detect_trustmark(tmp_clean_png)
assert result == "Adobe TrustMark (variant P, schema 2)"
assert decoder.calls == 2 # original + re-encode
def test_false_positive_collapsing_on_reencode_is_dropped(self, monkeypatch, tmp_clean_png: Path):
# Present on the original, absent after re-encode -> content-noise FP.
decoder = _FakeDecoder((b"\x00\x01", True, 3), (b"", False, -1))
self._patch_decoder(monkeypatch, decoder)
assert detect_trustmark(tmp_clean_png) is None
def test_schema_drift_on_reencode_is_dropped(self, monkeypatch, tmp_clean_png: Path):
# Present both times but the schema changes -> not a stable watermark.
decoder = _FakeDecoder((b"\x00", True, 2), (b"\x00", True, 3))
self._patch_decoder(monkeypatch, decoder)
assert detect_trustmark(tmp_clean_png) is None
def test_absent_skips_reencode(self, monkeypatch, tmp_clean_png: Path):
decoder = _FakeDecoder((b"", False, -1))
self._patch_decoder(monkeypatch, decoder)
assert detect_trustmark(tmp_clean_png) is None
assert decoder.calls == 1 # no second decode when the first is absent
+70
@@ -0,0 +1,70 @@
"""Tests for the known-visible-watermark registry (reverse-alpha only)."""
from __future__ import annotations
from pathlib import Path
import numpy as np
import pytest
from remove_ai_watermarks import watermark_registry as reg
DOUBAO_SAMPLE = Path(__file__).resolve().parents[1] / "data" / "samples" / "doubao-1.png"
class TestCatalog:
def test_keys(self):
assert reg.mark_keys() == ["gemini", "doubao"]
def test_all_in_auto(self):
assert all(m.in_auto for m in reg.known_marks())
def test_recovery_is_reverse_alpha(self):
# Every catalogued mark is removed by reverse-alpha; inpainting is never the
# primary remover (Gemini uses it only for edge-residual cleanup).
assert all(m.recovery == "reverse-alpha" for m in reg.known_marks())
def test_locations(self):
by_key = {m.key: m for m in reg.known_marks()}
assert by_key["gemini"].location == "bottom-right"
assert by_key["doubao"].location == "bottom-right"
def test_get_mark_unknown_raises(self):
with pytest.raises(KeyError):
reg.get_mark("nope")
class TestScan:
def test_detect_marks_scans_all(self):
img = np.zeros((256, 256, 3), np.uint8)
keys = {d.key for d in reg.detect_marks(img)}
assert keys == {"gemini", "doubao"}
def test_blank_image_no_auto_mark(self):
assert reg.best_auto_mark(np.zeros((256, 256, 3), np.uint8)) is None
@pytest.mark.skipif(not DOUBAO_SAMPLE.exists(), reason="doubao sample not present")
class TestRealSample:
def test_doubao_sample_wins_auto(self):
from remove_ai_watermarks.image_io import imread
best = reg.best_auto_mark(imread(DOUBAO_SAMPLE))
assert best is not None
assert best.key == "doubao"
def test_doubao_remove_returns_region(self):
from remove_ai_watermarks.image_io import imread
img = imread(DOUBAO_SAMPLE) # 2048 wide -> reverse-alpha applies
result, region = reg.get_mark("doubao").remove(img)
assert region is not None
assert result.shape == img.shape
class TestReverseAlphaOnly:
def test_doubao_off_resolution_is_skipped(self):
# No alpha capture for this width -> no inpaint fallback, image untouched.
img = np.zeros((512, 512, 3), np.uint8)
result, region = reg.get_mark("doubao").remove(img)
assert region is None
assert np.array_equal(result, img)