Visible-watermark registry: reverse-alpha-only Doubao + Gemini, exact native recovery (#28)

* fix(trustmark): gate detection on re-encode durability to kill false positives

TrustMark's wm_present flag is a BCH validity check that spuriously validates on a content-correlated fraction of un-watermarked images (AI textures trip it more than camera photos). On a 1343-image set all 20 raw detections were false, several on Gemini/OpenAI/Doubao output that cannot carry Adobe's watermark, with random-bytes secrets. A genuine TrustMark is a durable soft binding that survives re-encoding, so detect_trustmark now re-decodes after a mild JPEG round-trip and requires the same schema both times. Every observed false positive collapsed under this gate; the second decode runs only on the rare hit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(identify): Samsung Galaxy AI, FLUX, ByteDance C2PA; fix C2PA substring FP

Detection extensions verified on real signed files (2026-05-29):

- Samsung Galaxy AI: signer attribution via a new _SIGNER_C2PA_PLATFORM (Samsung Galaxy / ASUS Gallery) kept separate from the capture-camera _DEVICE_C2PA_PLATFORM so a Galaxy AI edit (device cert + AI source type) does not trip the camera-vs-AI integrity clash. Plus metadata.samsung_genai: the proprietary genAIType marker in PhotoEditor_Re_Edit_Data, a medium-confidence AI-editing signal (samsung_only branch).
- Black Forest Labs (FLUX) and ByteDance Volcano Engine (Doubao/Jimeng) added as C2PA issuers + issuer->platform mappings.
- fix: C2PA presence required only the bare 4-byte 'c2pa' substring, which false-positives on compressed pixel data (a recompressed PNG IDAT re-flagged C2PA after its manifest was correctly stripped). New c2pa_marker_in() requires the JUMBF wrapper (jumb+c2pa) or the C2PA uuid box; applied in identify + metadata. Verified: all 535 real C2PA files carry jumb.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(doubao): gate detection on text structure to cut ~95% of false positives (#23)

Coverage alone over-fired: any textured bottom-right corner cleared the threshold, so the detector false-positived on ~28% of arbitrary images. The real '豆包AI生成' mark is six glyphs in one row, so detect now also requires the text-structure signature (_glyph_structure): many connected components, no single dominant blob, concentration in a thin horizontal band. False positives dropped 343 -> 17 across the corpus while keeping real-mark recall and the doubao-1.png sample. Also accept a no-op force kwarg for remover-interface symmetry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(samsung): add Samsung Galaxy AI visible-badge remover

New samsung_engine.py removes the bottom-left sparkle + localized 'AI-generated content' badge that Galaxy AI tools stamp. Mirrors the Doubao locate->mask->inpaint pattern but bottom-left, with a dual-polarity top-hat mask (the badge is light-on-dark or dark-on-light). Detection gates on a band + left-anchor signature (the Doubao CJK-component gate does not transfer: Latin badge letters connect into few blobs). Explicit-only -- tuned on few real badges with a ~4% FP floor, so it is not used in auto. Synthetic byte-blob fixtures (real badges are user content, not shipped).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(visible): unified known-watermark registry + LaMa inpaint backend

watermark_registry.py is a single catalog of known visible marks, each tying {usual location, in_auto flag, recovery strategy, detect adapter, remove adapter}: gemini (reverse-alpha, exact), doubao, samsung. cmd_visible is now registry-driven (best_auto_mark for --mark auto; mark_keys() feeds the CLI choices) -- the per-mark _run_doubao/_run_samsung helper branches are gone.

Cross-engine confidences are not comparable, so the gemini adapter applies the corpus-validated 0.5 sparkle threshold for auto arbitration (its engine flag is loose and weakly fired ~0.36 on Doubao text, hijacking auto).

--backend auto|cv2|lama chooses background reconstruction for the mask-based marks; auto = LaMa when onnxruntime is present, else cv2. For LaMa the mask is the FILLED glyph bounding box (sparse glyph masks leave anti-aliased edges behind). cv2 stays the zero-dependency fallback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: watermark registry, Samsung/FLUX/ByteDance detection, LaMa backend, trustmark gate

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(doubao): exact reverse-alpha removal from captured alpha map

The Doubao '豆包AI生成' mark is a fixed semi-transparent white overlay, so given its alpha map the original pixels are recovered exactly: original = (wm - a*logo)/(1-a) -- no inpaint hallucination.

The alpha map + logo colour were solved from real black+gray Doubao captures on a controlled background: on black captured = a*logo, and the black/gray pair solves a per-pixel without assuming the logo colour (a_max~0.65, logo near-white); the white capture cross-validates (mark vanishes to a flat fill). Bundled as assets/doubao_alpha.png + geometry constants.

remove_watermark_reverse_alpha applies it scaled to image width; exact at the captured width, so the registry routes doubao through it only when reverse_alpha_available (width within the calibrated band) and the mark is detected, falling back to mask inpaint (cv2/LaMa) otherwise. A light residual inpaint cleans the sub-pixel rescaling error. Add captures at more resolutions to widen exact coverage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(visible): reverse-alpha only -- drop inpaint removal + heuristic detection

Per the principle that we only remove/detect what we can do exactly, the visible-mark path is now reverse-alpha only:

- Doubao detect is reverse-alpha-consistent: match the bundled alpha glyph silhouette against the corner via TM_CCOEFF_NORMED (DETECT_NCC_THRESHOLD 0.4) -- keys on the '豆包AI生成' SHAPE, not coverage/structure heuristics. FP 7/1243 (0.6%). Removes the cv2 inpaint path + the _glyph_structure gate.
- Registry is reverse-alpha only: dropped the cv2/LaMa backend (_glyph_remove, _lama_box_inpaint, default_backend, --backend) and the Samsung entry. Doubao outside the alpha resolution band is skipped, never inpainted.
- Removed samsung_engine.py + tests + --mark samsung (no alpha map captured; Samsung C2PA/genAIType metadata detection in identify is unaffected).
- The universal erase --region (cv2/LaMa) is unchanged -- arbitrary-region inpainting stays a user-directed tool, separate from the known-mark registry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(doubao): NCC sub-pixel alignment -> reverse-alpha at any resolution

A pure width-scale of the captured alpha map is only sub-pixel-accurate at the captured width and leaves a faint ghost elsewhere. remove_watermark_reverse_alpha now registers the alpha glyph to the actual mark via a TM_CCOEFF_NORMED scale+position search (_aligned_alpha_map) before inverting the blend, so the single 2048 capture works at any resolution -- verified clean on the 1773x2364 (3:4) corpus size, the biggest coverage gap (23 files). reverse_alpha_available is now just 'asset present' (no width band); the registry still gates removal on detect so a clean corner is never touched. Drops the _ALPHA_WIDTH_TOLERANCE gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(doubao): keep native recovery exact -- fixed geometry at captured width

Integer-pixel NCC alignment landed ~1px off at the captured width, degrading the otherwise-exact native reverse-alpha (synthetic recovery error 0.94 -> 1.39). remove_watermark_reverse_alpha now uses exact width-relative geometry within _ALPHA_NATIVE_BAND of the captured width and the NCC search only off it -- best of both: native back to 0.94, other resolutions still aligned.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(doubao): harden alignment -- try fixed+aligned, keep least residual (56/56)

On a faint/busy-background mark the NCC alignment peak can wander a few px off the true mark and leave a residual (2/56 real corpus files). Off the captured width, remove_watermark_reverse_alpha now builds BOTH the fixed-geometry and the NCC-aligned alpha map, applies each, and keeps whichever leaves the least residual mark (re-detect confidence on the bare reverse-alpha) -- geometry wins on faint marks, alignment on clear ones, no magic threshold. Real-file round-trip now removes 56/56 detected Doubao clean across every corpus resolution (was 54).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* perf(doubao): skip residual inpaint at native width for exact recovery

At the captured width the fixed-geometry reverse-alpha is pixel-exact, so inpainting over it only replaced exactly-recovered interior pixels with a cv2 hallucination -- measured worse on a textured background (native error vs true bg 1.6 reverse-alpha-only vs 2.6 with the old always-on full-footprint inpaint). Native now returns the bare recovery untouched; off-native, where NCC alignment is only sub-pixel-approximate, the footprint inpaint stays to clean the seam. Real round-trip still 56/56 across all corpus resolutions; negatives 0/60, Gemini unaffected.

Add test_native_returns_exact_reverse_alpha_no_inpaint as the regression guard. Sync CLAUDE.md + README (the table cell and prose described the pre-NCC "skipped off native / cv2-LaMa" behavior, now stale). Gitignore the session scheduled_tasks.lock, and add the text-protection research note.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
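
Reviewer sketch (not part of the commit): the reverse-alpha recovery and the black/gray alpha calibration described above can be illustrated with a few lines of NumPy. This is a minimal single-channel, float-valued model under stated assumptions; the function names `blend`, `solve_alpha`, `solve_logo`, and `reverse_alpha` are hypothetical and are not the repository's API. Real images are 8-bit, so recovery there is exact only up to quantization.

```python
import numpy as np


def blend(background, alpha, logo):
    """Forward model: watermarked = alpha*logo + (1 - alpha)*background."""
    return alpha * logo + (1.0 - alpha) * background


def solve_alpha(captured_black, captured_gray, gray_level):
    """Per-pixel alpha from two captures over flat backgrounds (black and gray).

    captured_gray - captured_black = (1 - alpha) * gray_level, so the logo
    colour cancels and does not need to be assumed.
    """
    return 1.0 - (captured_gray - captured_black) / gray_level


def solve_logo(captured_black, alpha, eps=1e-6):
    """On black, captured = alpha*logo, so logo = captured/alpha where alpha > 0."""
    return np.where(alpha > eps, captured_black / np.maximum(alpha, eps), 0.0)


def reverse_alpha(watermarked, alpha, logo):
    """Invert the blend: original = (wm - alpha*logo) / (1 - alpha)."""
    return (watermarked - alpha * logo) / (1.0 - alpha)


# Synthetic demo (illustrative values only; 0.65 and near-white follow the
# commit message, everything else is made up).
rng = np.random.default_rng(0)
shape = (16, 48)
alpha_true = rng.uniform(0.0, 0.65, size=shape)
alpha_true[:, :8] = 0.0                      # pixels outside the glyphs
logo_true = np.full(shape, 0.95)             # near-white overlay

cap_black = blend(np.zeros(shape), alpha_true, logo_true)
cap_gray = blend(np.full(shape, 0.5), alpha_true, logo_true)

alpha_est = solve_alpha(cap_black, cap_gray, gray_level=0.5)
logo_est = solve_logo(cap_black, alpha_est)

background = rng.uniform(0.0, 1.0, size=shape)
watermarked = blend(background, alpha_true, logo_true)
recovered = reverse_alpha(watermarked, alpha_est, logo_est)

print("max recovery error:", float(np.abs(recovered - background).max()))
```

The inversion is only defined where alpha < 1, which holds for a semi-transparent overlay such as the one described (a_max ~0.65).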
2026-07-20 14:40:52 +02:00 · 2026-05-29 19:49:09 -07:00
parent ef6fdaeeec
commit 58bdf51c59
17 changed files with 1148 additions and 266 deletions
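
Reviewer sketch (not part of the commit): the shape-based detection described in the commit message matches the alpha glyph silhouette against the image corner with zero-mean normalized cross-correlation, which is the quantity OpenCV's `TM_CCOEFF_NORMED` mode computes. The version below is a slow pure-NumPy illustration on synthetic data, assuming a single-channel float image; `ncc_match` is a hypothetical helper, and only the 0.4 threshold comes from the commit text.

```python
import numpy as np

DETECT_NCC_THRESHOLD = 0.4  # value quoted in the commit message


def ncc_match(region, template):
    """Slide `template` over `region`; return (best score, (row, col)).

    The score is the zero-mean normalized cross-correlation in [-1, 1].
    """
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t * t).sum())
    best_score, best_pos = -1.0, (0, 0)
    for y in range(region.shape[0] - th + 1):
        for x in range(region.shape[1] - tw + 1):
            patch = region[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = t_norm * np.sqrt((p * p).sum())
            if denom == 0.0:
                continue  # flat patch: correlation is undefined, skip it
            score = float((p * t).sum() / denom)
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_score, best_pos


# Synthetic corner: flat-ish background with a semi-transparent white glyph
# pattern blended in at a known offset.
rng = np.random.default_rng(1)
template = (rng.random((10, 30)) > 0.5).astype(float)   # stand-in silhouette
clean = 0.3 + rng.normal(0.0, 0.01, size=(40, 80))
marked = clean.copy()
a = 0.65 * template
marked[25:35, 40:70] = a * 1.0 + (1.0 - a) * marked[25:35, 40:70]

marked_score, marked_pos = ncc_match(marked, template)
clean_score, _ = ncc_match(clean, template)
print("marked:", marked_pos, marked_score >= DETECT_NCC_THRESHOLD)
print("clean detected:", clean_score >= DETECT_NCC_THRESHOLD)
```

Because the score is normalized per patch, it responds to the glyph shape rather than to how bright or textured the corner is, which is the property the commit relies on to avoid coverage-style false positives.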
@@ -34,6 +34,7 @@ yolov8n.pt

 # Claude Code local settings
 .claude/settings.local.json
+.claude/scheduled_tasks.lock

 # Doubao watermark calibration (local only; ship only the derived alpha-map asset).
 # Synthetic seeds + raw Doubao captures are regenerable and not committed.
@@ -17,7 +17,7 @@ If this tool saves you time, consider [sponsoring its development](https://githu

 ## Features

- **Visible watermark removal** — Gemini / Nano Banana sparkle logo (reverse alpha blending) and the Doubao "豆包AI生成" text strip (locate + mask + inpaint); fast, offline, deterministic, no GPU. `visible --mark auto` picks the right one
+- **Visible watermark removal** — a registry of known marks in their usual places: the Gemini / Nano Banana sparkle and the Doubao "豆包AI生成" text strip. Each is removed by **exact reverse-alpha blending** against a captured alpha map (`original = (wm − α·logo)/(1−α)`), recovering the true pixels rather than inpainting a guess. Fast, offline, no GPU. `visible --mark auto` finds and removes the strongest detected mark. (For arbitrary logos/objects, see `erase`.)
 - **Universal region eraser (`erase`)** — remove any logo / watermark / object inside boxes you specify, regardless of position or colour. Default cv2 inpainting (CPU, instant); optional big-LaMa via onnxruntime (`lama` extra) for higher quality
 - **Invisible watermark removal** — SynthID, StableSignature, TreeRing via diffusion-based regeneration (needs a local GPU, or run it with no setup on [raiw.cc](https://raiw.cc))
 - **AI metadata stripping** — EXIF, PNG text chunks, C2PA provenance manifests (PNG / JPEG / AVIF / HEIF / JPEG-XL, **MP4 / MOV / M4V / M4A** at the container level, and **WebM / MP3 / WAV / FLAC / OGG** losslessly via ffmpeg), XMP DigitalSourceType
@@ -49,7 +49,9 @@ If this tool saves you time, consider [sponsoring its development](https://githu
 | **xAI Grok (Aurora)** | — | — | ✅ EXIF signature scheme (no C2PA): `Signature:` blob + UUID `Artist` | Detected (`identify`); metadata strip |
 | **Midjourney** | — | — | ✅ EXIF + XMP (prompt, model, seed) | Metadata strip |
 | **Meta AI** | — | — | ✅ IPTC "Made with AI" (digitalSourceType) | Metadata strip (removes the label) |
-| **Doubao** (ByteDance) / China AIGC generators | ✅ "豆包AI生成" text strip (bottom-right) | — | ✅ TC260 AIGC label — `<TC260:AIGC>` XMP **or** `AIGC` PNG chunk (China's mandatory AI labeling) | Locate + mask + inpaint (cv2, CPU) + metadata strip |
+| **Doubao** (ByteDance) / China AIGC generators | ✅ "豆包AI生成" text strip (bottom-right) | — | ✅ TC260 AIGC label (`<TC260:AIGC>` XMP **or** `AIGC` PNG chunk) **+ C2PA** signed by ByteDance Volcano Engine (`volcengine`) | Exact reverse-alpha (captured α map): pixel-exact at native width, NCC-aligned at other resolutions, + metadata strip |
+| **Samsung Galaxy AI** (Generative Edit, Sketch to Image, ...) | — | — | ✅ C2PA (signer "Samsung Galaxy") + `trainedAlgorithmicMedia` / proprietary `genAIType` marker | Detected (`identify`) + metadata strip |
+| **Black Forest Labs** (FLUX API) | — | — | ✅ C2PA (`Black Forest Labs API` + `c2pa.ai_generated_content` + `trainedAlgorithmicMedia`) | Metadata strip |
 | **StableSignature** (Meta) | — | ✅ In-model watermark | — | Diffusion regeneration |
 | **TreeRing** | — | ✅ Latent space watermark | — | Diffusion regeneration |

@@ -79,9 +81,9 @@ A three-stage NCC (Normalized Cross-Correlation) detector finds the watermark po

 ### Removing the Doubao "豆包AI生成" text watermark

-Doubao (ByteDance) stamps every output with a light, semi-transparent "豆包AI生成" text strip in the bottom-right corner — the visible AIGC label mandated by China's TC260 standard. Unlike the fixed-size Gemini sparkle, it is a text strip that scales with image width, so we anchor a generous bottom-right box by geometry, extract the light low-saturation glyph pixels with a polarity-aware white top-hat mask, and inpaint them (cv2 Telea/NS). The mask is background-relative, so it leaves white-paper documents untouched instead of smearing their text. On dense-text backgrounds where the mask would explode, removal is skipped rather than guessed.
+Doubao (ByteDance) stamps every output with a light, semi-transparent "豆包AI生成" text strip in the bottom-right corner — the visible AIGC label mandated by China's TC260 standard. It is a fixed semi-transparent white overlay, so — like the Gemini sparkle — it is removed by **exact reverse-alpha blending**: `original = (watermarked - α·logo) / (1 - α)`, recovering the true pixels instead of hallucinating them. The α map and logo colour were solved from controlled black + gray captures (on black, `captured = α·logo`; the black/gray pair solves α per-pixel). At the captured width the placement is exact, so the recovery is returned untouched (inpainting over exactly-recovered pixels only degrades them). The single capture generalizes to any resolution: off the captured width an NCC scale-and-position search registers the α template to the actual mark, and a light residual inpaint cleans the sub-pixel seam there. Detection is consistent with removal: it matches the same alpha glyph silhouette against the corner (normalized correlation), so it keys on the actual "豆包AI生成" shape, not on textured corners.

-**Speed**: ~0.03s per image. No GPU needed. Best on photo / illustration backgrounds; on high-contrast edges a faint residue can remain (use `erase --backend lama` for neural-quality fill).
+**Speed**: ~0.05s, no GPU needed. Reverse-alpha at the captured resolution recovers the true background pixels exactly.

 ### Universal region eraser

@@ -237,9 +239,9 @@ remove-ai-watermarks batch ./images/ --mode all
 # of a clean origin. Add --json for machine-readable output.
 remove-ai-watermarks identify image.png

-# Visible watermark only — fast, offline, CPU. --mark auto (default) picks
-# between the Gemini sparkle and the Doubao "豆包AI生成" text strip; force one
-# with --mark gemini / --mark doubao.
+# Visible watermark only — fast, offline, CPU. --mark auto (default) finds the
+# strongest known mark (Gemini sparkle / Doubao "豆包AI生成" text); force one
+# with --mark gemini / doubao. Removed by exact reverse-alpha (true-pixel recovery).
 remove-ai-watermarks visible image.png -o clean.png

 # Erase arbitrary region(s) — universal, any logo/watermark/object, any position.
@@ -329,7 +331,7 @@ Tracked but not yet implemented:
 - **Real non-PNG C2PA fixtures**. SynthID-source detection for JPEG / WebP / AVIF is currently covered only by synthetic byte blobs; replace with real vendor-emitted files to ground the binary-scan path.
 - **Maintenance debt**. Strict pyright is now clean across `src/` (0 errors): pure-logic files are fully typed, the cv2 / torch / diffusers boundary files carry a documented per-file relax pragma, and a local `typings/piexif` stub covers piexif. Remaining: full-project `pyright` (no path) still OOMs node on this ML-heavy repo, so it must be scoped to `src/`; narrowing the boundary pragmas back toward full strict (as upstream stubs improve) is the long tail. (`uv-secure` is already clean since `idna` was bumped to 3.16.)
 - **AVIF / HEIF `Exif` item inside the `meta` box**. An AI-label *XMP* packet in a `meta`-box item is now blanked in place (v0.6.9), but EXIF stored as a `meta`-box `Exif` *item* is still not removed — it needs full `iinf`/`iloc` surgery (offset rewrite, corruption risk) or `exiftool` (a non-bundled binary dependency). Low priority: the AI labels we target are XMP, not EXIF, so an EXIF-only meta-box case is rare.
- **More C2PA device signers**. Leica, Nikon, Google Pixel, Sony, and Truepic are mapped (each verified against a real signed file). Canon and Samsung Galaxy (AI-edit) are deferred until a real signed sample surfaces — no public direct-download C2PA file exists for them today (upload-to-verify / news-agency-licensed only).
+- **More C2PA device signers**. Leica, Nikon, Google Pixel, Sony, and Truepic capture cameras are mapped (each verified against a real signed file); **Samsung Galaxy AI**, **Black Forest Labs (FLUX)**, and **ByteDance Volcano Engine** (Doubao / Jimeng) are now attributed too (verified on real signed files). Canon is still deferred until a real signed sample surfaces — no public direct-download C2PA file exists for it today (upload-to-verify / news-agency-licensed only).
 - **Resemble PerTh audio detection** — evaluated, not feasible with the public API: `get_watermark()` returns a raw bit array with no presence/confidence flag, so watermarked vs. clean audio can't be reliably separated without Resemble's fixed payload or a confidence service. Same wall as the SynthID pixel detector.
 - **Video pipeline (`noai-video`)**: per-frame inpainting and tracking for Sora 2 dynamic logo, Veo 3.1 badge, Kling, Runway. Separate package, not folded into this repo.

@@ -0,0 +1,138 @@
+# Text protection research: crisp text under a "watermark removed everywhere" constraint
+
+Date: 2026-05-29. Source: a deep-research run (104 agents, 5 search angles, sources
+fetched and 3-vote adversarially verified). Not committed automatically — saved as a
+research note for the next session.
+
+## The constraint that frames everything
+
+The invisible watermark (Google SynthID) must be removed **everywhere, including inside
+text regions**. Therefore any technique that keeps or composites the **original
+(watermarked) text pixels** is disqualified — the text must be *regenerated / freshly
+synthesized* enough to scrub the watermark, yet rendered crisply. This single rule is the
+filter applied to every candidate below.
+
+## Problem recap
+
+The `invisible` pipeline is SDXL base 1.0 img2img at low strength (~0.05) to defeat
+SynthID with minimal visible change. Text is protected via Differential Diffusion with a
+per-pixel change map (`preserve` ~0.9) driven by the PP-OCRv3 DB detector
+(`text_protector.py`). Large text survives; **small text (sub ~8 px strokes) softens or
+garbles** (issue #14, confirmed on real content).
+
+## Executive summary
+
+The fine-text softening is an **architectural consequence of latent-space processing, not
+a tuning problem**: SDXL's 4-channel VAE (~48x compression) discards high-frequency signal
+on encode, and Differential Diffusion blends in latent space with the change map
+downsampled by 8x, so any stroke under ~8 px sits inside one latent cell and cannot be
+preserved or edited cleanly **regardless of `preserve`** (the Differential Diffusion
+authors state this limit explicitly). Two structurally sound directions keep the
+"watermark removed everywhere" guarantee because they **synthesize fresh glyph pixels**
+rather than compositing originals: (1) glyph/text-conditioned diffusion re-render of
+detected text (AnyText2, EasyText), and (2) a two-stage architecture — global scrub, then
+a dedicated text-restoration / text-aware super-resolution pass over detected regions
+(TIGER, TextSR, TeReDiff/TAIR). **EasyText** and **TextSR** are the most promising for this
+CJK-first pipeline (both multilingual via DiT/ByT5, both regenerate from glyph or
+character-shape priors). The deepest fix — a 16-channel (SD3/FLUX) VAE — materially reduces
+the softening but means switching the base model, not a drop-in VAE swap.
+
+## Constraint reconciliation (important)
+
+The generic research "quick win: bump `preserve` toward 1.0" is **invalid under our hard
+constraint**: raising `preserve` freezes the text region, so SynthID there is **not
+scrubbed**. Likewise, pixel paste-back of the original text is disqualified. The only
+constraint-compatible quick win is **higher resolution / tiled diffusion** (strokes span
+more latent cells, less VAE softening, while the text is still fully regenerated and thus
+scrubbed). The real answer is **regenerate text crisply**, not freeze it.
+
+## Findings (with confidence and sources)
+
+### Finding 1 — confidence: high
+
+**Claim.** The small-text softening is an architectural latent-space limit, not a tuning issue. SDXL's VAE compressively encodes (losing exact color and fine detail on every round-trip), and Differential Diffusion blends in latent space with the change map downsampled to latent resolution (8x), so the method explicitly caps edit/preserve granularity at ~8 px under SD settings. Text strokes below one latent cell cannot be cleanly preserved even at preserve ~0.9.
+
+**Evidence.** Differential Diffusion's paper states a "cap on the resolution of the change map ... can limit the ability to precisely edit small objects (less than 8 pixels for Stable-Diffusion's settings)"; the official SDXL pipeline downsamples the map by `vae_scale_factor=8` and blends `latents = original*mask + latents*(1-mask)` in latent space. The VAE encode is "compressive ... exact color qualities and exact visual fine-details are lost." arXiv:2512.05198 confirms "resizing the pixel mask to latent resolution discards fine structure ... downsamples by 1/8" and that linear latent blending "cannot be pixel-equivalent." Higher compression = more high-frequency loss (arXiv:2305.02541).
+
+**Sources.** https://onlinelibrary.wiley.com/doi/10.1111/cgf.70040 · https://differential-diffusion.github.io/ · https://github.com/exx8/differential-diffusion · https://arxiv.org/abs/2512.05198 · https://omriavrahami.com/blended-latent-diffusion-page/ · https://arxiv.org/pdf/2305.02541
+
+### Finding 2 — confidence: low (do not build on it yet)
+
+**Claim.** Pixel-space differential / blended-latent variants exist as a research direction, but the specific full-resolution-mask solution (PELC/DecFormer, arXiv:2512.05198) was NOT verified to deliver its claimed seam/edge improvements.
+
+**Evidence.** arXiv:2512.05198 argues linear latent blending is not pixel-equivalent and proposes decoder-equivariant compositing; PixPerfect (arXiv:2512.03247) does pixel-space refinement of chromatic shifts at edit boundaries. But the specific PELC full-resolution-mask and DecFormer "53% error reduction" claims were **refuted on adversarial vote (0-3 and 1-2)**. Treat pixel-equivalent latent compositing as an emerging idea to watch, not a production fix.
+
+**Sources.** https://arxiv.org/abs/2512.05198 · https://arxiv.org/abs/2512.03247
+
+### Finding 3 — confidence: high
+
+**Claim.** Glyph/text-conditioned diffusion can re-render detected text as freshly synthesized pixels (not copied), which inherently scrubs any watermark in the text region while rendering glyphs crisply. AnyText/AnyText2 inject text-rendering into a pretrained T2I model and support generation AND editing of existing scene images; multilingual including CJK and English.
+
+**Evidence.** AnyText2 "enables precise control over multilingual text attributes in natural scene image generation and editing" (WriteNet+AttnX); +3.3% (Chinese) / +9.3% (English) accuracy over AnyText v1. AnyText "can be plugged into existing diffusion models ... for rendering or editing text" and synthesizes text latent features through diffusion (fresh pixels), supporting zh/en/ja/ko/ar/bn/hi. **Caveat:** both are SD1.5-based, so NOT a drop-in into the SDXL scrub (separate base model); AnyText's own limitation: "the inpainting manner ... impedes editing quality on small text," and it ranks weak on STRICT (EMNLP 2025) — small-text crispness not guaranteed.
+
+**Sources.** https://github.com/tyxsspa/AnyText2 · https://arxiv.org/abs/2411.15245 · https://arxiv.org/abs/2311.03054
+
+### Finding 4 — confidence: high
+
+**Claim.** EasyText is a strong glyph-conditioned re-render candidate: built on the FLUX-dev DiT framework with LoRA tuning, renders compact per-character glyph patches (64px-high adaptive for alphabetic, 64x64 for logographic) concatenated in latent space, supports 10+ languages including Chinese, Japanese, Korean, Thai, Vietnamese, Greek, and Latin.
+
+**Evidence.** AAAI 2025 + arXiv:2505.24417: "implemented based on the open-source FLUX-dev framework with LoRA-based parameter-efficient tuning," VAE and text encoder frozen, two-stage 512->1024 training. Glyph conditioning via "64-pixel-high images ... adaptive widths for alphabetic; fixed 64x64 for logographic," VAE-encoded and concatenated with denoised latents, "less than one-tenth the spatial size of layout-matching methods." FLUX-based (16-channel VAE, DiT) also sidesteps the SDXL 4-channel wall. Fresh-pixel generation preserves the watermark-removal guarantee. Cyrillic/Arabic crispness not separately benchmarked.
+
+**Sources.** https://arxiv.org/html/2505.24417 · https://ojs.aaai.org/index.php/AAAI/article/view/37697
+
+### Finding 5 — confidence: high
+
+**Claim.** A two-stage "global watermark scrub then text-restoration pass" architecture is validated by recent literature, and the restoration stage can synthesize glyph pixels from priors (no original-pixel reintroduction). TIGER reconstructs stroke geometry then injects it as guidance into full-image super-resolution; TextSR uses a detector + multilingual OCR to regenerate text from character-shape priors; TeReDiff/TAIR couples a jointly-trained text-spotter with diffusion.
+
+**Evidence.** TIGER (arXiv:2510.21590): "a diffusion-based local text refiner ... reconstructing fine-grained stroke geometry ... injected as conditional guidance into the subsequent full-image restoration." TextSR (arXiv:2505.23119, Google): "leverages a text detector ... then employs OCR to extract multilingual text," regenerating from "multilingual character-to-shape diffusion priors" that "produce character shapes solely based on text prompts, even without visual input" — fresh pixels. TAIR/TeReDiff (ICLR 2026): standard restoration "frequently generates plausible but incorrect textures"; TeReDiff feeds text-spotter outputs back as prompts. **Caveat:** TIGER orders text-first then global (reverse of scrub-then-text); these target degraded-input super-resolution, not watermark removal, so the SynthID-scrub of the restoration stage must be verified empirically (the stages are themselves diffusion-based, so fresh-pixel = no SynthID is plausible but unproven here).
+
+**Sources.** https://arxiv.org/html/2510.21590v1 · https://arxiv.org/html/2505.23119v1 · https://cvlab-kaist.github.io/TAIR/ · https://arxiv.org/abs/2506.09993
+
+### Finding 6 — confidence: high
+
+**Claim.** Switching to a 16-channel VAE (SD3/FLUX class) materially reduces small-text/latent softening vs SDXL's 4-channel VAE, but it requires switching the base model — not a drop-in latent swap into an SDXL UNet img2img pipeline. RAE approaches are DiT-native and likewise not drop-in.
+
+**Evidence.** SD3/FLUX moved from 4-channel (48x) to 16-channel (12x) VAEs specifically to preserve fine detail (diffusers Discussion #8713; madebyollin VAE notes; arXiv:2305.02541). RAE (arXiv:2510.11690) "should be the new default for diffusion transformer training" but produces high-dimensional latents needing a DiT wide-DDT head — NOT compatible with an SDXL 4-channel UNet. EasyText shows the practical path: adopt a FLUX-DiT base rather than retrofit SDXL. The VAE upgrade couples to a base-model migration.
+
+**Sources.** https://arxiv.org/abs/2510.11690 · https://arxiv.org/pdf/2305.02541 · https://arxiv.org/html/2505.24417
+
+## Recommendation
+
+Under the hard constraint, the correct architecture is **not "protect text during the
+scrub" (Differential Diffusion)** but **"scrub everywhere, then restore text crisply by
+regeneration"**:
+
+1. Global SDXL scrub with text protection OFF (text region is scrubbed too).
+2. On detected text regions, a **glyph-conditioned restoration** that re-renders the same
+   glyphs as fresh pixels (no original reused).
+
+This is the only path that delivers both "watermark removed everywhere" and crisp text.
+
+**Top-2 to prototype:**
+- **TextSR** — detector + multilingual OCR + character-shape diffusion priors; closest to
+  the existing detector-driven pipeline.
+- **EasyText** — FLUX-DiT glyph re-render, multilingual incl. CJK; also gets the 16-channel
+  VAE for free.
+
+**Honest costs / unknowns:** this is a re-architecture, not a quick fix. It needs a new
+**OCR-recognition** step (we currently only detect text; we must know *what* to re-render).
+Models are FLUX/DiT-class (heavy) -> serverless GPU. Maturity is research-grade; CJK is
+covered, Cyrillic/Arabic crispness is not separately benchmarked -> a prototype must
+measure real fidelity. The restoration stage being diffusion-based makes "fresh pixels =
+no SynthID" plausible but **must be verified empirically** (run the SynthID oracle on the
+restored output).
+
+**Constraint-compatible quick win to try first:** run the global scrub at **higher
+resolution / tiled** so strokes exceed the latent cell — less softening, full scrub, no
+freezing. Cheap to test; quantify recall/quality vs cost.
+
+**Do not pursue:** raising `preserve` toward 1.0 or pixel paste-back (both leave original
+watermarked pixels in text); PELC/DecFormer pixel-equivalent latent compositing (refuted,
+not production-ready).
+
+## Provenance
+
+Deep-research workflow run `wf_118b9a03-3eb` (2026-05-29). Findings adversarially verified
+(2/3 refutes required to kill a claim). This note records research only; no code change is
+implied until a prototype validates fidelity and the SynthID-scrub guarantee on the
+restored output.
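
Reviewer sketch (not part of the commit): the latent-cell limit from Finding 1 of the note above, and the "higher resolution / tiled" quick win, can be illustrated numerically. The example assumes the change map is area-averaged down by the VAE scale factor of 8; the actual pipeline's interpolation mode is not stated in the note, so treat that as an assumption. `downsample_mask` and `stroke_mask` are hypothetical helpers.

```python
import numpy as np

VAE_SCALE_FACTOR = 8  # latent downsampling factor quoted in the note


def downsample_mask(mask, factor=VAE_SCALE_FACTOR):
    """Area-average a pixel-space mask down to latent resolution."""
    h, w = mask.shape
    return mask.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def stroke_mask(size, stroke_px, start):
    """Square mask with one vertical stroke of the given pixel width."""
    mask = np.zeros((size, size))
    mask[:, start:start + stroke_px] = 1.0
    return mask


# A 4 px stroke never fills a whole 8 px latent cell, so no latent cell
# carries the full mask weight for it.
thin = downsample_mask(stroke_mask(64, stroke_px=4, start=10))

# The same stroke after a 4x upscale is 16 px wide and fully covers at
# least one latent cell even when it is not aligned to the cell grid.
wide = downsample_mask(stroke_mask(64, stroke_px=16, start=10))

print("thin stroke, max latent weight:", thin.max())
print("wide stroke, max latent weight:", wide.max())
```

Under this model the thin stroke's strongest latent weight is a fraction of the intended value, so the blend mixes it with its surroundings regardless of the `preserve` setting, while the upscaled stroke gets at least one fully weighted cell.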
@@ -20,12 +20,12 @@ from rich.panel import Panel
 from rich.progress import BarColumn, Progress, SpinnerColumn, TextColumn, TimeElapsedColumn
 from rich.table import Table

-from remove_ai_watermarks import __version__
+from remove_ai_watermarks import __version__, watermark_registry

 if TYPE_CHECKING:
    from numpy.typing import NDArray

-    from remove_ai_watermarks.gemini_engine import DetectionResult, GeminiEngine
+    from remove_ai_watermarks.gemini_engine import DetectionResult

 console = Console()

@@ -133,72 +133,6 @@ def _write_bgr_with_alpha(
    image_io.imwrite(path, bgra)


-def _run_doubao_if_selected(
-    ctx: click.Context,
-    image: NDArray[Any],
-    alpha: NDArray[Any] | None,
-    output: Path,
-    mark: str,
-    gemini_engine: GeminiEngine,
-    detect: bool,
-    detect_threshold: float,
-    inpaint_method: str,
-    strip_metadata: bool,
-) -> bool:
-    """Run the Doubao text-strip removal path when it is the selected mark.
-
-    Returns True when this path handled the image (caller should stop). In
-    ``auto`` mode the Doubao detector competes with the Gemini detector and wins
-    only when it is both positive and at least as confident.
-    """
-    from remove_ai_watermarks.doubao_engine import DoubaoEngine
-
-    doubao = DoubaoEngine()
-    d_det = doubao.detect(image)
-
-    if mark == "auto":
-        g_det = gemini_engine.detect_watermark(image)
-        use_doubao = d_det.detected and d_det.confidence >= g_det.confidence
-        console.print(
-            f"  [dim]Mark auto:[/] gemini={g_det.confidence:.2f} doubao={d_det.confidence:.2f} "
-            f"-> {'doubao' if use_doubao else 'gemini'}"
-        )
-    else:
-        use_doubao = mark == "doubao"
-
-    if not use_doubao:
-        return False
-
-    if detect and not d_det.detected and d_det.confidence < detect_threshold:
-        console.print(
-            f"  [yellow]⚠[/] Doubao mark not detected  [dim](coverage {d_det.coverage:.1%}). "
-            f"Use --no-detect to force.[/]"
-        )
-        raise SystemExit(0)
-
-    method: Literal["telea", "ns"] = "ns" if inpaint_method == "ns" else "telea"
-    t0 = time.monotonic()
-    with console.status("[cyan]Removing Doubao watermark…[/]"):
-        result = doubao.remove_watermark(image, inpaint_method=method)
-    elapsed = time.monotonic() - t0
-
-    output.parent.mkdir(parents=True, exist_ok=True)
-    _write_bgr_with_alpha(output, result, alpha, clear_region=d_det.region)
-
-    if strip_metadata:
-        try:
-            from remove_ai_watermarks.metadata import remove_ai_metadata
-
-            remove_ai_metadata(output, output)
-        except Exception as e:
-            if ctx.obj.get("verbose"):
-                console.print(f"  [yellow]⚠[/] Failed to strip metadata: {e}")
-
-    size_kb = output.stat().st_size / 1024
-    console.print(f"  [green]✓[/] Doubao mark removed → {output}  [dim]({size_kb:.0f} KB, {elapsed:.2f}s)[/]")
-    return True
-
-
 # ── Main group ───────────────────────────────────────────────────────


@@ -238,9 +172,10 @@ def main(ctx: click.Context, verbose: bool) -> None:
@click.option("--detect-threshold", type=float, default=0.25, help="Detection confidence threshold.")
@click.option(
    "--mark",
-    type=click.Choice(["auto", "gemini", "doubao"]),
+    type=click.Choice(["auto", *watermark_registry.mark_keys()]),
    default="auto",
-    help="Which visible mark to target. auto picks the stronger of the two detectors.",
+    help="Which known visible mark to target (auto picks the strongest detected). "
+    "All marks are removed by exact reverse-alpha against a captured alpha map.",
 )
@click.option("--strip-metadata/--keep-metadata", default=True, help="Strip AI metadata from output.")
@click.pass_context
@@ -256,13 +191,14 @@ def cmd_visible(
    mark: str,
    strip_metadata: bool,
 ) -> None:
-    """Remove a visible AI watermark from an image.
+    """Remove a known visible AI watermark from an image.

-    Targets the Gemini sparkle logo (reverse alpha blending) or the Doubao
-    "豆包AI生成" text strip (locate -> mask -> inpaint). Fast, deterministic,
-    offline. ``--mark auto`` picks whichever detector fires stronger.
+    Finds a known mark in its usual place (Gemini sparkle / Doubao text) via the
+    watermark registry and removes it by exact reverse-alpha against a captured
+    alpha map -- recovering the true pixels, not an inpaint guess. ``--mark auto``
+    picks the strongest detected mark. For arbitrary logos/objects, use ``erase``.
    """
-    from remove_ai_watermarks.gemini_engine import GeminiEngine
+    from remove_ai_watermarks import watermark_registry as registry

    _banner()
    source = _validate_image(source)
@@ -270,8 +206,6 @@ def cmd_visible(
    if output is None:
        output = source.with_stem(source.stem + "_clean")

-    engine = GeminiEngine()
-
    # Load image (preserving any alpha channel separately)
    image, alpha = _read_bgr_and_alpha(source)
    if image is None:
@@ -281,45 +215,44 @@ def cmd_visible(
    h, w = image.shape[:2]
    console.print(f"  [dim]Input:[/]  {source.name}  ({w}x{h})")

-    # Resolve which visible mark to target, then run the Doubao path if chosen.
-    if _run_doubao_if_selected(
-        ctx, image, alpha, output, mark, engine, detect, detect_threshold, inpaint_method, strip_metadata
-    ):
-        return
-
-    # Detection (we always detect softly, to find dynamic region for inpainting)
-    with console.status("[cyan]Detecting watermark…[/]"):
-        det = engine.detect_watermark(image)
-
-    if detect:
-        if det.detected:
-            console.print(
-                f"  [green]✓[/] Watermark detected  "
-                f"[dim](confidence: {det.confidence:.1%}, "
-                f"spatial: {det.spatial_score:.3f}, "
-                f"gradient: {det.gradient_score:.3f})[/]"
-            )
-        else:
-            console.print(f"  [yellow]⚠[/] Watermark not detected  [dim](confidence: {det.confidence:.1%})[/]")
-            if det.confidence < detect_threshold:
-                console.print("  [dim]Skipping. Use --no-detect to force removal.[/]")
+    # Resolve the target mark from the known-watermark registry. ``auto`` scans
+    # every in-auto mark in its usual place and picks the strongest; an explicit
+    # ``--mark <key>`` targets that one (the user asserts its presence).
+    if mark == "auto":
+        best = registry.best_auto_mark(image)
+        if best is None:
+            console.print("  [yellow]⚠[/] No known visible mark detected (gemini / doubao).")
+            if detect:
+                console.print("  [dim]Skipping. Use --mark <name> --no-detect to force.[/]")
                raise SystemExit(0)
+            target = "gemini"  # forced (no-detect): fall back to the default mark
+        else:
+            target = best.key
+            console.print(f"  [dim]Mark auto:[/] {best.label}  [dim]({best.location}, conf {best.confidence:.2f})[/]")
+    else:
+        target = mark

-    # Removal
+    chosen = registry.get_mark(target)
+    det = chosen.detect(image)
+    if detect and not det.detected:
+        console.print(
+            f"  [yellow]⚠[/] {chosen.label} not detected  "
+            f"[dim](conf {det.confidence:.2f}). Use --no-detect to force.[/]"
+        )
+        raise SystemExit(0)
+    if det.detected:
+        console.print(f"  [green]✓[/] {chosen.label} detected  [dim]({chosen.location}, conf {det.confidence:.2f})[/]")
+
+    method: Literal["telea", "ns"] = "ns" if inpaint_method == "ns" else "telea"
    t0 = time.monotonic()
-    region: tuple[int, int, int, int] | None = None
-    with console.status("[cyan]Removing watermark…[/]"):
-        result = engine.remove_watermark(image)
-
-        if inpaint:
-            region = _watermark_region(det, w, h)
-            result = engine.inpaint_residual(
-                result,
-                region,
-                strength=inpaint_strength,
-                method=inpaint_method,
-            )
-
+    with console.status(f"[cyan]Removing {chosen.label}… ({chosen.recovery})[/]"):
+        result, region = chosen.remove(
+            image,
+            inpaint_method=method,
+            inpaint=inpaint,
+            inpaint_strength=inpaint_strength,
+            force=not detect,
+        )
    elapsed = time.monotonic() - t0

    # Save (preserves transparency by clearing alpha in the watermark region)
@@ -1,29 +1,24 @@
 """Doubao visible watermark removal engine.

 Doubao (ByteDance) stamps every generated image with a visible "豆包AI生成"
-(Doubao AI generated) text strip in the bottom-right corner. This is the
-explicit AIGC label mandated by China's TC260 standard, rendered as a
-near-white / light-gray, low-saturation text overlay.
+(Doubao AI generated) text strip in the bottom-right corner -- the explicit AIGC
+label mandated by China's TC260 standard, a near-white semi-transparent overlay.

-Unlike the Gemini sparkle (a fixed square logo removed by reverse alpha
-blending against a captured alpha map), the Doubao mark is a text strip whose
-exact alpha map we do not yet have. This engine therefore removes it by:
+Like the Gemini sparkle, it is a fixed overlay, so it is removed by **exact
+reverse-alpha blending** against a captured alpha map (``remove_watermark_reverse_alpha``):
+``original = (wm - a*logo)/(1-a)`` -- recovering the true pixels, not an inpaint
+guess. The alpha map + logo colour were solved from black+gray Doubao captures
+(see data/doubao_capture/ and the reverse-alpha section below) and bundled as
+``assets/doubao_alpha.png``.

-    locate -> mask -> inpaint
+Detection (``detect``) is reverse-alpha-consistent: it matches that same alpha
+glyph silhouette against the corner via normalized correlation, so it keys on
+the actual "豆包AI生成" shape rather than coverage/structure heuristics.

-1. Locate: the mark scales with image WIDTH and sits in the bottom-right at a
-   fixed margin, so we anchor a generous box there (geometry only -- no bundled
-   template). Constants below are derived from measured Doubao output.
-2. Mask: within the box, extract the light, low-saturation glyph pixels with a
-   polarity-aware rule (the mark is brighter than dark backgrounds and a
-   distinct off-white gray against light backgrounds).
-3. Inpaint: cv2 inpainting (TELEA / NS) reconstructs the covered pixels.
-
-This is fast, offline, deterministic, and needs no GPU. A future upgrade path
-is per-pixel reverse alpha blending once a Doubao alpha map is captured on a
-controlled black background (see data/doubao_capture/), which would recover the
-true pixels instead of hallucinating them -- the same approach as the Gemini
-engine.
+``locate`` (geometry box, scales with image WIDTH) and ``extract_mask`` (the
+candidate glyph mask the detector correlates) remain. Inpainting here is only a
+residual seam clean-up after off-native reverse-alpha; arbitrary-region inpainting
+lives in ``region_eraser`` / the ``erase`` command. Fast, offline, no GPU.
 """

 # cv2/numpy boundary: third-party libs ship no usable element types; relax the
@@ -33,7 +28,7 @@ from __future__ import annotations

 import logging
 from dataclasses import dataclass
-from typing import TYPE_CHECKING, Any, Literal
+from typing import TYPE_CHECKING, Any

 import cv2
 import numpy as np
@@ -66,17 +61,63 @@ MAX_SATURATION = 55  # max channel spread to count a pixel as "grayish"
 LOGO_MIN_LUMA = 150  # glyphs are at least this bright in absolute terms
 TOPHAT_DELTA = 12  # glyph must exceed the local background by this many levels

-# Detection: a genuine label fills a meaningful fraction of the box. Measured
-# coverage is >=0.20 on real Doubao outputs; random/textured corners stay <=0.06
-# on large images but can spike to ~0.15 on tiny ones (small box -> high variance),
-# so the threshold sits above that spike and below the real-mark floor.
-DETECT_MIN_COVERAGE = 0.16
+# Detection is reverse-alpha-consistent: the mark is recognized by matching the
+# bundled alpha-template glyph silhouette (assets/doubao_alpha.png -- the exact
+# shape we invert) against the extracted candidate mask via zero-mean normalized
+# correlation (cv2 TM_CCOEFF_NORMED). It keys on the actual "豆包AI生成" glyph
+# SHAPE, not on coverage/structure heuristics, so a merely-textured corner does
+# not fire (the old coverage detector false-positived on ~28% of images; #23).
+# Corpus-tuned: real marks score median ~0.61, arbitrary corners <=0.17 (p99);
+# threshold 0.4 -> false positives 7/1243 (0.6%). A small coverage floor skips
+# the template match on a near-empty candidate box.
+DETECT_MIN_COVERAGE = 0.04
+DETECT_NCC_THRESHOLD = 0.4

-# Safety: a text strip fills a modest slice of the (generous) box. When the box
-# is over a dense-text / document background the mask explodes and cv2 inpainting
-# would smear the real content. Above this coverage we refuse to inpaint and
-# leave the image untouched -- that hard case needs the neural path, not a guess.
-MAX_INPAINT_COVERAGE = 0.50
+# ── Reverse-alpha (exact recovery, Gemini-style) ─────────────────────
+# The Doubao mark is a fixed semi-transparent white overlay, so given its alpha
+# map the original pixels are recovered exactly: original = (wm - a*logo)/(1-a).
+# The alpha map + logo colour were solved from black+gray Doubao captures on a
+# controlled background (data/doubao_capture/): on black, captured = a*logo, and
+# the black/gray pair solves a per-pixel WITHOUT assuming the logo colour. The
+# bundled asset (assets/doubao_alpha.png) is the alpha template (a*255) at the
+# captured width. The mark scales with image WIDTH, but a pure width-scale is
+# only sub-pixel-accurate at the captured width and ghosts elsewhere, so removal
+# does NOT trust fixed geometry: `_aligned_alpha_map` registers the template to
+# the actual mark by a TM_CCOEFF_NORMED scale+position search, which makes the
+# single capture work at any resolution (verified clean on 1773x2364). Verified
+# 2026-05-29: white-capture cross-check -> mark vanishes to a flat fill; clean on
+# doubao-1.png (2048) and the 3:4 portrait corpus size.
+_ALPHA_NATIVE_WIDTH = 2048
+_ALPHA_LOGO_BGR: tuple[float, float, float] = (252.0, 255.0, 255.0)
+_ALPHA_WIDTH_FRAC = 0.1572  # glyph width / image width -- the alignment scale seed
+_ALPHA_HEIGHT_FRAC = 0.0347
+# Margins (of image WIDTH) of the captured mark -- the geometry record / where to
+# seed; alignment refines the actual position, so these are not load-bearing.
+_ALPHA_MARGIN_RIGHT_FRAC = 0.0166
+_ALPHA_MARGIN_BOTTOM_FRAC = 0.0195
+# Alignment scale search (np.linspace args) around the width-scaled glyph size.
+_ALPHA_ALIGN_SEARCH = (0.88, 1.12, 13)
+# At (near) the captured width the fixed geometry is pixel-exact, so we use it
+# directly there -- NCC alignment is integer-pixel and would land ~1px off,
+# degrading the otherwise-exact native recovery. Off this band, alignment wins.
+_ALPHA_NATIVE_BAND = 0.03
+_alpha_template_cache: NDArray[Any] | None = None
+
+
+def _alpha_template() -> NDArray[Any] | None:
+    """Lazily load the bundled Doubao alpha template (float [0,1]), or None."""
+    global _alpha_template_cache
+    if _alpha_template_cache is None:
+        from pathlib import Path
+
+        from remove_ai_watermarks import image_io
+
+        path = Path(__file__).parent / "assets" / "doubao_alpha.png"
+        img = image_io.imread(str(path), cv2.IMREAD_GRAYSCALE)
+        if img is None:
+            return None
+        _alpha_template_cache = img.astype(np.float32) / 255.0
+    return _alpha_template_cache


@dataclass(frozen=True)
@@ -104,6 +145,39 @@ class DoubaoDetection:
    coverage: float = 0.0  # fraction of the box occupied by glyph pixels


+_silhouette_cache: NDArray[Any] | None = None
+
+
+def _glyph_silhouette() -> NDArray[Any] | None:
+    """Binary "豆包AI生成" silhouette (255 = glyph) from the bundled alpha map,
+    used as the detection template. None if the alpha asset is missing."""
+    global _silhouette_cache
+    if _silhouette_cache is None:
+        at = _alpha_template()
+        if at is None:
+            return None
+        _silhouette_cache = (at > 0.15).astype(np.uint8) * 255
+    return _silhouette_cache
+
+
+def _template_match_score(box_mask: NDArray[Any], image_width: int) -> float:
+    """Zero-mean normalized correlation of the alpha-template glyph silhouette
+    (scaled to the mark's expected size) against the candidate ``box_mask``.
+
+    TM_CCOEFF_NORMED keys on glyph SHAPE, not coverage, so a dense textured
+    corner does not score highly -- only the actual "豆包AI生成" shape does.
+    """
+    sil = _glyph_silhouette()
+    if sil is None or box_mask.size == 0:
+        return 0.0
+    gw = min(box_mask.shape[1] - 1, max(8, int(_ALPHA_WIDTH_FRAC * image_width)))
+    gh = min(box_mask.shape[0] - 1, max(4, int(_ALPHA_HEIGHT_FRAC * image_width)))
+    if gw < 8 or gh < 4:
+        return 0.0
+    template = cv2.resize(sil, (gw, gh), interpolation=cv2.INTER_NEAREST)
+    return float(cv2.matchTemplate(box_mask, template, cv2.TM_CCOEFF_NORMED).max())
+
+
 class DoubaoEngine:
    """Remove the visible Doubao "豆包AI生成" watermark (locate -> mask -> inpaint)."""

@@ -176,10 +250,12 @@ class DoubaoEngine:
    # ── Detect ────────────────────────────────────────────────────────

    def detect(self, image: NDArray[Any]) -> DoubaoDetection:
-        """Detect the visible Doubao mark by glyph coverage in the corner box.
+        """Detect the visible Doubao mark by matching the alpha-template glyph
+        silhouette against the corner candidate (TM_CCOEFF_NORMED).

-        Heuristic: a genuine label fills a meaningful fraction of the box with
-        text-like glyph pixels. Coverage maps to a confidence score.
+        Keys on the "豆包AI生成" SHAPE, not coverage, so a textured corner does
+        not fire. ``confidence`` is the correlation score; ``detected`` is True
+        when it clears ``DETECT_NCC_THRESHOLD``.
        """
        det = DoubaoDetection()
        if image is None or image.size == 0:
@@ -191,53 +267,113 @@ class DoubaoEngine:
        coverage = float((box > 0).sum()) / float(max(1, bw * bh))
        det.region = loc.bbox
        det.coverage = coverage
-        # Map coverage to a 0-1 confidence: ~0.06 (noise floor) -> 0, ~0.26 -> 1.
-        det.confidence = float(max(0.0, min(1.0, (coverage - 0.06) / 0.20)))
-        det.detected = coverage >= DETECT_MIN_COVERAGE
-        logger.debug("Doubao detect: coverage=%.3f conf=%.3f", coverage, det.confidence)
+        if coverage >= DETECT_MIN_COVERAGE:
+            score = _template_match_score(box, image.shape[1])
+            det.confidence = score
+            det.detected = score >= DETECT_NCC_THRESHOLD
+            logger.debug("Doubao detect: coverage=%.3f ncc=%.2f detected=%s", coverage, score, det.detected)
        return det

-    # ── Remove ────────────────────────────────────────────────────────
+    # ── Reverse-alpha (exact recovery) ────────────────────────────────

-    def remove_watermark(
-        self,
-        image: NDArray[Any],
-        *,
-        inpaint_method: Literal["telea", "ns"] = "telea",
-        inpaint_radius: int = 6,
-        dilate: int = 3,
-    ) -> NDArray[Any]:
-        """Remove the visible Doubao watermark by inpainting the glyph mask.
+    def reverse_alpha_available(self, image: NDArray[Any]) -> bool:
+        """True if the bundled alpha map is loadable. NCC alignment (integer-pixel;
+        see ``_aligned_alpha_map``) registers it to the actual mark at ANY
+        resolution, so there is no width gate -- the caller still gates on
+        ``detect`` so a clean corner is never touched."""
+        return image is not None and image.size > 0 and _alpha_template() is not None

-        Returns an unmodified copy when no glyph pixels are found (so we never
-        smear a clean corner). ``dilate`` grows the mask to cover anti-aliased
-        glyph edges before inpainting.
-        """
-        if image is None or image.size == 0:
-            return image
+    def _fixed_alpha_map(self, image: NDArray[Any]) -> tuple[NDArray[Any], tuple[int, int, int, int]] | None:
+        """Place the template by fixed width-relative geometry -- pixel-exact at
+        the captured width (used there instead of integer-pixel NCC alignment)."""
+        at = _alpha_template()
+        if at is None:
+            return None
+        h, w = image.shape[:2]
+        gw, gh = max(1, int(_ALPHA_WIDTH_FRAC * w)), max(1, int(_ALPHA_HEIGHT_FRAC * w))
+        ax = max(0, w - int(_ALPHA_MARGIN_RIGHT_FRAC * w) - gw)
+        ay = max(0, h - int(_ALPHA_MARGIN_BOTTOM_FRAC * w) - gh)
+        amap = np.zeros((h, w), np.float32)
+        amap[ay : ay + gh, ax : ax + gw] = cv2.resize(at, (gw, gh), interpolation=cv2.INTER_LINEAR)
+        return amap, (ax, ay, gw, gh)
+
+    def _aligned_alpha_map(self, image: NDArray[Any]) -> tuple[NDArray[Any], tuple[int, int, int, int]] | None:
+        """Build a full-image alpha map with the captured template registered to
+        the actual mark via a TM_CCOEFF_NORMED scale + position search -- so the
+        single capture works off the captured width (a pure width-scale ghosts).
+        Returns ``(alpha_map, glyph_bbox)`` or None."""
+        at = _alpha_template()
+        sil = _glyph_silhouette()
+        if at is None or sil is None:
+            return None
+        h, w = image.shape[:2]
        loc = self.locate(image)
-        mask = self.extract_mask(image, loc)
-        if not mask.any():
-            logger.debug("Doubao remove: no glyph pixels found; returning copy")
+        bx, by, bw, bh = loc.bbox
+        box_mask = self.extract_mask(image, loc)[by : by + bh, bx : bx + bw]
+        expected = _ALPHA_WIDTH_FRAC * w
+        best: tuple[float, int, int, int, int] | None = None
+        for scale in np.linspace(*_ALPHA_ALIGN_SEARCH):
+            gw, gh = int(expected * scale), int(_ALPHA_HEIGHT_FRAC * w * scale)
+            if gw < 8 or gh < 4 or gw >= bw or gh >= bh:
+                continue
+            t = cv2.resize(sil, (gw, gh), interpolation=cv2.INTER_NEAREST)
+            _, score, _, top_left = cv2.minMaxLoc(cv2.matchTemplate(box_mask, t, cv2.TM_CCOEFF_NORMED))
+            if best is None or score > best[0]:
+                best = (score, gw, gh, top_left[0], top_left[1])
+        if best is None:
+            return None
+        _, gw, gh, ox, oy = best
+        ax, ay = bx + ox, by + oy
+        amap = np.zeros((h, w), np.float32)
+        amap[ay : ay + gh, ax : ax + gw] = cv2.resize(at, (gw, gh), interpolation=cv2.INTER_LINEAR)
+        return amap, (ax, ay, gw, gh)
+
+    def _apply_reverse_alpha(self, image: NDArray[Any], amap: NDArray[Any]) -> NDArray[Any]:
+        """Invert the alpha blend with ``amap``: ``original = (wm - a*logo)/(1-a)``."""
+        a3 = np.clip(amap, 0.0, 1.0)[:, :, None]
+        logo = np.array(_ALPHA_LOGO_BGR, np.float32)
+        return np.clip((image.astype(np.float32) - a3 * logo) / np.clip(1.0 - a3, 0.25, 1.0), 0, 255).astype(np.uint8)
+
+    def remove_watermark_reverse_alpha(self, image: NDArray[Any], *, residual_inpaint: bool = True) -> NDArray[Any]:
+        """Recover the original pixels by inverting the alpha blend
+        ``original = (wm - a*logo)/(1-a)``.
+
+        Placement: at (near) the captured width the fixed geometry is pixel-exact,
+        so the recovery is returned UNTOUCHED (``residual_inpaint`` is ignored) --
+        inpainting over exactly-recovered interior pixels only swaps them for a cv2
+        hallucination (measured worse on textured backgrounds: native error vs true
+        bg 1.6 reverse-alpha-only vs 2.6 with full-footprint inpaint).
+
+        Off-native, NCC alignment registers the template to the real mark, but it is
+        integer-pixel and can land ~1px off, so the interior recovery is no longer
+        exact and the seam can re-trip the detector. There we try BOTH placements
+        and keep whichever leaves the least residual detector confidence: on a
+        faint/busy-background mark the NCC peak can wander a few px and geometry
+        wins; on a clear mark alignment wins. No threshold is involved. With
+        ``residual_inpaint`` a TELEA inpaint over the glyph footprint then cleans the
+        seam (the interior is approximate anyway, so inpainting there costs nothing).
+        Call only when :meth:`reverse_alpha_available` and the mark is detected.
+        """
+        at_native = abs(image.shape[1] / _ALPHA_NATIVE_WIDTH - 1.0) <= _ALPHA_NATIVE_BAND
+        if at_native:
+            amap = self._fixed_alpha_map(image)
+            return self._apply_reverse_alpha(image, amap[0]) if amap is not None else image.copy()
+        maps = [c for c in (self._fixed_alpha_map(image), self._aligned_alpha_map(image)) if c is not None]
+        if not maps:
            return image.copy()
-
-        x, y, bw, bh = loc.bbox
-        coverage = float((mask[y : y + bh, x : x + bw] > 0).sum()) / float(max(1, bw * bh))
-        if coverage > MAX_INPAINT_COVERAGE:
-            logger.warning(
-                "Doubao remove: box coverage %.2f exceeds %.2f (dense-text/document "
-                "background); leaving image untouched to avoid smearing content",
-                coverage,
-                MAX_INPAINT_COVERAGE,
-            )
+        best_out: NDArray[Any] | None = None
+        best_amap: NDArray[Any] | None = None
+        best_residual = float("inf")
+        for amap, _region in maps:
+            out = self._apply_reverse_alpha(image, amap)
+            residual = self.detect(out).confidence
+            if residual < best_residual:
+                best_residual, best_out, best_amap = residual, out, amap
+        if best_out is None or best_amap is None:  # pragma: no cover - maps is non-empty
            return image.copy()
-
-        if dilate > 0:
-            k = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2 * dilate + 1, 2 * dilate + 1))
-            mask = cv2.dilate(mask, k)
-
-        flag = cv2.INPAINT_TELEA if inpaint_method == "telea" else cv2.INPAINT_NS
-        return cv2.inpaint(image, mask, inpaint_radius, flag)
+        if residual_inpaint:
+            rm = cv2.dilate((best_amap > 0.10).astype(np.uint8) * 255, np.ones((3, 3), np.uint8))
+            best_out = cv2.inpaint(best_out, rm, 3, cv2.INPAINT_TELEA)
+        return best_out


 def load_image_bgr(path: str | Path) -> NDArray[Any]:
@@ -25,14 +25,15 @@ from typing import TYPE_CHECKING
 from remove_ai_watermarks.metadata import (
    AI_METADATA_KEYS,
    AIGC_MARKERS,
-    C2PA_UUID,
    IPTC_AI_FIELD_MARKERS,
    IPTC_AI_MARKERS,
    aigc_label,
+    c2pa_marker_in,
    exif_generator,
    get_ai_metadata,
    huggingface_job,
    iptc_ai_system,
+    samsung_genai,
    scan_head,
    xai_signature,
 )
@@ -65,6 +66,8 @@ _ISSUER_PLATFORM: tuple[tuple[str, str], ...] = (
    ("OpenAI", "OpenAI (ChatGPT / gpt-image / DALL-E / Sora)"),
    ("Google", "Google (Gemini / Imagen)"),
    ("Stability AI", "Stability AI (Stable Image / DreamStudio)"),
+    ("Black Forest Labs", "Black Forest Labs (FLUX)"),
+    ("ByteDance", "ByteDance (Doubao / Jimeng / Volcano Engine)"),
 )

 # PNG-text / EXIF keys that indicate a local diffusion pipeline (vs. a hosted
@@ -95,6 +98,12 @@ _HF_JOB_CAVEAT = (
    "generation) but names neither the model nor the content type, so it is a "
    "medium-confidence signal, not proof the pixels are AI-generated."
 )
+_SAMSUNG_GENAI_CAVEAT = (
+    "Samsung's genAIType marker shows a Galaxy AI editing tool (Generative Edit, "
+    "Sketch to Image, ...) touched the image; it is an undocumented proprietary "
+    "field, so it is a medium-confidence signal of AI editing, not proof the "
+    "whole image is AI-generated."
+)


@dataclass
@@ -151,7 +160,9 @@ def _ai_tools_in(data: bytes) -> list[str]:
 # assert is_ai on their own (the verdict still comes from the digital-source-type:
 # the Pixel sample carries `computationalCapture`, not `trainedAlgorithmicMedia`).
 # Only tokens verified against a real signed file are listed (Leica, Nikon,
-# Truepic, Google Pixel); add Sony/Canon/Samsung/Bria as real samples are captured.
+# Sony, Truepic, Google Pixel); add Canon/Bria as real samples are captured.
+# Samsung Galaxy is an AI-capable editing device, not a pure-capture camera, so
+# it lives in `_SIGNER_C2PA_PLATFORM` below (it must not feed the camera clash).
 _DEVICE_C2PA_PLATFORM: tuple[tuple[bytes, str], ...] = (
    (b"lc_c2pa", "Leica (camera, C2PA capture)"),
    (b"Leica Camera", "Leica (camera, C2PA capture)"),
@@ -177,6 +188,32 @@ def _device_platform(head: bytes) -> str | None:
    return None


+# C2PA signers that are an editing app or AI-capable device rather than a
+# verified-capture camera. Unlike `_DEVICE_C2PA_PLATFORM`, these do NOT feed the
+# camera-vs-AI integrity clash (rule 2 in `_integrity_clashes`): a Galaxy phone
+# legitimately stamps BOTH its device credentials AND a `trainedAlgorithmicMedia`
+# source type on a Generative-Edit image, so treating it as a "genuine camera
+# capture" would false-flag every Galaxy AI edit. They only resolve the platform
+# label; the AI verdict still comes from the digital-source-type / genAIType.
+# Tokens verified against real signed files (2026-05-29):
+#   Samsung Galaxy -- cert org on Galaxy S23 FE / S24 / S25 C2PA JPEGs/PNGs
+#     (distinct from the EXIF "SM-xxxx" model string on ordinary Samsung photos).
+#   com.asus.gallery -- ASUS Gallery claim_generator (a C2PA-signed edit, no AI
+#     source type or genAIType on the samples, so it never asserts is_ai).
+_SIGNER_C2PA_PLATFORM: tuple[tuple[bytes, str], ...] = (
+    (b"Samsung Galaxy", "Samsung Galaxy (C2PA)"),
+    (b"com.asus.gallery", "ASUS Gallery (C2PA signer)"),
+)
+
+
+def _signer_platform(head: bytes) -> str | None:
+    """Map a C2PA editing-app / AI-capable-device signer token to a platform."""
+    for token, platform in _SIGNER_C2PA_PLATFORM:
+        if token in head:
+            return platform
+    return None
+
+
 def _attribute_platform(issuers: list[str], *, is_ai: bool = True) -> str | None:
    """Map a set of C2PA issuer names to a human-readable generating platform.

@@ -353,9 +390,10 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
    # neither is a trustworthy "the generator stamped its identity" claim.
    ai_vendor_claims: dict[str, str] = {}
    camera_label = _device_platform(head)
+    signer_label = _signer_platform(head)

    # ── C2PA Content Credentials ────────────────────────────────────
-    has_c2pa = bool(info) or b"c2pa" in head.lower() or C2PA_UUID in head
+    has_c2pa = bool(info) or c2pa_marker_in(head)
    issuers = [info["issuer"]] if info.get("issuer") else _issuers_in(head)
    c2pa_is_ai = "trainedAlgorithmicMedia" in info.get("source_type", "") or any(
        m in head for m in (b"trainedAlgorithmicMedia", b"compositeWithTrainedAlgorithmicMedia")
@@ -370,10 +408,11 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
        or (", ".join(tools) if (tools := _ai_tools_in(head)) else None)
    )
    # Platform: a distinctive device/camera token in the manifest wins (it is the
-    # signer/producer), with the issuer byte-scan only as fallback. The issuer
-    # scan alone mis-attributed real samples (Leica->Truepic timestamp authority,
-    # Nikon->Adobe namespace, Pixel->Google Gemini) -- the device scan fixes that.
-    platform = (camera_label or _attribute_platform(issuers, is_ai=c2pa_is_ai)) if has_c2pa else None
+    # signer/producer), then an editing-app/AI-device signer (Samsung Galaxy,
+    # ASUS Gallery), with the issuer byte-scan only as fallback. The issuer scan
+    # alone mis-attributed real samples (Leica->Truepic timestamp authority,
+    # Nikon->Adobe namespace, Pixel->Google Gemini) -- the token scans fix that.
+    platform = (camera_label or signer_label or _attribute_platform(issuers, is_ai=c2pa_is_ai)) if has_c2pa else None
    if has_c2pa:
        detail = ", ".join(filter(None, [", ".join(issuers), generator, info.get("source_type")]))
        signals.append(Signal("c2pa", detail or "C2PA manifest present", "high"))
@@ -484,6 +523,22 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
        if platform is None:
            platform = "HuggingFace-hosted job (model not identified)"

+    # ── Samsung Galaxy AI editing marker (genAIType) ─────────────────
+    # Galaxy AI tools stamp a proprietary genAIType in PhotoEditor_Re_Edit_Data.
+    # It co-occurs with the C2PA trainedAlgorithmicMedia type on Galaxy files
+    # that record one, and is the SOLE AI marker on a Galaxy S24 sample that
+    # omits the source type, so it lifts an otherwise-Unknown verdict. The field
+    # is undocumented, hence medium confidence: it never overrides a
+    # high-confidence signal. The platform is usually already "Samsung Galaxy" via
+    # the signer-token scan; the fallback covers a future file without the cert org.
+    samsung_genai_type = samsung_genai(image_path)
+    if samsung_genai_type is not None:
+        signals.append(Signal("samsung_genai", f"Samsung genAIType={samsung_genai_type}", "medium"))
+        watermarks.append("Samsung Galaxy AI editing marker (genAIType)")
+        caveats.append(_SAMSUNG_GENAI_CAVEAT)
+        if platform is None:
+            platform = "Samsung Galaxy (Galaxy AI editing)"
+
    # ── Open invisible watermark (SD / SDXL / FLUX, dwtDct) ──────────
    # Public decoder, no key -- a definitive embedded signal on pristine files.
    if check_invisible and (scheme := _invisible_watermark(image_path)) is not None:
@@ -527,11 +582,12 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b

    visible_only = any(s.name == "visible_sparkle" for s in signals) and not ai_from_metadata
    hf_only = bool(hf_job) and not ai_from_metadata
+    samsung_only = samsung_genai_type is not None and not ai_from_metadata

    if ai_from_metadata:
        is_ai: bool | None = True
        confidence = "high"
-    elif visible_only or hf_only:
+    elif visible_only or hf_only or samsung_only:
        is_ai = True
        confidence = "medium"
    else:
@@ -65,6 +65,22 @@ AI_KEYWORDS: tuple[str, ...] = (
 # Reference: https://spec.c2pa.org/specifications/specifications/2.1/specs/C2PA_Specification.html
 C2PA_UUID: bytes = bytes.fromhex("d8fec3d61b0e483c92975828877ec481")

+
+def c2pa_marker_in(data: bytes) -> bool:
+    """True if ``data`` carries a real C2PA manifest marker, not just an
+    incidental 4-byte ``c2pa`` substring.
+
+    A bare ``c2pa`` byte match false-positives on compressed pixel data -- a
+    recompressed PNG IDAT (or any large binary) can contain the bytes ``c2pa``
+    by chance (verified 2026-05-29: 4 cleaned PNGs re-flagged this way after
+    their manifest was correctly stripped). Every real manifest is JUMBF-wrapped
+    (the ``jumb`` box FourCC accompanies the ``c2pa`` content type) or uses the
+    standalone C2PA ``uuid`` box in ISOBMFF, so we require one of those: the
+    joint ``jumb`` + ``c2pa`` match has negligible random-collision probability.
+    """
+    return C2PA_UUID in data or (b"jumb" in data and b"c2pa" in data.lower())
+
+
 # IPTC ``digitalSourceType`` values (IPTC 2025.1) that flag AI provenance.
 # Used by Instagram, Facebook, X (Twitter) to show "Made with AI" labels.
 IPTC_AI_MARKERS: tuple[bytes, ...] = (
@@ -213,9 +229,7 @@ def has_ai_metadata(image_path: Path) -> bool:
    # Binary scan covers C2PA (PNG caBX, JPEG APP11, AVIF/HEIF/JXL uuid boxes)
    # and IPTC AI markers in XMP. First 512KB (plus late ISOBMFF provenance boxes).
    data = scan_head(image_path, 512 * 1024)
-    if b"c2pa" in data.lower() or b"C2PA" in data:
-        return True
-    if C2PA_UUID in data:
+    if c2pa_marker_in(data):
        return True
    if any(marker in data for marker in AIGC_MARKERS):
        return True
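
Note on the hunk above: a minimal standalone sketch of the old and new presence checks side by side, to show why the bare substring test over-fires. `old_check` is a hypothetical name for the removed two-line test, and the `noise` bytes are a stand-in for recompressed pixel data, not a real file.

```python
# Standalone copy of the gate from this diff, for experimentation only.
C2PA_UUID = bytes.fromhex("d8fec3d61b0e483c92975828877ec481")


def c2pa_marker_in(data: bytes) -> bool:
    """True only for a JUMBF-wrapped manifest or the C2PA uuid box."""
    return C2PA_UUID in data or (b"jumb" in data and b"c2pa" in data.lower())


def old_check(data: bytes) -> bool:
    """Hypothetical name for the pre-fix test: any bare ``c2pa`` substring counts."""
    return b"c2pa" in data.lower() or C2PA_UUID in data


# Stand-in for compressed pixel data that happens to contain the bytes "c2pa".
noise = b"\x9c\xc3\xa7B1\x11c2pa\x80b\x804\xc5\xf9random idat"
# Stand-in for a real manifest: the jumb FourCC next to the c2pa content type.
wrapped = b"....jumbc2pa....manifest...."

print(old_check(noise), c2pa_marker_in(noise))      # old check fires, new one does not
print(old_check(wrapped), c2pa_marker_in(wrapped))  # both accept the wrapped manifest
```

The new check still scans raw bytes rather than parsing boxes, so it remains a heuristic; it only raises the bar from one 4-byte token to two.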
@@ -310,6 +324,39 @@ def huggingface_job(image_path: Path) -> str | None:
    return None


+# Samsung Galaxy AI editing marker. Galaxy AI tools (Generative Edit, Sketch to
+# Image, Portrait Studio, Drawing Assist, ...) record their re-edit data as a
+# proprietary ``PhotoEditor_Re_Edit_Data`` JSON that carries a ``genAIType``
+# field; a non-zero value flags that a generative-AI tool produced or altered
+# the pixels. The field is undocumented by Samsung (verified 2026-05-29: absent
+# from the C2PA spec and Samsung's public docs/forums), so detection is
+# empirical -- on real Galaxy S23/S24/S25 files it co-occurs with the C2PA
+# ``trainedAlgorithmicMedia`` source type (3/3 of the verified files that record
+# that type), and on a Galaxy S24 sample it is the *only* AI marker (the C2PA
+# source type was absent there). Medium confidence: it signals Galaxy AI editing
+# without proving the whole image is AI-generated. Scoped to the Samsung editor
+# container to avoid matching a stray ``genAIType`` token elsewhere.
+_SAMSUNG_GENAI_RE = re.compile(rb'genAIType"\s*:\s*(-?\d+)')
+_SAMSUNG_EDITOR_MARKER = b"PhotoEditor_Re_Edit_Data"
+
+
+def samsung_genai(image_path: Path) -> int | None:
+    """Return Samsung's non-zero ``genAIType`` value if the image carries the
+    Galaxy AI editing marker, else None.
+
+    See the module note above ``_SAMSUNG_GENAI_RE``: detection is empirical and
+    gated on the ``PhotoEditor_Re_Edit_Data`` container so an incidental
+    ``genAIType`` token cannot false-positive.
+    """
+    head = scan_head(image_path, 512 * 1024)
+    if _SAMSUNG_EDITOR_MARKER not in head:
+        return None
+    m = _SAMSUNG_GENAI_RE.search(head)
+    if m is None:
+        return None
+    return int(m.group(1)) or None
+
+
 def iptc_ai_system(image_path: Path) -> str | None:
    """Return an IPTC 2025.1 AI-disclosure note if the file carries those XMP
    properties, else None.
@@ -360,7 +407,7 @@ def synthid_source(image_path: Path) -> str | None:
    # C2PA manifest where the PNG parser can't reach it. Binary-scan for the
    # same signal: a C2PA manifest from a SynthID-using issuer on AI content.
    data = scan_head(image_path)
-    has_c2pa = b"c2pa" in data.lower() or C2PA_UUID in data
+    has_c2pa = c2pa_marker_in(data)
    # Matches both "trainedAlgorithmicMedia" and "compositeWithTrainedAlgorithmicMedia".
    ai_source = b"trainedAlgorithmicMedia" in data or b"TrainedAlgorithmicMedia" in data
    if not (has_c2pa and ai_source):
@@ -585,6 +632,9 @@ def get_ai_metadata(image_path: Path) -> dict[str, str]:
    # HuggingFace-hosted job marker (hf-job-id PNG text chunk).
    if job := huggingface_job(image_path):
        result.setdefault("huggingface_job", f"HuggingFace-hosted job ({job})")
+    # Samsung Galaxy AI editing marker (genAIType in PhotoEditor_Re_Edit_Data).
+    if (genai := samsung_genai(image_path)) is not None:
+        result.setdefault("samsung_genai", f"Samsung Galaxy AI editing marker (genAIType={genai})")
    return result
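
Note on the `samsung_genai` hunks above: a small sketch of the same container-gated scan applied to an in-memory buffer instead of a file, which makes the three outcomes easy to try. `genai_type_in` is a hypothetical helper name; the marker and regex are copied from the diff.

```python
from __future__ import annotations

import re

_EDITOR_MARKER = b"PhotoEditor_Re_Edit_Data"
_GENAI_RE = re.compile(rb'genAIType"\s*:\s*(-?\d+)')


def genai_type_in(head: bytes) -> int | None:
    """Return the non-zero genAIType if the Samsung editor container is present."""
    if _EDITOR_MARKER not in head:
        return None  # a stray genAIType token without the container is ignored
    m = _GENAI_RE.search(head)
    if m is None:
        return None
    return int(m.group(1)) or None  # 0 means no generative AI was used


print(genai_type_in(b'PhotoEditor_Re_Edit_Data{"connectorType":"srvg","genAIType":1}'))
print(genai_type_in(b'PhotoEditor_Re_Edit_Data{"genAIType":0}'))
print(genai_type_in(b'some other blob "genAIType":1 elsewhere'))
```

The gate only requires the container marker to appear somewhere in the buffer; the regex is not anchored to the bytes that follow it.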


@@ -88,6 +88,14 @@ C2PA_ISSUERS = {
    # Stability AI signs C2PA as "Stability AI" (cert org "Stability AI Ltd").
    # Verified on a live Brand Studio (DreamStudio successor) output, 2026-05-24.
    b"Stability AI": "Stability AI",
+    # Black Forest Labs (FLUX) API output: claim_generator_info "Black Forest
+    # Labs API" + a c2pa.ai_generated_content assertion + trainedAlgorithmicMedia.
+    # Verified on a real signed FLUX JPEG, 2026-05-29.
+    b"Black Forest Labs": "Black Forest Labs",
+    # ByteDance's Volcano Engine (Volcengine) signs its AI image output with a
+    # cert from certificate_center@volcengine.com -- the platform behind Doubao /
+    # Jimeng. Verified on two real signed JPEGs, 2026-05-29.
+    b"volcengine": "ByteDance (Volcano Engine)",
 }

 # C2PA issuers whose signed outputs also carry an invisible SynthID pixel
@@ -51,12 +51,31 @@ def _decoder() -> Any:
    return _tm


+# JPEG quality for the false-positive durability gate (see detect_trustmark).
+# Deliberately mild: a genuine TrustMark survives far harsher compression, while
+# every observed false positive collapsed even at this quality.
+_REENCODE_QUALITY = 95
+
+
 def detect_trustmark(image_path: Path) -> str | None:
-    """Return a TrustMark scheme note if a TrustMark watermark is decoded, else None.
+    """Return a TrustMark scheme note if a *durable* TrustMark watermark is
+    decoded, else None.

    Returns e.g. ``"Adobe TrustMark (variant P, schema 0)"`` when the decoder
-    reports the watermark present, or None if it is absent, the optional
-    ``trustmark`` package is not installed, or the image cannot be read/decoded.
+    reports the watermark present AND it survives a mild JPEG re-encode, or None
+    if it is absent, the optional ``trustmark`` package is not installed, or the
+    image cannot be read/decoded.
+
+    **False-positive gate.** TrustMark's ``wm_present`` flag is a BCH
+    error-correction validity check, which spuriously validates on a small
+    fraction of un-watermarked images -- content-correlated, so AI-generated
+    textures trip it more often than camera photos (verified 2026-05-29 on real
+    files: the false "detections" were on Gemini / OpenAI / Doubao output that
+    cannot carry Adobe's watermark, and decoded a random-bytes secret). A genuine
+    TrustMark is a *durable* soft binding engineered to survive re-encoding (that
+    is its entire purpose once C2PA is stripped), so we re-decode after a mild
+    JPEG round-trip and require the same schema both times. Every observed false
+    positive collapsed under this gate.
    """
    if not is_available():
        return None
@@ -65,8 +84,30 @@ def detect_trustmark(image_path: Path) -> str | None:

        with Image.open(image_path) as img:
            cover = img.convert("RGB")
-        _wm_secret, wm_present, wm_schema = _decoder().decode(cover)
+        decoder = _decoder()
+        _wm_secret, wm_present, wm_schema = decoder.decode(cover)
+        if not wm_present:
+            return None
+        if not _survives_reencode(decoder, cover, wm_schema):
+            log.debug("TrustMark decode for %s did not survive re-encode; treating as false positive", image_path)
+            return None
    except Exception as exc:  # model download / decode failure / unreadable image
        log.debug("TrustMark decode failed for %s: %s", image_path, exc)
        return None
-    return f"Adobe TrustMark (variant {_MODEL_TYPE}, schema {wm_schema})" if wm_present else None
+    return f"Adobe TrustMark (variant {_MODEL_TYPE}, schema {wm_schema})"
+
+
+def _survives_reencode(decoder: Any, cover: Any, schema: int) -> bool:
+    """True if the watermark re-decodes with the same schema after a mild JPEG
+    round-trip -- the durability a genuine TrustMark is engineered for, which a
+    BCH false positive (content noise) lacks."""
+    import io
+
+    from PIL import Image
+
+    buffer = io.BytesIO()
+    cover.save(buffer, "JPEG", quality=_REENCODE_QUALITY)
+    buffer.seek(0)
+    with Image.open(buffer) as reencoded:
+        _secret, present, reencoded_schema = decoder.decode(reencoded.convert("RGB"))
+    return bool(present) and reencoded_schema == schema
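
Note on the TrustMark hunk above: the gate is a generic "decode, perturb, decode again" pattern. The sketch below shows that control flow with toy stand-ins so it runs without the `trustmark` package or Pillow. `durable_schema`, `toy_decode`, and `toy_reencode` are hypothetical; the toy decoder and the quantizing re-encode only imitate a fragile versus a robust signal and say nothing about how TrustMark or JPEG behave internally.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence

Decoded = tuple[bool, int]  # (present, schema)


def durable_schema(
    samples: Sequence[int],
    decode: Callable[[Sequence[int]], Decoded],
    reencode: Callable[[Sequence[int]], Sequence[int]],
) -> int | None:
    """Return the schema only if it decodes identically before and after re-encoding."""
    present, schema = decode(samples)
    if not present:
        return None  # the second decode runs only on a first-pass hit
    again_present, again_schema = decode(reencode(samples))
    if not again_present or again_schema != schema:
        return None  # did not survive: treat as a false positive
    return schema


def toy_decode(samples: Sequence[int]) -> Decoded:
    """Toy decoder: reports a mark when the mean sample exceeds 128."""
    return sum(samples) / len(samples) > 128, 0


def toy_reencode(samples: Sequence[int]) -> list[int]:
    """Toy lossy step: quantize every sample down to a multiple of 8."""
    return [s - s % 8 for s in samples]


print(durable_schema([200, 210, 190], toy_decode, toy_reencode))  # robust signal survives
print(durable_schema([129, 130, 131], toy_decode, toy_reencode))  # marginal hit collapses
```

A returned schema can be `0`, so callers should compare against `None` rather than rely on truthiness.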
@@ -0,0 +1,202 @@
+"""Registry of known visible watermarks.
+
+A single catalog that ties each known visible mark to (a) where it usually sits,
+(b) how to recognize it there, and (c) how to remove it. One pass over the
+registry detects every known mark in its usual place and removes the ones
+present.
+
+**Reverse-alpha only.** A known mark is a fixed semi-transparent overlay, so it
+is removed by inverting the alpha blend against a captured alpha map
+(``original = (wm - a*logo)/(1-a)``) -- exact recovery of the true pixels, not an
+inpaint guess. Detection is consistent with that: each mark is recognized by
+matching its known shape/template (the thing we invert), not by heuristics. A
+mark is therefore listed here only once a real alpha map has been captured for
+it; everything else (arbitrary logos/objects) is the user-directed
+``erase --region`` tool, not this catalog.
+
+Entries:
+  - ``gemini`` -- Google Gemini / Nano Banana sparkle, bottom-right.
+  - ``doubao`` -- ByteDance Doubao "豆包AI生成" text strip, bottom-right.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import TYPE_CHECKING, Any, Literal
+
+if TYPE_CHECKING:
+    from collections.abc import Callable
+
+    from numpy.typing import NDArray
+
+# cv2 method for the Gemini reverse-alpha edge-residual cleanup (not a standalone
+# remover): "ns" / "telea".
+InpaintMethod = Literal["telea", "ns"]
+Region = tuple[int, int, int, int]
+
+
+@dataclass(frozen=True)
+class MarkDetection:
+    """Uniform detection result for a known mark (across heterogeneous engines)."""
+
+    key: str
+    label: str
+    location: str
+    detected: bool
+    confidence: float
+    region: Region
+
+
+@dataclass(frozen=True)
+class KnownMark:
+    """A known visible watermark: where it lives, how to find and remove it."""
+
+    key: str
+    label: str
+    location: str  # usual place, human-readable ("bottom-right")
+    in_auto: bool  # participates in `--mark auto` scanning
+    recovery: str  # removal strategy (all reverse-alpha today)
+    _detect: Callable[[NDArray[Any]], MarkDetection]
+    _remove: Callable[..., tuple[NDArray[Any], Region | None]]
+
+    def detect(self, image: NDArray[Any]) -> MarkDetection:
+        return self._detect(image)
+
+    def remove(
+        self,
+        image: NDArray[Any],
+        *,
+        inpaint_method: InpaintMethod = "ns",
+        inpaint: bool = True,
+        inpaint_strength: float = 0.85,
+        force: bool = False,
+    ) -> tuple[NDArray[Any], Region | None]:
+        """Remove this mark by reverse-alpha; returns ``(result, cleared_region)``
+        (region for clearing alpha on save, or None if nothing was removed).
+
+        ``inpaint`` / ``inpaint_strength`` / ``inpaint_method`` tune the Gemini
+        reverse-alpha edge-residual cleanup only. ``force`` removes at the mark's
+        usual location even without a positive detection (the ``--no-detect`` path).
+        """
+        return self._remove(image, inpaint_method, inpaint, inpaint_strength, force)
+
+
+# Gemini-sparkle confidence above which the registry treats it as a confident
+# detection for arbitration. Matches identify's corpus-validated sparkle
+# threshold (0.5): the gemini engine's own detect flag uses a looser internal
+# threshold and weakly fires (~0.36) on unrelated bottom-right text (e.g. the
+# Doubao mark), which would otherwise let it hijack `--mark auto`. 0.5 gives 0
+# false positives on the corpus.
+_GEMINI_AUTO_MIN_CONF = 0.5
+
+# ── Engine adapters (lazy singletons; engines are cv2-only, no model load) ──
+
+_engines: dict[str, Any] = {}
+
+
+def _engine(key: str) -> Any:
+    if key not in _engines:
+        if key == "gemini":
+            from remove_ai_watermarks.gemini_engine import GeminiEngine
+
+            _engines[key] = GeminiEngine()
+        elif key == "doubao":
+            from remove_ai_watermarks.doubao_engine import DoubaoEngine
+
+            _engines[key] = DoubaoEngine()
+        else:  # pragma: no cover - guarded by the registry keys
+            raise KeyError(key)
+    return _engines[key]
+
+
+def _gemini_detect(image: NDArray[Any]) -> MarkDetection:
+    d = _engine("gemini").detect_watermark(image)
+    detected = bool(d.detected) and d.confidence >= _GEMINI_AUTO_MIN_CONF
+    return MarkDetection("gemini", "Google Gemini sparkle", "bottom-right", detected, d.confidence, d.region)
+
+
+def _gemini_remove(
+    image: NDArray[Any], inpaint_method: InpaintMethod, inpaint: bool, strength: float, force: bool
+) -> tuple[NDArray[Any], Region | None]:
+    engine = _engine("gemini")
+    det = engine.detect_watermark(image)
+    if not det.detected:
+        if not force:
+            return image.copy(), None
+        # Forced (--no-detect): remove at the default sparkle slot for the size.
+        from remove_ai_watermarks.gemini_engine import get_watermark_config
+
+        h, w = image.shape[:2]
+        cfg = get_watermark_config(w, h)
+        px, py = cfg.get_position(w, h)
+        region = (px, py, cfg.logo_size, cfg.logo_size)
+        result = engine.remove_watermark_custom(image, region)
+        if inpaint:
+            result = engine.inpaint_residual(result, region, strength=strength, method=inpaint_method)
+        return result, region
+    result = engine.remove_watermark(image)
+    # Reverse-alpha leaves a faint residual at the sparkle edge; the engine's
+    # own residual inpaint cleans that seam (part of its reverse-alpha pipeline).
+    if inpaint:
+        result = engine.inpaint_residual(result, det.region, strength=strength, method=inpaint_method)
+    return result, det.region
+
+
+def _doubao_detect(image: NDArray[Any]) -> MarkDetection:
+    d = _engine("doubao").detect(image)
+    return MarkDetection("doubao", "Doubao 豆包AI生成 text", "bottom-right", d.detected, d.confidence, d.region)
+
+
+def _doubao_remove(
+    image: NDArray[Any], _inpaint_method: InpaintMethod, _inpaint: bool, _strength: float, force: bool
+) -> tuple[NDArray[Any], Region | None]:
+    # Reverse-alpha only: apply when the mark is detected (or removal is forced)
+    # AND the engine reports reverse-alpha as available for this image. Otherwise
+    # we do NOT inpaint (no hallucination) -- the image is returned unchanged.
+    engine = _engine("doubao")
+    det = engine.detect(image)
+    if (det.detected or force) and engine.reverse_alpha_available(image):
+        return engine.remove_watermark_reverse_alpha(image), (det.region if det.detected else None)
+    return image.copy(), None
+
+
+_REGISTRY: tuple[KnownMark, ...] = (
+    KnownMark("gemini", "Google Gemini sparkle", "bottom-right", True, "reverse-alpha", _gemini_detect, _gemini_remove),
+    KnownMark(
+        "doubao", "Doubao 豆包AI生成 text", "bottom-right", True, "reverse-alpha", _doubao_detect, _doubao_remove
+    ),
+)
+
+
+def known_marks() -> tuple[KnownMark, ...]:
+    """All registered known visible watermarks."""
+    return _REGISTRY
+
+
+def mark_keys() -> list[str]:
+    """Keys of all registered marks (for CLI choices)."""
+    return [m.key for m in _REGISTRY]
+
+
+def get_mark(key: str) -> KnownMark:
+    """Look up a known mark by key (raises KeyError if unknown)."""
+    for m in _REGISTRY:
+        if m.key == key:
+            return m
+    raise KeyError(key)
+
+
+def detect_marks(image: NDArray[Any], *, include_explicit: bool = True) -> list[MarkDetection]:
+    """Detect every known mark in its usual place.
+
+    Returns one MarkDetection per scanned mark (``detected`` flags which fired).
+    ``include_explicit=False`` scans only the ``in_auto`` marks -- the set used
+    by ``--mark auto``.
+    """
+    return [m.detect(image) for m in _REGISTRY if include_explicit or m.in_auto]
+
+
+def best_auto_mark(image: NDArray[Any]) -> MarkDetection | None:
+    """The highest-confidence detected ``in_auto`` mark, or None if none fired."""
+    fired = [d for d in detect_marks(image, include_explicit=False) if d.detected]
+    return max(fired, key=lambda d: d.confidence) if fired else None
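
Note on the registry file above: a self-contained sketch of how the stricter Gemini gate interacts with `best_auto_mark`-style arbitration. The `Detection` class and the helper names are hypothetical simplifications of `MarkDetection` and the adapters; the 0.5 threshold and the ~0.36 weak-fire value come from the comment on `_GEMINI_AUTO_MIN_CONF`.

```python
from __future__ import annotations

from dataclasses import dataclass

GEMINI_AUTO_MIN_CONF = 0.5  # same value as _GEMINI_AUTO_MIN_CONF in the diff


@dataclass(frozen=True)
class Detection:
    """Hypothetical, trimmed-down stand-in for MarkDetection."""

    key: str
    detected: bool
    confidence: float


def gate_gemini(raw_detected: bool, confidence: float) -> Detection:
    """Mirror _gemini_detect: the engine flag must also clear the registry threshold."""
    return Detection("gemini", raw_detected and confidence >= GEMINI_AUTO_MIN_CONF, confidence)


def best_auto(detections: list[Detection]) -> Detection | None:
    """Highest-confidence detection that fired, or None if none fired."""
    fired = [d for d in detections if d.detected]
    return max(fired, key=lambda d: d.confidence) if fired else None


# Unrelated bottom-right text: the Gemini engine weakly fires, Doubao does not.
doubao = Detection("doubao", False, 0.10)  # hypothetical confidence value
ungated = best_auto([Detection("gemini", True, 0.36), doubao])
gated = best_auto([gate_gemini(True, 0.36), doubao])

print(ungated.key if ungated else None)  # without the gate, Gemini claims the image
print(gated)                             # with the gate, nothing fires
```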
@@ -1,4 +1,4 @@
-"""Tests for the Doubao visible-watermark engine."""
+"""Tests for the Doubao visible-watermark engine (reverse-alpha only)."""

 from __future__ import annotations

@@ -8,91 +8,156 @@ import cv2
 import numpy as np
 import pytest

-from remove_ai_watermarks.doubao_engine import DoubaoEngine, load_image_bgr
+from remove_ai_watermarks.doubao_engine import (
+    _ALPHA_HEIGHT_FRAC,
+    _ALPHA_LOGO_BGR,
+    _ALPHA_MARGIN_BOTTOM_FRAC,
+    _ALPHA_MARGIN_RIGHT_FRAC,
+    _ALPHA_NATIVE_WIDTH,
+    _ALPHA_WIDTH_FRAC,
+    DETECT_NCC_THRESHOLD,
+    DoubaoEngine,
+    _alpha_template,
+    _glyph_silhouette,
+    _template_match_score,
+    load_image_bgr,
+)

 SAMPLE = Path(__file__).resolve().parents[1] / "data" / "samples" / "doubao-1.png"


-# ── Locate ──────────────────────────────────────────────────────────
-
-
 class TestLocate:
    def test_box_anchored_bottom_right(self):
        eng = DoubaoEngine()
        img = np.zeros((2048, 2048, 3), np.uint8)
        loc = eng.locate(img)
-        # right and bottom edges sit close to the image corner (within margins)
        assert 2048 - (loc.x + loc.w) < int(2048 * 0.03)
        assert 2048 - (loc.y + loc.h) < int(2048 * 0.03)
-        assert loc.is_fallback  # geometry anchor, no bundled template yet

    def test_box_scales_with_width(self):
        eng = DoubaoEngine()
        small = eng.locate(np.zeros((1024, 1024, 3), np.uint8))
        large = eng.locate(np.zeros((2048, 2048, 3), np.uint8))
-        # width-relative geometry: 2x wider image -> ~2x wider box
        assert large.w == pytest.approx(small.w * 2, rel=0.1)


-# ── Detect + remove on the real sample ──────────────────────────────
+# ── Detection: alpha-template NCC ───────────────────────────────────
+
+
+class TestDetect:
+    def test_clean_gradient_not_detected(self):
+        eng = DoubaoEngine()
+        ramp = np.tile(np.linspace(0, 255, 1024, dtype=np.uint8), (1024, 1))
+        img = cv2.cvtColor(ramp, cv2.COLOR_GRAY2BGR)
+        assert not eng.detect(img).detected
+
+    def test_solid_blob_corner_not_detected(self):
+        """A bright blob is not the glyph shape -> low correlation, not detected."""
+        eng = DoubaoEngine()
+        img = np.zeros((1024, 1024, 3), np.uint8)
+        x, y, bw, bh = eng.locate(img).bbox
+        img[y + bh // 4 : y + bh * 3 // 4, x : x + bw // 2] = 200
+        assert not eng.detect(img).detected
+
+    def test_silhouette_loads(self):
+        sil = _glyph_silhouette()
+        assert sil is not None
+        assert set(np.unique(sil)).issubset({0, 255})
+
+    def test_match_score_shape_sensitive(self):
+        """The glyph silhouette correlates with itself, not with a filled block."""
+        sil = _glyph_silhouette()
+        h, w = sil.shape
+        # box that contains the silhouette -> high score
+        box = np.zeros((h + 8, int(w / _ALPHA_WIDTH_FRAC * 0.2) + w), np.uint8)
+        box[4 : 4 + h, 4 : 4 + w] = sil
+        assert _template_match_score(box, _ALPHA_NATIVE_WIDTH) >= DETECT_NCC_THRESHOLD
+        # a uniformly filled box has no glyph structure -> low score
+        solid = np.full_like(box, 255)
+        assert _template_match_score(solid, _ALPHA_NATIVE_WIDTH) < DETECT_NCC_THRESHOLD


@pytest.mark.skipif(not SAMPLE.exists(), reason="sample image not present")
 class TestRealSample:
    def test_detects_watermark(self):
-        eng = DoubaoEngine()
-        det = eng.detect(load_image_bgr(SAMPLE))
+        det = DoubaoEngine().detect(load_image_bgr(SAMPLE))
        assert det.detected
-        assert det.confidence > 0.0
-        assert det.coverage > 0.04
+        assert det.confidence >= DETECT_NCC_THRESHOLD

-    def test_remove_reduces_glyph_coverage(self):
+    def test_reverse_alpha_removes_mark(self):
        eng = DoubaoEngine()
        img = load_image_bgr(SAMPLE)
-        before = eng.detect(img).coverage
-        out = eng.remove_watermark(img)
-        after = eng.detect(out).coverage
-        # the inpaint should clear most glyph pixels from the corner box
-        assert after < before * 0.5
+        assert eng.reverse_alpha_available(img)  # sample is at the captured width
+        out = eng.remove_watermark_reverse_alpha(img)
+        assert not eng.detect(out).detected  # mark gone after recovery

-    def test_pixels_outside_box_untouched(self):
+    def test_far_region_untouched(self):
        eng = DoubaoEngine()
        img = load_image_bgr(SAMPLE)
-        out = eng.remove_watermark(img)
-        # top-left quadrant is far from the bottom-right mark: must be identical
+        out = eng.remove_watermark_reverse_alpha(img)
        h, w = img.shape[:2]
        assert np.array_equal(img[: h // 2, : w // 2], out[: h // 2, : w // 2])


-# ── Negative + safety guard ─────────────────────────────────────────
+# ── Reverse-alpha (exact recovery) ──────────────────────────────────


-class TestNegativeAndGuard:
-    def test_clean_image_not_detected(self):
+class TestReverseAlpha:
+    def test_alpha_asset_loads(self):
+        at = _alpha_template()
+        assert at is not None
+        assert at.dtype.kind == "f"
+        assert float(at.min()) >= 0.0
+        assert float(at.max()) <= 1.0
+
+    def test_available_whenever_asset_present(self):
+        # NCC alignment generalizes to any resolution, so availability is just
+        # "asset loadable" (any non-empty image); the caller gates on detect.
        eng = DoubaoEngine()
-        # smooth gradient, no watermark
-        ramp = np.tile(np.linspace(0, 255, 1024, dtype=np.uint8), (1024, 1))
-        img = cv2.cvtColor(ramp, cv2.COLOR_GRAY2BGR)
-        det = eng.detect(img)
-        assert not det.detected
+        assert eng.reverse_alpha_available(np.zeros((1024, 1024, 3), np.uint8))
+        assert eng.reverse_alpha_available(np.zeros((1773, 1535, 3), np.uint8))
+        assert not eng.reverse_alpha_available(np.zeros((0, 0, 3), np.uint8))

-    def test_clean_image_returned_unchanged(self):
-        eng = DoubaoEngine()
-        ramp = np.tile(np.linspace(0, 255, 1024, dtype=np.uint8), (1024, 1))
-        img = cv2.cvtColor(ramp, cv2.COLOR_GRAY2BGR)
-        out = eng.remove_watermark(img)
-        assert np.array_equal(img, out)
+    @staticmethod
+    def _compose(w: int, h: int, bg: float = 100.0):
+        """Composite the real alpha (scaled to width ``w``) onto a flat bg.
+        Returns ``(watermarked_uint8, mark_bool_mask)``."""
+        img = np.full((h, w, 3), bg, np.float32)
+        at = _alpha_template()
+        gw, gh = int(_ALPHA_WIDTH_FRAC * w), int(_ALPHA_HEIGHT_FRAC * w)
+        ax = w - int(_ALPHA_MARGIN_RIGHT_FRAC * w) - gw
+        ay = h - int(_ALPHA_MARGIN_BOTTOM_FRAC * w) - gh
+        amap = np.zeros((h, w), np.float32)
+        amap[ay : ay + gh, ax : ax + gw] = cv2.resize(at, (gw, gh))
+        a3 = amap[:, :, None]
+        wm = (a3 * np.array(_ALPHA_LOGO_BGR, np.float32) + (1 - a3) * img).clip(0, 255).astype(np.uint8)
+        return wm, amap > 0.2

-    def test_document_background_guard(self):
-        """A dense high-frequency corner (document-like) trips the coverage
-        guard, so the image is left untouched rather than smeared."""
+    def test_native_returns_exact_reverse_alpha_no_inpaint(self):
+        """At native width the recovery is exact, so it must be returned untouched
+        -- inpainting over exactly-recovered interior pixels degrades quality
+        (regression: native textured error 1.6 reverse-alpha-only vs 2.6 with the
+        old full-footprint inpaint). The output must equal pure reverse-alpha."""
        eng = DoubaoEngine()
-        rng = np.random.default_rng(0)
-        img = np.full((1024, 1024, 3), 255, np.uint8)
-        # fill the bottom-right box area with random grayish text-like noise
-        loc = eng.locate(img)
-        x, y, bw, bh = loc.bbox
-        noise = rng.integers(150, 246, size=(bh, bw), dtype=np.uint8)
-        img[y : y + bh, x : x + bw] = noise[:, :, None]
-        out = eng.remove_watermark(img)
-        assert np.array_equal(img, out)
+        wm, _mark = self._compose(_ALPHA_NATIVE_WIDTH, _ALPHA_NATIVE_WIDTH)
+        out = eng.remove_watermark_reverse_alpha(wm)
+        amap = eng._fixed_alpha_map(wm)
+        assert amap is not None
+        expected = eng._apply_reverse_alpha(wm, amap[0])
+        assert np.array_equal(out, expected)  # no inpaint touched the recovery
+
+    @pytest.mark.parametrize(
+        ("w", "h", "max_err"),
+        [
+            (_ALPHA_NATIVE_WIDTH, _ALPHA_NATIVE_WIDTH, 5.0),  # native 1:1 -> fixed geometry, ~exact
+            (1773, 2364, 8.0),  # 3:4 portrait -> NCC alignment generalizes the single capture
+        ],
+    )
+    def test_recovers_flat_background(self, w, h, max_err):
+        """Recovers the flat background at native (fixed geometry, exact) AND a
+        non-native resolution (NCC alignment generalizes the single capture)."""
+        eng = DoubaoEngine()
+        wm, mark = self._compose(w, h)
+        assert float(np.abs(wm.astype(np.float32)[mark] - 100.0).mean()) > 15  # mark visible
+        out = eng.remove_watermark_reverse_alpha(wm).astype(np.float32)
+        assert float(np.abs(out[mark] - 100.0).mean()) < max_err
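
Note on the reverse-alpha tests above: a worked single-pixel example of the blend and its inverse, `original = (wm - a*logo)/(1-a)`. The values are illustrative assumptions (a logo intensity of 255 and an alpha of 0.5 are not taken from the captured asset). It also shows one plausible reason the tests allow a small error: the watermarked pixel is stored as an 8-bit integer, and the inverse amplifies that rounding loss by `1/(1-a)`.

```python
def blend(original: float, logo: float, alpha: float) -> float:
    """Forward alpha blend: what the watermark overlay does to one channel value."""
    return alpha * logo + (1.0 - alpha) * original


def reverse_alpha(wm: float, logo: float, alpha: float) -> float:
    """Invert the blend; undefined where the overlay is fully opaque."""
    if alpha >= 1.0:
        raise ValueError("alpha must be < 1 to recover the original pixel")
    return (wm - alpha * logo) / (1.0 - alpha)


background, logo, alpha = 100.0, 255.0, 0.5  # illustrative values only

wm_float = blend(background, logo, alpha)          # 177.5 before storage
exact = reverse_alpha(wm_float, logo, alpha)       # recovers 100.0

wm_stored = int(wm_float)                          # 8-bit storage truncates to 177
from_stored = reverse_alpha(wm_stored, logo, alpha)

print(wm_float, exact)
print(wm_stored, from_stored)  # off by one level: 0.5 lost, doubled by 1/(1-alpha)
```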
@@ -113,6 +113,18 @@ class TestIdentifyNonPng:
        r = identify(path, check_visible=False)
        assert any("SynthID" in w for w in r.watermarks)

+    def test_black_forest_labs_flux_attributed(self, tmp_path: Path):
+        path = self._c2pa_jpeg(tmp_path, b"Black Forest Labs API ... trainedAlgorithmicMedia")
+        r = identify(path, check_visible=False, check_invisible=False)
+        assert r.is_ai_generated is True
+        assert r.platform == "Black Forest Labs (FLUX)"
+
+    def test_bytedance_volcengine_attributed(self, tmp_path: Path):
+        path = self._c2pa_jpeg(tmp_path, b"certificate_center@volcengine.com ... trainedAlgorithmicMedia")
+        r = identify(path, check_visible=False, check_invisible=False)
+        assert r.is_ai_generated is True
+        assert "ByteDance" in (r.platform or "")
+
    def test_stability_ai_issuer_attributed_no_synthid(self, tmp_path: Path):
        path = self._c2pa_jpeg(tmp_path, b"Stability AI ... trainedAlgorithmicMedia")
        r = identify(path, check_visible=False)
@@ -132,6 +144,50 @@ class TestIdentifyNonPng:
        assert not any("SynthID" in w for w in r.watermarks)


+class TestIdentifySamsungGalaxy:
+    """Samsung Galaxy / ASUS Gallery C2PA signers (verified on real signed files
+    2026-05-29; synthetic byte blobs here since the originals are private).
+
+    Galaxy AI edits stamp BOTH the device cert AND an AI source-type / genAIType,
+    so the signer attribution must NOT trip the camera-vs-AI integrity clash.
+    """
+
+    def _jpeg(self, tmp_path: Path, name: str, blob: bytes) -> Path:
+        path = tmp_path / name
+        path.write_bytes(b"\xff\xd8\xff\xe1jumbc2pa" + blob + b"\xff\xd9")
+        return path
+
+    def test_galaxy_trained_source_is_high_ai(self, tmp_path: Path):
+        path = self._jpeg(tmp_path, "s25.jpg", b"Samsung Galaxy Galaxy S25 c2pa-rs trainedAlgorithmicMedia")
+        r = identify(path, check_visible=False, check_invisible=False)
+        assert r.is_ai_generated is True
+        assert r.confidence == "high"
+        assert r.platform == "Samsung Galaxy (C2PA)"
+        assert r.integrity_clashes == []  # device cert + AI source-type is legitimate, not a clash
+
+    def test_galaxy_genai_only_is_medium_ai(self, tmp_path: Path):
+        # The Galaxy S24 case: no trainedAlgorithmicMedia, genAIType is the only
+        # AI marker -- previously missed, now a medium-confidence verdict.
+        path = self._jpeg(
+            tmp_path, "s24.jpg", b'Samsung Galaxy Galaxy S24 c2pa-rs PhotoEditor_Re_Edit_Data{"genAIType":1}'
+        )
+        r = identify(path, check_visible=False, check_invisible=False)
+        assert r.is_ai_generated is True
+        assert r.confidence == "medium"
+        assert r.platform == "Samsung Galaxy (C2PA)"
+        assert any(s.name == "samsung_genai" for s in r.signals)
+        assert r.integrity_clashes == []
+
+    def test_asus_gallery_signer_not_ai(self, tmp_path: Path):
+        # ASUS Gallery signs edited photos; no AI source-type or genAIType, so the
+        # platform is attributed but the verdict stays unknown.
+        path = self._jpeg(tmp_path, "asus.jpg", b"/com.asus.gallery/3.8.0.98 c2pa-rs no ai marker")
+        r = identify(path, check_visible=False, check_invisible=False)
+        assert r.is_ai_generated is None
+        assert r.platform == "ASUS Gallery (C2PA signer)"
+        assert any("C2PA" in w for w in r.watermarks)
+
+
 # ── End-to-end verdicts on real fixtures ────────────────────────────


@@ -12,12 +12,15 @@ from PIL import Image
 from PIL.PngImagePlugin import PngInfo

 from remove_ai_watermarks.metadata import (
+    C2PA_UUID,
    _is_ai_key,
+    c2pa_marker_in,
    exif_generator,
    get_ai_metadata,
    has_ai_metadata,
    iptc_ai_system,
    remove_ai_metadata,
+    samsung_genai,
    synthid_source,
    xai_signature,
 )
@@ -135,6 +138,71 @@ class TestHasAiMetadata:
        assert has_ai_metadata(path)


+class TestC2paMarkerIn:
+    """The C2PA presence check requires a JUMBF wrapper or the C2PA uuid box, so
+    a bare 4-byte ``c2pa`` substring (e.g. random compressed pixel data) does not
+    false-positive -- the regression behind 4 cleaned PNGs re-flagging C2PA."""
+
+    def test_jumbf_wrapped_c2pa_detected(self):
+        assert c2pa_marker_in(b"....jumbc2pa....manifest....") is True
+
+    def test_c2pa_uuid_box_detected(self):
+        assert c2pa_marker_in(b"\x00\x00\x00\x18uuid" + C2PA_UUID + b"payload") is True
+
+    def test_bare_c2pa_substring_not_detected(self):
+        # The exact false positive: "c2pa" appears in noise with no JUMBF/uuid box around it.
+        assert c2pa_marker_in(b"\x9c\xc3\xa7B1\x11c2pa\x80b\x804\xc5\xf9random idat") is False
+
+    def test_jumb_without_c2pa_not_detected(self):
+        assert c2pa_marker_in(b"some jumb box but no manifest label") is False
+
+    def test_empty_not_detected(self):
+        assert c2pa_marker_in(b"") is False
+
+
+class TestSamsungGenai:
+    """Samsung Galaxy AI editing marker (genAIType in PhotoEditor_Re_Edit_Data).
+
+    Synthetic byte blobs -- real Galaxy files are user content and not shipped
+    (public repo), same discipline as the Grok/Doubao fixtures.
+    """
+
+    @staticmethod
+    def _samsung_jpeg(tmp_path: Path, name: str, payload: bytes) -> Path:
+        path = tmp_path / name
+        path.write_bytes(b"\xff\xd8\xff\xe1" + payload + b"\xff\xd9")
+        return path
+
+    def test_nonzero_genai_type_detected(self, tmp_path: Path):
+        p = self._samsung_jpeg(
+            tmp_path, "galaxy.jpg", b'PhotoEditor_Re_Edit_Data{"connectorType":"srvg","genAIType":1}'
+        )
+        assert samsung_genai(p) == 1
+
+    def test_other_nonzero_value_detected(self, tmp_path: Path):
+        p = self._samsung_jpeg(tmp_path, "galaxy5.jpg", b'PhotoEditor_Re_Edit_Data{"genAIType":5}')
+        assert samsung_genai(p) == 5
+
+    def test_zero_genai_type_is_none(self, tmp_path: Path):
+        """genAIType:0 means no generative AI was used -- not a positive signal."""
+        p = self._samsung_jpeg(tmp_path, "edit.jpg", b'PhotoEditor_Re_Edit_Data{"genAIType":0}')
+        assert samsung_genai(p) is None
+
+    def test_genai_without_editor_container_ignored(self, tmp_path: Path):
+        """An incidental genAIType token outside Samsung's editor JSON is ignored."""
+        p = self._samsung_jpeg(tmp_path, "stray.jpg", b'some other blob "genAIType":1 elsewhere')
+        assert samsung_genai(p) is None
+
+    def test_clean_image_is_none(self, tmp_clean_png):
+        assert samsung_genai(tmp_clean_png) is None
+
+    def test_surfaced_in_get_ai_metadata(self, tmp_path: Path):
+        p = self._samsung_jpeg(tmp_path, "galaxy.jpg", b'PhotoEditor_Re_Edit_Data{"genAIType":1}')
+        meta = get_ai_metadata(p)
+        assert "samsung_genai" in meta
+        assert "genAIType=1" in meta["samsung_genai"]
+
+
 class TestGetAiMetadata:
    """Tests for extracting AI metadata."""

@@ -12,12 +12,28 @@ from typing import TYPE_CHECKING

 import pytest

+from remove_ai_watermarks import trustmark_detector
 from remove_ai_watermarks.trustmark_detector import detect_trustmark, is_available

 if TYPE_CHECKING:
    from pathlib import Path


+class _FakeDecoder:
+    """A TrustMark decoder whose successive ``decode`` calls return scripted
+    ``(secret, present, schema)`` tuples -- the first for the original image, the
+    second for the re-encoded copy used by the false-positive durability gate."""
+
+    def __init__(self, *results: tuple[bytes, bool, int]):
+        self._results = list(results)
+        self.calls = 0
+
+    def decode(self, _img: object) -> tuple[bytes, bool, int]:
+        result = self._results[min(self.calls, len(self._results) - 1)]
+        self.calls += 1
+        return result
+
+
 def test_detect_never_raises(tmp_clean_png: Path):
    # Whether or not trustmark is installed, a clean image must yield None
    # (no watermark) without raising. When absent, the import guard returns None.
@@ -34,3 +50,40 @@ def test_unreadable_file_returns_none(tmp_path: Path):
 def test_clean_image_reports_no_watermark(tmp_clean_png: Path):
    # With the decoder present, an un-watermarked image must report absent.
    assert detect_trustmark(tmp_clean_png) is None
+
+
+class TestFalsePositiveGate:
+    """The re-encode durability gate keeps real (durable) TrustMarks and drops
+    BCH false positives that collapse under a mild JPEG round-trip."""
+
+    @pytest.fixture(autouse=True)
+    def _force_available(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.setattr(trustmark_detector, "is_available", lambda: True)
+
+    def _patch_decoder(self, monkeypatch: pytest.MonkeyPatch, decoder: _FakeDecoder) -> None:
+        monkeypatch.setattr(trustmark_detector, "_decoder", lambda: decoder)
+
+    def test_durable_watermark_survives_and_is_reported(self, monkeypatch, tmp_clean_png: Path):
+        decoder = _FakeDecoder((b"secret", True, 2), (b"secret", True, 2))
+        self._patch_decoder(monkeypatch, decoder)
+        result = detect_trustmark(tmp_clean_png)
+        assert result == "Adobe TrustMark (variant P, schema 2)"
+        assert decoder.calls == 2  # original + re-encode
+
+    def test_false_positive_collapsing_on_reencode_is_dropped(self, monkeypatch, tmp_clean_png: Path):
+        # Present on the original, absent after re-encode -> content-noise FP.
+        decoder = _FakeDecoder((b"\x00\x01", True, 3), (b"", False, -1))
+        self._patch_decoder(monkeypatch, decoder)
+        assert detect_trustmark(tmp_clean_png) is None
+
+    def test_schema_drift_on_reencode_is_dropped(self, monkeypatch, tmp_clean_png: Path):
+        # Present both times but the schema changes -> not a stable watermark.
+        decoder = _FakeDecoder((b"\x00", True, 2), (b"\x00", True, 3))
+        self._patch_decoder(monkeypatch, decoder)
+        assert detect_trustmark(tmp_clean_png) is None
+
+    def test_absent_skips_reencode(self, monkeypatch, tmp_clean_png: Path):
+        decoder = _FakeDecoder((b"", False, -1))
+        self._patch_decoder(monkeypatch, decoder)
+        assert detect_trustmark(tmp_clean_png) is None
+        assert decoder.calls == 1  # no second decode when the first is absent
@@ -0,0 +1,70 @@
+"""Tests for the known-visible-watermark registry (reverse-alpha only)."""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+import numpy as np
+import pytest
+
+from remove_ai_watermarks import watermark_registry as reg
+
+DOUBAO_SAMPLE = Path(__file__).resolve().parents[1] / "data" / "samples" / "doubao-1.png"
+
+
+class TestCatalog:
+    def test_keys(self):
+        assert reg.mark_keys() == ["gemini", "doubao"]
+
+    def test_all_in_auto(self):
+        assert all(m.in_auto for m in reg.known_marks())
+
+    def test_recovery_is_reverse_alpha(self):
+        # Every catalogued mark is removed by exact reverse-alpha (no inpaint).
+        assert all(m.recovery == "reverse-alpha" for m in reg.known_marks())
+
+    def test_locations(self):
+        by_key = {m.key: m for m in reg.known_marks()}
+        assert by_key["gemini"].location == "bottom-right"
+        assert by_key["doubao"].location == "bottom-right"
+
+    def test_get_mark_unknown_raises(self):
+        with pytest.raises(KeyError):
+            reg.get_mark("nope")
+
+
+class TestScan:
+    def test_detect_marks_scans_all(self):
+        img = np.zeros((256, 256, 3), np.uint8)
+        keys = {d.key for d in reg.detect_marks(img)}
+        assert keys == {"gemini", "doubao"}
+
+    def test_blank_image_no_auto_mark(self):
+        assert reg.best_auto_mark(np.zeros((256, 256, 3), np.uint8)) is None
+
+
+@pytest.mark.skipif(not DOUBAO_SAMPLE.exists(), reason="doubao sample not present")
+class TestRealSample:
+    def test_doubao_sample_wins_auto(self):
+        from remove_ai_watermarks.image_io import imread
+
+        best = reg.best_auto_mark(imread(DOUBAO_SAMPLE))
+        assert best is not None
+        assert best.key == "doubao"
+
+    def test_doubao_remove_returns_region(self):
+        from remove_ai_watermarks.image_io import imread
+
+        img = imread(DOUBAO_SAMPLE)  # 2048 wide -> reverse-alpha applies
+        result, region = reg.get_mark("doubao").remove(img)
+        assert region is not None
+        assert result.shape == img.shape
+
+
+class TestReverseAlphaOnly:
+    def test_doubao_off_resolution_is_skipped(self):
+        # No alpha capture for this width -> no inpaint fallback, image untouched.
+        img = np.zeros((512, 512, 3), np.uint8)
+        result, region = reg.get_mark("doubao").remove(img)
+        assert region is None
+        assert np.array_equal(result, img)