From 89f427852f7ecaa92b8812770277c291b4c0e20b Mon Sep 17 00:00:00 2001
From: Victor Kuznetsov <kuznetsov.va@gmail.com>
Date: Sat, 30 May 2026 12:27:37 -0700
Subject: [PATCH] Fix #30 white box: stop zeroing alpha in the watermark region
 on save

On RGBA inputs the CLI forced the watermark bbox alpha to 0 on save, so the
removed-sparkle area became a transparent hole that renders as a solid white
box in any viewer that composites transparency over white. The Gemini app
exports opaque RGBA, so every user of those exports hit it. Reverse-alpha
already recovers the real pixels there (and `erase` inpaints them), so there
is no artifact to hide -- the hole was the bug, introduced as an
over-correction in d091b9f.

`_write_bgr_with_alpha` now rejoins the input alpha plane unchanged (drops the
`clear_region`/`pad` params); the `visible` / `erase` / `all` / `batch` call
sites drop the cleared-region argument and the orphaned region bookkeeping.
The registry `remove()` still returns the mark bbox (used for inpaint_residual
positioning); the CLI just no longer clears alpha with it.

The test that locked in the old behavior is inverted into a #30 regression
guard, `test_visible_keeps_alpha_opaque_in_watermark_region` (watermark-region
alpha stays opaque, no pixel forced transparent). Verified end-to-end on a
real Gemini RGBA export: sparkle gone, zero transparent pixels, clean over a
white background.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
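A minimal single-pixel sketch of why the cleared alpha showed up as a white
box. It is not code from this repo: `reverse_alpha`, `composite_over`, the
pixel values, the 0.25 blend factor, and the pure-white logo colour are all
illustrative assumptions. It uses the reverse-alpha formula quoted in
CLAUDE.md, `original = (wm - a*logo)/(1-a)`, and the standard "over" blend
that a viewer with a white background would apply.

```python
from typing import Tuple

Pixel = Tuple[int, int, int]


def reverse_alpha(watermarked: Pixel, logo: Pixel, a: float) -> Pixel:
    """Invert wm = (1 - a) * original + a * logo for one pixel (requires a < 1)."""
    return tuple(
        max(0, min(255, round((wm - a * lg) / (1.0 - a))))
        for wm, lg in zip(watermarked, logo)
    )


def composite_over(pixel: Pixel, alpha: int, background: Pixel = (255, 255, 255)) -> Pixel:
    """Blend an 8-bit RGBA pixel over an opaque background (what a viewer shows)."""
    t = alpha / 255.0
    return tuple(round(t * p + (1.0 - t) * b) for p, b in zip(pixel, background))


# Hypothetical values: a blue-ish source pixel under a white logo at 25% opacity.
original = (40, 88, 160)
logo = (255, 255, 255)
a = 0.25
watermarked = tuple(round((1 - a) * o + a * lg) for o, lg in zip(original, logo))

# Reverse-alpha recovers the real pixel, so there is nothing left to hide.
recovered = reverse_alpha(watermarked, logo, a)

# Old behavior: alpha forced to 0, so the recovered pixel is discarded on display.
shown_old = composite_over(recovered, alpha=0)

# New behavior: the input alpha (255 on an opaque export) is kept unchanged.
shown_new = composite_over(recovered, alpha=255)

print(recovered, shown_old, shown_new)
```

With these assumed values the old path displays pure white regardless of what
was recovered, while the new path displays the recovered pixel itself.
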
 CLAUDE.md                                     |  2 +-
 src/remove_ai_watermarks/cli.py               | 49 +++++++------------
 .../watermark_registry.py                     |  6 ++-
 tests/test_cli.py                             | 16 +++---
 4 files changed, 32 insertions(+), 41 deletions(-)
diff --git a/CLAUDE.md b/CLAUDE.md
index f85ebd8..7a912d6 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -34,7 +34,7 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r
 - `noai/constants.py` — PNG_SIGNATURE, C2PA_CHUNK_TYPE, C2PA_SIGNATURES, C2PA_ISSUERS, `SYNTHID_C2PA_ISSUERS` (issuers that pair SynthID with C2PA: Google, OpenAI), and `C2PA_SOFT_BINDINGS` (soft-binding `alg` prefix → forensic-watermark vendor: Adobe TrustMark, Digimarc, Imatag, Steg.AI, Microsoft, ...). Add a new issuer/binding here, not inline.
 - `metadata.py` — `scan_head(path, size=1MB)` is the shared input for every C2PA/AIGC/IPTC byte scan: first `size` bytes plus the payloads of any provenance metadata found beyond that window — for ISOBMFF, the late provenance boxes from `isobmff.scan_c2pa_region` (catches a manifest after a large `mdat`); for **PNG**, the late `tEXt`/`iTXt`/`zTXt`/`eXIf`/`iCCP` chunks from `_png_late_metadata` (catches an XMP/EXIF packet appended after a large `IDAT`, e.g. a TC260 AIGC label at ~2.7 MB). Behavior-neutral (`f.read(size)`) for non-ISOBMFF inputs and for any file that fits within `size`. Use it instead of `open().read(1MB)` for any new marker scan. `synthid_source(path)` returns the vendor name(s) if the C2PA manifest implies a SynthID pixel watermark, else None. Format-agnostic: PNG via the caBX parser, JPEG/WebP/AVIF/HEIF/JXL via a binary scan (C2PA marker + SynthID issuer + AI-source marker). `get_ai_metadata` surfaces the verdict, and `metadata --check` prints it as a callout. Both `get_ai_metadata` and `has_ai_metadata` guard the PIL open with `except Exception` (HEIC/unknown formats raise non-OSError) and fall through to the binary scan. `xai_signature(path)` detects xAI/Grok's EXIF-only scheme (`ImageDescription` = `Signature: <base64>` + UUID `Artist`); it feeds `has_ai_metadata`, `get_ai_metadata` (key `xai_signature`), and `identify`. `iptc_ai_system(path)` detects the IPTC Photo Metadata 2025.1 AI-disclosure XMP properties (`IPTC_AI_FIELD_MARKERS` = `AISystemUsed`/`AISystemVersionUsed`/`AIPromptInformation`/`AIPromptWriterName`) and returns the `AISystemUsed` generator name (or `"fields present"`). `remove_ai_metadata` routes **ISOBMFF video** (`.mp4`/`.mov`/`.m4v`) through the same `isobmff.strip_c2pa_boxes` as AVIF/HEIF (MP4 is ISOBMFF), and `_scrub_ai_exif` removes the xAI signature + AI-generator EXIF tags on JPEG output.
 - `identify.py` — `identify(path)` aggregates every locally-readable signal (C2PA issuer→platform, C2PA soft-binding forensic-watermark vendor, IPTC "Made with AI" + IPTC 2025.1 `AISystemUsed`, embedded SD/ComfyUI params, SynthID proxy, xAI/Grok EXIF signature via `metadata.xai_signature`, the China TC260 AIGC label via `metadata.aigc_label`, the HuggingFace `hf-job-id` job marker via `metadata.huggingface_job`, the Samsung Galaxy AI editing marker via `metadata.samsung_genai`, visible Gemini sparkle, open invisible watermark, Adobe TrustMark via `trustmark_detector`) into one `ProvenanceReport`. `is_ai_generated` is True or None (never asserted False — stripped metadata is not proof of clean origin). The `hf_job`, visible-sparkle, and Samsung `samsung_genai` signals are **medium** confidence: each lifts an otherwise-Unknown verdict to a tentative AI (`hf_only` / `visible_only` / `samsung_only`, parallel branches) but is excluded from the high-confidence `ai_from_metadata` set, so none overrides a hard metadata signal. Visible-sparkle is promoted only at confidence ≥ `_SPARKLE_THRESHOLD` (0.5; corpus-tuned to separate Gemini sparkles ≥0.56 from non-sparkle ≤0.49). The cv2 dependency lives in `gemini_engine.detect_sparkle_confidence`, not here. **C2PA platform attribution is device-token-first, issuer-scan fallback** (`_device_platform` scans manifest bytes for `_DEVICE_C2PA_PLATFORM` tokens, then `_attribute_platform`/`_ISSUER_PLATFORM`). **Why, verified on real signed files 2026-05-26:** the old issuer-only byte-scan matched ANY issuer substring anywhere, so multi-entity manifests mis-attributed -- Leica→"Truepic" (a signing authority in the trust chain), Nikon→"Adobe Firefly" (XMP-toolkit "Adobe" + the sample's "Adobe_MAX" name), Pixel→"Google (Gemini)" ("Google LLC" cert org), Truepic→"Google". A distinctive device token wins instead. 
**Token distinctiveness is load-bearing:** bare `b"Truepic"` mis-fires (it appears in unrelated trust chains -- it mis-attributed the OpenAI `chatgpt-1.png` fixture), so the token is the specific `b"Truepic_Lens"` from the Lens SDK claim generator; likewise `b"Pixel Camera"` (cert CN) not bare `b"Pixel"`. `_DEVICE_C2PA_PLATFORM` lists ONLY tokens **verified against a real C2PA file**: Leica (`lc_c2pa`/`Leica Camera`), Nikon (`NIKON`), Pixel (`Pixel Camera` -- from a real Pixel 10 Pro file attached to c2pa-rs issue #1609/#1554), Sony (`sony.sig`/`sony.cert` -- Sony's own C2PA assertion namespace, verified on a real Sony PXW-Z300 file; NOT bare "Sony" which is a common EXIF Make), Truepic (`Truepic_Lens`). Canon/Bria have **no public direct-download C2PA sample** (checked exhaustively: GitHub issue/PR attachments, contentcredentials gallery, HF datasets -- all upload-to-verify or token-gated; Canon's only public file was a self-signed hobbyist CR3, not factory), so they stay unmapped until a real file is captured (same fixture discipline as Grok/Doubao). The Sony sample is video (MP4) -- our ISOBMFF C2PA path detects it; Sony Alpha stills likely share the `sony.*` namespace but are not separately verified. **Samsung Galaxy + ASUS Gallery live in a separate `_SIGNER_C2PA_PLATFORM` (scanned after `_device_platform`, before the issuer fallback), NOT in `_DEVICE_C2PA_PLATFORM`** — verified on real signed files 2026-05-29. Reason: a Galaxy phone stamps BOTH its device cert AND a `trainedAlgorithmicMedia`/genAIType AI marker on a Generative-Edit image, so treating it as a "genuine camera capture" would false-fire integrity-clash rule 2 on every Galaxy AI edit. The signer tokens (`b"Samsung Galaxy"` cert org — distinct from the EXIF `SM-xxxx` model string on ordinary Samsung photos; `b"com.asus.gallery"` claim generator) only resolve the platform label; the AI verdict still comes from the source-type / genAIType. 
ASUS Gallery is a C2PA-signed edit with no AI marker, so it attributes the platform without asserting `is_ai`. **Samsung's `genAIType` (in the proprietary `PhotoEditor_Re_Edit_Data` JSON) is an undocumented Galaxy-AI editing marker** (`metadata.samsung_genai`, gated on the `PhotoEditor_Re_Edit_Data` container; non-zero value = AI tool used, values {1,5} observed): medium-confidence because the field has no public spec (verified 2026-05-29: absent from C2PA spec + Samsung docs), but it co-occurred with `trainedAlgorithmicMedia` in 3/3 verified files that record a source-type and was the SOLE AI marker on a Galaxy S24 file that omits the source type. Camera C2PA marks capture authenticity, not AI (Pixel carries `computationalCapture`, not `trainedAlgorithmicMedia`), so these never set `is_ai` -- that stays driven by digital-source-type. `c2pa.cbor_text_after` (now public) is best-effort for the `generator` detail string only and can be None when the manifest keys it `claim_generator_info` (Pixel). **Issuer→generator mapping is `is_ai`-gated** (`_attribute_platform(issuers, is_ai=c2pa_is_ai)`): a specific AI-generator platform is named only when the digital-source-type is `trainedAlgorithmicMedia`; on a non-AI source an issuer substring is treated as incidental (an "Adobe XMP" toolkit string in an *unmapped* Canon/Sony capture would otherwise mislabel it "Adobe Firefly"), so it degrades to the neutral "C2PA signer: X" label. Real Firefly/OpenAI/Google output carries the AI source-type, so it is unaffected (verified: chatgpt-1.png→OpenAI, firefly-1.png→Adobe Firefly still attribute). `_attribute_platform` defaults `is_ai=True` so the mapping stays unit-testable in isolation. Add capture-camera tokens to `_DEVICE_C2PA_PLATFORM`, editing-app/AI-device signer tokens to `_SIGNER_C2PA_PLATFORM`, generator/issuer platforms to `_ISSUER_PLATFORM`, not inline. 
For non-PNG containers (JPEG/WebP/AVIF/HEIF/JXL) the caBX parser returns nothing, so issuer (`_issuers_in`) and generator (`_ai_tools_in`, reusing `C2PA_AI_TOOLS`) are recovered by binary-scanning the first MB. EXIF `Software` / `Make` / `Artist` / `ImageDescription` and XMP `CreatorTool` generator tags are read by `metadata.exif_generator` (PIL+piexif for any format PIL opens incl. AVIF, plus a container-agnostic XMP raw-byte scan that also covers HEIF/JXL), matched against `AI_GENERATOR_TOKENS` so ordinary editors (plain "Adobe Photoshop") and real-camera `Make` ("Apple"/"Canon") are not flagged. **Ideogram tags its output with EXIF `Make="Ideogram AI"`** (verified on a real download 2026-05-24) — that's why `Make` is read. **Integrity-clash detection** (`_integrity_clashes`, surfaced as `ProvenanceReport.integrity_clashes`, printed in red by `identify` and serialized to `--json`): contradictions between independent generator stamps are a laundering/spoofing tell. Two rules: (1) two or more distinct AI-origin vendors named by independent signals (e.g. C2PA OpenAI + EXIF `Make="Ideogram AI"`), and (2) a camera-capture C2PA device (`_DEVICE_C2PA_PLATFORM`) coexisting with any AI-generation marker. Vendor normalization is `_vendor_of` over `_AI_VENDOR_TOKENS` (so a C2PA "Google (Gemini)" issuer and a SynthID-Google proxy agree, while different vendors clash). **High-precision by design:** only hard generator stamps feed it (C2PA-issuer when source is AI, SynthID, EXIF/XMP generator, IPTC `AISystemUsed`, xAI, AIGC); the fuzzy visible sparkle and the open invisible watermark are **excluded** (the latter can be a by-product of our own SDXL removal pass). The c2pa vendor is classified from the issuer attribution / generator, NOT the resolved `platform` (a camera label like "Google Pixel" would mis-normalize to "Google"). 
All real single-origin fixtures (chatgpt/firefly/doubao/grok/mj) verified to produce **zero** clashes (false-positive guard in `test_identify.py::TestRealSamplesHaveNoClash`).
-- `watermark_registry.py` — **single catalog of known visible watermarks**, the unified "find known marks in their usual places, recognize, remove" entry. **Reverse-alpha only by policy**: a mark is listed only once a real alpha map has been captured for it, and removal inverts that map (`original = (wm - a*logo)/(1-a)`, exact recovery) — no inpaint/heuristic removal here (arbitrary-region inpainting lives in `region_eraser`/`erase`). Each `KnownMark` ties a key to {usual `location`, `in_auto` flag, `recovery` (="reverse-alpha"), a `detect` adapter → uniform `MarkDetection`, a `remove` adapter}. Entries today: `gemini` (bottom-right sparkle) and `doubao` (bottom-right "豆包AI生成"). `detect_marks` scans all; `best_auto_mark` picks the highest-confidence detection. **Cross-engine confidences aren't directly comparable**, so the gemini adapter applies the corpus-validated 0.5 sparkle threshold (`_GEMINI_AUTO_MIN_CONF`) for its `detected` flag — otherwise the gemini engine's loose internal threshold weakly fires (~0.36) on the Doubao text and hijacks `auto`. `cli.cmd_visible` is registry-driven: `--mark auto` → `best_auto_mark`, `--mark <key>` → that mark; `--mark` choices come from `mark_keys()`. `_doubao_remove` applies reverse-alpha only when the mark is detected AND `reverse_alpha_available` (resolution in the alpha band); outside that, removal is **skipped** (not inpainted). Add a new visible mark = one `KnownMark` entry + its engine (with a captured alpha map); do not re-add per-mark `if` branches in the CLI.
+- `watermark_registry.py` — **single catalog of known visible watermarks**, the unified "find known marks in their usual places, recognize, remove" entry. **Reverse-alpha only by policy**: a mark is listed only once a real alpha map has been captured for it, and removal inverts that map (`original = (wm - a*logo)/(1-a)`, exact recovery) — no inpaint/heuristic removal here (arbitrary-region inpainting lives in `region_eraser`/`erase`). Each `KnownMark` ties a key to {usual `location`, `in_auto` flag, `recovery` (="reverse-alpha"), a `detect` adapter → uniform `MarkDetection`, a `remove` adapter}. Entries today: `gemini` (bottom-right sparkle) and `doubao` (bottom-right "豆包AI生成"). `detect_marks` scans all; `best_auto_mark` picks the highest-confidence detection. **Cross-engine confidences aren't directly comparable**, so the gemini adapter applies the corpus-validated 0.5 sparkle threshold (`_GEMINI_AUTO_MIN_CONF`) for its `detected` flag — otherwise the gemini engine's loose internal threshold weakly fires (~0.36) on the Doubao text and hijacks `auto`. `cli.cmd_visible` is registry-driven: `--mark auto` → `best_auto_mark`, `--mark <key>` → that mark; `--mark` choices come from `mark_keys()`. `_doubao_remove` applies reverse-alpha only when the mark is detected AND `reverse_alpha_available` (resolution in the alpha band); outside that, removal is **skipped** (not inpainted). Add a new visible mark = one `KnownMark` entry + its engine (with a captured alpha map); do not re-add per-mark `if` branches in the CLI. **Alpha-on-save policy (issue #30):** `cli._write_bgr_with_alpha` rejoins the input's alpha plane **unchanged** — it must NOT zero alpha in the watermark bbox. 
Reverse-alpha (and `erase` inpaint) recover real pixels there, so zeroing alpha punched a transparent hole that renders as a solid **white box** on any non-transparent viewer (Gemini app exports are opaque RGBA, so every user hit it; regression-guarded by `test_visible_keeps_alpha_opaque_in_watermark_region`). The registry `remove()` still returns its region (used for `inpaint_residual` positioning), but the CLI no longer uses it to clear alpha.
 - `gemini_engine.py` — visible Gemini-sparkle remover/detector (cv2/numpy, no GPU). `detect_sparkle_confidence(path)` is the file-level entry point used by `identify.py`.
 - `doubao_engine.py` — visible Doubao "豆包AI生成" remover/detector (cv2/numpy, no GPU), **reverse-alpha only**. `DoubaoEngine.locate` anchors a bottom-right box by **geometry** (mark scales with image WIDTH), `extract_mask` pulls the light low-saturation glyphs (the detection candidate). `detect` is **reverse-alpha-consistent**: it matches the bundled alpha glyph silhouette (`assets/doubao_alpha.png`, the exact shape we invert) against the candidate via zero-mean normalized correlation (`_template_match_score`, cv2 `TM_CCOEFF_NORMED`), gated at `DETECT_NCC_THRESHOLD` 0.4 over a small `DETECT_MIN_COVERAGE` floor. Keying on glyph SHAPE (not coverage/structure heuristics) fixed #23: corpus FP fell to 7/1243 (0.6%); old coverage-only fired on ~28%. **Removal is exact reverse-alpha** (`remove_watermark_reverse_alpha`): `original = (wm - a*logo)/(1-a)` from the bundled alpha map + `_ALPHA_LOGO_BGR` (near-white ~253) + `_ALPHA_*_FRAC` geometry. The alpha map + logo were **solved from real black+gray Doubao captures** (`data/doubao_capture/captures/`, gitignored): on black `captured = a*logo`, the black/gray pair solves `a` per-pixel without assuming the logo colour (white capture cross-validates: mark → flat fill). The single captured alpha map (at width 2048) **generalizes to any resolution**: at (near) the captured width (`_ALPHA_NATIVE_BAND` of `_ALPHA_NATIVE_WIDTH`) `_fixed_alpha_map` places it by exact width-relative geometry (pixel-exact recovery, ~0.9 mean error — the whole point of reverse-alpha); off that width it **tries BOTH placements -- fixed geometry AND `_aligned_alpha_map`'s `TM_CCOEFF_NORMED` scale+position search (`_ALPHA_ALIGN_SEARCH`) -- and keeps whichever leaves the least residual mark** (re-`detect` confidence on the bare reverse-alpha). On a faint/busy-background mark the NCC peak wanders a few px and geometry wins; on a clear mark alignment wins -- no magic threshold, it just picks the better removal. 
Verified **56/56 real detected-Doubao removed clean across all corpus resolutions** (2048 fixed 27/27, 1773 22/22, plus 1185/1187/1535/1672); a single fixed-vs-aligned choice left 2/56 busy-background residuals, try-both fixed them. `reverse_alpha_available` is just "asset present"; the registry still gates removal on `detect` so a clean corner is never touched. **Residual inpaint is off-native-only:** at the captured width the fixed-geometry recovery is exact, so it is returned untouched -- inpainting over exactly-recovered interior pixels only swaps them for a cv2 hallucination (measured worse, native textured-bg error vs true bg **1.6 reverse-alpha-only vs 2.6 with the old always-on full-footprint inpaint**; regression-guarded by `test_native_returns_exact_reverse_alpha_no_inpaint`). Off-native the NCC alignment is only sub-pixel-approximate, so the interior is no longer exact and a residual inpaint over the glyph footprint cleans the seam (costs nothing there and reliably clears the mark). The shipped third-party `_refs/zhengsuanfa_doubao_alpha_120x20.png` is NOT a usable alpha (≈0.85 everywhere → blacks out on inversion; wrong resolution/version), verified 2026-05-29. There is no inpaint-based removal here (removed 2026-05-29; arbitrary-region inpainting is `region_eraser`/`erase`).
 - `region_eraser.py` — universal region eraser (`erase` CLI). `erase(image, boxes=|mask=, backend=)`: `boxes_to_mask` → `cv2.inpaint` (`cv2` backend, default, no deps) or big-LaMa via onnxruntime (`lama` backend, extra `lama`, `Carve/LaMa-ONNX` Apache-2.0 model downloaded on first use, never bundled). `erase_lama` crops a padded region around the mask, runs LaMa at its fixed 512² input, pastes only masked pixels back (untouched areas stay pixel-exact). Lazy `_get_lama_session` singleton; `lama_available()` guards the optional import. **LaMa-ONNX costs ~3.5-4 GB peak RAM and ~5-6 s/call on CPU** (FFC working set, not arena — `enable_cpu_mem_arena=False` does not help), so it does NOT fit a minimal droplet; the cv2 backend (tens of MB, ~30 ms) does. LaMa quality at low RAM = serverless/GPU, mirroring how raiw.cc offloads SDXL to fal.
diff --git a/src/remove_ai_watermarks/cli.py b/src/remove_ai_watermarks/cli.py
index b33643c..598db45 100644
--- a/src/remove_ai_watermarks/cli.py
+++ b/src/remove_ai_watermarks/cli.py
@@ -101,15 +101,15 @@ def _write_bgr_with_alpha(
     path: Path,
     bgr: NDArray[Any],
     alpha: NDArray[Any] | None,
-    clear_region: tuple[int, int, int, int] | None = None,
-    pad: int = 6,
 ) -> None:
     """Write BGR (with optional alpha) to ``path``.
 
-    When ``alpha`` is provided and the output extension supports it, writes a
-    4-channel image. If ``clear_region`` is given as ``(x, y, w, h)``, alpha is
-    forced to 0 inside that bbox (expanded by ``pad`` px) so the watermark area
-    becomes fully transparent in the saved file.
+    When ``alpha`` is provided and the output extension supports it, the original
+    alpha plane is rejoined unchanged. The watermark region is NOT made
+    transparent: reverse-alpha (and inpaint) recover real pixels there, so
+    zeroing alpha would punch a transparent hole that renders as a white box on
+    any non-transparent viewer (issue #30). Preserving the input alpha keeps
+    genuinely transparent backgrounds intact without inventing new holes.
     """
     import numpy as np
 
@@ -119,17 +119,7 @@ def _write_bgr_with_alpha(
         image_io.imwrite(path, bgr)
         return
 
-    alpha_out = alpha
-    if clear_region is not None:
-        alpha_out = alpha.copy()
-        x, y, w, h = clear_region
-        height, width = alpha.shape[:2]
-        x0, y0 = max(0, x - pad), max(0, y - pad)
-        x1, y1 = min(width, x + w + pad), min(height, y + h + pad)
-        if x1 > x0 and y1 > y0:
-            alpha_out[y0:y1, x0:x1] = 0
-
-    bgra = np.dstack([bgr, alpha_out])
+    bgra = np.dstack([bgr, alpha])
     image_io.imwrite(path, bgra)
 
 
@@ -246,7 +236,7 @@ def cmd_visible(
     method: Literal["telea", "ns"] = "ns" if inpaint_method == "ns" else "telea"
     t0 = time.monotonic()
     with console.status(f"[cyan]Removing {chosen.label}… ({chosen.recovery})[/]"):
-        result, region = chosen.remove(
+        result, _ = chosen.remove(
             image,
             inpaint_method=method,
             inpaint=inpaint,
@@ -255,9 +245,9 @@ def cmd_visible(
         )
     elapsed = time.monotonic() - t0
 
-    # Save (preserves transparency by clearing alpha in the watermark region)
+    # Save (rejoins the original alpha plane unchanged)
     output.parent.mkdir(parents=True, exist_ok=True)
-    _write_bgr_with_alpha(output, result, alpha, clear_region=region)
+    _write_bgr_with_alpha(output, result, alpha)
 
     # Strip metadata
     if strip_metadata:
@@ -349,8 +339,7 @@ def cmd_erase(
     elapsed = time.monotonic() - t0
 
     output.parent.mkdir(parents=True, exist_ok=True)
-    clear = boxes[0] if len(boxes) == 1 else None
-    _write_bgr_with_alpha(output, result, alpha, clear_region=clear)
+    _write_bgr_with_alpha(output, result, alpha)
 
     if strip_metadata:
         try:
@@ -695,7 +684,6 @@ def cmd_all(
         h, w = image.shape[:2]
         console.print(f"    [dim]Input:[/] {source.name}  ({w}x{h})")
 
-        region: tuple[int, int, int, int] | None = None
         with console.status("[cyan]Removing visible watermark…[/]"):
             det = engine.detect_watermark(image)
             if det.detected:
@@ -709,7 +697,7 @@ def cmd_all(
                 console.print("    [dim]Skipped (no visible watermark detected)[/]")
 
         # Save to temp file for invisible engine input (preserve alpha if present)
-        _write_bgr_with_alpha(tmp_path, result, alpha, clear_region=region)
+        _write_bgr_with_alpha(tmp_path, result, alpha)
 
         # ── Step 2: Invisible watermark ──────────────────────────────
         console.print("\n  [bold cyan]② Invisible watermark removal[/]")
@@ -761,14 +749,14 @@ def cmd_all(
 
         # ── Write final result ────────────────────────────────────────
         # The invisible step (and downstream cv2.IMREAD_COLOR paths) drops alpha,
-        # so re-attach the original alpha (with the watermark region cleared)
-        # when writing the final output for transparent formats.
+        # so re-attach the original alpha plane unchanged when writing the final
+        # output for transparent formats.
         output.parent.mkdir(parents=True, exist_ok=True)
         final_bgr, _ = _read_bgr_and_alpha(tmp_path)
         if final_bgr is None:
             console.print(f"[red]Error:[/] Failed to read intermediate file: {tmp_path}")
             raise SystemExit(1)
-        _write_bgr_with_alpha(output, final_bgr, alpha, clear_region=region)
+        _write_bgr_with_alpha(output, final_bgr, alpha)
 
     finally:
         # Clean up temp file if it still exists
@@ -808,7 +796,6 @@ def _process_batch_image(
         ValueError: If the image cannot be opened.
     """
     saved_alpha: NDArray[Any] | None = None
-    saved_region: tuple[int, int, int, int] | None = None
 
     if mode in ("visible", "all"):
         from remove_ai_watermarks.gemini_engine import GeminiEngine
@@ -823,7 +810,6 @@ def _process_batch_image(
         if image is None:
             raise ValueError("Failed to read image")
 
-        region: tuple[int, int, int, int] | None = None
         det = engine.detect_watermark(image)
         if det.detected:
             result = engine.remove_watermark(image)
@@ -834,9 +820,8 @@ def _process_batch_image(
         else:
             result = image.copy()
 
-        _write_bgr_with_alpha(out_path, result, alpha, clear_region=region)
+        _write_bgr_with_alpha(out_path, result, alpha)
         saved_alpha = alpha
-        saved_region = region
 
     if mode in ("invisible", "all"):
         from remove_ai_watermarks.invisible_engine import (
@@ -873,7 +858,7 @@ def _process_batch_image(
     if mode == "all" and saved_alpha is not None:
         final_bgr, _ = _read_bgr_and_alpha(out_path)
         if final_bgr is not None:
-            _write_bgr_with_alpha(out_path, final_bgr, saved_alpha, clear_region=saved_region)
+            _write_bgr_with_alpha(out_path, final_bgr, saved_alpha)
 
 
 @main.command("batch")
diff --git a/src/remove_ai_watermarks/watermark_registry.py b/src/remove_ai_watermarks/watermark_registry.py
index 7fb130d..a7cb6ce 100644
--- a/src/remove_ai_watermarks/watermark_registry.py
+++ b/src/remove_ai_watermarks/watermark_registry.py
@@ -71,8 +71,10 @@ class KnownMark:
         inpaint_strength: float = 0.85,
         force: bool = False,
     ) -> tuple[NDArray[Any], Region | None]:
-        """Remove this mark by reverse-alpha; returns ``(result, cleared_region)``
-        (region for clearing alpha on save, or None if nothing was removed).
+        """Remove this mark by reverse-alpha; returns ``(result, region)`` where
+        ``region`` is the removed mark's bbox (for residual-inpaint positioning),
+        or None if nothing was removed. NB: the CLI does NOT use ``region`` to
+        clear alpha on save -- that zeroing caused the issue-#30 white box.
 
         ``inpaint`` / ``inpaint_strength`` / ``inpaint_method`` tune the Gemini
         reverse-alpha edge-residual cleanup only. ``force`` removes at the mark's
diff --git a/tests/test_cli.py b/tests/test_cli.py
index 3a53c62..08e0339 100644
--- a/tests/test_cli.py
+++ b/tests/test_cli.py
@@ -198,15 +198,17 @@ class TestVisibleCommand:
         # which doesn't overlap the centre square at 200x200).
         assert out[100, 100, 3] == 255
 
-    def test_visible_clears_alpha_in_watermark_region(self, runner, tmp_path):
-        """When inpainting an RGBA image, the watermark region must be cleared
-        in the alpha channel so the sparkle area becomes transparent, not opaque-black.
+    def test_visible_keeps_alpha_opaque_in_watermark_region(self, runner, tmp_path):
+        """Regression for issue #30 (white box): on an opaque RGBA image, the
+        watermark region must stay OPAQUE. Reverse-alpha recovers real pixels
+        there, so zeroing alpha would punch a transparent hole that renders as a
+        solid white box on any non-transparent viewer.
         """
         rgba = np.full((200, 200, 4), 255, dtype=np.uint8)  # fully opaque white
         src = tmp_path / "rgba_full.png"
         cv2.imwrite(str(src), rgba)
 
-        output = tmp_path / "rgba_cleared.png"
+        output = tmp_path / "rgba_kept.png"
         result = runner.invoke(
             main,
             ["visible", str(src), "-o", str(output), "--no-detect"],
@@ -215,13 +217,15 @@ class TestVisibleCommand:
         assert result.exit_code == 0, result.output
         out = cv2.imread(str(output), cv2.IMREAD_UNCHANGED)
         assert out.shape[2] == 4
-        # Default sparkle position is in the bottom-right; alpha there must be 0.
+        # Default sparkle position is in the bottom-right; alpha there must stay 255.
         from remove_ai_watermarks.gemini_engine import get_watermark_config
 
         cfg = get_watermark_config(200, 200)
         px, py = cfg.get_position(200, 200)
         size = cfg.logo_size
-        assert out[py + size // 2, px + size // 2, 3] == 0, "alpha in the watermark region was not cleared"
+        assert out[py + size // 2, px + size // 2, 3] == 255, "watermark region alpha was zeroed (white-box regression)"
+        # No pixel anywhere should have been forced transparent.
+        assert int((out[:, :, 3] == 0).sum()) == 0, "spurious transparent pixels introduced"
 
     def test_visible_rgb_input_stays_rgb(self, runner, sample_png, tmp_path):
         """Regression: a plain RGB PNG must NOT gain a spurious alpha channel."""