mirror of
https://github.com/wiltodelta/remove-ai-watermarks.git
synced 2026-07-04 23:47:49 +02:00
fix(visible): inpaint mid-tone Gemini sparkle instead of a dark diamond
The free `visible` path over-subtracted a faint Gemini sparkle on a mid-tone background into a darker-than-background brown diamond instead of removing it (2026-06-18 prod NPS report, "the watermark was not removed, just its color changed"). The existing over-subtraction guard only tripped when reverse-alpha drove a footprint pixel fully negative (the issue #30 dark-background black-pit case); on a mid-tone background the over-subtraction darkens the core well below the background without any pixel crossing zero, so the gate missed it and shipped the dark mark.

Add a second over-subtraction signal to `_reverse_alpha_oversubtracts`: predict the reverse-alpha output at the bright core, (core - a*logo)/(1-a), and route to the footprint inpaint when it lands more than `_OVERSUB_DARK_MARGIN` (25) gray levels below the local background ring. Calibrated wide: clean removals predict within ~12 of background (demo_banana ~-1), the prod regression ~-40, the issue #30 dark case ~-82.

Corpus-validated on the 479 detected Gemini images: 10 switch reverse-alpha to inpaint, all of them dark-diamond cases that improve or match; the other 469 stay byte-identical. demo_banana stays on the reverse-alpha path (byte-identical).

Also crop both reverse-alpha helpers to the region they actually touch, a pure O(image) -> O(mark) win that is byte-identical to the full-frame math (a uint8<->float32 round-trip is exact):

- `GeminiEngine._core_and_bg` converts only the footprint+ring crop to gray, not the whole frame (~70 ms -> 0.1 ms on a 12 MP image; it runs for both the alpha-gain estimate and the new gate). Verified identical across 479 images; detector confidence unchanged.
- `TextMarkEngine._apply_reverse_alpha` computes the blend on the glyph crop only (`amap` is zero outside it, so the math is a no-op there): ~275 ms -> ~2 ms per placement on a 12 MP frame, up to 2 placements per removal. Verified identical across 142 Doubao/Jimeng placements.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
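As a rough illustration of the two over-subtraction signals described in the message, the sketch below blends a sparkle onto a flat background and evaluates both gates on the core pixel. The logo value of 255, the true opacities, and the helper names are assumptions made for the example; only the 0.51 captured-alpha peak and the 25-level margin come from the commit.

```python
# Illustrative only: LOGO, the true opacities, and these helper names are assumptions.
LOGO = 255.0            # assumed white sparkle
CAPTURED_ALPHA = 0.51   # captured alpha peak quoted in the engine comments
DARK_MARGIN = 25.0      # _OVERSUB_DARK_MARGIN from this commit


def observed_core(bg: float, true_alpha: float) -> float:
    """Forward blend: the watermarked core value on a flat background."""
    return (1.0 - true_alpha) * bg + true_alpha * LOGO


def gates(bg: float, true_alpha: float) -> tuple[bool, bool]:
    """Return (black_pit, dark_diamond) for a flat background of value ``bg``."""
    core = observed_core(bg, true_alpha)
    numerator = core - CAPTURED_ALPHA * LOGO          # signal 1: crosses zero?
    predicted = numerator / (1.0 - CAPTURED_ALPHA)    # signal 2: predicted output
    return numerator < 0.0, predicted < bg - DARK_MARGIN


clean = gates(bg=160.0, true_alpha=0.51)    # alpha matches the capture
midtone = gates(bg=160.0, true_alpha=0.30)  # faint sparkle, mid-tone background
dark = gates(bg=30.0, true_alpha=0.30)      # faint sparkle, dark background
```

With these assumed numbers the mid-tone case predicts a core roughly 40 levels below the background while the numerator stays positive, so only the second signal fires; the dark case trips both.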
@@ -11,7 +11,7 @@ module.

 ## `noai/c2pa.py`

-`noai/c2pa.py` — PNG chunk parser; use `extract_c2pa_chunk(path)` to get raw caBX payload, `has_c2pa_metadata(path)` to detect. Do not reimplement chunk parsing. `extract_c2pa_info(path)` sets `synthid_watermark`/`synthid_vendors` when the manifest is signed by a SynthID-using vendor, and `soft_binding`/`soft_binding_vendors` when a `c2pa.soft-binding` `alg` names a forensic-watermark vendor (`soft_binding_vendors_in(buffer)` is the shared byte-scan, used by both the PNG parser and the non-PNG binary path). PNG/caBX chunk reads are clamped to the remaining file size (`safe_length = min(length, remaining)`; skipped chunks use seek) so a malformed huge `length` cannot drive a multi-GB allocation (shared safety discipline matching `isobmff.scan_c2pa_region`).
+`noai/c2pa.py` — C2PA reading, **official c2pa-python `Reader` first, hand-rolled parser as fallback** (migrated 2026-06-18; the official lib is a core dep, MIT/Apache, spec-tracking). `read_manifest_store_json(path)` runs `Reader.try_create` with a default `Context` (NO trust enforcement — we report what is in the file, we do not gate on cert trust) and returns the **whole** manifest-store JSON (every manifest plus ingredient manifests); it is memoized per (path, mtime) (`lru_cache(maxsize=8)`) because one `identify`/`get_ai_metadata` call invokes the structured parser ~3x on the same file. `extract_c2pa_info(path)` builds its dict from that store JSON (`_info_from_store_json`: structured `claim_generator` from the active manifest's `claim_generator` / `claim_generator_info[].name`, `timestamp` from `signature_info.time`) and falls back to the legacy caBX parser (`_extract_c2pa_info_png`) when the reader is unavailable (broken/absent wheel, `reader_available()` False) or finds no parseable manifest (synthetic/partial test blobs, the inject round-trip's re-stitched chunk). **Both paths share `_populate_registry_fields(buf, info)`** — the issuer / AI-tool / action / source-type / SynthID / soft-binding registry byte-scan applied to the store JSON (reader path) or the raw caBX bytes (fallback) — so the return-dict shape is identical and the registry stays the single source of truth. Whole-store scanning is load-bearing: a ChatGPT *edit* of a Sora generation keeps `trainedAlgorithmicMedia` + issuer "OpenAI" on the **parent/ingredient** manifest, not the active "opened" one (the active manifest's `signature_info.issuer` is "OpenAI", `common_name` "Truepic Lens CLI in Sora", so the issuer field now reads "OpenAI, Truepic" — first-match-wins platform attribution still resolves OpenAI). `extract_c2pa_info` now also serves non-PNG containers (JPEG/AVIF/MP4) structurally via the reader; the consumers (`identify`, `synthid_source`, `get_ai_metadata`) already merge `info OR byte-scan`, so this strictly upgrades the non-PNG path with no double-counting. `synthid_watermark`/`synthid_vendors` is set when the manifest is signed by a SynthID-using vendor on AI content; `soft_binding`/`soft_binding_vendors` when a `c2pa.soft-binding` `alg` names a forensic-watermark vendor (`soft_binding_vendors_in(buffer)` is the shared byte-scan, used by both paths and the non-PNG binary path). `extract_c2pa_chunk` / `inject_c2pa_chunk` / `has_c2pa_metadata` stay the PNG caBX byte tools (raw-chunk extraction for `extractor.py`, test injection, fallback detection). PNG/caBX chunk reads are clamped to the remaining file size (`safe_length = min(length, remaining)`; skipped chunks use seek) so a malformed huge `length` cannot drive a multi-GB allocation (shared safety discipline matching `isobmff.scan_c2pa_region`). Regression-guarded by `tests/test_noai.py::TestC2PARealSamples::{test_extract_info_uses_reader_store,test_fallback_to_png_parser_when_reader_unavailable}`.

 ## `noai/constants.py`

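The clamped chunk read mentioned for `noai/c2pa.py` can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's parser: `read_chunk_clamped` and its signature are hypothetical, and only the `safe_length = min(length, remaining)` clamp and the seek-over-skipped-chunks behaviour are taken from the text.

```python
import io
import struct
from typing import BinaryIO, Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_chunk_clamped(stream: BinaryIO, wanted: bytes, size: int) -> Optional[bytes]:
    """Return the payload of the first ``wanted`` chunk, never reading past ``size``.

    Hypothetical helper: it only illustrates the clamp, it is not the real API.
    """
    if stream.read(8) != PNG_SIGNATURE:
        return None
    while True:
        header = stream.read(8)  # 4-byte big-endian length + 4-byte chunk type
        if len(header) < 8:
            return None  # end of file or truncated header
        length, chunk_type = struct.unpack(">I4s", header)
        remaining = max(0, size - stream.tell())
        if chunk_type == wanted:
            safe_length = min(length, remaining)  # a bogus length cannot over-allocate
            return stream.read(safe_length)
        # Skipped chunks are never read: seek past the payload and its 4-byte CRC.
        stream.seek(length + 4, io.SEEK_CUR)


# A malformed file: the caBX chunk claims ~4 GiB but only 5 payload bytes exist.
blob = PNG_SIGNATURE + struct.pack(">I4s", 0, b"IHDR") + b"\x00" * 4
blob += struct.pack(">I4s", 0xFFFFFFF0, b"caBX") + b"hello"
payload = read_chunk_clamped(io.BytesIO(blob), b"caBX", len(blob))
```

The declared length is only ever used as an upper bound, so the read is sized by what the file can actually contain.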
@@ -29,7 +29,7 @@ module.

 **Visible-mark detection** (`check_visible`, signals `visible_sparkle` / `visible_doubao` / `visible_jimeng` / `visible_samsung`): the Gemini sparkle keeps its own file-level path (`_visible_sparkle` → `gemini_engine.detect_sparkle_confidence`, promoted only at confidence ≥ `_SPARKLE_THRESHOLD` 0.5; corpus-tuned to separate Gemini sparkles ≥0.56 from non-sparkle ≤0.49), while Doubao/Jimeng/Samsung reuse the registry detectors (`_visible_text_marks` → `watermark_registry`, iterating `_VISIBLE_MARK_PLATFORM`), each gated by its own engine NCC threshold via `MarkDetection.detected` (Doubao 0.4, Jimeng 0.45, Samsung 0.4). Doubao/Jimeng are normally also caught by the TC260 AIGC metadata label and Samsung by its C2PA + `genAIType` marker, so the visible path is their stripped-metadata fallback. Visible marks set `platform` only when no harder signal already did, and (like the sparkle) are excluded from integrity-clash vendor claims. The cv2 dependency lives in the engines, not here.

-**`import identify` is deliberately light** (~21 MB; ~36 MB with cv2 loaded by a visible-mark run, ~106 MB for a full `check_visible` run): it imports only the pure `noai.c2pa`/`noai.constants` submodules, and `noai/__init__` is lazy (see "Test and lint"), so torch/diffusers are NOT pulled at import even in a full `gpu`/`detect` install — fits a 512 MB host. The heavy paths are opt-in: `check_invisible=True` needs the `detect`/`trustmark` extras (each pulls **torch**; TrustMark also **downloads weights**), so on a core-only deploy leave `check_invisible` off (it is a no-op there anyway). Before the lazy `__init__`, the mere presence of torch in the env inflated `import identify` to ~420 MB.
+**`import identify` is deliberately light** (~26 MB; ~36 MB with cv2 loaded by a visible-mark run, ~106 MB for a full `check_visible` run): it imports the `noai.c2pa`/`noai.constants` submodules, and `noai/__init__` is lazy (see "Test and lint"), so torch/diffusers are NOT pulled at import even in a full `gpu`/`detect` install — fits a 512 MB host. `noai.c2pa` does eagerly import the **c2pa-python** binary (Rust + cryptography, ~+5 MB RSS, no torch) for the primary `Reader` path — light enough to stay on the dependency-light host; a broken/absent wheel degrades to the byte-scan parser (`reader_available()` False). The heavy paths are opt-in: `check_invisible=True` needs the `detect`/`trustmark` extras (each pulls **torch**; TrustMark also **downloads weights**), so on a core-only deploy leave `check_invisible` off (it is a no-op there anyway). Before the lazy `__init__`, the mere presence of torch in the env inflated `import identify` to ~420 MB.

 **C2PA platform attribution is device-token-first, issuer-scan fallback** (`_device_platform` scans manifest bytes for `_DEVICE_C2PA_PLATFORM` tokens, then `_attribute_platform`/`_ISSUER_PLATFORM`).

@@ -103,6 +103,8 @@ The 11 survivors are near-white ill-conditioning (reverse-alpha divides by `1-a`

 `_text_mark_engine.py` — **shared base for the three reverse-alpha text-mark engines (Doubao/Jimeng/Samsung), extracted 2026-06-09** (they were ~90% byte-identical clones). `TextMarkEngine(config: TextMarkConfig)` owns the whole `locate → extract_mask → detect → _fixed/_aligned_alpha_map → _apply_reverse_alpha → remove_watermark_reverse_alpha` pipeline (+ the asset-keyed `load_alpha_template`/`glyph_silhouette`/`template_match_score` caches). Each engine module is now a thin subclass: it supplies only its `TextMarkConfig` (the tuned constants, the bundled asset, and the bounded structural deltas — `corner` br/bl, `margin_floor` 4/2, `morph_open_size` 5/3, `min_gw` 8/16) plus the test-facing module shims (`_alpha_template`/`_glyph_silhouette`/`_template_match_score` + the constants). Behavior is byte-exact vs the old per-engine code (the three engine test suites pass unchanged). Gemini stays a SEPARATE engine (its multi-size fixed-slot sparkle model is genuinely different). Add a new text mark = a new `TextMarkConfig` + a thin subclass + one registry `_text_mark(...)` row. The engine bullets below describe each mark's calibration history; the LOGIC lives here.

+**`_apply_reverse_alpha` runs on the glyph crop only:** `amap` is zero outside the glyph `region` (x, y, w, h), so the blend is a no-op there (`(wm - 0)/(1 - 0) == wm`, and a uint8→float32→uint8 round-trip is exact). It copies the frame through and computes the reverse-alpha math on the `region` crop only — byte-identical to the old full-frame pass (verified: Doubao 130 + Jimeng 22 placements, 0 mismatches) but O(glyph) not O(image). The full-frame pass cost ~275 ms on a 12 MP frame for a glyph that is <0.1% of it, once per candidate placement (fixed + aligned ≈ 2×/removal); the crop drops that to ~2 ms. Mirror of the Gemini `_core_and_bg` crop. `remove_watermark_reverse_alpha` passes the `region` each `_fixed/_aligned_alpha_map` returns.
+
 ## `doubao_engine.py`

 `doubao_engine.py` — **a thin `_text_mark_engine.TextMarkEngine` subclass (config only) since 2026-06-09.** visible Doubao "豆包AI生成" remover/detector (cv2/numpy, no GPU). `DoubaoEngine.locate` anchors a bottom-right box by **geometry** (mark scales with image WIDTH), `extract_mask` pulls the light, low-chroma glyphs (the detection candidate) using a per-pixel channel-spread proxy `sat = roi.max(axis=2) - roi.min(axis=2)` (no HSV conversion). `detect` is **shape-consistent**: it matches the bundled alpha glyph silhouette (`assets/doubao_alpha.png`) against the candidate via zero-mean normalized correlation (`_template_match_score`, cv2 `TM_CCOEFF_NORMED`), gated at `DETECT_NCC_THRESHOLD` 0.4 over a small `DETECT_MIN_COVERAGE` floor. Keying on glyph SHAPE (not coverage heuristics) fixed #23 (corpus FP 7/1243).
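The byte-identity claim for the glyph-crop pass can be checked with a small self-contained sketch. The white logo colour, the synthetic image, and the two function names are assumptions for illustration; the arithmetic mirrors the reverse-alpha expression quoted in the text, including its 0.25 denominator floor.

```python
import numpy as np

LOGO_BGR = np.array([255.0, 255.0, 255.0], np.float32)  # assumed white glyph


def reverse_alpha_full(image, amap):
    """Old shape of the math: invert the blend over the whole frame."""
    a3 = np.clip(amap, 0.0, 1.0)[:, :, None]
    out = (image.astype(np.float32) - a3 * LOGO_BGR) / np.clip(1.0 - a3, 0.25, 1.0)
    return np.clip(out, 0, 255).astype(np.uint8)


def reverse_alpha_crop(image, amap, region):
    """New shape: same math on the (x, y, w, h) crop, everything else copied through."""
    out = image.copy()
    x, y, w, h = region
    a3 = np.clip(amap[y : y + h, x : x + w], 0.0, 1.0)[:, :, None]
    roi = out[y : y + h, x : x + w].astype(np.float32)
    out[y : y + h, x : x + w] = np.clip(
        (roi - a3 * LOGO_BGR) / np.clip(1.0 - a3, 0.25, 1.0), 0, 255
    ).astype(np.uint8)
    return out


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(120, 160, 3), dtype=np.uint8)
region = (100, 90, 40, 20)  # x, y, w, h
amap = np.zeros(image.shape[:2], np.float32)  # zero everywhere except the glyph box
amap[90:110, 100:140] = rng.uniform(0.0, 0.6, size=(20, 40)).astype(np.float32)

identical = np.array_equal(
    reverse_alpha_full(image, amap), reverse_alpha_crop(image, amap, region)
)
```

Outside the region the full-frame pass computes `(wm - 0) / 1` in float32 and casts back, which reproduces every uint8 value exactly, so the two outputs agree pixel for pixel.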
@@ -302,11 +302,28 @@ class TextMarkEngine:
         amap[ay : ay + gh, ax : ax + gw] = cv2.resize(at, (gw, gh), interpolation=cv2.INTER_LINEAR)
         return amap, (ax, ay, gw, gh)

-    def _apply_reverse_alpha(self, image: NDArray[Any], amap: NDArray[Any]) -> NDArray[Any]:
-        """Invert the alpha blend with ``amap``: ``original = (wm - a*logo)/(1-a)``."""
-        a3 = np.clip(amap, 0.0, 1.0)[:, :, None]
+    def _apply_reverse_alpha(
+        self, image: NDArray[Any], amap: NDArray[Any], region: tuple[int, int, int, int]
+    ) -> NDArray[Any]:
+        """Invert the alpha blend with ``amap``: ``original = (wm - a*logo)/(1-a)``.
+
+        ``amap`` is zero everywhere except the glyph ``region`` (x, y, w, h), so the
+        blend is a no-op (``(wm - 0)/(1 - 0) == wm``) outside it. Compute the math on
+        the glyph crop only and copy the rest through unchanged -- byte-identical to a
+        full-frame pass (a uint8 round-trip through float32 is exact), but O(glyph)
+        instead of O(image): a full-frame pass costs ~275 ms on a 12 MP frame for a
+        glyph that is <0.1% of it, and it runs once per candidate placement.
+        """
+        out = image.copy()
+        x1, y1, gw, gh = region
+        x2, y2 = x1 + gw, y1 + gh
+        if y1 >= y2 or x1 >= x2:
+            return out
+        a3 = np.clip(amap[y1:y2, x1:x2], 0.0, 1.0)[:, :, None]
         logo = np.array(self.config.alpha_logo_bgr, np.float32)
-        return np.clip((image.astype(np.float32) - a3 * logo) / np.clip(1.0 - a3, 0.25, 1.0), 0, 255).astype(np.uint8)
+        roi = out[y1:y2, x1:x2].astype(np.float32)
+        out[y1:y2, x1:x2] = np.clip((roi - a3 * logo) / np.clip(1.0 - a3, 0.25, 1.0), 0, 255).astype(np.uint8)
+        return out

     def remove_watermark_reverse_alpha(self, image: NDArray[Any], *, residual_inpaint: bool = True) -> NDArray[Any]:
         """Recover the original pixels by inverting the alpha blend, then clear the
@@ -335,8 +352,8 @@ class TextMarkEngine:
         best_out: NDArray[Any] | None = None
         best_amap: NDArray[Any] | None = None
         best_residual = float("inf")
-        for amap, _region in maps:
-            out = self._apply_reverse_alpha(image, amap)
+        for amap, region in maps:
+            out = self._apply_reverse_alpha(image, amap, region)
             residual = self.detect(out).confidence
             if residual < best_residual:
                 best_residual, best_out, best_amap = residual, out, amap
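The reverse-alpha expression in `_apply_reverse_alpha` divides by `np.clip(1.0 - a3, 0.25, 1.0)`. A short sketch of what that floor does to the gain, stated as a general property of the formula and not as the authors' documented rationale; `gain` is a hypothetical helper:

```python
def gain(alpha: float, floor: float = 0.25) -> float:
    """Factor by which reverse-alpha scales an error in the watermarked pixel."""
    denom = min(max(1.0 - alpha, floor), 1.0)  # same clamp as np.clip(1 - a, 0.25, 1)
    return 1.0 / denom


half = gain(0.5)                  # denominator 0.5, below the point where the floor acts
clamped = gain(0.9)               # denominator floored at 0.25
unclamped = gain(0.9, floor=0.0)  # what the raw 1/(1-a) would give
```

With the floor in place the gain never exceeds 4, at the cost of the inversion no longer being exact wherever the alpha exceeds 0.75.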
@@ -142,6 +142,19 @@ class GeminiEngine:
     # gate separates them with a wide margin.
     _OVERSUB_FOOTPRINT_FRAC = 0.05

+    # Mid-tone over-subtraction (2026-06-18 prod "the color just changed, not removed"
+    # report). The numerator fraction above only trips when reverse-alpha drives a
+    # footprint pixel fully NEGATIVE -- the dark-background black-pit case. On a MID-TONE
+    # background a sparkle fainter than the captured alpha is over-subtracted into a
+    # visibly DARKER-than-background diamond while no pixel ever crosses zero, so the
+    # numerator gate misses it and ships the dark mark. Predict the reverse-alpha output
+    # at the bright core, (core - a*logo)/(1-a); when it lands more than this many gray
+    # levels BELOW the local background ring, reverse-alpha would leave a dark diamond --
+    # inpaint instead. Calibrated wide: clean removals predict within ~12 of background
+    # (demo_banana ~-1, a bright-bg sparkle ~-12), the prod regression predicts ~-40 and
+    # the issue #30 dark case ~-82, so 25 separates keep-vs-inpaint with margin.
+    _OVERSUB_DARK_MARGIN = 25.0
+
     # Per-image alpha gain (under-subtraction fix). The captured alpha peaks ~0.51
     # (a ~51%-opaque sparkle). Some real Gemini sparkles are rendered MORE opaque,
     # so the fixed alpha under-subtracts and reverse-alpha leaves a bright residual
@@ -642,19 +655,24 @@ class GeminiEngine:
         a_cap = float(alpha_roi.max())
         if a_cap < 0.2:
             return None
-        gray = image.astype(np.float32).mean(axis=2)
         core = alpha_roi >= a_cap * self._ALPHA_GAIN_CORE_FRAC
         if not bool(core.any()):
             return None
-        core_obs = float(np.percentile(gray[y1:y2, x1:x2][core], 75))
-        # Local background = a ring just outside the footprint box.
+        # Convert only the footprint+ring crop to gray, not the whole image: every
+        # sample below lives inside the ring box, so a full-image mean is wasted work
+        # that scales with resolution (~70 ms on a 12 MP image, recomputed for both
+        # the alpha-gain estimate and the over-subtraction gate). The crop is sized by
+        # the footprint, so this is O(footprint^2) regardless of image size.
         ih, iw = image.shape[:2]
         pad = int((x2 - x1) * 0.7)
         ry1, ry2 = max(0, y1 - pad), min(ih, y2 + pad)
         rx1, rx2 = max(0, x1 - pad), min(iw, x2 + pad)
-        ring = gray[ry1:ry2, rx1:rx2]
+        ring = image[ry1:ry2, rx1:rx2].astype(np.float32).mean(axis=2)
+        # Footprint box expressed in ring-crop coordinates.
+        fy1, fy2, fx1, fx2 = y1 - ry1, y2 - ry1, x1 - rx1, x2 - rx1
+        core_obs = float(np.percentile(ring[fy1:fy2, fx1:fx2][core], 75))
         ring_mask = np.ones(ring.shape, dtype=bool)
-        ring_mask[y1 - ry1 : y2 - ry1, x1 - rx1 : x2 - rx1] = False
+        ring_mask[fy1:fy2, fx1:fx2] = False
         if int(ring_mask.sum()) < 10:
             return None
         return core_obs, float(np.median(ring[ring_mask])), a_cap
@@ -704,11 +722,19 @@ class GeminiEngine:
         alpha_map: NDArray[Any],
         position: tuple[int, int],
     ) -> bool:
-        """True when reverse-alpha would drive the footprint dark (issue #30).
+        """True when reverse-alpha would drive the footprint dark.

-        Tests the numerator ``watermarked - alpha*logo`` over the sparkle body: a
-        brightening overlay can never make it negative, so a large negative fraction
-        means the fixed alpha over-estimates this image's opacity.
+        Two signatures of the captured alpha over-estimating this image's sparkle
+        opacity, either of which means reverse-alpha would leave a dark mark:
+
+        1. Dark-background black pit (issue #30): the numerator
+           ``watermarked - alpha*logo`` over the sparkle body. A brightening overlay
+           can never make it negative, so a large negative fraction means the fixed
+           alpha over-subtracts past black.
+        2. Mid-tone dark diamond (see ``_OVERSUB_DARK_MARGIN``): on a mid-tone
+           background the over-subtraction darkens the core well below the background
+           without any pixel crossing zero, so case 1 misses it. Predict the
+           reverse-alpha core output and trip when it lands far below the local ring.
         """
         placed = self._footprint_indices(alpha_map, position, image.shape)
         if placed is None:
@@ -720,7 +746,18 @@ class GeminiEngine:
         roi = image[y1:y2, x1:x2].astype(np.float32)
         numerator = roi.mean(axis=2) - np.clip(alpha_roi, 0.0, 0.99) * self.logo_value
         frac = float((numerator[body] < 0).sum()) / float(body.sum())
-        return frac > self._OVERSUB_FOOTPRINT_FRAC
+        if frac > self._OVERSUB_FOOTPRINT_FRAC:
+            return True
+
+        # Mid-tone darkening: predict the reverse-alpha output at the bright core and
+        # compare to the local background ring (reuses the FP-gate / alpha-gain machinery).
+        cb = self._core_and_bg(image, alpha_map, position)
+        if cb is None:
+            return False
+        core_obs, bg, a_cap = cb
+        a = min(a_cap, 0.99)
+        predicted_core = (core_obs - a * self.logo_value) / (1.0 - a)
+        return predicted_core < bg - self._OVERSUB_DARK_MARGIN

     def _inpaint_footprint(
         self,
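The `_core_and_bg` change converts only the footprint-plus-ring crop to gray. The sketch below compares the two orderings on a synthetic image; the function names, the `(x1, y1, x2, y2)` box convention, and the test image are assumptions for illustration, while the padding and masking arithmetic follows the hunk.

```python
import numpy as np


def _ring_bounds(shape, box, pad):
    """Clamp the padded ring box to the image; returns (ry1, ry2, rx1, rx2)."""
    x1, y1, x2, y2 = box
    ih, iw = shape[:2]
    return max(0, y1 - pad), min(ih, y2 + pad), max(0, x1 - pad), min(iw, x2 + pad)


def ring_median_full(image, box, pad):
    """Old order: gray-convert the whole frame, then slice the ring out of it."""
    x1, y1, x2, y2 = box
    ry1, ry2, rx1, rx2 = _ring_bounds(image.shape, box, pad)
    gray = image.astype(np.float32).mean(axis=2)  # O(image)
    ring = gray[ry1:ry2, rx1:rx2]
    mask = np.ones(ring.shape, dtype=bool)
    mask[y1 - ry1 : y2 - ry1, x1 - rx1 : x2 - rx1] = False  # exclude the footprint
    return float(np.median(ring[mask]))


def ring_median_crop(image, box, pad):
    """New order: slice the ring crop first, then gray-convert only that."""
    x1, y1, x2, y2 = box
    ry1, ry2, rx1, rx2 = _ring_bounds(image.shape, box, pad)
    ring = image[ry1:ry2, rx1:rx2].astype(np.float32).mean(axis=2)  # O(crop)
    fy1, fy2, fx1, fx2 = y1 - ry1, y2 - ry1, x1 - rx1, x2 - rx1  # ring-crop coords
    mask = np.ones(ring.shape, dtype=bool)
    mask[fy1:fy2, fx1:fx2] = False
    return float(np.median(ring[mask]))


rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(100, 120, 3), dtype=np.uint8)
box = (90, 70, 110, 90)          # footprint near the corner, so the ring is clipped
pad = int((box[2] - box[0]) * 0.7)
same = ring_median_full(image, box, pad) == ring_median_crop(image, box, pad)
```

Because the per-pixel channel mean depends only on that pixel, slicing before or after the conversion yields the same ring values; only the amount of work differs.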
@@ -298,6 +298,28 @@ class TestOverSubtractionGuard:
         dalpha = self.engine.get_interpolated_alpha(dpos[2])
         assert self.engine._reverse_alpha_oversubtracts(dark, dalpha, (dpos[0], dpos[1])) is True

+    def test_midtone_background_does_not_leave_dark_diamond(self):
+        """2026-06-18 prod report: a faint sparkle on a MID-TONE background was
+        over-subtracted into a darker-than-background diamond ("the color just
+        changed, not removed"). No footprint pixel crosses zero there, so the
+        numerator gate misses it -- the dark-margin gate must catch it and inpaint.
+        """
+        image, (x, y, w, h) = self._composite_sparkle(bg_value=160)
+        footprint = image[y : y + h, x : x + w]
+        # The numerator (black-pit) gate alone does NOT fire on a mid-tone background.
+        alpha = self.engine.get_interpolated_alpha(w)
+        roi = footprint.astype(np.float32).mean(axis=2)
+        body = alpha[:h, :w] >= self.engine._FOOTPRINT_ALPHA
+        numerator = roi - np.clip(alpha[:h, :w], 0.0, 0.99) * self.engine.logo_value
+        assert float((numerator[body] < 0).sum()) / float(body.sum()) <= self.engine._OVERSUB_FOOTPRINT_FRAC
+        # ...but the over-subtraction guard still trips (via the dark-margin path) and
+        # removal leaves the footprint reading like the mid-tone background, not darker.
+        assert self.engine._reverse_alpha_oversubtracts(image, alpha, (x, y)) is True
+        out = self.engine.remove_watermark(image)
+        cleaned = out[y : y + h, x : x + w]
+        assert abs(float(cleaned.mean()) - 160.0) < 15.0, f"dark diamond: mean={cleaned.mean()}"
+        assert int(cleaned.min()) > 160 - 30, f"dark pit: min={cleaned.min()}"
+

 class TestUnderSubtractionGain:
     """Under-subtraction fix: a sparkle MORE opaque than the captured alpha must not