fix(visible): inpaint mid-tone Gemini sparkle instead of a dark diamond

The free `visible` path over-subtracted a faint Gemini sparkle on a
mid-tone background into a darker-than-background brown diamond instead
of removing it (2026-06-18 prod NPS report, "the watermark was not
removed, just its color changed"). The existing over-subtraction guard
only tripped when reverse-alpha drove a footprint pixel fully negative
(the issue #30 dark-background black-pit case); on a mid-tone background
the over-subtraction darkens the core well below the background without
any pixel crossing zero, so the gate missed it and shipped the dark mark.

Add a second over-subtraction signal to `_reverse_alpha_oversubtracts`:
predict the reverse-alpha output at the bright core, (core - a*logo)/(1-a),
and route to the footprint inpaint when it lands more than
`_OVERSUB_DARK_MARGIN` (25) gray levels below the local background ring.
The margin is calibrated wide: clean removals predict within ~12 gray levels of
background (demo_banana ~-1), while the prod regression predicts ~-40 and the
issue #30 dark case ~-82. Corpus-validated on the 479 detected Gemini images:
10 switch from reverse-alpha to inpaint, all of them dark-diamond cases that
improve or match; the other 469 stay byte-identical. demo_banana stays on the
reverse-alpha path (byte-identical).
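
To make the two signals concrete, here is a minimal single-pixel sketch of the
gate. It is not the engine code: it assumes a pure white logo (`LOGO = 255`),
uses the captured alpha peak of ~0.51 mentioned in the engine comments, and the
helper names (`composite`, `predicted_core`, `oversubtracts`) and the sparkle
opacities are hypothetical, chosen only for illustration.

```python
CAPTURED_ALPHA = 0.51  # peak of the captured alpha (from the engine comments)
LOGO = 255.0           # assumption: the sparkle logo is pure white
DARK_MARGIN = 25.0     # mirrors _OVERSUB_DARK_MARGIN


def composite(bg, true_alpha, logo=LOGO):
    """Gray level observed at the core after blending the logo over `bg`."""
    return true_alpha * logo + (1.0 - true_alpha) * bg


def predicted_core(core, a=CAPTURED_ALPHA, logo=LOGO):
    """Reverse-alpha output at the core: (core - a*logo) / (1 - a)."""
    a = min(a, 0.99)
    return (core - a * logo) / (1.0 - a)


def oversubtracts(core, bg, a=CAPTURED_ALPHA, logo=LOGO, margin=DARK_MARGIN):
    """Single-pixel analogue of the two over-subtraction signals."""
    # Signal 1 (black pit): the numerator itself goes negative.
    if core - min(a, 0.99) * logo < 0:
        return True
    # Signal 2 (dark diamond): the predicted output lands far below background.
    return predicted_core(core, a, logo) < bg - margin


# A sparkle rendered at exactly the captured opacity is recovered cleanly.
clean = composite(bg=160.0, true_alpha=0.51)
# A fainter sparkle (illustrative opacity 0.20) on a mid-tone background:
# the numerator stays positive, but the predicted core is about 100,
# roughly 60 gray levels below the background of 160.
faint_mid = composite(bg=160.0, true_alpha=0.20)
# The same faint sparkle on a dark background drives the numerator negative.
faint_dark = composite(bg=20.0, true_alpha=0.20)

print(oversubtracts(clean, 160.0))      # False: stay on reverse-alpha
print(oversubtracts(faint_mid, 160.0))  # True: only the margin check catches it
print(oversubtracts(faint_dark, 20.0))  # True: the numerator check catches it
```

The mid-tone case is the one the old gate missed: its numerator is positive, so
only the margin comparison against the background routes it to inpaint.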

Also crop both reverse-alpha helpers to the region they actually touch,
a pure O(image) -> O(mark) win that is byte-identical to the full-frame
math (a uint8<->float32 round-trip is exact):
- `GeminiEngine._core_and_bg` converts only the footprint+ring crop to
  gray, not the whole frame (~70 ms -> 0.1 ms on a 12 MP image; it runs
  for both the alpha-gain estimate and the new gate). Verified identical
  across 479 images; detector confidence unchanged.
- `TextMarkEngine._apply_reverse_alpha` computes the blend on the glyph
  crop only (`amap` is zero outside it, so the math is a no-op there):
  ~275 ms -> ~2 ms per placement on a 12 MP frame, up to 2 placements per
  removal. Verified identical across 142 Doubao/Jimeng placements.
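
The byte-identity claim for the glyph crop can be checked with a small
standalone sketch. The two functions below are simplified stand-ins for the old
full-frame pass and the new crop-only pass (hypothetical names, a made-up white
`LOGO_BGR`, and random test data), not the engine methods themselves.

```python
import numpy as np

LOGO_BGR = (255.0, 255.0, 255.0)  # assumption: white logo, for illustration only


def reverse_alpha_full(image, amap, logo_bgr=LOGO_BGR):
    """Full-frame pass: invert the blend over every pixel."""
    a3 = np.clip(amap, 0.0, 1.0)[:, :, None]
    logo = np.array(logo_bgr, np.float32)
    out = (image.astype(np.float32) - a3 * logo) / np.clip(1.0 - a3, 0.25, 1.0)
    return np.clip(out, 0, 255).astype(np.uint8)


def reverse_alpha_crop(image, amap, region, logo_bgr=LOGO_BGR):
    """Crop-only pass: copy the frame, invert the blend inside `region` only."""
    out = image.copy()
    x, y, w, h = region
    if w <= 0 or h <= 0:
        return out
    a3 = np.clip(amap[y : y + h, x : x + w], 0.0, 1.0)[:, :, None]
    logo = np.array(logo_bgr, np.float32)
    roi = out[y : y + h, x : x + w].astype(np.float32)
    out[y : y + h, x : x + w] = np.clip(
        (roi - a3 * logo) / np.clip(1.0 - a3, 0.25, 1.0), 0, 255
    ).astype(np.uint8)
    return out


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 96, 3), dtype=np.uint8)
region = (70, 50, 20, 10)  # (x, y, w, h) of the glyph box
amap = np.zeros(image.shape[:2], np.float32)
# Alpha is nonzero only inside the glyph region, as in the engine.
amap[50:60, 70:90] = rng.random((10, 20), dtype=np.float32) * 0.6

full = reverse_alpha_full(image, amap)
crop = reverse_alpha_crop(image, amap, region)
print(np.array_equal(full, crop))
```

Outside the region the alpha is zero, so the full-frame math reduces to a
uint8 to float32 to uint8 round-trip, which is exact; inside it both passes run
the same element-wise float32 arithmetic.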

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Victor Kuznetsov
2026-06-18 17:19:41 -07:00
parent 09fdb4544a
commit 41f67973ce
4 changed files with 96 additions and 18 deletions
+4 -2
@@ -11,7 +11,7 @@ module.
## `noai/c2pa.py`
`noai/c2pa.py`PNG chunk parser; use `extract_c2pa_chunk(path)` to get raw caBX payload, `has_c2pa_metadata(path)` to detect. Do not reimplement chunk parsing. `extract_c2pa_info(path)` sets `synthid_watermark`/`synthid_vendors` when the manifest is signed by a SynthID-using vendor, and `soft_binding`/`soft_binding_vendors` when a `c2pa.soft-binding` `alg` names a forensic-watermark vendor (`soft_binding_vendors_in(buffer)` is the shared byte-scan, used by both the PNG parser and the non-PNG binary path). PNG/caBX chunk reads are clamped to the remaining file size (`safe_length = min(length, remaining)`; skipped chunks use seek) so a malformed huge `length` cannot drive a multi-GB allocation (shared safety discipline matching `isobmff.scan_c2pa_region`).
`noai/c2pa.py`C2PA reading, **official c2pa-python `Reader` first, hand-rolled parser as fallback** (migrated 2026-06-18; the official lib is a core dep, MIT/Apache, spec-tracking). `read_manifest_store_json(path)` runs `Reader.try_create` with a default `Context` (NO trust enforcement — we report what is in the file, we do not gate on cert trust) and returns the **whole** manifest-store JSON (every manifest plus ingredient manifests); it is memoized per (path, mtime) (`lru_cache(maxsize=8)`) because one `identify`/`get_ai_metadata` call invokes the structured parser ~3x on the same file. `extract_c2pa_info(path)` builds its dict from that store JSON (`_info_from_store_json`: structured `claim_generator` from the active manifest's `claim_generator` / `claim_generator_info[].name`, `timestamp` from `signature_info.time`) and falls back to the legacy caBX parser (`_extract_c2pa_info_png`) when the reader is unavailable (broken/absent wheel, `reader_available()` False) or finds no parseable manifest (synthetic/partial test blobs, the inject round-trip's re-stitched chunk). **Both paths share `_populate_registry_fields(buf, info)`** — the issuer / AI-tool / action / source-type / SynthID / soft-binding registry byte-scan applied to the store JSON (reader path) or the raw caBX bytes (fallback) — so the return-dict shape is identical and the registry stays the single source of truth. Whole-store scanning is load-bearing: a ChatGPT *edit* of a Sora generation keeps `trainedAlgorithmicMedia` + issuer "OpenAI" on the **parent/ingredient** manifest, not the active "opened" one (the active manifest's `signature_info.issuer` is "OpenAI", `common_name` "Truepic Lens CLI in Sora", so the issuer field now reads "OpenAI, Truepic" — first-match-wins platform attribution still resolves OpenAI). 
`extract_c2pa_info` now also serves non-PNG containers (JPEG/AVIF/MP4) structurally via the reader; the consumers (`identify`, `synthid_source`, `get_ai_metadata`) already merge `info OR byte-scan`, so this strictly upgrades the non-PNG path with no double-counting. `synthid_watermark`/`synthid_vendors` is set when the manifest is signed by a SynthID-using vendor on AI content; `soft_binding`/`soft_binding_vendors` when a `c2pa.soft-binding` `alg` names a forensic-watermark vendor (`soft_binding_vendors_in(buffer)` is the shared byte-scan, used by both paths and the non-PNG binary path). `extract_c2pa_chunk` / `inject_c2pa_chunk` / `has_c2pa_metadata` stay the PNG caBX byte tools (raw-chunk extraction for `extractor.py`, test injection, fallback detection). PNG/caBX chunk reads are clamped to the remaining file size (`safe_length = min(length, remaining)`; skipped chunks use seek) so a malformed huge `length` cannot drive a multi-GB allocation (shared safety discipline matching `isobmff.scan_c2pa_region`). Regression-guarded by `tests/test_noai.py::TestC2PARealSamples::{test_extract_info_uses_reader_store,test_fallback_to_png_parser_when_reader_unavailable}`.
## `noai/constants.py`
@@ -29,7 +29,7 @@ module.
**Visible-mark detection** (`check_visible`, signals `visible_sparkle` / `visible_doubao` / `visible_jimeng` / `visible_samsung`): the Gemini sparkle keeps its own file-level path (`_visible_sparkle` → `gemini_engine.detect_sparkle_confidence`, promoted only at confidence ≥ `_SPARKLE_THRESHOLD` 0.5; corpus-tuned to separate Gemini sparkles ≥0.56 from non-sparkle ≤0.49), while Doubao/Jimeng/Samsung reuse the registry detectors (`_visible_text_marks` → `watermark_registry`, iterating `_VISIBLE_MARK_PLATFORM`), each gated by its own engine NCC threshold via `MarkDetection.detected` (Doubao 0.4, Jimeng 0.45, Samsung 0.4). Doubao/Jimeng are normally also caught by the TC260 AIGC metadata label and Samsung by its C2PA + `genAIType` marker, so the visible path is their stripped-metadata fallback. Visible marks set `platform` only when no harder signal already did, and (like the sparkle) are excluded from integrity-clash vendor claims. The cv2 dependency lives in the engines, not here.
**`import identify` is deliberately light** (~21 MB; ~36 MB with cv2 loaded by a visible-mark run, ~106 MB for a full `check_visible` run): it imports only the pure `noai.c2pa`/`noai.constants` submodules, and `noai/__init__` is lazy (see "Test and lint"), so torch/diffusers are NOT pulled at import even in a full `gpu`/`detect` install — fits a 512 MB host. The heavy paths are opt-in: `check_invisible=True` needs the `detect`/`trustmark` extras (each pulls **torch**; TrustMark also **downloads weights**), so on a core-only deploy leave `check_invisible` off (it is a no-op there anyway). Before the lazy `__init__`, the mere presence of torch in the env inflated `import identify` to ~420 MB.
**`import identify` is deliberately light** (~26 MB; ~36 MB with cv2 loaded by a visible-mark run, ~106 MB for a full `check_visible` run): it imports the `noai.c2pa`/`noai.constants` submodules, and `noai/__init__` is lazy (see "Test and lint"), so torch/diffusers are NOT pulled at import even in a full `gpu`/`detect` install — fits a 512 MB host. `noai.c2pa` does eagerly import the **c2pa-python** binary (Rust + cryptography, ~+5 MB RSS, no torch) for the primary `Reader` path — light enough to stay on the dependency-light host; a broken/absent wheel degrades to the byte-scan parser (`reader_available()` False). The heavy paths are opt-in: `check_invisible=True` needs the `detect`/`trustmark` extras (each pulls **torch**; TrustMark also **downloads weights**), so on a core-only deploy leave `check_invisible` off (it is a no-op there anyway). Before the lazy `__init__`, the mere presence of torch in the env inflated `import identify` to ~420 MB.
**C2PA platform attribution is device-token-first, issuer-scan fallback** (`_device_platform` scans manifest bytes for `_DEVICE_C2PA_PLATFORM` tokens, then `_attribute_platform`/`_ISSUER_PLATFORM`).
@@ -103,6 +103,8 @@ The 11 survivors are near-white ill-conditioning (reverse-alpha divides by `1-a`
`_text_mark_engine.py`**shared base for the three reverse-alpha text-mark engines (Doubao/Jimeng/Samsung), extracted 2026-06-09** (they were ~90% byte-identical clones). `TextMarkEngine(config: TextMarkConfig)` owns the whole `locate → extract_mask → detect → _fixed/_aligned_alpha_map → _apply_reverse_alpha → remove_watermark_reverse_alpha` pipeline (+ the asset-keyed `load_alpha_template`/`glyph_silhouette`/`template_match_score` caches). Each engine module is now a thin subclass: it supplies only its `TextMarkConfig` (the tuned constants, the bundled asset, and the bounded structural deltas — `corner` br/bl, `margin_floor` 4/2, `morph_open_size` 5/3, `min_gw` 8/16) plus the test-facing module shims (`_alpha_template`/`_glyph_silhouette`/`_template_match_score` + the constants). Behavior is byte-exact vs the old per-engine code (the three engine test suites pass unchanged). Gemini stays a SEPARATE engine (its multi-size fixed-slot sparkle model is genuinely different). Add a new text mark = a new `TextMarkConfig` + a thin subclass + one registry `_text_mark(...)` row. The engine bullets below describe each mark's calibration history; the LOGIC lives here.
**`_apply_reverse_alpha` runs on the glyph crop only:** `amap` is zero outside the glyph `region` (x, y, w, h), so the blend is a no-op there (`(wm - 0)/(1 - 0) == wm`, and a uint8→float32→uint8 round-trip is exact). It copies the frame through and computes the reverse-alpha math on the `region` crop only — byte-identical to the old full-frame pass (verified: Doubao 130 + Jimeng 22 placements, 0 mismatches) but O(glyph) not O(image). The full-frame pass cost ~275 ms on a 12 MP frame for a glyph that is <0.1% of it, once per candidate placement (fixed + aligned ≈ 2×/removal); the crop drops that to ~2 ms. Mirror of the Gemini `_core_and_bg` crop. `remove_watermark_reverse_alpha` passes the `region` each `_fixed/_aligned_alpha_map` returns.
## `doubao_engine.py`
`doubao_engine.py`**a thin `_text_mark_engine.TextMarkEngine` subclass (config only) since 2026-06-09.** visible Doubao "豆包AI生成" remover/detector (cv2/numpy, no GPU). `DoubaoEngine.locate` anchors a bottom-right box by **geometry** (mark scales with image WIDTH), `extract_mask` pulls the light, low-chroma glyphs (the detection candidate) using a per-pixel channel-spread proxy `sat = roi.max(axis=2) - roi.min(axis=2)` (no HSV conversion). `detect` is **shape-consistent**: it matches the bundled alpha glyph silhouette (`assets/doubao_alpha.png`) against the candidate via zero-mean normalized correlation (`_template_match_score`, cv2 `TM_CCOEFF_NORMED`), gated at `DETECT_NCC_THRESHOLD` 0.4 over a small `DETECT_MIN_COVERAGE` floor. Keying on glyph SHAPE (not coverage heuristics) fixed #23 (corpus FP 7/1243).
+23 -6
@@ -302,11 +302,28 @@ class TextMarkEngine:
         amap[ay : ay + gh, ax : ax + gw] = cv2.resize(at, (gw, gh), interpolation=cv2.INTER_LINEAR)
         return amap, (ax, ay, gw, gh)

-    def _apply_reverse_alpha(self, image: NDArray[Any], amap: NDArray[Any]) -> NDArray[Any]:
-        """Invert the alpha blend with ``amap``: ``original = (wm - a*logo)/(1-a)``."""
-        a3 = np.clip(amap, 0.0, 1.0)[:, :, None]
+    def _apply_reverse_alpha(
+        self, image: NDArray[Any], amap: NDArray[Any], region: tuple[int, int, int, int]
+    ) -> NDArray[Any]:
+        """Invert the alpha blend with ``amap``: ``original = (wm - a*logo)/(1-a)``.
+
+        ``amap`` is zero everywhere except the glyph ``region`` (x, y, w, h), so the
+        blend is a no-op (``(wm - 0)/(1 - 0) == wm``) outside it. Compute the math on
+        the glyph crop only and copy the rest through unchanged -- byte-identical to a
+        full-frame pass (a uint8 round-trip through float32 is exact), but O(glyph)
+        instead of O(image): a full-frame pass costs ~275 ms on a 12 MP frame for a
+        glyph that is <0.1% of it, and it runs once per candidate placement.
+        """
+        out = image.copy()
+        x1, y1, gw, gh = region
+        x2, y2 = x1 + gw, y1 + gh
+        if y1 >= y2 or x1 >= x2:
+            return out
+        a3 = np.clip(amap[y1:y2, x1:x2], 0.0, 1.0)[:, :, None]
         logo = np.array(self.config.alpha_logo_bgr, np.float32)
-        return np.clip((image.astype(np.float32) - a3 * logo) / np.clip(1.0 - a3, 0.25, 1.0), 0, 255).astype(np.uint8)
+        roi = out[y1:y2, x1:x2].astype(np.float32)
+        out[y1:y2, x1:x2] = np.clip((roi - a3 * logo) / np.clip(1.0 - a3, 0.25, 1.0), 0, 255).astype(np.uint8)
+        return out

     def remove_watermark_reverse_alpha(self, image: NDArray[Any], *, residual_inpaint: bool = True) -> NDArray[Any]:
         """Recover the original pixels by inverting the alpha blend, then clear the
@@ -335,8 +352,8 @@ class TextMarkEngine:
         best_out: NDArray[Any] | None = None
         best_amap: NDArray[Any] | None = None
         best_residual = float("inf")
-        for amap, _region in maps:
-            out = self._apply_reverse_alpha(image, amap)
+        for amap, region in maps:
+            out = self._apply_reverse_alpha(image, amap, region)
             residual = self.detect(out).confidence
             if residual < best_residual:
                 best_residual, best_out, best_amap = residual, out, amap
+47 -10
@@ -142,6 +142,19 @@ class GeminiEngine:
     # gate separates them with a wide margin.
     _OVERSUB_FOOTPRINT_FRAC = 0.05

+    # Mid-tone over-subtraction (2026-06-18 prod "the color just changed, not removed"
+    # report). The numerator fraction above only trips when reverse-alpha drives a
+    # footprint pixel fully NEGATIVE -- the dark-background black-pit case. On a MID-TONE
+    # background a sparkle fainter than the captured alpha is over-subtracted into a
+    # visibly DARKER-than-background diamond while no pixel ever crosses zero, so the
+    # numerator gate misses it and ships the dark mark. Predict the reverse-alpha output
+    # at the bright core, (core - a*logo)/(1-a); when it lands more than this many gray
+    # levels BELOW the local background ring, reverse-alpha would leave a dark diamond --
+    # inpaint instead. Calibrated wide: clean removals predict within ~12 of background
+    # (demo_banana ~-1, a bright-bg sparkle ~-12), the prod regression predicts ~-40 and
+    # the issue #30 dark case ~-82, so 25 separates keep-vs-inpaint with margin.
+    _OVERSUB_DARK_MARGIN = 25.0
+
     # Per-image alpha gain (under-subtraction fix). The captured alpha peaks ~0.51
     # (a ~51%-opaque sparkle). Some real Gemini sparkles are rendered MORE opaque,
     # so the fixed alpha under-subtracts and reverse-alpha leaves a bright residual
@@ -642,19 +655,24 @@ class GeminiEngine:
         a_cap = float(alpha_roi.max())
         if a_cap < 0.2:
             return None
-        gray = image.astype(np.float32).mean(axis=2)
         core = alpha_roi >= a_cap * self._ALPHA_GAIN_CORE_FRAC
         if not bool(core.any()):
             return None
-        core_obs = float(np.percentile(gray[y1:y2, x1:x2][core], 75))
-        # Local background = a ring just outside the footprint box.
+        # Convert only the footprint+ring crop to gray, not the whole image: every
+        # sample below lives inside the ring box, so a full-image mean is wasted work
+        # that scales with resolution (~70 ms on a 12 MP image, recomputed for both
+        # the alpha-gain estimate and the over-subtraction gate). The crop is sized by
+        # the footprint, so this is O(footprint^2) regardless of image size.
         ih, iw = image.shape[:2]
         pad = int((x2 - x1) * 0.7)
         ry1, ry2 = max(0, y1 - pad), min(ih, y2 + pad)
         rx1, rx2 = max(0, x1 - pad), min(iw, x2 + pad)
-        ring = gray[ry1:ry2, rx1:rx2]
+        ring = image[ry1:ry2, rx1:rx2].astype(np.float32).mean(axis=2)
+        # Footprint box expressed in ring-crop coordinates.
+        fy1, fy2, fx1, fx2 = y1 - ry1, y2 - ry1, x1 - rx1, x2 - rx1
+        core_obs = float(np.percentile(ring[fy1:fy2, fx1:fx2][core], 75))
         ring_mask = np.ones(ring.shape, dtype=bool)
-        ring_mask[y1 - ry1 : y2 - ry1, x1 - rx1 : x2 - rx1] = False
+        ring_mask[fy1:fy2, fx1:fx2] = False
         if int(ring_mask.sum()) < 10:
             return None
         return core_obs, float(np.median(ring[ring_mask])), a_cap
@@ -704,11 +722,19 @@ class GeminiEngine:
         alpha_map: NDArray[Any],
         position: tuple[int, int],
     ) -> bool:
-        """True when reverse-alpha would drive the footprint dark (issue #30).
+        """True when reverse-alpha would drive the footprint dark.

-        Tests the numerator ``watermarked - alpha*logo`` over the sparkle body: a
-        brightening overlay can never make it negative, so a large negative fraction
-        means the fixed alpha over-estimates this image's opacity.
+        Two signatures of the captured alpha over-estimating this image's sparkle
+        opacity, either of which means reverse-alpha would leave a dark mark:
+
+        1. Dark-background black pit (issue #30): the numerator
+           ``watermarked - alpha*logo`` over the sparkle body. A brightening overlay
+           can never make it negative, so a large negative fraction means the fixed
+           alpha over-subtracts past black.
+        2. Mid-tone dark diamond (see ``_OVERSUB_DARK_MARGIN``): on a mid-tone
+           background the over-subtraction darkens the core well below the background
+           without any pixel crossing zero, so case 1 misses it. Predict the
+           reverse-alpha core output and trip when it lands far below the local ring.
         """
         placed = self._footprint_indices(alpha_map, position, image.shape)
         if placed is None:
@@ -720,7 +746,18 @@ class GeminiEngine:
         roi = image[y1:y2, x1:x2].astype(np.float32)
         numerator = roi.mean(axis=2) - np.clip(alpha_roi, 0.0, 0.99) * self.logo_value
         frac = float((numerator[body] < 0).sum()) / float(body.sum())
-        return frac > self._OVERSUB_FOOTPRINT_FRAC
+        if frac > self._OVERSUB_FOOTPRINT_FRAC:
+            return True
+
+        # Mid-tone darkening: predict the reverse-alpha output at the bright core and
+        # compare to the local background ring (reuses the FP-gate / alpha-gain machinery).
+        cb = self._core_and_bg(image, alpha_map, position)
+        if cb is None:
+            return False
+        core_obs, bg, a_cap = cb
+        a = min(a_cap, 0.99)
+        predicted_core = (core_obs - a * self.logo_value) / (1.0 - a)
+        return predicted_core < bg - self._OVERSUB_DARK_MARGIN

     def _inpaint_footprint(
         self,
+22
@@ -298,6 +298,28 @@ class TestOverSubtractionGuard:
         dalpha = self.engine.get_interpolated_alpha(dpos[2])
         assert self.engine._reverse_alpha_oversubtracts(dark, dalpha, (dpos[0], dpos[1])) is True

+    def test_midtone_background_does_not_leave_dark_diamond(self):
+        """2026-06-18 prod report: a faint sparkle on a MID-TONE background was
+        over-subtracted into a darker-than-background diamond ("the color just
+        changed, not removed"). No footprint pixel crosses zero there, so the
+        numerator gate misses it -- the dark-margin gate must catch it and inpaint.
+        """
+        image, (x, y, w, h) = self._composite_sparkle(bg_value=160)
+        footprint = image[y : y + h, x : x + w]
+        # The numerator (black-pit) gate alone does NOT fire on a mid-tone background.
+        alpha = self.engine.get_interpolated_alpha(w)
+        roi = footprint.astype(np.float32).mean(axis=2)
+        body = alpha[:h, :w] >= self.engine._FOOTPRINT_ALPHA
+        numerator = roi - np.clip(alpha[:h, :w], 0.0, 0.99) * self.engine.logo_value
+        assert float((numerator[body] < 0).sum()) / float(body.sum()) <= self.engine._OVERSUB_FOOTPRINT_FRAC
+        # ...but the over-subtraction guard still trips (via the dark-margin path) and
+        # removal leaves the footprint reading like the mid-tone background, not darker.
+        assert self.engine._reverse_alpha_oversubtracts(image, alpha, (x, y)) is True
+        out = self.engine.remove_watermark(image)
+        cleaned = out[y : y + h, x : x + w]
+        assert abs(float(cleaned.mean()) - 160.0) < 15.0, f"dark diamond: mean={cleaned.mean()}"
+        assert int(cleaned.min()) > 160 - 30, f"dark pit: min={cleaned.min()}"
+

 class TestUnderSubtractionGain:
     """Under-subtraction fix: a sparkle MORE opaque than the captured alpha must not