diff --git a/docs/module-internals.md b/docs/module-internals.md
index 0fe3b2a..3818e05 100644
--- a/docs/module-internals.md
+++ b/docs/module-internals.md
@@ -11,7 +11,7 @@ module.
 
 ## `noai/c2pa.py`
 
-`noai/c2pa.py` — PNG chunk parser; use `extract_c2pa_chunk(path)` to get raw caBX payload, `has_c2pa_metadata(path)` to detect. Do not reimplement chunk parsing. `extract_c2pa_info(path)` sets `synthid_watermark`/`synthid_vendors` when the manifest is signed by a SynthID-using vendor, and `soft_binding`/`soft_binding_vendors` when a `c2pa.soft-binding` `alg` names a forensic-watermark vendor (`soft_binding_vendors_in(buffer)` is the shared byte-scan, used by both the PNG parser and the non-PNG binary path). PNG/caBX chunk reads are clamped to the remaining file size (`safe_length = min(length, remaining)`; skipped chunks use seek) so a malformed huge `length` cannot drive a multi-GB allocation (shared safety discipline matching `isobmff.scan_c2pa_region`).
+`noai/c2pa.py` — C2PA reading, **official c2pa-python `Reader` first, hand-rolled parser as fallback** (migrated 2026-06-18; the official lib is a core dep, MIT/Apache, spec-tracking). `read_manifest_store_json(path)` runs `Reader.try_create` with a default `Context` (NO trust enforcement — we report what is in the file, we do not gate on cert trust) and returns the **whole** manifest-store JSON (every manifest plus ingredient manifests); it is memoized per (path, mtime) (`lru_cache(maxsize=8)`) because one `identify`/`get_ai_metadata` call invokes the structured parser ~3x on the same file. `extract_c2pa_info(path)` builds its dict from that store JSON (`_info_from_store_json`: structured `claim_generator` from the active manifest's `claim_generator` / `claim_generator_info[].name`, `timestamp` from `signature_info.time`) and falls back to the legacy caBX parser (`_extract_c2pa_info_png`) when the reader is unavailable (broken/absent wheel, `reader_available()` False) or finds no parseable manifest (synthetic/partial test blobs, the inject round-trip's re-stitched chunk). **Both paths share `_populate_registry_fields(buf, info)`** — the issuer / AI-tool / action / source-type / SynthID / soft-binding registry byte-scan applied to the store JSON (reader path) or the raw caBX bytes (fallback) — so the return-dict shape is identical and the registry stays the single source of truth. Whole-store scanning is load-bearing: a ChatGPT *edit* of a Sora generation keeps `trainedAlgorithmicMedia` + issuer "OpenAI" on the **parent/ingredient** manifest, not the active "opened" one (the active manifest's `signature_info.issuer` is "OpenAI", `common_name` "Truepic Lens CLI in Sora", so the issuer field now reads "OpenAI, Truepic" — first-match-wins platform attribution still resolves OpenAI). `extract_c2pa_info` now also serves non-PNG containers (JPEG/AVIF/MP4) structurally via the reader; the consumers (`identify`, `synthid_source`, `get_ai_metadata`) already merge `info OR byte-scan`, so this strictly upgrades the non-PNG path with no double-counting. `synthid_watermark`/`synthid_vendors` is set when the manifest is signed by a SynthID-using vendor on AI content; `soft_binding`/`soft_binding_vendors` when a `c2pa.soft-binding` `alg` names a forensic-watermark vendor (`soft_binding_vendors_in(buffer)` is the shared byte-scan, used by both paths and the non-PNG binary path). `extract_c2pa_chunk` / `inject_c2pa_chunk` / `has_c2pa_metadata` stay the PNG caBX byte tools (raw-chunk extraction for `extractor.py`, test injection, fallback detection). PNG/caBX chunk reads are clamped to the remaining file size (`safe_length = min(length, remaining)`; skipped chunks use seek) so a malformed huge `length` cannot drive a multi-GB allocation (shared safety discipline matching `isobmff.scan_c2pa_region`). Regression-guarded by `tests/test_noai.py::TestC2PARealSamples::{test_extract_info_uses_reader_store,test_fallback_to_png_parser_when_reader_unavailable}`.
 
 ## `noai/constants.py`
 
@@ -29,7 +29,7 @@ module.
 
 **Visible-mark detection** (`check_visible`, signals `visible_sparkle` / `visible_doubao` / `visible_jimeng` / `visible_samsung`): the Gemini sparkle keeps its own file-level path (`_visible_sparkle` → `gemini_engine.detect_sparkle_confidence`, promoted only at confidence ≥ `_SPARKLE_THRESHOLD` 0.5; corpus-tuned to separate Gemini sparkles ≥0.56 from non-sparkle ≤0.49), while Doubao/Jimeng/Samsung reuse the registry detectors (`_visible_text_marks` → `watermark_registry`, iterating `_VISIBLE_MARK_PLATFORM`), each gated by its own engine NCC threshold via `MarkDetection.detected` (Doubao 0.4, Jimeng 0.45, Samsung 0.4). Doubao/Jimeng are normally also caught by the TC260 AIGC metadata label and Samsung by its C2PA + `genAIType` marker, so the visible path is their stripped-metadata fallback. Visible marks set `platform` only when no harder signal already did, and (like the sparkle) are excluded from integrity-clash vendor claims. The cv2 dependency lives in the engines, not here.
 
-**`import identify` is deliberately light** (~21 MB; ~36 MB with cv2 loaded by a visible-mark run, ~106 MB for a full `check_visible` run): it imports only the pure `noai.c2pa`/`noai.constants` submodules, and `noai/__init__` is lazy (see "Test and lint"), so torch/diffusers are NOT pulled at import even in a full `gpu`/`detect` install — fits a 512 MB host. The heavy paths are opt-in: `check_invisible=True` needs the `detect`/`trustmark` extras (each pulls **torch**; TrustMark also **downloads weights**), so on a core-only deploy leave `check_invisible` off (it is a no-op there anyway). Before the lazy `__init__`, the mere presence of torch in the env inflated `import identify` to ~420 MB.
+**`import identify` is deliberately light** (~26 MB; ~36 MB with cv2 loaded by a visible-mark run, ~106 MB for a full `check_visible` run): it imports the `noai.c2pa`/`noai.constants` submodules, and `noai/__init__` is lazy (see "Test and lint"), so torch/diffusers are NOT pulled at import even in a full `gpu`/`detect` install — fits a 512 MB host. `noai.c2pa` does eagerly import the **c2pa-python** binary (Rust + cryptography, ~+5 MB RSS, no torch) for the primary `Reader` path — light enough to stay on the dependency-light host; a broken/absent wheel degrades to the byte-scan parser (`reader_available()` False). The heavy paths are opt-in: `check_invisible=True` needs the `detect`/`trustmark` extras (each pulls **torch**; TrustMark also **downloads weights**), so on a core-only deploy leave `check_invisible` off (it is a no-op there anyway). Before the lazy `__init__`, the mere presence of torch in the env inflated `import identify` to ~420 MB.
 
 **C2PA platform attribution is device-token-first, issuer-scan fallback** (`_device_platform` scans manifest bytes for `_DEVICE_C2PA_PLATFORM` tokens, then `_attribute_platform`/`_ISSUER_PLATFORM`).
 
@@ -103,6 +103,8 @@ The 11 survivors are near-white ill-conditioning (reverse-alpha divides by `1-a`
 
 `_text_mark_engine.py` — **shared base for the three reverse-alpha text-mark engines (Doubao/Jimeng/Samsung), extracted 2026-06-09** (they were ~90% byte-identical clones). `TextMarkEngine(config: TextMarkConfig)` owns the whole `locate → extract_mask → detect → _fixed/_aligned_alpha_map → _apply_reverse_alpha → remove_watermark_reverse_alpha` pipeline (+ the asset-keyed `load_alpha_template`/`glyph_silhouette`/`template_match_score` caches). Each engine module is now a thin subclass: it supplies only its `TextMarkConfig` (the tuned constants, the bundled asset, and the bounded structural deltas — `corner` br/bl, `margin_floor` 4/2, `morph_open_size` 5/3, `min_gw` 8/16) plus the test-facing module shims (`_alpha_template`/`_glyph_silhouette`/`_template_match_score` + the constants). Behavior is byte-exact vs the old per-engine code (the three engine test suites pass unchanged). Gemini stays a SEPARATE engine (its multi-size fixed-slot sparkle model is genuinely different). Add a new text mark = a new `TextMarkConfig` + a thin subclass + one registry `_text_mark(...)` row. The engine bullets below describe each mark's calibration history; the LOGIC lives here.
 
+**`_apply_reverse_alpha` runs on the glyph crop only:** `amap` is zero outside the glyph `region` (x, y, w, h), so the blend is a no-op there (`(wm - 0)/(1 - 0) == wm`, and a uint8→float32→uint8 round-trip is exact). It copies the frame through and computes the reverse-alpha math on the `region` crop only — byte-identical to the old full-frame pass (verified: Doubao 130 + Jimeng 22 placements, 0 mismatches) but O(glyph) not O(image). The full-frame pass cost ~275 ms on a 12 MP frame for a glyph that is <0.1% of it, once per candidate placement (fixed + aligned ≈ 2×/removal); the crop drops that to ~2 ms. Mirror of the Gemini `_core_and_bg` crop. `remove_watermark_reverse_alpha` passes the `region` each `_fixed/_aligned_alpha_map` returns.
+
 ## `doubao_engine.py`
 
 `doubao_engine.py` — **a thin `_text_mark_engine.TextMarkEngine` subclass (config only) since 2026-06-09.** visible Doubao "豆包AI生成" remover/detector (cv2/numpy, no GPU). `DoubaoEngine.locate` anchors a bottom-right box by **geometry** (mark scales with image WIDTH), `extract_mask` pulls the light, low-chroma glyphs (the detection candidate) using a per-pixel channel-spread proxy `sat = roi.max(axis=2) - roi.min(axis=2)` (no HSV conversion). `detect` is **shape-consistent**: it matches the bundled alpha glyph silhouette (`assets/doubao_alpha.png`) against the candidate via zero-mean normalized correlation (`_template_match_score`, cv2 `TM_CCOEFF_NORMED`), gated at `DETECT_NCC_THRESHOLD` 0.4 over a small `DETECT_MIN_COVERAGE` floor. Keying on glyph SHAPE (not coverage heuristics) fixed #23 (corpus FP 7/1243).
diff --git a/src/remove_ai_watermarks/_text_mark_engine.py b/src/remove_ai_watermarks/_text_mark_engine.py
index 8fae06d..708a80a 100644
--- a/src/remove_ai_watermarks/_text_mark_engine.py
+++ b/src/remove_ai_watermarks/_text_mark_engine.py
@@ -302,11 +302,28 @@ class TextMarkEngine:
         amap[ay : ay + gh, ax : ax + gw] = cv2.resize(at, (gw, gh), interpolation=cv2.INTER_LINEAR)
         return amap, (ax, ay, gw, gh)
 
-    def _apply_reverse_alpha(self, image: NDArray[Any], amap: NDArray[Any]) -> NDArray[Any]:
-        """Invert the alpha blend with ``amap``: ``original = (wm - a*logo)/(1-a)``."""
-        a3 = np.clip(amap, 0.0, 1.0)[:, :, None]
+    def _apply_reverse_alpha(
+        self, image: NDArray[Any], amap: NDArray[Any], region: tuple[int, int, int, int]
+    ) -> NDArray[Any]:
+        """Invert the alpha blend with ``amap``: ``original = (wm - a*logo)/(1-a)``.
+
+        ``amap`` is zero everywhere except the glyph ``region`` (x, y, w, h), so the
+        blend is a no-op (``(wm - 0)/(1 - 0) == wm``) outside it. Compute the math on
+        the glyph crop only and copy the rest through unchanged -- byte-identical to a
+        full-frame pass (a uint8 round-trip through float32 is exact), but O(glyph)
+        instead of O(image): a full-frame pass costs ~275 ms on a 12 MP frame for a
+        glyph that is <0.1% of it, and it runs once per candidate placement.
+        """
+        out = image.copy()
+        x1, y1, gw, gh = region
+        x2, y2 = x1 + gw, y1 + gh
+        if y1 >= y2 or x1 >= x2:
+            return out
+        a3 = np.clip(amap[y1:y2, x1:x2], 0.0, 1.0)[:, :, None]
         logo = np.array(self.config.alpha_logo_bgr, np.float32)
-        return np.clip((image.astype(np.float32) - a3 * logo) / np.clip(1.0 - a3, 0.25, 1.0), 0, 255).astype(np.uint8)
+        roi = out[y1:y2, x1:x2].astype(np.float32)
+        out[y1:y2, x1:x2] = np.clip((roi - a3 * logo) / np.clip(1.0 - a3, 0.25, 1.0), 0, 255).astype(np.uint8)
+        return out
 
     def remove_watermark_reverse_alpha(self, image: NDArray[Any], *, residual_inpaint: bool = True) -> NDArray[Any]:
         """Recover the original pixels by inverting the alpha blend, then clear the
@@ -335,8 +352,8 @@ class TextMarkEngine:
         best_out: NDArray[Any] | None = None
         best_amap: NDArray[Any] | None = None
         best_residual = float("inf")
-        for amap, _region in maps:
-            out = self._apply_reverse_alpha(image, amap)
+        for amap, region in maps:
+            out = self._apply_reverse_alpha(image, amap, region)
             residual = self.detect(out).confidence
             if residual < best_residual:
                 best_residual, best_out, best_amap = residual, out, amap
diff --git a/src/remove_ai_watermarks/gemini_engine.py b/src/remove_ai_watermarks/gemini_engine.py
index f308709..97385f6 100644
--- a/src/remove_ai_watermarks/gemini_engine.py
+++ b/src/remove_ai_watermarks/gemini_engine.py
@@ -142,6 +142,19 @@ class GeminiEngine:
     # gate separates them with a wide margin.
     _OVERSUB_FOOTPRINT_FRAC = 0.05
 
+    # Mid-tone over-subtraction (2026-06-18 prod "the color just changed, not removed"
+    # report). The numerator fraction above only trips when reverse-alpha drives a
+    # footprint pixel fully NEGATIVE -- the dark-background black-pit case. On a MID-TONE
+    # background a sparkle fainter than the captured alpha is over-subtracted into a
+    # visibly DARKER-than-background diamond while no pixel ever crosses zero, so the
+    # numerator gate misses it and ships the dark mark. Predict the reverse-alpha output
+    # at the bright core, (core - a*logo)/(1-a); when it lands more than this many gray
+    # levels BELOW the local background ring, reverse-alpha would leave a dark diamond --
+    # inpaint instead. Calibrated wide: clean removals predict within ~12 of background
+    # (demo_banana ~-1, a bright-bg sparkle ~-12), the prod regression predicts ~-40 and
+    # the issue #30 dark case ~-82, so 25 separates keep-vs-inpaint with margin.
+    _OVERSUB_DARK_MARGIN = 25.0
+
     # Per-image alpha gain (under-subtraction fix). The captured alpha peaks ~0.51
     # (a ~51%-opaque sparkle). Some real Gemini sparkles are rendered MORE opaque,
     # so the fixed alpha under-subtracts and reverse-alpha leaves a bright residual
@@ -642,19 +655,24 @@ class GeminiEngine:
         a_cap = float(alpha_roi.max())
         if a_cap < 0.2:
             return None
-        gray = image.astype(np.float32).mean(axis=2)
         core = alpha_roi >= a_cap * self._ALPHA_GAIN_CORE_FRAC
         if not bool(core.any()):
             return None
-        core_obs = float(np.percentile(gray[y1:y2, x1:x2][core], 75))
-        # Local background = a ring just outside the footprint box.
+        # Convert only the footprint+ring crop to gray, not the whole image: every
+        # sample below lives inside the ring box, so a full-image mean is wasted work
+        # that scales with resolution (~70 ms on a 12 MP image, recomputed for both
+        # the alpha-gain estimate and the over-subtraction gate). The crop is sized by
+        # the footprint, so this is O(footprint^2) regardless of image size.
         ih, iw = image.shape[:2]
         pad = int((x2 - x1) * 0.7)
         ry1, ry2 = max(0, y1 - pad), min(ih, y2 + pad)
         rx1, rx2 = max(0, x1 - pad), min(iw, x2 + pad)
-        ring = gray[ry1:ry2, rx1:rx2]
+        ring = image[ry1:ry2, rx1:rx2].astype(np.float32).mean(axis=2)
+        # Footprint box expressed in ring-crop coordinates.
+        fy1, fy2, fx1, fx2 = y1 - ry1, y2 - ry1, x1 - rx1, x2 - rx1
+        core_obs = float(np.percentile(ring[fy1:fy2, fx1:fx2][core], 75))
         ring_mask = np.ones(ring.shape, dtype=bool)
-        ring_mask[y1 - ry1 : y2 - ry1, x1 - rx1 : x2 - rx1] = False
+        ring_mask[fy1:fy2, fx1:fx2] = False
         if int(ring_mask.sum()) < 10:
             return None
         return core_obs, float(np.median(ring[ring_mask])), a_cap
@@ -704,11 +722,19 @@ class GeminiEngine:
         alpha_map: NDArray[Any],
         position: tuple[int, int],
     ) -> bool:
-        """True when reverse-alpha would drive the footprint dark (issue #30).
+        """True when reverse-alpha would drive the footprint dark.
 
-        Tests the numerator ``watermarked - alpha*logo`` over the sparkle body: a
-        brightening overlay can never make it negative, so a large negative fraction
-        means the fixed alpha over-estimates this image's opacity.
+        Two signatures of the captured alpha over-estimating this image's sparkle
+        opacity, either of which means reverse-alpha would leave a dark mark:
+
+        1. Dark-background black pit (issue #30): the numerator
+           ``watermarked - alpha*logo`` over the sparkle body. A brightening overlay
+           can never make it negative, so a large negative fraction means the fixed
+           alpha over-subtracts past black.
+        2. Mid-tone dark diamond (see ``_OVERSUB_DARK_MARGIN``): on a mid-tone
+           background the over-subtraction darkens the core well below the background
+           without any pixel crossing zero, so case 1 misses it. Predict the
+           reverse-alpha core output and trip when it lands far below the local ring.
         """
         placed = self._footprint_indices(alpha_map, position, image.shape)
         if placed is None:
@@ -720,7 +746,18 @@ class GeminiEngine:
         roi = image[y1:y2, x1:x2].astype(np.float32)
         numerator = roi.mean(axis=2) - np.clip(alpha_roi, 0.0, 0.99) * self.logo_value
         frac = float((numerator[body] < 0).sum()) / float(body.sum())
-        return frac > self._OVERSUB_FOOTPRINT_FRAC
+        if frac > self._OVERSUB_FOOTPRINT_FRAC:
+            return True
+
+        # Mid-tone darkening: predict the reverse-alpha output at the bright core and
+        # compare to the local background ring (reuses the FP-gate / alpha-gain machinery).
+        cb = self._core_and_bg(image, alpha_map, position)
+        if cb is None:
+            return False
+        core_obs, bg, a_cap = cb
+        a = min(a_cap, 0.99)
+        predicted_core = (core_obs - a * self.logo_value) / (1.0 - a)
+        return predicted_core < bg - self._OVERSUB_DARK_MARGIN
 
     def _inpaint_footprint(
         self,
diff --git a/tests/test_gemini_engine.py b/tests/test_gemini_engine.py
index ce746eb..4aa64cb 100644
--- a/tests/test_gemini_engine.py
+++ b/tests/test_gemini_engine.py
@@ -298,6 +298,28 @@ class TestOverSubtractionGuard:
         dalpha = self.engine.get_interpolated_alpha(dpos[2])
         assert self.engine._reverse_alpha_oversubtracts(dark, dalpha, (dpos[0], dpos[1])) is True
 
+    def test_midtone_background_does_not_leave_dark_diamond(self):
+        """2026-06-18 prod report: a faint sparkle on a MID-TONE background was
+        over-subtracted into a darker-than-background diamond ("the color just
+        changed, not removed"). No footprint pixel crosses zero there, so the
+        numerator gate misses it -- the dark-margin gate must catch it and inpaint.
+        """
+        image, (x, y, w, h) = self._composite_sparkle(bg_value=160)
+        footprint = image[y : y + h, x : x + w]
+        # The numerator (black-pit) gate alone does NOT fire on a mid-tone background.
+        alpha = self.engine.get_interpolated_alpha(w)
+        roi = footprint.astype(np.float32).mean(axis=2)
+        body = alpha[:h, :w] >= self.engine._FOOTPRINT_ALPHA
+        numerator = roi - np.clip(alpha[:h, :w], 0.0, 0.99) * self.engine.logo_value
+        assert float((numerator[body] < 0).sum()) / float(body.sum()) <= self.engine._OVERSUB_FOOTPRINT_FRAC
+        # ...but the over-subtraction guard still trips (via the dark-margin path) and
+        # removal leaves the footprint reading like the mid-tone background, not darker.
+        assert self.engine._reverse_alpha_oversubtracts(image, alpha, (x, y)) is True
+        out = self.engine.remove_watermark(image)
+        cleaned = out[y : y + h, x : x + w]
+        assert abs(float(cleaned.mean()) - 160.0) < 15.0, f"dark diamond: mean={cleaned.mean()}"
+        assert int(cleaned.min()) > 160 - 30, f"dark pit: min={cleaned.min()}"
+
 
 class TestUnderSubtractionGain:
     """Under-subtraction fix: a sparkle MORE opaque than the captured alpha must not