remove-ai-watermarks/docs/module-internals.md
Victor Kuznetsov 41f67973ce fix(visible): inpaint mid-tone Gemini sparkle instead of a dark diamond
The free `visible` path over-subtracted a faint Gemini sparkle on a
mid-tone background into a darker-than-background brown diamond instead
of removing it (2026-06-18 prod NPS report, "the watermark was not
removed, just its color changed"). The existing over-subtraction guard
only tripped when reverse-alpha drove a footprint pixel fully negative
(the issue #30 dark-background black-pit case); on a mid-tone background
the over-subtraction darkens the core well below the background without
any pixel crossing zero, so the gate missed it and shipped the dark mark.

Add a second over-subtraction signal to `_reverse_alpha_oversubtracts`:
predict the reverse-alpha output at the bright core, (core - a*logo)/(1-a),
and route to the footprint inpaint when it lands more than
`_OVERSUB_DARK_MARGIN` (25) gray levels below the local background ring.
Calibrated wide: clean removals predict within ~12 of background
(demo_banana ~-1), the prod regression ~-40, the issue #30 dark case ~-82.
Corpus-validated on the 479 detected Gemini images: 10 switch reverse-alpha
to inpaint, all of them dark-diamond cases that improve or match; the
other 469 stay byte-identical. demo_banana stays on the reverse-alpha
path (byte-identical).

Also crop both reverse-alpha helpers to the region they actually touch,
a pure O(image) -> O(mark) win that is byte-identical to the full-frame
math (a uint8<->float32 round-trip is exact):
- `GeminiEngine._core_and_bg` converts only the footprint+ring crop to
  gray, not the whole frame (~70 ms -> 0.1 ms on a 12 MP image; it runs
  for both the alpha-gain estimate and the new gate). Verified identical
  across 479 images; detector confidence unchanged.
- `TextMarkEngine._apply_reverse_alpha` computes the blend on the glyph
  crop only (`amap` is zero outside it, so the math is a no-op there):
  ~275 ms -> ~2 ms per placement on a 12 MP frame, up to 2 placements per
  removal. Verified identical across 142 Doubao/Jimeng placements.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 17:19:41 -07:00


Module internals

Relocated verbatim from CLAUDE.md on 2026-06-11 to keep the always-loaded context small. Long single-line entries were reformatted into paragraphs; no content was changed or summarized.

Full per-module detail: design decisions, tuned thresholds, calibration history, incident records, and the regression-guard map. The compact module list lives in CLAUDE.md; read the relevant section here before changing a module.

noai/c2pa.py

noai/c2pa.py — C2PA reading, official c2pa-python Reader first, hand-rolled parser as fallback (migrated 2026-06-18; the official lib is a core dep, MIT/Apache, spec-tracking). read_manifest_store_json(path) runs Reader.try_create with a default Context (NO trust enforcement — we report what is in the file, we do not gate on cert trust) and returns the whole manifest-store JSON (every manifest plus ingredient manifests); it is memoized per (path, mtime) (lru_cache(maxsize=8)) because one identify/get_ai_metadata call invokes the structured parser ~3x on the same file. extract_c2pa_info(path) builds its dict from that store JSON (_info_from_store_json: structured claim_generator from the active manifest's claim_generator / claim_generator_info[].name, timestamp from signature_info.time) and falls back to the legacy caBX parser (_extract_c2pa_info_png) when the reader is unavailable (broken/absent wheel, reader_available() False) or finds no parseable manifest (synthetic/partial test blobs, the inject round-trip's re-stitched chunk). Both paths share _populate_registry_fields(buf, info) — the issuer / AI-tool / action / source-type / SynthID / soft-binding registry byte-scan applied to the store JSON (reader path) or the raw caBX bytes (fallback) — so the return-dict shape is identical and the registry stays the single source of truth. Whole-store scanning is load-bearing: a ChatGPT edit of a Sora generation keeps trainedAlgorithmicMedia + issuer "OpenAI" on the parent/ingredient manifest, not the active "opened" one (the active manifest's signature_info.issuer is "OpenAI", common_name "Truepic Lens CLI in Sora", so the issuer field now reads "OpenAI, Truepic" — first-match-wins platform attribution still resolves OpenAI). 
extract_c2pa_info now also serves non-PNG containers (JPEG/AVIF/MP4) structurally via the reader; the consumers (identify, synthid_source, get_ai_metadata) already merge info OR byte-scan, so this strictly upgrades the non-PNG path with no double-counting. synthid_watermark/synthid_vendors is set when the manifest is signed by a SynthID-using vendor on AI content; soft_binding/soft_binding_vendors when a c2pa.soft-binding alg names a forensic-watermark vendor (soft_binding_vendors_in(buffer) is the shared byte-scan, used by both paths and the non-PNG binary path). extract_c2pa_chunk / inject_c2pa_chunk / has_c2pa_metadata stay the PNG caBX byte tools (raw-chunk extraction for extractor.py, test injection, fallback detection). PNG/caBX chunk reads are clamped to the remaining file size (safe_length = min(length, remaining); skipped chunks use seek) so a malformed huge length cannot drive a multi-GB allocation (shared safety discipline matching isobmff.scan_c2pa_region). Regression-guarded by tests/test_noai.py::TestC2PARealSamples::{test_extract_info_uses_reader_store,test_fallback_to_png_parser_when_reader_unavailable}.
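The clamped chunk read described above can be illustrated with a small PNG chunk walker. This is a sketch under assumptions, not the module's code: `iter_png_chunks` and its parameters are hypothetical names, and only the `safe_length = min(length, remaining)` rule and the seek-based skip come from the text.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_png_chunks(f: BinaryIO, file_size: int,
                    wanted: frozenset = frozenset({b"caBX"})
                    ) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (type, payload) for wanted chunks without trusting the length field."""
    if f.read(8) != PNG_SIGNATURE:
        return
    while True:
        header = f.read(8)  # 4-byte big-endian length + 4-byte chunk type
        if len(header) < 8:
            return
        length, ctype = struct.unpack(">I4s", header)
        remaining = max(file_size - f.tell(), 0)
        safe_length = min(length, remaining)  # clamp to what the file can hold
        if ctype in wanted:
            yield ctype, f.read(safe_length)
        else:
            f.seek(safe_length, io.SEEK_CUR)  # skip without allocating
        f.seek(4, io.SEEK_CUR)  # skip the CRC
        if ctype == b"IEND" or safe_length < length:
            return  # end of image, or a truncated/malformed chunk
```

A declared length of several gigabytes therefore costs at most one read of the bytes that actually remain in the file.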

noai/constants.py

noai/constants.py — PNG_SIGNATURE, C2PA_CHUNK_TYPE, C2PA_SIGNATURES, and C2PA_AI_VENDORS — the single C2paAiVendor registry of C2PA-signing vendors (issuer byte, resolved org name, the identify platform label, and a synthid flag), from which C2PA_ISSUERS, SYNTHID_C2PA_ISSUERS (issuers that pair SynthID with C2PA: Google, OpenAI), and identify._ISSUER_PLATFORM are all derived — plus C2PA_SOFT_BINDINGS (soft-binding alg prefix → forensic-watermark vendor: Adobe TrustMark, Digimarc, Imatag, Steg.AI, Microsoft, ...). Add a new C2PA vendor as one C2PA_AI_VENDORS entry (never edit the derived dicts), a new soft-binding to C2PA_SOFT_BINDINGS; not inline.

metadata.py

metadata.py: scan_head(path, size=1MB) is the shared input for every C2PA/AIGC/IPTC byte scan. It returns the first size bytes plus the payloads of any provenance metadata found beyond that window: for ISOBMFF, the late provenance boxes from isobmff.scan_c2pa_region (catches a manifest after a large mdat); for PNG, the late tEXt/iTXt/zTXt/eXIf/iCCP chunks from _png_late_metadata (catches an XMP/EXIF packet appended after a large IDAT, e.g. a TC260 AIGC label at ~2.7 MB). Behavior-neutral (f.read(size)) for non-ISOBMFF inputs and for any file that fits within size. Use it instead of open().read(1MB) for any new marker scan.

Memoized per (path, size, mtime) (added 2026-06-09, _scan_head_cached lru_cache, maxsize=8): one identify/get_ai_metadata call fans out to ~8 byte-scan detectors that each re-read the same file head, so the cache turns those into a single read; the mtime key invalidates on change, a stat failure falls back to an uncached read. synthid_source(path) returns the vendor name(s) if the C2PA manifest implies a SynthID pixel watermark, else None. Format-agnostic: PNG via the caBX parser, JPEG/WebP/AVIF/HEIF/JXL via a binary scan (C2PA marker + SynthID issuer + AI-source marker). get_ai_metadata surfaces the verdict, and metadata --check prints it as a callout. Both get_ai_metadata and has_ai_metadata guard the PIL open with except Exception (HEIC/unknown formats raise non-OSError) and fall through to the binary scan. xai_signature(path) detects xAI/Grok's EXIF-only scheme (ImageDescription = Signature: <base64> + UUID Artist); it feeds has_ai_metadata, get_ai_metadata (key xai_signature), and identify. iptc_ai_system(path) detects the IPTC Photo Metadata 2025.1 AI-disclosure XMP properties (IPTC_AI_FIELD_MARKERS = AISystemUsed/AISystemVersionUsed/AIPromptInformation/AIPromptWriterName) and returns the AISystemUsed generator name (or "fields present"). remove_ai_metadata routes ISOBMFF video (.mp4/.mov/.m4v) through the same isobmff.strip_c2pa_boxes as AVIF/HEIF (MP4 is ISOBMFF), and _scrub_ai_exif removes the xAI signature + AI-generator EXIF tags on JPEG output. strip_c2pa_boxes is fail-safe on a malformed box: it returns the original bytes unchanged with a logged warning instead of truncating the tail to EOF (detection-only scan_c2pa_region still stops at a malformed box). 
_png_late_metadata clamps each late-chunk read to the remaining file size (safe_length = min(length, remaining)) so a malformed length cannot drive a multi-GB allocation, AND advances the cursor by safe_length (not the raw length) so an inflated length cannot jump past EOF and abort the scan, silently skipping a genuine AI-label chunk after it.

identify.py

identify.py: the OpenAI rollout caveat is keyed on _vendor_of(synthid) == "OpenAI" (not a raw substring over the issuer + verdict blob). identify(path) aggregates every locally-readable signal into one ProvenanceReport:

- C2PA issuer→platform
- C2PA soft-binding forensic-watermark vendor
- C2PA cloud-manifest reference via metadata.c2pa_cloud_manifest (signal c2pa_cloud, medium, provenance-only: does NOT set is_ai, excluded from ai_from_metadata + clash vendors). This is the C2PA 2.4 Durable-Content-Credentials case where the embedded manifest is stripped but an XMP dcterms:provenance pointer to the vendor's cloud manifest store (_C2PA_MANIFEST_REPOSITORIES, today cai-manifests.adobe.com → "Adobe Content Authenticity") survives, so the credentials stay recoverable server-side. It is only emitted when no embedded manifest already attributed the file; it surfaced on 2 corpus PNGs 2026-06-10 that read fully unknown before.
- IPTC "Made with AI" + IPTC 2025.1 AISystemUsed
- embedded SD/ComfyUI params
- SynthID proxy
- xAI/Grok EXIF signature via metadata.xai_signature
- the China TC260 AIGC label via metadata.aigc_label
- the HuggingFace hf-job-id job marker via metadata.huggingface_job
- the Samsung Galaxy AI editing marker via metadata.samsung_genai
- the visible marks: Gemini sparkle plus the ByteDance Doubao 豆包AI生成 / Jimeng 即梦AI / Samsung Galaxy AI "Contenuti generati dall'AI" text marks via the watermark_registry
- open invisible watermark
- Adobe TrustMark via trustmark_detector

is_ai_generated is True or None (never asserted False: stripped metadata is not proof of clean origin). The hf_job, visible-mark, and Samsung samsung_genai signals are medium confidence: each lifts an otherwise-Unknown verdict to a tentative AI (hf_only / visible_only / samsung_only, parallel branches; visible_only fires on any visible_* signal) but is excluded from the high-confidence ai_from_metadata set, so none overrides a hard metadata signal.

Visible-mark detection (check_visible, signals visible_sparkle / visible_doubao / visible_jimeng / visible_samsung): the Gemini sparkle keeps its own file-level path (_visible_sparkle → gemini_engine.detect_sparkle_confidence, promoted only at confidence ≥ _SPARKLE_THRESHOLD 0.5; corpus-tuned to separate Gemini sparkles ≥0.56 from non-sparkle ≤0.49), while Doubao/Jimeng/Samsung reuse the registry detectors (_visible_text_marks → watermark_registry, iterating _VISIBLE_MARK_PLATFORM), each gated by its own engine NCC threshold via MarkDetection.detected (Doubao 0.4, Jimeng 0.45, Samsung 0.4). Doubao/Jimeng are normally also caught by the TC260 AIGC metadata label and Samsung by its C2PA + genAIType marker, so the visible path is their stripped-metadata fallback. Visible marks set platform only when no harder signal already did, and (like the sparkle) are excluded from integrity-clash vendor claims. The cv2 dependency lives in the engines, not here.

import identify is deliberately light (~26 MB; ~36 MB with cv2 loaded by a visible-mark run, ~106 MB for a full check_visible run): it imports the noai.c2pa/noai.constants submodules, and noai/__init__ is lazy (see "Test and lint"), so torch/diffusers are NOT pulled at import even in a full gpu/detect install — fits a 512 MB host. noai.c2pa does eagerly import the c2pa-python binary (Rust + cryptography, ~+5 MB RSS, no torch) for the primary Reader path — light enough to stay on the dependency-light host; a broken/absent wheel degrades to the byte-scan parser (reader_available() False). The heavy paths are opt-in: check_invisible=True needs the detect/trustmark extras (each pulls torch; TrustMark also downloads weights), so on a core-only deploy leave check_invisible off (it is a no-op there anyway). Before the lazy __init__, the mere presence of torch in the env inflated import identify to ~420 MB.

C2PA platform attribution is device-token-first, issuer-scan fallback (_device_platform scans manifest bytes for _DEVICE_C2PA_PLATFORM tokens, then _attribute_platform/_ISSUER_PLATFORM).

Why, verified on real signed files 2026-05-26: the old issuer-only byte-scan matched ANY issuer substring anywhere, so multi-entity manifests mis-attributed -- Leica→"Truepic" (a signing authority in the trust chain), Nikon→"Adobe Firefly" (XMP-toolkit "Adobe" + the sample's "Adobe_MAX" name), Pixel→"Google (Gemini)" ("Google LLC" cert org), Truepic→"Google". A distinctive device token wins instead.

Token distinctiveness is load-bearing: bare b"Truepic" mis-fires (it appears in unrelated trust chains -- it mis-attributed the OpenAI chatgpt-1.png fixture), so the token is the specific b"Truepic_Lens" from the Lens SDK claim generator; likewise b"Pixel Camera" (cert CN) not bare b"Pixel". _DEVICE_C2PA_PLATFORM lists ONLY tokens verified against a real C2PA file: Leica (lc_c2pa/Leica Camera), Nikon (NIKON), Pixel (Pixel Camera -- from a real Pixel 10 Pro file attached to c2pa-rs issue #1609/#1554), Sony (sony.sig/sony.cert -- Sony's own C2PA assertion namespace, verified on a real Sony PXW-Z300 file; NOT bare "Sony" which is a common EXIF Make), Truepic (Truepic_Lens). Canon/Bria have no public direct-download C2PA sample (checked exhaustively: GitHub issue/PR attachments, contentcredentials gallery, HF datasets -- all upload-to-verify or token-gated; Canon's only public file was a self-signed hobbyist CR3, not factory), so they stay unmapped until a real file is captured (same fixture discipline as Grok/Doubao). The Sony sample is video (MP4) -- our ISOBMFF C2PA path detects it; Sony Alpha stills likely share the sony.* namespace but are not separately verified.

Samsung Galaxy + ASUS Gallery live in a separate _SIGNER_C2PA_PLATFORM (scanned after _device_platform, before the issuer fallback), NOT in _DEVICE_C2PA_PLATFORM — verified on real signed files 2026-05-29. Reason: a Galaxy phone stamps BOTH its device cert AND a trainedAlgorithmicMedia/genAIType AI marker on a Generative-Edit image, so treating it as a "genuine camera capture" would false-fire integrity-clash rule 2 on every Galaxy AI edit. The signer tokens (b"Samsung Galaxy" cert org — distinct from the EXIF SM-xxxx model string on ordinary Samsung photos; b"com.asus.gallery" claim generator) only resolve the platform label; the AI verdict still comes from the source-type / genAIType. ASUS Gallery is a C2PA-signed edit with no AI marker, so it attributes the platform without asserting is_ai.

Samsung's genAIType (in the proprietary PhotoEditor_Re_Edit_Data JSON) is an undocumented Galaxy-AI editing marker (metadata.samsung_genai, gated on the PhotoEditor_Re_Edit_Data container; non-zero value = AI tool used, values {1,5} observed): medium-confidence because the field has no public spec (verified 2026-05-29: absent from C2PA spec + Samsung docs), but it co-occurred with trainedAlgorithmicMedia in 3/3 verified files that record a source-type and was the SOLE AI marker on a Galaxy S24 file that omits the source type. Camera C2PA marks capture authenticity, not AI (Pixel carries computationalCapture, not trainedAlgorithmicMedia), so these never set is_ai -- that stays driven by digital-source-type. c2pa.cbor_text_after (now public) is best-effort for the generator detail string only and can be None when the manifest keys it claim_generator_info (Pixel).

Issuer→generator mapping is is_ai-gated (_attribute_platform(issuers, is_ai=c2pa_is_ai)): a specific AI-generator platform is named only when the digital-source-type is trainedAlgorithmicMedia; on a non-AI source an issuer substring is treated as incidental (an "Adobe XMP" toolkit string in an unmapped Canon/Sony capture would otherwise mislabel it "Adobe Firefly"), so it degrades to the neutral "C2PA signer: X" label. Real Firefly/OpenAI/Google output carries the AI source-type, so it is unaffected (verified: chatgpt-1.png→OpenAI, firefly-1.png→Adobe Firefly still attribute). _attribute_platform defaults is_ai=True so the mapping stays unit-testable in isolation. Add capture-camera tokens to _DEVICE_C2PA_PLATFORM, editing-app/AI-device signer tokens to _SIGNER_C2PA_PLATFORM, generator/issuer platforms to the C2PA_AI_VENDORS registry in constants.py (which derives _ISSUER_PLATFORM), not inline. For non-PNG containers (JPEG/WebP/AVIF/HEIF/JXL) the caBX parser returns nothing, so issuer (_issuers_in) and generator (_ai_tools_in, reusing C2PA_AI_TOOLS) are recovered by binary-scanning the first MB. EXIF Software / Make / Artist / ImageDescription and XMP CreatorTool generator tags are read by metadata.exif_generator (PIL+piexif for any format PIL opens incl. AVIF, plus a container-agnostic XMP raw-byte scan that also covers HEIF/JXL), matched against AI_GENERATOR_TOKENS so ordinary editors (plain "Adobe Photoshop") and real-camera Make ("Apple"/"Canon") are not flagged.
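The lookup order described in this section (device token, then signer token, then is_ai-gated issuer fallback) can be sketched as follows. The byte tokens are the ones named in the text; the platform labels for devices and signers, the helper `_first_token`, and the function signature are illustrative assumptions rather than the module's actual code.

```python
from typing import Dict, Iterable, Optional

# Tokens come from the text above; the labels on the right are illustrative.
_DEVICE_C2PA_PLATFORM: Dict[bytes, str] = {
    b"lc_c2pa": "Leica",
    b"NIKON": "Nikon",
    b"Pixel Camera": "Google Pixel",
    b"sony.sig": "Sony",
    b"Truepic_Lens": "Truepic",
}
_SIGNER_C2PA_PLATFORM: Dict[bytes, str] = {
    b"Samsung Galaxy": "Samsung Galaxy",
    b"com.asus.gallery": "ASUS Gallery",
}
_ISSUER_PLATFORM: Dict[str, str] = {
    "OpenAI": "OpenAI",
    "Adobe": "Adobe Firefly",
    "Google": "Google (Gemini)",
}


def _first_token(manifest: bytes, table: Dict[bytes, str]) -> Optional[str]:
    """Return the platform of the first table token found in the manifest bytes."""
    for token, platform in table.items():
        if token in manifest:
            return platform
    return None


def attribute_platform(manifest: bytes, issuers: Iterable[str],
                       is_ai: bool = True) -> Optional[str]:
    # 1. A distinctive device token wins over any issuer substring.
    platform = _first_token(manifest, _DEVICE_C2PA_PLATFORM)
    # 2. Signer tokens are checked next; they only resolve the label.
    if platform is None:
        platform = _first_token(manifest, _SIGNER_C2PA_PLATFORM)
    if platform is not None:
        return platform
    # 3. Issuer fallback: name an AI generator only on an AI source type.
    issuer_list = list(issuers)
    if is_ai:
        for issuer in issuer_list:
            if issuer in _ISSUER_PLATFORM:
                return _ISSUER_PLATFORM[issuer]
    return f"C2PA signer: {issuer_list[0]}" if issuer_list else None
```

In this sketch a Pixel manifest that also contains a "Google" issuer resolves to the camera label, and an incidental "Adobe" issuer on a non-AI capture degrades to the neutral signer label.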

Ideogram tags its output with EXIF Make="Ideogram AI" (verified on a real download 2026-05-24) — that's why Make is read.

Integrity-clash detection (_integrity_clashes, surfaced as ProvenanceReport.integrity_clashes, printed in red by identify and serialized to --json): contradictions between independent generator stamps are a laundering/spoofing tell. Two rules: (1) two or more distinct AI-origin vendors named by independent signals (e.g. C2PA OpenAI + EXIF Make="Ideogram AI"), and (2) a camera-capture C2PA device (_DEVICE_C2PA_PLATFORM) coexisting with an AI-generation marker from a source INDEPENDENT of the camera's own manifest.

Rule 2's independence gate (added 2026-06-11): a device that both captures and runs on-device generative AI (Google Pixel Magic Editor / Pixel Studio) records the capture AND the AI edit in ONE C2PA manifest — so the AI vendor is named only from that same manifest (c2pa issuer + synthid proxy, both c2pa_manifest source) — a legitimate edit chain, NOT a clash. Rule 2 therefore fires only when some ai_vendor_claims family has a source != "c2pa_manifest" (EXIF/XMP generator, IPTC, TC260 AIGC, a second manifest naming AI on a camera capture — the real laundering tell). This killed a false-positive class on the corpus: 2 real Pixel generative-edit PNGs (computationalCapture + trainedAlgorithmicMedia + "Applied imperceptible SynthID watermark" in one Google manifest) read as camera-vs-AI clashes before the gate. Pure cameras (Leica/Sony/Nikon/Truepic) that do NOT generate AI still clash on any within-manifest AI marker only if it is independent — they never legitimately carry one, so the gate is behavior-neutral for them while fixing Pixel (regression-guarded by test_identify.py::TestIntegrityClashesHelper::{test_pixel_generative_edit_same_manifest_no_clash,test_camera_plus_independent_ai_marker_still_clashes} + TestIntegrityClashEndToEnd::test_pixel_generative_edit_no_clash).

Independence is source-grouped (_CLASH_SOURCE, added 2026-06-02): the C2PA issuer attribution (c2pa) and the SynthID proxy (synthid) are NOT independent — the proxy is inferred from the same manifest — so they share one source and two vendors named within a single manifest do not clash. This killed a false-positive class found on the spaces corpus: legitimate multi-actor manifests where a product wraps another vendor's engine (Microsoft Designer on OpenAI → OpenAI, Microsoft; Microsoft on Google → Microsoft, Google LLC, Google C2PA Core Generator Library) or an edit chain re-signs (Adobe over a Gemini original → Adobe c2pa + Google synthid) — 19 such files across the 2026-06-01/02 batches read as clashes before the fix. Rule 1 still fires when a manifest vendor disagrees with a genuinely independent stamp (EXIF/XMP generator, IPTC AISystemUsed, AIGC, xAI); each non-c2pa/synthid family is its own source (test_identify.py::TestIntegrityClashes::{test_multi_actor_manifest_no_clash,test_manifest_vendor_vs_independent_signal_clashes}). Vendor normalization is _vendor_of over _AI_VENDOR_TOKENS (so a C2PA "Google (Gemini)" issuer and a SynthID-Google proxy agree, while different vendors clash).

High-precision by design: only hard generator stamps feed it (C2PA-issuer when source is AI, SynthID, EXIF/XMP generator, IPTC AISystemUsed, xAI, AIGC); the fuzzy visible sparkle and the open invisible watermark are excluded (the latter can be a by-product of our own SDXL removal pass). The c2pa vendor is classified from the issuer attribution / generator, NOT the resolved platform (a camera label like "Google Pixel" would mis-normalize to "Google"). All real single-origin fixtures (chatgpt/firefly/doubao/grok/mj) verified to produce zero clashes (false-positive guard in test_identify.py::TestRealSamplesHaveNoClash).
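The source-grouped independence rule can be sketched as below. This is a deliberately simplified model: claims are reduced to (signal family, normalized vendor) pairs, rule 1 is approximated as "two independent sources that share no vendor", and the function name and message strings are hypothetical. Only the grouping of c2pa and synthid into one c2pa_manifest source and the two rules themselves come from the text.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

# Signal family -> independence group. c2pa and synthid share one source
# because the SynthID proxy is inferred from the same manifest.
_CLASH_SOURCE = {"c2pa": "c2pa_manifest", "synthid": "c2pa_manifest"}

Claim = Tuple[str, str]  # (signal family, normalized vendor)


def integrity_clashes(claims: List[Claim],
                      camera_device: Optional[str] = None) -> List[str]:
    by_source: Dict[str, Set[str]] = defaultdict(set)
    for family, vendor in claims:
        # Every family outside the manifest group is its own source.
        by_source[_CLASH_SOURCE.get(family, family)].add(vendor)

    clashes: List[str] = []
    # Rule 1 (simplified): two independent sources that agree on no vendor.
    for (src_a, a), (src_b, b) in combinations(sorted(by_source.items()), 2):
        if a.isdisjoint(b):
            clashes.append(f"vendor clash: {src_a}={sorted(a)} vs {src_b}={sorted(b)}")
    # Rule 2: a camera capture plus an AI marker from outside its own manifest.
    if camera_device and any(src != "c2pa_manifest" for src in by_source):
        clashes.append(f"camera {camera_device} with independent AI marker")
    return clashes
```

Under this model a multi-actor manifest naming OpenAI and Microsoft yields no clash, while the same manifest next to an EXIF stamp from a different vendor does.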

watermark_registry.py

watermark_registry.py: single catalog of known visible watermarks, the unified "find known marks in their usual places, recognize, remove" entry.

Reverse-alpha based by policy: a mark is listed only once a real alpha map has been captured for it, and removal inverts that map (original = (wm - a*logo)/(1-a)) — Gemini recovers cleanly with no inpaint (its sparkle alpha comes from a pure-black capture, so it is near-exact), while Doubao, Jimeng, and Samsung all add an always-on THIN residual inpaint over the glyph footprint (their text marks re-rasterize + jitter a few px per image, so a single capture cannot pixel-cancel them; the inpaint blends into the reverse-alpha-recovered pixels). Arbitrary-region inpainting still lives in region_eraser/erase. Each KnownMark ties a key to {usual location, in_auto flag, recovery (="reverse-alpha"), a detect adapter → uniform MarkDetection, a remove adapter}. Entries today: gemini (bottom-right sparkle), doubao (bottom-right "豆包AI生成"), jimeng (bottom-right "★ 即梦AI"), and samsung (bottom-LEFT "✦ Contenuti generati dall'AI", Samsung Galaxy AI, Italian locale). detect_marks scans all; best_auto_mark picks the highest-confidence detection.
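The inversion formula above, original = (wm - a*logo)/(1-a), is the exact inverse of the usual alpha blend. A per-pixel sketch in plain Python follows; the engines operate on cv2/numpy arrays, so the scalar functions, their names, the white (255) logo default, and the clipping to 0..255 are assumptions made for illustration.

```python
from typing import List, Sequence


def composite(original: float, alpha: float, logo: float = 255.0) -> float:
    """Forward model: blend the logo over the original pixel."""
    return (1.0 - alpha) * original + alpha * logo


def reverse_alpha(wm: float, alpha: float, logo: float = 255.0) -> float:
    """Invert the blend: original = (wm - a*logo) / (1 - a), clipped to 0..255."""
    if alpha >= 1.0:
        raise ValueError("a fully opaque pixel cannot be recovered")
    value = (wm - alpha * logo) / (1.0 - alpha)
    return min(max(value, 0.0), 255.0)


def reverse_alpha_row(wm_row: Sequence[float], alpha_row: Sequence[float],
                      logo: float = 255.0) -> List[float]:
    """Apply the inversion to a row of pixels with a per-pixel alpha map."""
    return [reverse_alpha(w, a, logo) for w, a in zip(wm_row, alpha_row)]
```

When the alpha map matches the mark's real opacity the round trip is exact up to floating-point error. When the map is too strong for the image, the numerator goes negative and clips to black, which is the failure the Gemini over-subtraction guard exists to catch.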

Cross-engine confidences aren't directly comparable, so the gemini adapter applies the corpus-validated 0.5 sparkle threshold (_GEMINI_AUTO_MIN_CONF) for its detected flag; otherwise the gemini engine's loose internal threshold weakly fires (~0.36) on the Doubao text and hijacks auto. The shape-keyed Doubao/Jimeng/Samsung NCC detectors don't cross-fire (jimeng scores ~0.22 on the Doubao strip, well under its 0.45 threshold; Samsung is bottom-left so it shares no corner with the others, and it scored 0.0 on Doubao/Jimeng captures while they scored 0.0 on a real Samsung photo), so auto picks the right one. cli.cmd_visible is registry-driven: --mark auto → best_auto_mark, --mark <key> → that mark; --mark choices come from mark_keys().

cli._remove_visible_auto is the shared visible-removal helper used by cmd_all/cmd_batch too (they no longer hardcode GeminiEngine), so all/batch remove Doubao/Jimeng/Samsung text marks, not just the Gemini sparkle (regression-guarded by test_all_visible_step_uses_registry). The three text-mark adapters were consolidated 2026-06-09: a single _text_mark(key, label, location) builds the registry row from one parameterized _text_mark_detect/_text_mark_remove pair (reverse-alpha only when detected/forced AND reverse_alpha_available, else skipped — no inpaint); the gemini adapters stay bespoke. Add a new visible mark = one _text_mark(...) row + its TextMarkConfig (with a captured alpha map); do not re-add per-mark if branches or copy-paste adapters.

Alpha-on-save policy (issue #30): cli._write_bgr_with_alpha rejoins the input's alpha plane unchanged — it must NOT zero alpha in the watermark bbox. Reverse-alpha (and erase inpaint) recover real pixels there, so zeroing alpha punched a transparent hole that renders as a solid white box on any non-transparent viewer (Gemini app exports are opaque RGBA, so every user hit it; regression-guarded by test_visible_keeps_alpha_opaque_in_watermark_region). The registry remove() still returns its region (used for inpaint_residual positioning), but the CLI no longer uses it to clear alpha.

gemini_engine.py

gemini_engine.py — visible Gemini-sparkle remover/detector (cv2/numpy, no GPU). detect_sparkle_confidence(path) is the file-level entry point used by identify.py. The public entry points normalize a grayscale (2D) or RGBA (4-channel) input to BGR up front so a non-BGR image does not crash the cv2 pipeline.

Detection localization (issue #36): detect_watermark's global multi-scale NCC search applies a size weight ((scale/96)**0.5) that suppresses tiny-patch false positives but can let a larger, mediocre match (e.g. a bright collar in a portrait) outrank a small, near-perfect sparkle in the corner — so a faint sparkle on a busy background scored below threshold and read as clean (the regression osachub reported from widening the search window 256px->512px between v0.7.2 and v0.8.8). _corner_promote adds a bottom-right-corner raw-NCC pass on top of the global search: a match with raw NCC >= _CORNER_PROMOTE_NCC 0.85 that beats the global pick overrides it (it only ever replaces a lower-fidelity pick, so it cannot weaken an existing detection), rescuing the buried sparkle without reverting the wider window. The corner side is relative-clamped (_CORNER_PROMOTE_FRAC 0.20 of the short side, clamped to [_CORNER_PROMOTE_MIN 96, _CORNER_PROMOTE_MAX 384]): a fixed 256px is a true corner on a large image but covers ~70% of a small portrait, where a real photo raw-matches the star at ~0.81 (relative tightening drops that worst case to ~0.69, while the upper clamp stops the corner ballooning on huge images where a real photo reached ~0.83 at 512px). The 0.85 gate sits midway between the worst real-photo corner match (~0.78 across native + downscaled negatives) and a genuine faint sparkle (~0.93), so promotion adds true detections with zero corpus false positives (Gemini's sparkle sits ~60-160px from the corner at fixed margins, covered by the [96, 384] band at every measured size). Regression-guarded by test_gemini_engine.py::TestCornerPromotion.
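The corner sizing and promotion rule above reduce to a few lines of arithmetic. The constants are the ones quoted in the text; the function names are hypothetical, and comparing the corner match against the global pick by raw NCC is an assumption about what "beats" means.

```python
_CORNER_PROMOTE_FRAC = 0.20
_CORNER_PROMOTE_MIN = 96
_CORNER_PROMOTE_MAX = 384
_CORNER_PROMOTE_NCC = 0.85


def corner_side(width: int, height: int) -> int:
    """Side of the bottom-right search square: 20% of the short side, clamped."""
    side = int(min(width, height) * _CORNER_PROMOTE_FRAC)
    return max(_CORNER_PROMOTE_MIN, min(side, _CORNER_PROMOTE_MAX))


def size_weight(scale: int) -> float:
    """Weight that favors larger template scales in the global search."""
    return (scale / 96) ** 0.5


def should_promote(corner_raw_ncc: float, global_raw_ncc: float) -> bool:
    """Promote only a high-fidelity corner match that beats the global pick."""
    return corner_raw_ncc >= _CORNER_PROMOTE_NCC and corner_raw_ncc > global_raw_ncc
```

For example, a 360x640 portrait gets the 96 px floor instead of a fixed 256 px square, and a 4000x3000 frame is capped at 384 px.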

Top-K fusion selection (osachub follow-up 2026-06-12): _corner_promote's 0.85 raw-NCC gate still missed a class the 256->512 widening exposed — a genuine MID-scale sparkle whose raw NCC sits below 0.85 but is buried by a LARGER, low-fidelity decoy that wins the size weight. The reporter's image (a scale-48 sparkle on light bedding) measured spatial 0.775 / grad 0.960 / fusion 0.676 at the true sparkle, but the size-weighted argmax instead locked onto a decoy at spatial 0.628 / grad 0.036 (fusion 0.325) — so identify read unknown on v0.8-0.11 where v0.7.2 (256px window) had caught it at 0.676. Fix: detect_watermark now keeps the top-_SELECT_TOPK (3) size-weighted candidates (NMS-deduped by location) plus the corner-promote candidate, scores EACH by the full fusion (spatial+gradient+variance) via the extracted _grad_var_scores helper, and selects the highest — the gradient term (the discriminator a contrast-invariant spatial NCC lacks) lifts the true sparkle over the decoy. Critically, selection ranks by the SIZE-WEIGHTED score, NOT raw NCC: a raw-NCC argmax (tried first) re-admitted the exact tiny-patch (scale 16-18) false positives the size weight exists to suppress — it flagged 14/65 doubao + 4/11 jimeng visible-corpus images (non-Gemini content) as Gemini sparkles. Top-K keeps tiny-patch suppression intact: a coincidental 16px match never ranks in the size-weighted top-K, so widening selection added zero flips on the doubao/jimeng corpora and left the 495-image Gemini set unchanged (479 detected, both before and after) while recovering the reporter's image. Regression-guarded by test_gemini_engine.py::TestCornerPromotion::test_low_gradient_decoy_loses_to_high_gradient_corner_sparkle (mirrors the real spatial/grad signature via a monkeypatched scan) and test_size_weighted_search_alone_traps_on_the_decoy.
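A toy model of the top-K selection, using the spatial/gradient values quoted above. Several things are assumptions of this sketch: the `Candidate` type and `select_candidate` name, the decoy's scale (96, since the text only says it is larger), a variance term of 0 (which reproduces the quoted fusion values with the 0.5/0.3/0.2 weights), and the omission of NMS de-duplication.

```python
from dataclasses import dataclass
from typing import List, Optional

_SELECT_TOPK = 3


@dataclass
class Candidate:
    name: str
    scale: int
    spatial: float   # raw spatial NCC
    grad: float      # gradient NCC
    var: float = 0.0  # variance term, set to 0 in this sketch

    @property
    def size_weighted(self) -> float:
        return self.spatial * (self.scale / 96) ** 0.5

    @property
    def fusion(self) -> float:
        return 0.5 * self.spatial + 0.3 * self.grad + 0.2 * self.var


def select_candidate(cands: List[Candidate],
                     corner: Optional[Candidate] = None) -> Candidate:
    # Shortlist by the SIZE-WEIGHTED score so tiny coincidental patches drop out.
    shortlist = sorted(cands, key=lambda c: c.size_weighted, reverse=True)[:_SELECT_TOPK]
    if corner is not None:
        shortlist.append(corner)  # the corner-promote candidate always competes
    # Final pick uses the full fusion, where the gradient term separates decoys.
    return max(shortlist, key=lambda c: c.fusion)
```

With a scale-48 sparkle (0.775 / 0.960), a scale-96 decoy (0.628 / 0.036), and a scale-16 patch with a high raw NCC, the size-weighted argmax picks the decoy and a raw-NCC argmax picks the tiny patch, while the top-K shortlist followed by fusion picks the sparkle.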

Square-image residual misses are NOT fixable by lowering the detector threshold (measured + REJECTED 2026-06-11): osachub (#36 follow-up) reported the corner-promote still misses Gemini sparkles on Google square (1:1) outputs. Reproduced on the spaces corpus: of 330 square Google-C2PA images, 140 score below the identify 0.5 threshold, and visual review confirmed a real class -- faint white sparkles on dark/textured/colored backgrounds (raw NCC 0.46-0.73, below the 0.85 promote gate) landing at fusion conf 0.41-0.47. A margin-gated promote (promote when raw NCC >= 0.50 AND _core_ring_margin >= 40) rescued 32/33 confirmed misses at an apparent 0 FP, but that 0 was a measurement artifact -- the negative set was the margin<40 misses, which a margin>=40 gate excludes by construction. On an honest 518-image non-Google pool the same gate fired on ~174 (≈33%), visually content (screenshots, Chinese "AI生成" Doubao/Jimeng text marks, logos, bright textures), not sparkles. Adding an achromatic-core constraint (chroma <= 15) did not separate them either (kept 15/33 POS, 41 NEG still firing). Root cause is the documented contrast-invariant-NCC wall: a faint sparkle on a busy background is indistinguishable from a bright/ornate content corner at the (shape-NCC, brightness-margin, core-chroma) feature level.

Conclusion: keep the 0.85 corner gate; do NOT add a margin/chroma-gated lower promote.

The cost (mislabel ~8-33% of non-Gemini content as Gemini) outweighs the benefit -- the visible sparkle is a medium-confidence stripped-metadata fallback, and intact Gemini is caught by C2PA in identify regardless. Remaining square misses are an accepted known limitation; a real fix would need a sparkle-specific discriminator (template match on a background-subtracted image, or a hard fixed-margin position prior), which is open research, not a threshold tweak.

Removal is reverse-alpha with an over-subtraction guard (remove_watermark → _reverse_alpha_blend, else _inpaint_footprint): the sparkle alpha is computed (alpha = max(R,G,B)/255) from the bundled sparkle-on-black captures assets/gemini_bg_{96,48}.png. The capture max is ~130, NOT 255: the sparkle is a ~51%-opaque white overlay, so alpha maxes at ~0.51, which is CORRECT for the capture, not under-exposed. The alpha is near-exact only when the real mark's effective opacity matches the capture, which holds on bright/flat backgrounds (re-verified clean on demo_banana_before.png 2026-05-31).

Issue #30 (dark-background black pit): on a dark/textured background (e.g. grass, ~73) the real sparkle's effective opacity is LOWER than the captured 0.51, so the fixed-alpha reverse blend OVER-subtracts (watermarked - a*logo goes negative) and drives the footprint to black — the white sparkle becomes a black diamond. remove_watermark now detects this via _reverse_alpha_oversubtracts (fraction of footprint pixels with alpha >= _FOOTPRINT_ALPHA 0.1 whose numerator < 0 exceeds _OVERSUB_FOOTPRINT_FRAC 0.05) and inpaints the footprint (_inpaint_footprint, cv2 NS over the dilated alpha mask) from the surrounding pixels instead.

Behavior-neutral on the working case: a bright background over-subtracts at ~0% so reverse-alpha is used and the output is byte-identical to before (verified: demo_banana 0.0 frac vs issue-#30 grass 0.61 frac; regression-guarded by test_gemini_engine.py::TestOverSubtractionGuard, which composites the sparkle at a reduced effective alpha to reproduce the mismatch).
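The negative-numerator check can be sketched on flat pixel lists. The two thresholds are the quoted constants; the function names, the white (255) logo value, and the use of plain lists instead of image arrays are assumptions for illustration.

```python
from typing import Sequence

_FOOTPRINT_ALPHA = 0.1
_OVERSUB_FOOTPRINT_FRAC = 0.05


def oversubtracted_fraction(watermarked: Sequence[float], alpha: Sequence[float],
                            logo: float = 255.0) -> float:
    """Fraction of footprint pixels whose reverse-alpha numerator is negative."""
    footprint = [(w, a) for w, a in zip(watermarked, alpha) if a >= _FOOTPRINT_ALPHA]
    if not footprint:
        return 0.0
    negative = sum(1 for w, a in footprint if w - a * logo < 0)
    return negative / len(footprint)


def reverse_alpha_oversubtracts(watermarked: Sequence[float], alpha: Sequence[float],
                                logo: float = 255.0) -> bool:
    """True when enough of the footprint would be driven below zero to inpaint."""
    return oversubtracted_fraction(watermarked, alpha, logo) > _OVERSUB_FOOTPRINT_FRAC
```

A mark composited at exactly the captured alpha never produces a negative numerator, so a bright, matching case stays on the reverse-alpha path. A mark that is really half as opaque on a dark (~73) background trips the guard at the strongest alpha values.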

Under-subtraction (the symmetric case, fixed 2026-06-03): some real Gemini sparkles are rendered MORE opaque than the captured ~0.51, so the fixed-alpha reverse blend UNDER-subtracts and leaves a bright sparkle residual the detector still fires on (measured on the spaces corpus: a visible-removal audit through the registry path left a detectable sparkle on a meaningful fraction of marks, all under-removals, NOT a background-brightness class — failures and successes had the same input confidence and the same background-luma distribution; the discriminator was the removal delta itself). remove_watermark now estimates a per-image alpha gain (_estimate_alpha_gain: effective sparkle opacity at the bright core vs the local background ring, a_eff/a_cap, clamped [1.0, _ALPHA_GAIN_MAX 1.94]) and scales the alpha to match before the over-sub/blend branch. The gain cleanly separates on the corpus (under-removed marks ~1.47, cleanly-removed ~1.00), and a deadband (_ALPHA_GAIN_DEADBAND 1.05) keeps a matching sparkle byte-identical to the pre-fix output, so the fix is purely additive (0 regressions on the audit set; the over-sub guard still runs on the scaled alpha as the safety net for an over-shooting estimate). Regression-guarded by test_gemini_engine.py::TestUnderSubtractionGain (composites a more-opaque-than-capture sparkle; asserts on footprint pixels, NOT the detector — the detector's NCC is degenerate on a flat synthetic background, so a re-detect conf is meaningless there; the real corpus removal drops the detector from ~0.80 to ~0.27).
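The gain estimate can be read as solving the compositing equation for the effective opacity at the core. A minimal sketch, assuming `core` is the bright-core brightness and `bg` the background-ring brightness (both scalars here), with a hypothetical function name; the clamp and deadband values are the ones quoted above.

```python
ALPHA_GAIN_MAX = 1.94       # _ALPHA_GAIN_MAX in the text
ALPHA_GAIN_DEADBAND = 1.05  # _ALPHA_GAIN_DEADBAND in the text


def estimate_alpha_gain(core: float, bg: float, a_cap: float, logo: float = 255.0) -> float:
    """Return a_eff / a_cap clamped to [1.0, ALPHA_GAIN_MAX], with a deadband.

    core = a_eff*logo + (1 - a_eff)*bg  =>  a_eff = (core - bg) / (logo - bg)
    """
    if logo - bg <= 1e-6 or a_cap <= 0:
        return 1.0  # near-white background or missing capture: no usable estimate
    a_eff = (core - bg) / (logo - bg)
    gain = min(max(a_eff / a_cap, 1.0), ALPHA_GAIN_MAX)
    # Inside the deadband the sparkle matches the capture; leave alpha untouched.
    return 1.0 if gain < ALPHA_GAIN_DEADBAND else gain
```

For example, a sparkle rendered at 0.75 opacity against a capture at 0.51 gives a gain of about 1.47, the value the corpus showed for under-removed marks.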

False-positive gate (added 2026-06-03): detect_watermark's shape-only NCC (spatial*0.5 + gradient*0.3 + var*0.2) fires on ornate/flat content (text strips, banners, hatching) that coincidentally matches the diamond shape — a real Gemini sparkle is a bright WHITE overlay, so its core sits above the local background, but the NCC is contrast-invariant and cannot see that. The fusion now demotes (caps confidence to 0.30) any match that is BOTH low-confidence (< _SPARKLE_FP_CONF 0.65) AND has a low core-ring brightness margin (_core_ring_margin < _SPARKLE_FP_MARGIN 5). Real sparkles escape via EITHER high confidence (white-bg sparkles score ≥0.79 despite a low margin — the NCC shape match is strong) OR high margin (dark/mid backgrounds, incl. the #36 faint-corner case, lift well clear), so BOTH must fail to demote. The gate is monotonic (only ever removes detections, never adds), so it cannot regress the verified-negative corpus (already 0 FPs). On the spaces corpus it demoted 16/495 flagged sparkles (13 carried no AI metadata = content FPs; the 3 AI-meta were visually FPs / a near-invisible white-on-white sparkle whose AI verdict is held by metadata anyway), and dropped the removal-audit failures 20→15 (post-removal flat footprints the NCC re-fired on). _core_ring_margin and _estimate_alpha_gain share the _core_and_bg helper (core 75th-pct brightness vs background-ring median). Regression-guarded by test_gemini_engine.py::TestSparkleFalsePositiveGate.
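The demotion rule is a two-condition AND, which is easy to misread in prose. The sketch below restates it with hypothetical function names; the weights, thresholds, and the 0.30 cap are taken from the paragraph above.

```python
SPARKLE_FP_CONF = 0.65   # _SPARKLE_FP_CONF in the text
SPARKLE_FP_MARGIN = 5.0  # _SPARKLE_FP_MARGIN in the text
DEMOTED_CONF = 0.30      # cap applied to a demoted match


def fuse_ncc(spatial: float, gradient: float, var: float) -> float:
    """Shape-only fusion score as described in the text."""
    return spatial * 0.5 + gradient * 0.3 + var * 0.2


def apply_fp_gate(conf: float, core_ring_margin: float) -> float:
    """Demote only when BOTH the confidence and the brightness margin are low."""
    if conf < SPARKLE_FP_CONF and core_ring_margin < SPARKLE_FP_MARGIN:
        return min(conf, DEMOTED_CONF)  # min() keeps the gate monotonic
    return conf
```

Because the gate only ever lowers a score, it can remove detections but never create one.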

Self-verify repair (added 2026-06-04): the gain estimate corrects most under-subtractions, but a tail of strong sparkles still survived reverse-alpha (position jitter, or a gain the [1.0, 1.94] clamp could not fully reach). After the reverse blend, remove_watermark re-detects via _verify_and_repair; when a sparkle at or above _VERIFY_FALLBACK_CONF 0.5 (the registry's real fail line) remains, it inpaints the footprint and keeps that only when it lowers the re-detect confidence — purely additive (the common clean removal re-detects below 0.5 and is returned untouched, so it can never regress). On the spaces corpus this rescued 4 of the 15 remaining gemini removal-audit failures (15→11, doubao/jimeng still 0), verified through the registry/CLI path. Costs one extra detect_watermark per removal (two when the fallback fires). Regression-guarded by test_gemini_engine.py::TestVerifyAndRepair (stubs detect_watermark to drive the keep-best control flow, since the NCC is degenerate on flat synthetics).
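The keep-best control flow can be sketched independently of the image code. In this illustration `detect` and `inpaint` are injected callables (hypothetical stand-ins for the re-detect and the footprint inpaint), which is also roughly how a test can stub the detector.

```python
from typing import Callable, TypeVar

Image = TypeVar("Image")

VERIFY_FALLBACK_CONF = 0.5  # _VERIFY_FALLBACK_CONF in the text


def verify_and_repair(
    blended: Image,
    detect: Callable[[Image], float],
    inpaint: Callable[[Image], Image],
) -> Image:
    """Re-detect after the reverse blend; inpaint only if that lowers the score."""
    conf = detect(blended)
    if conf < VERIFY_FALLBACK_CONF:
        return blended  # common clean removal: returned untouched
    repaired = inpaint(blended)
    # Keep the repair only when it is strictly better by the detector's score.
    return repaired if detect(repaired) < conf else blended
```

The clean path costs one `detect` call and the fallback path two, matching the cost noted above.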

An offset+scale alignment search was prototyped on the remaining 11 fails and REJECTED (2026-06-04): an audit "ceiling" test suggested it could rescue 4 more (e.g. a5a9 0.577→0.417), but direct inspection showed those were NCC-gaming, not removal — the lower-scoring placement left the sparkle as bright or BRIGHTER (a5a9: first-pass slot 99.5th-pct ~76 at background level, the "aligned win" slot ~164), it just reshaped the residual so the contrast-invariant shape-NCC scored lower. A slot-brightness sanity gate rejected every one, so alignment contributed 0 genuine rescues and was removed (the footprint inpaint stays because it physically reconstructs the slot from its darker surroundings, so its rescues are real).

Lesson: the visible-audit pass/fail metric (re-detect conf < 0.5) is gameable by reshaping the residual — optimizing it directly finds NCC-gaming placements, not clean removals; gate any removal candidate on a physical brightness check, not the detector alone.

The 11 survivors are near-white ill-conditioning (reverse-alpha divides by 1-a≈0.02) or detector false positives (before≈after≈0.51) that no reverse-alpha placement fixes. The registry's optional inpaint_residual (edge cleanup) is a no-op on a clean reverse-alpha removal (and on the same corpus it lowered the re-detect conf on 3 marks, raised it on 10, no-op on 466 — net-neutral on pass/fail, so the self-verify repair, not it, drives the removal tail); an earlier "Gemini smears" read was a misjudged soft-fur original, not an artifact.

The bg assets are now rebuilt from OUR OWN controlled captures (data/gemini_capture/captures/, committed) by scripts/visible_alpha_solve.py gemini, which locates the 96px sparkle on the black capture and crops it to the two logo sizes; our capture matched the previously third-party-sourced gemini_bg_96.png to NCC 0.9998, validating the asset and making it reproducible. Gemini's multi-size fixed-slot model is genuinely different from the Doubao/Jimeng text-strip engines (so it stays a separate engine, not part of the shared-base refactor).

_text_mark_engine.py

_text_mark_engine.py -- shared base for the three reverse-alpha text-mark engines (Doubao/Jimeng/Samsung), extracted 2026-06-09 (they were ~90% byte-identical clones). TextMarkEngine(config: TextMarkConfig) owns the whole locate → extract_mask → detect → _fixed/_aligned_alpha_map → _apply_reverse_alpha → remove_watermark_reverse_alpha pipeline (+ the asset-keyed load_alpha_template/glyph_silhouette/template_match_score caches). Each engine module is now a thin subclass: it supplies only its TextMarkConfig (the tuned constants, the bundled asset, and the bounded structural deltas: corner br/bl, margin_floor 4/2, morph_open_size 5/3, min_gw 8/16) plus the test-facing module shims (_alpha_template/_glyph_silhouette/_template_match_score + the constants). Behavior is byte-exact vs the old per-engine code (the three engine test suites pass unchanged). Gemini stays a SEPARATE engine (its multi-size fixed-slot sparkle model is genuinely different). Adding a new text mark takes a new TextMarkConfig, a thin subclass, and one registry _text_mark(...) row. The engine bullets below describe each mark's calibration history; the LOGIC lives here.

_apply_reverse_alpha runs on the glyph crop only: amap is zero outside the glyph region (x, y, w, h), so the blend is a no-op there ((wm - 0)/(1 - 0) == wm, and a uint8→float32→uint8 round-trip is exact). It copies the frame through and computes the reverse-alpha math on the region crop only — byte-identical to the old full-frame pass (verified: Doubao 130 + Jimeng 22 placements, 0 mismatches) but O(glyph) not O(image). The full-frame pass cost ~275 ms on a 12 MP frame for a glyph that is <0.1% of it, once per candidate placement (fixed + aligned ≈ 2×/removal); the crop drops that to ~2 ms. Mirror of the Gemini _core_and_bg crop. remove_watermark_reverse_alpha passes the region each _fixed/_aligned_alpha_map returns.
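The byte-identity argument can be checked directly. The sketch below is an illustration with hypothetical function names, not the engine's code: it compares a full-frame pass against a crop-only pass for an alpha map that is zero outside the region `(x, y, w, h)`.

```python
import numpy as np


def reverse_alpha_full(frame: np.ndarray, amap: np.ndarray, logo: float = 255.0) -> np.ndarray:
    """Reverse-alpha over every pixel of the given arrays."""
    wm = frame.astype(np.float32)
    a = amap[..., None]
    out = (wm - a * logo) / (1.0 - a)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def reverse_alpha_crop(frame: np.ndarray, amap: np.ndarray, region, logo: float = 255.0) -> np.ndarray:
    """Copy the frame through and run the math on the region crop only."""
    x, y, w, h = region
    out = frame.copy()
    out[y:y + h, x:x + w] = reverse_alpha_full(
        frame[y:y + h, x:x + w], amap[y:y + h, x:x + w], logo
    )
    return out
```

Outside the region the full-frame pass computes `(wm - 0) / (1 - 0)`, which round-trips exactly through float32, so both functions return the same bytes while the crop version touches only `w*h` pixels.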

doubao_engine.py

doubao_engine.py -- a thin _text_mark_engine.TextMarkEngine subclass (config only) since 2026-06-09. Visible Doubao "豆包AI生成" remover/detector (cv2/numpy, no GPU). DoubaoEngine.locate anchors a bottom-right box by geometry (mark scales with image WIDTH), extract_mask pulls the light, low-chroma glyphs (the detection candidate) using a per-pixel channel-spread proxy sat = roi.max(axis=2) - roi.min(axis=2) (no HSV conversion). detect is shape-consistent: it matches the bundled alpha glyph silhouette (assets/doubao_alpha.png) against the candidate via zero-mean normalized correlation (_template_match_score, cv2 TM_CCOEFF_NORMED), gated at DETECT_NCC_THRESHOLD 0.4 over a small DETECT_MIN_COVERAGE floor. Keying on glyph SHAPE (not coverage heuristics) fixed #23 (corpus FP 7/1243).
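The two ingredients of detection, the channel-spread proxy and the zero-mean normalized correlation, can be sketched in plain numpy. This is an illustration under assumptions: the thresholds `min_luma` and `max_spread` are hypothetical placeholders, and `zero_mean_ncc` computes the single same-size score that TM_CCOEFF_NORMED evaluates at each offset, not the engine's full search.

```python
import numpy as np


def light_low_chroma_mask(roi_bgr: np.ndarray, min_luma: float = 150.0, max_spread: int = 40) -> np.ndarray:
    """Bright, near-gray pixels; the spread max - min stands in for saturation."""
    sat = roi_bgr.max(axis=2) - roi_bgr.min(axis=2)  # no HSV conversion needed
    luma = roi_bgr.mean(axis=2)
    return (luma >= min_luma) & (sat <= max_spread)


def zero_mean_ncc(candidate: np.ndarray, template: np.ndarray) -> float:
    """Zero-mean normalized correlation of two same-shape arrays, in [-1, 1]."""
    c = candidate.astype(np.float64) - candidate.mean()
    t = template.astype(np.float64) - template.mean()
    denom = np.sqrt((c * c).sum() * (t * t).sum())
    return float((c * t).sum() / denom) if denom > 0 else 0.0
```

The score is invariant to gain and offset, which is why a shape match alone cannot tell a bright overlay from same-shaped content.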

Removal = reverse-alpha + thin residual inpaint (remove_watermark_reverse_alpha): original = (wm - a*logo)/(1-a) from the bundled alpha map + _ALPHA_LOGO_BGR (pure white) + _ALPHA_*_FRAC geometry, then a deliberately THIN inpaint (_RESIDUAL_*, INPAINT_NS) over the glyph footprint clears leftover edges without smearing.

Alpha is rebuilt by scripts/visible_alpha_solve.py (the careful gray-self solve: cubic background fit, mean over channels, full halo, unblurred), same recipe as Jimeng — the captures are committed in data/doubao_capture/captures/.

Removal aligns ALWAYS (no _ALPHA_NATIVE_BAND fast-path): it tries fixed geometry AND _aligned_alpha_map's TM_CCOEFF_NORMED scale+position search and keeps the lower-residual one — the mark is re-rasterized and a few px off per image, so fixed geometry alone leaves a visible outline even at 2048.

The locate box (WM_*) is generous (0.22 wide, margins 0.004) and reaches close to the corner — a tight box (the old 0.185 / margin 0.012) let a corner-ward shift fall OUTSIDE the alignment search, so the align missed and a readable outline survived; regression-guarded by test_recovers_shifted_mark_on_texture (composes the alpha shifted on a known texture; old box ~29 vs new ~1 mean residual).

Issue #13 follow-up defect (found 2026-05-31): the SHIPPED Doubao removal left a clearly READABLE "豆包AI生成" outline on the real doubao-1.png sample, while detect returned conf 0.0 (it is fooled by a thin outline) so test_reverse_alpha_removes_mark passed and the old "56/56 clean" claim was detector-measured, not visual.

Root cause: bad alpha (under-estimated, max ~0.65) + fixed-no-inpaint + tight box; the careful rebuild + always-align + thin inpaint + wide box takes it from a readable outline to faint texture-level traces (parity with Jimeng — a single capture cannot pixel-cancel a per-image re-rasterized mark).

Lesson: a detector-only removal test is insufficient; assert visual residual (the textured-shift test). extract_mask guards a degenerate ROI (bh < 16 or bw < 16 -> empty mask, skips cv2): the always-align removal scores each placement with a residual detect(out), and on an extremely wide/short image (e.g. 2048x1, test_wide_short_does_not_raise) that fed cv2's GaussianBlur a ~1-px-tall ROI and faulted natively on Windows py3.12 (access violation, non-deterministic — one CI cell went red while a re-run passed); the old at-native path never ran detect on degenerate sizes. Real images always clear the guard (the WM_* box floors are max(16, …) height / max(40, …) width), so it only short-circuits slivers. reverse_alpha_available is just "asset present"; the registry gates removal on detect. The shipped third-party _refs/zhengsuanfa_doubao_alpha_120x20.png is NOT a usable alpha (verified 2026-05-29). Arbitrary-region inpainting is region_eraser/erase.

jimeng_engine.py

jimeng_engine.py -- a thin TextMarkEngine subclass (config only) since 2026-06-09. Visible Jimeng / Dreamina "★ 即梦AI" remover/detector (cv2/numpy, no GPU), built 2026-05-30 from issue #13's solid captures (@powersee). Shares the base with doubao_engine: locate anchors a bottom-right box by geometry (scales with WIDTH), extract_mask pulls the light low-chroma glyphs (white top-hat + grayish + min-luma), detect matches the bundled "即梦AI" glyph silhouette (assets/jimeng_alpha.png) via TM_CCOEFF_NORMED over a coverage floor. Threshold DETECT_NCC_THRESHOLD 0.45 cleanly separates real Jimeng marks (>=0.81) from the Doubao strip (0.21) and other AI output (0.0), so the two ByteDance marks don't cross-fire in --mark auto.

Logo is pure white (255,255,255) (_ALPHA_LOGO_BGR; the white capture + an L-pair-solve confirm ~254.6); compositing is sRGB, not linear (a linear-light solve tripled the cross-residual).

Alpha rebuilt by scripts/visible_alpha_solve.py from the GRAY capture (data/jimeng_capture/captures/, the solid captures now committed): a = (I - B)/(255 - B), B a per-capture cubic background fit over the non-glyph pixels, averaged over channels, full halo extent (down to a~0.02), unblurred. Gray (bg ~132) is the deliberate choice over black: it is the best proxy for real content (the mark sits on bright photo areas, not on black), and the careful build drops the gray self-residual to ~1.3.
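The gray-self solve can be illustrated with a least-squares cubic background fit followed by the formula above. This is a sketch under assumptions, not the solver script: the function names are hypothetical, `bg_mask` (True on non-glyph pixels) is assumed to be given, and the logo is taken as pure white.

```python
import numpy as np


def cubic_design(xx: np.ndarray, yy: np.ndarray) -> np.ndarray:
    """All monomials x^i * y^j with i + j <= 3 (10 terms)."""
    terms = [xx**i * yy**j for i in range(4) for j in range(4 - i)]
    return np.stack(terms, axis=-1)


def fit_background(gray: np.ndarray, bg_mask: np.ndarray) -> np.ndarray:
    """Fit a 2D cubic to the non-glyph pixels and evaluate it everywhere."""
    h, w = gray.shape
    yy, xx = np.mgrid[0:h, 0:w]
    design = cubic_design(xx / max(w - 1, 1), yy / max(h - 1, 1))
    coef, *_ = np.linalg.lstsq(design[bg_mask], gray[bg_mask], rcond=None)
    return design @ coef


def solve_alpha(capture_bgr: np.ndarray, bg_mask: np.ndarray, logo: float = 255.0) -> np.ndarray:
    """a = (I - B) / (logo - B) per channel, then the mean over channels."""
    capture = capture_bgr.astype(np.float64)
    alphas = []
    for ch in range(capture.shape[2]):
        background = fit_background(capture[..., ch], bg_mask)
        alphas.append((capture[..., ch] - background) / (logo - background))
    return np.clip(np.mean(alphas, axis=0), 0.0, 1.0)
```

No blur or halo truncation is applied here, in line with the "full halo, unblurred" recipe; a real capture would also need the glyph mask to be derived somehow, which this sketch leaves out.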

The mask quality, not the method, was the earlier limit — a max-channel / quadratic-bg / blurred / halo-truncated build (and a black-dominated LS) left a visible outline (lesson from issue #13: when reverse-alpha leaves a ghost, suspect the captured alpha map before adding heuristics or switching method). Geometry emitted by the solver at _ALPHA_NATIVE_WIDTH 2048: _ALPHA_WIDTH_FRAC 0.202, _ALPHA_HEIGHT_FRAC 0.058, margins ~0.029.

Removal = reverse-alpha + a deliberately THIN residual inpaint (remove_watermark_reverse_alpha, _RESIDUAL_DILATE 5 over the _RESIDUAL_ALPHA_FLOOR 0.05 footprint, _RESIDUAL_INPAINT_RADIUS 2, INPAINT_NS): a single 2048 alpha cannot pixel-cancel the mark re-rasterized at another resolution (alpha maps from independent captures correlate 0.998, not 1.0; off-native reverse-alpha alone only halves the mark), so a tight inpaint clears the residual edges WITHOUT the texture/edge smear a wide full-footprint pass caused.

Placement ALWAYS tries fixed geometry AND _aligned_alpha_map's NCC scale+position search, keeping the lower-residual — the mark re-rasterizes + jitters a few px per image even at the captured width, so fixed geometry alone misses (there is no _ALPHA_NATIVE_BAND fast-path; the scale search _ALPHA_ALIGN_SEARCH is fine-stepped, and the WM_* locate box is generous so a corner-ward shift stays inside the search — the same widen that fixed Doubao). Verified clean on the solid captures (native 2048; faint self-residual ~1.3 visible only on a dead-flat field, hidden by real texture) and a real 1440-wide Jimeng download (off-native, table edge preserved). reverse_alpha_available is just "asset present"; the registry gates on detect.

No committed real sample (the real content download stays gitignored; only the solid calibration captures are committed) — tests/test_jimeng_engine.py synthesizes a mark from the bundled alpha asset, and test_recovers_shifted_mark_on_texture guards the align-on-shift path that the Doubao defect exposed. Jimeng images are independently caught by the China TC260 AIGC label in metadata/identify, so this engine is the visible-mark removal path, not a new identify signal.

samsung_engine.py

samsung_engine.py -- a thin TextMarkEngine subclass (config only) since 2026-06-09. Visible Samsung Galaxy AI "✦ Contenuti generati dall'AI" remover/detector (cv2/numpy, no GPU), built 2026-06-05 from issue #37's flat captures (@f-liva). Shares the base but is anchored bottom-LEFT (Doubao/Jimeng are bottom-right): locate anchors a bottom-left box by geometry (scales with WIDTH), extract_mask pulls the light low-chroma glyphs (white top-hat + grayish + min-luma; LOGO_MIN_LUMA is lowered to 110 because the mark is faint, peak alpha ~0.38, so on a mid/dark background its glyph luma is lower than Jimeng's), detect matches the bundled glyph silhouette (assets/samsung_alpha.png) via TM_CCOEFF_NORMED over a coverage floor. Threshold DETECT_NCC_THRESHOLD 0.40 (real marks ~0.79 on a real photo, ~0.57/0.71 on the black/gray captures; 0.0 on Doubao/Jimeng captures, and Doubao/Jimeng score 0.0 on a real Samsung photo, so there is no cross-fire, also because the corner differs).

Logo is pure white (255,255,255) (_ALPHA_LOGO_BGR; white capture confirms).

Alpha solved by scripts/visible_alpha_solve.py samsung from the GRAY capture (data/samsung_capture/captures/, the flat black/gray/white captures committed; the solver gained a corner="bl" mode + left-margin logging for this), same careful recipe as Jimeng (cubic background, mean-channel, full halo, unblurred). Geometry emitted at _ALPHA_NATIVE_WIDTH 1086 (the flat-edit capture width): _ALPHA_WIDTH_FRAC 0.3195, _ALPHA_HEIGHT_FRAC 0.0378, _ALPHA_MARGIN_LEFT_FRAC 0.0110, _ALPHA_MARGIN_BOTTOM_FRAC 0.0064.

Removal = reverse-alpha + a deliberately THIN residual inpaint (remove_watermark_reverse_alpha, same _RESIDUAL_* recipe as Jimeng) with always-try fixed AND _aligned_alpha_map NCC scale+position search, keep the lower-residual (_ALPHA_ALIGN_SEARCH widened to (0.85, 1.18, 23) because the flat captures are far off the real-photo width).

Resolution caveat: the flat captures arrived at 1086 wide while real photos are ~2958 wide (the mark scales with width, so the captured glyph ~334px is ~2.7x smaller than the ~903px real-photo glyph); width-scale + NCC-align still removes it cleanly (verified on a real 2958-wide @f-liva photo: re-detect 0.79→0.00, no readable text or outline on the recovered wooden table — checked visually, not just by the detector, per the Gemini self-verify lesson), but a flat capture at the real photo resolution would make the alpha pixel-sharp instead of upscaled (open quality upgrade, noted in data/samsung_capture/README.md).

The mark is locale-specific (text differs per language); this build is the Italian "Contenuti generati dall'AI" variant — other locales need their own captured template. reverse_alpha_available is just "asset present"; the registry gates on detect.

No committed real sample (the real photo stays gitignored; only the flat calibration captures are committed) — tests/test_samsung_engine.py synthesizes a mark from the bundled alpha asset (bottom-left geometry), with test_recovers_shifted_mark_on_texture guarding the align-on-shift path. Samsung Galaxy AI edits are independently caught by C2PA + the genAIType marker in metadata/identify, so this engine is the visible-mark removal path; it also feeds identify as the medium-confidence visible_samsung signal via the registry (the stripped-metadata fallback).

region_eraser.py

region_eraser.py -- universal region eraser (erase CLI). erase(image, boxes=|mask=, backend=) accepts grayscale (2D) and RGBA (4-channel) inputs on both backends (erase_cv2 and erase_lama each split off any alpha plane and re-attach it unchanged, and promote grayscale to BGR for processing; LaMa would otherwise crash on grayscale and drop alpha on BGRA). The flow is boxes_to_mask → cv2.inpaint (cv2 backend, default, no deps) or big-LaMa via onnxruntime (lama backend, extra lama, Carve/LaMa-ONNX Apache-2.0 model downloaded on first use, never bundled). erase_lama crops a padded region around the mask, runs LaMa at its fixed 512² input, pastes only masked pixels back (untouched areas stay pixel-exact). Lazy _get_lama_session singleton; lama_available() guards the optional import.

LaMa-ONNX costs ~3.5-4 GB peak RAM and ~5-6 s/call on CPU (FFC working set, not arena — enable_cpu_mem_arena=False does not help), so it does NOT fit a minimal droplet; the cv2 backend (tens of MB, ~30 ms) does. LaMa quality at low RAM = serverless/GPU, mirroring how raiw.cc offloads SDXL to fal.

invisible_watermark.py

invisible_watermark.py -- detect_invisible_watermark(path) decodes the OPEN DWT-DCT watermarks (public decoder, no key) embedded by Stable Diffusion / SDXL / FLUX via the imwatermark library. Known fixed patterns (verified against upstream source) live in _BITS_48 (SDXL 48-bit, FLUX.2 48-bit) and _SD1_STRING ("StableDiffusionV1", SD 1.x/2.x). Optional dep (extra detect); returns None when absent. The detect extra pulls torch transitively (invisible-watermark declares torch a hard dep, and WatermarkDecoder eagerly imports rivaGan -> torch at import time), so detection needs torch present even though dwtDct runs CPU-only on cv2/numpy/pywavelets; no GPU and no separate gpu extra is required.

Unlike SynthID this is locally detectable, but the watermark is fragile (does not survive JPEG re-encode/resize — verified gone after JPEG q90), so it confirms origin only on pristine files. Add new known patterns here. The file carries a top-of-module pyright pragma because imwatermark/cv2 ship no type stubs.

trustmark_detector.py

trustmark_detector.py -- detect_trustmark(path) decodes the OPEN, keyless Adobe TrustMark watermark (the soft binding behind Adobe Durable Content Credentials, alg com.adobe.trustmark.P) via the optional trustmark package (extra trustmark; pulls torch, downloads model weights on first use). Mirrors invisible_watermark.py (lazy singleton guarded by a double-checked threading.Lock so concurrent callers do not double-download the weights, top-of-module pyright pragma, returns None when absent). It detects provenance, not AI origin as such (TrustMark also marks human-authored content), so identify lists it as a watermark without setting is_ai_generated. Other soft-binding vendors (Digimarc/Imatag/Steg.AI/...) have no public decoder; they are only named via the C2PA_SOFT_BINDINGS scan, not decoded.
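The double-checked lock pattern mentioned above looks roughly like this. The class name `LazySingleton` and the injected `loader` are hypothetical; the module itself may use module-level globals instead.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazySingleton(Generic[T]):
    """Build an expensive object once, even under concurrent first calls."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._value: Optional[T] = None

    def get(self) -> T:
        # First check without the lock keeps the hot path cheap.
        if self._value is None:
            with self._lock:
                # Second check: another thread may have loaded while we waited.
                if self._value is None:
                    self._value = self._loader()
        return self._value
```

The sketch assumes the loader never returns `None`; a loader that can do so would need a separate "loaded" flag.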

False-positive gate (added 2026-05-29): TrustMark's wm_present is a BCH error-correction validity flag that spuriously validates on a content-correlated fraction of un-watermarked images — AI-generated textures trip it far more than camera photos (verified 2026-05-29 on real files: it fires on Gemini/OpenAI/Doubao output that cannot carry Adobe's watermark, with a random-bytes decoded secret, while signal-free camera photos did not trip it). A genuine TrustMark is a durable soft binding engineered to survive re-encoding, so detect_trustmark re-decodes after a mild JPEG round-trip (_survives_reencode, _REENCODE_QUALITY 95) and requires the same schema both times; every observed false positive collapsed (none survived even q95), so the gate is the durability property the watermark guarantees. The second decode runs only on the rare initial hit, so the cost is negligible. Do NOT remove the gate to "catch more" — a lone TrustMark hit without it is almost always content noise.
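The durability gate is a decode, re-encode, decode-again sequence. In the sketch below `decode` and `reencode_jpeg` are injected callables standing in for the TrustMark decoder and the JPEG round-trip; their signatures are assumptions for illustration, and only the quality value 95 comes from the text.

```python
from typing import Callable, Optional, TypeVar

Image = TypeVar("Image")

REENCODE_QUALITY = 95  # _REENCODE_QUALITY in the text


def detect_with_durability_gate(
    image: Image,
    decode: Callable[[Image], Optional[str]],
    reencode_jpeg: Callable[[Image, int], Image],
) -> Optional[str]:
    """Accept a hit only if the same schema decodes again after a mild re-encode."""
    first = decode(image)
    if first is None:
        return None  # no initial hit, so the second decode never runs
    second = decode(reencode_jpeg(image, REENCODE_QUALITY))
    return first if second == first else None
```

The second decode runs only after an initial hit, which is why the gate adds negligible cost on the common no-watermark path.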

noai/watermark_remover.py

noai/watermark_remover.py — the WatermarkRemover class has two diffusion pipelines, selected by the explicit pipeline ctor arg (NOT inferred from model_id -- both use the same SDXL base, DEFAULT_MODEL_ID).

sdxl (renamed from default 2026-06-09; default kept as a back-compat alias via normalize_profile) runs plain SDXL img2img (_run_img2img); it is the lighter opt-down alternative (no ControlNet weights).

controlnet (the DEFAULT pipeline since 2026-06-09 for invisible/all/batch and both engine ctors; _run_controlnet, _load_controlnet_pipeline) runs StableDiffusionXLControlNetImg2ImgPipeline with the SDXL-native canny ControlNet xinsir/controlnet-canny-sdxl-1.0 (watermark_profiles.CONTROLNET_CANNY_MODEL): the control image is cv2.Canny(gray, 100, 200) stacked to 3 channels (_CANNY_LOW/_CANNY_HIGH, prompt _CONTROLNET_PROMPT / _CONTROLNET_NEGATIVE).

Removal comes from the img2img regeneration (strength); the ControlNet only PRESERVES text and face STRUCTURE via the edge map.

No original pixels are copied or frozen, BUT validation 2026-06-04 disproved the old "so SynthID does not survive" claim: SynthID CAN survive controlnet on photoreal/high-detail content.

At the shared low removal strength the canny edge-conditioning keeps the regeneration so close to the original that the pixel perturbation that destroys SynthID does not happen (oracle-confirmed: an OpenAI bracelet photo + a 9-face grid read SynthID-detected after controlnet at strength 0.10/0.15, but SynthID-not-detected after the default pipeline at the SAME strength + resolution -- only the pipeline differed).

But the reverse also holds: a flat-graphic logo/poster SURVIVED default while clearing controlnet -- removal at the low strength is content×pipeline dependent and neither pipeline is universally safe; the real lever is a higher strength. See the controlnet Known-limitations bullet for the full table + root cause. Canny holds face STRUCTURE but NOT identity (the regenerated face drifts in likeness -- canny carries edges, not identity). The drifted cleaned face is the LEAST-AI state we can reach without re-introducing SynthID; the library does NOT ship a face-restore extra. Every restore approach we evaluated (GFPGAN-on-cleaned, PhotoMaker-V2 txt2img, InstantID txt2img, InstantID img2img-on-cleaned at three parameter sweeps, 2026-06-04 - 2026-06-08 Modal cert sweeps) regenerated the face from an ArcFace embedding via SDXL diffusion -- which makes the output face look MORE AI-generated, not less. Empirical conclusion in docs/synthid-robust-identity-research-2026-06-08.md "Empirical follow-up". For production face preservation, ship the cleaned image as-is. controlnet_conditioning_scale (ctor arg, default 1.0) is the structure-preservation knob. Same dtype rule as default (fp32 on cpu/mps, fp16 only on cuda/xpu; the fp16-fixed SDXL VAE _SDXL_FP16_VAE_ID is swapped in on fp16 GPUs -- issue #29) and the same MPS->CPU fallback (reload on cpu/fp32, drop a non-cpu generator, retry once).

auto_config.py (REMOVED 2026-06-09)

auto_config.py + the content-detection layer were REMOVED 2026-06-09.

History: auto_config.plan() was a content-adaptive planner that detected faces/text/edges (bundled OpenCV YuNet + PP-OCRv3 DBNet models) to route the pipeline and toggle the adaptive polish. Once controlnet became the default-and-only auto pipeline (it no longer downgrades a structure-less image to sdxl) and the adaptive polish was confirmed to self-gate by detail level (humanizer.adaptive_polish no-ops when the cleaned image already meets the input's Laplacian variance, so it does real work only on over-smoothed photo/face texture and ~nothing on text/flat), the detection no longer changed any behavior — it only annotated a reason string. So the whole layer was deleted: auto_config.py, tests/test_auto_config.py, and the two detection assets (assets/face_detection_yunet_2023mar.onnx, assets/text_detection_ppocrv3_2023may.onnx, ~2.6 MB).

--auto is now a DEPRECATED no-op (cli._resolve_auto_polish): controlnet is already the default pipeline AND the adaptive polish is ON by default, so --auto has nothing left to do — it only prints a deprecation warning and passes adaptive_polish through unchanged (an explicit --no-adaptive-polish still wins). (Originally it re-enabled the polish; once the polish default flipped to ON the same day, the parameter-source branch became dead and was dropped.) The adaptive polish itself lives on in humanizer.adaptive_polish (CLI --adaptive-polish/--no-adaptive-polish, ON by default since 2026-06-09 — it self-gates to a no-op where there is no detail deficit, so default-on is safe; uses the full-res original as the detail reference) — see the humanizer test note. batch resolves the polish once before the loop (one warning) and caches the invisible engine per pipeline (ctx.obj["_inv_engines"]).

upscaler.py

upscaler.py — optional Real-ESRGAN pre-diffusion super-resolution for small inputs (spandrel boundary, top-of-file pyright pragma). is_available() gates on spandrel+torch (via importlib.util.find_spec); upscale(bgr, device=None) loads a lazily-built spandrel ImageModelDescriptor singleton (double-checked lock) and upscales by the model's native factor (x2), with a non-CPU→CPU device fallback mirroring the diffusion engine's MPS→CPU retry. Weights (RealESRGAN_x2plus.pth, BSD-3-Clause) download on first use to the torch.hub checkpoints cache; never bundled. Used only when UPscaling to the min_resolution floor (a max_resolution downscale always uses Lanczos). The wiring is InvisibleEngine._esrgan_upscale(pil, target) — Real-ESRGAN at native factor, then a Lanczos resize to the exact target, falling back to a plain Lanczos resize if the extra is absent or the model errors (so an optional upscaler can never break removal). The default --upscaler is lanczos (cv2, no deps).

ESRGAN is a generic photo/texture GAN with no face/glyph prior, so it best fits photo/texture content and can degrade faces (glassy/asymmetric eyes -- the diffusion pass regenerates faces so the full-pipeline final recovers) and thin/small text (the GAN invents wrong strokes, and low-strength diffusion will not fix it). Verified 2026-06-04: isolated upscale lap-var ~5x Lanczos on faces+textures but glassy eyes; end-to-end invisible final lap-var 1634 vs Lanczos 663 with natural faces (diffusion cleaned the artifact). Kept a manual opt-in knob (the auto plan never selects it) with lanczos the default; not content-gated by design (use Lanczos for text-heavy inputs). spandrel is MIT and pulls no basicsr. Unit-tested without the model: tests/test_upscaler.py (availability guard + the not-installed RuntimeError) and tests/test_invisible_engine.py::TestEsrganUpscale (the three _esrgan_upscale branches via a monkeypatched upscaler).

image_io.py

image_io.py — Unicode-safe cv2 IO (issue #17). imread(path, flags=None) / imwrite(path, img) wrap np.fromfile+cv2.imdecode / cv2.imencode+tofile so non-ASCII paths work on Windows -- bare cv2.imread/cv2.imwrite use the platform ANSI code-page API there and fail (empty decode + can't open/read file) on Chinese/Cyrillic/accented filenames. imread keeps cv2.imread semantics (defaults to IMREAD_COLOR, returns None on missing/empty/undecodable).

Every cv2 file read/write in the package routes through here; do not call cv2.imread/cv2.imwrite directly.

imwrite returns False on an unwritable path (OSError caught) instead of raising, matching cv2.imwrite semantics. macOS/Linux already accept UTF-8 paths, so it is behavior-neutral there (the bug only reproduces on Windows).

to_bgr(image) (added 2026-06-09) is the shared channel normalizer: promotes 2D grayscale / (h,w,1) / 4-channel BGRA to 3-channel BGR (a 3-channel input is returned unchanged, no copy). Use it instead of inlining the cvtColor(GRAY2BGR/BGRA2BGR) branch — the gemini engine and the TextMarkEngine base both route through it so a grayscale/BGRA input (a real Gemini-app export is opaque RGBA) does not crash the axis=2 channel reductions. cv2/numpy are imported lazily inside the functions, so the module is cheap to import in a bare env.
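The channel normalization can be expressed without cv2. The sketch below is a numpy-only stand-in for the cvtColor branches (GRAY2BGR replicates the gray plane, BGRA2BGR drops the alpha plane), written as an illustration rather than the module's actual body.

```python
import numpy as np


def to_bgr(image: np.ndarray) -> np.ndarray:
    """Promote 2D gray, (h, w, 1), or 4-channel BGRA to 3-channel BGR."""
    if image.ndim == 2:
        return np.repeat(image[:, :, None], 3, axis=2)
    if image.ndim == 3 and image.shape[2] == 1:
        return np.repeat(image, 3, axis=2)
    if image.ndim == 3 and image.shape[2] == 4:
        return np.ascontiguousarray(image[:, :, :3])  # drop the alpha plane
    return image  # already 3-channel: returned unchanged, no copy
```

After this call, reductions such as `roi.max(axis=2)` are safe on any of the accepted input layouts.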