diff --git a/CLAUDE.md b/CLAUDE.md index af9831d..6dd7e89 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -10,7 +10,7 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r - `uv run remove-ai-watermarks identify ` — provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector - `uv run remove-ai-watermarks metadata --check` — inspect AI metadata (C2PA, EXIF, PNG chunks) - `uv run remove-ai-watermarks metadata --remove -o ` — strip all AI metadata -- `uv run remove-ai-watermarks batch ` — process every supported image in a directory (output defaults to `_clean/`, set with `-o`). `--mode visible|invisible|metadata|all` (default `visible`); the invisible/all path reuses the same `--strength`/`--steps`/`--pipeline`/`--controlnet-scale`/`--device`/`--max-resolution`/`--min-resolution`/`--seed`/`--hf-token` knobs as `invisible`, `--inpaint/--no-inpaint` for the visible pass, `--humanize` for the Analog Humanizer + `--unsharp` for the final sharpening post-filter, and `--restore-faces/--no-restore-faces` + `--restore-faces-weight` for the GFPGAN face-identity post-pass +- `uv run remove-ai-watermarks batch ` — process every supported image in a directory (output defaults to `_clean/`, set with `-o`). 
`--mode visible|invisible|metadata|all` (default `visible`); the invisible/all path reuses the same `--strength`/`--steps`/`--pipeline`/`--controlnet-scale`/`--device`/`--max-resolution`/`--min-resolution`/`--upscaler`/`--seed`/`--hf-token` knobs as `invisible`, `--inpaint/--no-inpaint` for the visible pass, `--humanize` for the Analog Humanizer + `--unsharp` for the final sharpening post-filter, `--restore-faces/--no-restore-faces` + `--restore-faces-weight` for the GFPGAN face-identity post-pass, and `--auto` (+ `--adaptive-polish/--no-adaptive-polish`) for the content-adaptive quality mode (re-planned per image; one engine cached per resolved pipeline) ## Test and lint @@ -28,6 +28,7 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r - GPU/ML modules (invisible_engine, watermark_remover) are optional — guard imports with `is_available()` checks - Optional detection extras: `detect` (imwatermark — open SD/SDXL/FLUX watermark) and `trustmark` (Adobe TrustMark decoder; pulls torch + downloads weights). Both are guarded by `is_available()` and skipped by `identify` when absent. - Optional `restore` extra (gfpgan/facexlib/basicsr): the GFPGAN face-identity post-pass (`face_restore.py`, CLI `--restore-faces`, **EXPERIMENTAL, opt-in, OFF by default**). Guarded by `face_restore.is_available()`; when enabled it auto-skips with a debug log when the extra is absent or no face is detected. numpy<2-pinned and Python-3.12-pinned (see the `face_restore.py` Key-modules bullet). +- Optional `esrgan` extra (spandrel only): Real-ESRGAN pre-diffusion super-resolution for small inputs (`upscaler.py`, CLI `--upscaler esrgan` on `invisible`/`all`/`batch`). Guarded by `upscaler.is_available()`; the default upscaler stays Lanczos (cv2, no deps) and the engine falls back to Lanczos when the extra is absent or the model errors. 
spandrel is MIT and pulls NO basicsr (only torch/torchvision/safetensors/numpy/einops), sidestepping the `restore` extra's basicsr breakage; Real-ESRGAN weights are BSD-3-Clause and download on first use via `torch.hub` (never bundled). Kept OUT of `all` (heavy + model download), same as `restore`. - Tests for the *model-running* paths are limited to availability checks (multi-GB downloads). But the **pure helpers inside ML-adjacent modules are unit-tested without any download** and must stay that way: `_target_size` (native-vs-downscale-cap-vs-upscale-floor, `test_invisible_engine.py`), `humanizer.unsharp_mask`/`adaptive_polish` (`test_humanizer.py`), `auto_config.plan`/detectors (`test_auto_config.py`), and the MPS->CPU fallback control flow via mocked pipelines (`test_img2img_runner.py`, 100% cover). Don't skip these as "ML, needs a model" — only `remove_watermark`/the diffusion bodies do. ## Key modules @@ -37,7 +38,7 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r - `metadata.py` — `scan_head(path, size=1MB)` is the shared input for every C2PA/AIGC/IPTC byte scan: first `size` bytes plus the payloads of any provenance metadata found beyond that window — for ISOBMFF, the late provenance boxes from `isobmff.scan_c2pa_region` (catches a manifest after a large `mdat`); for **PNG**, the late `tEXt`/`iTXt`/`zTXt`/`eXIf`/`iCCP` chunks from `_png_late_metadata` (catches an XMP/EXIF packet appended after a large `IDAT`, e.g. a TC260 AIGC label at ~2.7 MB). Behavior-neutral (`f.read(size)`) for non-ISOBMFF inputs and for any file that fits within `size`. Use it instead of `open().read(1MB)` for any new marker scan. `synthid_source(path)` returns the vendor name(s) if the C2PA manifest implies a SynthID pixel watermark, else None. Format-agnostic: PNG via the caBX parser, JPEG/WebP/AVIF/HEIF/JXL via a binary scan (C2PA marker + SynthID issuer + AI-source marker). 
`get_ai_metadata` surfaces the verdict, and `metadata --check` prints it as a callout. Both `get_ai_metadata` and `has_ai_metadata` guard the PIL open with `except Exception` (HEIC/unknown formats raise non-OSError) and fall through to the binary scan. `xai_signature(path)` detects xAI/Grok's EXIF-only scheme (`ImageDescription` = `Signature: ` + UUID `Artist`); it feeds `has_ai_metadata`, `get_ai_metadata` (key `xai_signature`), and `identify`. `iptc_ai_system(path)` detects the IPTC Photo Metadata 2025.1 AI-disclosure XMP properties (`IPTC_AI_FIELD_MARKERS` = `AISystemUsed`/`AISystemVersionUsed`/`AIPromptInformation`/`AIPromptWriterName`) and returns the `AISystemUsed` generator name (or `"fields present"`). `remove_ai_metadata` routes **ISOBMFF video** (`.mp4`/`.mov`/`.m4v`) through the same `isobmff.strip_c2pa_boxes` as AVIF/HEIF (MP4 is ISOBMFF), and `_scrub_ai_exif` removes the xAI signature + AI-generator EXIF tags on JPEG output. `strip_c2pa_boxes` is **fail-safe** on a malformed box: it returns the original bytes unchanged with a logged warning instead of truncating the tail to EOF (detection-only `scan_c2pa_region` still stops at a malformed box). `_png_late_metadata` clamps each late-chunk read to the remaining file size (`safe_length = min(length, remaining)`) so a malformed `length` cannot drive a multi-GB allocation. - `identify.py` — the OpenAI rollout caveat is keyed on `_vendor_of(synthid) == "OpenAI"` (not a raw substring over the issuer + verdict blob). 
`identify(path)` aggregates every locally-readable signal (C2PA issuer→platform, C2PA soft-binding forensic-watermark vendor, IPTC "Made with AI" + IPTC 2025.1 `AISystemUsed`, embedded SD/ComfyUI params, SynthID proxy, xAI/Grok EXIF signature via `metadata.xai_signature`, the China TC260 AIGC label via `metadata.aigc_label`, the HuggingFace `hf-job-id` job marker via `metadata.huggingface_job`, the Samsung Galaxy AI editing marker via `metadata.samsung_genai`, the visible marks — Gemini sparkle plus the ByteDance Doubao 豆包AI生成 / Jimeng 即梦AI text marks via the `watermark_registry` — open invisible watermark, Adobe TrustMark via `trustmark_detector`) into one `ProvenanceReport`. `is_ai_generated` is True or None (never asserted False — stripped metadata is not proof of clean origin). The `hf_job`, visible-mark, and Samsung `samsung_genai` signals are **medium** confidence: each lifts an otherwise-Unknown verdict to a tentative AI (`hf_only` / `visible_only` / `samsung_only`, parallel branches; `visible_only` fires on any `visible_*` signal) but is excluded from the high-confidence `ai_from_metadata` set, so none overrides a hard metadata signal. **Visible-mark detection** (`check_visible`, signals `visible_sparkle` / `visible_doubao` / `visible_jimeng`): the Gemini sparkle keeps its own file-level path (`_visible_sparkle` → `gemini_engine.detect_sparkle_confidence`, promoted only at confidence ≥ `_SPARKLE_THRESHOLD` 0.5; corpus-tuned to separate Gemini sparkles ≥0.56 from non-sparkle ≤0.49), while Doubao/Jimeng reuse the registry detectors (`_visible_text_marks` → `watermark_registry`), each gated by its own engine NCC threshold via `MarkDetection.detected` (Doubao 0.4, Jimeng 0.45). Doubao/Jimeng are normally also caught by the TC260 AIGC metadata label, so the visible path is their stripped-metadata fallback. Visible marks set `platform` only when no harder signal already did, and (like the sparkle) are excluded from integrity-clash vendor claims. 
The cv2 dependency lives in the engines, not here. **`import identify` is deliberately light** (~21 MB; ~36 MB with cv2 loaded by a visible-mark run, ~106 MB for a full `check_visible` run): it imports only the pure `noai.c2pa`/`noai.constants` submodules, and `noai/__init__` is lazy (see "Test and lint"), so torch/diffusers are NOT pulled at import even in a full `gpu`/`detect` install — fits a 512 MB host. The heavy paths are opt-in: `check_invisible=True` needs the `detect`/`trustmark` extras (each pulls **torch**; TrustMark also **downloads weights**), so on a core-only deploy leave `check_invisible` off (it is a no-op there anyway). Before the lazy `__init__`, the mere presence of torch in the env inflated `import identify` to ~420 MB. **C2PA platform attribution is device-token-first, issuer-scan fallback** (`_device_platform` scans manifest bytes for `_DEVICE_C2PA_PLATFORM` tokens, then `_attribute_platform`/`_ISSUER_PLATFORM`). **Why, verified on real signed files 2026-05-26:** the old issuer-only byte-scan matched ANY issuer substring anywhere, so multi-entity manifests mis-attributed -- Leica→"Truepic" (a signing authority in the trust chain), Nikon→"Adobe Firefly" (XMP-toolkit "Adobe" + the sample's "Adobe_MAX" name), Pixel→"Google (Gemini)" ("Google LLC" cert org), Truepic→"Google". A distinctive device token wins instead. **Token distinctiveness is load-bearing:** bare `b"Truepic"` mis-fires (it appears in unrelated trust chains -- it mis-attributed the OpenAI `chatgpt-1.png` fixture), so the token is the specific `b"Truepic_Lens"` from the Lens SDK claim generator; likewise `b"Pixel Camera"` (cert CN) not bare `b"Pixel"`. 
`_DEVICE_C2PA_PLATFORM` lists ONLY tokens **verified against a real C2PA file**: Leica (`lc_c2pa`/`Leica Camera`), Nikon (`NIKON`), Pixel (`Pixel Camera` -- from a real Pixel 10 Pro file attached to c2pa-rs issue #1609/#1554), Sony (`sony.sig`/`sony.cert` -- Sony's own C2PA assertion namespace, verified on a real Sony PXW-Z300 file; NOT bare "Sony" which is a common EXIF Make), Truepic (`Truepic_Lens`). Canon/Bria have **no public direct-download C2PA sample** (checked exhaustively: GitHub issue/PR attachments, contentcredentials gallery, HF datasets -- all upload-to-verify or token-gated; Canon's only public file was a self-signed hobbyist CR3, not factory), so they stay unmapped until a real file is captured (same fixture discipline as Grok/Doubao). The Sony sample is video (MP4) -- our ISOBMFF C2PA path detects it; Sony Alpha stills likely share the `sony.*` namespace but are not separately verified. **Samsung Galaxy + ASUS Gallery live in a separate `_SIGNER_C2PA_PLATFORM` (scanned after `_device_platform`, before the issuer fallback), NOT in `_DEVICE_C2PA_PLATFORM`** — verified on real signed files 2026-05-29. Reason: a Galaxy phone stamps BOTH its device cert AND a `trainedAlgorithmicMedia`/genAIType AI marker on a Generative-Edit image, so treating it as a "genuine camera capture" would false-fire integrity-clash rule 2 on every Galaxy AI edit. The signer tokens (`b"Samsung Galaxy"` cert org — distinct from the EXIF `SM-xxxx` model string on ordinary Samsung photos; `b"com.asus.gallery"` claim generator) only resolve the platform label; the AI verdict still comes from the source-type / genAIType. ASUS Gallery is a C2PA-signed edit with no AI marker, so it attributes the platform without asserting `is_ai`. 
**Samsung's `genAIType` (in the proprietary `PhotoEditor_Re_Edit_Data` JSON) is an undocumented Galaxy-AI editing marker** (`metadata.samsung_genai`, gated on the `PhotoEditor_Re_Edit_Data` container; non-zero value = AI tool used, values {1,5} observed): medium-confidence because the field has no public spec (verified 2026-05-29: absent from C2PA spec + Samsung docs), but it co-occurred with `trainedAlgorithmicMedia` in 3/3 verified files that record a source-type and was the SOLE AI marker on a Galaxy S24 file that omits the source type. Camera C2PA marks capture authenticity, not AI (Pixel carries `computationalCapture`, not `trainedAlgorithmicMedia`), so these never set `is_ai` -- that stays driven by digital-source-type. `c2pa.cbor_text_after` (now public) is best-effort for the `generator` detail string only and can be None when the manifest keys it `claim_generator_info` (Pixel). **Issuer→generator mapping is `is_ai`-gated** (`_attribute_platform(issuers, is_ai=c2pa_is_ai)`): a specific AI-generator platform is named only when the digital-source-type is `trainedAlgorithmicMedia`; on a non-AI source an issuer substring is treated as incidental (an "Adobe XMP" toolkit string in an *unmapped* Canon/Sony capture would otherwise mislabel it "Adobe Firefly"), so it degrades to the neutral "C2PA signer: X" label. Real Firefly/OpenAI/Google output carries the AI source-type, so it is unaffected (verified: chatgpt-1.png→OpenAI, firefly-1.png→Adobe Firefly still attribute). `_attribute_platform` defaults `is_ai=True` so the mapping stays unit-testable in isolation. Add capture-camera tokens to `_DEVICE_C2PA_PLATFORM`, editing-app/AI-device signer tokens to `_SIGNER_C2PA_PLATFORM`, generator/issuer platforms to the `C2PA_AI_VENDORS` registry in `constants.py` (which derives `_ISSUER_PLATFORM`), not inline. 
For non-PNG containers (JPEG/WebP/AVIF/HEIF/JXL) the caBX parser returns nothing, so issuer (`_issuers_in`) and generator (`_ai_tools_in`, reusing `C2PA_AI_TOOLS`) are recovered by binary-scanning the first MB. EXIF `Software` / `Make` / `Artist` / `ImageDescription` and XMP `CreatorTool` generator tags are read by `metadata.exif_generator` (PIL+piexif for any format PIL opens incl. AVIF, plus a container-agnostic XMP raw-byte scan that also covers HEIF/JXL), matched against `AI_GENERATOR_TOKENS` so ordinary editors (plain "Adobe Photoshop") and real-camera `Make` ("Apple"/"Canon") are not flagged. **Ideogram tags its output with EXIF `Make="Ideogram AI"`** (verified on a real download 2026-05-24) — that's why `Make` is read. **Integrity-clash detection** (`_integrity_clashes`, surfaced as `ProvenanceReport.integrity_clashes`, printed in red by `identify` and serialized to `--json`): contradictions between independent generator stamps are a laundering/spoofing tell. Two rules: (1) two or more distinct AI-origin vendors named by **independent** signals (e.g. C2PA OpenAI + EXIF `Make="Ideogram AI"`), and (2) a camera-capture C2PA device (`_DEVICE_C2PA_PLATFORM`) coexisting with any AI-generation marker. **Independence is source-grouped (`_CLASH_SOURCE`, added 2026-06-02):** the C2PA issuer attribution (`c2pa`) and the SynthID proxy (`synthid`) are NOT independent — the proxy is inferred from the *same* manifest — so they share one source and two vendors named within a single manifest do not clash. This killed a false-positive class found on the spaces corpus: legitimate multi-actor manifests where a product wraps another vendor's engine (Microsoft Designer on OpenAI → `OpenAI, Microsoft`; Microsoft on Google → `Microsoft, Google LLC, Google C2PA Core Generator Library`) or an edit chain re-signs (Adobe over a Gemini original → Adobe c2pa + Google synthid) — 19 such files across the 2026-06-01/02 batches read as clashes before the fix. 
Rule 1 still fires when a manifest vendor disagrees with a genuinely independent stamp (EXIF/XMP generator, IPTC `AISystemUsed`, AIGC, xAI); each non-`c2pa`/`synthid` family is its own source (`test_identify.py::TestIntegrityClashes::{test_multi_actor_manifest_no_clash,test_manifest_vendor_vs_independent_signal_clashes}`). Vendor normalization is `_vendor_of` over `_AI_VENDOR_TOKENS` (so a C2PA "Google (Gemini)" issuer and a SynthID-Google proxy agree, while different vendors clash). **High-precision by design:** only hard generator stamps feed it (C2PA-issuer when source is AI, SynthID, EXIF/XMP generator, IPTC `AISystemUsed`, xAI, AIGC); the fuzzy visible sparkle and the open invisible watermark are **excluded** (the latter can be a by-product of our own SDXL removal pass). The c2pa vendor is classified from the issuer attribution / generator, NOT the resolved `platform` (a camera label like "Google Pixel" would mis-normalize to "Google"). All real single-origin fixtures (chatgpt/firefly/doubao/grok/mj) verified to produce **zero** clashes (false-positive guard in `test_identify.py::TestRealSamplesHaveNoClash`). - `watermark_registry.py` — **single catalog of known visible watermarks**, the unified "find known marks in their usual places, recognize, remove" entry. **Reverse-alpha based by policy**: a mark is listed only once a real alpha map has been captured for it, and removal inverts that map (`original = (wm - a*logo)/(1-a)`) — Gemini recovers cleanly with no inpaint (its sparkle alpha comes from a pure-black capture, so it is near-exact), while **Doubao and Jimeng both add an always-on THIN residual inpaint** over the glyph footprint (their text marks re-rasterize + jitter a few px per image, so a single capture cannot pixel-cancel them; the inpaint blends into the reverse-alpha-recovered pixels). Arbitrary-region inpainting still lives in `region_eraser`/`erase`. 
Each `KnownMark` ties a key to {usual `location`, `in_auto` flag, `recovery` (="reverse-alpha"), a `detect` adapter → uniform `MarkDetection`, a `remove` adapter}. Entries today: `gemini` (bottom-right sparkle), `doubao` (bottom-right "豆包AI生成"), and `jimeng` (bottom-right "★ 即梦AI"). `detect_marks` scans all; `best_auto_mark` picks the highest-confidence detection. **Cross-engine confidences aren't directly comparable**, so the gemini adapter applies the corpus-validated 0.5 sparkle threshold (`_GEMINI_AUTO_MIN_CONF`) for its `detected` flag — otherwise the gemini engine's loose internal threshold weakly fires (~0.36) on the Doubao text and hijacks `auto`. The shape-keyed Doubao/Jimeng NCC detectors don't cross-fire (jimeng scores ~0.22 on the Doubao strip, well under its 0.45 threshold), so `auto` picks the right one on a Doubao vs Jimeng image. `cli.cmd_visible` is registry-driven: `--mark auto` → `best_auto_mark`, `--mark ` → that mark; `--mark` choices come from `mark_keys()`. `_doubao_remove`/`_jimeng_remove` apply reverse-alpha only when the mark is detected AND `reverse_alpha_available`; outside that, removal is **skipped** (not inpainted). Add a new visible mark = one `KnownMark` entry + its engine (with a captured alpha map); do not re-add per-mark `if` branches in the CLI. **Alpha-on-save policy (issue #30):** `cli._write_bgr_with_alpha` rejoins the input's alpha plane **unchanged** — it must NOT zero alpha in the watermark bbox. Reverse-alpha (and `erase` inpaint) recover real pixels there, so zeroing alpha punched a transparent hole that renders as a solid **white box** on any non-transparent viewer (Gemini app exports are opaque RGBA, so every user hit it; regression-guarded by `test_visible_keeps_alpha_opaque_in_watermark_region`). The registry `remove()` still returns its region (used for `inpaint_residual` positioning), but the CLI no longer uses it to clear alpha. -- `gemini_engine.py` — visible Gemini-sparkle remover/detector (cv2/numpy, no GPU). 
`detect_sparkle_confidence(path)` is the file-level entry point used by `identify.py`. The public entry points normalize a grayscale (2D) or RGBA (4-channel) input to BGR up front so a non-BGR image does not crash the cv2 pipeline. **Detection localization (issue #36):** `detect_watermark`'s global multi-scale NCC search applies a size weight (`(scale/96)**0.5`) that suppresses tiny-patch false positives but can let a larger, mediocre match (e.g. a bright collar in a portrait) outrank a small, near-perfect sparkle in the corner — so a faint sparkle on a busy background scored below threshold and read as clean (the regression osachub reported from widening the search window 256px->512px between v0.7.2 and v0.8.8). `_corner_promote` adds a bottom-right-corner raw-NCC pass on top of the global search: a match with raw NCC >= `_CORNER_PROMOTE_NCC` 0.85 that beats the global pick overrides it (it only ever replaces a lower-fidelity pick, so it cannot weaken an existing detection), rescuing the buried sparkle without reverting the wider window. The corner side is **relative-clamped** (`_CORNER_PROMOTE_FRAC` 0.20 of the short side, clamped to `[_CORNER_PROMOTE_MIN` 96, `_CORNER_PROMOTE_MAX` 384`]`): a fixed 256px is a true corner on a large image but covers ~70% of a small portrait, where a real photo raw-matches the star at ~0.81 (relative tightening drops that worst case to ~0.69, while the upper clamp stops the corner ballooning on huge images where a real photo reached ~0.83 at 512px). The 0.85 gate sits midway between the worst real-photo corner match (~0.78 across native + downscaled negatives) and a genuine faint sparkle (~0.93), so promotion adds true detections with zero corpus false positives (Gemini's sparkle sits ~60-160px from the corner at fixed margins, covered by the [96, 384] band at every measured size). Regression-guarded by `test_gemini_engine.py::TestCornerPromotion`. 
**Removal is reverse-alpha with an over-subtraction guard** (`remove_watermark` → `_reverse_alpha_blend`, else `_inpaint_footprint`): the sparkle alpha is computed (`alpha = max(R,G,B)/255`) from the bundled sparkle-on-black captures `assets/gemini_bg_{96,48}.png` (the capture max is ~130, NOT 255 — the sparkle is a ~51%-opaque white overlay, so `alpha` maxes at ~0.51, which is CORRECT for the capture, not under-exposed). The alpha is near-exact only when the real mark's effective opacity matches the capture, which holds on bright/flat backgrounds — re-verified clean on `demo_banana_before.png` 2026-05-31. **Issue #30 (dark-background black pit):** on a dark/textured background (e.g. grass, ~73) the real sparkle's effective opacity is LOWER than the captured 0.51, so the fixed-alpha reverse blend OVER-subtracts (`watermarked - a*logo` goes negative) and drives the footprint to black — the white sparkle becomes a black diamond. `remove_watermark` now detects this via `_reverse_alpha_oversubtracts` (fraction of footprint pixels with `alpha >= _FOOTPRINT_ALPHA` 0.1 whose numerator < 0 exceeds `_OVERSUB_FOOTPRINT_FRAC` 0.05) and **inpaints the footprint** (`_inpaint_footprint`, cv2 NS over the dilated alpha mask) from the surrounding pixels instead. **Behavior-neutral on the working case:** a bright background over-subtracts at ~0% so reverse-alpha is used and the output is byte-identical to before (verified: demo_banana 0.0 frac vs issue-#30 grass 0.61 frac; regression-guarded by `test_gemini_engine.py::TestOverSubtractionGuard`, which composites the sparkle at a reduced effective alpha to reproduce the mismatch). 
**Under-subtraction (the symmetric case, fixed 2026-06-03):** some real Gemini sparkles are rendered MORE opaque than the captured ~0.51, so the fixed-alpha reverse blend UNDER-subtracts and leaves a bright sparkle residual the detector still fires on (measured on the spaces corpus: a visible-removal audit through the registry path left a detectable sparkle on a meaningful fraction of marks, all under-removals, NOT a background-brightness class — failures and successes had the same input confidence and the same background-luma distribution; the discriminator was the removal delta itself). `remove_watermark` now estimates a per-image alpha gain (`_estimate_alpha_gain`: effective sparkle opacity at the bright core vs the local background ring, `a_eff/a_cap`, clamped `[1.0, _ALPHA_GAIN_MAX` 1.94`]`) and scales the alpha to match before the over-sub/blend branch. The gain cleanly separates on the corpus (under-removed marks ~1.47, cleanly-removed ~1.00), and a deadband (`_ALPHA_GAIN_DEADBAND` 1.05) keeps a matching sparkle **byte-identical** to the pre-fix output, so the fix is purely additive (0 regressions on the audit set; the over-sub guard still runs on the scaled alpha as the safety net for an over-shooting estimate). Regression-guarded by `test_gemini_engine.py::TestUnderSubtractionGain` (composites a more-opaque-than-capture sparkle; **asserts on footprint pixels, NOT the detector** — the detector's NCC is degenerate on a flat synthetic background, so a re-detect conf is meaningless there; the real corpus removal drops the detector from ~0.80 to ~0.27). **False-positive gate (added 2026-06-03):** `detect_watermark`'s shape-only NCC (`spatial*0.5 + gradient*0.3 + var*0.2`) fires on ornate/flat content (text strips, banners, hatching) that coincidentally matches the diamond shape — a real Gemini sparkle is a bright WHITE overlay, so its core sits above the local background, but the NCC is contrast-invariant and cannot see that. 
The fusion now **demotes** (caps confidence to 0.30) any match that is BOTH low-confidence (`< _SPARKLE_FP_CONF` 0.65) AND has a low core-ring brightness margin (`_core_ring_margin < _SPARKLE_FP_MARGIN` 5). Real sparkles escape via EITHER high confidence (white-bg sparkles score ≥0.79 despite a low margin — the NCC shape match is strong) OR high margin (dark/mid backgrounds, incl. the #36 faint-corner case, lift well clear), so BOTH must fail to demote. The gate is **monotonic** (only ever removes detections, never adds), so it cannot regress the verified-negative corpus (already 0 FPs). On the spaces corpus it demoted 16/495 flagged sparkles (13 carried no AI metadata = content FPs; the 3 AI-meta were visually FPs / a near-invisible white-on-white sparkle whose AI verdict is held by metadata anyway), and dropped the removal-audit failures 20→15 (post-removal flat footprints the NCC re-fired on). `_core_ring_margin` and `_estimate_alpha_gain` share the `_core_and_bg` helper (core 75th-pct brightness vs background-ring median). Regression-guarded by `test_gemini_engine.py::TestSparkleFalsePositiveGate`. The registry's optional `inpaint_residual` (edge cleanup) is a no-op on a clean reverse-alpha removal; an earlier "Gemini smears" read was a misjudged soft-fur original, not an artifact. **The bg assets are now rebuilt from OUR OWN controlled captures** (`data/gemini_capture/captures/`, committed) by `scripts/visible_alpha_solve.py gemini`, which locates the 96px sparkle on the black capture and crops it to the two logo sizes; our capture matched the previously third-party-sourced `gemini_bg_96.png` to **NCC 0.9998**, validating the asset and making it reproducible. Gemini's multi-size fixed-slot model is genuinely different from the Doubao/Jimeng text-strip engines (so it stays a separate engine, not part of the shared-base refactor). +- `gemini_engine.py` — visible Gemini-sparkle remover/detector (cv2/numpy, no GPU). 
`detect_sparkle_confidence(path)` is the file-level entry point used by `identify.py`. The public entry points normalize a grayscale (2D) or RGBA (4-channel) input to BGR up front so a non-BGR image does not crash the cv2 pipeline. **Detection localization (issue #36):** `detect_watermark`'s global multi-scale NCC search applies a size weight (`(scale/96)**0.5`) that suppresses tiny-patch false positives but can let a larger, mediocre match (e.g. a bright collar in a portrait) outrank a small, near-perfect sparkle in the corner — so a faint sparkle on a busy background scored below threshold and read as clean (the regression osachub reported from widening the search window 256px->512px between v0.7.2 and v0.8.8). `_corner_promote` adds a bottom-right-corner raw-NCC pass on top of the global search: a match with raw NCC >= `_CORNER_PROMOTE_NCC` 0.85 that beats the global pick overrides it (it only ever replaces a lower-fidelity pick, so it cannot weaken an existing detection), rescuing the buried sparkle without reverting the wider window. The corner side is **relative-clamped** (`_CORNER_PROMOTE_FRAC` 0.20 of the short side, clamped to `[_CORNER_PROMOTE_MIN` 96, `_CORNER_PROMOTE_MAX` 384`]`): a fixed 256px is a true corner on a large image but covers ~70% of a small portrait, where a real photo raw-matches the star at ~0.81 (relative tightening drops that worst case to ~0.69, while the upper clamp stops the corner ballooning on huge images where a real photo reached ~0.83 at 512px). The 0.85 gate sits midway between the worst real-photo corner match (~0.78 across native + downscaled negatives) and a genuine faint sparkle (~0.93), so promotion adds true detections with zero corpus false positives (Gemini's sparkle sits ~60-160px from the corner at fixed margins, covered by the [96, 384] band at every measured size). Regression-guarded by `test_gemini_engine.py::TestCornerPromotion`. 
**Removal is reverse-alpha with an over-subtraction guard** (`remove_watermark` → `_reverse_alpha_blend`, else `_inpaint_footprint`): the sparkle alpha is computed (`alpha = max(R,G,B)/255`) from the bundled sparkle-on-black captures `assets/gemini_bg_{96,48}.png` (the capture max is ~130, NOT 255 — the sparkle is a ~51%-opaque white overlay, so `alpha` maxes at ~0.51, which is CORRECT for the capture, not under-exposed). The alpha is near-exact only when the real mark's effective opacity matches the capture, which holds on bright/flat backgrounds — re-verified clean on `demo_banana_before.png` 2026-05-31. **Issue #30 (dark-background black pit):** on a dark/textured background (e.g. grass, ~73) the real sparkle's effective opacity is LOWER than the captured 0.51, so the fixed-alpha reverse blend OVER-subtracts (`watermarked - a*logo` goes negative) and drives the footprint to black — the white sparkle becomes a black diamond. `remove_watermark` now detects this via `_reverse_alpha_oversubtracts` (fraction of footprint pixels with `alpha >= _FOOTPRINT_ALPHA` 0.1 whose numerator < 0 exceeds `_OVERSUB_FOOTPRINT_FRAC` 0.05) and **inpaints the footprint** (`_inpaint_footprint`, cv2 NS over the dilated alpha mask) from the surrounding pixels instead. **Behavior-neutral on the working case:** a bright background over-subtracts at ~0% so reverse-alpha is used and the output is byte-identical to before (verified: demo_banana 0.0 frac vs issue-#30 grass 0.61 frac; regression-guarded by `test_gemini_engine.py::TestOverSubtractionGuard`, which composites the sparkle at a reduced effective alpha to reproduce the mismatch). 
**Under-subtraction (the symmetric case, fixed 2026-06-03):** some real Gemini sparkles are rendered MORE opaque than the captured ~0.51, so the fixed-alpha reverse blend UNDER-subtracts and leaves a bright sparkle residual the detector still fires on (measured on the spaces corpus: a visible-removal audit through the registry path left a detectable sparkle on a meaningful fraction of marks, all under-removals, NOT a background-brightness class — failures and successes had the same input confidence and the same background-luma distribution; the discriminator was the removal delta itself). `remove_watermark` now estimates a per-image alpha gain (`_estimate_alpha_gain`: effective sparkle opacity at the bright core vs the local background ring, `a_eff/a_cap`, clamped `[1.0, _ALPHA_GAIN_MAX` 1.94`]`) and scales the alpha to match before the over-sub/blend branch. The gain cleanly separates on the corpus (under-removed marks ~1.47, cleanly-removed ~1.00), and a deadband (`_ALPHA_GAIN_DEADBAND` 1.05) keeps a matching sparkle **byte-identical** to the pre-fix output, so the fix is purely additive (0 regressions on the audit set; the over-sub guard still runs on the scaled alpha as the safety net for an over-shooting estimate). Regression-guarded by `test_gemini_engine.py::TestUnderSubtractionGain` (composites a more-opaque-than-capture sparkle; **asserts on footprint pixels, NOT the detector** — the detector's NCC is degenerate on a flat synthetic background, so a re-detect conf is meaningless there; the real corpus removal drops the detector from ~0.80 to ~0.27). **False-positive gate (added 2026-06-03):** `detect_watermark`'s shape-only NCC (`spatial*0.5 + gradient*0.3 + var*0.2`) fires on ornate/flat content (text strips, banners, hatching) that coincidentally matches the diamond shape — a real Gemini sparkle is a bright WHITE overlay, so its core sits above the local background, but the NCC is contrast-invariant and cannot see that. 
The fusion now **demotes** (caps confidence to 0.30) any match that is BOTH low-confidence (`< _SPARKLE_FP_CONF` 0.65) AND has a low core-ring brightness margin (`_core_ring_margin < _SPARKLE_FP_MARGIN` 5). Real sparkles escape via EITHER high confidence (white-bg sparkles score ≥0.79 despite a low margin — the NCC shape match is strong) OR high margin (dark/mid backgrounds, incl. the #36 faint-corner case, lift well clear), so BOTH must fail to demote. The gate is **monotonic** (only ever removes detections, never adds), so it cannot regress the verified-negative corpus (already 0 FPs). On the spaces corpus it demoted 16/495 flagged sparkles (13 carried no AI metadata = content FPs; the 3 AI-meta were visually FPs / a near-invisible white-on-white sparkle whose AI verdict is held by metadata anyway), and dropped the removal-audit failures 20→15 (post-removal flat footprints the NCC re-fired on). `_core_ring_margin` and `_estimate_alpha_gain` share the `_core_and_bg` helper (core 75th-pct brightness vs background-ring median). Regression-guarded by `test_gemini_engine.py::TestSparkleFalsePositiveGate`. **Self-verify repair (added 2026-06-04):** the gain estimate corrects most under-subtractions, but a tail of strong sparkles still survived reverse-alpha (position jitter, or a gain the `[1.0, 1.94]` clamp could not fully reach). After the reverse blend, `remove_watermark` re-detects via `_verify_and_repair`; when a sparkle at or above `_VERIFY_FALLBACK_CONF` 0.5 (the registry's real fail line) remains, it inpaints the footprint and **keeps that only when it lowers the re-detect confidence** — purely additive (the common clean removal re-detects below 0.5 and is returned untouched, so it can never regress). On the spaces corpus this rescued **4 of the 15 remaining gemini removal-audit failures** (15→11, doubao/jimeng still 0), verified through the registry/CLI path. Costs one extra `detect_watermark` per removal (two when the fallback fires). 
Regression-guarded by `test_gemini_engine.py::TestVerifyAndRepair` (stubs `detect_watermark` to drive the keep-best control flow, since the NCC is degenerate on flat synthetics). The registry's optional `inpaint_residual` (edge cleanup) is a no-op on a clean reverse-alpha removal (and on the same corpus it lowered the re-detect conf on 3 marks, raised it on 10, no-op on 466 — net-neutral on pass/fail, so the self-verify repair, not it, drives the removal tail); an earlier "Gemini smears" read was a misjudged soft-fur original, not an artifact. **The bg assets are now rebuilt from OUR OWN controlled captures** (`data/gemini_capture/captures/`, committed) by `scripts/visible_alpha_solve.py gemini`, which locates the 96px sparkle on the black capture and crops it to the two logo sizes; our capture matched the previously third-party-sourced `gemini_bg_96.png` to **NCC 0.9998**, validating the asset and making it reproducible. Gemini's multi-size fixed-slot model is genuinely different from the Doubao/Jimeng text-strip engines (so it stays a separate engine, not part of the shared-base refactor). - `doubao_engine.py` — visible Doubao "豆包AI生成" remover/detector (cv2/numpy, no GPU). `DoubaoEngine.locate` anchors a bottom-right box by **geometry** (mark scales with image WIDTH), `extract_mask` pulls the light, low-chroma glyphs (the detection candidate) using a per-pixel channel-spread proxy `sat = roi.max(axis=2) - roi.min(axis=2)` (no HSV conversion). `detect` is **shape-consistent**: it matches the bundled alpha glyph silhouette (`assets/doubao_alpha.png`) against the candidate via zero-mean normalized correlation (`_template_match_score`, cv2 `TM_CCOEFF_NORMED`), gated at `DETECT_NCC_THRESHOLD` 0.4 over a small `DETECT_MIN_COVERAGE` floor. Keying on glyph SHAPE (not coverage heuristics) fixed #23 (corpus FP 7/1243). 
**Removal = reverse-alpha + thin residual inpaint** (`remove_watermark_reverse_alpha`): `original = (wm - a*logo)/(1-a)` from the bundled alpha map + `_ALPHA_LOGO_BGR` (pure white) + `_ALPHA_*_FRAC` geometry, then a deliberately THIN inpaint (`_RESIDUAL_*`, `INPAINT_NS`) over the glyph footprint clears leftover edges without smearing. **Alpha is rebuilt by `scripts/visible_alpha_solve.py` (the careful gray-self solve: cubic background fit, mean over channels, full halo, unblurred), same recipe as Jimeng** — the captures are committed in `data/doubao_capture/captures/`. **Removal aligns ALWAYS** (no `_ALPHA_NATIVE_BAND` fast-path): it tries fixed geometry AND `_aligned_alpha_map`'s `TM_CCOEFF_NORMED` scale+position search and keeps the lower-residual one — the mark is re-rasterized and a few px off per image, so fixed geometry alone leaves a visible outline even at 2048. **The locate box (`WM_*`) is generous (0.22 wide, margins 0.004) and reaches close to the corner** — a tight box (the old 0.185 / margin 0.012) let a corner-ward shift fall OUTSIDE the alignment search, so the align missed and a readable outline survived; regression-guarded by `test_recovers_shifted_mark_on_texture` (composes the alpha shifted on a known texture; old box ~29 vs new ~1 mean residual). **Issue #13 follow-up defect (found 2026-05-31): the SHIPPED Doubao removal left a clearly READABLE "豆包AI生成" outline on the real `doubao-1.png` sample, while `detect` returned conf 0.0 (it is fooled by a thin outline) so `test_reverse_alpha_removes_mark` passed and the old "56/56 clean" claim was detector-measured, not visual.** Root cause: bad alpha (under-estimated, max ~0.65) + fixed-no-inpaint + tight box; the careful rebuild + always-align + thin inpaint + wide box takes it from a readable outline to faint texture-level traces (parity with Jimeng — a single capture cannot pixel-cancel a per-image re-rasterized mark). 
**Lesson: a detector-only removal test is insufficient; assert visual residual (the textured-shift test).** **`extract_mask` guards a degenerate ROI (`bh < 16 or bw < 16` -> empty mask, skips cv2):** the always-align removal scores each placement with a residual `detect(out)`, and on an extremely wide/short image (e.g. 2048x1, `test_wide_short_does_not_raise`) that fed cv2's GaussianBlur a ~1-px-tall ROI and **faulted natively on Windows py3.12 (access violation, non-deterministic — one CI cell went red while a re-run passed)**; the old at-native path never ran `detect` on degenerate sizes. Real images always clear the guard (the `WM_*` box floors are `max(16, …)` height / `max(40, …)` width), so it only short-circuits slivers. `reverse_alpha_available` is just "asset present"; the registry gates removal on `detect`. The shipped third-party `_refs/zhengsuanfa_doubao_alpha_120x20.png` is NOT a usable alpha (verified 2026-05-29). Arbitrary-region inpainting is `region_eraser`/`erase`. - `jimeng_engine.py` — visible Jimeng / Dreamina "★ 即梦AI" remover/detector (cv2/numpy, no GPU), built 2026-05-30 from issue #13's solid captures (@powersee). Mirrors `doubao_engine`: `locate` anchors a bottom-right box by **geometry** (scales with WIDTH), `extract_mask` pulls the light low-chroma glyphs (white top-hat + grayish + min-luma), `detect` matches the bundled "即梦AI" glyph silhouette (`assets/jimeng_alpha.png`) via `TM_CCOEFF_NORMED` over a coverage floor. Threshold `DETECT_NCC_THRESHOLD` **0.45** cleanly separates real Jimeng marks (>=0.81) from the Doubao strip (0.21) and other AI output (0.0), so the two ByteDance marks don't cross-fire in `--mark auto`. **Logo is pure white (255,255,255)** (`_ALPHA_LOGO_BGR`; the white capture + an L-pair-solve confirm ~254.6); compositing is **sRGB, not linear** (a linear-light solve tripled the cross-residual). 
**Alpha rebuilt by `scripts/visible_alpha_solve.py` from the GRAY capture** (`data/jimeng_capture/captures/`, the solid captures now committed): `a = (I - B)/(255 - B)`, B a per-capture **cubic** background fit over the non-glyph pixels, **averaged over channels, full halo extent (down to a~0.02), unblurred**. Gray (bg ~132) is the deliberate choice over black: it is the best proxy for real content (the mark sits on bright photo areas, not on black), and the careful build drops the gray self-residual to ~1.3. **The mask quality, not the method, was the earlier limit** — a max-channel / quadratic-bg / blurred / halo-truncated build (and a black-dominated LS) left a visible outline (lesson from issue #13: when reverse-alpha leaves a ghost, suspect the captured alpha map before adding heuristics or switching method). Geometry emitted by the solver at `_ALPHA_NATIVE_WIDTH` 2048: `_ALPHA_WIDTH_FRAC` 0.202, `_ALPHA_HEIGHT_FRAC` 0.058, margins ~0.029. **Removal = reverse-alpha + a deliberately THIN residual inpaint** (`remove_watermark_reverse_alpha`, `_RESIDUAL_DILATE` 5 over the `_RESIDUAL_ALPHA_FLOOR` 0.05 footprint, `_RESIDUAL_INPAINT_RADIUS` 2, `INPAINT_NS`): a single 2048 alpha cannot pixel-cancel the mark re-rasterized at another resolution (alpha maps from independent captures correlate 0.998, not 1.0; off-native reverse-alpha alone only halves the mark), so a tight inpaint clears the residual edges WITHOUT the texture/edge smear a wide full-footprint pass caused. **Placement ALWAYS tries fixed geometry AND `_aligned_alpha_map`'s NCC scale+position search, keeping the lower-residual** — the mark re-rasterizes + jitters a few px per image even at the captured width, so fixed geometry alone misses (there is no `_ALPHA_NATIVE_BAND` fast-path; the scale search `_ALPHA_ALIGN_SEARCH` is fine-stepped, and the `WM_*` locate box is generous so a corner-ward shift stays inside the search — the same widen that fixed Doubao). 
Verified clean on the solid captures (native 2048; faint self-residual ~1.3 visible only on a dead-flat field, hidden by real texture) and a real 1440-wide Jimeng download (off-native, table edge preserved). `reverse_alpha_available` is just "asset present"; the registry gates on `detect`. **No committed real sample** (the real content download stays gitignored; only the solid calibration captures are committed) — `tests/test_jimeng_engine.py` synthesizes a mark from the bundled alpha asset, and `test_recovers_shifted_mark_on_texture` guards the align-on-shift path that the Doubao defect exposed. Jimeng images are independently caught by the China TC260 AIGC label in `metadata`/`identify`, so this engine is the visible-mark *removal* path, not a new `identify` signal. - `region_eraser.py` — universal region eraser (`erase` CLI). `erase(image, boxes=|mask=, backend=)` accepts grayscale (2D) and RGBA (4-channel) inputs on **both** backends (`erase_cv2` and `erase_lama` each split off any alpha plane and re-attach it unchanged, and promote grayscale to BGR for processing — LaMa would otherwise crash on grayscale and drop alpha on BGRA): `boxes_to_mask` → `cv2.inpaint` (`cv2` backend, default, no deps) or big-LaMa via onnxruntime (`lama` backend, extra `lama`, `Carve/LaMa-ONNX` Apache-2.0 model downloaded on first use, never bundled). `erase_lama` crops a padded region around the mask, runs LaMa at its fixed 512² input, pastes only masked pixels back (untouched areas stay pixel-exact). Lazy `_get_lama_session` singleton; `lama_available()` guards the optional import. **LaMa-ONNX costs ~3.5-4 GB peak RAM and ~5-6 s/call on CPU** (FFC working set, not arena — `enable_cpu_mem_arena=False` does not help), so it does NOT fit a minimal droplet; the cv2 backend (tens of MB, ~30 ms) does. LaMa quality at low RAM = serverless/GPU, mirroring how raiw.cc offloads SDXL to fal. 
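
The reverse-alpha removal and the over-subtraction guard described for the visible engines in this hunk can be illustrated with a small pure-Python sketch. This is an assumption-laden illustration, not the engine code: it works on a flat list of grayscale values instead of a cv2 image, the function names (`oversubtraction_fraction`, `reverse_alpha`, `choose_removal`) are hypothetical, and only the two thresholds and the blend formula `original = (wm - a*logo)/(1-a)` are taken from the notes above.

```python
from typing import List, Sequence

# Thresholds quoted in the notes (`_FOOTPRINT_ALPHA`, `_OVERSUB_FOOTPRINT_FRAC`).
FOOTPRINT_ALPHA = 0.1
OVERSUB_FOOTPRINT_FRAC = 0.05
LOGO = 255.0  # the overlay is treated as pure white


def oversubtraction_fraction(watermarked: Sequence[float],
                             alpha: Sequence[float],
                             logo: float = LOGO) -> float:
    """Fraction of footprint pixels whose numerator (wm - a*logo) is negative."""
    footprint = [(w, a) for w, a in zip(watermarked, alpha) if a >= FOOTPRINT_ALPHA]
    if not footprint:
        return 0.0
    negative = sum(1 for w, a in footprint if w - a * logo < 0)
    return negative / len(footprint)


def reverse_alpha(watermarked: Sequence[float],
                  alpha: Sequence[float],
                  logo: float = LOGO) -> List[float]:
    """Invert wm = a*logo + (1-a)*original per pixel, clamped to [0, 255]."""
    restored = []
    for w, a in zip(watermarked, alpha):
        denom = max(1.0 - a, 1e-6)  # hypothetical guard against a == 1
        restored.append(min(255.0, max(0.0, (w - a * logo) / denom)))
    return restored


def choose_removal(watermarked: Sequence[float], alpha: Sequence[float]) -> str:
    """Pick 'inpaint' when the fixed alpha would over-subtract, else 'reverse_alpha'."""
    frac = oversubtraction_fraction(watermarked, alpha)
    return "inpaint" if frac > OVERSUB_FOOTPRINT_FRAC else "reverse_alpha"
```

On a bright background composited at the captured opacity the numerator stays positive and the blend inverts exactly; on a dark background where the real mark is fainter than the captured alpha, every footprint pixel goes negative and the sketch routes to the inpaint fallback, which is the issue #30 behavior the notes describe.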
@@ -45,7 +46,8 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r
 - `trustmark_detector.py` — `detect_trustmark(path)` decodes the OPEN, keyless **Adobe TrustMark** watermark (the soft binding behind Adobe Durable Content Credentials, `alg` `com.adobe.trustmark.P`) via the optional `trustmark` package (extra `trustmark`; pulls torch, downloads model weights on first use). Mirrors `invisible_watermark.py` (lazy singleton guarded by a double-checked `threading.Lock` so concurrent callers do not double-download the weights, top-of-module pyright pragma, returns None when absent). It detects *provenance*, not AI origin as such (TrustMark also marks human-authored content), so `identify` lists it as a watermark without setting `is_ai_generated`. Other soft-binding vendors (Digimarc/Imatag/Steg.AI/...) have no public decoder — they are only *named* via the `C2PA_SOFT_BINDINGS` scan, not decoded. **False-positive gate (added 2026-05-29):** TrustMark's `wm_present` is a BCH error-correction validity flag that spuriously validates on a content-correlated fraction of un-watermarked images — AI-generated textures trip it far more than camera photos (verified 2026-05-29 on real files: it fires on Gemini/OpenAI/Doubao output that *cannot* carry Adobe's watermark, with a random-bytes decoded secret, while signal-free camera photos did not trip it). A genuine TrustMark is a *durable* soft binding engineered to survive re-encoding, so `detect_trustmark` re-decodes after a mild JPEG round-trip (`_survives_reencode`, `_REENCODE_QUALITY` 95) and requires the same schema both times; every observed false positive collapsed (none survived even q95), so the gate is the durability property the watermark guarantees. The second decode runs only on the rare initial hit, so the cost is negligible. Do NOT remove the gate to "catch more" — a lone TrustMark hit without it is almost always content noise.
- `noai/watermark_remover.py` — the `WatermarkRemover` class has two diffusion pipelines, selected by the explicit `pipeline` ctor arg (NOT inferred from `model_id` -- both use the same SDXL base, `DEFAULT_MODEL_ID`). **`default`** runs plain SDXL img2img (`_run_img2img`). **`controlnet`** (**EXPERIMENTAL, opt-in**; `_run_controlnet`, `_load_controlnet_pipeline`) runs `StableDiffusionXLControlNetImg2ImgPipeline` with the SDXL-native canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` (`watermark_profiles.CONTROLNET_CANNY_MODEL`): the control image is `cv2.Canny(gray, 100, 200)` stacked to 3 channels (`_CANNY_LOW`/`_CANNY_HIGH`, prompt `_CONTROLNET_PROMPT` / `_CONTROLNET_NEGATIVE`). **Removal still comes from the img2img regeneration (`strength`); the ControlNet only PRESERVES text and face STRUCTURE via the edge map -- no original pixels are copied or frozen, so SynthID does not survive.** Canny holds face STRUCTURE but NOT identity (the regenerated face drifts in likeness -- canny carries edges, not identity; face identity is preserved by the optional `--restore-faces` GFPGAN post-pass (EXPERIMENTAL, opt-in, OFF by default) -- see `face_restore.py`). `controlnet_conditioning_scale` (ctor arg, default 1.0) is the structure-preservation knob. Same dtype rule as `default` (fp32 on cpu/mps, fp16 only on cuda/xpu; the fp16-fixed SDXL VAE `_SDXL_FP16_VAE_ID` is swapped in on fp16 GPUs -- issue #29) and the same MPS->CPU fallback (reload on cpu/fp32, drop a non-cpu generator, retry once). - `face_restore.py` — optional GFPGAN face-restoration post-pass (cv2/torch/gfpgan boundary, top-of-file pyright pragma). **EXPERIMENTAL, opt-in, OFF by default.** Runs AFTER the diffusion removal pass (`InvisibleEngine.remove_watermark`, params `restore_faces=False` / `restore_faces_weight=0.5`; CLI `--restore-faces`/`--no-restore-faces` + `--restore-faces-weight` on `invisible`/`all`/`batch`). 
**Restores face IDENTITY while still scrubbing the pixel watermark:** GFPGAN re-synthesizes each face from a StyleGAN2 prior (codebook/GAN pixels, NOT the original), so the composited face regions carry no watermark and no pixel-copy -- oracle-validated clean at weight 0.5 with identity preserved. Flow: GFPGANer.enhance runs on the ORIGINAL (watermarked) image -> identity faces + RetinaFace boxes (`restorer.face_helper.det_faces`); `_composite_faces` feather-composites those restored face REGIONS into the diffusion-cleaned image. `is_available()` gates on gfpgan + facexlib; lazily-built `GFPGANer` singleton forces CPU unless CUDA (the pip GFPGANer has an MPS device-mismatch bug; it is a cheap post-pass on a few face crops). `_apply_basicsr_shim()` recreates the removed `torchvision.transforms.functional_tensor` module that basicsr imports. The pure `_composite_faces` helper (Gaussian-feathered rectangular alpha per box, `out = restored*a + base*(1-a)`) is unit-tested without the model (`tests/test_face_restore.py`); the model-running path is gated behind `is_available()`. **Commercial-safe** (GFPGAN Apache-2.0 + RetinaFace MIT); the CodeFormer alternative is NON-COMMERCIAL and is NOT shipped. The `restore` extra (gfpgan/facexlib/basicsr) is kept OUT of `all` (heavy + the GFPGANv1.4 + RetinaFace weights download on first use, never bundled). **`restore` pins numpy<2** (same trap class as the removed faceid/insightface extra): basicsr/gfpgan/facexlib are an old ecosystem, so the extra caps `scipy<1.18` (>=1.18 uses `np.long`, gone in numpy 1.24-1.26) and `numba<0.60` to keep the whole env on one numpy 1.26 resolution; verified the `--extra dev --extra gpu` gate env stays numpy 1.26.4 + `diffusers.loaders.peft` importable with `restore` present. 
**basicsr 1.4.2 builds only on Python <3.13** (its `setup.py get_version()` uses `exec(...)` + `locals()['__version__']`, which the 3.13 fast-locals change broke -> `KeyError: '__version__'`), so the project is pinned to Python 3.12 via `.python-version` and `[tool.uv.extra-build-dependencies] basicsr = ["setuptools<69"]`. basicsr ships sdist-only (no wheel). -- `auto_config.py` — the `--auto` quality-mode planner (EXPERIMENTAL). `plan(image_path) -> AutoConfig | None` inspects the INPUT image (before the diffusion model loads) and picks the pipeline modes, so the run adapts to content. **Designed to run as the FIRST step of the invisible/all pipeline, wherever that runs** — locally or the raiw.cc Modal GPU worker — **never on the 512 MB web host** (image work there OOM-crashes the container; the planner is `_apply_auto` in `cli.py` for the CLI, and raiw-app would call `plan()` inside `RaiwProtect.remove`). **Quality-priority routing:** ControlNet (text/face-structure preservation) is the default; it is skipped for `default` (plain SDXL) only on a clearly structure-less image (`not has_face and not has_text and edge_density < _STRUCTURELESS_EDGE_MAX` 0.008). `restore_faces` is on when a face is present. When a smoothing pass (controlnet/restore) ran, the **adaptive polish** (`humanizer.adaptive_polish`) is applied: it targets the input's Laplacian variance (detail level) with a capped unsharp + edge-masked grain, restoring photo/face texture while **sparing text** (text is already high-frequency, so the deficit is tiny and almost no polish lands -- the old fixed unsharp/grain speckled small text; validated 2026-06-03 on gemini_3 lap-var 84->334 toward the 592 original, openai_1 text near-untouched). 
**Detection is cv2-only and torch-free** (~100 MB peak RSS, a few ms — measured): OpenCV **YuNet** (`cv2.FaceDetectorYN`, MIT, 232 KB model bundled at `assets/face_detection_yunet_2023mar.onnx`) for faces, a Canny edge-density + MSER region heuristic for text/structure (the text part is a rough Phase-1 placeholder — DBNet via `cv2.dnn` is the planned precision upgrade; it only ever ADDS controlnet so a miss is backstopped by edge-density and a false positive only costs a controlnet run), and `edge_density`. `min_resolution` stays 1024. **Every auto decision is independently overridable** (interface principle): `_apply_auto` (cli.py) overrides only the three content-adaptive modes the user left at their click default (`ctx.get_parameter_source(...) == DEFAULT`) — `--pipeline`, `--restore-faces`/`--no-restore-faces`, and **`--adaptive-polish`/`--no-adaptive-polish`** always win; `--min-resolution`/`--strength`/`--unsharp`/`--humanize` are independent knobs. `--adaptive-polish` also works WITHOUT `--auto` (manual detail-targeted polish; the engine's `adaptive_polish` param uses the full-res original as the detail reference). Prints the chosen plan (`AutoConfig.reason`). Wired into `cmd_all`/`cmd_invisible` (not `batch` yet — its engine is cached per-mode, auto needs a per-image pipeline). **Adds ZERO new pip deps** (all cv2 core + the bundled MIT model + the cv2-only adaptive polish). Still deferred: Real-ESRGAN-via-Spandrel upscaling (a new `esrgan` extra) and a DBNet text detector (replacing the MSER heuristic). Unit-tested without the model where possible (`tests/test_auto_config.py`): flat/text synthetic images for routing, monkeypatched `detect_face`/`detect_text` for the face/text branches (a real detectable-face fixture is private, never committed). 
Production adoption path for raiw.cc: validate (must keep SynthID removed, not hallucinate micro-text, beat plain SDXL on the real upload distribution), then bump the library SHA in `modal_app.py` and pass `auto=True`. +- `auto_config.py` — the `--auto` quality-mode planner (EXPERIMENTAL). `plan(image_path) -> AutoConfig | None` inspects the INPUT image (before the diffusion model loads) and picks the pipeline modes, so the run adapts to content. **Designed to run as the FIRST step of the invisible/all pipeline, wherever that runs** — locally or the raiw.cc Modal GPU worker — **never on the 512 MB web host** (image work there OOM-crashes the container; the planner is `_apply_auto` in `cli.py` for the CLI, and raiw-app would call `plan()` inside `RaiwProtect.remove`). **Quality-priority routing:** ControlNet (text/face-structure preservation) is the default; it is skipped for `default` (plain SDXL) only on a clearly structure-less image (`not has_face and not has_text and edge_density < _STRUCTURELESS_EDGE_MAX` 0.008). `restore_faces` is on when a face is present. When a smoothing pass (controlnet/restore) ran, the **adaptive polish** (`humanizer.adaptive_polish`) is applied: it targets the input's Laplacian variance (detail level) with a capped unsharp + edge-masked grain, restoring photo/face texture while **sparing text** (text is already high-frequency, so the deficit is tiny and almost no polish lands -- the old fixed unsharp/grain speckled small text; validated 2026-06-03 on gemini_3 lap-var 84->334 toward the 592 original, openai_1 text near-untouched). 
**Detection is cv2-only and torch-free** (~100 MB peak RSS, a few ms — measured): OpenCV **YuNet** (`cv2.FaceDetectorYN`, MIT, 232 KB model bundled at `assets/face_detection_yunet_2023mar.onnx`) for faces, **DBNet** (PP-OCRv3 differentiable-binarization via `cv2.dnn.TextDetectionModel_DB`, a 2.4 MB Apache-2.0 model bundled at `assets/text_detection_ppocrv3_2023may.onnx`) for text, with the old Canny+MSER region heuristic kept as a fallback if the DBNet model can't load (`_detect_text_dbnet` returns None → `_detect_text_mser`). The en/cn opencv_zoo PP-OCRv3 detection models are byte-identical, so it is bundled language-neutral. Text only ever ADDS controlnet, so a miss is backstopped by edge-density and a false positive only costs a controlnet run. Plus `edge_density`. `min_resolution` stays 1024. **Every auto decision is independently overridable** (interface principle): `_apply_auto` (cli.py) overrides only the three content-adaptive modes the user left at their click default (`ctx.get_parameter_source(...) == DEFAULT`) — `--pipeline`, `--restore-faces`/`--no-restore-faces`, and **`--adaptive-polish`/`--no-adaptive-polish`** always win; `--min-resolution`/`--strength`/`--unsharp`/`--humanize` are independent knobs. `--adaptive-polish` also works WITHOUT `--auto` (manual detail-targeted polish; the engine's `adaptive_polish` param uses the full-res original as the detail reference). Prints the chosen plan (`AutoConfig.reason`). Wired into `cmd_all`/`cmd_invisible`/`cmd_batch` — in `batch` the plan is recomputed per image and the invisible engine is cached **per resolved pipeline** (`ctx.obj["_inv_engines"]`, keyed `default`/`controlnet`) instead of a single shared instance, so a mixed directory builds at most one engine of each kind. **Adds ZERO new pip deps** (all cv2 core + the bundled MIT YuNet + Apache-2.0 DBNet models + the cv2-only adaptive polish). 
The auto plan does NOT select the `esrgan` upscaler (that needs the optional extra and would make auto's behavior install-dependent); `--upscaler esrgan` stays a separate manual knob. Unit-tested without a heavy download (`tests/test_auto_config.py`): flat/text synthetic images for routing (the bundled DBNet fires on a real text card), monkeypatched `detect_face`/`_detect_text_dbnet`/`_detect_text_mser` for the face/text/fallback branches (a real detectable-face fixture is private, never committed). Production adoption path for raiw.cc: validate (must keep SynthID removed, not hallucinate micro-text, beat plain SDXL on the real upload distribution), then bump the library SHA in `modal_app.py` and pass `auto=True`. +- `upscaler.py` — optional Real-ESRGAN pre-diffusion super-resolution for small inputs (spandrel boundary, top-of-file pyright pragma). `is_available()` gates on spandrel+torch (via `importlib.util.find_spec`); `upscale(bgr, device=None)` loads a lazily-built spandrel `ImageModelDescriptor` singleton (double-checked lock) and upscales by the model's native factor (x2), with a non-CPU→CPU device fallback mirroring the diffusion engine's MPS→CPU retry. Weights (`RealESRGAN_x2plus.pth`, BSD-3-Clause) download on first use to the `torch.hub` checkpoints cache; never bundled. Used only when UPscaling to the `min_resolution` floor (a `max_resolution` downscale always uses Lanczos). The wiring is `InvisibleEngine._esrgan_upscale(pil, target)` — Real-ESRGAN at native factor, then a Lanczos resize to the exact target, falling back to a plain Lanczos resize if the extra is absent or the model errors (so an optional upscaler can never break removal). The default `--upscaler` is `lanczos` (cv2, no deps). 
**ESRGAN is a generic photo/texture GAN with no face/glyph prior**, so it best fits photo/texture content and can degrade faces (glassy/asymmetric eyes -- the diffusion pass regenerates faces so the full-pipeline final recovers; that is what GFPGAN/`--restore-faces` is for) and thin/small text (the GAN invents wrong strokes, and low-strength diffusion will not fix it). Verified 2026-06-04: isolated upscale lap-var ~5x Lanczos on faces+textures but glassy eyes; end-to-end `invisible` final lap-var 1634 vs Lanczos 663 with natural faces (diffusion cleaned the artifact). Kept a **manual opt-in knob** (the auto plan never selects it) with `lanczos` the default; not content-gated by design (use Lanczos for text-heavy inputs). spandrel is MIT and pulls no basicsr, unlike the `restore` extra. Unit-tested without the model: `tests/test_upscaler.py` (availability guard + the not-installed RuntimeError) and `tests/test_invisible_engine.py::TestEsrganUpscale` (the three `_esrgan_upscale` branches via a monkeypatched `upscaler`). - `image_io.py` — Unicode-safe cv2 IO (issue #17). `imread(path, flags=None)` / `imwrite(path, img)` wrap `np.fromfile`+`cv2.imdecode` / `cv2.imencode`+`tofile` so non-ASCII paths work on Windows -- bare `cv2.imread`/`cv2.imwrite` use the platform ANSI code-page API there and fail (empty decode + `can't open/read file`) on Chinese/Cyrillic/accented filenames. `imread` keeps `cv2.imread` semantics (defaults to `IMREAD_COLOR`, returns `None` on missing/empty/undecodable). **Every cv2 file read/write in the package routes through here; do not call `cv2.imread`/`cv2.imwrite` directly.** `imwrite` returns `False` on an unwritable path (`OSError` caught) instead of raising, matching `cv2.imwrite` semantics. macOS/Linux already accept UTF-8 paths, so it is behavior-neutral there (the bug only reproduces on Windows). cv2/numpy are imported lazily inside the functions, so the module is cheap to import in a bare env. 
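
The quality-priority routing described for the `--auto` planner in this hunk can be summarized as a small decision function. The sketch below is illustrative only and rests on stated assumptions: `PlanSketch` and `plan_sketch` are hypothetical names (the real planner returns an `AutoConfig` from `plan(image_path)` and runs the face/text detectors itself), and only the routing rule, the `restore_faces`-on-face rule, the polish-after-smoothing rule, and the 0.008 edge-density threshold are taken from the notes.

```python
from dataclasses import dataclass

# Threshold quoted in the notes (`_STRUCTURELESS_EDGE_MAX`).
STRUCTURELESS_EDGE_MAX = 0.008


@dataclass(frozen=True)
class PlanSketch:
    """Hypothetical stand-in for the planner's result; not the real AutoConfig."""
    pipeline: str          # "default" (plain SDXL) or "controlnet"
    restore_faces: bool    # GFPGAN post-pass
    adaptive_polish: bool  # detail-targeted polish after a smoothing pass


def plan_sketch(has_face: bool, has_text: bool, edge_density: float) -> PlanSketch:
    """Route to ControlNet unless the image is clearly structure-less."""
    structureless = (
        not has_face and not has_text and edge_density < STRUCTURELESS_EDGE_MAX
    )
    pipeline = "default" if structureless else "controlnet"
    restore_faces = has_face
    # Polish only when a smoothing pass (controlnet or face restore) will run.
    adaptive_polish = pipeline == "controlnet" or restore_faces
    return PlanSketch(pipeline, restore_faces, adaptive_polish)
```

For example, `plan_sketch(False, True, 0.001)` selects `controlnet` because text was detected, which matches the note that text only ever adds ControlNet. The sketch deliberately never picks an upscaler, in line with the statement that the auto plan does not select `esrgan`; per-flag user overrides are left out.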
### Doubao clean-reverse-alpha distillation (re-investigated 2026-05-29)
@@ -77,7 +79,7 @@ Who embeds what, and whether it is locally detectable (so we know which gaps are
 ## Known limitations
-- `invisible` pipeline processes at **native resolution for inputs whose long side is >= 1024px**, and **auto-upscales smaller inputs UP to a 1024px floor** (`min_resolution=1024`, the default; `--min-resolution 0` disables) before diffusion -- SDXL img2img distorts badly on a tiny latent (a 381x512 portrait wrecks at native, the #36 follow-up), and the output is restored to the original input size so the floor is a transparent quality boost (it adds time/memory on small inputs). `max_resolution=0` (default) means no downscale cap, matching the hosted raiw.cc backend (fal fast-sdxl, no pre-downscale). The old forced downscale-to-1024 -> upscale-back round-trip for LARGE images was the main quality loss (issue #10) and is gone; at strength ~0.05 SDXL img2img does not need a downscale. **Final `--unsharp` post-filter (`humanizer.unsharp_mask`, opt-in, default 0):** applied LAST (after the GFPGAN face pass, else it would be smoothed over) to counter the soft/over-smoothed look diffusion + restoration leave (an AI tell); ~0.5-0.8 safe, higher risks halos. Pairs with `--humanize` (grain adds sensor-noise texture, unsharp adds crispness). `--max-resolution N` re-introduces an opt-in long-side cap purely to bound GPU/MPS memory on very large inputs (it reintroduces the lossy round-trip). For huge images that OOM at native, tile-based diffusion is still the proper long-term fix.
**Concrete MPS data points (the OOM is memory-tier-dependent, NOT a hard MPS limit):** on a ~24 GB unified-memory machine (verified 2026-05-25, 1254x1254 gpt-image SDXL, fp32) native res OOMs at the *UNet* step (peak ~17 GiB), not only the VAE decode, and the auto-fallback in `img2img_runner` reloads on CPU and finishes (slow, ~13 min) -- the output is still weight-identical and defeats SynthID, so "looks hung/crashed" on Mac is usually this CPU fallback, not a pipeline error. On a **32 GB** unified-memory machine the same default SDXL pass runs entirely on MPS with **no CPU fallback** (verified 2026-05-31, 1122x1402 gpt-image, `all`/default, ~155 s end-to-end), so 32 GB clears the native-res UNet peak that 24 GB could not. Adding `enable_vae_tiling()` alone does NOT prevent the 24 GB OOM (the peak is the UNet, not the VAE). The fast Mac workarounds for memory-constrained machines are fp16 on MPS (roughly halves memory) or `--max-resolution` to cap the long side; neither is wired as the default. The `controlnet` pipeline adds the canny ControlNet weights on top of SDXL, so its peak is a bit higher than the plain `default` pass; the same MPS->CPU fallback covers an OOM. The native-vs-cap-vs-floor decision lives in the pure helper `invisible_engine._target_size(w, h, max_resolution, min_resolution)` (returns `None` for native, a target tuple for a downscale cap OR an upscale floor; cap takes precedence, the floor is skipped on a min>max misconfig) so it is unit-tested (`tests/test_invisible_engine.py::TestTargetSize`, the #10/#15/#36 regression guard) without loading the model -- keep that logic in the helper, don't re-inline it. 
+- `invisible` pipeline processes at **native resolution for inputs whose long side is >= 1024px**, and **auto-upscales smaller inputs UP to a 1024px floor** (`min_resolution=1024`, the default; `--min-resolution 0` disables) before diffusion -- SDXL img2img distorts badly on a tiny latent (a 381x512 portrait wrecks at native, the #36 follow-up), and the output is restored to the original input size so the floor is a transparent quality boost (it adds time/memory on small inputs). The floor upscale uses Lanczos by default; **`--upscaler esrgan`** (opt-in, the `esrgan` extra) runs Real-ESRGAN first for better detail before the Lanczos resize to the exact target (`upscaler.py` / `InvisibleEngine._esrgan_upscale`, falls back to Lanczos if the extra is absent). `max_resolution=0` (default) means no downscale cap, matching the hosted raiw.cc backend (fal fast-sdxl, no pre-downscale). The old forced downscale-to-1024 -> upscale-back round-trip for LARGE images was the main quality loss (issue #10) and is gone; at strength ~0.05 SDXL img2img does not need a downscale. **Final `--unsharp` post-filter (`humanizer.unsharp_mask`, opt-in, default 0):** applied LAST (after the GFPGAN face pass, else it would be smoothed over) to counter the soft/over-smoothed look diffusion + restoration leave (an AI tell); ~0.5-0.8 safe, higher risks halos. Pairs with `--humanize` (grain adds sensor-noise texture, unsharp adds crispness). `--max-resolution N` re-introduces an opt-in long-side cap purely to bound GPU/MPS memory on very large inputs (it reintroduces the lossy round-trip). For huge images that OOM at native, tile-based diffusion is still the proper long-term fix. 
**Concrete MPS data points (the OOM is memory-tier-dependent, NOT a hard MPS limit):** on a ~24 GB unified-memory machine (verified 2026-05-25, 1254x1254 gpt-image SDXL, fp32) native res OOMs at the *UNet* step (peak ~17 GiB), not only the VAE decode, and the auto-fallback in `img2img_runner` reloads on CPU and finishes (slow, ~13 min) -- the output is still weight-identical and defeats SynthID, so "looks hung/crashed" on Mac is usually this CPU fallback, not a pipeline error. On a **32 GB** unified-memory machine the same default SDXL pass runs entirely on MPS with **no CPU fallback** (verified 2026-05-31, 1122x1402 gpt-image, `all`/default, ~155 s end-to-end), so 32 GB clears the native-res UNet peak that 24 GB could not. Adding `enable_vae_tiling()` alone does NOT prevent the 24 GB OOM (the peak is the UNet, not the VAE). The fast Mac workarounds for memory-constrained machines are fp16 on MPS (roughly halves memory) or `--max-resolution` to cap the long side; neither is wired as the default. The `controlnet` pipeline adds the canny ControlNet weights on top of SDXL, so its peak is a bit higher than the plain `default` pass; the same MPS->CPU fallback covers an OOM. The native-vs-cap-vs-floor decision lives in the pure helper `invisible_engine._target_size(w, h, max_resolution, min_resolution)` (returns `None` for native, a target tuple for a downscale cap OR an upscale floor; cap takes precedence, the floor is skipped on a min>max misconfig) so it is unit-tested (`tests/test_invisible_engine.py::TestTargetSize`, the #10/#15/#36 regression guard) without loading the model -- keep that logic in the helper, don't re-inline it. - **fp16 VAE black-output fix (issue #29, 2026-05-30):** on a **CUDA/XPU fp16** backend the stock SDXL VAE overflows to NaN and the *plain* img2img path decodes to an **all-black** image (reproduced on the raiw.cc result: a 1086x1448 input -> a uniformly black 4.6 KB PNG, mean 0). 
`watermark_remover._load_pipeline` / `_load_controlnet_pipeline` swap in the fp16-fixed SDXL VAE (`madebyollin/sdxl-vae-fp16-fix` = `_SDXL_FP16_VAE_ID`) when `_needs_fp16_vae_fix(model_id, DEFAULT_MODEL_ID, is_fp16)` is true -- only the default SDXL checkpoint on fp16. **cpu/mps run fp32** (the stock VAE is fine there, which is why the bug never reproduces on Mac). A custom non-SDXL `model_id` keeps its own VAE (the fp16-fix VAE is SDXL-architecture-specific). The decision is a pure helper, unit-tested without a download (`tests/test_platform.py::TestFp16VaeFix`); the actual black->clean recovery needs a CUDA GPU. **Confirmed on real CUDA hardware 2026-06-03:** running `all` on a 1086x1448 OpenAI gpt-image (the #29 repro size) at fp16 produced a normal (non-black) output, so the fp16-fix VAE swap resolves the all-black decode. (It was not reproducible on this MPS machine, which runs fp32, so the verification had to happen on an NVIDIA box.) - Pyright first run is slow (2-3 min) due to ML deps (torch/diffusers/transformers stubs); full-project `uv run pyright` can stall for many minutes — scope it to changed files. - A third-party PIL plugin autoload (e.g. an HEIF/AVIF plugin) can raise a non-OSError (`ModuleNotFoundError`), not `UnidentifiedImageError`, when opening a file. Code that opens user-supplied or unknown-format files should `except Exception`, not just `OSError`/`UnidentifiedImageError`. diff --git a/README.md b/README.md index 1d45978..49b5992 100644 --- a/README.md +++ b/README.md @@ -113,7 +113,7 @@ image → encode to latent space (VAE) at native resolution → decode back to pixels (VAE) ``` -- Large inputs run at native resolution (no down-then-up round-trip, which was the main quality loss in issue #10); use `--max-resolution N` only to cap GPU/MPS memory on very large inputs. 
Small inputs (long side under 1024 px) are auto-upscaled to a 1024 px floor before diffusion, because SDXL distorts on a tiny latent, and the result is restored to the original size (a transparent quality boost). Disable the floor with `--min-resolution 0`. +- Large inputs run at native resolution (no down-then-up round-trip, which was the main quality loss in issue #10); use `--max-resolution N` only to cap GPU/MPS memory on very large inputs. Small inputs (long side under 1024 px) are auto-upscaled to a 1024 px floor before diffusion, because SDXL distorts on a tiny latent, and the result is restored to the original size (a transparent quality boost). Disable the floor with `--min-resolution 0`. The floor upscale uses Lanczos by default; `--upscaler esrgan` (the `esrgan` extra) runs Real-ESRGAN first for sharper detail and falls back to Lanczos if the extra is absent. ESRGAN is a generic photo/texture GAN with no face/glyph prior, so it is best for photo/texture content -- it can degrade faces (the diffusion pass regenerates them, so the final recovers) and thin text; keep Lanczos for text-heavy inputs. > **Default strength is vendor-adaptive (no flag needed).** The tool reads the C2PA issuer to detect which vendor's SynthID is present and picks the strength that clears it with the least quality loss: **OpenAI gpt-image → `0.10`**, **Google Gemini → `0.15`**, **unknown source → `0.15`**. An oracle-verified June 2026 study (clean pipeline, per-image openai.com/verify or Gemini app) found OpenAI's watermark clears at `0.05` across `1024`-`1600` px (resolution-independent) while Google's is ~3x more robust and needs `0.15`. The dominant factor is the vendor, not resolution. There is no local SynthID detector, so if the oracle still reads SynthID, raise `--strength`; if you care more about preserving fine text, lower it. (Caveat: Google's `0.15` was validated on the capped `--max-resolution 1536` path; a very large native Gemini image may need more.) 
> @@ -213,6 +213,14 @@ After installation the `remove-ai-watermarks` command is available system-wide. > ```bash > pip install -e ".[restore]" # or: uv pip install -e ".[restore]" > ``` +> +> For sharper upscaling of small inputs before diffusion (`--upscaler esrgan`, +> Real-ESRGAN), install the `esrgan` extra. It loads via spandrel (MIT, no basicsr); +> the Real-ESRGAN weights (BSD-3-Clause) download on first use: +> +> ```bash +> pip install -e ".[esrgan]" # or: uv pip install -e ".[esrgan]" +> ``` #### Invisible watermark removal @@ -280,7 +288,8 @@ remove-ai-watermarks erase image.png --region 1640,1930,400,100 -o clean.png remove-ai-watermarks invisible image.png -o clean.png --humanize 4.0 --unsharp 0.5 # --humanize adds film grain, --unsharp counters the soft "AI" look (both opt-in). # Large images run at native resolution; small ones are upscaled to a 1024 floor -# first (disable with --min-resolution 0). On a very large image that OOMs the +# first (disable with --min-resolution 0); --upscaler esrgan uses Real-ESRGAN for +# that floor upscale (needs the 'esrgan' extra). On a very large image that OOMs the # GPU/MPS, cap the long side: --max-resolution 2048 # Strength is vendor-adaptive by default (OpenAI 0.10 / Google 0.15); override # with --strength. To preserve text/face structure, use --pipeline controlnet @@ -301,6 +310,10 @@ remove-ai-watermarks metadata image.png --remove # Batch with a specific mode remove-ai-watermarks batch ./images/ --mode visible + +# Batch also accepts --auto (and --adaptive-polish): the plan is recomputed per +# image, so a mixed directory routes each file to the right pipeline +remove-ai-watermarks batch ./images/ --mode all --auto ``` ### Python API diff --git a/pyproject.toml b/pyproject.toml index db81075..11c7b83 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -92,6 +92,19 @@ restore = [ "scipy<1.18", "numba<0.60", ] +# Optional pre-diffusion super-resolution for small inputs (Real-ESRGAN). 
Loaded via +# spandrel (MIT) -- a pure model-loader with NO basicsr dependency (it pulls only +# torch / torchvision / safetensors / numpy / einops), which sidesteps the +# basicsr / torchvision.functional_tensor breakage that the `restore` extra fights. +# The Real-ESRGAN weights (BSD-3-Clause) download on first use and are cached; they +# are never bundled. CPU works but is slow on large inputs -- it is meant for the +# pre-diffusion upscale of SMALL inputs (and the GPU worker). Guarded by +# upscaler.is_available(); the default upscaler stays Lanczos (cv2, no deps). The +# weights are fetched with torch.hub (bundled with spandrel's torch), so no extra +# download dependency is needed. +esrgan = [ + "spandrel>=0.3.0", +] dev = [ "pytest>=8.0.0", "pytest-cov>=4.1.0", diff --git a/src/remove_ai_watermarks/assets/text_detection_ppocrv3_2023may.onnx b/src/remove_ai_watermarks/assets/text_detection_ppocrv3_2023may.onnx new file mode 100644 index 0000000..baaeabb Binary files /dev/null and b/src/remove_ai_watermarks/assets/text_detection_ppocrv3_2023may.onnx differ diff --git a/src/remove_ai_watermarks/auto_config.py b/src/remove_ai_watermarks/auto_config.py index 4e29975..ee08f94 100644 --- a/src/remove_ai_watermarks/auto_config.py +++ b/src/remove_ai_watermarks/auto_config.py @@ -17,14 +17,15 @@ text/graphics (already high-frequency, so almost no polish) and spares text/edge masking the grain. Detection is **cv2-only and torch-free**: OpenCV YuNet (``cv2.FaceDetectorYN``) for -faces -- a 232 KB MIT-licensed model bundled in ``assets/`` -- plus a Canny -edge-density + MSER region heuristic for text/structure. The whole planner peaks -~100 MB RSS in a few ms, so it adds nothing meaningful to a GPU run and runs anywhere -the pipeline runs. 
+faces -- a 232 KB MIT-licensed model bundled in ``assets/`` -- DBNet (PP-OCRv3 +differentiable-binarization via ``cv2.dnn.TextDetectionModel_DB``, a 2.4 MB Apache-2.0 +model bundled in ``assets/``) for text, and a Canny ``edge_density``. The whole planner +peaks ~100 MB RSS in a few ms, so it adds nothing meaningful to a GPU run and runs +anywhere the pipeline runs. -The text heuristic is a deliberately rough Phase-1 placeholder (DBNet via cv2.dnn is -the planned precision upgrade); it only ever ADDS controlnet, so a miss is backstopped -by the edge-density route and a false positive only costs a controlnet run. +The text detector falls back to the old MSER region heuristic if the DBNet model can't +load. Either way text only ever ADDS controlnet, so a miss is backstopped by the +edge-density route and a false positive only costs a controlnet run. """ # cv2/numpy boundary: cv2 ships no usable element types; relax the unknown-type rules @@ -47,15 +48,29 @@ logger = logging.getLogger(__name__) # preserve). The headshot measures ~0.022, a busy photo higher; only a near-flat # gradient/solid image falls under 0.008. _STRUCTURELESS_EDGE_MAX = 0.008 -# MSER regions per megapixel above this -> likely text. Rough Phase-1 heuristic: a -# no-text portrait measures a few hundred/MP, dense text far more. Set high so it -# rarely false-fires; it only ever ADDS controlnet so miscalibration is low-harm. +# MSER regions per megapixel above this -> likely text. The MSER path is now only the +# FALLBACK when the bundled DBNet model can't load; DBNet (below) is the primary text +# detector. Rough heuristic: a no-text portrait measures a few hundred/MP, dense text +# far more. Set high so it rarely false-fires; text only ever ADDS controlnet. _TEXT_MSER_PER_MP = 1500.0 _FACE_SCORE = 0.6 # YuNet confidence for a face to count # Downscale the long side to this for DETECTION only (faces stay detectable down to -# ~10px, and this bounds YuNet/MSER cost on huge inputs). 
Removal runs at full res. +# ~10px, and this bounds YuNet/DBNet/MSER cost on huge inputs). Removal runs at full res. _DETECT_MAX_SIDE = 1024 +# DBNet (PP-OCRv3 differentiable-binarization) text-region detector via cv2.dnn -- the +# primary "has meaningful text" signal. The model is the shared PP-OCRv3 detection net +# from OpenCV Zoo (Apache-2.0); en/cn variants are byte-identical, so it is bundled +# language-neutral. cv2.dnn is core OpenCV, so this adds NO new pip dependency. +_DBNET_ASSET = "text_detection_ppocrv3_2023may.onnx" # Apache-2.0 (OpenCV Zoo PP-OCRv3 DB) +_DBNET_BINARY_THRESHOLD = 0.3 +_DBNET_POLYGON_THRESHOLD = 0.5 +_DBNET_MAX_CANDIDATES = 200 +_DBNET_UNCLIP_RATIO = 2.0 +_DBNET_INPUT_SIDE = 736 # square input, multiple of 32 (PP-OCRv3 default) +_DBNET_MEAN = (122.67891434, 116.66876762, 104.00698793) # ImageNet mean * 255 +_dbnet: Any = None # lazy singleton; set to False after a load failure (-> MSER fallback) + # When a smoothing pass ran (controlnet or face restore), the adaptive polish # (humanizer.adaptive_polish) restores the input's detail level, sparing text -- # replacing the old fixed unsharp/grain which over-/under-corrected and speckled text. @@ -152,8 +167,41 @@ def detect_face(image: NDArray[Any]) -> bool: return faces is not None and len(faces) > 0 -def detect_text(image: NDArray[Any]) -> bool: - """Rough MSER-based text-presence heuristic (Phase-1 placeholder for DBNet).""" +def _detect_text_dbnet(image: NDArray[Any]) -> bool | None: + """DBNet (PP-OCRv3) text-region presence via cv2.dnn. + + Returns True/False on a successful run, or None if the bundled model can't load + (the caller then falls back to the MSER heuristic). Loads once, lazily. 
+ """ + import cv2 + + global _dbnet + if _dbnet is False: # a prior load failed; skip straight to the MSER fallback + return None + img = _to_bgr(image) + h, w = img.shape[:2] + if h < 1 or w < 1: + return False + try: + if _dbnet is None: + model = Path(__file__).parent / "assets" / _DBNET_ASSET + net = cv2.dnn.TextDetectionModel_DB(str(model)) + net.setBinaryThreshold(_DBNET_BINARY_THRESHOLD) + net.setPolygonThreshold(_DBNET_POLYGON_THRESHOLD) + net.setMaxCandidates(_DBNET_MAX_CANDIDATES) + net.setUnclipRatio(_DBNET_UNCLIP_RATIO) + net.setInputParams(1.0 / 255.0, (_DBNET_INPUT_SIDE, _DBNET_INPUT_SIDE), _DBNET_MEAN) + _dbnet = net + boxes, _ = _dbnet.detect(img) + except Exception as e: # model load / inference can raise cv2.error or others + logger.debug("DBNet text detect failed (%s); falling back to MSER", e) + _dbnet = False + return None + return boxes is not None and len(boxes) > 0 + + +def _detect_text_mser(image: NDArray[Any]) -> bool: + """Fallback MSER-based text-presence heuristic (used only if DBNet can't load).""" import cv2 gray = _to_gray(image) @@ -166,6 +214,12 @@ def detect_text(image: NDArray[Any]) -> bool: return per_mp > _TEXT_MSER_PER_MP +def detect_text(image: NDArray[Any]) -> bool: + """Text-presence: DBNet (cv2.dnn) when the bundled model loads, else the MSER heuristic.""" + dbnet = _detect_text_dbnet(image) + return _detect_text_mser(image) if dbnet is None else dbnet + + def edge_density(image: NDArray[Any]) -> float: """Fraction of Canny edge pixels -- a cheap 'has structure' proxy in [0, 1].""" import cv2 @@ -190,9 +244,9 @@ def plan(image_path: Path) -> AutoConfig | None: h, w = image.shape[:2] small = _downscale_for_detection(image) - gray = _to_gray(small) # convert once; the text/edge detectors pass a gray input through + gray = _to_gray(small) # convert once; edge density + the MSER fallback use gray has_face = detect_face(small) # YuNet needs the 3-channel image - has_text = detect_text(gray) + has_text = detect_text(small) # 
DBNet wants BGR; the MSER fallback grays it internally edges = edge_density(gray) structureless = (not has_face) and (not has_text) and edges < _STRUCTURELESS_EDGE_MAX diff --git a/src/remove_ai_watermarks/cli.py b/src/remove_ai_watermarks/cli.py index 308b8ff..2b99123 100644 --- a/src/remove_ai_watermarks/cli.py +++ b/src/remove_ai_watermarks/cli.py @@ -159,6 +159,16 @@ _unsharp_option = click.option( "--unsharp", type=float, default=0.0, help="Unsharp-mask sharpening strength (0 = off, typical: 0.3-0.8)." ) +_upscaler_option = click.option( + "--upscaler", + type=click.Choice(["lanczos", "esrgan"]), + default="lanczos", + help="How to upscale a small input to the --min-resolution floor: lanczos (default, cv2, no deps) or " + "esrgan (Real-ESRGAN via the 'esrgan' extra; better detail, slower on CPU). Best for photo/texture " + "content -- as a generic GAN with no face/glyph prior it can degrade faces (diffusion mitigates) and " + "thin text, so lanczos stays the default. Falls back to lanczos if the extra is absent. Only when upscaling.", +) + _auto_option = click.option( "--auto", is_flag=True, @@ -210,6 +220,21 @@ def _apply_auto( return pipeline, restore_faces, adaptive_polish +def _warn_if_esrgan_unavailable(upscaler: str) -> None: + """Tell the user once if ``--upscaler esrgan`` will silently fall back to Lanczos. + + The engine downgrades to Lanczos when the ``esrgan`` extra is absent (fail-safe, so + a batch never breaks mid-run) -- but without this notice the user would believe + Real-ESRGAN ran. Surfaced at the CLI layer, once per invocation (not per image). 
+ """ + if upscaler != "esrgan": + return + from remove_ai_watermarks import upscaler as _upscaler + + if not _upscaler.is_available(): + console.print(" Note: --upscaler esrgan needs the 'esrgan' extra; falling back to Lanczos.") + + def _restore_faces_options(f: Any) -> Any: """Attach the shared GFPGAN face-restoration flags to an invisible-pipeline command.""" restore_flag = click.option( @@ -557,6 +582,7 @@ def cmd_erase( @_restore_faces_options @_min_resolution_option @_unsharp_option +@_upscaler_option @_auto_option @_adaptive_polish_option @click.pass_context @@ -577,6 +603,7 @@ def cmd_invisible( controlnet_scale: float, restore_faces: bool, restore_faces_weight: float, + upscaler: str, auto: bool, adaptive_polish: bool, ) -> None: @@ -596,6 +623,7 @@ def cmd_invisible( from remove_ai_watermarks.invisible_engine import InvisibleEngine source = _validate_image(source) + _warn_if_esrgan_unavailable(upscaler) if auto: pipeline, restore_faces, adaptive_polish = _apply_auto(ctx, source, pipeline, restore_faces, adaptive_polish) if output is None: @@ -634,6 +662,7 @@ def cmd_invisible( adaptive_polish=adaptive_polish, max_resolution=max_resolution, min_resolution=min_resolution, + upscaler=upscaler, vendor=vendor, restore_faces=restore_faces, restore_faces_weight=restore_faces_weight, @@ -815,6 +844,7 @@ def cmd_identify(ctx: click.Context, source: Path, no_visible: bool, as_json: bo @_restore_faces_options @_min_resolution_option @_unsharp_option +@_upscaler_option @_auto_option @_adaptive_polish_option @click.pass_context @@ -838,6 +868,7 @@ def cmd_all( controlnet_scale: float, restore_faces: bool, restore_faces_weight: float, + upscaler: str, auto: bool, adaptive_polish: bool, ) -> None: @@ -854,6 +885,7 @@ def cmd_all( _banner() source = _validate_image(source) + _warn_if_esrgan_unavailable(upscaler) if auto: pipeline, restore_faces, adaptive_polish = _apply_auto(ctx, source, pipeline, restore_faces, adaptive_polish) @@ -941,6 +973,7 @@ def cmd_all( 
adaptive_polish=adaptive_polish, max_resolution=max_resolution, min_resolution=min_resolution, + upscaler=upscaler, vendor=vendor, restore_faces=restore_faces, restore_faces_weight=restore_faces_weight, @@ -1001,6 +1034,9 @@ def _process_batch_image( restore_faces: bool = False, restore_faces_weight: float = 0.5, controlnet_scale: float = 1.0, + upscaler: str = "lanczos", + auto: bool = False, + adaptive_polish: bool = False, ) -> None: """Process a single image for batch mode. @@ -1046,14 +1082,22 @@ def _process_batch_image( if invisible_available(): from remove_ai_watermarks.invisible_engine import InvisibleEngine - if "_inv_engine" not in ctx.obj: - ctx.obj["_inv_engine"] = InvisibleEngine( + # --auto re-plans the pipeline / face-restore / polish per image; only the + # pipeline choice changes the engine ctor, so cache one engine per pipeline + # (controlnet vs default) rather than a single shared instance. + if auto: + pipeline, restore_faces, adaptive_polish = _apply_auto( + ctx, img_path, pipeline, restore_faces, adaptive_polish + ) + engines = ctx.obj.setdefault("_inv_engines", {}) + if pipeline not in engines: + engines[pipeline] = InvisibleEngine( device=None if device == "auto" else device, pipeline=pipeline, hf_token=hf_token, controlnet_conditioning_scale=controlnet_scale, ) - engine_inv = ctx.obj["_inv_engine"] + engine_inv = engines[pipeline] engine_inv.remove_watermark( img_path if mode == "invisible" else out_path, out_path, @@ -1062,8 +1106,10 @@ def _process_batch_image( seed=seed, humanize=humanize, unsharp=unsharp, + adaptive_polish=adaptive_polish, max_resolution=max_resolution, min_resolution=min_resolution, + upscaler=upscaler, restore_faces=restore_faces, restore_faces_weight=restore_faces_weight, # Detect the vendor from the pristine original (`img_path`), not the @@ -1126,7 +1172,10 @@ def _process_batch_image( @_restore_faces_options @_min_resolution_option @_unsharp_option +@_upscaler_option @_controlnet_scale_option +@_auto_option 
+@_adaptive_polish_option @click.pass_context def cmd_batch( ctx: click.Context, @@ -1147,6 +1196,9 @@ def cmd_batch( restore_faces: bool, restore_faces_weight: float, controlnet_scale: float, + upscaler: str, + auto: bool, + adaptive_polish: bool, ) -> None: """Process all images in a directory.""" _banner() @@ -1164,6 +1216,8 @@ def cmd_batch( console.print(f" Found {len(images)} images in {directory}") console.print(f" Output -> {output_dir}") console.print(f" Mode: {mode}") + if mode in ("invisible", "all"): + _warn_if_esrgan_unavailable(upscaler) processed = 0 errors = 0 @@ -1202,6 +1256,9 @@ def cmd_batch( restore_faces=restore_faces, restore_faces_weight=restore_faces_weight, controlnet_scale=controlnet_scale, + upscaler=upscaler, + auto=auto, + adaptive_polish=adaptive_polish, ) processed += 1 diff --git a/src/remove_ai_watermarks/invisible_engine.py b/src/remove_ai_watermarks/invisible_engine.py index f3a3b5d..a96af97 100644 --- a/src/remove_ai_watermarks/invisible_engine.py +++ b/src/remove_ai_watermarks/invisible_engine.py @@ -126,6 +126,32 @@ class InvisibleEngine: """Eagerly load the pipeline so download progress is visible.""" self._remover.preload() + def _esrgan_upscale(self, image: Any, target: tuple[int, int]) -> Any: + """Upscale a PIL image to ``target`` with Real-ESRGAN, else Lanczos. + + Runs Real-ESRGAN at its native factor (on the remover's device, CPU fallback), + then resizes to the exact ``target`` with Lanczos. Falls back to a plain Lanczos + resize when the ``esrgan`` extra is absent or the model errors. 
+ """ + import cv2 + import numpy as np + from PIL import Image + + from remove_ai_watermarks import upscaler + + if not upscaler.is_available(): + logger.debug("esrgan upscaler requested but the extra is absent; using Lanczos") + return image.resize(target, Image.Resampling.LANCZOS) + try: + bgr = cv2.cvtColor(np.array(image.convert("RGB")), cv2.COLOR_RGB2BGR) + big = upscaler.upscale(bgr, device=self._remover.device) + if (big.shape[1], big.shape[0]) != target: + big = cv2.resize(big, target, interpolation=cv2.INTER_LANCZOS4) + return Image.fromarray(cv2.cvtColor(big, cv2.COLOR_BGR2RGB)) + except Exception as e: # never let an optional upscaler break removal + logger.warning("Real-ESRGAN upscale failed (%s); using Lanczos", e) + return image.resize(target, Image.Resampling.LANCZOS) + def remove_watermark( self, image_path: Path, @@ -142,6 +168,7 @@ class InvisibleEngine: restore_faces_weight: float = 0.5, unsharp: float = 0.0, adaptive_polish: bool = False, + upscaler: str = "lanczos", ) -> Path: """Remove invisible watermark from an image. @@ -180,6 +207,11 @@ class InvisibleEngine: (default) = on; 0 = off. The output is restored to the original input size, so this is a transparent quality boost; it adds time and memory on small inputs. Ignored on a min > max misconfig. + upscaler: How to upscale a small input to the ``min_resolution`` floor: + ``"lanczos"`` (default, cv2, no deps) or ``"esrgan"`` (Real-ESRGAN + via the ``esrgan`` extra). Only applies when UPscaling (the floor + case); a ``max_resolution`` downscale always uses Lanczos. Falls back + to Lanczos if the extra is absent. Returns: Path to the cleaned image. 
@@ -202,8 +234,8 @@ class InvisibleEngine: target = _target_size(image.width, image.height, max_resolution, min_resolution) if target is not None: + upscaling = max(target) > max(image.width, image.height) if self._progress_callback: - upscaling = max(target) > max(image.width, image.height) reason = ( f"min-resolution floor {min_resolution}px" if upscaling @@ -211,7 +243,12 @@ class InvisibleEngine: ) verb = "Upscaling" if upscaling else "Downscaling" self._progress_callback(f"{verb} {image.width}x{image.height} to {target[0]}x{target[1]} ({reason})...") - image = image.resize(target, Image.Resampling.LANCZOS) + # Real-ESRGAN only helps when UPscaling (the floor case); a downscale cap + # always uses Lanczos. _esrgan_upscale falls back to Lanczos if the extra is absent. + if upscaling and upscaler == "esrgan": + image = self._esrgan_upscale(image, target) + else: + image = image.resize(target, Image.Resampling.LANCZOS) # Always persist to a temp file, even without downscaling: WatermarkRemover # reloads by path, so the EXIF-transposed pixels must be saved or rotation diff --git a/src/remove_ai_watermarks/upscaler.py b/src/remove_ai_watermarks/upscaler.py new file mode 100644 index 0000000..121c65b --- /dev/null +++ b/src/remove_ai_watermarks/upscaler.py @@ -0,0 +1,125 @@ +"""Optional pre-diffusion super-resolution for small inputs (Real-ESRGAN via spandrel). + +Mirrors ``region_eraser``'s optional-backend pattern: ``is_available()`` guards the +``spandrel`` import, a lazy singleton (double-checked lock) holds the loaded model, and +the weights download on first use (cached by ``torch.hub``) -- they are never bundled. + +The DEFAULT upscaler stays Lanczos (cv2, no deps); this is opt-in via the ``esrgan`` +extra and feeds the ``--upscaler esrgan`` path. 
``spandrel`` is a pure model-loader +(MIT) with NO basicsr dependency -- it pulls only torch/torchvision/safetensors/numpy/ +einops -- so it sidesteps the basicsr / ``torchvision.transforms.functional_tensor`` +breakage that the ``restore`` (GFPGAN) extra has to shim. Real-ESRGAN weights are +BSD-3-Clause. + +CPU works but is slow on large inputs, so this is meant for the pre-diffusion upscale of +SMALL inputs (and the GPU worker). On a memory-constrained host it is a no-op (the extra +is absent), and the caller falls back to Lanczos. +""" + +# torch/spandrel boundary: these libs ship no usable element types; relax the +# unknown-type rules for this file only. +# pyright: reportUnknownMemberType=false, reportUnknownArgumentType=false, reportUnknownVariableType=false, reportUnknownParameterType=false, reportMissingTypeArgument=false, reportMissingTypeStubs=false, reportMissingImports=false, reportArgumentType=false, reportAssignmentType=false, reportReturnType=false, reportCallIssue=false, reportIndexIssue=false, reportOperatorIssue=false, reportAttributeAccessIssue=false, reportPrivateImportUsage=false +from __future__ import annotations + +import importlib.util +import logging +import threading +from pathlib import Path +from typing import TYPE_CHECKING, Any + +if TYPE_CHECKING: + from numpy.typing import NDArray + +logger = logging.getLogger(__name__) + +# Real-ESRGAN x2plus (BSD-3-Clause), official release. x2 is the right native factor for +# the pre-diffusion floor upscale (small inputs ~512 -> ~1024); spandrel infers the +# architecture and scale from the checkpoint, so swapping the URL is enough to change it. 
+_MODEL_URL = "https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.1/RealESRGAN_x2plus.pth" +_MODEL_FILENAME = "RealESRGAN_x2plus.pth" + +_model: Any = None # lazy singleton (spandrel ImageModelDescriptor) +_model_device: str = "cpu" +_lock = threading.Lock() + + +def is_available() -> bool: + """True if the ``esrgan`` extra (spandrel + torch) is importable.""" + return importlib.util.find_spec("spandrel") is not None and importlib.util.find_spec("torch") is not None + + +def _model_cache_path() -> Path: + """Path the weights are cached at (the torch.hub checkpoints dir).""" + import torch + + cache_dir = Path(torch.hub.get_dir()) / "checkpoints" + cache_dir.mkdir(parents=True, exist_ok=True) + return cache_dir / _MODEL_FILENAME + + +def _get_model(device: str) -> Any: + """Load the Real-ESRGAN model once (downloading the weights on first use).""" + global _model, _model_device + if _model is not None and _model_device == device: + return _model + with _lock: + if _model is None: + import torch + from spandrel import ImageModelDescriptor, ModelLoader + + dst = _model_cache_path() + if not dst.exists(): + logger.info("Downloading Real-ESRGAN weights to %s", dst) + torch.hub.download_url_to_file(_MODEL_URL, str(dst), progress=False) + model = ModelLoader().load_from_file(str(dst)) + if not isinstance(model, ImageModelDescriptor): + raise RuntimeError(f"Unexpected spandrel model type: {type(model).__name__}") + _model = model.eval() + if _model_device != device: + _model.to(device) + _model_device = device + return _model + + +def scale() -> int: + """The model's native upscale factor (e.g. 2 for x2plus). Loads the model if needed.""" + return int(_get_model("cpu").scale) + + +def upscale(image: NDArray[Any], device: str | None = None) -> NDArray[Any]: + """Upscale a BGR uint8 image by the model's native factor with Real-ESRGAN. + + Returns a BGR uint8 array. 
Falls back to CPU if the requested device errors (an + MPS/CUDA OOM or unsupported-op on the small pre-diffusion input), mirroring the + diffusion engine's MPS->CPU fallback. + + Raises: + RuntimeError: if the ``esrgan`` extra is not installed (guard with + ``is_available()`` first). + """ + if not is_available(): + raise RuntimeError("Real-ESRGAN upscaler needs the 'esrgan' extra (spandrel). Install it or use Lanczos.") + import cv2 + import numpy as np + import torch + + target_device = (device or "cpu").lower() + if target_device not in {"cpu", "mps", "cuda", "xpu"}: + target_device = "cpu" + rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) + tensor = torch.from_numpy(rgb).permute(2, 0, 1).float().div(255.0).unsqueeze(0) + + def _run(dev: str) -> NDArray[Any]: + model = _get_model(dev) + with torch.no_grad(): + out = model(tensor.to(dev)) + arr = out.clamp(0.0, 1.0).squeeze(0).permute(1, 2, 0).cpu().numpy() * 255.0 + return cv2.cvtColor(arr.round().astype(np.uint8), cv2.COLOR_RGB2BGR) + + try: + return _run(target_device) + except Exception as e: # GPU OOM / unsupported op: fall back to CPU + if target_device == "cpu": + raise + logger.warning("Real-ESRGAN on %s failed (%s); retrying on CPU", target_device, e) + return _run("cpu") diff --git a/tests/test_auto_config.py b/tests/test_auto_config.py index 3dadc10..09f2429 100644 --- a/tests/test_auto_config.py +++ b/tests/test_auto_config.py @@ -34,6 +34,26 @@ class TestDetectors: cv2.putText(text, "HELLO AI TEXT", (10, 120), cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 0, 0), 3) assert auto_config.edge_density(text) > auto_config.edge_density(blank) + def test_dbnet_detects_text_card(self): + """The bundled PP-OCRv3 DBNet model fires on a clear text card and not on flat.""" + card = np.full((300, 500, 3), 255, dtype=np.uint8) + cv2.putText(card, "INVOICE TOTAL 1234", (10, 170), cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 0, 0), 4) + assert auto_config._detect_text_dbnet(card) is True + assert auto_config._detect_text_dbnet(np.full((300, 
500, 3), 128, dtype=np.uint8)) is False + + def test_detect_text_falls_back_to_mser_when_dbnet_unavailable(self, monkeypatch): + """If DBNet can't load (returns None), detect_text uses the MSER heuristic.""" + monkeypatch.setattr(auto_config, "_detect_text_dbnet", lambda _img: None) + called = {} + + def _fake_mser(_img): + called["mser"] = True + return True + + monkeypatch.setattr(auto_config, "_detect_text_mser", _fake_mser) + assert auto_config.detect_text(np.full((100, 100, 3), 128, dtype=np.uint8)) is True + assert called.get("mser") is True + class TestPlan: def test_unreadable_returns_none(self, tmp_path): diff --git a/tests/test_cli.py b/tests/test_cli.py index de922e4..1eb0a4c 100644 --- a/tests/test_cli.py +++ b/tests/test_cli.py @@ -514,6 +514,45 @@ class TestBatchCommand: assert out[0, 0, 3] == 0 assert out[100, 100, 3] == 255 + def test_batch_auto_plans_pipeline_per_image(self, runner, tmp_path): + """--auto in batch re-plans the pipeline/restore/polish per image and + builds one engine per resolved pipeline.""" + from remove_ai_watermarks import auto_config + + input_dir = _make_batch_dir(tmp_path, count=2) + output_dir = tmp_path / "output" + plan = auto_config.AutoConfig( + pipeline="controlnet", + restore_faces=True, + adaptive_polish=True, + unsharp=0.0, + humanize=0.0, + min_resolution=1024, + has_face=True, + has_text=False, + edge_density=0.05, + width=200, + height=200, + ) + mock_cls, mock_engine = _mock_invisible_engine() + with ( + patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True), + patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls), + patch("remove_ai_watermarks.cli.invisible_available", return_value=True, create=True), + patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True), + patch("remove_ai_watermarks.auto_config.plan", return_value=plan), + ): + result = runner.invoke( + main, + ["batch", str(input_dir), "-o", str(output_dir), "--mode", "invisible", "--auto"], + ) 
+        assert result.exit_code == 0, result.output
+        assert "2 processed" in result.output
+        # Engine built with the auto-resolved controlnet pipeline.
+        assert mock_cls.call_args.kwargs["pipeline"] == "controlnet"
+        # The auto plan's adaptive polish reached the engine call.
+        assert mock_engine.remove_watermark.call_args.kwargs["adaptive_polish"] is True
+
     def test_batch_default_output_dir(self, runner, tmp_path):
         input_dir = _make_batch_dir(tmp_path)
         result = runner.invoke(
diff --git a/tests/test_invisible_engine.py b/tests/test_invisible_engine.py
index c7ad063..c46991d 100644
--- a/tests/test_invisible_engine.py
+++ b/tests/test_invisible_engine.py
@@ -101,3 +101,70 @@ class TestTargetSize:
         # min(1024) > max(800) is a misconfig: the floor must not upscale above the
         # cap, so it is skipped and the (within-cap) input stays native.
         assert _target_size(500, 400, 800, 1024) is None
+
+
+class TestEsrganUpscale:
+    """Branches of InvisibleEngine._esrgan_upscale (no diffusion model loaded).
+
+    A SimpleNamespace stands in for the engine so we exercise the helper without
+    constructing a real InvisibleEngine (which would load WatermarkRemover).
+    """
+
+    @staticmethod
+    def _fake_engine():
+        from types import SimpleNamespace
+
+        return SimpleNamespace(_remover=SimpleNamespace(device="cpu"))
+
+    @staticmethod
+    def _pil(w=120, h=80):
+        import numpy as np
+        from PIL import Image
+
+        return Image.fromarray(np.full((h, w, 3), 128, dtype=np.uint8))
+
+    def test_falls_back_to_lanczos_when_extra_absent(self, monkeypatch):
+        import numpy as np
+        from PIL import Image
+
+        from remove_ai_watermarks import upscaler
+
+        monkeypatch.setattr(upscaler, "is_available", lambda: False)
+        img = self._pil()
+        out = InvisibleEngine._esrgan_upscale(self._fake_engine(), img, (1024, 683))
+        assert out.size == (1024, 683)
+        # Identical to a plain Lanczos resize (the fallback path).
+        assert np.array_equal(np.asarray(out), np.asarray(img.resize((1024, 683), Image.Resampling.LANCZOS)))
+
+    def test_resizes_esrgan_output_to_exact_target(self, monkeypatch):
+        import cv2
+
+        from remove_ai_watermarks import upscaler
+
+        monkeypatch.setattr(upscaler, "is_available", lambda: True)
+
+        # Fake a 2x upscale that does NOT match the requested target; the helper must
+        # resize it to the exact target.
+        def _fake_upscale(bgr, device=None):
+            return cv2.resize(bgr, (bgr.shape[1] * 2, bgr.shape[0] * 2), interpolation=cv2.INTER_NEAREST)
+
+        monkeypatch.setattr(upscaler, "upscale", _fake_upscale)
+        out = InvisibleEngine._esrgan_upscale(self._fake_engine(), self._pil(), (1024, 683))
+        assert out.size == (1024, 683)
+
+    def test_falls_back_to_lanczos_when_upscale_raises(self, monkeypatch):
+        import numpy as np
+        from PIL import Image
+
+        from remove_ai_watermarks import upscaler
+
+        monkeypatch.setattr(upscaler, "is_available", lambda: True)
+
+        def _boom(bgr, device=None):
+            raise RuntimeError("model exploded")
+
+        monkeypatch.setattr(upscaler, "upscale", _boom)
+        img = self._pil()
+        out = InvisibleEngine._esrgan_upscale(self._fake_engine(), img, (512, 341))
+        assert out.size == (512, 341)
+        assert np.array_equal(np.asarray(out), np.asarray(img.resize((512, 341), Image.Resampling.LANCZOS)))
diff --git a/tests/test_upscaler.py b/tests/test_upscaler.py
new file mode 100644
index 0000000..e504c44
--- /dev/null
+++ b/tests/test_upscaler.py
@@ -0,0 +1,32 @@
+"""Tests for the optional Real-ESRGAN upscaler (no model download).
+
+The model-running path is exercised manually (it downloads ~67 MB of BSD-3-Clause
+weights on first use); these tests cover the availability guard and the no-model
+control flow, mirroring the repo convention for ML-adjacent modules.
+"""
+
+from __future__ import annotations
+
+import numpy as np
+import pytest
+
+from remove_ai_watermarks import upscaler
+
+
+class TestIsAvailable:
+    def test_returns_bool(self):
+        assert isinstance(upscaler.is_available(), bool)
+
+
+class TestUpscaleGuard:
+    def test_raises_without_extra(self, monkeypatch):
+        monkeypatch.setattr(upscaler, "is_available", lambda: False)
+        with pytest.raises(RuntimeError, match="esrgan"):
+            upscaler.upscale(np.full((32, 32, 3), 128, dtype=np.uint8))
+
+
+class TestModelCachePath:
+    def test_cache_path_uses_model_filename(self):
+        if not upscaler.is_available():
+            pytest.skip("esrgan extra (torch) not installed")
+        assert upscaler._model_cache_path().name == upscaler._MODEL_FILENAME
diff --git a/uv.lock b/uv.lock
index 345075b..5dc83a1 100644
--- a/uv.lock
+++ b/uv.lock
@@ -3075,6 +3075,9 @@ dev = [
     { name = "pytest-cov" },
     { name = "ruff" },
 ]
+esrgan = [
+    { name = "spandrel" },
+]
 gpu = [
     { name = "accelerate" },
     { name = "diffusers" },
@@ -3125,12 +3128,13 @@ requires-dist = [
     { name = "ruff", marker = "extra == 'dev'", specifier = ">=0.4.0" },
     { name = "safetensors", marker = "extra == 'gpu'" },
     { name = "scipy", marker = "extra == 'restore'", specifier = "<1.18" },
+    { name = "spandrel", marker = "extra == 'esrgan'", specifier = ">=0.3.0" },
     { name = "tokenizers", marker = "extra == 'gpu'", specifier = ">=0.22,<0.23" },
     { name = "torch", marker = "extra == 'gpu'", specifier = ">=2.0.0" },
     { name = "transformers", marker = "extra == 'gpu'", specifier = ">=5,<6" },
     { name = "trustmark", marker = "extra == 'trustmark'", specifier = ">=0.8.0" },
 ]
-provides-extras = ["gpu", "detect", "trustmark", "lama", "restore", "dev", "all"]
+provides-extras = ["gpu", "detect", "trustmark", "lama", "restore", "esrgan", "dev", "all"]
 
 [[package]]
 name = "requests"
@@ -3494,6 +3498,23 @@ wheels = [
     { url = 
"https://files.pythonhosted.org/packages/b7/ce/149a00dd41f10bc29e5921b496af8b574d8413afcd5e30dfa0ed46c2cc5e/six-1.17.0-py2.py3-none-any.whl", hash = "sha256:4721f391ed90541fddacab5acf947aa0d3dc7d27b2e1e8eda2be8970586c3274", size = 11050, upload-time = "2024-12-04T17:35:26.475Z" },
 ]
 
+[[package]]
+name = "spandrel"
+version = "0.4.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "einops" },
+    { name = "numpy" },
+    { name = "safetensors" },
+    { name = "torch" },
+    { name = "torchvision" },
+    { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/2a/8f/ab4565c23dd67a036ab72101a830cebd7ca026b2fddf5771bbf6284f6228/spandrel-0.4.2.tar.gz", hash = "sha256:fefa4ea966c6a5b7721dcf24f3e2062a5a96a395c8bedcb570fb55971fdcbccb", size = 247544, upload-time = "2026-02-21T01:52:26.342Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/74/31/411ea965835534c43d4b98d451968354876e0e867ea1fd42669e4cca0732/spandrel-0.4.2-py3-none-any.whl", hash = "sha256:6c93e3ecbeb0e548fd2df45a605472b34c1614287c56b51bb33cdef7ae5235b5", size = 320811, upload-time = "2026-02-21T01:52:25.015Z" },
+]
+
 [[package]]
 name = "sympy"
 version = "1.14.0"