diff --git a/CLAUDE.md b/CLAUDE.md
index a9be3f5..9c83850 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -4,13 +4,14 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r

 ## How to run

-- `uv run remove-ai-watermarks all -o `
+- `uv run remove-ai-watermarks all -o ` — full pipeline (visible + invisible + metadata). Same diffusion knobs as `invisible` below, plus the visible-pass `--inpaint/--no-inpaint`/`--inpaint-method`.
+- `uv run remove-ai-watermarks invisible -o ` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet` (default `controlnet`), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), and `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default).
 - `uv run remove-ai-watermarks visible -o ` — known-visible-mark removal, CPU, no GPU. Reverse-alpha based: each mark is removed by inverting its captured alpha map. `--mark auto` (default) picks the strongest detected of the Gemini sparkle, the Doubao "豆包AI生成" text strip, the Jimeng "★ 即梦AI" wordmark, and the Samsung Galaxy AI "✦ Contenuti generati dall'AI" strip (bottom-LEFT, locale-specific — Italian variant calibrated); `--mark gemini` / `--mark doubao` / `--mark jimeng` / `--mark samsung` force one (choices come from the registry). Gemini/Doubao recover pixels exactly with no inpaint at native; **Jimeng and Samsung add an always-on thin residual inpaint over the glyph footprint** (their marks re-rasterize per image, so reverse-alpha alone leaves a faint outline). For arbitrary logos/objects use `erase`.
 - `uv run remove-ai-watermarks erase --region x,y,w,h -o ` — universal region eraser (any logo/object, any position). `--backend cv2` (default, no deps) or `--backend lama` (big-LaMa via onnxruntime, extra `lama`); `--region` is repeatable.
 - `uv run remove-ai-watermarks identify ` — provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector
 - `uv run remove-ai-watermarks metadata --check` — inspect AI metadata (C2PA, EXIF, PNG chunks)
 - `uv run remove-ai-watermarks metadata --remove -o ` — strip all AI metadata
-- `uv run remove-ai-watermarks batch ` — process every supported image in a directory (output defaults to `_clean/`, set with `-o`). `--mode visible|invisible|metadata|all` (default `visible`); the invisible/all path reuses the same `--strength`/`--steps`/`--pipeline`/`--controlnet-scale`/`--device`/`--max-resolution`/`--min-resolution`/`--upscaler`/`--seed`/`--hf-token` knobs as `invisible`, `--inpaint/--no-inpaint` for the visible pass, `--humanize` for the Analog Humanizer + `--unsharp` for the final sharpening post-filter, and `--auto` (+ `--adaptive-polish/--no-adaptive-polish`) for the content-adaptive quality mode (re-planned per image; one engine cached per resolved pipeline)
+- `uv run remove-ai-watermarks batch ` — process every supported image in a directory (output defaults to `_clean/`, set with `-o`). `--mode visible|invisible|metadata|all` (default `visible`); the invisible/all path reuses the **full `invisible` knob set above** (`--strength`/`--steps`/`--guidance-scale`/`--pipeline`/`--controlnet-scale`/`--model`/`--device`/`--max-resolution`/`--min-resolution`/`--upscaler`/`--seed`/`--hf-token`/`--humanize`/`--unsharp`/`--adaptive-polish`), plus `--inpaint/--no-inpaint` for the visible pass. `--adaptive-polish` is ON by default; `--auto` is deprecated and a no-op that only warns. One engine cached per pipeline; the polish is resolved once before the loop.

 ## Test and lint

@@ -28,7 +29,7 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r
 - GPU/ML modules (invisible_engine, watermark_remover) are optional — guard imports with `is_available()` checks
 - Optional detection extras: `detect` (imwatermark — open SD/SDXL/FLUX watermark) and `trustmark` (Adobe TrustMark decoder; pulls torch + downloads weights). Both are guarded by `is_available()` and skipped by `identify` when absent.
 - Optional `esrgan` extra (spandrel only): Real-ESRGAN pre-diffusion super-resolution for small inputs (`upscaler.py`, CLI `--upscaler esrgan` on `invisible`/`all`/`batch`). Guarded by `upscaler.is_available()`; the default upscaler stays Lanczos (cv2, no deps) and the engine falls back to Lanczos when the extra is absent or the model errors. spandrel is MIT and pulls NO basicsr (only torch/torchvision/safetensors/numpy/einops); Real-ESRGAN weights are BSD-3-Clause and download on first use via `torch.hub` (never bundled). Kept OUT of `all` (heavy + model download).
-- Tests for the *model-running* paths are limited to availability checks (multi-GB downloads). But the **pure helpers inside ML-adjacent modules are unit-tested without any download** and must stay that way: `_target_size` (native-vs-downscale-cap-vs-upscale-floor, `test_invisible_engine.py`), `humanizer.unsharp_mask`/`adaptive_polish` (`test_humanizer.py`), `auto_config.plan`/detectors (`test_auto_config.py`), and the MPS->CPU fallback control flow via mocked pipelines (`test_img2img_runner.py`, 100% cover). Don't skip these as "ML, needs a model" — only `remove_watermark`/the diffusion bodies do.
+- Tests for the *model-running* paths are limited to availability checks (multi-GB downloads). But the **pure helpers inside ML-adjacent modules are unit-tested without any download** and must stay that way: `_target_size` (native-vs-downscale-cap-vs-upscale-floor, `test_invisible_engine.py`), `humanizer.unsharp_mask`/`adaptive_polish` (`test_humanizer.py`), and the MPS->CPU fallback control flow via mocked pipelines (`test_img2img_runner.py`, 100% cover). Don't skip these as "ML, needs a model" — only `remove_watermark`/the diffusion bodies do.

 ## Key modules

@@ -44,8 +45,8 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r
 - `region_eraser.py` — universal region eraser (`erase` CLI). `erase(image, boxes=|mask=, backend=)` accepts grayscale (2D) and RGBA (4-channel) inputs on **both** backends (`erase_cv2` and `erase_lama` each split off any alpha plane and re-attach it unchanged, and promote grayscale to BGR for processing — LaMa would otherwise crash on grayscale and drop alpha on BGRA): `boxes_to_mask` → `cv2.inpaint` (`cv2` backend, default, no deps) or big-LaMa via onnxruntime (`lama` backend, extra `lama`, `Carve/LaMa-ONNX` Apache-2.0 model downloaded on first use, never bundled). `erase_lama` crops a padded region around the mask, runs LaMa at its fixed 512² input, pastes only masked pixels back (untouched areas stay pixel-exact). Lazy `_get_lama_session` singleton; `lama_available()` guards the optional import.
**LaMa-ONNX costs ~3.5-4 GB peak RAM and ~5-6 s/call on CPU** (FFC working set, not arena — `enable_cpu_mem_arena=False` does not help), so it does NOT fit a minimal droplet; the cv2 backend (tens of MB, ~30 ms) does. LaMa quality at low RAM = serverless/GPU, mirroring how raiw.cc offloads SDXL to fal. - `invisible_watermark.py` — `detect_invisible_watermark(path)` decodes the OPEN DWT-DCT watermarks (public decoder, no key) embedded by Stable Diffusion / SDXL / FLUX via the `imwatermark` library. Known fixed patterns (verified against upstream source) live in `_BITS_48` (SDXL 48-bit, FLUX.2 48-bit) and `_SD1_STRING` ("StableDiffusionV1", SD 1.x/2.x). Optional dep (extra `detect`); returns None when absent. The `detect` extra pulls **torch** transitively (invisible-watermark declares torch a hard dep, and `WatermarkDecoder` eagerly imports `rivaGan` -> `torch` at import time), so detection needs torch present even though dwtDct runs CPU-only on cv2/numpy/pywavelets — no GPU and no separate `gpu` extra required. **Unlike SynthID this is locally detectable**, but the watermark is fragile (does not survive JPEG re-encode/resize — verified gone after JPEG q90), so it confirms origin only on pristine files. Add new known patterns here. The file carries a top-of-module pyright pragma because imwatermark/cv2 ship no type stubs. - `trustmark_detector.py` — `detect_trustmark(path)` decodes the OPEN, keyless **Adobe TrustMark** watermark (the soft binding behind Adobe Durable Content Credentials, `alg` `com.adobe.trustmark.P`) via the optional `trustmark` package (extra `trustmark`; pulls torch, downloads model weights on first use). Mirrors `invisible_watermark.py` (lazy singleton guarded by a double-checked `threading.Lock` so concurrent callers do not double-download the weights, top-of-module pyright pragma, returns None when absent). 
It detects *provenance*, not AI origin as such (TrustMark also marks human-authored content), so `identify` lists it as a watermark without setting `is_ai_generated`. Other soft-binding vendors (Digimarc/Imatag/Steg.AI/...) have no public decoder — they are only *named* via the `C2PA_SOFT_BINDINGS` scan, not decoded. **False-positive gate (added 2026-05-29):** TrustMark's `wm_present` is a BCH error-correction validity flag that spuriously validates on a content-correlated fraction of un-watermarked images — AI-generated textures trip it far more than camera photos (verified 2026-05-29 on real files: it fires on Gemini/OpenAI/Doubao output that *cannot* carry Adobe's watermark, with a random-bytes decoded secret, while signal-free camera photos did not trip it). A genuine TrustMark is a *durable* soft binding engineered to survive re-encoding, so `detect_trustmark` re-decodes after a mild JPEG round-trip (`_survives_reencode`, `_REENCODE_QUALITY` 95) and requires the same schema both times; every observed false positive collapsed (none survived even q95), so the gate is the durability property the watermark guarantees. The second decode runs only on the rare initial hit, so the cost is negligible. Do NOT remove the gate to "catch more" — a lone TrustMark hit without it is almost always content noise. -- `noai/watermark_remover.py` — the `WatermarkRemover` class has two diffusion pipelines, selected by the explicit `pipeline` ctor arg (NOT inferred from `model_id` -- both use the same SDXL base, `DEFAULT_MODEL_ID`). **`default`** runs plain SDXL img2img (`_run_img2img`). 
**`controlnet`** (**EXPERIMENTAL, opt-in**; `_run_controlnet`, `_load_controlnet_pipeline`) runs `StableDiffusionXLControlNetImg2ImgPipeline` with the SDXL-native canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` (`watermark_profiles.CONTROLNET_CANNY_MODEL`): the control image is `cv2.Canny(gray, 100, 200)` stacked to 3 channels (`_CANNY_LOW`/`_CANNY_HIGH`, prompt `_CONTROLNET_PROMPT` / `_CONTROLNET_NEGATIVE`). **Removal comes from the img2img regeneration (`strength`); the ControlNet only PRESERVES text and face STRUCTURE via the edge map.** No original pixels are copied or frozen, BUT **validation 2026-06-04 disproved the old "so SynthID does not survive" claim: SynthID CAN survive controlnet on photoreal/high-detail content.** At the shared low removal strength the canny edge-conditioning keeps the regeneration so close to the original that the pixel perturbation that destroys SynthID does not happen (oracle-confirmed: an OpenAI bracelet photo + a 9-face grid read **SynthID-detected** after controlnet at strength 0.10/0.15, but **SynthID-not-detected** after the `default` pipeline at the SAME strength + resolution -- only the pipeline differed). **But the reverse also holds: a flat-graphic logo/poster SURVIVED `default` while clearing controlnet** -- removal at the low strength is content×pipeline dependent and neither pipeline is universally safe; the real lever is a higher strength. See the controlnet Known-limitations bullet for the full table + root cause. Canny holds face STRUCTURE but NOT identity (the regenerated face drifts in likeness -- canny carries edges, not identity). The drifted cleaned face is the LEAST-AI state we can reach without re-introducing SynthID; the library does NOT ship a face-restore extra. 
Every restore approach we evaluated (GFPGAN-on-cleaned, PhotoMaker-V2 txt2img, InstantID txt2img, InstantID img2img-on-cleaned at three parameter sweeps, 2026-06-04 - 2026-06-08 Modal cert sweeps) regenerated the face from an ArcFace embedding via SDXL diffusion -- which makes the output face look MORE AI-generated, not less. Empirical conclusion in `docs/synthid-robust-identity-research-2026-06-08.md` "Empirical follow-up". For production face preservation, ship the cleaned image as-is. `controlnet_conditioning_scale` (ctor arg, default 1.0) is the structure-preservation knob. Same dtype rule as `default` (fp32 on cpu/mps, fp16 only on cuda/xpu; the fp16-fixed SDXL VAE `_SDXL_FP16_VAE_ID` is swapped in on fp16 GPUs -- issue #29) and the same MPS->CPU fallback (reload on cpu/fp32, drop a non-cpu generator, retry once). -- `auto_config.py` — the `--auto` quality-mode planner (EXPERIMENTAL). `plan(image_path) -> AutoConfig | None` inspects the INPUT image (before the diffusion model loads) and picks the pipeline modes, so the run adapts to content. **Designed to run as the FIRST step of the invisible/all pipeline, wherever that runs** — locally or the raiw.cc Modal GPU worker — **never on the 512 MB web host** (image work there OOM-crashes the container; the planner is `_apply_auto` in `cli.py` for the CLI, and raiw-app would call `plan()` inside `RaiwProtect.remove`). **Quality-priority routing:** ControlNet (text/face-structure preservation) is the default; it is skipped for `default` (plain SDXL) only on a clearly structure-less image (`not has_face and not has_text and edge_density < _STRUCTURELESS_EDGE_MAX` 0.008). **CAVEAT (oracle-validated 2026-06-04, see the controlnet Known-limitations bullet): at the low vendor-adaptive strength NEITHER pipeline removes SynthID on all content -- it is content×pipeline dependent (photoreal SURVIVES controlnet / clears default; flat graphics SURVIVE default / clear controlnet; flat text clears both). 
So `--auto` picking controlnet for faces/photos leaves SynthID on exactly those, and plain `default` would leave it on flat graphics -- pipeline choice alone does NOT guarantee removal. The real lever is a HIGHER strength, oracle-validated per content type. Removal-priority callers (raiw.cc) must oracle-validate strength across content types BEFORE adopting auto; the "must keep SynthID removed" gate in the adoption note below is the blocker this caught.** When the controlnet smoothing pipeline ran, the **adaptive polish** (`humanizer.adaptive_polish`) is applied: it targets the input's Laplacian variance (detail level) with a capped unsharp + edge-masked grain, restoring photo/face texture while **sparing text** (text is already high-frequency, so the deficit is tiny and almost no polish lands -- the old fixed unsharp/grain speckled small text; validated 2026-06-03 on gemini_3 lap-var 84->334 toward the 592 original, openai_1 text near-untouched). **Detection is cv2-only and torch-free** (~100 MB peak RSS, a few ms — measured): OpenCV **YuNet** (`cv2.FaceDetectorYN`, MIT, 232 KB model bundled at `assets/face_detection_yunet_2023mar.onnx`) for faces, **DBNet** (PP-OCRv3 differentiable-binarization via `cv2.dnn.TextDetectionModel_DB`, a 2.4 MB Apache-2.0 model bundled at `assets/text_detection_ppocrv3_2023may.onnx`) for text, with the old Canny+MSER region heuristic kept as a fallback if the DBNet model can't load (`_detect_text_dbnet` returns None → `_detect_text_mser`). The en/cn opencv_zoo PP-OCRv3 detection models are byte-identical, so it is bundled language-neutral. Text only ever ADDS controlnet, so a miss is backstopped by edge-density and a false positive only costs a controlnet run. Plus `edge_density`. `min_resolution` stays 1024. **Every auto decision is independently overridable** (interface principle): `_apply_auto` (cli.py) overrides only the content-adaptive modes the user left at their click default (`ctx.get_parameter_source(...) 
== DEFAULT`) — `--pipeline` and **`--adaptive-polish`/`--no-adaptive-polish`** always win; `--min-resolution`/`--strength`/`--unsharp`/`--humanize` are independent knobs. `--adaptive-polish` also works WITHOUT `--auto` (manual detail-targeted polish; the engine's `adaptive_polish` param uses the full-res original as the detail reference). Prints the chosen plan (`AutoConfig.reason`). Wired into `cmd_all`/`cmd_invisible`/`cmd_batch` — in `batch` the plan is recomputed per image and the invisible engine is cached **per resolved pipeline** (`ctx.obj["_inv_engines"]`, keyed `default`/`controlnet`) instead of a single shared instance, so a mixed directory builds at most one engine of each kind. **Adds ZERO new pip deps** (all cv2 core + the bundled MIT YuNet + Apache-2.0 DBNet models + the cv2-only adaptive polish). The auto plan does NOT select the `esrgan` upscaler (that needs the optional extra and would make auto's behavior install-dependent); `--upscaler esrgan` stays a separate manual knob. Unit-tested without a heavy download (`tests/test_auto_config.py`): flat/text synthetic images for routing (the bundled DBNet fires on a real text card), monkeypatched `detect_face`/`_detect_text_dbnet`/`_detect_text_mser` for the face/text/fallback branches (a real detectable-face fixture is private, never committed). Production adoption path for raiw.cc: validate (must keep SynthID removed, not hallucinate micro-text, beat plain SDXL on the real upload distribution), then bump the library SHA in `modal_app.py` and pass `auto=True`. +- `noai/watermark_remover.py` — the `WatermarkRemover` class has two diffusion pipelines, selected by the explicit `pipeline` ctor arg (NOT inferred from `model_id` -- both use the same SDXL base, `DEFAULT_MODEL_ID`). **`sdxl`** (renamed from `default` 2026-06-09; `default` kept as a back-compat alias via `normalize_profile`) runs plain SDXL img2img (`_run_img2img`); it is the lighter opt-down alternative (no ControlNet weights). 
**`controlnet`** (**the DEFAULT pipeline since 2026-06-09** for `invisible`/`all`/`batch` and both engine ctors; `_run_controlnet`, `_load_controlnet_pipeline`) runs `StableDiffusionXLControlNetImg2ImgPipeline` with the SDXL-native canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` (`watermark_profiles.CONTROLNET_CANNY_MODEL`): the control image is `cv2.Canny(gray, 100, 200)` stacked to 3 channels (`_CANNY_LOW`/`_CANNY_HIGH`, prompt `_CONTROLNET_PROMPT` / `_CONTROLNET_NEGATIVE`). **Removal comes from the img2img regeneration (`strength`); the ControlNet only PRESERVES text and face STRUCTURE via the edge map.** No original pixels are copied or frozen, BUT **validation 2026-06-04 disproved the old "so SynthID does not survive" claim: SynthID CAN survive controlnet on photoreal/high-detail content.** At the shared low removal strength the canny edge-conditioning keeps the regeneration so close to the original that the pixel perturbation that destroys SynthID does not happen (oracle-confirmed: an OpenAI bracelet photo + a 9-face grid read **SynthID-detected** after controlnet at strength 0.10/0.15, but **SynthID-not-detected** after the `default` pipeline at the SAME strength + resolution -- only the pipeline differed). **But the reverse also holds: a flat-graphic logo/poster SURVIVED `default` while clearing controlnet** -- removal at the low strength is content×pipeline dependent and neither pipeline is universally safe; the real lever is a higher strength. See the controlnet Known-limitations bullet for the full table + root cause. Canny holds face STRUCTURE but NOT identity (the regenerated face drifts in likeness -- canny carries edges, not identity). The drifted cleaned face is the LEAST-AI state we can reach without re-introducing SynthID; the library does NOT ship a face-restore extra. 
Every restore approach we evaluated (GFPGAN-on-cleaned, PhotoMaker-V2 txt2img, InstantID txt2img, InstantID img2img-on-cleaned at three parameter sweeps, 2026-06-04 - 2026-06-08 Modal cert sweeps) regenerated the face from an ArcFace embedding via SDXL diffusion -- which makes the output face look MORE AI-generated, not less. Empirical conclusion in `docs/synthid-robust-identity-research-2026-06-08.md` "Empirical follow-up". For production face preservation, ship the cleaned image as-is. `controlnet_conditioning_scale` (ctor arg, default 1.0) is the structure-preservation knob. Same dtype rule as `default` (fp32 on cpu/mps, fp16 only on cuda/xpu; the fp16-fixed SDXL VAE `_SDXL_FP16_VAE_ID` is swapped in on fp16 GPUs -- issue #29) and the same MPS->CPU fallback (reload on cpu/fp32, drop a non-cpu generator, retry once). +- **`auto_config.py` + the content-detection layer were REMOVED 2026-06-09.** History: `auto_config.plan()` was a content-adaptive planner that detected faces/text/edges (bundled OpenCV YuNet + PP-OCRv3 DBNet models) to route the pipeline and toggle the adaptive polish. Once `controlnet` became the default-and-only auto pipeline (it no longer downgrades a structure-less image to `sdxl`) and the adaptive polish was confirmed to **self-gate by detail level** (`humanizer.adaptive_polish` no-ops when the cleaned image already meets the input's Laplacian variance, so it does real work only on over-smoothed photo/face texture and ~nothing on text/flat), the detection no longer changed any behavior — it only annotated a `reason` string. So the whole layer was deleted: `auto_config.py`, `tests/test_auto_config.py`, and the two detection assets (`assets/face_detection_yunet_2023mar.onnx`, `assets/text_detection_ppocrv3_2023may.onnx`, ~2.6 MB). 
**`--auto` is now a DEPRECATED no-op** (`cli._resolve_auto_polish`): controlnet is already the default pipeline AND the adaptive polish is ON by default, so `--auto` has nothing left to do — it only prints a deprecation warning and passes `adaptive_polish` through unchanged (an explicit `--no-adaptive-polish` still wins). (Originally it re-enabled the polish; once the polish default flipped to ON the same day, the parameter-source branch became dead and was dropped.) The **adaptive polish itself lives on** in `humanizer.adaptive_polish` (CLI `--adaptive-polish/--no-adaptive-polish`, **ON by default since 2026-06-09** — it self-gates to a no-op where there is no detail deficit, so default-on is safe; uses the full-res original as the detail reference) — see the `humanizer` test note. `batch` resolves the polish once before the loop (one warning) and caches the invisible engine per pipeline (`ctx.obj["_inv_engines"]`). - `upscaler.py` — optional Real-ESRGAN pre-diffusion super-resolution for small inputs (spandrel boundary, top-of-file pyright pragma). `is_available()` gates on spandrel+torch (via `importlib.util.find_spec`); `upscale(bgr, device=None)` loads a lazily-built spandrel `ImageModelDescriptor` singleton (double-checked lock) and upscales by the model's native factor (x2), with a non-CPU→CPU device fallback mirroring the diffusion engine's MPS→CPU retry. Weights (`RealESRGAN_x2plus.pth`, BSD-3-Clause) download on first use to the `torch.hub` checkpoints cache; never bundled. Used only when UPscaling to the `min_resolution` floor (a `max_resolution` downscale always uses Lanczos). The wiring is `InvisibleEngine._esrgan_upscale(pil, target)` — Real-ESRGAN at native factor, then a Lanczos resize to the exact target, falling back to a plain Lanczos resize if the extra is absent or the model errors (so an optional upscaler can never break removal). The default `--upscaler` is `lanczos` (cv2, no deps). 
**ESRGAN is a generic photo/texture GAN with no face/glyph prior**, so it best fits photo/texture content and can degrade faces (glassy/asymmetric eyes -- the diffusion pass regenerates faces so the full-pipeline final recovers) and thin/small text (the GAN invents wrong strokes, and low-strength diffusion will not fix it). Verified 2026-06-04: isolated upscale lap-var ~5x Lanczos on faces+textures but glassy eyes; end-to-end `invisible` final lap-var 1634 vs Lanczos 663 with natural faces (diffusion cleaned the artifact). Kept a **manual opt-in knob** (the auto plan never selects it) with `lanczos` the default; not content-gated by design (use Lanczos for text-heavy inputs). spandrel is MIT and pulls no basicsr. Unit-tested without the model: `tests/test_upscaler.py` (availability guard + the not-installed RuntimeError) and `tests/test_invisible_engine.py::TestEsrganUpscale` (the three `_esrgan_upscale` branches via a monkeypatched `upscaler`). - `image_io.py` — Unicode-safe cv2 IO (issue #17). `imread(path, flags=None)` / `imwrite(path, img)` wrap `np.fromfile`+`cv2.imdecode` / `cv2.imencode`+`tofile` so non-ASCII paths work on Windows -- bare `cv2.imread`/`cv2.imwrite` use the platform ANSI code-page API there and fail (empty decode + `can't open/read file`) on Chinese/Cyrillic/accented filenames. `imread` keeps `cv2.imread` semantics (defaults to `IMREAD_COLOR`, returns `None` on missing/empty/undecodable). **Every cv2 file read/write in the package routes through here; do not call `cv2.imread`/`cv2.imwrite` directly.** `imwrite` returns `False` on an unwritable path (`OSError` caught) instead of raising, matching `cv2.imwrite` semantics. macOS/Linux already accept UTF-8 paths, so it is behavior-neutral there (the bug only reproduces on Windows). cv2/numpy are imported lazily inside the functions, so the module is cheap to import in a bare env. 
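The module notes above mention lazily built singletons guarded by a double-checked lock (for `trustmark_detector.py` and `upscaler.py`), so that concurrent callers do not load or download the weights twice. The real module internals are not shown in this diff, so the following is only a minimal sketch of that general pattern. The names `get_model`, `_model`, `_model_lock`, and the injected `loader` callable are hypothetical and are not the project's API.

```python
import threading
from typing import Callable, Optional

# Hypothetical module-level cache; stands in for a loaded decoder or upscaler.
_model: Optional[object] = None
_model_lock = threading.Lock()


def get_model(loader: Callable[[], object]) -> object:
    """Return the shared model, calling `loader` at most once across threads."""
    global _model
    if _model is None:  # fast path: no lock once the model exists
        with _model_lock:
            if _model is None:  # re-check: another thread may have built it
                _model = loader()
    return _model
```

The second check inside the lock is what prevents a duplicate load when two threads both pass the first check before either has finished building the model.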
@@ -88,6 +89,6 @@ Who embeds what, and whether it is locally detectable (so we know which gaps are - **SynthID detection is metadata-only.** There is no reliable *local* detector of the SynthID *pixel* watermark — Google's decoder is proprietary, no public spec or API (only a waitlisted portal). Authoritative confirmation: Google DeepMind's own paper "SynthID-Image: Image watermarking at internet scale" (Gowal et al., arXiv:2510.09263) states the verification service is restricted to "trusted testers" and does not release detector weights or a reproducible algorithm — so a local pixel detector is infeasible by design, not just unbuilt. https://arxiv.org/abs/2510.09263 We detect SynthID by its C2PA companion (`synthid_source` / `SYNTHID_C2PA_ISSUERS`), which is reliable while the manifest is intact but says nothing once C2PA is stripped. **Surface-dependent blind spot (verified 2026-05-24):** the same Google model emits different metadata per surface -- the Gemini *app* wraps outputs in Google C2PA, but the *API/playground* (AI Studio, Nano Banana / gemini-2.5-flash-image) emits the SynthID *pixel* watermark (confirmed via the Gemini-app oracle) + the visible sparkle but **no C2PA/IPTC at all**, so `synthid_source` returns None despite SynthID being present. Only the pixel oracle or the visible-sparkle detector catches those. (Meta AI is another surface mismatch: it writes the IPTC `digitalSourceType=trainedAlgorithmicMedia` marker, not C2PA and not SynthID.) Google→SynthID is long-standing; OpenAI→SynthID is confirmed by OpenAI's Help Center (ChatGPT/Codex/API "include both C2PA metadata and SynthID watermarks", updated 2026-05-21) but time-gated (pre-rollout OpenAI images carry C2PA without SynthID), so the OpenAI verdict is hedged "likely". Oracles: Gemini app "Verify with SynthID" (Google), openai.com/verify (OpenAI). 
**Each vendor's oracle detects only its OWN content (verified on the page 2026-05-31):** `openai.com/research/verify` states verbatim "OpenAI generation signals will only be detected if the image was generated with our tools" and "Content could also still be AI-generated by another company's model, which the tool currently does not detect" -- SynthID is shared tech but the verifier is keyed to its own vendor's payload, so a Google-SynthID image reads clean on OpenAI's verifier and vice-versa. **This explains the recurring "oracle says clean but `identify` still flags SynthID" report (#14):** the oracle reads the *pixel* watermark (gone after our SDXL pass), while `identify` reads the *C2PA-metadata proxy* (still present if the manifest survived). Different signals, not a contradiction -- strip the metadata too (`metadata --remove` / `all`) and the proxy goes quiet, but a quiet proxy is not proof the pixel watermark is gone. **SynthID is durable to JPEG re-encode by design, so a GitHub-recompressed issue attachment is still a valid SynthID test subject** (verified 2026-06-01 on issue #14's pic3: the GitHub-served JPEG survived re-encoding and openai.com/verify still detected SynthID). Do NOT dismiss issue-attachment JPEGs as "not faithful originals" when reproducing a SynthID-survival report: the recompression strips the **C2PA metadata** (so `identify` reads Unknown on the attachment) but NOT the **pixel watermark** that openai.com/verify reads. A true byte-original only matters for the metadata/C2PA path, not for the pixel-SynthID-removal test. (Contrast the open imwatermark above, which IS fragile to JPEG.) 
The spectral phase-coherence approach from `github.com/aloshdenny/reverse-SynthID` was evaluated (May 2026) and **does not work for real-content detection**: on its own shipped codebook + validation set, watermarked and cleaned images were indistinguishable (conf within noise, cleaned often higher); it only fires on pure-black 1024x1024 reference images at exact resolution (the controlled case it was calibrated on). The README's "90% / conf=0.91" reproduces only in that lab condition. Do not build a production detector on it; if revisited, it is experimental/diagnostic only and needs a per-resolution, per-model reference corpus. A from-scratch gpt-image pilot (2026-05-24) confirmed this independently: 5 independent solid-black gpt-image outputs share a near-identical fixed signature (pairwise residual correlation **0.92**, avg-template retains 97% energy), so the watermark/carrier IS strongly present and consistent on flat content — but the carrier frequencies extracted from it do NOT discriminate real content (carrier-to-random ratio: cleaned 1.86 > watermarked 1.53; a non-gpt-image image scored highest at 3.67). The signature drowns in content texture. Net: a perfectly consistent solid-color signature still yields no real-content pixel detector with magnitude/carrier methods. A corpus discrimination test (2026-05-24, `scripts/synthid_pixel_probe.py`, raw zero-mean residual NCC) independently re-confirms this: at matched resolution, SynthID positives do NOT cluster apart from negatives (within-Gemini 0.07; at 1024 px pos-vs-neg >= pos-vs-pos). The only high correlations were near-duplicate *content* (5 ChatGPT renders of one prompt at ~0.92, while a distinct ChatGPT image scored ~0 against them) — content, not a carrier. The probe is solid-fills-only and EXPERIMENTAL/DIAGNOSTIC; do not use it on real content. 
**Correction (deeper re-examination 2026-05-25):** the carrier IS real on solid fills — the earlier "no carrier" was a *method* artifact of using spatial / FFT-magnitude NCC, which can't see it. The carrier is a fixed *phase* at specific low frequencies, so the right metric is **per-bin phase coherence**. On 8 white `gemini-2.5-flash-image` fills (generated via the reverse-SynthID trick: identity-edit prompt "Recreate this image exactly as it is" on a synthetic pure-white PNG — this bypasses the recitation block that rejects text prompts for pure colors), phase coherence at the white carriers `(0,±7..±12,±20..±23)` = **0.86** vs **0.31** random; single-image leave-one-out phase-match **+0.83** vs real photos **-0.24**. (Black `2.5-flash` fills clip to std≈0 — SynthID can't push values below 0, so no carrier in black; the repo's dark carriers come from nano-banana-pro.) **But it does not generalize:** (a) carriers are model-version + resolution + color specific — the repo's v4 codebook (built for `gemini-3.1-flash-image-preview` + `nano-banana-pro-preview`) scores ~0.527 on my 2.5-flash white fills, indistinguishable from negatives (~0.50), i.e. carriers shift across model versions and need a per-model codebook; (b) on real content (30 `2.5-flash` images) the carrier collapses — set phase coherence at carriers 0.37 ≈ random 0.42, and the repo's v4 detector gives content 0.518 ≈ negatives 0.504 (no separation; a faint +0.24 single-image lean is likely a brightness confound). Net: the spectral/phase approach is a real *controlled-fill* characterizer, NOT an arbitrary-real-content detector, and is brittle to model version. Metadata proxy + visible sparkle + online oracles remain the ceiling for real content. 
- **External AI-vs-real classifier models are out of scope (decided 2026-05-24).** Generic HuggingFace detectors (`Organika/sdxl-detector` Swin Transformer, `umm-maybe/AI-image-detector`, and fine-tunes) exist and report ~0.98 on their *own* SDXL-vs-real validation sets, but they are per-generator and the model cards themselves note degraded accuracy off-distribution; they are untested on gpt-image / Gemini Nano Banana (the metadata-stripped surfaces we care about), and our own light SDXL pass would likely defeat them the same way it defeats SynthID. Detection here stays local + signal-based (metadata + visible sparkle); do not add a bundled classifier dependency. -- **DEFAULT STRENGTH IS NOW VENDOR-ADAPTIVE (2026-06-01, SUPERSEDES every fixed-default claim in this bullet and the next).** `resolve_strength(strength, profile, vendor)` + `vendor_for_strength(path)` (`watermark_profiles.py`) read the C2PA issuer (`metadata.synthid_source`) on the ORIGINAL input and pick `OPENAI_STRENGTH` **0.10** / `GEMINI_STRENGTH` **0.15** / `UNKNOWN_STRENGTH` **0.15** when `--strength` is unset; explicit `--strength` always wins. The CLI detects the vendor from the pristine source (before the visible pass / metadata-strip removes C2PA from the temp file) and passes it to the engine, so display and execution agree; `cmd_invisible`/`cmd_all`/`batch` + the module-level `remove_watermark` all thread `vendor`. **This replaces the single 0.30 default AND the prior "do NOT build a vendor-adaptive default" policy** -- both came from the now-debunked region-rescrub-contaminated study (the per-region re-scrub that contaminated those numbers was removed in the controlnet refactor). Basis: the oracle-verified June 2026 controlled study (clean v0.8.6, protect OFF): OpenAI clears at 0.05 across 1024-1600 (n=4, resolution-independent); Google needs 0.15 on the capped-1536 path (n=4). `docs/synthid.md` §2.2 (data) + §5.2 (the adaptive default) are authoritative. 
**CAVEAT (oracle pass 2026-06-04): the OpenAI 0.10 default is content-dependent, NOT universal -- a flat-graphic OpenAI logo/poster still read SynthID-detected after `default` at 0.10, and photoreal images after controlnet at 0.10/0.15 (low-change regions under-perturbed). Removal at 0.10/0.15 is content×pipeline dependent (see the controlnet Known-limitations bullet); the lever is a higher strength, oracle-revalidated per content type. Do NOT assume the vendor-adaptive default clears every image.** CAVEAT: Google's 0.15 was validated only on `--max-resolution 1536`; native large Gemini (2816) was not locally measurable (OOM on M-series) and is pending GPU validation on raiw.cc -- if it survives 0.15 native, raise `--strength`. **Everything below in this bullet about a fixed 0.10/0.30 default is HISTORICAL; trust the vendor-adaptive constants + docs/synthid.md.** -- **SynthID removal: strength + oracle scope.** Default strength is vendor-adaptive (see the bullet above); `docs/synthid.md` §2.2 is authoritative for the numbers. **Oracle scope (load-bearing):** the Gemini app "Verify with SynthID" is the ONLY valid SynthID oracle (detects Google's mark on any image); `openai.com/verify` is scoped to OpenAI provenance (its own C2PA), NOT a SynthID oracle -- a negative there is meaningless for SynthID. There is no local SynthID detector, so the tool cannot self-check; if the oracle still reads SynthID, raise `--strength` to the lowest value that verifies clean. Only the `default` (plain SDXL img2img) and `controlnet` (SDXL + canny ControlNet) profiles exist; the local `invisible` default is weight-for-weight identical to raiw.cc prod (`fal-ai/fast-sdxl` = `stabilityai/stable-diffusion-xl-base-1.0`, runtime-downloaded, not bundled). 
**Forensic-stealth caveat** (arXiv:2605.09203): defeating the SynthID verifier is NOT forensic invisibility -- independent detectors flag *removal-processed* images vs genuinely-clean ones at >98% TPR@1%FPR, so do not over-claim "indistinguishable from a real photo". -- **`controlnet` pipeline (text/face STRUCTURE preservation, EXPERIMENTAL, opt-in `--pipeline controlnet`).** SDXL + the canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` via `StableDiffusionXLControlNetImg2ImgPipeline` (`watermark_remover._run_controlnet` / `_load_controlnet_pipeline`). **Removal still comes from the img2img regeneration (`strength`); the ControlNet only PRESERVES text and face STRUCTURE by conditioning on the canny edge map** (`cv2.Canny(gray, 100, 200)`, 3-channel). Canny preserves edges, NOT face identity (a regenerated face drifts in likeness). The drifted cleaned face is the LEAST-AI state we can reach without re-introducing SynthID; **the library does NOT ship a face-restore extra** (every approach evaluated 2026-06-04 - 2026-06-08 -- GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned at three parameter sweeps -- regenerated the face via SDXL and made it look MORE AI-generated). Full empirical conclusion in `docs/synthid-robust-identity-research-2026-06-08.md` "Empirical follow-up". For production face preservation, ship the cleaned image as-is. 
No original pixels are copied or frozen, **BUT removal at the low vendor-adaptive strength is CONTENT × PIPELINE dependent and NEITHER pipeline clears all content -- oracle-validated against the OpenAI verifier 2026-06-04 (8 images, strength 0.10/0.15, `--max-resolution 1536`).** The survivors FLIP by content type: **photoreal** (a 9-face grid, a bracelet product photo) SURVIVES controlnet but CLEARS `default` (controlnet's dense edge map keeps the regen too close to the original, so the SynthID-destroying perturbation never happens; plain img2img perturbs photoreal texture enough); **flat graphic** (a logo/poster with large flat color fills) SURVIVES `default` but CLEARS controlnet (at low strength img2img barely changes flat fills so SynthID persists there, while controlnet repaints them more freely); a flat **text** card cleared under both. **Root cause is insufficient STRENGTH, not the pipeline: at 0.10 the low-change regions -- dense-edge photoreal under controlnet, large flat fills under `default` -- are not perturbed enough to destroy SynthID. The vendor-adaptive 0.10 from the June study is NOT universally sufficient (that study's content happened to clear at 0.10).** The robust fix is a HIGHER strength, oracle-revalidated per content type (controlnet can be cranked harder without losing structure; a lower `controlnet_conditioning_scale` also frees the regen on photoreal). So at today's default strength **both pipelines AND `--auto` can LEAVE SynthID on some content** -- a removal-priority caller (raiw.cc) MUST oracle-validate strength across content types before adopting, not pick a pipeline and assume removal. **Follow-up same day: re-running the two photoreal survivors through controlnet at an explicit `--strength 0.15` cleared BOTH on the oracle -- BUT one of them (the bracelet) had SURVIVED the SAME 0.15 controlnet config in the first pass (only the random, unset seed differed). 
So removal near the threshold is SEED-NON-DETERMINISTIC: the same image+pipeline+strength+resolution can pass or fail run-to-run (img2img uses `seed=None`/random unless `--seed` is passed, and there is no local SynthID detector to self-verify). 0.15 is the borderline, NOT a robust floor -- pick a strength with MARGIN (controlnet ~>= 0.20) rather than exactly on it; the content×pipeline table's 0.15 data point is near-threshold noise. A confirming run at `--strength 0.20` controlnet cleared BOTH photoreal survivors on the oracle (ladder: 0.10 grid detected → 0.15 borderline/non-deterministic → 0.20 both clean), so **0.20 is the recommended robust controlnet floor for OpenAI photoreal** (one margin run, not an N-run repeatability proof -- a service should add margin or verify repeatability since there is no local SynthID detector to self-check). **Engineering follow-up for raiw.cc: the controlnet pipeline should use a HIGHER vendor strength than `default` -- it currently shares `resolve_strength` (0.10/0.15, tuned for plain img2img), but controlnet's edge map preserves structure so it needs ~0.20+; calibrate per vendor/content on the GPU worker, do NOT just reuse the `default` ladder.** **CERTIFIED 2026-06-04 via the isolated `raiw-controlnet-cert` Modal app (`raiw-app/modal_cert.py`), restore OFF, ≤1536, each vendor on its own oracle: controlnet floors are OpenAI 0.20 (2 photoreal × 3 seeds = 6/6 clean; the 0.15-flipper is seed-robust at 0.20) and Gemini 0.30 (0.20 detected → 0.30 clean on 2/2 seeds). OpenAI 0.20 transfers to prod (resolution-independent); Gemini 0.30 holds only ≤1536 — Gemini is resolution-sensitive and raiw.cc runs NATIVE (`max_resolution=0`), so cap Gemini ≤1536 + use 0.30, or native-calibrate (~0.35+). Prod recipe: controlnet + per-vendor floor in `resolve_strength` (not the default ladder) + FIXED seed (kills the non-determinism). 
**No face-restore in the library:** every approach evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned, 2026-06-04 - 2026-06-08 cert sweeps) regenerated the face via SDXL diffusion -- the output face inherited SDXL "clean skin" gloss and lost original identity precision, looking MORE AI-generated than the cleaned image, not less. The drifted face from controlnet 0.20 is the least-AI state we can reach; for a paid service that's the prod output. See `docs/synthid-robust-identity-research-2026-06-08.md` "Empirical follow-up".** See `docs/synthid.md` §5.5 + `docs/controlnet-removal-pipeline-research.md` (certified floors table).** **Lesson: visual-quality + face-recovery validation does NOT prove watermark removal -- only the SynthID oracle does, across MULTIPLE content types; never infer removal from sharpness/identity, and never conclude from a partial result (the photoreal-only data first read as "controlnet shields, default removes" -- the flat-graphic result reversed it).** `controlnet_conditioning_scale` (CLI `--controlnet-scale`, default 1.0) is the structure-preservation knob (higher = closer to the original structure); fp32 on cpu/mps, fp16-fixed VAE on cuda/xpu. The `controlnet` profile is threaded explicitly (`WatermarkRemover(pipeline=...)` / `InvisibleEngine(pipeline=...)`), NOT inferred from `model_id`. This productionizes the `scripts/controlnet_sweep.py` prototype; see `docs/controlnet-removal-pipeline-research.md`. **Forensic-stealth caveat still applies** (arXiv:2605.09203): defeating the SynthID verifier is not forensic invisibility -- a "this image went through a removal pipeline" classifier can still flag the output. 
+- **DEFAULT STRENGTH IS VENDOR-ADAPTIVE, ONE LADDER FOR BOTH PIPELINES (raised + unified 2026-06-09; vendor-adaptive since 2026-06-01, SUPERSEDES every fixed-default claim in this bullet and the next).** `resolve_strength(strength, vendor)` + `vendor_for_strength(path)` (`watermark_profiles.py`) read the C2PA issuer (`metadata.synthid_source`) on the ORIGINAL input and pick `OPENAI_STRENGTH` **0.20** / `GEMINI_STRENGTH` **0.30** / `UNKNOWN_STRENGTH` **0.30** when `--strength` is unset; explicit `--strength` always wins. **The SAME ladder applies to BOTH pipelines** (`sdxl` and `controlnet`) -- these are the 2026-06-04 Modal-cert controlnet floors. **Why one ladder (NOT a per-pipeline split):** the cert was run on controlnet and does NOT transfer to `sdxl` by symmetry (opposite hard cases -- controlnet leaves SynthID on photoreal, `sdxl` on flat graphics), BUT on its OWN hard case (flat fills) `sdxl` is the WEAKER remover (plain img2img barely perturbs a flat region at low strength), so it needs AT LEAST controlnet's strength -- hence the certified floor is the right floor for `sdxl` too. It is a MARGIN argument for `sdxl`, not a fresh certification (no local SynthID detector to self-verify); raise `--strength` if an oracle still reads a flat `sdxl` output. The higher strength costs little quality because `controlnet` is now the default pipeline AND the only `--auto` pick, so `sdxl` is reached only via an explicit `--pipeline sdxl` (a deliberate opt-down for inputs without faces/text), where over-regeneration has nothing to damage. (A short-lived per-pipeline split ladder -- `sdxl` 0.15/0.20 vs controlnet 0.20/0.30 -- existed on 2026-06-09 before being unified the same day; the `resolve_strength` `pipeline` param and the `CONTROLNET_*_STRENGTH` constants were removed.) 
The CLI detects the vendor from the pristine source (before the visible pass / metadata-strip removes C2PA from the temp file) and passes it to display calls so display and execution agree; `cmd_invisible`/`cmd_all`/`batch` thread `vendor`. **This replaces the single 0.30 default AND the prior "do NOT build a vendor-adaptive default" policy** -- both came from the now-debunked region-rescrub-contaminated study (the per-region re-scrub that contaminated those numbers was removed in the controlnet refactor). Basis: the oracle-verified June 2026 controlled study (clean v0.8.6, protect OFF): OpenAI clears at 0.05 across 1024-1600 (n=4, resolution-independent); Google needs 0.15 on the capped-1536 path (n=4). `docs/synthid.md` §2.2 (data) + §5.2 (the adaptive default) are authoritative. **CAVEAT (oracle pass 2026-06-04): the OpenAI 0.10 default is content-dependent, NOT universal -- a flat-graphic OpenAI logo/poster still read SynthID-detected after `default` at 0.10, and photoreal images after controlnet at 0.10/0.15 (low-change regions under-perturbed). Removal at 0.10/0.15 is content×pipeline dependent (see the controlnet Known-limitations bullet); the lever is a higher strength, oracle-revalidated per content type. Do NOT assume the vendor-adaptive default clears every image.** CAVEAT: Google's 0.15 was validated only on `--max-resolution 1536`; native large Gemini (2816) was not locally measurable (OOM on M-series) and is pending GPU validation on raiw.cc -- if it survives 0.15 native, raise `--strength`. **Everything below in this bullet about a fixed 0.10/0.30 default is HISTORICAL; trust the vendor-adaptive constants + docs/synthid.md.** +- **SynthID removal: strength + oracle scope.** Default strength is vendor-adaptive (see the bullet above); `docs/synthid.md` §2.2 is authoritative for the numbers. 
**Oracle scope (load-bearing):** the Gemini app "Verify with SynthID" is the ONLY valid SynthID oracle (detects Google's mark on any image); `openai.com/verify` is scoped to OpenAI provenance (its own C2PA), NOT a SynthID oracle -- a negative there is meaningless for SynthID. There is no local SynthID detector, so the tool cannot self-check; if the oracle still reads SynthID, raise `--strength` to the lowest value that verifies clean. Only the `sdxl` (plain SDXL img2img; `default` is a back-compat alias) and `controlnet` (SDXL + canny ControlNet) profiles exist; the local `invisible` default is weight-for-weight identical to raiw.cc prod (`fal-ai/fast-sdxl` = `stabilityai/stable-diffusion-xl-base-1.0`, runtime-downloaded, not bundled). **Forensic-stealth caveat** (arXiv:2605.09203): defeating the SynthID verifier is NOT forensic invisibility -- independent detectors flag *removal-processed* images vs genuinely-clean ones at >98% TPR@1%FPR, so do not over-claim "indistinguishable from a real photo". +- **`controlnet` pipeline (text/face STRUCTURE preservation, THE DEFAULT since 2026-06-09; `--pipeline default` opts down to plain SDXL).** SDXL + the canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` via `StableDiffusionXLControlNetImg2ImgPipeline` (`watermark_remover._run_controlnet` / `_load_controlnet_pipeline`). **Removal still comes from the img2img regeneration (`strength`); the ControlNet only PRESERVES text and face STRUCTURE by conditioning on the canny edge map** (`cv2.Canny(gray, 100, 200)`, 3-channel). Canny preserves edges, NOT face identity (a regenerated face drifts in likeness). The drifted cleaned face is the LEAST-AI state we can reach without re-introducing SynthID; **the library does NOT ship a face-restore extra** (every approach evaluated 2026-06-04 - 2026-06-08 -- GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned at three parameter sweeps -- regenerated the face via SDXL and made it look MORE AI-generated). 
Full empirical conclusion in `docs/synthid-robust-identity-research-2026-06-08.md` "Empirical follow-up". For production face preservation, ship the cleaned image as-is. No original pixels are copied or frozen, **BUT removal at the low vendor-adaptive strength is CONTENT × PIPELINE dependent and NEITHER pipeline clears all content -- oracle-validated against the OpenAI verifier 2026-06-04 (8 images, strength 0.10/0.15, `--max-resolution 1536`).** The survivors FLIP by content type: **photoreal** (a 9-face grid, a bracelet product photo) SURVIVES controlnet but CLEARS `default` (controlnet's dense edge map keeps the regen too close to the original, so the SynthID-destroying perturbation never happens; plain img2img perturbs photoreal texture enough); **flat graphic** (a logo/poster with large flat color fills) SURVIVES `default` but CLEARS controlnet (at low strength img2img barely changes flat fills so SynthID persists there, while controlnet repaints them more freely); a flat **text** card cleared under both. **Root cause is insufficient STRENGTH, not the pipeline: at 0.10 the low-change regions -- dense-edge photoreal under controlnet, large flat fills under `default` -- are not perturbed enough to destroy SynthID. The vendor-adaptive 0.10 from the June study is NOT universally sufficient (that study's content happened to clear at 0.10).** The robust fix is a HIGHER strength, oracle-revalidated per content type (controlnet can be cranked harder without losing structure; a lower `controlnet_conditioning_scale` also frees the regen on photoreal). So at today's default strength **both pipelines AND `--auto` can LEAVE SynthID on some content** -- a removal-priority caller (raiw.cc) MUST oracle-validate strength across content types before adopting, not pick a pipeline and assume removal. 
**Follow-up same day: re-running the two photoreal survivors through controlnet at an explicit `--strength 0.15` cleared BOTH on the oracle -- BUT one of them (the bracelet) had SURVIVED the SAME 0.15 controlnet config in the first pass (only the random, unset seed differed). So removal near the threshold is SEED-NON-DETERMINISTIC: the same image+pipeline+strength+resolution can pass or fail run-to-run (img2img uses `seed=None`/random unless `--seed` is passed, and there is no local SynthID detector to self-verify). 0.15 is the borderline, NOT a robust floor -- pick a strength with MARGIN (controlnet ~>= 0.20) rather than exactly on it; the content×pipeline table's 0.15 data point is near-threshold noise. A confirming run at `--strength 0.20` controlnet cleared BOTH photoreal survivors on the oracle (ladder: 0.10 grid detected → 0.15 borderline/non-deterministic → 0.20 both clean), so **0.20 is the recommended robust controlnet floor for OpenAI photoreal** (one margin run, not an N-run repeatability proof -- a service should add margin or verify repeatability since there is no local SynthID detector to self-check). **Engineering follow-up DONE 2026-06-09 (four coupled changes):** (1) **strength raised + unified** -- `resolve_strength(strength, vendor)` now applies ONE vendor-adaptive ladder (the certified controlnet floors 0.20/0.30/0.30) to BOTH pipelines; see the DEFAULT STRENGTH bullet above for why one ladder covers `sdxl`. (2) **`controlnet` is now the DEFAULT pipeline** (CLI `--pipeline` default = `controlnet` + both engine ctors). Rationale: with the certified higher ladder it clears BOTH content classes that flipped in the content-x-pipeline table (photoreal AND flat graphic), whereas plain SDXL left SynthID on flat graphics -- so controlnet is the more removal-robust default. Cost: every non-`--auto` run now downloads the canny ControlNet weights + a higher memory peak (MPS->CPU fallback covers OOM). 
(3) **the plain-SDXL profile was renamed `default` -> `sdxl`** (`watermark_profiles.SDXL_PROFILE`/`normalize_profile`); `default` stays as a back-compat CLI/ctor alias (the `--pipeline` Choice accepts `sdxl`/`controlnet`/`default`, a click callback `_normalize_pipeline` maps `default`->`sdxl` AND warns that `default` is deprecated). (4) **the content-detection layer + `--auto` planner were removed and `--auto` was retired to a deprecated alias for `--adaptive-polish`** -- see the dedicated `auto_config.py`-removal bullet above (controlnet is the default pipeline and the polish self-gates, so detection changed nothing). raiw.cc still needs its own per-vendor/content calibration on the GPU worker for native resolution. The Gemini-native resolution caveat stands: controlnet 0.30 is certified only <=1536.** **CERTIFIED 2026-06-04 via the isolated `raiw-controlnet-cert` Modal app (`raiw-app/modal_cert.py`), restore OFF, ≤1536, each vendor on its own oracle: controlnet floors are OpenAI 0.20 (2 photoreal × 3 seeds = 6/6 clean; the 0.15-flipper is seed-robust at 0.20) and Gemini 0.30 (0.20 detected → 0.30 clean on 2/2 seeds). OpenAI 0.20 transfers to prod (resolution-independent); Gemini 0.30 holds only ≤1536 — Gemini is resolution-sensitive and raiw.cc runs NATIVE (`max_resolution=0`), so cap Gemini ≤1536 + use 0.30, or native-calibrate (~0.35+). Prod recipe: controlnet + per-vendor floor in `resolve_strength` (not the default ladder) + FIXED seed (kills the non-determinism). **No face-restore in the library:** every approach evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned, 2026-06-04 - 2026-06-08 cert sweeps) regenerated the face via SDXL diffusion -- the output face inherited SDXL "clean skin" gloss and lost original identity precision, looking MORE AI-generated than the cleaned image, not less. The drifted face from controlnet 0.20 is the least-AI state we can reach; for a paid service that's the prod output. 
See `docs/synthid-robust-identity-research-2026-06-08.md` "Empirical follow-up".** See `docs/synthid.md` §5.5 + `docs/controlnet-removal-pipeline-research.md` (certified floors table).** **Lesson: visual-quality + face-recovery validation does NOT prove watermark removal -- only the SynthID oracle does, across MULTIPLE content types; never infer removal from sharpness/identity, and never conclude from a partial result (the photoreal-only data first read as "controlnet shields, default removes" -- the flat-graphic result reversed it).** `controlnet_conditioning_scale` (CLI `--controlnet-scale`, default 1.0) is the structure-preservation knob (higher = closer to the original structure); fp32 on cpu/mps, fp16-fixed VAE on cuda/xpu. The `controlnet` profile is threaded explicitly (`WatermarkRemover(pipeline=...)` / `InvisibleEngine(pipeline=...)`), NOT inferred from `model_id`. This productionizes the `scripts/controlnet_sweep.py` prototype; see `docs/controlnet-removal-pipeline-research.md`. **Forensic-stealth caveat still applies** (arXiv:2605.09203): defeating the SynthID verifier is not forensic invisibility -- a "this image went through a removal pipeline" classifier can still flag the output. 
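A minimal sketch of the per-bin phase coherence metric named in the 2026-05-25 correction above. It is illustrative only: the function name `phase_coherence`, the synthetic fills, and the carrier placed at bin `(0, 7)` are assumptions made for this example, not code from `scripts/synthid_pixel_probe.py` or from the reverse-SynthID repository. It shows why a fixed-phase carrier on flat fills stands out under this metric even when it is small relative to the image.

```python
import numpy as np


def phase_coherence(images: np.ndarray) -> np.ndarray:
    """Per-bin phase coherence across a stack of grayscale images.

    images: float array of shape (N, H, W).
    Returns an (H, W) array in [0, 1]; a value near 1 means the FFT phase
    at that bin is nearly identical in every image of the stack.
    """
    # Zero-mean each image so the DC term does not dominate.
    residual = images - images.mean(axis=(1, 2), keepdims=True)
    spectra = np.fft.fft2(residual, axes=(1, 2))
    magnitude = np.abs(spectra)
    # Keep only the phase: unit-length complex number per bin (0 where empty).
    unit = np.divide(
        spectra, magnitude, out=np.zeros_like(spectra), where=magnitude > 0
    )
    # Length of the mean phase vector across the stack.
    return np.abs(unit.mean(axis=0))


# Synthetic stand-in for "N solid fills sharing one carrier" (hypothetical data).
rng = np.random.default_rng(0)
n, size = 8, 64
x = np.arange(size)
carrier = 0.5 * np.cos(2 * np.pi * 7 * x / size)[None, :]  # fixed phase, bin (0, 7)
fills = np.stack(
    [carrier + rng.normal(0.0, 0.05, (size, size)) for _ in range(n)]
)

coh = phase_coherence(fills)
carrier_score = coh[0, 7]      # coherence at the shared carrier bin
background = np.median(coh)    # typical coherence at noise-only bins
print(f"carrier bin: {carrier_score:.2f}, median bin: {background:.2f}")
```

On real content the image's own spectrum contributes a different phase at each bin in each image, which is consistent with the collapse toward the random baseline reported above; this sketch only reproduces the controlled-fill case.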
diff --git a/README.md b/README.md index 9bddf9c..264dd80 100644 --- a/README.md +++ b/README.md @@ -23,7 +23,7 @@ If this tool saves you time, consider [sponsoring its development](https://githu - **AI metadata stripping** — EXIF, PNG text chunks, C2PA provenance manifests (PNG / JPEG / AVIF / HEIF / JPEG-XL, **MP4 / MOV / M4V / M4A** at the container level, and **WebM / MP3 / WAV / FLAC / OGG** losslessly via ffmpeg), XMP DigitalSourceType - **"Made with AI" label removal** — removes the AI-disclosure metadata that platforms read to apply automatic labels (useful for clearing a false-positive label from a human-edited photograph) - **Analog Humanizer** — optional film grain and chromatic aberration post-processing -- **Text and face preservation (experimental)** — optional `--pipeline controlnet` adds a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness). The library does not ship a face-restore extra: every approach evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned) regenerated the face via SDXL and made the output look more AI-generated than the cleaned image. The cleaned controlnet output is the least-AI face state achievable without re-introducing SynthID. +- **Text and face preservation (default)** — the default pipeline is a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Use `--pipeline sdxl` for plain SDXL img2img (lighter, no extra model download) on inputs without text or faces. Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness). 
The library does not ship a face-restore extra: every approach evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned) regenerated the face via SDXL and made the output look more AI-generated than the cleaned image. The cleaned controlnet output is the least-AI face state achievable without re-introducing SynthID. - **Batch processing** — process entire directories - **Detection** — three-stage NCC watermark detection with confidence scoring - **Provenance detection (`identify`)** — aggregate C2PA issuer, the C2PA soft-binding forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, embedded SD/ComfyUI params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the China TC260 AIGC label (XMP, PNG chunk, or EXIF), the HuggingFace `hf-job-id` job marker, the SynthID metadata proxy, the visible marks (Gemini sparkle plus the Doubao "豆包AI生成" / Jimeng "即梦AI" / Samsung Galaxy AI "Contenuti generati dall'AI" text marks), the open SD/SDXL/FLUX invisible watermark, and (with the `trustmark` extra) the open Adobe TrustMark watermark into one origin-platform + watermark-inventory verdict (`--json` for machine output) @@ -118,15 +118,16 @@ The removal pipeline (default profile, SDXL): image → encode to latent space (VAE) at native resolution → add controlled noise (forward diffusion) → denoise (reverse diffusion, ~50 steps; strength is vendor-adaptive: - 0.10 OpenAI / 0.15 Google / 0.15 unknown, override with --strength) + 0.20 OpenAI / 0.30 Google / 0.30 unknown, same for both pipelines; + override with --strength) → decode back to pixels (VAE) ``` - Large inputs run at native resolution (no down-then-up round-trip, which was the main quality loss in issue #10); use `--max-resolution N` only to cap GPU/MPS memory on very large inputs. 
Small inputs (long side under 1024 px) are auto-upscaled to a 1024 px floor before diffusion, because SDXL distorts on a tiny latent, and the result is restored to the original size (a transparent quality boost). Disable the floor with `--min-resolution 0`. The floor upscale uses Lanczos by default; `--upscaler esrgan` (the `esrgan` extra) runs Real-ESRGAN first for sharper detail and falls back to Lanczos if the extra is absent. ESRGAN is a generic photo/texture GAN with no face/glyph prior, so it is best for photo/texture content -- it can degrade faces (the diffusion pass regenerates them, so the final recovers) and thin text; keep Lanczos for text-heavy inputs. -> **Default strength is vendor-adaptive (no flag needed).** The tool reads the C2PA issuer to detect which vendor's SynthID is present and picks the strength that clears it with the least quality loss: **OpenAI gpt-image → `0.10`**, **Google Gemini → `0.15`**, **unknown source → `0.15`**. An oracle-verified June 2026 study (clean pipeline, per-image openai.com/verify or Gemini app) found OpenAI's watermark clears at `0.05` across `1024`-`1600` px (resolution-independent) while Google's is ~3x more robust and needs `0.15`. The dominant factor is the vendor, not resolution. There is no local SynthID detector, so if the oracle still reads SynthID, raise `--strength`; if you care more about preserving fine text, lower it. (Caveat: Google's `0.15` was validated on the capped `--max-resolution 1536` path; a very large native Gemini image may need more.) +> **Default strength is vendor-adaptive (no flag needed).** The tool reads the C2PA issuer to detect which vendor's SynthID is present and picks the strength accordingly: **OpenAI gpt-image → `0.20`**, **Google Gemini → `0.30`**, **unknown source → `0.30`**. The **same ladder applies to both pipelines** — these are the oracle-certified `controlnet` floors (June 2026 Modal cert, multi-seed). 
They also cover plain `sdxl`: the two pipelines have opposite hard cases (controlnet leaves SynthID on photoreal, sdxl on flat graphics), but on its own hard case sdxl is the weaker remover, so it needs at least controlnet's strength — using one certified ladder is the safe choice (margin-based for sdxl, not separately certified). The dominant factor is the vendor (Google's SynthID is ~3x more robust). There is no local SynthID detector, so if the oracle still reads SynthID, raise `--strength`; if you care more about preserving fine detail, lower it. (Caveat: Google's `0.30` was validated only at `--max-resolution 1536`; a very large native Gemini image may need ~`0.35`+.) > -> **`--pipeline controlnet` preserves text and face structure (experimental, opt-in).** It runs the same SDXL img2img scrub but adds a canny ControlNet that conditions the regeneration on the image's edge map, so text and structure stay sharp at the strengths that remove SynthID. The watermark removal still comes from the img2img regeneration (`--strength`); the ControlNet only preserves structure — no original pixels are copied or frozen, so SynthID does not survive. `--controlnet-scale` tunes the preservation strength (higher = closer to the original structure). Runs fp32 on mps/cpu (fp16 only on cuda/xpu, where the fp16-fixed SDXL VAE is loaded automatically). +> **The default pipeline is `controlnet` — it preserves text and face structure.** It runs the same SDXL img2img scrub but adds a canny ControlNet that conditions the regeneration on the image's edge map, so text and structure stay sharp at the strengths that remove SynthID. The watermark removal still comes from the img2img regeneration (`--strength`); the ControlNet only preserves structure — no original pixels are copied or frozen. The default strength ladder (OpenAI `0.20` / Google `0.30`) is the oracle-certified controlnet floor. `--controlnet-scale` tunes the preservation strength (higher = closer to the original structure). 
Runs fp32 on mps/cpu (fp16 only on cuda/xpu, where the fp16-fixed SDXL VAE is loaded automatically). Pass `--pipeline sdxl` for plain SDXL img2img (lighter, no extra model download) on inputs without text or faces. > > **No face-restore extra in the library.** Every ArcFace-based regeneration approach we evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned at three parameter sweeps, 2026-06-04 - 2026-06-08 Modal cert sweeps) regenerated the face via SDXL diffusion — the output face pixels were diffusion-fresh (SynthID not re-introduced), but the face inherently looked more AI-generated than the cleaned image (SDXL "clean skin" gloss, lost original identity precision). The cleaned image from the main controlnet 0.20 pass is the least-AI face state we can reach without re-introducing SynthID. Empirical conclusion in `docs/synthid-robust-identity-research-2026-06-08.md`. @@ -136,7 +137,7 @@ SDXL is the default since May 2026: empirically defeats SynthID v2 on Gemini 3 P > **Technical deep-dive:** see [`docs/synthid.md`](docs/synthid.md) for a primary-source-cited breakdown of how SynthID works mechanically (post-hoc encoder/decoder, 136-bit payload, pixel-space embedding), what it empirically survives (JPEG, crop, resize: ~99.98% TPR at 0.1% FPR from arXiv:2510.09263), what removes it, and the forensic-stealth tradeoff (all known removal attacks are detectable at >98% TPR@1%FPR per arXiv:2605.09203). -**Text and face preservation** (experimental, opt-in `--pipeline controlnet`): adds a canny ControlNet so text and face *structure* stay sharp through the removal pass, without copying or freezing any original pixels (so SynthID is still removed). Tune the preservation strength with `--controlnet-scale`. Canny preserves structure but not face *identity*: the regenerated face drifts in likeness. The library does not ship a face-restore extra (see the callout above). 
+**Text and face preservation** (the default pipeline; `--pipeline sdxl` opts down to plain SDXL): a canny ControlNet keeps text and face *structure* sharp through the removal pass, without copying or freezing any original pixels (so SynthID is still removed). Tune the preservation strength with `--controlnet-scale`. Canny preserves structure but not face *identity*: the regenerated face drifts in likeness. The library does not ship a face-restore extra (see the callout above). **Analog Humanizer**: optional film grain and chromatic aberration injection that mimics a photo of a screen, raising the bar for AI-generated image classifiers. (It frustrates generic classifiers but does not guarantee forensic invisibility — see the [arXiv:2605.09203](https://arxiv.org/abs/2605.09203) note above.) @@ -292,14 +293,15 @@ remove-ai-watermarks invisible image.png -o clean.png --humanize 4.0 --unsharp 0 # first (disable with --min-resolution 0); --upscaler esrgan uses Real-ESRGAN for # that floor upscale (needs the 'esrgan' extra). On a very large image that OOMs the # GPU/MPS, cap the long side: --max-resolution 2048 -# Strength is vendor-adaptive by default (OpenAI 0.10 / Google 0.15); override -# with --strength. To preserve text/face structure, use --pipeline controlnet -# Or let it choose: --auto picks the pipeline and an adaptive polish -# from the image content (controlnet when there is text/structure, polish that -# restores the input's detail level while sparing text). Every choice is -# overridable: --pipeline and --no-adaptive-polish win over the auto pick. -# Experimental. -# (SDXL + canny ControlNet); tune preservation with --controlnet-scale. Add +# Strength is vendor-adaptive by default (OpenAI 0.20 / Google 0.30, same +# for both pipelines); override with --strength. controlnet (text/face +# structure preservation) is the default pipeline; --pipeline sdxl opts down +# to plain SDXL for non-structure inputs. 
Tune structure preservation with +# --controlnet-scale, the CFG with --guidance-scale (default 7.5), and the +# diffusion model with --model (default: SDXL base). +# --adaptive-polish (ON by default) restores the input's detail level (sparing +# text) to counter the over-smoothed look; it self-limits to a no-op where +# there is no detail deficit. Disable with --no-adaptive-polish. # Check / strip AI metadata (C2PA, EXIF, "Made with AI" labels) # --check also flags SynthID-bearing sources: a C2PA manifest signed by @@ -312,9 +314,9 @@ remove-ai-watermarks metadata image.png --remove # Batch with a specific mode remove-ai-watermarks batch ./images/ --mode visible -# Batch also accepts --auto (and --adaptive-polish): the plan is recomputed per -# image, so a mixed directory routes each file to the right pipeline -remove-ai-watermarks batch ./images/ --mode all --auto +# Batch accepts the full invisible knob set (--strength/--guidance-scale/--model/ +# --pipeline/...); --adaptive-polish is on by default (--no-adaptive-polish to disable) +remove-ai-watermarks batch ./images/ --mode all ``` ### Python API @@ -335,6 +337,30 @@ clean = engine.remove_watermark(image) cv2.imwrite("clean.png", clean) ``` +#### Invisible removal (diffusion) + +```python +from pathlib import Path +from remove_ai_watermarks.invisible_engine import InvisibleEngine + +# pipeline: "controlnet" (default, preserves text/face structure) or "sdxl" (plain). +# model_id=None uses the SDXL base; controlnet_conditioning_scale tunes preservation. 
+engine = InvisibleEngine(pipeline="controlnet") + +engine.remove_watermark( + Path("watermarked.png"), + Path("clean.png"), + strength=None, # None = vendor-adaptive default (OpenAI 0.20 / Google 0.30) + num_inference_steps=50, + guidance_scale=None, # None = the library default (7.5) + seed=None, # set for reproducible output + adaptive_polish=True, # detail-targeted polish, self-gating (default on in the CLI) + min_resolution=1024, # upscale tiny inputs to this floor before diffusion + max_resolution=0, # 0 = native; set only to cap GPU/MPS memory + upscaler="lanczos", # or "esrgan" for the floor upscale (needs the 'esrgan' extra) +) +``` + ### Metadata stripping ```python diff --git a/docs/synthid.md b/docs/synthid.md index e404608..b42bd6c 100644 --- a/docs/synthid.md +++ b/docs/synthid.md @@ -382,12 +382,10 @@ the payload, reconstituting SynthID in text. The lesson held and shaped the current design: **content is preserved by REGENERATING it under structural conditioning, never by copying original pixels.** -Both preservation features below are **EXPERIMENTAL and opt-in (off by default)**; -the plain `default` SDXL img2img pass is the shippable path. - -- **Text + structure:** `--pipeline controlnet` (SDXL img2img + a canny ControlNet, - experimental/opt-in) conditions the regeneration on the edge map, so text and - structure stay sharp while every pixel is still regenerated. Text legibility is +- **Text + structure:** `--pipeline controlnet` (SDXL img2img + a canny ControlNet) is + **THE DEFAULT pipeline since 2026-06-09** (`--pipeline sdxl` opts down to plain + SDXL img2img for inputs without text/faces). It conditions the regeneration on the + edge map, so text and structure stay sharp while every pixel is still regenerated. Text legibility is better than plain img2img at the same strength (text stays readable where plain garbles it). 
**BUT removal efficacy at the low vendor-adaptive strength is CONTENT × PIPELINE dependent and NEITHER pipeline clears all content -- oracle-validated @@ -407,7 +405,13 @@ the plain `default` SDXL img2img pass is the shippable path. removal guarantee at today's strength -- pick by what you must PRESERVE (controlnet for text/structure), then raise strength until the oracle reads clean. (The earlier "reads clean on the oracle" claim held only for the one flat/text-background case it - was checked on; it does not generalize.) + was checked on; it does not generalize.) **UPDATE 2026-06-09: the default strengths + were raised and made pipeline-aware (controlnet ladder = the certified + 0.20/0.30/0.30 floors, applied to BOTH pipelines as a single ladder -- see §5.2 for + why one ladder covers plain `sdxl` too) and controlnet is now the default pipeline. + The plain-SDXL profile was also renamed `default` -> `sdxl` (`default` stays as an + alias). The 0.10/0.15 numbers in this analysis are the PRE-raise values it was + measured at. See §5.2.** - **Face identity:** canny holds face *structure* but not *identity*. Shipped as the optional `--restore-faces` GFPGAN post-pass (`face_restore.py`, the `restore` extra, experimental/opt-in, off by default). It runs GFPGAN on the ORIGINAL @@ -448,14 +452,25 @@ study (section 2.2) gives empirical floors: resolution stack). Use a GPU or `--max-resolution 1536`. The default is **vendor-adaptive** (`watermark_profiles.resolve_strength` + -`vendor_for_strength`): the tool reads the C2PA issuer on the original input and -picks `OPENAI_STRENGTH` 0.10 / `GEMINI_STRENGTH` 0.15 / `UNKNOWN_STRENGTH` 0.15. -This uses the vendor signal we DO have locally (the C2PA SynthID proxy) to avoid -the overkill of a single high default on OpenAI images, without needing a local -pixel detector. An explicit `--strength` always wins. If the watermark still -survives (e.g. 
a large native Gemini beyond the capped-1536 validation), raise -toward 0.30 then 0.35-0.40 (0.40 visibly corrupts dense text), using the lowest -value that reads clean on the oracle. +`vendor_for_strength`): the tool reads the C2PA issuer on the original input and picks +`OPENAI_STRENGTH` 0.20 / `GEMINI_STRENGTH` 0.30 / `UNKNOWN_STRENGTH` 0.30. **The SAME +ladder applies to both pipelines** (`sdxl` and `controlnet`) -- these are the +oracle-certified controlnet floors (§5.5, the 2026-06-04 Modal cert). Why one ladder +covers plain `sdxl` too: the certification was run on controlnet and does NOT transfer +by symmetry (the two pipelines have OPPOSITE hard cases -- controlnet leaves SynthID on +photoreal, `sdxl` on flat graphics, the §5.1 content-x-pipeline table), BUT on its own +hard case (flat fills) `sdxl` is the WEAKER remover (plain img2img barely perturbs a +flat region at low strength), so it needs AT LEAST controlnet's strength -- the +certified floor is therefore the right floor for `sdxl` too. This is a MARGIN argument +for `sdxl`, not a separate certification (no local SynthID detector to self-verify). +The higher strength costs little quality where it matters, because `controlnet` is now +the default pipeline, so `sdxl` is reached only via an explicit `--pipeline sdxl` (a +deliberate opt-down), where over-regeneration has no faces/text to damage. +This uses the vendor signal we DO have locally (the C2PA SynthID proxy) to avoid the +overkill of a single high default on OpenAI images, without needing a local pixel +detector. An explicit `--strength` always wins. If the watermark still survives (e.g. a +large native Gemini beyond the capped-1536 validation), raise toward 0.35-0.40 (0.40 +visibly corrupts dense text), using the lowest value that reads clean on the oracle. 
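
The resolution order described in this section (an explicit `--strength` wins, otherwise the C2PA vendor picks a floor, and one ladder serves both pipelines) can be sketched as below. This is a minimal illustration, not the library's implementation: the function name `sketch_resolve_strength`, its signature, and the vendor keys `"openai"` / `"google"` are assumptions made for the example; only the three constant names and their values come from the text above.

```python
from __future__ import annotations

# Values quoted in this section; the real constants live in watermark_profiles.
OPENAI_STRENGTH = 0.20
GEMINI_STRENGTH = 0.30
UNKNOWN_STRENGTH = 0.30

# Hypothetical vendor keys (assumed for this sketch); the library's own
# vendor_for_strength may return different labels.
_LADDER = {"openai": OPENAI_STRENGTH, "google": GEMINI_STRENGTH}


def sketch_resolve_strength(explicit: float | None, vendor: str | None) -> float:
    """Illustrative stand-in: explicit strength wins, else the vendor floor.

    There is deliberately no pipeline argument, because the same ladder is
    applied to both the sdxl and controlnet pipelines.
    """
    if explicit is not None:
        if not 0.0 <= explicit <= 1.0:
            raise ValueError(f"strength must be within [0.0, 1.0], got {explicit}")
        return explicit
    if vendor is None:
        return UNKNOWN_STRENGTH
    # Unrecognized vendors fall back to the conservative (higher) floor.
    return _LADDER.get(vendor.strip().lower(), UNKNOWN_STRENGTH)
```

Falling back to the higher floor for an unknown vendor matches the reasoning above: with no local SynthID detector to self-verify, under-regenerating is the costlier mistake.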
### 5.3 Test methodology diff --git a/src/remove_ai_watermarks/assets/face_detection_yunet_2023mar.onnx b/src/remove_ai_watermarks/assets/face_detection_yunet_2023mar.onnx deleted file mode 100644 index f9beb30..0000000 Binary files a/src/remove_ai_watermarks/assets/face_detection_yunet_2023mar.onnx and /dev/null differ diff --git a/src/remove_ai_watermarks/assets/text_detection_ppocrv3_2023may.onnx b/src/remove_ai_watermarks/assets/text_detection_ppocrv3_2023may.onnx deleted file mode 100644 index baaeabb..0000000 Binary files a/src/remove_ai_watermarks/assets/text_detection_ppocrv3_2023may.onnx and /dev/null differ diff --git a/src/remove_ai_watermarks/auto_config.py b/src/remove_ai_watermarks/auto_config.py deleted file mode 100644 index 06ca06b..0000000 --- a/src/remove_ai_watermarks/auto_config.py +++ /dev/null @@ -1,270 +0,0 @@ -"""Automatic pipeline planning for the ``--auto`` quality mode. - -``plan(image_path)`` inspects the INPUT image (before the diffusion model loads) -and returns the quality modes to use, so the pipeline can adapt to content. It is -meant to run as the FIRST step of the invisible/all pipeline, wherever that pipeline -runs (locally, or the raiw.cc Modal GPU worker) -- never on a memory-constrained web -host (image work there OOM-crashes the container). - -Routing is **quality-priority**: ControlNet (text/face-structure preservation) is the -default; it is only skipped for a clearly structure-less image (no face, no text, -near-zero edges), where plain SDXL is cheaper and just as good. A detected face only -routes to controlnet (canny preserves face STRUCTURE, not identity); there is no -identity restoration -- the whole face-restore family was removed (it regenerated the -face via SDXL and looked MORE AI-generated, see -docs/synthid-robust-identity-research-2026-06-08.md). 
When the controlnet smoothing -pass ran, the **adaptive polish** (``humanizer.adaptive_polish``) restores the input's -detail level -- a capped unsharp + edge-masked grain targeting the input's Laplacian -variance -- to counter the over-smoothed "AI look". It is self-limiting on -text/graphics (already high-frequency, so almost no polish) and spares text/edges by -masking the grain. - -Detection is **cv2-only and torch-free**: OpenCV YuNet (``cv2.FaceDetectorYN``) for -faces -- a 232 KB MIT-licensed model bundled in ``assets/`` -- DBNet (PP-OCRv3 -differentiable-binarization via ``cv2.dnn.TextDetectionModel_DB``, a 2.4 MB Apache-2.0 -model bundled in ``assets/``) for text, and a Canny ``edge_density``. The whole planner -peaks ~100 MB RSS in a few ms, so it adds nothing meaningful to a GPU run and runs -anywhere the pipeline runs. - -The text detector falls back to the old MSER region heuristic if the DBNet model can't -load. Either way text only ever ADDS controlnet, so a miss is backstopped by the -edge-density route and a false positive only costs a controlnet run. -""" - -# cv2/numpy boundary: cv2 ships no usable element types; relax the unknown-type rules -# for this file only. 
-# pyright: reportUnknownMemberType=false, reportUnknownArgumentType=false, reportUnknownVariableType=false, reportUnknownParameterType=false, reportMissingTypeArgument=false, reportMissingTypeStubs=false, reportMissingImports=false, reportArgumentType=false, reportAssignmentType=false, reportReturnType=false, reportCallIssue=false, reportIndexIssue=false, reportOperatorIssue=false, reportOptionalMemberAccess=false, reportOptionalCall=false, reportOptionalSubscript=false, reportOptionalOperand=false, reportAttributeAccessIssue=false, reportPrivateImportUsage=false, reportPrivateUsage=false, reportInvalidTypeForm=false, reportConstantRedefinition=false, reportUnnecessaryComparison=false -from __future__ import annotations - -import logging -from dataclasses import dataclass -from pathlib import Path -from typing import TYPE_CHECKING, Any - -if TYPE_CHECKING: - from numpy.typing import NDArray - -logger = logging.getLogger(__name__) - -# ── Routing thresholds (tunable; quality-priority -> controlnet unless clearly flat) ── -# Canny edge-density below this, AND no face AND no text -> plain SDXL (nothing to -# preserve). The headshot measures ~0.022, a busy photo higher; only a near-flat -# gradient/solid image falls under 0.008. -_STRUCTURELESS_EDGE_MAX = 0.008 -# MSER regions per megapixel above this -> likely text. The MSER path is now only the -# FALLBACK when the bundled DBNet model can't load; DBNet (below) is the primary text -# detector. Rough heuristic: a no-text portrait measures a few hundred/MP, dense text -# far more. Set high so it rarely false-fires; text only ever ADDS controlnet. -_TEXT_MSER_PER_MP = 1500.0 -_FACE_SCORE = 0.6 # YuNet confidence for a face to count -# Downscale the long side to this for DETECTION only (faces stay detectable down to -# ~10px, and this bounds YuNet/DBNet/MSER cost on huge inputs). Removal runs at full res. 
-_DETECT_MAX_SIDE = 1024 - -# DBNet (PP-OCRv3 differentiable-binarization) text-region detector via cv2.dnn -- the -# primary "has meaningful text" signal. The model is the shared PP-OCRv3 detection net -# from OpenCV Zoo (Apache-2.0); en/cn variants are byte-identical, so it is bundled -# language-neutral. cv2.dnn is core OpenCV, so this adds NO new pip dependency. -_DBNET_ASSET = "text_detection_ppocrv3_2023may.onnx" # Apache-2.0 (OpenCV Zoo PP-OCRv3 DB) -_DBNET_BINARY_THRESHOLD = 0.3 -_DBNET_POLYGON_THRESHOLD = 0.5 -_DBNET_MAX_CANDIDATES = 200 -_DBNET_UNCLIP_RATIO = 2.0 -_DBNET_INPUT_SIDE = 736 # square input, multiple of 32 (PP-OCRv3 default) -_DBNET_MEAN = (122.67891434, 116.66876762, 104.00698793) # ImageNet mean * 255 -_dbnet: Any = None # lazy singleton; set to False after a load failure (-> MSER fallback) - -# When the controlnet smoothing pass ran, the adaptive polish -# (humanizer.adaptive_polish) restores the input's detail level, sparing text -- -# replacing the old fixed unsharp/grain which over-/under-corrected and speckled text. 
-_UPSCALE_FLOOR = 1024 - -_YUNET_ASSET = "face_detection_yunet_2023mar.onnx" # MIT (Shiqi Yu), OpenCV Zoo -_yunet: Any = None # lazy singleton - - -@dataclass(frozen=True) -class AutoConfig: - """Resolved quality modes from content analysis (the ``--auto`` plan).""" - - pipeline: str # "default" | "controlnet" - adaptive_polish: bool # restore the input's detail level (sharpen + masked grain), sparing text - unsharp: float # fixed-polish knobs, 0 in auto (the adaptive polish replaces them) - humanize: float - min_resolution: int - # signals retained for logging / debugging a bad pick - has_face: bool - has_text: bool - edge_density: float - width: int - height: int - - @property - def reason(self) -> str: - """One-line human-readable summary of the plan (logged per image).""" - bits = ["face" if self.has_face else "no-face"] - if self.has_text: - bits.append("text") - bits.append(f"edges={self.edge_density:.3f}") - if self.adaptive_polish: - polish = ", adaptive polish" - elif self.unsharp or self.humanize: - polish = f", unsharp {self.unsharp}/grain {self.humanize}" - else: - polish = "" - return f"{'+'.join(bits)} -> {self.pipeline} pipeline{polish}" - - -def _to_bgr(image: NDArray[Any]) -> NDArray[Any]: - """Normalize a 2D grayscale or 4-channel BGRA array to 3-channel BGR.""" - import cv2 - - if image.ndim == 2: - return cv2.cvtColor(image, cv2.COLOR_GRAY2BGR) - if image.shape[2] == 4: - return cv2.cvtColor(image, cv2.COLOR_BGRA2BGR) - return image - - -def _to_gray(image: NDArray[Any]) -> NDArray[Any]: - """Single-channel grayscale; passes a 2D (already-gray) input through unchanged.""" - import cv2 - - if image.ndim == 3 and image.shape[2] >= 3: - return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) - return image - - -def _downscale_for_detection(image: NDArray[Any]) -> NDArray[Any]: - """Shrink the long side to ``_DETECT_MAX_SIDE`` for cheap, bounded detection.""" - import cv2 - - h, w = image.shape[:2] - long_side = max(h, w) - if long_side <= _DETECT_MAX_SIDE: 
- return image - scale = _DETECT_MAX_SIDE / long_side - return cv2.resize(image, (max(1, round(w * scale)), max(1, round(h * scale))), interpolation=cv2.INTER_AREA) - - -def detect_face(image: NDArray[Any]) -> bool: - """True if OpenCV YuNet finds at least one face. cv2-only, torch-free.""" - import cv2 - - global _yunet - img = _to_bgr(image) - h, w = img.shape[:2] - if h < 1 or w < 1: - return False - try: - if _yunet is None: - model = Path(__file__).parent / "assets" / _YUNET_ASSET - _yunet = cv2.FaceDetectorYN.create(str(model), "", (w, h), _FACE_SCORE, 0.3, 5000) - _yunet.setInputSize((w, h)) - _, faces = _yunet.detect(img) - except cv2.error as e: # malformed input / model - logger.debug("YuNet face detect failed (%s); assuming no face", e) - return False - return faces is not None and len(faces) > 0 - - -def _detect_text_dbnet(image: NDArray[Any]) -> bool | None: - """DBNet (PP-OCRv3) text-region presence via cv2.dnn. - - Returns True/False on a successful run, or None if the bundled model can't load - (the caller then falls back to the MSER heuristic). Loads once, lazily. 
- """ - import cv2 - - global _dbnet - if _dbnet is False: # a prior load failed; skip straight to the MSER fallback - return None - img = _to_bgr(image) - h, w = img.shape[:2] - if h < 1 or w < 1: - return False - try: - if _dbnet is None: - model = Path(__file__).parent / "assets" / _DBNET_ASSET - net = cv2.dnn.TextDetectionModel_DB(str(model)) - net.setBinaryThreshold(_DBNET_BINARY_THRESHOLD) - net.setPolygonThreshold(_DBNET_POLYGON_THRESHOLD) - net.setMaxCandidates(_DBNET_MAX_CANDIDATES) - net.setUnclipRatio(_DBNET_UNCLIP_RATIO) - net.setInputParams(1.0 / 255.0, (_DBNET_INPUT_SIDE, _DBNET_INPUT_SIDE), _DBNET_MEAN) - _dbnet = net - boxes, _ = _dbnet.detect(img) - except Exception as e: # model load / inference can raise cv2.error or others - logger.debug("DBNet text detect failed (%s); falling back to MSER", e) - _dbnet = False - return None - return boxes is not None and len(boxes) > 0 - - -def _detect_text_mser(image: NDArray[Any]) -> bool: - """Fallback MSER-based text-presence heuristic (used only if DBNet can't load).""" - import cv2 - - gray = _to_gray(image) - h, w = gray.shape[:2] - try: - regions, _ = cv2.MSER_create().detectRegions(gray) - except cv2.error: - return False - per_mp = len(regions) / max(1e-6, (h * w) / 1e6) - return per_mp > _TEXT_MSER_PER_MP - - -def detect_text(image: NDArray[Any]) -> bool: - """Text-presence: DBNet (cv2.dnn) when the bundled model loads, else the MSER heuristic.""" - dbnet = _detect_text_dbnet(image) - return _detect_text_mser(image) if dbnet is None else dbnet - - -def edge_density(image: NDArray[Any]) -> float: - """Fraction of Canny edge pixels -- a cheap 'has structure' proxy in [0, 1].""" - import cv2 - - gray = _to_gray(image) - edges = cv2.Canny(gray, 100, 200) - return float((edges > 0).mean()) - - -def plan(image_path: Path) -> AutoConfig | None: - """Inspect the input image and return the quality modes, or None if unreadable. 
- - Pure analysis: loads the image, runs the cv2 detectors on a downscaled copy, and - applies the quality-priority routing rules. Safe to call wherever the pipeline - runs; no diffusion model is loaded. - """ - from remove_ai_watermarks import image_io - - image = image_io.imread(image_path) - if image is None: - return None - - h, w = image.shape[:2] - small = _downscale_for_detection(image) - gray = _to_gray(small) # convert once; edge density + the MSER fallback use gray - has_face = detect_face(small) # YuNet needs the 3-channel image - has_text = detect_text(small) # DBNet wants BGR; the MSER fallback grays it internally - edges = edge_density(gray) - - structureless = (not has_face) and (not has_text) and edges < _STRUCTURELESS_EDGE_MAX - pipeline = "default" if structureless else "controlnet" - smoothing = pipeline == "controlnet" - - cfg = AutoConfig( - pipeline=pipeline, - adaptive_polish=smoothing, # adaptive (detail-targeted) polish when a smoothing pass ran - unsharp=0.0, - humanize=0.0, - min_resolution=_UPSCALE_FLOOR, - has_face=has_face, - has_text=has_text, - edge_density=edges, - width=w, - height=h, - ) - logger.debug("auto plan for %s: %s", image_path, cfg.reason) - return cfg diff --git a/src/remove_ai_watermarks/cli.py b/src/remove_ai_watermarks/cli.py index 9e7b3fa..fd2e435 100644 --- a/src/remove_ai_watermarks/cli.py +++ b/src/remove_ai_watermarks/cli.py @@ -18,7 +18,11 @@ from typing import TYPE_CHECKING, Any, Literal import click from remove_ai_watermarks import __version__, watermark_registry -from remove_ai_watermarks.noai.watermark_profiles import resolve_strength, vendor_for_strength +from remove_ai_watermarks.noai.watermark_profiles import ( + resolve_strength, + strength_default_help, + vendor_for_strength, +) if TYPE_CHECKING: from collections.abc import Generator @@ -143,8 +147,8 @@ _controlnet_scale_option = click.option( "--controlnet-scale", type=float, default=1.0, - help="ControlNet conditioning scale (structure/text 
preservation strength), controlnet pipeline " - "only (EXPERIMENTAL).", + help="ControlNet conditioning scale (structure/text preservation strength); " + "applies to the controlnet pipeline (the default). Higher = closer to original structure.", ) _min_resolution_option = click.option( @@ -173,48 +177,103 @@ _auto_option = click.option( "--auto", is_flag=True, default=False, - help="Auto-pick the pipeline and adaptive polish from image content. " - "Every choice is overridable -- an explicit --pipeline / --adaptive-polish " - "always wins. EXPERIMENTAL.", + help="DEPRECATED: no-op. controlnet is already the default pipeline and " + "--adaptive-polish is ON by default (the content detectors were removed). Use " + "--no-adaptive-polish to turn the polish off.", ) _adaptive_polish_option = click.option( "--adaptive-polish/--no-adaptive-polish", - default=False, + default=True, help="Restore the input's detail level after removal (capped unsharp + edge-masked grain " - "targeting the input's sharpness, sparing text). On by default under --auto; pass " - "--no-adaptive-polish to disable it there, or --adaptive-polish to use it without --auto. " - "Independent of the fixed --unsharp/--humanize. EXPERIMENTAL.", + "targeting the input's sharpness, sparing text), countering the over-smoothed look. ON by " + "default; it self-limits where there is no detail deficit (text/flat graphics), so it is a " + "no-op there. Pass --no-adaptive-polish to disable. Independent of --unsharp/--humanize.", +) + +# HuggingFace model + CFG knobs, shared by the diffusion commands (invisible/all/batch) +# so the surface stays identical across them. +_model_option = click.option( + "--model", + type=str, + default=None, + help="HuggingFace model ID for the diffusion pipeline. Default: the SDXL base checkpoint.", +) +_guidance_scale_option = click.option( + "--guidance-scale", + type=float, + default=None, + help="Classifier-free guidance scale (CFG). Default: 7.5 (the library default). 
" + "Lower = follow the prompt less / stay closer to the input.", ) -def _apply_auto( - ctx: click.Context, - source: Path, - pipeline: str, - adaptive_polish: bool, -) -> tuple[str, bool]: - """Resolve ``--auto``: plan the three content-adaptive modes (pipeline, face - restore, adaptive polish) from the image, overriding only the ones the user left - at their default (an explicit flag always wins). The fixed ``--unsharp``/ - ``--humanize`` filters are independent and untouched. Prints the chosen plan. +def _normalize_pipeline(ctx: click.Context, param: click.Parameter, value: str | None) -> str | None: + """Resolve the legacy ``default`` profile name to ``sdxl`` (click option callback). + + Emits a one-line deprecation notice when the user explicitly passes the outdated + ``default`` value, pointing at the two current choices (``sdxl`` / ``controlnet``). """ - from remove_ai_watermarks import auto_config + if value is None: + return None + from remove_ai_watermarks.noai.watermark_profiles import normalize_profile - cfg = auto_config.plan(source) - if cfg is None: - console.print(" Auto: could not read image; using defaults") - return pipeline, adaptive_polish + normalized = normalize_profile(value) + if value.strip().lower() == "default": + click.echo( + "Warning: --pipeline default is deprecated and maps to 'sdxl'. 
" + "Use --pipeline sdxl (plain SDXL) or --pipeline controlnet (the default).", + err=True, + ) + return normalized - def _is_default(name: str) -> bool: - return ctx.get_parameter_source(name) == click.core.ParameterSource.DEFAULT - if _is_default("pipeline"): - pipeline = cfg.pipeline - if _is_default("adaptive_polish"): - adaptive_polish = cfg.adaptive_polish - console.print(f" Auto: {cfg.reason}") - return pipeline, adaptive_polish +# ``controlnet`` (the default-SELECTED value) and ``sdxl`` (plain SDXL img2img) are the +# two current profiles; ``default`` is an OUTDATED back-compat alias for ``sdxl`` +# (warned + normalized away by _normalize_pipeline). +_PIPELINE_CHOICES = ["sdxl", "controlnet", "default"] +_PIPELINE_HELP = ( + "Pipeline profile. controlnet (DEFAULT) = SDXL + canny ControlNet that preserves " + "text/faces via edge conditioning while removing SynthID; sdxl = plain SDXL img2img " + "(lighter, no extra model download, but leaves SynthID on flat-graphic content). " + "('default' is an OUTDATED alias for 'sdxl' -- use sdxl or controlnet.)" +) + +# Shared --pipeline / --strength decorators so the three diffusion commands +# (invisible/all/batch) keep an identical surface and the strength help can never +# drift from the watermark_profiles constants (strength_default_help derives it). +_pipeline_option = click.option( + "--pipeline", + type=click.Choice(_PIPELINE_CHOICES), + default="controlnet", + callback=_normalize_pipeline, + help=_PIPELINE_HELP, +) +_strength_option = click.option( + "--strength", + type=float, + default=None, + help=f"Denoising strength (0.0-1.0). Default: {strength_default_help()}.", +) + + +def _resolve_auto_polish(auto: bool, adaptive_polish: bool) -> bool: + """Warn on the retired ``--auto`` flag, returning ``adaptive_polish`` unchanged. 
+ + ``--auto`` used to plan the pipeline + polish from content detection, but the + pipeline is now always controlnet (the default) and the adaptive polish is ON by + default (it self-gates by detail level), so the content detectors were removed and + ``--auto`` is now a no-op alias: the polish it used to enable is already the default, + and an explicit ``--no-adaptive-polish`` still wins. So it only emits a deprecation + warning and passes ``adaptive_polish`` through. + """ + if auto: + click.echo( + "Warning: --auto is deprecated and now does nothing (the adaptive polish it " + "enabled is ON by default). Use --no-adaptive-polish to turn the polish off.", + err=True, + ) + return adaptive_polish def _warn_if_esrgan_unavailable(upscaler: str) -> None: @@ -524,21 +583,9 @@ def cmd_erase( @click.option( "-o", "--output", type=click.Path(path_type=Path), default=None, help="Output path (default: _clean.)." ) -@click.option( - "--strength", - type=float, - default=None, - help="Denoising strength (0.0-1.0). Default: vendor-adaptive (OpenAI 0.10 / Google 0.15 / " - "unknown 0.15, from the C2PA issuer).", -) +@_strength_option @click.option("--steps", type=int, default=50, help="Number of denoising steps. 
Default: 50.") -@click.option( - "--pipeline", - type=click.Choice(["default", "controlnet"]), - default="default", - help="Pipeline profile (default=SDXL img2img; controlnet=SDXL + canny ControlNet that preserves " - "text/faces via edge conditioning while removing SynthID, EXPERIMENTAL).", -) +@_pipeline_option @click.option( "--device", type=click.Choice(["auto", "cpu", "mps", "cuda", "xpu"]), @@ -560,6 +607,8 @@ def cmd_erase( @_min_resolution_option @_unsharp_option @_upscaler_option +@_model_option +@_guidance_scale_option @_auto_option @_adaptive_polish_option @click.pass_context @@ -579,6 +628,8 @@ def cmd_invisible( min_resolution: int, controlnet_scale: float, upscaler: str, + model: str | None, + guidance_scale: float | None, auto: bool, adaptive_polish: bool, ) -> None: @@ -599,8 +650,7 @@ def cmd_invisible( source = _validate_image(source) _warn_if_esrgan_unavailable(upscaler) - if auto: - pipeline, adaptive_polish = _apply_auto(ctx, source, pipeline, adaptive_polish) + adaptive_polish = _resolve_auto_polish(auto, adaptive_polish) if output is None: output = source.with_stem(source.stem + "_clean") @@ -610,6 +660,7 @@ def cmd_invisible( console.print(f" {msg}") engine = InvisibleEngine( + model_id=model, device=device_str, pipeline=pipeline, hf_token=hf_token, @@ -630,7 +681,7 @@ def cmd_invisible( output_path=output, strength=strength, num_inference_steps=steps, - guidance_scale=None, + guidance_scale=guidance_scale, seed=seed, humanize=humanize, unsharp=unsharp, @@ -781,21 +832,10 @@ def cmd_identify(ctx: click.Context, source: Path, no_visible: bool, as_json: bo @click.option( "--inpaint-method", type=click.Choice(["ns", "telea", "gaussian"]), default="ns", help="Inpainting method." ) -@click.option( - "--strength", - type=float, - default=None, - help="Invisible watermark denoising strength. 
Default: vendor-adaptive (OpenAI 0.10 / Google 0.15 / unknown 0.15).", -) +@_strength_option @click.option("--steps", type=int, default=50, help="Number of denoising steps for invisible removal.") -@click.option( - "--pipeline", - type=click.Choice(["default", "controlnet"]), - default="default", - help="Pipeline profile (default=SDXL img2img; controlnet=SDXL + canny ControlNet that preserves " - "text/faces via edge conditioning while removing SynthID, EXPERIMENTAL).", -) -@click.option("--model", type=str, default=None, help="HuggingFace model ID for invisible removal.") +@_pipeline_option +@_model_option @click.option( "--device", type=click.Choice(["auto", "cpu", "mps", "cuda", "xpu"]), @@ -817,6 +857,7 @@ def cmd_identify(ctx: click.Context, source: Path, no_visible: bool, as_json: bo @_min_resolution_option @_unsharp_option @_upscaler_option +@_guidance_scale_option @_auto_option @_adaptive_polish_option @click.pass_context @@ -839,6 +880,7 @@ def cmd_all( min_resolution: int, controlnet_scale: float, upscaler: str, + guidance_scale: float | None, auto: bool, adaptive_polish: bool, ) -> None: @@ -856,8 +898,7 @@ def cmd_all( _banner() source = _validate_image(source) _warn_if_esrgan_unavailable(upscaler) - if auto: - pipeline, adaptive_polish = _apply_auto(ctx, source, pipeline, adaptive_polish) + adaptive_polish = _resolve_auto_polish(auto, adaptive_polish) if output is None: output = source.with_stem(source.stem + "_clean") @@ -937,6 +978,7 @@ def cmd_all( output_path=tmp_path, strength=strength, num_inference_steps=steps, + guidance_scale=guidance_scale, seed=seed, humanize=humanize, unsharp=unsharp, @@ -1001,7 +1043,8 @@ def _process_batch_image( min_resolution: int = 1024, controlnet_scale: float = 1.0, upscaler: str = "lanczos", - auto: bool = False, + model: str | None = None, + guidance_scale: float | None = None, adaptive_polish: bool = False, ) -> None: """Process a single image for batch mode. 
@@ -1048,14 +1091,12 @@ def _process_batch_image( if invisible_available(): from remove_ai_watermarks.invisible_engine import InvisibleEngine - # --auto re-plans the pipeline / face-restore / polish per image; only the - # pipeline choice changes the engine ctor, so cache one engine per pipeline - # (controlnet vs default) rather than a single shared instance. - if auto: - pipeline, adaptive_polish = _apply_auto(ctx, img_path, pipeline, adaptive_polish) + # Cache the engine in ctx.obj so the batch builds it once (pipeline is a + # single CLI value, constant across the run). engines = ctx.obj.setdefault("_inv_engines", {}) if pipeline not in engines: engines[pipeline] = InvisibleEngine( + model_id=model, device=None if device == "auto" else device, pipeline=pipeline, hf_token=hf_token, @@ -1067,6 +1108,7 @@ def _process_batch_image( out_path, strength=strength, num_inference_steps=steps, + guidance_scale=guidance_scale, seed=seed, humanize=humanize, unsharp=unsharp, @@ -1104,19 +1146,13 @@ def _process_batch_image( @click.option( "--mode", type=click.Choice(["visible", "invisible", "metadata", "all"]), default="visible", help="Processing mode." ) -@click.option("--strength", type=float, default=None, help="Denoising strength (invisible mode).") +@_strength_option @click.option("--steps", type=int, default=50, help="Number of denoising steps (invisible mode).") @click.option("--inpaint/--no-inpaint", default=True, help="Apply inpainting (visible mode).") @click.option( "--humanize", type=float, default=0.0, help="Analog Humanizer film grain intensity (0 = off, typical: 2.0-6.0)." 
) -@click.option( - "--pipeline", - type=click.Choice(["default", "controlnet"]), - default="default", - help="Pipeline profile (default=SDXL img2img; controlnet=SDXL + canny ControlNet that preserves " - "text/faces via edge conditioning while removing SynthID, EXPERIMENTAL).", -) +@_pipeline_option @click.option( "--device", type=click.Choice(["auto", "cpu", "mps", "cuda", "xpu"]), @@ -1135,6 +1171,8 @@ def _process_batch_image( @_unsharp_option @_upscaler_option @_controlnet_scale_option +@_model_option +@_guidance_scale_option @_auto_option @_adaptive_polish_option @click.pass_context @@ -1156,6 +1194,8 @@ def cmd_batch( min_resolution: int, controlnet_scale: float, upscaler: str, + model: str | None, + guidance_scale: float | None, auto: bool, adaptive_polish: bool, ) -> None: @@ -1177,6 +1217,7 @@ def cmd_batch( console.print(f" Mode: {mode}") if mode in ("invisible", "all"): _warn_if_esrgan_unavailable(upscaler) + adaptive_polish = _resolve_auto_polish(auto, adaptive_polish) processed = 0 errors = 0 @@ -1214,7 +1255,8 @@ def cmd_batch( min_resolution=min_resolution, controlnet_scale=controlnet_scale, upscaler=upscaler, - auto=auto, + model=model, + guidance_scale=guidance_scale, adaptive_polish=adaptive_polish, ) processed += 1 diff --git a/src/remove_ai_watermarks/invisible_engine.py b/src/remove_ai_watermarks/invisible_engine.py index fecf3a7..d758f7b 100644 --- a/src/remove_ai_watermarks/invisible_engine.py +++ b/src/remove_ai_watermarks/invisible_engine.py @@ -89,7 +89,7 @@ class InvisibleEngine: self, model_id: str | None = None, device: str | None = None, - pipeline: str = "default", + pipeline: str = "controlnet", hf_token: str | None = None, progress_callback: Callable[[str], None] | None = None, controlnet_conditioning_scale: float = 1.0, @@ -99,9 +99,10 @@ class InvisibleEngine: Args: model_id: HuggingFace model ID. None = use the SDXL base default. device: Device for inference (auto/cpu/mps/cuda/xpu). None = auto. - pipeline: Pipeline profile. 
"default" (plain SDXL img2img) or - "controlnet" (SDXL + canny ControlNet that preserves text/face - structure via edge conditioning while removing SynthID). + pipeline: Pipeline profile. "controlnet" (DEFAULT; SDXL + canny ControlNet + that preserves text/face structure via edge conditioning while removing + SynthID) or "sdxl" (plain SDXL img2img, lighter but leaves SynthID on + flat-graphic content). "default" is a back-compat alias for "sdxl". hf_token: HuggingFace API token. progress_callback: Optional callback for progress messages. controlnet_conditioning_scale: ControlNet structure-preservation @@ -182,12 +183,11 @@ class InvisibleEngine: unsharp: Final unsharp-mask sharpening strength (0 = off, default). Applied last to counter the soft / over-smoothed look of the diffusion pass; ~0.5-0.8 is a safe range, higher risks edge halos. - adaptive_polish: When True (the --auto mode default), restore the input's - detail level in the softened output instead of fixed unsharp/humanize: - a capped unsharp + edge-masked grain targeting the input's Laplacian - variance (self-limiting on text/graphics). Runs LAST, after face - restoration. The fixed ``humanize``/``unsharp`` knobs are normally 0 - when this is on. + adaptive_polish: When True (the CLI default), restore the input's detail + level in the softened output: a capped unsharp + edge-masked grain + targeting the input's Laplacian variance. Self-limiting -- a no-op when + the output already meets the input's detail level (text/flat graphics), + so it only acts on over-smoothed photo/face texture. Runs LAST. max_resolution: Cap the long side (px) before diffusion. 0 (default) = no cap. 
Set a positive value only to bound GPU/MPS memory on very large inputs (it reintroduces a lossy downscale->upscale @@ -316,8 +316,8 @@ class InvisibleEngine: self._progress_callback(f"Sharpening (unsharp mask: {unsharp})...") image_io.imwrite(out_path, unsharp_mask(out_cv, amount=unsharp)) - # Adaptive polish (--auto): restore the input's detail level in the softened - # output, sparing text/edges. Replaces the fixed unsharp/humanize knobs. + # Adaptive polish (CLI default): restore the input's detail level in the + # softened output, sparing text/edges. Self-limiting where there is no deficit. if adaptive_polish: import cv2 import numpy as np diff --git a/src/remove_ai_watermarks/noai/watermark_profiles.py b/src/remove_ai_watermarks/noai/watermark_profiles.py index d2a0e51..153ee20 100644 --- a/src/remove_ai_watermarks/noai/watermark_profiles.py +++ b/src/remove_ai_watermarks/noai/watermark_profiles.py @@ -12,34 +12,56 @@ if TYPE_CHECKING: DEFAULT_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0" +# Canonical pipeline-profile names + the back-compat alias. The plain SDXL img2img +# profile is ``sdxl``; ``default`` is kept as an accepted alias (it was the profile's +# name before ``controlnet`` became the default-selected pipeline, 2026-06-09). +SDXL_PROFILE = "sdxl" +CONTROLNET_PROFILE = "controlnet" +_PROFILE_ALIASES = {"default": SDXL_PROFILE} + + +def normalize_profile(profile: str) -> str: + """Canonicalize a pipeline-profile name, resolving the ``default`` -> ``sdxl`` alias.""" + normalized = profile.strip().lower() + return _PROFILE_ALIASES.get(normalized, normalized) + + # The SDXL-native canny ControlNet used by the ``controlnet`` pipeline. The # ControlNet is an add-on to the SDXL base checkpoint (DEFAULT_MODEL_ID), not a -# separate base model, so both the ``default`` and ``controlnet`` profiles load -# the same base weights and share the same vendor-adaptive strength. 
+# separate base model, so both the ``sdxl`` and ``controlnet`` profiles load the +# same base weights and share the same vendor-adaptive strength ladder (see below). CONTROLNET_CANNY_MODEL = "xinsir/controlnet-canny-sdxl-1.0" # Vendor-adaptive default denoising strength for the SDXL img2img scrub, overridable # from the CLI (`--strength`). The right strength depends on which vendor's SynthID is -# present, detected from the C2PA issuer (metadata.synthid_source). Oracle-verified -# controlled study (2026-06-01, clean v0.8.6, per-image openai.com/verify or Gemini-app -# verdict; see docs/synthid.md section 2.2): -# - OpenAI gpt-image: removed at 0.05 across 1024-1600 (n=4), resolution-independent. -# OPENAI_STRENGTH 0.10 = the 0.05 floor plus a 2x margin (keeps quality high). -# - Google Gemini: removed at 0.15 on the capped-1536 path (n=4); 0.05/0.10 do NOT -# clear. GEMINI_STRENGTH 0.15. CAVEAT: 0.15 was validated only on -# `--max-resolution 1536`; native 2816 (the default path) was not locally -# measurable (OOM on Apple Silicon) and may need more -- pending GPU validation on -# the raiw.cc backend. If a native large Gemini still verifies positive at 0.15, -# raise `--strength`. -# - Unknown vendor (metadata stripped, or non-OpenAI/Google C2PA): UNKNOWN_STRENGTH -# 0.15, the safe middle that clears both vendors at the tested resolutions. -# The dominant factor is VENDOR, not resolution: Google's SynthID is ~3x more robust -# than OpenAI's. The ``controlnet`` pipeline shares these strengths (same SDXL base; the -# canny ControlNet only preserves structure, the strength still drives removal). -OPENAI_STRENGTH = 0.10 -GEMINI_STRENGTH = 0.15 -UNKNOWN_STRENGTH = 0.15 -# Backwards-compatible alias: the vendor-unknown default (what a caller gets without a +# present (detected from the C2PA issuer, metadata.synthid_source). The SAME ladder +# applies to BOTH pipelines (`sdxl` plain img2img and `controlnet`) -- see "why one +# ladder" below. 
+# +# Data basis (see docs/synthid.md sections 2.2 / 5.5): the values are the ORACLE- +# CERTIFIED controlnet floors (2026-06-04, isolated Modal cert app, each vendor on its +# own verifier): OpenAI 0.20 (2 photoreal x 3 seeds = 6/6 clean, resolution-independent), +# Google 0.30 (clean on 2/2 seeds, validated ONLY at <= 1536 -- Gemini is resolution- +# sensitive, native ~2816 likely needs ~0.35+). Unknown vendor gets the Google (more +# robust watermark) value: safe-by-default. +# +# Why ONE ladder for both pipelines (2026-06-09): the certification was run on +# controlnet, and it does NOT transfer to `sdxl` by symmetry -- the two pipelines have +# OPPOSITE hard cases (controlnet leaves SynthID on photoreal, `sdxl` leaves it on flat +# graphics; the content-x-pipeline table in docs/synthid.md §5.1). BUT on its OWN hard +# case (flat fills) `sdxl` is the WEAKER remover -- plain img2img at low strength barely +# perturbs a flat region -- so it needs AT LEAST as much strength as controlnet, not +# less. Hence the certified controlnet floor is the right floor for `sdxl` too. The +# higher strength costs little quality where it matters: `controlnet` is now the default +# pipeline, so `sdxl` is reached only for structure-less inputs (via `--auto`) or an +# explicit `--pipeline sdxl`, where over-regeneration has no faces/text to damage. NOTE: +# this is a MARGIN argument for `sdxl`, not a fresh certification -- there is no local +# SynthID detector, so if an oracle still reads SynthID on a flat `sdxl` output, raise +# `--strength`. +OPENAI_STRENGTH = 0.20 +GEMINI_STRENGTH = 0.30 +UNKNOWN_STRENGTH = 0.30 +# Backwards-compatible alias: the vendor-unknown value (what a caller gets without a # detected vendor). Kept as DEFAULT_STRENGTH for existing references. 
DEFAULT_STRENGTH = UNKNOWN_STRENGTH @@ -47,17 +69,29 @@ DEFAULT_STRENGTH = UNKNOWN_STRENGTH _VENDOR_STRENGTH = {"openai": OPENAI_STRENGTH, "google": GEMINI_STRENGTH} +def strength_default_help() -> str: + """One-line description of the vendor-adaptive default, derived from the constants. + + Single source of truth for the CLI ``--strength`` help so the numbers can never + drift from the actual ladder (they did once when the per-pipeline split was unified). + """ + return ( + f"vendor-adaptive (OpenAI {OPENAI_STRENGTH} / Google {GEMINI_STRENGTH} / " + f"unknown {UNKNOWN_STRENGTH}, from the C2PA issuer; same ladder for both pipelines)" + ) + + def resolve_strength(strength: float | None, vendor: str | None = None) -> float: """Resolve the denoising strength, applying the vendor default when unset. ``None`` means "the user did not pass ``--strength``", which resolves **vendor-adaptively**: ``vendor`` (``"openai"`` / ``"google"`` / None, from ``vendor_for_strength``) selects ``OPENAI_STRENGTH`` / ``GEMINI_STRENGTH`` / - ``UNKNOWN_STRENGTH``. An explicit value always wins (including ``0.0`` -- the check - is ``is None``, not falsiness). The ``default`` and ``controlnet`` profiles share - the same SDXL base (the ControlNet only preserves structure), so the default does - NOT depend on the profile. Shared by the CLI (for display) and the engine (for - execution) so the two never disagree -- both must pass the SAME ``vendor``. + ``UNKNOWN_STRENGTH``. The same ladder applies to both pipelines (see the module + comment for why one ladder is correct). An explicit value always wins (including + ``0.0`` -- the check is ``is None``, not falsiness). Shared by the CLI (for display) + and the engine (for execution) so the two never disagree -- both must pass the SAME + ``vendor``. 
""" if strength is not None: return strength @@ -90,11 +124,11 @@ def vendor_for_strength(image_path: Path) -> Literal["openai", "google"] | None: def get_model_id_for_profile(profile: str) -> str: """Map CLI model profile names to concrete Hugging Face model IDs. - Both ``default`` and ``controlnet`` use the SDXL base checkpoint -- the canny + Both ``sdxl`` and ``controlnet`` use the SDXL base checkpoint -- the canny ControlNet (``CONTROLNET_CANNY_MODEL``) is an add-on loaded on top of it, not a - separate base model. + separate base model. The legacy ``default`` alias resolves to ``sdxl``. """ - normalized = profile.strip().lower() - if normalized in ("default", "controlnet"): + normalized = normalize_profile(profile) + if normalized in (SDXL_PROFILE, CONTROLNET_PROFILE): return DEFAULT_MODEL_ID - raise ValueError(f"Unknown model profile '{profile}'. Use one of: default, controlnet.") + raise ValueError(f"Unknown model profile '{profile}'. Use one of: sdxl, controlnet.") diff --git a/src/remove_ai_watermarks/noai/watermark_remover.py b/src/remove_ai_watermarks/noai/watermark_remover.py index 53bed4f..69c4990 100644 --- a/src/remove_ai_watermarks/noai/watermark_remover.py +++ b/src/remove_ai_watermarks/noai/watermark_remover.py @@ -1,13 +1,17 @@ """Watermark removal using diffusion model regeneration attack. Two pipelines: -1. ``default`` -- plain SDXL img2img. Partial-noise regeneration scrubs the - invisible watermark; ``strength`` controls how much is regenerated. -2. ``controlnet`` -- SDXL img2img with a canny ControlNet. The watermark REMOVAL - still comes from the img2img regeneration (``strength``); the ControlNet only - PRESERVES structure (text/faces) by conditioning on the edge map. No original - pixels are ever copied or frozen, so SynthID does not survive. +1. ``controlnet`` (DEFAULT) -- SDXL img2img with a canny ControlNet. 
The watermark + REMOVAL still comes from the img2img regeneration (``strength``); the ControlNet + only PRESERVES structure (text/faces) by conditioning on the edge map. No original + pixels are ever copied or frozen. Because the edge map keeps the regeneration + closer to the original, it needs a higher ``strength`` floor than ``default`` to + destroy SynthID (the certified controlnet ladder; see ``watermark_profiles``). ``controlnet_conditioning_scale`` is the preservation knob. +2. ``default`` -- plain SDXL img2img. Partial-noise regeneration scrubs the + invisible watermark; ``strength`` controls how much is regenerated. Lighter (no + ControlNet weights), but at the low default strength it leaves SynthID on + flat-graphic content -- use it for inputs without text/faces. """ # torch/diffusers/cv2 boundary: these libs ship no usable types for the tensor and @@ -32,6 +36,7 @@ from remove_ai_watermarks.noai.watermark_profiles import ( CONTROLNET_CANNY_MODEL, DEFAULT_MODEL_ID, DEFAULT_STRENGTH, + normalize_profile, resolve_strength, ) @@ -323,13 +328,14 @@ class WatermarkRemover: torch_dtype: Any = None, progress_callback: Callable[[str], None] | None = None, hf_token: str | None = None, - pipeline: str = "default", + pipeline: str = "controlnet", controlnet_conditioning_scale: float = 1.0, ) -> None: self.model_id = model_id or self.DEFAULT_MODEL_ID # The pipeline profile is threaded explicitly (not inferred from model_id): - # both "default" and "controlnet" use the same SDXL base checkpoint. - self.model_profile = pipeline + # both "sdxl" and "controlnet" use the same SDXL base checkpoint. Normalize so + # the legacy "default" alias resolves to "sdxl". 
+ self.model_profile = normalize_profile(pipeline) self.controlnet_conditioning_scale = controlnet_conditioning_scale if not is_watermark_removal_available(): diff --git a/tests/test_auto_config.py b/tests/test_auto_config.py deleted file mode 100644 index 312ee16..0000000 --- a/tests/test_auto_config.py +++ /dev/null @@ -1,117 +0,0 @@ -"""Tests for the --auto pipeline planner (content-adaptive mode selection). - -Detection runs on synthetic images; the face-present routing is exercised by -monkeypatching ``detect_face`` (a real detectable face fixture is private, never -committed). The planner is cv2-only and torch-free. -""" - -from __future__ import annotations - -import cv2 -import numpy as np - -from remove_ai_watermarks import auto_config, image_io - - -def _write(img, tmp_path, name="x.png"): - p = tmp_path / name - image_io.imwrite(p, img) - return p - - -class TestDetectors: - def test_detect_face_false_on_flat(self): - flat = np.full((200, 200, 3), 128, dtype=np.uint8) - assert auto_config.detect_face(flat) is False - - def test_edge_density_flat_near_zero(self): - flat = np.full((200, 200, 3), 128, dtype=np.uint8) - assert auto_config.edge_density(flat) < 0.001 - - def test_edge_density_text_higher_than_blank(self): - blank = np.full((200, 400, 3), 255, dtype=np.uint8) - text = blank.copy() - cv2.putText(text, "HELLO AI TEXT", (10, 120), cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 0, 0), 3) - assert auto_config.edge_density(text) > auto_config.edge_density(blank) - - def test_dbnet_detects_text_card(self): - """The bundled PP-OCRv3 DBNet model fires on a clear text card and not on flat.""" - card = np.full((300, 500, 3), 255, dtype=np.uint8) - cv2.putText(card, "INVOICE TOTAL 1234", (10, 170), cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 0, 0), 4) - assert auto_config._detect_text_dbnet(card) is True - assert auto_config._detect_text_dbnet(np.full((300, 500, 3), 128, dtype=np.uint8)) is False - - def test_detect_text_falls_back_to_mser_when_dbnet_unavailable(self, 
monkeypatch): - """If DBNet can't load (returns None), detect_text uses the MSER heuristic.""" - monkeypatch.setattr(auto_config, "_detect_text_dbnet", lambda _img: None) - called = {} - - def _fake_mser(_img): - called["mser"] = True - return True - - monkeypatch.setattr(auto_config, "_detect_text_mser", _fake_mser) - assert auto_config.detect_text(np.full((100, 100, 3), 128, dtype=np.uint8)) is True - assert called.get("mser") is True - - -class TestPlan: - def test_unreadable_returns_none(self, tmp_path): - assert auto_config.plan(tmp_path / "does_not_exist.png") is None - - def test_flat_image_is_default_pipeline_no_polish(self, tmp_path): - flat = np.full((300, 300, 3), 128, dtype=np.uint8) - cfg = auto_config.plan(_write(flat, tmp_path)) - assert cfg is not None - assert cfg.pipeline == "default" # structure-less -> plain SDXL - assert cfg.adaptive_polish is False # no smoothing pass -> no polish - assert cfg.unsharp == 0.0 - assert cfg.humanize == 0.0 - assert cfg.min_resolution == 1024 - - def test_text_image_uses_controlnet(self, tmp_path): - img = np.full((300, 500, 3), 255, dtype=np.uint8) - cv2.putText(img, "INVOICE TOTAL 1234", (10, 170), cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 0, 0), 4) - cfg = auto_config.plan(_write(img, tmp_path)) - assert cfg is not None - # Text creates edges above the structure-less floor -> controlnet preserves them. 
- assert cfg.pipeline == "controlnet" - - def test_face_routes_to_controlnet_and_polish(self, tmp_path, monkeypatch): - monkeypatch.setattr(auto_config, "detect_face", lambda _img: True) - flat = np.full((300, 300, 3), 128, dtype=np.uint8) - cfg = auto_config.plan(_write(flat, tmp_path)) - assert cfg is not None - assert cfg.has_face - assert cfg.pipeline == "controlnet" - assert cfg.adaptive_polish # smoothing pass ran -> adaptive polish on - assert cfg.unsharp == 0.0 # fixed knobs off; the adaptive polish replaces them - assert cfg.humanize == 0.0 - - def test_text_signal_forces_controlnet_on_flat(self, tmp_path, monkeypatch): - monkeypatch.setattr(auto_config, "detect_text", lambda _img: True) - flat = np.full((300, 300, 3), 128, dtype=np.uint8) - cfg = auto_config.plan(_write(flat, tmp_path)) - assert cfg is not None - assert cfg.has_text - assert cfg.pipeline == "controlnet" - - -class TestReason: - def test_reason_summarizes_plan(self): - cfg = auto_config.AutoConfig( - pipeline="controlnet", - adaptive_polish=True, - unsharp=0.0, - humanize=0.0, - min_resolution=1024, - has_face=True, - has_text=False, - edge_density=0.05, - width=800, - height=600, - ) - r = cfg.reason - assert "controlnet" in r - assert "face" in r - assert "adaptive polish" in r diff --git a/tests/test_cli.py b/tests/test_cli.py index e73d6a5..defc988 100644 --- a/tests/test_cli.py +++ b/tests/test_cli.py @@ -277,6 +277,72 @@ class TestInvisibleCommand: expected = sample_png.with_stem(sample_png.stem + "_clean") assert expected.exists() + def test_invisible_adaptive_polish_on_by_default(self, runner, sample_png): + mock_cls, mock_engine = _mock_invisible_engine() + with ( + patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True), + patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True), + patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls), + ): + result = runner.invoke(main, ["invisible", str(sample_png)]) + assert 
result.exit_code == 0, result.output + # adaptive_polish is ON by default (self-gating, so a no-op where not needed). + assert mock_engine.remove_watermark.call_args.kwargs["adaptive_polish"] is True + # Default model is None (the SDXL base) and CFG is None (the library's 7.5). + assert mock_cls.call_args.kwargs["model_id"] is None + assert mock_engine.remove_watermark.call_args.kwargs["guidance_scale"] is None + + def test_invisible_no_adaptive_polish_disables(self, runner, sample_png): + mock_cls, mock_engine = _mock_invisible_engine() + with ( + patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True), + patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True), + patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls), + ): + result = runner.invoke(main, ["invisible", str(sample_png), "--no-adaptive-polish"]) + assert result.exit_code == 0, result.output + assert mock_engine.remove_watermark.call_args.kwargs["adaptive_polish"] is False + + def test_invisible_model_and_guidance_scale_flow_to_engine(self, runner, sample_png): + mock_cls, mock_engine = _mock_invisible_engine() + with ( + patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True), + patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True), + patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls), + ): + result = runner.invoke( + main, + ["invisible", str(sample_png), "--model", "org/custom-sdxl", "--guidance-scale", "5.5"], + ) + assert result.exit_code == 0, result.output + assert mock_cls.call_args.kwargs["model_id"] == "org/custom-sdxl" + assert mock_engine.remove_watermark.call_args.kwargs["guidance_scale"] == 5.5 + + def test_pipeline_default_alias_warns_and_maps_to_sdxl(self, runner, sample_png): + mock_cls, _mock_engine = _mock_invisible_engine() + with ( + patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True), + 
patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True), + patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls), + ): + result = runner.invoke(main, ["invisible", str(sample_png), "--pipeline", "default"]) + assert result.exit_code == 0, result.output + # The legacy value warns and is normalized to "sdxl" before the engine is built. + assert "deprecated" in result.output.lower() + assert mock_cls.call_args.kwargs["pipeline"] == "sdxl" + + def test_pipeline_sdxl_does_not_warn(self, runner, sample_png): + mock_cls, _mock_engine = _mock_invisible_engine() + with ( + patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True), + patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True), + patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls), + ): + result = runner.invoke(main, ["invisible", str(sample_png), "--pipeline", "sdxl"]) + assert result.exit_code == 0, result.output + assert "deprecated" not in result.output.lower() + assert mock_cls.call_args.kwargs["pipeline"] == "sdxl" + def test_invisible_nonexistent_file(self, runner): result = runner.invoke(main, ["invisible", "/nonexistent/file.png"]) assert result.exit_code != 0 @@ -514,32 +580,17 @@ class TestBatchCommand: assert out[0, 0, 3] == 0 assert out[100, 100, 3] == 255 - def test_batch_auto_plans_pipeline_per_image(self, runner, tmp_path): - """--auto in batch re-plans the pipeline/restore/polish per image and - builds one engine per resolved pipeline.""" - from remove_ai_watermarks import auto_config - + def test_batch_auto_is_deprecated_and_enables_polish(self, runner, tmp_path): + """--auto is retired: it warns and just enables the adaptive polish (the + pipeline is always the default controlnet now).""" input_dir = _make_batch_dir(tmp_path, count=2) output_dir = tmp_path / "output" - plan = auto_config.AutoConfig( - pipeline="controlnet", - adaptive_polish=True, - unsharp=0.0, - humanize=0.0, - min_resolution=1024, 
- has_face=True, - has_text=False, - edge_density=0.05, - width=200, - height=200, - ) mock_cls, mock_engine = _mock_invisible_engine() with ( patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True), patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls), patch("remove_ai_watermarks.cli.invisible_available", return_value=True, create=True), patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True), - patch("remove_ai_watermarks.auto_config.plan", return_value=plan), ): result = runner.invoke( main, @@ -547,9 +598,9 @@ class TestBatchCommand: ) assert result.exit_code == 0, result.output assert "2 processed" in result.output - # Engine built with the auto-resolved controlnet pipeline. + assert "deprecated" in result.output.lower() + # Pipeline stays the default controlnet; --auto only turned the polish on. assert mock_cls.call_args.kwargs["pipeline"] == "controlnet" - # The auto plan's adaptive polish reached the engine call. assert mock_engine.remove_watermark.call_args.kwargs["adaptive_polish"] is True def test_batch_default_output_dir(self, runner, tmp_path): diff --git a/tests/test_platform.py b/tests/test_platform.py index c5d019f..ae69e91 100644 --- a/tests/test_platform.py +++ b/tests/test_platform.py @@ -21,7 +21,9 @@ from remove_ai_watermarks.noai.watermark_profiles import ( OPENAI_STRENGTH, UNKNOWN_STRENGTH, get_model_id_for_profile, + normalize_profile, resolve_strength, + strength_default_help, ) from remove_ai_watermarks.noai.watermark_remover import get_device, is_watermark_removal_available @@ -111,8 +113,14 @@ class TestMpsErrorDetection: class TestModelProfiles: """Tests for watermark_profiles.py.""" - def test_default_profile(self): + def test_sdxl_profile(self): + assert get_model_id_for_profile("sdxl") == "stabilityai/stable-diffusion-xl-base-1.0" + + def test_default_alias_resolves_to_sdxl(self): + # "default" is the legacy alias for "sdxl" (back-compat for existing scripts). 
assert get_model_id_for_profile("default") == "stabilityai/stable-diffusion-xl-base-1.0" + assert normalize_profile("default") == "sdxl" + assert normalize_profile("controlnet") == "controlnet" def test_controlnet_profile(self): # controlnet shares the SDXL base checkpoint (the ControlNet is an add-on). @@ -127,9 +135,9 @@ class TestResolveStrength: """resolve_strength applies the vendor default only when strength is unset.""" def test_none_is_vendor_adaptive(self): - # No vendor -> unknown default; OpenAI lower, Google == unknown. The default - # is vendor-adaptive and does NOT depend on the pipeline profile (default and - # controlnet share the same SDXL base). + # No vendor -> unknown default; OpenAI lower, Google == unknown. The SAME ladder + # applies to both pipelines (the certified controlnet floors), so there is no + # pipeline argument. assert resolve_strength(None) == UNKNOWN_STRENGTH assert resolve_strength(None, "openai") == OPENAI_STRENGTH assert resolve_strength(None, "google") == GEMINI_STRENGTH @@ -137,10 +145,25 @@ class TestResolveStrength: # An unrecognized vendor string falls through to the unknown default. assert resolve_strength(None, "adobe") == UNKNOWN_STRENGTH + def test_ladder_is_the_certified_controlnet_floors(self): + # The unified ladder == the oracle-certified controlnet floors (OpenAI 0.20, + # Google/unknown 0.30); Google is the more-robust watermark, so it is higher. + assert OPENAI_STRENGTH == 0.20 + assert GEMINI_STRENGTH == 0.30 + assert UNKNOWN_STRENGTH == 0.30 + assert OPENAI_STRENGTH < GEMINI_STRENGTH + def test_default_strength_alias_is_unknown_vendor_value(self): assert DEFAULT_STRENGTH == UNKNOWN_STRENGTH assert OPENAI_STRENGTH < UNKNOWN_STRENGTH + def test_strength_default_help_derives_from_constants(self): + # The CLI --strength help is built from this, so it can never drift from the ladder. 
+ h = strength_default_help() + assert str(OPENAI_STRENGTH) in h + assert str(GEMINI_STRENGTH) in h + assert str(UNKNOWN_STRENGTH) in h + def test_explicit_value_overrides_vendor(self): assert resolve_strength(0.3) == 0.3 assert resolve_strength(0.3, "openai") == 0.3
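
Reviewer note: the alias and strength behaviour this change introduces in `watermark_profiles.py` can be checked in isolation with the minimal sketch below. The constants and the `normalize_profile` body are copied from the hunks above. The final lookup in `resolve_strength` is not visible in the diff, so the `.get(...)` fallback here is an assumption, chosen to agree with the `test_none_is_vendor_adaptive` expectations. It is not the module's confirmed implementation.

```python
from __future__ import annotations

# Canonical profile names and the back-compat alias, as in the diff.
SDXL_PROFILE = "sdxl"
CONTROLNET_PROFILE = "controlnet"
_PROFILE_ALIASES = {"default": SDXL_PROFILE}

# Unified vendor-adaptive ladder, as in the diff.
OPENAI_STRENGTH = 0.20
GEMINI_STRENGTH = 0.30
UNKNOWN_STRENGTH = 0.30
_VENDOR_STRENGTH = {"openai": OPENAI_STRENGTH, "google": GEMINI_STRENGTH}


def normalize_profile(profile: str) -> str:
    """Canonicalize a profile name, resolving the legacy ``default`` alias."""
    normalized = profile.strip().lower()
    return _PROFILE_ALIASES.get(normalized, normalized)


def resolve_strength(strength: float | None, vendor: str | None = None) -> float:
    """Return the explicit strength, else the vendor-adaptive default."""
    # An explicit value always wins, including 0.0 (the check is ``is None``).
    if strength is not None:
        return strength
    # ASSUMPTION: this fallback is not shown in the diff; it is inferred from
    # the tests (unknown or unrecognized vendor -> UNKNOWN_STRENGTH).
    return _VENDOR_STRENGTH.get(vendor, UNKNOWN_STRENGTH)


if __name__ == "__main__":
    print(normalize_profile("default"))      # legacy alias
    print(resolve_strength(None, "openai"))  # vendor default
    print(resolve_strength(0.0, "google"))   # explicit value wins
```

Because the ladder no longer takes a pipeline argument, the same calls cover both `sdxl` and `controlnet`; only the vendor and an explicit `--strength` affect the result.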