diff --git a/.gitignore b/.gitignore
index a9af08a..0747245 100644
--- a/.gitignore
+++ b/.gitignore
@@ -50,3 +50,6 @@ data/samsung_capture/captures/samsung_content_*
 # (GFPGAN wrote RetinaFace/parsing weights to a CWD ./gfpgan/weights/ working
 # dir on first use). Runtime artifact, never committed.
 gfpgan/
+
+# Qwen ControlNet experiment outputs (throwaway eval; never the committed corpus)
+scripts/_qwen_exp_out/
diff --git a/CLAUDE.md b/CLAUDE.md
index 67b2d0c..18d2d2f 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -18,7 +18,7 @@ Consequences for contributors (do not drift back into the stock niche just becau
 ## How to run
 
 - `uv run remove-ai-watermarks all -o ` — full pipeline (visible + invisible + metadata). Same diffusion knobs as `invisible` below, plus the visible-pass `--inpaint/--no-inpaint`/`--inpaint-method`. **When the `[gpu]` extra is absent, step 2 (invisible/SynthID) is skipped** — `all` still writes an output (visible mark + metadata stripped) but prints a prominent end-of-run banner ("the invisible (SynthID) watermark was NOT removed") AND exits **non-zero** (1), so a skipped SynthID pass is not mistaken for a clean result (the recurring #14/#47 trap, where the old quiet inline warning was missed). `invisible` already hard-errors without the extra; only `all` continued, hence the loud end-banner. Regression-guarded by `tests/test_cli.py::TestAllCommand::test_all_loud_warning_and_nonzero_exit_when_gpu_missing`. **Test trap:** any `all` test that exercises the full pipeline MUST `patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True)` — CI installs core+dev only (no `[gpu]`), so an unpatched `all` test takes the skip branch and now hits the non-zero exit. This passed locally (gpu present → `is_available()` True) but red-failed every matrix cell on the v0.11.0 commit (`test_all_basic`/`test_all_visible_step_uses_registry` asserted exit 0); both now patch `is_available` True.
-- `uv run remove-ai-watermarks invisible -o ` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet|qwen` (default `controlnet`), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default). +- `uv run remove-ai-watermarks invisible -o ` — diffusion SynthID removal. 
**Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet|qwen` (default `controlnet`; `qwen` is a manual opt-in only — see the qwen note in the module map), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default). - `uv run remove-ai-watermarks visible -o ` — known-visible-mark removal, CPU, no GPU. Reverse-alpha based: each mark is removed by inverting its captured alpha map. `--mark auto` (default) picks the strongest detected of the Gemini sparkle, the Doubao "豆包AI生成" text strip, the Jimeng "★ 即梦AI" wordmark, and the Samsung Galaxy AI "✦ Contenuti generati dall'AI" strip (bottom-LEFT, locale-specific — Italian variant calibrated); `--mark gemini` / `--mark doubao` / `--mark jimeng` / `--mark samsung` force one (choices come from the registry). Gemini/Doubao recover pixels exactly with no inpaint at native; **Jimeng and Samsung add an always-on thin residual inpaint over the glyph footprint** (their marks re-rasterize per image, so reverse-alpha alone leaves a faint outline). For arbitrary logos/objects use `erase`. 
**When `--mark auto` finds no known mark (the common case — ~74% of real uploads carry no registered visible mark), the command does NOT silently re-serve the input as a finished result.** It runs a cheap metadata-only `identify`, prints actionable guidance (if the image carries an invisible/metadata mark, e.g. an OpenAI/Gemini C2PA image, it points to `all`; otherwise it does NOT imply the image is clean -- it warns that an invisible pixel watermark like SynthID cannot be detected once the metadata proxy is gone and routes to both `all` and `erase --region`), writes NO output file, and exits **`EXIT_NO_VISIBLE_MARK` (2)** — distinct from success (0) and a hard error (1) so a wrapping service (raiw.cc) can surface the message instead of treating the unchanged image as done (the production "it didn't work" / score-0 trap). Same handling for an explicit `--mark ` that is not detected. Helper `cli._no_visible_mark_exit`; regression-guarded by `tests/test_cli.py::TestVisibleCommand::test_visible_auto_no_mark_exits_two_with_eraser_hint` and `test_visible_auto_no_mark_routes_to_all_when_metadata`. `--no-detect` still forces the gemini fallback and proceeds (exit 0). - `uv run remove-ai-watermarks erase --region x,y,w,h -o ` — universal region eraser (any logo/object, any position). `--backend cv2` (default, no deps) or `--backend lama` (big-LaMa via onnxruntime, extra `lama`); `--region` is repeatable. - `uv run remove-ai-watermarks identify ` — provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector @@ -61,7 +61,7 @@ Compact map. The full per-module detail (design decisions, tuned thresholds, cal - `region_eraser.py` — universal region eraser (`erase` CLI): cv2 backend default (no deps), optional big-LaMa via onnxruntime (~3.5-4 GB peak RAM, ~5-6 s/call CPU — does not fit a minimal droplet). 
- `invisible_watermark.py` — decodes the OPEN DWT-DCT watermarks (SD / SDXL / FLUX) via `imwatermark` (extra `detect`, pulls torch). Fragile two ways: (1) does not survive JPEG re-encode/resize; (2) **carrier-fragile on a broad class of pristine images** -- a clean encode->decode round-trip recovers 48/48 on chatgpt/firefly/random but FAILS (28-39/48, below the `_MATCH_48`=44 gate) on the FLUX fox, doubao, a flat FLUX generation, AND a clean synthetic flat fill with no watermark. The failure does NOT track texture; it goes with a degenerate **all-ones decode that is a CARRIER ARTIFACT, not a watermark** (synthetic clean image reproduces it). So `detect_invisible_watermark` is **positive-only**: trust a hit; a `None` is inconclusive unless a same-carrier positive-control embed first recovers >=44. Verified 2026-06-19; full caveat in `docs/watermarking-landscape.md`. - `trustmark_detector.py` — Adobe TrustMark open decoder (extra `trustmark`). Do NOT remove the JPEG re-encode false-positive gate — a lone TrustMark hit without it is almost always content noise. -- `noai/watermark_remover.py` — `WatermarkRemover` with three diffusion pipelines selected by the explicit `pipeline` ctor arg, never inferred from `model_id`: `sdxl` (plain SDXL img2img), `controlnet` (SDXL + canny ControlNet, **the DEFAULT since 2026-06-09**), and `qwen` (Qwen-Image 20B MMDiT img2img, Apache-2.0, CUDA/cloud-class — best **text** preservation (incl. CJK); `_load_qwen_pipeline`/`_run_qwen`, bf16, no MPS fallback; call shape in the pure `_build_qwen_kwargs` using `true_cfg_scale`). Removal comes from the img2img `strength`; ControlNet only preserves text/face STRUCTURE — SynthID CAN survive controlnet on photoreal content at low strength. 
Qwen CERTIFIED oracle floors (2026-06-20): OpenAI **0.10** (seed-robust, clean on seeds 0-4), Gemini **0.25** (seed 0 verified, pin a seed — Gemini oracle rate-limits volume; higher than the controlnet Gemini floor 0.15, so pass explicit `--strength 0.25` for Gemini on `qwen` until a Qwen ladder is wired). Fidelity measured by `scripts/fidelity_metrics.py` (OCR-CER / ArcFace / LPIPS / Laplacian-var, NOT eyeball), compared ONLY at each pipeline's oracle-confirmed scrub floor (where SynthID is removed in BOTH — equal-strength is invalid where it leaves one un-scrubbed): Qwen wins TEXT (incl. CJK), controlnet wins FACES (Qwen smooths faces more) — Qwen is the text-preserving remover, not a universal fidelity win. No face-restore extra ships, by validated decision (every restore approach looked MORE AI-generated). `remove_watermark(region=(x,y,w,h), region_feather=...)` runs the regeneration but feather-composites only the AI box back over the original (via `noai/tiling.feather_region_composite`), preserving the real photo elsewhere — the **AI-enhanced composite** path (`identify` `ai_source_kind == "enhanced"`); the box is supplied by the caller (a C2PA composite manifest carries no reliable machine-readable region, so we do not fabricate one). +- `noai/watermark_remover.py` — `WatermarkRemover` with three diffusion pipelines selected by the explicit `pipeline` ctor arg, never inferred from `model_id`: `sdxl` (plain SDXL img2img), `controlnet` (SDXL + canny ControlNet, **the DEFAULT since 2026-06-09**), and `qwen` (Qwen-Image 20B MMDiT img2img, Apache-2.0, CUDA/cloud-class — best **text** preservation (incl. CJK); `_load_qwen_pipeline`/`_run_qwen`, bf16, no MPS fallback; call shape in the pure `_build_qwen_kwargs` using `true_cfg_scale`). Removal comes from the img2img `strength`; ControlNet only preserves text/face STRUCTURE — SynthID CAN survive controlnet on photoreal content at low strength. 
Qwen CERTIFIED oracle floors (2026-06-20): OpenAI **0.10** (seed-robust, clean on seeds 0-4), Gemini **0.25** (seed 0 verified, pin a seed — Gemini oracle rate-limits volume; higher than the controlnet Gemini floor 0.15). `resolve_strength(..., pipeline="qwen")` carries the Qwen ladder (`_QWEN_VENDOR_STRENGTH`), so `--pipeline qwen` gets the 0.25 Gemini floor automatically (the old manual `--strength 0.25` workaround is retired). `_build_qwen_kwargs` passes an explicit `height`/`width` from the input (floored to /16 via `_qwen_target_size`) — without it the pipeline defaults to a 1024x1024 SQUARE and silently squishes non-square inputs (fixed 2026-06-20). **`qwen` is a MANUAL opt-in only — there is NO auto-router.** Measured (`scripts/fidelity_metrics.py`, OCR-CER / ArcFace / LPIPS / Laplacian-var, NOT eyeball): qwen beats controlnet on ONE niche only — **clean body text on a plain background, no faces** (openai_1/2 CER 0.241 vs 0.385). controlnet wins FACES (it always has) AND **display/decorative text in a scene** (abba poster: controlnet CER 0.114 vs qwen 0.379 — canny holds letter shapes, qwen re-renders and garbles them). So a content `--pipeline auto` router and a faces+text **mixed dual-pass** were prototyped and **DROPPED** (2026-06-20): on the canonical faces+text case controlnet wins every metric incl. text, so mixed loses; and "text→qwen" can't be auto-decided (it is body-vs-display text that matters, undetectable cheaply). qwen stays for callers who KNOW their content is clean-text-heavy and face-free. No face-restore extra ships, by validated decision (every restore approach looked MORE AI-generated). 
`remove_watermark(region=(x,y,w,h), region_feather=...)` runs the regeneration but feather-composites only the AI box back over the original (via `noai/tiling.feather_region_composite`), preserving the real photo elsewhere — the **AI-enhanced composite** path (`identify` `ai_source_kind == "enhanced"`); the box is supplied by the caller (a C2PA composite manifest carries no reliable machine-readable region, so we do not fabricate one). - `noai/tiling.py` — sliding-window tiled diffusion for large inputs (CLI `--tile`). `WatermarkRemover.remove_watermark` branches to `run_tiled` when `tile` is set AND the long side exceeds `tile_size`, refactoring the single-pass `_generate` into a per-tile `_generate_one` (the ControlNet edge map is rebuilt per tile inside it). Pure helpers `plan_tiles` (uniform-size tiles, last one flush to the edge) and `feather_weights` (strictly-positive separable taper -> partition-of-unity blend) are unit-tested without the model. Also home to `feather_region_composite(base, regenerated, box, *, feather)` — the pure region-targeted compositor for **AI-enhanced composites** (`ai_source_kind == "enhanced"`): blends the regenerated AI box back over the original with a feathered seam, leaving the real photo OUTSIDE the box pixel-exact. It backs `WatermarkRemover.remove_watermark(region=...)` (regenerate ONLY the AI region, not the whole frame); the no-model lossless region path stays `region_eraser.erase`. New tile/region-blend tuning goes in these pure helpers; do not inline blend math into the runner. - `auto_config.py` + the content-detection layer were REMOVED 2026-06-09; `--auto` is a deprecated no-op (controlnet is the default pipeline and the adaptive polish is ON by default and self-gates to a no-op where there is no detail deficit). - `upscaler.py` — optional Real-ESRGAN pre-diffusion super-resolution for small inputs (extra `esrgan`, spandrel only). 
Manual opt-in; the default `--upscaler` stays `lanczos` and the engine always falls back to Lanczos on absence/error. ESRGAN can degrade faces and thin text. diff --git a/docs/known-limitations.md b/docs/known-limitations.md index 095e958..3843178 100644 --- a/docs/known-limitations.md +++ b/docs/known-limitations.md @@ -144,8 +144,8 @@ The scrub still comes from the img2img `strength` (same lever as SDXL); the call - **Text:** Qwen wins on substantial Latin/mixed-script text -- OCR CER, controlnet vs Qwen: openai_1 (EN+RU+ZH, both 0.10) 0.385 vs **0.241**, openai_2 (EN, both 0.10) 0.341 vs **0.290**. On a SHORT CJK sign (gemini_1, cnet 0.15 / Qwen 0.25) it is a TIE (0.037 vs 0.037 -- both near-perfect; the earlier Qwen 0.000 was at the higher 0.30, not the certified floor). - **Faces:** controlnet wins -- gemini_3, 18 faces (cnet 0.15 / Qwen 0.25): ArcFace identity 0.546 vs 0.382, Laplacian-variance retention 0.62 vs 0.40, face LPIPS 0.09 vs 0.17 (Qwen smooths faces MORE; the gap narrows vs Qwen 0.30 but controlnet still wins clearly). -**Conclusion: Qwen is the better TEXT-preserving remover (substantial Latin/mixed text), NOT a universal fidelity win — controlnet's canny edge map holds face skin detail better, so the path is a content-routed lane (text→qwen, faces→controlnet), not a blanket migration.** Caveat: `resolve_strength` is shared and pipeline-independent, so the Gemini default (0.15) UNDER-scrubs Gemini on `qwen` (floor 0.25) — pass `--strength 0.25` for Gemini on `qwen` until a Qwen ladder is wired. Flat-graphic content was not in the sample. +**Conclusion: Qwen wins TEXT only for clean body text on a plain background with NO faces; controlnet wins faces AND display/decorative text in a scene. 
So `qwen` is a MANUAL `--pipeline qwen` opt-in, not a routed lane.** A content `--pipeline auto` router + a faces+text mixed dual-pass were prototyped and DROPPED (2026-06-20): on the canonical faces+text case (the abba poster, faces + display text) controlnet won EVERY metric incl. text (CER 0.114 vs qwen 0.379), so grafting qwen text only hurts; and "text→qwen" is undecidable cheaply (body-vs-display text is what matters). Caveat: `resolve_strength(..., pipeline="qwen")` carries the Qwen ladder (`_QWEN_VENDOR_STRENGTH`, Gemini 0.25), so `--pipeline qwen` gets the 0.25 Gemini floor automatically — the old manual `--strength 0.25` workaround is retired. `_build_qwen_kwargs` now passes an explicit height/width (qwen squished non-square inputs to 1024² without it). Flat-graphic content was not in the sample. -**Improving Qwen (ship vs improve):** the cited research on fixing the face-smoothing while keeping the text win (Qwen-Image ControlNet for structure conditioning, Qwen-Image-Edit, Z-Image-Turbo as a cheaper text-preserving substitute, non-regenerative detail restoration) lives in `docs/qwen-improvement-research.md` -- read it before extending the `qwen` pipeline. Verdict: shippable now as an opt-in text lane; the strongest improvement lead is adding a Qwen-Image ControlNet, but no improvement has measured face-fidelity at our floors yet (validate with `scripts/fidelity_metrics.py` first). +**Improving Qwen (ship vs improve):** the cited research lives in `docs/qwen-improvement-research.md` -- read it before extending the `qwen` pipeline. Verdict: shippable as an opt-in text lane. 
**The "add a Qwen-Image ControlNet to fix face smoothing" lead was built, measured, and CLOSED (2026-06-20):** a DiffSynth-Studio Qwen + Apache-2.0 blockwise-canny ControlNet at the Gemini floor 0.25 did NOT restore face skin texture (face Laplacian-variance retention flat 0.40 -> 0.40, 13/16 faces within +-0.02; the SDXL+canny target 0.62 was not approached), because canny carries edges not skin grain and Qwen's higher Gemini floor (0.25 vs SDXL+canny 0.15) forces more smoothing -- and a deep-research sweep confirmed NO permissively-licensed Qwen tile/detail/realism/skin ControlNet exists anywhere (every Qwen conditioning is geometry). So **faces stay on SDXL+controlnet; Qwen is the text lane, not a face fix.** The strongest remaining lead is **Z-Image-Turbo** (6B, Apache-2.0, `ZImageImg2ImgPipeline`, scrub mechanism preserved) -- its own SynthID floor and face/text fidelity are UNMEASURED; that is the next experiment. Non-regenerative high-frequency detail re-injection is NOT safe by assumption (the "clean-output high frequencies do not carry the watermark" claim was refuted) -- it must be oracle-gated. Always validate any improvement at the certified floors with `scripts/fidelity_metrics.py` first. **Seed as a quality lever (measured, openai_1 at 0.10, seeds 0-4):** the seed barely moves whole-image fidelity (img LPIPS 0.062-0.065, SSIM 0.855-0.857, PSNR 28.5-28.7 — flat) but does shift TEXT legibility (OCR CER 0.241-0.290, ~17% spread) -- the seed changes WHICH details get regenerated, not the overall level. So a per-image best-of-N-seed selection is a WEAK, text-only lever (pick the lowest-CER seed that still scrubs; fidelity selection needs no oracle). Not worth the N× cost for general use -- pin one decent seed in prod; reserve best-of-N for text-heavy premium cases. 
diff --git a/docs/module-internals.md b/docs/module-internals.md index d1339c5..0d930cc 100644 --- a/docs/module-internals.md +++ b/docs/module-internals.md @@ -185,7 +185,7 @@ Root cause: bad alpha (under-estimated, max ~0.65) + fixed-no-inpaint + tight bo **`sdxl`** (renamed from `default` 2026-06-09; `default` kept as a back-compat alias via `normalize_profile`) runs plain SDXL img2img (`_run_img2img`); it is the lighter opt-down alternative (no ControlNet weights). -**`qwen`** (`_run_qwen`, `_load_qwen_pipeline`) runs `QwenImageImg2ImgPipeline` on `Qwen/Qwen-Image` (20B MMDiT, Apache-2.0 code AND weights). The scrub still comes from the img2img `strength`; Qwen's value is **text preservation** (incl. CJK and small text). **Metric-measured nuance (2026-06-19, `scripts/fidelity_metrics.py`, do NOT trust the eyeball here — it misled). Compare ONLY at each pipeline's oracle-confirmed scrub floor (outputs where SynthID is removed in BOTH — an equal-strength compare is invalid where it leaves one un-scrubbed; Qwen at 0.15 does not clear Gemini): Qwen wins TEXT (lower OCR CER across EN/RU/ZH, perfect Chinese) but controlnet wins FACES (higher Laplacian-variance retention and lower LPIPS — Qwen smooths faces MORE; ArcFace identity favors controlnet 0.546 vs 0.331 at the Gemini floors).** So Qwen is the better text-preserving remover, NOT a universal fidelity win — controlnet's canny edge map holds face skin detail better. Specifics: bf16 on CUDA (fp16 risks overflow on the 20B MMDiT — see the dtype branch in `__init__`); loads `QWEN_MODEL_ID` unless `--model` is overridden; the call shape lives in the pure module helper `_build_qwen_kwargs` (unit-tested without torch in `tests/test_platform.py::TestQwenKwargs`), which uses Qwen's `true_cfg_scale` (NOT SDXL's `guidance_scale` — the CLI `--guidance-scale` maps onto it; ~4.0 is typical, the SDXL default 7.5 is high for Qwen) and an explicit `negative_prompt` (`_QWEN_PROMPT`/`_QWEN_NEGATIVE`). 
It is CUDA/cloud-class (the 20B does not fit MPS), so `_run_qwen` has NO MPS->CPU fallback — an error propagates. `_load_qwen_pipeline` raises a clear ImportError if the installed diffusers lacks `QwenImageImg2ImgPipeline`. **CERTIFIED oracle floors (Modal A100-80GB, 2026-06-20): OpenAI 0.10 (seed-robust — clean on seeds 0-4), Gemini 0.25 (seed 0 verified on 2 images; the Gemini oracle rate-limits volume seed-repeat, so PIN a seed in prod). The Gemini floor (0.25) is HIGHER than the certified controlnet Gemini floor (0.15), and `resolve_strength` is shared/pipeline-independent, so pass an explicit `--strength 0.25` for Gemini content on `qwen` until a Qwen-specific ladder is wired into `resolve_strength`.** Fidelity vs controlnet was measured at the certified floors (`scripts/fidelity_metrics.py`), NOT eyeballed — see the metric nuance above: Qwen wins substantial text, controlnet wins faces. +**`qwen`** (`_run_qwen`, `_load_qwen_pipeline`) runs `QwenImageImg2ImgPipeline` on `Qwen/Qwen-Image` (20B MMDiT, Apache-2.0 code AND weights). The scrub still comes from the img2img `strength`; Qwen's value is **text preservation** (incl. CJK and small text). **Metric-measured nuance (2026-06-19, `scripts/fidelity_metrics.py`, do NOT trust the eyeball here — it misled). Compare ONLY at each pipeline's oracle-confirmed scrub floor (outputs where SynthID is removed in BOTH — an equal-strength compare is invalid where it leaves one un-scrubbed; Qwen at 0.15 does not clear Gemini): Qwen wins TEXT (lower OCR CER across EN/RU/ZH, perfect Chinese) but controlnet wins FACES (higher Laplacian-variance retention and lower LPIPS — Qwen smooths faces MORE; ArcFace identity favors controlnet 0.546 vs 0.331 at the Gemini floors).** So Qwen is the better text-preserving remover, NOT a universal fidelity win — controlnet's canny edge map holds face skin detail better. 
Specifics: bf16 on CUDA (fp16 risks overflow on the 20B MMDiT — see the dtype branch in `__init__`); loads `QWEN_MODEL_ID` unless `--model` is overridden; the call shape lives in the pure module helper `_build_qwen_kwargs` (unit-tested without torch in `tests/test_platform.py::TestQwenKwargs`), which uses Qwen's `true_cfg_scale` (NOT SDXL's `guidance_scale` — the CLI `--guidance-scale` maps onto it; ~4.0 is typical, the SDXL default 7.5 is high for Qwen) and an explicit `negative_prompt` (`_QWEN_PROMPT`/`_QWEN_NEGATIVE`). It is CUDA/cloud-class (the 20B does not fit MPS), so `_run_qwen` has NO MPS->CPU fallback — an error propagates. `_load_qwen_pipeline` raises a clear ImportError if the installed diffusers lacks `QwenImageImg2ImgPipeline`. **CERTIFIED oracle floors (Modal A100-80GB, 2026-06-20): OpenAI 0.10 (seed-robust — clean on seeds 0-4), Gemini 0.25 (seed 0 verified on 2 images; the Gemini oracle rate-limits volume seed-repeat, so PIN a seed in prod). The Gemini floor (0.25) is HIGHER than the certified controlnet Gemini floor (0.15); `resolve_strength(..., pipeline="qwen")` carries the Qwen ladder (`_QWEN_VENDOR_STRENGTH`), so `--pipeline qwen` gets the 0.25 Gemini floor automatically -- the old manual `--strength 0.25` workaround is retired. `_build_qwen_kwargs` passes an explicit `height`/`width` from the input (floored to /16 via the pure `_qwen_target_size`); WITHOUT it the img2img pipeline defaults to a 1024x1024 SQUARE and silently squishes non-square inputs (the abba 2816x1536 case came back 1024x1024, distorting the scene and garbling text — fixed 2026-06-20, tested in `TestQwenKwargs`).** Fidelity vs controlnet was measured at the certified floors (`scripts/fidelity_metrics.py`), NOT eyeballed. **`qwen` is a MANUAL opt-in only — there is NO auto-router (one was prototyped and DROPPED, see below).** It wins ONE niche: clean body text on a plain background, NO faces (openai_1/2 CER 0.241 vs 0.385). 
controlnet wins FACES and **display/decorative text in a scene** (abba poster: controlnet CER 0.114 vs qwen 0.379 — canny holds letter shapes; qwen re-renders and garbles them). **`--pipeline auto` + a faces+text mixed dual-pass were built and DROPPED (2026-06-20):** on the canonical faces+text case (abba) controlnet wins EVERY metric incl. text, so grafting qwen text would only hurt; and "text→qwen" is undecidable cheaply (it is body-vs-display text that matters). The router/detector/mixed modules were removed; the geometry fix + the Qwen strength ladder were kept (they make the manual `--pipeline qwen` correct). **Do NOT retry "add a Qwen ControlNet to close the face gap" — it was built, measured, and CLOSED 2026-06-20:** a DiffSynth blockwise-canny Qwen ControlNet did not restore face skin texture (lapvar flat 0.40, canny carries edges not skin grain) and no permissively-licensed Qwen tile/detail/skin ControlNet exists anywhere (all conditioning is geometry). Faces stay on controlnet; the next improvement lead is Z-Image-Turbo (Apache-2.0, unmeasured floor). Full record + the deep-research sweep in `docs/qwen-improvement-research.md`. **`controlnet`** (**the DEFAULT pipeline since 2026-06-09** for `invisible`/`all`/`batch` and both engine ctors; `_run_controlnet`, `_load_controlnet_pipeline`) runs `StableDiffusionXLControlNetImg2ImgPipeline` with the SDXL-native canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` (`watermark_profiles.CONTROLNET_CANNY_MODEL`): the control image is `cv2.Canny(gray, 100, 200)` stacked to 3 channels (`_CANNY_LOW`/`_CANNY_HIGH`, prompt `_CONTROLNET_PROMPT` / `_CONTROLNET_NEGATIVE`). 
@@ -213,6 +213,15 @@ History: `auto_config.plan()` was a content-adaptive planner that detected faces **`--auto` is now a DEPRECATED no-op** (`cli._resolve_auto_polish`): controlnet is already the default pipeline AND the adaptive polish is ON by default, so `--auto` has nothing left to do — it only prints a deprecation warning and passes `adaptive_polish` through unchanged (an explicit `--no-adaptive-polish` still wins). (Originally it re-enabled the polish; once the polish default flipped to ON the same day, the parameter-source branch became dead and was dropped.) The **adaptive polish itself lives on** in `humanizer.adaptive_polish` (CLI `--adaptive-polish/--no-adaptive-polish`, **ON by default since 2026-06-09** — it self-gates to a no-op where there is no detail deficit, so default-on is safe; uses the full-res original as the detail reference) — see the `humanizer` test note. `batch` resolves the polish once before the loop (one warning) and caches the invisible engine per pipeline (`ctx.obj["_inv_engines"]`). +## Content `--pipeline auto` router + faces+text mixed dual-pass — PROTOTYPED and DROPPED (2026-06-20) + +A `--pipeline auto` content router (`pipeline_router.py` + `content_detect.py`: Haar faces + MSER text → route text→qwen / faces→controlnet / both→mixed) and a faces+text **mixed dual-pass** (`mixed_pipeline.py`: scrub the whole frame on BOTH pipelines, then graft the qwen text regions onto the controlnet base via `tiling.feather_region_composite`) were built, run on Modal (the abba poster: faces + display text), measured, and **removed**. Why it failed: +- On the canonical faces+text image **controlnet wins EVERY metric, including text** (CER 0.114 vs qwen 0.379; ID 0.64 vs 0.36; lapvar 0.71 vs 0.59) — canny holds the existing letter shapes, qwen re-renders display/decorative text and garbles it. So grafting qwen text onto the controlnet base only HURTS. 
+- qwen beats controlnet on text ONLY for clean body text on a plain background with no faces (openai_1/2) — a niche where there are no faces to route around anyway, so `--pipeline qwen` alone covers it. The faces+clean-body-text intersection is near-empty. +- "text→qwen" is not cheaply decidable: it is body-vs-display text that matters, which face/text detectors can't tell apart. MSER also over-fired (47% of the busy poster, incl. faces). + +KEPT from that work (independently valid for the manual `--pipeline qwen`): the qwen **geometry fix** (`_qwen_target_size` + `_build_qwen_kwargs` height/width — qwen squished non-square inputs to 1024² without it) and the **pipeline-aware `resolve_strength`** Qwen ladder (Gemini 0.25). Also kept: the `fidelity_metrics.py` one-to-one face matcher. The throwaway Modal eval scripts were removed after the run (findings recorded here and in `docs/qwen-improvement-research.md`). + ## `upscaler.py` `upscaler.py` — optional Real-ESRGAN pre-diffusion super-resolution for small inputs (spandrel boundary, top-of-file pyright pragma). `is_available()` gates on spandrel+torch (via `importlib.util.find_spec`); `upscale(bgr, device=None)` loads a lazily-built spandrel `ImageModelDescriptor` singleton (double-checked lock) and upscales by the model's native factor (x2), with a non-CPU→CPU device fallback mirroring the diffusion engine's MPS→CPU retry. Weights (`RealESRGAN_x2plus.pth`, BSD-3-Clause) download on first use to the `torch.hub` checkpoints cache; never bundled. Used only when UPscaling to the `min_resolution` floor (a `max_resolution` downscale always uses Lanczos). The wiring is `InvisibleEngine._esrgan_upscale(pil, target)` — Real-ESRGAN at native factor, then a Lanczos resize to the exact target, falling back to a plain Lanczos resize if the extra is absent or the model errors (so an optional upscaler can never break removal). The default `--upscaler` is `lanczos` (cv2, no deps). 
diff --git a/docs/qwen-improvement-research.md b/docs/qwen-improvement-research.md
index b75688e..868c325 100644
--- a/docs/qwen-improvement-research.md
+++ b/docs/qwen-improvement-research.md
@@ -29,6 +29,73 @@ the 20B cost. None of the improvements has measured face-fidelity numbers at our
 scrub floors yet, so each must be validated with `scripts/fidelity_metrics.py` plus the oracle
 before shipping.
 
+## Follow-up: ControlNet experiment + deeper research (2026-06-20)
+
+The verdict's strongest lead -- adding a Qwen-Image ControlNet -- was **built, measured, and
+CLOSED**.
+
+**Experiment** (Modal A100-80GB; DiffSynth-Studio `QwenImagePipeline` + the Apache-2.0
+`DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny` -- the only framework exposing
+Qwen-Image + canny ControlNet + img2img `denoising_strength` in ONE call; diffusers ships no
+`QwenImageControlNetImg2ImgPipeline`, its three Qwen ControlNet pipelines are txt2img only).
+Measured on `gemini_3` (18 faces) at the Gemini scrub floor 0.25 vs base-Qwen 0.25 with
+`scripts/fidelity_metrics.py`:
+- **The actual failure mode (face skin texture) was NOT restored:** Laplacian-variance
+  retention stayed flat (base 0.40 -> qwen+canny 0.40; per-face 13/16 within +-0.02 after a
+  one-to-one face match, sd 0.016 -- not an averaging artifact). The SDXL+canny target 0.62
+  was not approached.
+- Identity rose modestly and broadly (ArcFace 0.346 -> 0.415, 12/16 faces improved) but the
+  absolute stays ~0.42 ("a different person, slightly closer").
+- Mechanism (verified, not inferred): canny conditioning was applied fully (scale 1.0, full
+  denoise schedule); the canny edge map is clean facial geometry with BLANK skin (4.83% edge
+  density) -- canny carries edges, not skin grain. Root cause: Qwen's Gemini floor (0.25) is
+  higher than SDXL+canny's (0.15), forcing more denoising -> more smoothing; structure
+  conditioning cannot compensate for that.
+ +**Deeper research** (deep-research harness, 103 agents, 3-vote adversarial): +- **[high, unanimous] No permissively-licensed Qwen-Image tile / detail / realism / skin + ControlNet exists anywhere** -- DiffSynth first-party is Canny/Depth/Inpaint only, InstantX + Union is canny/soft-edge/depth/pose, the official QwenLM repo ships none. Every Qwen + conditioning is GEOMETRY, the same class as the tested canny. **The "add a Qwen ControlNet to + fix faces" lead is closed for good.** +- **[high, unanimous] Z-Image / Z-Image-Turbo (6B, Apache-2.0 on code AND weights, ~1/3 of + Qwen 20B)** ships a documented `ZImageImg2ImgPipeline` with standard strength denoising, so + it preserves the scrub mechanism. Its own SynthID scrub floor and face/text fidelity are + UNMEASURED -- this is the strongest concrete NEXT experiment. +- **[medium] Lowering Qwen's scrub floor has no off-the-shelf SynthID answer:** the "partial + img2img ~0.3 breaks robust watermarks" literature tests open schemes + (StegaStamp/TrustMark/VINE), NEVER SynthID (proprietary decoder) -- analogy, not proof. No + minimal-strength SynthID attack under a named permissive license was found. +- **REFUTED [0-3]:** "re-injecting high-frequency detail from a clean diffusion output would + not carry the watermark back." So non-regenerative detail transfer is NOT safe by + assumption -- the transferred high-frequency band must be gated against the SynthID oracle. + +**Net for the pipeline:** **faces stay on SDXL+controlnet**; there is no Qwen face-fix. +The live frontier is Z-Image-Turbo (next experiment) and oracle-gated non-regenerative detail +re-injection. 
+ +**Follow-up (2026-06-20) — the content-routed lane / mixed dual-pass was tested and DROPPED.** +A `--pipeline auto` router (Haar+MSER → text→qwen / faces→controlnet / both→mixed) and a +faces+text mixed dual-pass (scrub the whole frame on both, graft qwen text regions onto the +controlnet base) were built and run on Modal (the abba poster: faces + display text). On that +canonical faces+text case **controlnet won EVERY metric, including text** (CER 0.114 vs qwen +0.379; ID 0.64 vs 0.36) — canny holds existing letter shapes, qwen re-renders display text and +garbles it, so grafting qwen text only hurts. Qwen beats controlnet on text ONLY for clean body +text on a plain background with no faces (openai_1/2), a niche `--pipeline qwen` alone covers; +the faces+clean-body-text intersection is near-empty, and "text→qwen" is undecidable cheaply +(body-vs-display text is what matters). So the router + mixed modules were removed and **`qwen` +is a manual `--pipeline qwen` opt-in only.** KEPT (independently valid): the qwen geometry fix +(it squished non-square inputs to 1024²), the pipeline-aware `resolve_strength` Qwen ladder, and +the `fidelity_metrics.py` one-to-one face matcher below. + +**Tooling fix surfaced by this run:** `scripts/fidelity_metrics.py` face matching was changed +from per-face nearest-center to a collision-free one-to-one assignment +(`assign_faces_one_to_one`, gated by face size), after the 18-face `gemini_3` exposed +collisions (the regenerated variants detected 17 faces, so two originals mapped to the same +variant face, corrupting the identity metric). lapvar/LPIPS were always anchored to the +original bbox and stayed collision-immune. Regression-guarded by +`tests/test_fidelity_matching.py`. + ## Findings 1. 
**[high, 3-0] A permissively-licensed Qwen-Image ControlNet exists today and is diff --git a/scripts/fidelity_metrics.py b/scripts/fidelity_metrics.py index 932cb03..5578010 100644 --- a/scripts/fidelity_metrics.py +++ b/scripts/fidelity_metrics.py @@ -186,16 +186,50 @@ def _lap_var(bgr: np.ndarray) -> float: return float(cv2.Laplacian(gray, cv2.CV_64F).var()) -def _match_face(orig_face: Any, variant_faces: list[Any]) -> Any: - """Nearest variant face to an original face by bbox-center distance (geometry kept).""" - ox, oy = (orig_face.bbox[0] + orig_face.bbox[2]) / 2, (orig_face.bbox[1] + orig_face.bbox[3]) / 2 - best, best_d = None, 1e18 - for vf in variant_faces: - vx, vy = (vf.bbox[0] + vf.bbox[2]) / 2, (vf.bbox[1] + vf.bbox[3]) / 2 - d = (ox - vx) ** 2 + (oy - vy) ** 2 - if d < best_d: - best, best_d = vf, d - return best +def _bbox_center(bbox: Any) -> tuple[float, float]: + return (bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2 + + +def _bbox_diag(bbox: Any) -> float: + return float(((bbox[2] - bbox[0]) ** 2 + (bbox[3] - bbox[1]) ** 2) ** 0.5) + + +def assign_faces_one_to_one( + ref_centers: list[tuple[float, float]], + var_centers: list[tuple[float, float]], + ref_diags: list[float], + max_frac: float = 0.6, +) -> dict[int, int]: + """One-to-one nearest-center face assignment (pure; unit-tested without insightface). + + Per-face nearest matching collides on multi-face images -- two original faces can both + pick the SAME variant face (e.g. when regeneration drops a face, so the variant has fewer + detections), corrupting the identity metric (the lapvar/LPIPS metrics are immune: they are + anchored to the ORIGINAL bbox on both images). This greedy-by-distance assignment is + collision-free: it walks candidate pairs nearest-first and never reuses a ref or a variant + face. Faces are spatially well-separated, so greedy equals the optimal (Hungarian) result + here without the scipy dependency. 
A pair is dropped when the center distance exceeds + ``max_frac`` of the original face diagonal (no plausible match -- the face was lost). + + Returns a dict mapping ref-face index -> variant-face index for matched faces only. + """ + pairs: list[tuple[float, int, int]] = [] + for i, (rx, ry) in enumerate(ref_centers): + for j, (vx, vy) in enumerate(var_centers): + pairs.append((((rx - vx) ** 2 + (ry - vy) ** 2) ** 0.5, i, j)) + pairs.sort() + used_ref: set[int] = set() + used_var: set[int] = set() + matched: dict[int, int] = {} + for dist, i, j in pairs: + if i in used_ref or j in used_var: + continue + if dist > max_frac * ref_diags[i]: + continue + matched[i] = j + used_ref.add(i) + used_var.add(j) + return matched def _cosine(a: np.ndarray, b: np.ndarray) -> float: @@ -325,15 +359,19 @@ def compare(original: str, variants: tuple[str, ...], ocr_langs: str, ground_tru app.prepare(ctx_id=-1, det_size=(640, 640)) ref_faces = app.get(ref) if ref_faces: + ref_centers = [_bbox_center(of.bbox) for of in ref_faces] + ref_diags = [_bbox_diag(of.bbox) for of in ref_faces] for label, img in parsed: vfaces = app.get(img) st = face_stats[label] - for of in ref_faces: - vf = _match_face(of, vfaces) - if vf is None: - continue + # One-to-one assignment for identity (collision-free); lapvar/LPIPS stay + # anchored to the original bbox below, so they need no match. 
+ matched = assign_faces_one_to_one(ref_centers, [_bbox_center(vf.bbox) for vf in vfaces], ref_diags) + for oi, of in enumerate(ref_faces): st.n_faces += 1 - st.identity.append(_cosine(of.normed_embedding, vf.normed_embedding)) + vf = vfaces[matched[oi]] if oi in matched else None + if vf is not None: + st.identity.append(_cosine(of.normed_embedding, vf.normed_embedding)) oc, vc = _crop(ref, of.bbox), _crop(img, of.bbox) if oc.size == 0 or vc.size == 0: continue diff --git a/src/remove_ai_watermarks/cli.py b/src/remove_ai_watermarks/cli.py index 40918d4..e410685 100644 --- a/src/remove_ai_watermarks/cli.py +++ b/src/remove_ai_watermarks/cli.py @@ -762,7 +762,7 @@ def cmd_invisible( vendor = vendor_for_strength(source) console.print(f" Input: {source.name}") console.print(f" Pipeline: {pipeline}") - console.print(f" Strength: {resolve_strength(strength, vendor)} Steps: {steps}") + console.print(f" Strength: {resolve_strength(strength, vendor, pipeline)} Steps: {steps}") t0 = time.monotonic() result_path = engine.remove_watermark( @@ -1075,7 +1075,7 @@ def cmd_all( # already lost its C2PA to the visible-removal pass, so reading it would # always resolve to the unknown-vendor default. 
vendor = vendor_for_strength(source) - console.print(f" Strength: {resolve_strength(strength, vendor)} Steps: {steps}") + console.print(f" Strength: {resolve_strength(strength, vendor, pipeline)} Steps: {steps}") inv_engine.remove_watermark( image_path=tmp_path, output_path=tmp_path, diff --git a/src/remove_ai_watermarks/noai/watermark_profiles.py b/src/remove_ai_watermarks/noai/watermark_profiles.py index 14cc874..3f8bcc8 100644 --- a/src/remove_ai_watermarks/noai/watermark_profiles.py +++ b/src/remove_ai_watermarks/noai/watermark_profiles.py @@ -18,9 +18,10 @@ DEFAULT_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0" # oracle floors (2026-06-20): OpenAI **0.10** (seed-robust -- clean on seeds 0-4) and # Google/Gemini **0.25** (seed 0 verified on 2 images; pin a seed in prod, the Gemini # oracle rate-limits volume seed-repeat). The Gemini floor (0.25) is HIGHER than the -# certified controlnet Gemini floor (0.15), and ``resolve_strength`` is shared/ -# pipeline-independent, so pass an explicit ``--strength 0.25`` for Gemini content on -# this pipeline until a Qwen-specific ladder is wired into ``resolve_strength``. +# certified controlnet Gemini floor (0.15); ``resolve_strength(..., pipeline="qwen")`` +# now carries this via ``_QWEN_VENDOR_STRENGTH`` (below), so ``--pipeline qwen`` gets the +# right floor automatically -- the old manual "pass --strength 0.25 for Gemini on qwen" +# workaround is retired. # (Dispatch uses the bare "qwen" literal, matching the sdxl/controlnet sites, so there # is no QWEN_PROFILE constant -- only the model id is referenced from code.) QWEN_MODEL_ID = "Qwen/Qwen-Image" @@ -90,6 +91,18 @@ DEFAULT_STRENGTH = UNKNOWN_STRENGTH # Detected-vendor -> default strength. Vendor strings come from `vendor_for_strength`. 
_VENDOR_STRENGTH = {"openai": OPENAI_STRENGTH, "google": GEMINI_STRENGTH} +# Qwen has its OWN certified floors (Modal A100-80GB, 2026-06-20), DIFFERENT from the +# SDXL ladder above: OpenAI 0.10 (seed-robust), Gemini 0.25 (HIGHER than controlnet's +# 0.15 -- the 20B MMDiT perturbs less per denoising step, so it needs more strength to +# clear Gemini SynthID). Unknown vendor tracks the higher (Gemini) value, safe-by-default. +# `resolve_strength(..., pipeline="qwen")` uses this table so `--pipeline qwen` carries the +# right floor automatically -- retiring the old manual "pass --strength 0.25 for Gemini on +# qwen" workaround. +QWEN_OPENAI_STRENGTH = 0.10 +QWEN_GEMINI_STRENGTH = 0.25 +QWEN_UNKNOWN_STRENGTH = 0.25 +_QWEN_VENDOR_STRENGTH = {"openai": QWEN_OPENAI_STRENGTH, "google": QWEN_GEMINI_STRENGTH} + def strength_default_help() -> str: """One-line description of the vendor-adaptive default, derived from the constants. @@ -103,20 +116,24 @@ def strength_default_help() -> str: ) -def resolve_strength(strength: float | None, vendor: str | None = None) -> float: +def resolve_strength(strength: float | None, vendor: str | None = None, pipeline: str | None = None) -> float: """Resolve the denoising strength, applying the vendor default when unset. ``None`` means "the user did not pass ``--strength``", which resolves **vendor-adaptively**: ``vendor`` (``"openai"`` / ``"google"`` / None, from - ``vendor_for_strength``) selects ``OPENAI_STRENGTH`` / ``GEMINI_STRENGTH`` / - ``UNKNOWN_STRENGTH``. The same ladder applies to both pipelines (see the module - comment for why one ladder is correct). An explicit value always wins (including + ``vendor_for_strength``) selects the per-vendor floor. 
The ``sdxl`` and ``controlnet`` + pipelines share ONE ladder (``OPENAI_STRENGTH`` / ``GEMINI_STRENGTH`` / + ``UNKNOWN_STRENGTH`` -- see the module comment for why); ``qwen`` has its OWN higher + ladder (``_QWEN_VENDOR_STRENGTH``, Gemini 0.25 vs controlnet 0.15), selected when + ``pipeline`` normalizes to ``"qwen"``. An explicit value always wins (including ``0.0`` -- the check is ``is None``, not falsiness). Shared by the CLI (for display) and the engine (for execution) so the two never disagree -- both must pass the SAME - ``vendor``. + ``vendor`` and ``pipeline``. """ if strength is not None: return strength + if pipeline is not None and normalize_profile(pipeline) == "qwen": + return _QWEN_VENDOR_STRENGTH.get(vendor or "", QWEN_UNKNOWN_STRENGTH) return _VENDOR_STRENGTH.get(vendor or "", UNKNOWN_STRENGTH) diff --git a/src/remove_ai_watermarks/noai/watermark_remover.py b/src/remove_ai_watermarks/noai/watermark_remover.py index fa38b9f..ad5a493 100644 --- a/src/remove_ai_watermarks/noai/watermark_remover.py +++ b/src/remove_ai_watermarks/noai/watermark_remover.py @@ -322,6 +322,15 @@ _QWEN_PROMPT = "high quality, sharp, detailed, faithful to the original" _QWEN_NEGATIVE = "blurry, lowres, distorted text, garbled text, artifacts" +def _qwen_target_size(width: int, height: int) -> tuple[int, int]: + """Floor (width, height) to a multiple of 16 for Qwen's VAE/patchifier (>= 16). + + Pure; unit-tested. Without explicit dims the img2img pipeline defaults to a 1024x1024 + SQUARE and silently distorts any non-square input. + """ + return max(16, (width // 16) * 16), max(16, (height // 16) * 16) + + def _build_qwen_kwargs( image: Image.Image, strength: float, num_inference_steps: int, true_cfg_scale: float, generator: Any ) -> dict[str, Any]: @@ -329,7 +338,12 @@ def _build_qwen_kwargs( Qwen-Image uses ``true_cfg_scale`` (not SDXL's ``guidance_scale``) and takes an explicit ``negative_prompt``; the scrub still comes from the img2img ``strength``. 
+ Passes an explicit ``height``/``width`` derived from the input (floored to /16): the + pipeline otherwise defaults to a 1024x1024 SQUARE, squishing any non-square input + (the abba mixed-seam test: a 2816x1536 poster came back 1024x1024, distorting the + scene and garbling text). So qwen regenerates at the input's own geometry. """ + qw, qh = _qwen_target_size(image.width, image.height) return { "prompt": _QWEN_PROMPT, "negative_prompt": _QWEN_NEGATIVE, @@ -338,6 +352,8 @@ def _build_qwen_kwargs( "num_inference_steps": num_inference_steps, "true_cfg_scale": true_cfg_scale, "generator": generator, + "height": qh, + "width": qw, } @@ -614,7 +630,7 @@ class WatermarkRemover: if output_path is None: output_path = image_path - strength = resolve_strength(strength, vendor) + strength = resolve_strength(strength, vendor, self.model_profile) if not 0.0 <= strength <= 1.0: raise ValueError(f"Strength must be between 0.0 and 1.0, got {strength}") diff --git a/tests/test_fidelity_matching.py b/tests/test_fidelity_matching.py new file mode 100644 index 0000000..4c369cb --- /dev/null +++ b/tests/test_fidelity_matching.py @@ -0,0 +1,76 @@ +"""Regression test for the one-to-one face matcher in ``scripts/fidelity_metrics.py``. + +The shipped per-face nearest matcher collided on multi-face images (two original faces +both picking the same variant face when regeneration dropped a face), which inflated/ +corrupted the identity metric. ``assign_faces_one_to_one`` is the collision-free +replacement. The function is pure (centers + diagonals in, index map out), so it is +tested here without insightface / the heavy PEP723 env. Caught on the gemini_3 Qwen +ControlNet experiment, where the original had 18 faces but the regenerated variants had +17, producing two collisions under the old matcher. 
+""" + +from __future__ import annotations + +import importlib.util +import sys +from pathlib import Path + +import pytest + +_SCRIPTS = Path(__file__).resolve().parent.parent / "scripts" + + +def _load_assign(): + # fidelity_metrics is a standalone PEP723 script, not an installed module; load it by + # path with scripts/ on sys.path so its `_plain_console` shim import resolves. + sys.path.insert(0, str(_SCRIPTS)) + try: + spec = importlib.util.spec_from_file_location("fidelity_metrics", _SCRIPTS / "fidelity_metrics.py") + assert spec is not None + assert spec.loader is not None + mod = importlib.util.module_from_spec(spec) + sys.modules[spec.name] = mod # @dataclass introspection needs the module registered + spec.loader.exec_module(mod) + except ImportError as exc: # cv2/click absent in a bare env -> skip, not fail + pytest.skip(f"fidelity_metrics import deps missing: {exc}") + finally: + sys.path.remove(str(_SCRIPTS)) + return mod.assign_faces_one_to_one + + +def test_distinct_faces_match_nearest() -> None: + assign = _load_assign() + ref = [(0.0, 0.0), (100.0, 100.0)] + var = [(2.0, 1.0), (98.0, 102.0)] + diags = [50.0, 50.0] + assert assign(ref, var, diags) == {0: 0, 1: 1} + + +def test_no_collision_when_variant_drops_a_face() -> None: + # Two original faces near the SAME single variant face: the old nearest matcher mapped + # BOTH to index 0; one-to-one must give the nearer ref the match and drop the other. 
+ assign = _load_assign() + ref = [(10.0, 10.0), (14.0, 10.0)] # both close to the lone variant + var = [(12.0, 10.0)] + diags = [50.0, 50.0] + matched = assign(ref, var, diags) + assert sorted(matched.values()) == [0] # variant 0 used at most once + assert len(matched) == 1 + + +def test_gate_drops_implausibly_far_match() -> None: + assign = _load_assign() + ref = [(0.0, 0.0)] + var = [(1000.0, 1000.0)] # far beyond 0.6 * diag + diags = [50.0] + assert assign(ref, var, diags) == {} + + +def test_assignment_is_one_to_one_over_many_faces() -> None: + assign = _load_assign() + ref = [(float(i * 100), 0.0) for i in range(18)] + var = [(float(i * 100) + 3.0, 0.0) for i in range(17)] # one fewer, as in the experiment + diags = [50.0] * 18 + matched = assign(ref, var, diags) + assert len(matched) == 17 + assert len(set(matched.values())) == 17 # every variant used at most once diff --git a/tests/test_platform.py b/tests/test_platform.py index ae90334..94d1f9f 100644 --- a/tests/test_platform.py +++ b/tests/test_platform.py @@ -126,6 +126,14 @@ class TestModelProfiles: assert normalize_profile("CONTROLNET") == "controlnet" +class _StubImage: + """Minimal PIL.Image stand-in: just the ``width``/``height`` the pure helper reads.""" + + def __init__(self, width: int, height: int) -> None: + self.width = width + self.height = height + + class TestQwenKwargs: """_build_qwen_kwargs is pure (no torch); guards the Qwen-Image call shape. @@ -137,18 +145,37 @@ class TestQwenKwargs: from remove_ai_watermarks.noai.watermark_remover import _build_qwen_kwargs gen = object() - kwargs = _build_qwen_kwargs("IMG", strength=0.3, num_inference_steps=40, true_cfg_scale=4.0, generator=gen) + img = _StubImage(2816, 1536) + kwargs = _build_qwen_kwargs(img, strength=0.3, num_inference_steps=40, true_cfg_scale=4.0, generator=gen) # Qwen uses true_cfg_scale, NOT SDXL's guidance_scale. 
assert kwargs["true_cfg_scale"] == 4.0 assert "guidance_scale" not in kwargs # The scrub still comes from strength; image + generator pass through. assert kwargs["strength"] == 0.3 - assert kwargs["image"] == "IMG" + assert kwargs["image"] is img assert kwargs["generator"] is gen # Faithful-regeneration prompt + an explicit negative prompt. assert kwargs["prompt"] assert kwargs["negative_prompt"] + def test_passes_explicit_aspect_preserving_size(self): + # Without height/width the pipeline defaults to 1024x1024 and squishes non-square + # input (the abba mixed-seam regression). Both already multiples of 16 -> unchanged. + from remove_ai_watermarks.noai.watermark_remover import _build_qwen_kwargs + + kwargs = _build_qwen_kwargs( + _StubImage(2816, 1536), strength=0.25, num_inference_steps=40, true_cfg_scale=4.0, generator=None + ) + assert kwargs["width"] == 2816 + assert kwargs["height"] == 1536 + + def test_qwen_target_size_floors_to_multiple_of_16(self): + from remove_ai_watermarks.noai.watermark_remover import _qwen_target_size + + assert _qwen_target_size(2816, 1536) == (2816, 1536) # already /16 + assert _qwen_target_size(1122, 1402) == (1120, 1392) # floored + assert _qwen_target_size(10, 10) == (16, 16) # min clamp, never 0 + def test_qwen_model_id_is_qwen_image(self): from remove_ai_watermarks.noai.watermark_profiles import QWEN_MODEL_ID @@ -159,15 +186,33 @@ class TestResolveStrength: """resolve_strength applies the vendor default only when strength is unset.""" def test_none_is_vendor_adaptive(self): - # No vendor -> unknown default; OpenAI lower, Google == unknown. The SAME ladder - # applies to both pipelines (the certified controlnet floors), so there is no - # pipeline argument. + # No vendor -> unknown default; OpenAI lower, Google == unknown. The sdxl/controlnet + # pipelines share this ladder (the certified controlnet floors); qwen has its own + # (see test_qwen_pipeline_uses_its_own_higher_ladder). 
assert resolve_strength(None) == UNKNOWN_STRENGTH assert resolve_strength(None, "openai") == OPENAI_STRENGTH assert resolve_strength(None, "google") == GEMINI_STRENGTH assert resolve_strength(None, None) == UNKNOWN_STRENGTH # An unrecognized vendor string falls through to the unknown default. assert resolve_strength(None, "adobe") == UNKNOWN_STRENGTH + # sdxl/controlnet pipelines (and the "default" alias) use the same shared ladder. + assert resolve_strength(None, "google", "controlnet") == GEMINI_STRENGTH + assert resolve_strength(None, "google", "sdxl") == GEMINI_STRENGTH + + def test_qwen_pipeline_uses_its_own_higher_ladder(self): + # Qwen's certified Gemini floor (0.25) is HIGHER than controlnet's (0.15); OpenAI + # matches (0.10). Unknown vendor on qwen tracks the higher Gemini value. This retires + # the old manual "pass --strength 0.25 for Gemini on qwen" workaround. + from remove_ai_watermarks.noai.watermark_profiles import QWEN_GEMINI_STRENGTH, QWEN_OPENAI_STRENGTH + + assert QWEN_GEMINI_STRENGTH == 0.25 + assert QWEN_OPENAI_STRENGTH == 0.10 + assert resolve_strength(None, "google", "qwen") == QWEN_GEMINI_STRENGTH + assert resolve_strength(None, "openai", "qwen") == QWEN_OPENAI_STRENGTH + assert resolve_strength(None, None, "qwen") == QWEN_GEMINI_STRENGTH # unknown -> higher floor + assert resolve_strength(None, "google", "qwen") > resolve_strength(None, "google", "controlnet") + # An explicit strength still wins on qwen. + assert resolve_strength(0.12, "google", "qwen") == 0.12 def test_ladder_is_the_certified_controlnet_floors(self): # The unified ladder == the oracle-certified controlnet floors. Lowered on the