fix(qwen): native-geometry img2img + pipeline-aware strength; record dropped auto/mixed/Z-Image leads

- watermark_remover: _build_qwen_kwargs now passes explicit height/width (via
  _qwen_target_size, floored to /16). Without it QwenImageImg2ImgPipeline defaults to
  1024x1024 and silently squishes non-square inputs, distorting the scene and garbling text.
- watermark_profiles: resolve_strength gains a `pipeline` arg + a Qwen strength ladder
  (_QWEN_VENDOR_STRENGTH, Gemini 0.25), so `--pipeline qwen` gets its certified floor
  automatically; retires the manual "pass --strength 0.25 for Gemini on qwen" workaround.
- fidelity_metrics: replace per-face nearest matching (collided on multi-face images when a
  variant dropped a face, corrupting the identity metric) with a collision-free one-to-one
  assignment (assign_faces_one_to_one). lapvar/LPIPS were always bbox-anchored and immune.
  Regression-guarded by tests/test_fidelity_matching.py.
- docs: record the measured outcomes of the qwen-improvement arc. The Qwen ControlNet
  face-fix is CLOSED (no permissive Qwen detail/tile ControlNet exists; canny carries edges,
  not skin grain). The `--pipeline auto` router + faces+text mixed dual-pass were prototyped
  and DROPPED (controlnet wins faces AND display text: abba CER 0.114 vs qwen 0.379).
  Z-Image-Turbo was tried and dropped (same regeneration limits). qwen stays a manual opt-in;
  controlnet is the default for everything.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Victor Kuznetsov
2026-06-20 21:52:56 -07:00
parent 8f64869bfc
commit d5dd24140c
11 changed files with 307 additions and 36 deletions
+3
@@ -50,3 +50,6 @@ data/samsung_capture/captures/samsung_content_*
# (GFPGAN wrote RetinaFace/parsing weights to a CWD ./gfpgan/weights/ working
# dir on first use). Runtime artifact, never committed.
gfpgan/
# Qwen ControlNet experiment outputs (throwaway eval; never the committed corpus)
scripts/_qwen_exp_out/
+2 -2
@@ -18,7 +18,7 @@ Consequences for contributors (do not drift back into the stock niche just becau
## How to run
- `uv run remove-ai-watermarks all <image.png> -o <output.png>` — full pipeline (visible + invisible + metadata). Same diffusion knobs as `invisible` below, plus the visible-pass `--inpaint/--no-inpaint`/`--inpaint-method`. **When the `[gpu]` extra is absent, step 2 (invisible/SynthID) is skipped** — `all` still writes an output (visible mark + metadata stripped) but prints a prominent end-of-run banner ("the invisible (SynthID) watermark was NOT removed") AND exits **non-zero** (1), so a skipped SynthID pass is not mistaken for a clean result (the recurring #14/#47 trap, where the old quiet inline warning was missed). `invisible` already hard-errors without the extra; only `all` continued, hence the loud end-banner. Regression-guarded by `tests/test_cli.py::TestAllCommand::test_all_loud_warning_and_nonzero_exit_when_gpu_missing`. **Test trap:** any `all` test that exercises the full pipeline MUST `patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True)` — CI installs core+dev only (no `[gpu]`), so an unpatched `all` test takes the skip branch and now hits the non-zero exit. This passed locally (gpu present → `is_available()` True) but red-failed every matrix cell on the v0.11.0 commit (`test_all_basic`/`test_all_visible_step_uses_registry` asserted exit 0); both now patch `is_available` True.
- `uv run remove-ai-watermarks invisible <image.png> -o <out.png>` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet|qwen` (default `controlnet`), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default).
- `uv run remove-ai-watermarks invisible <image.png> -o <out.png>` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet|qwen` (default `controlnet`; `qwen` is a manual opt-in only — see the qwen note in the module map), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default).
- `uv run remove-ai-watermarks visible <image.png> -o <out.png>` — known-visible-mark removal, CPU, no GPU. Reverse-alpha based: each mark is removed by inverting its captured alpha map. `--mark auto` (default) picks the strongest detected of the Gemini sparkle, the Doubao "豆包AI生成" text strip, the Jimeng "★ 即梦AI" wordmark, and the Samsung Galaxy AI "✦ Contenuti generati dall'AI" strip (bottom-LEFT, locale-specific — Italian variant calibrated); `--mark gemini` / `--mark doubao` / `--mark jimeng` / `--mark samsung` force one (choices come from the registry). Gemini/Doubao recover pixels exactly with no inpaint at native; **Jimeng and Samsung add an always-on thin residual inpaint over the glyph footprint** (their marks re-rasterize per image, so reverse-alpha alone leaves a faint outline). For arbitrary logos/objects use `erase`. **When `--mark auto` finds no known mark (the common case — ~74% of real uploads carry no registered visible mark), the command does NOT silently re-serve the input as a finished result.** It runs a cheap metadata-only `identify`, prints actionable guidance (if the image carries an invisible/metadata mark, e.g. an OpenAI/Gemini C2PA image, it points to `all`; otherwise it does NOT imply the image is clean -- it warns that an invisible pixel watermark like SynthID cannot be detected once the metadata proxy is gone and routes to both `all` and `erase --region`), writes NO output file, and exits **`EXIT_NO_VISIBLE_MARK` (2)** — distinct from success (0) and a hard error (1) so a wrapping service (raiw.cc) can surface the message instead of treating the unchanged image as done (the production "it didn't work" / score-0 trap). Same handling for an explicit `--mark <name>` that is not detected. Helper `cli._no_visible_mark_exit`; regression-guarded by `tests/test_cli.py::TestVisibleCommand::test_visible_auto_no_mark_exits_two_with_eraser_hint` and `test_visible_auto_no_mark_routes_to_all_when_metadata`. 
`--no-detect` still forces the gemini fallback and proceeds (exit 0).
- `uv run remove-ai-watermarks erase <image.png> --region x,y,w,h -o <out.png>` — universal region eraser (any logo/object, any position). `--backend cv2` (default, no deps) or `--backend lama` (big-LaMa via onnxruntime, extra `lama`); `--region` is repeatable.
- `uv run remove-ai-watermarks identify <image>` — provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector
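A wrapping service could branch on the three `visible` exit codes described above roughly as follows. This is a minimal sketch, not the project's code: the helper `classify_visible_exit` and its status labels are hypothetical, and only the numeric codes (0 success, 1 hard error, 2 `EXIT_NO_VISIBLE_MARK`) are taken from the text.

```python
# Exit codes as documented for the `visible` command.
EXIT_OK = 0
EXIT_ERROR = 1
EXIT_NO_VISIBLE_MARK = 2


def classify_visible_exit(returncode: int) -> str:
    """Map a `visible` exit code to a hypothetical service-level status label."""
    if returncode == EXIT_OK:
        return "done"  # an output file was written
    if returncode == EXIT_NO_VISIBLE_MARK:
        # No output file exists; surface the CLI guidance instead of
        # re-serving the unchanged input as a finished result.
        return "no_visible_mark"
    return "error"  # 1 or any other unexpected code
```

Treating every non-zero code other than 2 as an error is an assumption of this sketch; it keeps the "no mark found" case from being reported as either success or failure.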
@@ -61,7 +61,7 @@ Compact map. The full per-module detail (design decisions, tuned thresholds, cal
- `region_eraser.py` — universal region eraser (`erase` CLI): cv2 backend default (no deps), optional big-LaMa via onnxruntime (~3.5-4 GB peak RAM, ~5-6 s/call CPU — does not fit a minimal droplet).
- `invisible_watermark.py` — decodes the OPEN DWT-DCT watermarks (SD / SDXL / FLUX) via `imwatermark` (extra `detect`, pulls torch). Fragile two ways: (1) does not survive JPEG re-encode/resize; (2) **carrier-fragile on a broad class of pristine images** -- a clean encode->decode round-trip recovers 48/48 on chatgpt/firefly/random but FAILS (28-39/48, below the `_MATCH_48`=44 gate) on the FLUX fox, doubao, a flat FLUX generation, AND a clean synthetic flat fill with no watermark. The failure does NOT track texture; it goes with a degenerate **all-ones decode that is a CARRIER ARTIFACT, not a watermark** (synthetic clean image reproduces it). So `detect_invisible_watermark` is **positive-only**: trust a hit; a `None` is inconclusive unless a same-carrier positive-control embed first recovers >=44. Verified 2026-06-19; full caveat in `docs/watermarking-landscape.md`.
- `trustmark_detector.py` — Adobe TrustMark open decoder (extra `trustmark`). Do NOT remove the JPEG re-encode false-positive gate — a lone TrustMark hit without it is almost always content noise.
- `noai/watermark_remover.py` — `WatermarkRemover` with three diffusion pipelines selected by the explicit `pipeline` ctor arg, never inferred from `model_id`: `sdxl` (plain SDXL img2img), `controlnet` (SDXL + canny ControlNet, **the DEFAULT since 2026-06-09**), and `qwen` (Qwen-Image 20B MMDiT img2img, Apache-2.0, CUDA/cloud-class — best **text** preservation (incl. CJK); `_load_qwen_pipeline`/`_run_qwen`, bf16, no MPS fallback; call shape in the pure `_build_qwen_kwargs` using `true_cfg_scale`). Removal comes from the img2img `strength`; ControlNet only preserves text/face STRUCTURE — SynthID CAN survive controlnet on photoreal content at low strength. Qwen CERTIFIED oracle floors (2026-06-20): OpenAI **0.10** (seed-robust, clean on seeds 0-4), Gemini **0.25** (seed 0 verified, pin a seed — Gemini oracle rate-limits volume; higher than the controlnet Gemini floor 0.15, so pass explicit `--strength 0.25` for Gemini on `qwen` until a Qwen ladder is wired). Fidelity measured by `scripts/fidelity_metrics.py` (OCR-CER / ArcFace / LPIPS / Laplacian-var, NOT eyeball), compared ONLY at each pipeline's oracle-confirmed scrub floor (where SynthID is removed in BOTH — equal-strength is invalid where it leaves one un-scrubbed): Qwen wins TEXT (incl. CJK), controlnet wins FACES (Qwen smooths faces more) — Qwen is the text-preserving remover, not a universal fidelity win. No face-restore extra ships, by validated decision (every restore approach looked MORE AI-generated). `remove_watermark(region=(x,y,w,h), region_feather=...)` runs the regeneration but feather-composites only the AI box back over the original (via `noai/tiling.feather_region_composite`), preserving the real photo elsewhere — the **AI-enhanced composite** path (`identify` `ai_source_kind == "enhanced"`); the box is supplied by the caller (a C2PA composite manifest carries no reliable machine-readable region, so we do not fabricate one).
- `noai/watermark_remover.py` — `WatermarkRemover` with three diffusion pipelines selected by the explicit `pipeline` ctor arg, never inferred from `model_id`: `sdxl` (plain SDXL img2img), `controlnet` (SDXL + canny ControlNet, **the DEFAULT since 2026-06-09**), and `qwen` (Qwen-Image 20B MMDiT img2img, Apache-2.0, CUDA/cloud-class — best **text** preservation (incl. CJK); `_load_qwen_pipeline`/`_run_qwen`, bf16, no MPS fallback; call shape in the pure `_build_qwen_kwargs` using `true_cfg_scale`). Removal comes from the img2img `strength`; ControlNet only preserves text/face STRUCTURE — SynthID CAN survive controlnet on photoreal content at low strength. Qwen CERTIFIED oracle floors (2026-06-20): OpenAI **0.10** (seed-robust, clean on seeds 0-4), Gemini **0.25** (seed 0 verified, pin a seed — Gemini oracle rate-limits volume; higher than the controlnet Gemini floor 0.15). `resolve_strength(..., pipeline="qwen")` carries the Qwen ladder (`_QWEN_VENDOR_STRENGTH`), so `--pipeline qwen` gets the 0.25 Gemini floor automatically (the old manual `--strength 0.25` workaround is retired). `_build_qwen_kwargs` passes an explicit `height`/`width` from the input (floored to /16 via `_qwen_target_size`) — without it the pipeline defaults to a 1024x1024 SQUARE and silently squishes non-square inputs (fixed 2026-06-20). **`qwen` is a MANUAL opt-in only — there is NO auto-router.** Measured (`scripts/fidelity_metrics.py`, OCR-CER / ArcFace / LPIPS / Laplacian-var, NOT eyeball): qwen beats controlnet on ONE niche only — **clean body text on a plain background, no faces** (openai_1/2 CER 0.241 vs 0.385). controlnet wins FACES (it always has) AND **display/decorative text in a scene** (abba poster: controlnet CER 0.114 vs qwen 0.379 — canny holds letter shapes, qwen re-renders and garbles them). So a content `--pipeline auto` router and a faces+text **mixed dual-pass** were prototyped and **DROPPED** (2026-06-20): on the canonical faces+text case controlnet wins every metric incl. text, so mixed loses; and "text→qwen" can't be auto-decided (it is body-vs-display text that matters, undetectable cheaply). qwen stays for callers who KNOW their content is clean-text-heavy and face-free. No face-restore extra ships, by validated decision (every restore approach looked MORE AI-generated). `remove_watermark(region=(x,y,w,h), region_feather=...)` runs the regeneration but feather-composites only the AI box back over the original (via `noai/tiling.feather_region_composite`), preserving the real photo elsewhere — the **AI-enhanced composite** path (`identify` `ai_source_kind == "enhanced"`); the box is supplied by the caller (a C2PA composite manifest carries no reliable machine-readable region, so we do not fabricate one).
- `noai/tiling.py` — sliding-window tiled diffusion for large inputs (CLI `--tile`). `WatermarkRemover.remove_watermark` branches to `run_tiled` when `tile` is set AND the long side exceeds `tile_size`, refactoring the single-pass `_generate` into a per-tile `_generate_one` (the ControlNet edge map is rebuilt per tile inside it). Pure helpers `plan_tiles` (uniform-size tiles, last one flush to the edge) and `feather_weights` (strictly-positive separable taper -> partition-of-unity blend) are unit-tested without the model. Also home to `feather_region_composite(base, regenerated, box, *, feather)` — the pure region-targeted compositor for **AI-enhanced composites** (`ai_source_kind == "enhanced"`): blends the regenerated AI box back over the original with a feathered seam, leaving the real photo OUTSIDE the box pixel-exact. It backs `WatermarkRemover.remove_watermark(region=...)` (regenerate ONLY the AI region, not the whole frame); the no-model lossless region path stays `region_eraser.erase`. New tile/region-blend tuning goes in these pure helpers; do not inline blend math into the runner.
- `auto_config.py` + the content-detection layer were REMOVED 2026-06-09; `--auto` is a deprecated no-op (controlnet is the default pipeline and the adaptive polish is ON by default and self-gates to a no-op where there is no detail deficit).
- `upscaler.py` — optional Real-ESRGAN pre-diffusion super-resolution for small inputs (extra `esrgan`, spandrel only). Manual opt-in; the default `--upscaler` stays `lanczos` and the engine always falls back to Lanczos on absence/error. ESRGAN can degrade faces and thin text.
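The `plan_tiles` behaviour described in the `noai/tiling.py` entry (uniform-size tiles, last one flush to the edge) can be illustrated along one axis. This is a hedged sketch only: `plan_tile_starts` is a hypothetical name, the real helper's signature is not shown in this diff, and the stride of `tile - overlap` is an assumption consistent with the documented `--tile-size`/`--tile-overlap` knobs.

```python
def plan_tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Return 1-D start offsets for equal-size tiles covering [0, length).

    Tiles advance by (tile - overlap); the final tile is shifted back so it
    ends exactly at `length` instead of being cropped (flush to the edge).
    """
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    if length <= tile:
        return [0]  # a single tile already covers this axis
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile flush to the far edge
    return starts
```

With the documented defaults (tile 1024, overlap 128), a 2816 px axis yields starts at 0, 896 and 1792, so every tile has the same size and neighbouring tiles share at least the requested overlap for feather blending.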
+2 -2
@@ -144,8 +144,8 @@ The scrub still comes from the img2img `strength` (same lever as SDXL); the call
- **Text:** Qwen wins on substantial Latin/mixed-script text -- OCR CER, controlnet vs Qwen: openai_1 (EN+RU+ZH, both 0.10) 0.385 vs **0.241**, openai_2 (EN, both 0.10) 0.341 vs **0.290**. On a SHORT CJK sign (gemini_1, cnet 0.15 / Qwen 0.25) it is a TIE (0.037 vs 0.037 -- both near-perfect; the earlier Qwen 0.000 was at the higher 0.30, not the certified floor).
- **Faces:** controlnet wins -- gemini_3, 18 faces (cnet 0.15 / Qwen 0.25): ArcFace identity 0.546 vs 0.382, Laplacian-variance retention 0.62 vs 0.40, face LPIPS 0.09 vs 0.17 (Qwen smooths faces MORE; the gap narrows vs Qwen 0.30 but controlnet still wins clearly).
**Conclusion: Qwen is the better TEXT-preserving remover (substantial Latin/mixed text), NOT a universal fidelity win — controlnet's canny edge map holds face skin detail better, so the path is a content-routed lane (text→qwen, faces→controlnet), not a blanket migration.** Caveat: `resolve_strength` is shared and pipeline-independent, so the Gemini default (0.15) UNDER-scrubs Gemini on `qwen` (floor 0.25) — pass `--strength 0.25` for Gemini on `qwen` until a Qwen ladder is wired. Flat-graphic content was not in the sample.
**Conclusion: Qwen wins TEXT only for clean body text on a plain background with NO faces; controlnet wins faces AND display/decorative text in a scene. So `qwen` is a MANUAL `--pipeline qwen` opt-in, not a routed lane.** A content `--pipeline auto` router + a faces+text mixed dual-pass were prototyped and DROPPED (2026-06-20): on the canonical faces+text case (the abba poster, faces + display text) controlnet won EVERY metric incl. text (CER 0.114 vs qwen 0.379), so grafting qwen text only hurts; and "text→qwen" is undecidable cheaply (body-vs-display text is what matters). Caveat: `resolve_strength(..., pipeline="qwen")` carries the Qwen ladder (`_QWEN_VENDOR_STRENGTH`, Gemini 0.25), so `--pipeline qwen` gets the 0.25 Gemini floor automatically — the old manual `--strength 0.25` workaround is retired. `_build_qwen_kwargs` now passes an explicit height/width (qwen squished non-square inputs to 1024² without it). Flat-graphic content was not in the sample.
**Improving Qwen (ship vs improve):** the cited research on fixing the face-smoothing while keeping the text win (Qwen-Image ControlNet for structure conditioning, Qwen-Image-Edit, Z-Image-Turbo as a cheaper text-preserving substitute, non-regenerative detail restoration) lives in `docs/qwen-improvement-research.md` -- read it before extending the `qwen` pipeline. Verdict: shippable now as an opt-in text lane; the strongest improvement lead is adding a Qwen-Image ControlNet, but no improvement has measured face-fidelity at our floors yet (validate with `scripts/fidelity_metrics.py` first).
**Improving Qwen (ship vs improve):** the cited research lives in `docs/qwen-improvement-research.md` -- read it before extending the `qwen` pipeline. Verdict: shippable as an opt-in text lane. **The "add a Qwen-Image ControlNet to fix face smoothing" lead was built, measured, and CLOSED (2026-06-20):** a DiffSynth-Studio Qwen + Apache-2.0 blockwise-canny ControlNet at the Gemini floor 0.25 did NOT restore face skin texture (face Laplacian-variance retention flat 0.40 -> 0.40, 13/16 faces within +-0.02; the SDXL+canny target 0.62 was not approached), because canny carries edges not skin grain and Qwen's higher Gemini floor (0.25 vs SDXL+canny 0.15) forces more smoothing -- and a deep-research sweep confirmed NO permissively-licensed Qwen tile/detail/realism/skin ControlNet exists anywhere (every Qwen conditioning is geometry). So **faces stay on SDXL+controlnet; Qwen is the text lane, not a face fix.** The strongest remaining lead is **Z-Image-Turbo** (6B, Apache-2.0, `ZImageImg2ImgPipeline`, scrub mechanism preserved) -- its own SynthID floor and face/text fidelity are UNMEASURED; that is the next experiment. Non-regenerative high-frequency detail re-injection is NOT safe by assumption (the "clean-output high frequencies do not carry the watermark" claim was refuted) -- it must be oracle-gated. Always validate any improvement at the certified floors with `scripts/fidelity_metrics.py` first.
**Seed as a quality lever (measured, openai_1 at 0.10, seeds 0-4):** the seed barely moves whole-image fidelity (img LPIPS 0.062-0.065, SSIM 0.855-0.857, PSNR 28.5-28.7 — flat) but does shift TEXT legibility (OCR CER 0.241-0.290, ~17% spread) -- the seed changes WHICH details get regenerated, not the overall level. So a per-image best-of-N-seed selection is a WEAK, text-only lever (pick the lowest-CER seed that still scrubs; fidelity selection needs no oracle). Not worth the N× cost for general use -- pin one decent seed in prod; reserve best-of-N for text-heavy premium cases.
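The best-of-N rule stated above (pick the lowest-CER seed that still scrubs) might be expressed as a small selection helper. This is a sketch under assumptions: `SeedResult` and `pick_best_seed` are hypothetical names, and the `scrubbed` flag is assumed to come from whatever scrub check the caller already runs.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SeedResult:
    seed: int
    cer: float      # OCR character error rate of this seed's output (lower is better)
    scrubbed: bool  # True if the watermark check passed for this output


def pick_best_seed(results: Iterable[SeedResult]) -> Optional[SeedResult]:
    """Return the scrubbed result with the lowest CER, or None if none scrubbed."""
    clean = [r for r in results if r.scrubbed]
    if not clean:
        return None
    # Ties on CER are broken by the lower seed so the choice is deterministic.
    return min(clean, key=lambda r: (r.cer, r.seed))
```

Filtering on `scrubbed` before ranking keeps a legible but still-watermarked output from ever winning the selection.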
+10 -1
@@ -185,7 +185,7 @@ Root cause: bad alpha (under-estimated, max ~0.65) + fixed-no-inpaint + tight bo
**`sdxl`** (renamed from `default` 2026-06-09; `default` kept as a back-compat alias via `normalize_profile`) runs plain SDXL img2img (`_run_img2img`); it is the lighter opt-down alternative (no ControlNet weights).
**`qwen`** (`_run_qwen`, `_load_qwen_pipeline`) runs `QwenImageImg2ImgPipeline` on `Qwen/Qwen-Image` (20B MMDiT, Apache-2.0 code AND weights). The scrub still comes from the img2img `strength`; Qwen's value is **text preservation** (incl. CJK and small text). **Metric-measured nuance (2026-06-19, `scripts/fidelity_metrics.py`, do NOT trust the eyeball here — it misled). Compare ONLY at each pipeline's oracle-confirmed scrub floor (outputs where SynthID is removed in BOTH — an equal-strength compare is invalid where it leaves one un-scrubbed; Qwen at 0.15 does not clear Gemini): Qwen wins TEXT (lower OCR CER across EN/RU/ZH, perfect Chinese) but controlnet wins FACES (higher Laplacian-variance retention and lower LPIPS — Qwen smooths faces MORE; ArcFace identity favors controlnet 0.546 vs 0.331 at the Gemini floors).** So Qwen is the better text-preserving remover, NOT a universal fidelity win — controlnet's canny edge map holds face skin detail better. Specifics: bf16 on CUDA (fp16 risks overflow on the 20B MMDiT — see the dtype branch in `__init__`); loads `QWEN_MODEL_ID` unless `--model` is overridden; the call shape lives in the pure module helper `_build_qwen_kwargs` (unit-tested without torch in `tests/test_platform.py::TestQwenKwargs`), which uses Qwen's `true_cfg_scale` (NOT SDXL's `guidance_scale` — the CLI `--guidance-scale` maps onto it; ~4.0 is typical, the SDXL default 7.5 is high for Qwen) and an explicit `negative_prompt` (`_QWEN_PROMPT`/`_QWEN_NEGATIVE`). It is CUDA/cloud-class (the 20B does not fit MPS), so `_run_qwen` has NO MPS->CPU fallback — an error propagates. `_load_qwen_pipeline` raises a clear ImportError if the installed diffusers lacks `QwenImageImg2ImgPipeline`. **CERTIFIED oracle floors (Modal A100-80GB, 2026-06-20): OpenAI 0.10 (seed-robust — clean on seeds 0-4), Gemini 0.25 (seed 0 verified on 2 images; the Gemini oracle rate-limits volume seed-repeat, so PIN a seed in prod). 
The Gemini floor (0.25) is HIGHER than the certified controlnet Gemini floor (0.15), and `resolve_strength` is shared/pipeline-independent, so pass an explicit `--strength 0.25` for Gemini content on `qwen` until a Qwen-specific ladder is wired into `resolve_strength`.** Fidelity vs controlnet was measured at the certified floors (`scripts/fidelity_metrics.py`), NOT eyeballed — see the metric nuance above: Qwen wins substantial text, controlnet wins faces.
**`qwen`** (`_run_qwen`, `_load_qwen_pipeline`) runs `QwenImageImg2ImgPipeline` on `Qwen/Qwen-Image` (20B MMDiT, Apache-2.0 code AND weights). The scrub still comes from the img2img `strength`; Qwen's value is **text preservation** (incl. CJK and small text). **Metric-measured nuance (2026-06-19, `scripts/fidelity_metrics.py`, do NOT trust the eyeball here — it misled). Compare ONLY at each pipeline's oracle-confirmed scrub floor (outputs where SynthID is removed in BOTH — an equal-strength compare is invalid where it leaves one un-scrubbed; Qwen at 0.15 does not clear Gemini): Qwen wins TEXT (lower OCR CER across EN/RU/ZH, perfect Chinese) but controlnet wins FACES (higher Laplacian-variance retention and lower LPIPS — Qwen smooths faces MORE; ArcFace identity favors controlnet 0.546 vs 0.331 at the Gemini floors).** So Qwen is the better text-preserving remover, NOT a universal fidelity win — controlnet's canny edge map holds face skin detail better. Specifics: bf16 on CUDA (fp16 risks overflow on the 20B MMDiT — see the dtype branch in `__init__`); loads `QWEN_MODEL_ID` unless `--model` is overridden; the call shape lives in the pure module helper `_build_qwen_kwargs` (unit-tested without torch in `tests/test_platform.py::TestQwenKwargs`), which uses Qwen's `true_cfg_scale` (NOT SDXL's `guidance_scale` — the CLI `--guidance-scale` maps onto it; ~4.0 is typical, the SDXL default 7.5 is high for Qwen) and an explicit `negative_prompt` (`_QWEN_PROMPT`/`_QWEN_NEGATIVE`). It is CUDA/cloud-class (the 20B does not fit MPS), so `_run_qwen` has NO MPS->CPU fallback — an error propagates. `_load_qwen_pipeline` raises a clear ImportError if the installed diffusers lacks `QwenImageImg2ImgPipeline`. **CERTIFIED oracle floors (Modal A100-80GB, 2026-06-20): OpenAI 0.10 (seed-robust — clean on seeds 0-4), Gemini 0.25 (seed 0 verified on 2 images; the Gemini oracle rate-limits volume seed-repeat, so PIN a seed in prod). 
The Gemini floor (0.25) is HIGHER than the certified controlnet Gemini floor (0.15); `resolve_strength(..., pipeline="qwen")` carries the Qwen ladder (`_QWEN_VENDOR_STRENGTH`), so `--pipeline qwen` gets the 0.25 Gemini floor automatically -- the old manual `--strength 0.25` workaround is retired. `_build_qwen_kwargs` passes an explicit `height`/`width` from the input (floored to /16 via the pure `_qwen_target_size`); WITHOUT it the img2img pipeline defaults to a 1024x1024 SQUARE and silently squishes non-square inputs (the abba 2816x1536 case came back 1024x1024, distorting the scene and garbling text — fixed 2026-06-20, tested in `TestQwenKwargs`).** Fidelity vs controlnet was measured at the certified floors (`scripts/fidelity_metrics.py`), NOT eyeballed. **`qwen` is a MANUAL opt-in only — there is NO auto-router (one was prototyped and DROPPED, see below).** It wins ONE niche: clean body text on a plain background, NO faces (openai_1/2 CER 0.241 vs 0.385). controlnet wins FACES and **display/decorative text in a scene** (abba poster: controlnet CER 0.114 vs qwen 0.379 — canny holds letter shapes; qwen re-renders and garbles them). **`--pipeline auto` + a faces+text mixed dual-pass were built and DROPPED (2026-06-20):** on the canonical faces+text case (abba) controlnet wins EVERY metric incl. text, so grafting qwen text would only hurt; and "text→qwen" is undecidable cheaply (it is body-vs-display text that matters). The router/detector/mixed modules were removed; the geometry fix + the Qwen strength ladder were kept (they make the manual `--pipeline qwen` correct). **Do NOT retry "add a Qwen ControlNet to close the face gap" — it was built, measured, and CLOSED 2026-06-20:** a DiffSynth blockwise-canny Qwen ControlNet did not restore face skin texture (lapvar flat 0.40, canny carries edges not skin grain) and no permissively-licensed Qwen tile/detail/skin ControlNet exists anywhere (all conditioning is geometry). 
Faces stay on controlnet; the next improvement lead is Z-Image-Turbo (Apache-2.0, unmeasured floor). Full record + the deep-research sweep in `docs/qwen-improvement-research.md`.
**`controlnet`** (**the DEFAULT pipeline since 2026-06-09** for `invisible`/`all`/`batch` and both engine ctors; `_run_controlnet`, `_load_controlnet_pipeline`) runs `StableDiffusionXLControlNetImg2ImgPipeline` with the SDXL-native canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` (`watermark_profiles.CONTROLNET_CANNY_MODEL`): the control image is `cv2.Canny(gray, 100, 200)` stacked to 3 channels (`_CANNY_LOW`/`_CANNY_HIGH`, prompt `_CONTROLNET_PROMPT` / `_CONTROLNET_NEGATIVE`).
@@ -213,6 +213,15 @@ History: `auto_config.plan()` was a content-adaptive planner that detected faces
**`--auto` is now a DEPRECATED no-op** (`cli._resolve_auto_polish`): controlnet is already the default pipeline AND the adaptive polish is ON by default, so `--auto` has nothing left to do — it only prints a deprecation warning and passes `adaptive_polish` through unchanged (an explicit `--no-adaptive-polish` still wins). (Originally it re-enabled the polish; once the polish default flipped to ON the same day, the parameter-source branch became dead and was dropped.) The **adaptive polish itself lives on** in `humanizer.adaptive_polish` (CLI `--adaptive-polish/--no-adaptive-polish`, **ON by default since 2026-06-09** — it self-gates to a no-op where there is no detail deficit, so default-on is safe; uses the full-res original as the detail reference) — see the `humanizer` test note. `batch` resolves the polish once before the loop (one warning) and caches the invisible engine per pipeline (`ctx.obj["_inv_engines"]`).
## Content `--pipeline auto` router + faces+text mixed dual-pass — PROTOTYPED and DROPPED (2026-06-20)
A `--pipeline auto` content router (`pipeline_router.py` + `content_detect.py`: Haar faces + MSER text → route text→qwen / faces→controlnet / both→mixed) and a faces+text **mixed dual-pass** (`mixed_pipeline.py`: scrub the whole frame on BOTH pipelines, then graft the qwen text regions onto the controlnet base via `tiling.feather_region_composite`) were built, run on Modal (the abba poster: faces + display text), measured, and **removed**. Why it failed:
- On the canonical faces+text image **controlnet wins EVERY metric, including text** (CER 0.114 vs qwen 0.379; ID 0.64 vs 0.36; lapvar 0.71 vs 0.59) — canny holds the existing letter shapes, qwen re-renders display/decorative text and garbles it. So grafting qwen text onto the controlnet base only HURTS.
- qwen beats controlnet on text ONLY for clean body text on a plain background with no faces (openai_1/2) — a niche where there are no faces to route around anyway, so `--pipeline qwen` alone covers it. The faces+clean-body-text intersection is near-empty.
- "text→qwen" is not cheaply decidable: it is body-vs-display text that matters, which face/text detectors can't tell apart. MSER also over-fired (47% of the busy poster, incl. faces).
KEPT from that work (independently valid for the manual `--pipeline qwen`): the qwen **geometry fix** (`_qwen_target_size` + `_build_qwen_kwargs` height/width — qwen squished non-square inputs to 1024² without it) and the **pipeline-aware `resolve_strength`** Qwen ladder (Gemini 0.25). Also kept: the `fidelity_metrics.py` one-to-one face matcher. The throwaway Modal eval scripts were removed after the run (findings recorded here and in `docs/qwen-improvement-research.md`).
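The two kept pieces can be sketched as pure helpers. This is illustrative only: the function names below are hypothetical stand-ins (not the real `_qwen_target_size` or `resolve_strength` signatures), the rule that an explicit `--strength` overrides the ladder is an assumption, and the ladders contain only the floors quoted in these docs (Qwen: OpenAI 0.10, Gemini 0.25; shared default: Gemini 0.15).

```python
from typing import Optional

# Only the values stated in the docs; the real ladders may hold more vendors.
SKETCH_QWEN_STRENGTH = {"openai": 0.10, "gemini": 0.25}
SKETCH_DEFAULT_STRENGTH = {"gemini": 0.15}


def sketch_qwen_target_size(width: int, height: int, multiple: int = 16) -> tuple[int, int]:
    """Floor each side to a multiple of 16 so the input's aspect ratio is kept."""
    floor = lambda v: max(multiple, (v // multiple) * multiple)
    return floor(width), floor(height)


def sketch_resolve_strength(vendor: str, pipeline: str,
                            explicit: Optional[float] = None) -> float:
    """Pick a strength: explicit value first (assumed), else the pipeline's ladder."""
    if explicit is not None:
        return explicit
    ladder = SKETCH_QWEN_STRENGTH if pipeline == "qwen" else SKETCH_DEFAULT_STRENGTH
    return ladder[vendor]  # KeyError for vendors not covered by this sketch
```

Passing the floored size as explicit `height`/`width` is what keeps a 2816x1536 input at 2816x1536 instead of the pipeline's square default, and the pipeline-aware lookup is what gives Gemini content 0.25 on `qwen` but 0.15 elsewhere.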
## `upscaler.py`
`upscaler.py` — optional Real-ESRGAN pre-diffusion super-resolution for small inputs (spandrel boundary, top-of-file pyright pragma). `is_available()` gates on spandrel+torch (via `importlib.util.find_spec`); `upscale(bgr, device=None)` loads a lazily-built spandrel `ImageModelDescriptor` singleton (double-checked lock) and upscales by the model's native factor (x2), with a non-CPU→CPU device fallback mirroring the diffusion engine's MPS→CPU retry. Weights (`RealESRGAN_x2plus.pth`, BSD-3-Clause) download on first use to the `torch.hub` checkpoints cache; never bundled. Used only when UPscaling to the `min_resolution` floor (a `max_resolution` downscale always uses Lanczos). The wiring is `InvisibleEngine._esrgan_upscale(pil, target)` — Real-ESRGAN at native factor, then a Lanczos resize to the exact target, falling back to a plain Lanczos resize if the extra is absent or the model errors (so an optional upscaler can never break removal). The default `--upscaler` is `lanczos` (cv2, no deps).
+67
@@ -29,6 +29,73 @@ the 20B cost. None of the improvements has measured face-fidelity numbers at our
scrub floors yet, so each must be validated with `scripts/fidelity_metrics.py` plus
the oracle before shipping.
## Follow-up: ControlNet experiment + deeper research (2026-06-20)
The verdict's strongest lead -- adding a Qwen-Image ControlNet -- was **built, measured, and
CLOSED**.
**Experiment** (Modal A100-80GB; DiffSynth-Studio `QwenImagePipeline` + the Apache-2.0
`DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny` -- the only framework exposing
Qwen-Image + canny ControlNet + img2img `denoising_strength` in ONE call; diffusers ships no
`QwenImageControlNetImg2ImgPipeline`, its three Qwen ControlNet pipelines are txt2img only).
Measured on `gemini_3` (18 faces) at the Gemini scrub floor 0.25 vs base-Qwen 0.25 with
`scripts/fidelity_metrics.py`:
- **The actual failure mode (face skin texture) was NOT restored:** Laplacian-variance
retention stayed flat (base 0.40 -> qwen+canny 0.40; per-face 13/16 within +-0.02 after a
one-to-one face match, sd 0.016 -- not an averaging artifact). The SDXL+canny target 0.62
was not approached.
- Identity rose modestly and broadly (ArcFace 0.346 -> 0.415, 12/16 faces improved) but the
absolute stays ~0.42 ("a different person, slightly closer").
- Mechanism (verified, not inferred): canny conditioning was applied fully (scale 1.0, full
denoise schedule); the canny edge map is clean facial geometry with BLANK skin (4.83% edge
density) -- canny carries edges, not skin grain. Root cause: Qwen's Gemini floor (0.25) is
higher than SDXL+canny's (0.15), forcing more denoising -> more smoothing; structure
conditioning cannot compensate for that.
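To make the texture metric above concrete, here is a dependency-free sketch of Laplacian-variance retention. It assumes "retention" means the variant's Laplacian variance divided by the original's, measured on the same crop; that ratio and the function names are assumptions for illustration. The script itself uses `cv2.Laplacian`, which also handles image borders, whereas this sketch skips them.

```python
from statistics import pvariance


def laplacian_variance(gray: list[list[float]]) -> float:
    """Variance of the 4-neighbour Laplacian response over interior pixels."""
    height, width = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return float(pvariance(responses))


def lapvar_retention(original: list[list[float]], variant: list[list[float]]) -> float:
    """Share of the original's high-frequency energy that survives (assumed definition)."""
    base = laplacian_variance(original)
    return laplacian_variance(variant) / base if base else 0.0


# A checkerboard stands in for skin grain; halving its contrast mimics smoothing.
sharp = [[255.0 if (x + y) % 2 else 0.0 for x in range(6)] for y in range(6)]
smooth = [[191.0 if (x + y) % 2 else 64.0 for x in range(6)] for y in range(6)]
retention = lapvar_retention(sharp, smooth)
```

Because the Laplacian is linear, halving the local contrast cuts the variance to about a quarter, which is why modest-looking smoothing shows up as a large drop in this metric.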
**Deeper research** (deep-research harness, 103 agents, 3-vote adversarial):
- **[high, unanimous] No permissively-licensed Qwen-Image tile / detail / realism / skin
ControlNet exists anywhere** -- DiffSynth first-party is Canny/Depth/Inpaint only, InstantX
Union is canny/soft-edge/depth/pose, the official QwenLM repo ships none. Every Qwen
conditioning is GEOMETRY, the same class as the tested canny. **The "add a Qwen ControlNet to
fix faces" lead is closed for good.**
- **[high, unanimous] Z-Image / Z-Image-Turbo (6B, Apache-2.0 on code AND weights, ~1/3 of
Qwen 20B)** ships a documented `ZImageImg2ImgPipeline` with standard strength denoising, so
it preserves the scrub mechanism. Its own SynthID scrub floor and face/text fidelity are
UNMEASURED -- this is the strongest concrete NEXT experiment.
- **[medium] Lowering Qwen's scrub floor has no off-the-shelf SynthID answer:** the "partial
img2img ~0.3 breaks robust watermarks" literature tests open schemes
(StegaStamp/TrustMark/VINE), NEVER SynthID (proprietary decoder) -- analogy, not proof. No
minimal-strength SynthID attack under a named permissive license was found.
- **REFUTED [0-3]:** "re-injecting high-frequency detail from a clean diffusion output would
not carry the watermark back." So non-regenerative detail transfer is NOT safe by
assumption -- the transferred high-frequency band must be gated against the SynthID oracle.
**Net for the pipeline:** **faces stay on SDXL+controlnet**; there is no Qwen face-fix.
The live frontier is Z-Image-Turbo (next experiment) and oracle-gated non-regenerative detail
re-injection.
**Follow-up (2026-06-20) — the content-routed lane / mixed dual-pass was tested and DROPPED.**
A `--pipeline auto` router (Haar+MSER → text→qwen / faces→controlnet / both→mixed) and a
faces+text mixed dual-pass (scrub the whole frame on both, graft qwen text regions onto the
controlnet base) were built and run on Modal (the abba poster: faces + display text). On that
canonical faces+text case **controlnet won EVERY metric, including text** (CER 0.114 vs qwen
0.379; ID 0.64 vs 0.36) — canny holds existing letter shapes, qwen re-renders display text and
garbles it, so grafting qwen text only hurts. Qwen beats controlnet on text ONLY for clean body
text on a plain background with no faces (openai_1/2), a niche `--pipeline qwen` alone covers;
the faces+clean-body-text intersection is near-empty, and "text→qwen" is not cheaply decidable
(what matters is body vs display text). So the router + mixed modules were removed and **`qwen`
is a manual `--pipeline qwen` opt-in only.** KEPT (independently valid): the qwen geometry fix
(it squished non-square inputs to 1024²), the pipeline-aware `resolve_strength` Qwen ladder, and
the `fidelity_metrics.py` one-to-one face matcher below.
**Tooling fix surfaced by this run:** `scripts/fidelity_metrics.py` face matching was changed
from per-face nearest-center to a collision-free one-to-one assignment
(`assign_faces_one_to_one`, gated by face size), after the 18-face `gemini_3` exposed
collisions (only 17 faces were detected in the regenerated variants, so two originals mapped
to the same variant face, corrupting the identity metric). lapvar/LPIPS were always anchored to the
original bbox and stayed collision-immune. Regression-guarded by
`tests/test_fidelity_matching.py`.
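The collision is easy to reproduce on toy coordinates. The sketch below contrasts a per-face nearest match with a greedy nearest-first one-to-one assignment; it is a standalone illustration with hypothetical function names, not the script's code, and the centers and diagonals are invented.

```python
import math


def nearest_per_face(ref_centers, var_centers):
    """Old behaviour: every reference face independently picks its nearest variant face."""
    return [
        min(range(len(var_centers)), key=lambda j: math.dist(r, var_centers[j]))
        for r in ref_centers
    ]


def greedy_one_to_one(ref_centers, var_centers, ref_diags, max_frac=0.6):
    """Walk candidate pairs nearest-first; never reuse a reference or a variant face."""
    pairs = sorted(
        (math.dist(r, v), i, j)
        for i, r in enumerate(ref_centers)
        for j, v in enumerate(var_centers)
    )
    used_ref, used_var, matched = set(), set(), {}
    for dist, i, j in pairs:
        if i in used_ref or j in used_var or dist > max_frac * ref_diags[i]:
            continue
        matched[i] = j
        used_ref.add(i)
        used_var.add(j)
    return matched


# Three original faces, but regeneration lost the second one.
refs = [(10.0, 10.0), (14.0, 10.0), (100.0, 100.0)]
variants = [(11.0, 10.0), (101.0, 100.0)]
collided = nearest_per_face(refs, variants)                  # refs 0 and 1 share variant 0
assigned = greedy_one_to_one(refs, variants, [50.0] * 3)     # ref 1 is left unmatched
```

With the one-to-one assignment the lost face simply contributes no identity score, instead of being scored against a neighbour's face.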
## Findings
1. **[high, 3-0] A permissively-licensed Qwen-Image ControlNet exists today and is
+53 -15
@@ -186,16 +186,50 @@ def _lap_var(bgr: np.ndarray) -> float:
return float(cv2.Laplacian(gray, cv2.CV_64F).var())
def _match_face(orig_face: Any, variant_faces: list[Any]) -> Any:
"""Nearest variant face to an original face by bbox-center distance (geometry kept)."""
ox, oy = (orig_face.bbox[0] + orig_face.bbox[2]) / 2, (orig_face.bbox[1] + orig_face.bbox[3]) / 2
best, best_d = None, 1e18
for vf in variant_faces:
vx, vy = (vf.bbox[0] + vf.bbox[2]) / 2, (vf.bbox[1] + vf.bbox[3]) / 2
d = (ox - vx) ** 2 + (oy - vy) ** 2
if d < best_d:
best, best_d = vf, d
return best
def _bbox_center(bbox: Any) -> tuple[float, float]:
    return (bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2


def _bbox_diag(bbox: Any) -> float:
    return float(((bbox[2] - bbox[0]) ** 2 + (bbox[3] - bbox[1]) ** 2) ** 0.5)


def assign_faces_one_to_one(
    ref_centers: list[tuple[float, float]],
    var_centers: list[tuple[float, float]],
    ref_diags: list[float],
    max_frac: float = 0.6,
) -> dict[int, int]:
    """One-to-one nearest-center face assignment (pure; unit-tested without insightface).

    Per-face nearest matching collides on multi-face images -- two original faces can both
    pick the SAME variant face (e.g. when regeneration drops a face, so the variant has fewer
    detections), corrupting the identity metric (the lapvar/LPIPS metrics are immune: they are
    anchored to the ORIGINAL bbox on both images). This greedy-by-distance assignment is
    collision-free: it walks candidate pairs nearest-first and never reuses a ref or a variant
    face. Faces are spatially well-separated, so greedy equals the optimal (Hungarian) result
    here without the scipy dependency. A pair is dropped when the center distance exceeds
    ``max_frac`` of the original face diagonal (no plausible match -- the face was lost).

    Returns a dict mapping ref-face index -> variant-face index for matched faces only.
    """
    pairs: list[tuple[float, int, int]] = []
    for i, (rx, ry) in enumerate(ref_centers):
        for j, (vx, vy) in enumerate(var_centers):
            pairs.append((((rx - vx) ** 2 + (ry - vy) ** 2) ** 0.5, i, j))
    pairs.sort()
    used_ref: set[int] = set()
    used_var: set[int] = set()
    matched: dict[int, int] = {}
    for dist, i, j in pairs:
        if i in used_ref or j in used_var:
            continue
        if dist > max_frac * ref_diags[i]:
            continue
        matched[i] = j
        used_ref.add(i)
        used_var.add(j)
    return matched
def _cosine(a: np.ndarray, b: np.ndarray) -> float:
@@ -325,15 +359,19 @@ def compare(original: str, variants: tuple[str, ...], ocr_langs: str, ground_tru
app.prepare(ctx_id=-1, det_size=(640, 640))
ref_faces = app.get(ref)
if ref_faces:
ref_centers = [_bbox_center(of.bbox) for of in ref_faces]
ref_diags = [_bbox_diag(of.bbox) for of in ref_faces]
for label, img in parsed:
vfaces = app.get(img)
st = face_stats[label]
for of in ref_faces:
vf = _match_face(of, vfaces)
if vf is None:
continue
# One-to-one assignment for identity (collision-free); lapvar/LPIPS stay
# anchored to the original bbox below, so they need no match.
matched = assign_faces_one_to_one(ref_centers, [_bbox_center(vf.bbox) for vf in vfaces], ref_diags)
for oi, of in enumerate(ref_faces):
st.n_faces += 1
st.identity.append(_cosine(of.normed_embedding, vf.normed_embedding))
vf = vfaces[matched[oi]] if oi in matched else None
if vf is not None:
st.identity.append(_cosine(of.normed_embedding, vf.normed_embedding))
oc, vc = _crop(ref, of.bbox), _crop(img, of.bbox)
if oc.size == 0 or vc.size == 0:
continue
+2 -2
@@ -762,7 +762,7 @@ def cmd_invisible(
vendor = vendor_for_strength(source)
console.print(f" Input: {source.name}")
console.print(f" Pipeline: {pipeline}")
console.print(f" Strength: {resolve_strength(strength, vendor)} Steps: {steps}")
console.print(f" Strength: {resolve_strength(strength, vendor, pipeline)} Steps: {steps}")
t0 = time.monotonic()
result_path = engine.remove_watermark(
@@ -1075,7 +1075,7 @@ def cmd_all(
# already lost its C2PA to the visible-removal pass, so reading it would
# always resolve to the unknown-vendor default.
vendor = vendor_for_strength(source)
console.print(f" Strength: {resolve_strength(strength, vendor)} Steps: {steps}")
console.print(f" Strength: {resolve_strength(strength, vendor, pipeline)} Steps: {steps}")
inv_engine.remove_watermark(
image_path=tmp_path,
output_path=tmp_path,
@@ -18,9 +18,10 @@ DEFAULT_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
# oracle floors (2026-06-20): OpenAI **0.10** (seed-robust -- clean on seeds 0-4) and
# Google/Gemini **0.25** (seed 0 verified on 2 images; pin a seed in prod, because the Gemini
# oracle's rate limits rule out a high-volume seed-repeat check). The Gemini floor (0.25) is HIGHER than the
# certified controlnet Gemini floor (0.15), and ``resolve_strength`` is shared/
# pipeline-independent, so pass an explicit ``--strength 0.25`` for Gemini content on
# this pipeline until a Qwen-specific ladder is wired into ``resolve_strength``.
# certified controlnet Gemini floor (0.15); ``resolve_strength(..., pipeline="qwen")``
# now carries this via ``_QWEN_VENDOR_STRENGTH`` (below), so ``--pipeline qwen`` gets the
# right floor automatically -- the old manual "pass --strength 0.25 for Gemini on qwen"
# workaround is retired.
# (Dispatch uses the bare "qwen" literal, matching the sdxl/controlnet sites, so there
# is no QWEN_PROFILE constant -- only the model id is referenced from code.)
QWEN_MODEL_ID = "Qwen/Qwen-Image"
@@ -90,6 +91,18 @@ DEFAULT_STRENGTH = UNKNOWN_STRENGTH
# Detected-vendor -> default strength. Vendor strings come from `vendor_for_strength`.
_VENDOR_STRENGTH = {"openai": OPENAI_STRENGTH, "google": GEMINI_STRENGTH}
# Qwen has its OWN certified floors (Modal A100-80GB, 2026-06-20), DIFFERENT from the
# SDXL ladder above: OpenAI 0.10 (seed-robust), Gemini 0.25 (HIGHER than controlnet's
# 0.15 -- the 20B MMDiT perturbs less per denoising step, so it needs more strength to
# clear Gemini SynthID). Unknown vendor tracks the higher (Gemini) value, safe-by-default.
# `resolve_strength(..., pipeline="qwen")` uses this table so `--pipeline qwen` carries the
# right floor automatically -- retiring the old manual "pass --strength 0.25 for Gemini on
# qwen" workaround.
QWEN_OPENAI_STRENGTH = 0.10
QWEN_GEMINI_STRENGTH = 0.25
QWEN_UNKNOWN_STRENGTH = 0.25
_QWEN_VENDOR_STRENGTH = {"openai": QWEN_OPENAI_STRENGTH, "google": QWEN_GEMINI_STRENGTH}
def strength_default_help() -> str:
"""One-line description of the vendor-adaptive default, derived from the constants.
@@ -103,20 +116,24 @@ def strength_default_help() -> str:
)
def resolve_strength(strength: float | None, vendor: str | None = None) -> float:
def resolve_strength(strength: float | None, vendor: str | None = None, pipeline: str | None = None) -> float:
"""Resolve the denoising strength, applying the vendor default when unset.
``None`` means "the user did not pass ``--strength``", which resolves
**vendor-adaptively**: ``vendor`` (``"openai"`` / ``"google"`` / None, from
``vendor_for_strength``) selects ``OPENAI_STRENGTH`` / ``GEMINI_STRENGTH`` /
``UNKNOWN_STRENGTH``. The same ladder applies to both pipelines (see the module
comment for why one ladder is correct). An explicit value always wins (including
``vendor_for_strength``) selects the per-vendor floor. The ``sdxl`` and ``controlnet``
pipelines share ONE ladder (``OPENAI_STRENGTH`` / ``GEMINI_STRENGTH`` /
``UNKNOWN_STRENGTH`` -- see the module comment for why); ``qwen`` has its OWN higher
ladder (``_QWEN_VENDOR_STRENGTH``, Gemini 0.25 vs controlnet 0.15), selected when
``pipeline`` normalizes to ``"qwen"``. An explicit value always wins (including
``0.0`` -- the check is ``is None``, not falsiness). Shared by the CLI (for display)
and the engine (for execution) so the two never disagree -- both must pass the SAME
``vendor``.
``vendor`` and ``pipeline``.
"""
if strength is not None:
return strength
if pipeline is not None and normalize_profile(pipeline) == "qwen":
return _QWEN_VENDOR_STRENGTH.get(vendor or "", QWEN_UNKNOWN_STRENGTH)
return _VENDOR_STRENGTH.get(vendor or "", UNKNOWN_STRENGTH)
@@ -322,6 +322,15 @@ _QWEN_PROMPT = "high quality, sharp, detailed, faithful to the original"
_QWEN_NEGATIVE = "blurry, lowres, distorted text, garbled text, artifacts"
def _qwen_target_size(width: int, height: int) -> tuple[int, int]:
    """Floor (width, height) to a multiple of 16 for Qwen's VAE/patchifier (>= 16).

    Pure; unit-tested. Without explicit dims the img2img pipeline defaults to a 1024x1024
    SQUARE and silently distorts any non-square input.
    """
    return max(16, (width // 16) * 16), max(16, (height // 16) * 16)
def _build_qwen_kwargs(
image: Image.Image, strength: float, num_inference_steps: int, true_cfg_scale: float, generator: Any
) -> dict[str, Any]:
@@ -329,7 +338,12 @@ def _build_qwen_kwargs(
Qwen-Image uses ``true_cfg_scale`` (not SDXL's ``guidance_scale``) and takes an
explicit ``negative_prompt``; the scrub still comes from the img2img ``strength``.
Passes an explicit ``height``/``width`` derived from the input (floored to /16): the
pipeline otherwise defaults to a 1024x1024 SQUARE, squishing any non-square input
(the abba mixed-seam test: a 2816x1536 poster came back 1024x1024, distorting the
scene and garbling text). So qwen regenerates at the input's own geometry.
"""
qw, qh = _qwen_target_size(image.width, image.height)
return {
"prompt": _QWEN_PROMPT,
"negative_prompt": _QWEN_NEGATIVE,
@@ -338,6 +352,8 @@ def _build_qwen_kwargs(
"num_inference_steps": num_inference_steps,
"true_cfg_scale": true_cfg_scale,
"generator": generator,
"height": qh,
"width": qw,
}
@@ -614,7 +630,7 @@ class WatermarkRemover:
if output_path is None:
output_path = image_path
strength = resolve_strength(strength, vendor)
strength = resolve_strength(strength, vendor, self.model_profile)
if not 0.0 <= strength <= 1.0:
raise ValueError(f"Strength must be between 0.0 and 1.0, got {strength}")
+76
@@ -0,0 +1,76 @@
"""Regression test for the one-to-one face matcher in ``scripts/fidelity_metrics.py``.
The shipped per-face nearest matcher collided on multi-face images (two original faces
both picking the same variant face when regeneration dropped a face), which inflated/
corrupted the identity metric. ``assign_faces_one_to_one`` is the collision-free
replacement. The function is pure (centers + diagonals in, index map out), so it is
tested here without insightface / the heavy PEP723 env. Caught on the gemini_3 Qwen
ControlNet experiment, where the original had 18 faces but the regenerated variants had
17, producing two collisions under the old matcher.
"""
from __future__ import annotations
import importlib.util
import sys
from pathlib import Path
import pytest
_SCRIPTS = Path(__file__).resolve().parent.parent / "scripts"
def _load_assign():
    # fidelity_metrics is a standalone PEP723 script, not an installed module; load it by
    # path with scripts/ on sys.path so its `_plain_console` shim import resolves.
    sys.path.insert(0, str(_SCRIPTS))
    try:
        spec = importlib.util.spec_from_file_location("fidelity_metrics", _SCRIPTS / "fidelity_metrics.py")
        assert spec is not None
        assert spec.loader is not None
        mod = importlib.util.module_from_spec(spec)
        sys.modules[spec.name] = mod  # @dataclass introspection needs the module registered
        spec.loader.exec_module(mod)
    except ImportError as exc:  # cv2/click absent in a bare env -> skip, not fail
        pytest.skip(f"fidelity_metrics import deps missing: {exc}")
    finally:
        sys.path.remove(str(_SCRIPTS))
    return mod.assign_faces_one_to_one


def test_distinct_faces_match_nearest() -> None:
    assign = _load_assign()
    ref = [(0.0, 0.0), (100.0, 100.0)]
    var = [(2.0, 1.0), (98.0, 102.0)]
    diags = [50.0, 50.0]
    assert assign(ref, var, diags) == {0: 0, 1: 1}
def test_no_collision_when_variant_drops_a_face() -> None:
    # Two original faces near the SAME single variant face: the old nearest matcher mapped
    # BOTH to index 0; one-to-one must match exactly one ref and drop the other. The two refs
    # are equidistant from the variant here, so the pair sort breaks the tie by ref index.
    assign = _load_assign()
    ref = [(10.0, 10.0), (14.0, 10.0)]  # both close to the lone variant
    var = [(12.0, 10.0)]
    diags = [50.0, 50.0]
    matched = assign(ref, var, diags)
    assert sorted(matched.values()) == [0]  # variant 0 used exactly once
    assert len(matched) == 1
def test_gate_drops_implausibly_far_match() -> None:
    assign = _load_assign()
    ref = [(0.0, 0.0)]
    var = [(1000.0, 1000.0)]  # far beyond 0.6 * diag
    diags = [50.0]
    assert assign(ref, var, diags) == {}


def test_assignment_is_one_to_one_over_many_faces() -> None:
    assign = _load_assign()
    ref = [(float(i * 100), 0.0) for i in range(18)]
    var = [(float(i * 100) + 3.0, 0.0) for i in range(17)]  # one fewer, as in the experiment
    diags = [50.0] * 18
    matched = assign(ref, var, diags)
    assert len(matched) == 17
    assert len(set(matched.values())) == 17  # every variant used at most once
+50 -5
@@ -126,6 +126,14 @@ class TestModelProfiles:
assert normalize_profile("CONTROLNET") == "controlnet"
class _StubImage:
    """Minimal PIL.Image stand-in: just the ``width``/``height`` the pure helper reads."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height
class TestQwenKwargs:
"""_build_qwen_kwargs is pure (no torch); guards the Qwen-Image call shape.
@@ -137,18 +145,37 @@ class TestQwenKwargs:
from remove_ai_watermarks.noai.watermark_remover import _build_qwen_kwargs
gen = object()
kwargs = _build_qwen_kwargs("IMG", strength=0.3, num_inference_steps=40, true_cfg_scale=4.0, generator=gen)
img = _StubImage(2816, 1536)
kwargs = _build_qwen_kwargs(img, strength=0.3, num_inference_steps=40, true_cfg_scale=4.0, generator=gen)
# Qwen uses true_cfg_scale, NOT SDXL's guidance_scale.
assert kwargs["true_cfg_scale"] == 4.0
assert "guidance_scale" not in kwargs
# The scrub still comes from strength; image + generator pass through.
assert kwargs["strength"] == 0.3
assert kwargs["image"] == "IMG"
assert kwargs["image"] is img
assert kwargs["generator"] is gen
# Faithful-regeneration prompt + an explicit negative prompt.
assert kwargs["prompt"]
assert kwargs["negative_prompt"]
    def test_passes_explicit_aspect_preserving_size(self):
        # Without height/width the pipeline defaults to 1024x1024 and squishes non-square
        # input (the abba mixed-seam regression). Both already multiples of 16 -> unchanged.
        from remove_ai_watermarks.noai.watermark_remover import _build_qwen_kwargs

        kwargs = _build_qwen_kwargs(
            _StubImage(2816, 1536), strength=0.25, num_inference_steps=40, true_cfg_scale=4.0, generator=None
        )
        assert kwargs["width"] == 2816
        assert kwargs["height"] == 1536

    def test_qwen_target_size_floors_to_multiple_of_16(self):
        from remove_ai_watermarks.noai.watermark_remover import _qwen_target_size

        assert _qwen_target_size(2816, 1536) == (2816, 1536)  # already /16
        assert _qwen_target_size(1122, 1402) == (1120, 1392)  # floored
        assert _qwen_target_size(10, 10) == (16, 16)  # min clamp, never 0
def test_qwen_model_id_is_qwen_image(self):
from remove_ai_watermarks.noai.watermark_profiles import QWEN_MODEL_ID
@@ -159,15 +186,33 @@ class TestResolveStrength:
"""resolve_strength applies the vendor default only when strength is unset."""
def test_none_is_vendor_adaptive(self):
# No vendor -> unknown default; OpenAI lower, Google == unknown. The SAME ladder
# applies to both pipelines (the certified controlnet floors), so there is no
# pipeline argument.
# No vendor -> unknown default; OpenAI lower, Google == unknown. The sdxl/controlnet
# pipelines share this ladder (the certified controlnet floors); qwen has its own
# (see test_qwen_pipeline_uses_its_own_higher_ladder).
assert resolve_strength(None) == UNKNOWN_STRENGTH
assert resolve_strength(None, "openai") == OPENAI_STRENGTH
assert resolve_strength(None, "google") == GEMINI_STRENGTH
assert resolve_strength(None, None) == UNKNOWN_STRENGTH
# An unrecognized vendor string falls through to the unknown default.
assert resolve_strength(None, "adobe") == UNKNOWN_STRENGTH
# sdxl/controlnet pipelines (and the "default" alias) use the same shared ladder.
assert resolve_strength(None, "google", "controlnet") == GEMINI_STRENGTH
assert resolve_strength(None, "google", "sdxl") == GEMINI_STRENGTH
    def test_qwen_pipeline_uses_its_own_higher_ladder(self):
        # Qwen's certified Gemini floor (0.25) is HIGHER than controlnet's (0.15); OpenAI
        # matches (0.10). Unknown vendor on qwen tracks the higher Gemini value. This retires
        # the old manual "pass --strength 0.25 for Gemini on qwen" workaround.
        from remove_ai_watermarks.noai.watermark_profiles import QWEN_GEMINI_STRENGTH, QWEN_OPENAI_STRENGTH

        assert QWEN_GEMINI_STRENGTH == 0.25
        assert QWEN_OPENAI_STRENGTH == 0.10
        assert resolve_strength(None, "google", "qwen") == QWEN_GEMINI_STRENGTH
        assert resolve_strength(None, "openai", "qwen") == QWEN_OPENAI_STRENGTH
        assert resolve_strength(None, None, "qwen") == QWEN_GEMINI_STRENGTH  # unknown -> higher floor
        assert resolve_strength(None, "google", "qwen") > resolve_strength(None, "google", "controlnet")
        # An explicit strength still wins on qwen.
        assert resolve_strength(0.12, "google", "qwen") == 0.12
def test_ladder_is_the_certified_controlnet_floors(self):
# The unified ladder == the oracle-certified controlnet floors. Lowered on the