fix(qwen): native-geometry img2img + pipeline-aware strength; record dropped auto/mixed/Z-Image leads

- watermark_remover: _build_qwen_kwargs now passes explicit height/width (via _qwen_target_size, floored to /16). Without it QwenImageImg2ImgPipeline defaults to 1024x1024 and silently squishes non-square inputs, distorting the scene and garbling text.
- watermark_profiles: resolve_strength gains a `pipeline` arg + a Qwen strength ladder (_QWEN_VENDOR_STRENGTH, Gemini 0.25), so `--pipeline qwen` gets its certified floor automatically; retires the manual "pass --strength 0.25 for Gemini on qwen" workaround.
- fidelity_metrics: replace per-face nearest matching (collided on multi-face images when a variant dropped a face, corrupting the identity metric) with a collision-free one-to-one assignment (assign_faces_one_to_one). lapvar/LPIPS were always bbox-anchored and so were unaffected by the collision. Regression-guarded by tests/test_fidelity_matching.py.
- docs: record the measured outcomes of the qwen-improvement arc. The Qwen ControlNet face-fix is CLOSED (no permissive Qwen detail/tile ControlNet exists; canny carries edges, not skin grain). The `--pipeline auto` router + faces+text mixed dual-pass were prototyped and DROPPED (controlnet wins faces AND display text: abba CER 0.114 vs qwen 0.379). Z-Image-Turbo was tried and dropped (same regeneration limits). qwen stays a manual opt-in; controlnet is the default for everything.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
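The geometry fix in the first bullet comes down to a small pure helper. A minimal sketch, assuming pixel width/height inputs and the /16 floor described above; `qwen_target_size` and `build_qwen_kwargs` are hypothetical stand-ins for `_qwen_target_size`/`_build_qwen_kwargs` (their real signatures are not shown here), and the kwarg names assume the diffusers `QwenImageImg2ImgPipeline` call arguments.

```python
from typing import Any


def qwen_target_size(width: int, height: int, multiple: int = 16) -> tuple[int, int]:
    """Floor each side to a multiple of `multiple` so the output keeps the input's shape."""
    if width < multiple or height < multiple:
        raise ValueError(f"{width}x{height} is below the {multiple}px minimum side")
    return (width // multiple) * multiple, (height // multiple) * multiple


def build_qwen_kwargs(image: Any, width: int, height: int, *, strength: float,
                      steps: int, true_cfg_scale: float,
                      prompt: str, negative_prompt: str) -> dict[str, Any]:
    """Assemble img2img call kwargs (hypothetical; mirrors the diffusers argument names)."""
    w, h = qwen_target_size(width, height)
    return {
        "image": image,
        "prompt": prompt,
        "negative_prompt": negative_prompt,  # explicit negative prompt
        "strength": strength,                # the scrub lever
        "num_inference_steps": steps,
        "true_cfg_scale": true_cfg_scale,    # Qwen's CFG knob, not SDXL's guidance_scale
        # Without explicit geometry the pipeline renders a 1024x1024 square.
        "width": w,
        "height": h,
    }
```

The abba input (2816x1536) is already a multiple of 16 on both sides, so it keeps its exact size; a 1023x767 input would be floored to 1008x752 rather than stretched to a square.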
2026-07-04 15:37:49 +02:00 · 2026-06-20 21:52:56 -07:00
parent 8f64869bfc
commit d5dd24140c
11 changed files with 307 additions and 36 deletions
@@ -50,3 +50,6 @@ data/samsung_capture/captures/samsung_content_*
 # (GFPGAN wrote RetinaFace/parsing weights to a CWD ./gfpgan/weights/ working
 # dir on first use). Runtime artifact, never committed.
 gfpgan/
+
+# Qwen ControlNet experiment outputs (throwaway eval; never the committed corpus)
+scripts/_qwen_exp_out/
@@ -18,7 +18,7 @@ Consequences for contributors (do not drift back into the stock niche just becau
 ## How to run

 - `uv run remove-ai-watermarks all <image.png> -o <output.png>` — full pipeline (visible + invisible + metadata). Same diffusion knobs as `invisible` below, plus the visible-pass `--inpaint/--no-inpaint`/`--inpaint-method`. **When the `[gpu]` extra is absent, step 2 (invisible/SynthID) is skipped** — `all` still writes an output (visible mark + metadata stripped) but prints a prominent end-of-run banner ("the invisible (SynthID) watermark was NOT removed") AND exits **non-zero** (1), so a skipped SynthID pass is not mistaken for a clean result (the recurring #14/#47 trap, where the old quiet inline warning was missed). `invisible` already hard-errors without the extra; only `all` continued, hence the loud end-banner. Regression-guarded by `tests/test_cli.py::TestAllCommand::test_all_loud_warning_and_nonzero_exit_when_gpu_missing`. **Test trap:** any `all` test that exercises the full pipeline MUST `patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True)` — CI installs core+dev only (no `[gpu]`), so an unpatched `all` test takes the skip branch and now hits the non-zero exit. This passed locally (gpu present → `is_available()` True) but red-failed every matrix cell on the v0.11.0 commit (`test_all_basic`/`test_all_visible_step_uses_registry` asserted exit 0); both now patch `is_available` True.
- `uv run remove-ai-watermarks invisible <image.png> -o <out.png>` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet|qwen` (default `controlnet`), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default).
+- `uv run remove-ai-watermarks invisible <image.png> -o <out.png>` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet|qwen` (default `controlnet`; `qwen` is a manual opt-in only — see the qwen note in the module map), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default).
 - `uv run remove-ai-watermarks visible <image.png> -o <out.png>` — known-visible-mark removal, CPU, no GPU. Reverse-alpha based: each mark is removed by inverting its captured alpha map. `--mark auto` (default) picks the strongest detected of the Gemini sparkle, the Doubao "豆包AI生成" text strip, the Jimeng "★ 即梦AI" wordmark, and the Samsung Galaxy AI "✦ Contenuti generati dall'AI" strip (bottom-LEFT, locale-specific — Italian variant calibrated); `--mark gemini` / `--mark doubao` / `--mark jimeng` / `--mark samsung` force one (choices come from the registry). Gemini/Doubao recover pixels exactly with no inpaint at native; **Jimeng and Samsung add an always-on thin residual inpaint over the glyph footprint** (their marks re-rasterize per image, so reverse-alpha alone leaves a faint outline). For arbitrary logos/objects use `erase`. **When `--mark auto` finds no known mark (the common case — ~74% of real uploads carry no registered visible mark), the command does NOT silently re-serve the input as a finished result.** It runs a cheap metadata-only `identify`, prints actionable guidance (if the image carries an invisible/metadata mark, e.g. an OpenAI/Gemini C2PA image, it points to `all`; otherwise it does NOT imply the image is clean -- it warns that an invisible pixel watermark like SynthID cannot be detected once the metadata proxy is gone and routes to both `all` and `erase --region`), writes NO output file, and exits **`EXIT_NO_VISIBLE_MARK` (2)** — distinct from success (0) and a hard error (1) so a wrapping service (raiw.cc) can surface the message instead of treating the unchanged image as done (the production "it didn't work" / score-0 trap). Same handling for an explicit `--mark <name>` that is not detected. Helper `cli._no_visible_mark_exit`; regression-guarded by `tests/test_cli.py::TestVisibleCommand::test_visible_auto_no_mark_exits_two_with_eraser_hint` and `test_visible_auto_no_mark_routes_to_all_when_metadata`. `--no-detect` still forces the gemini fallback and proceeds (exit 0).
 - `uv run remove-ai-watermarks erase <image.png> --region x,y,w,h -o <out.png>` — universal region eraser (any logo/object, any position). `--backend cv2` (default, no deps) or `--backend lama` (big-LaMa via onnxruntime, extra `lama`); `--region` is repeatable.
 - `uv run remove-ai-watermarks identify <image>` — provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector
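For a wrapping service, the `visible` exit contract (0 output written, 1 hard error, 2 `EXIT_NO_VISIBLE_MARK` with no output file) reduces to a three-way branch. A minimal caller-side sketch, assuming the CLI is run as a subprocess via `uv run`; `VisibleOutcome`, `classify_exit` and `run_visible` are hypothetical names, and because the notes don't say whether the guidance goes to stdout or stderr, the sketch returns both.

```python
import subprocess
from enum import Enum


class VisibleOutcome(Enum):
    DONE = "done"                # exit 0: output file written
    NO_VISIBLE_MARK = "no_mark"  # exit 2: no output file; surface the printed guidance
    ERROR = "error"              # exit 1 (or anything else): hard failure


def classify_exit(returncode: int) -> VisibleOutcome:
    if returncode == 0:
        return VisibleOutcome.DONE
    if returncode == 2:  # EXIT_NO_VISIBLE_MARK
        return VisibleOutcome.NO_VISIBLE_MARK
    return VisibleOutcome.ERROR


def run_visible(src: str, dst: str) -> tuple[VisibleOutcome, str]:
    proc = subprocess.run(
        ["uv", "run", "remove-ai-watermarks", "visible", src, "-o", dst],
        capture_output=True, text=True, check=False,
    )
    # Hand the CLI's own text back so the caller can show its guidance verbatim.
    return classify_exit(proc.returncode), proc.stdout + proc.stderr
```

The same "non-zero is not a clean result" rule covers `all`, which exits 1 when the `[gpu]` extra is missing and the SynthID pass was skipped.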
@@ -61,7 +61,7 @@ Compact map. The full per-module detail (design decisions, tuned thresholds, cal
 - `region_eraser.py` — universal region eraser (`erase` CLI): cv2 backend default (no deps), optional big-LaMa via onnxruntime (~3.5-4 GB peak RAM, ~5-6 s/call CPU — does not fit a minimal droplet).
 - `invisible_watermark.py` — decodes the OPEN DWT-DCT watermarks (SD / SDXL / FLUX) via `imwatermark` (extra `detect`, pulls torch). Fragile two ways: (1) does not survive JPEG re-encode/resize; (2) **carrier-fragile on a broad class of pristine images** -- a clean encode->decode round-trip recovers 48/48 on chatgpt/firefly/random but FAILS (28-39/48, below the `_MATCH_48`=44 gate) on the FLUX fox, doubao, a flat FLUX generation, AND a clean synthetic flat fill with no watermark. The failure does NOT track texture; it goes with a degenerate **all-ones decode that is a CARRIER ARTIFACT, not a watermark** (synthetic clean image reproduces it). So `detect_invisible_watermark` is **positive-only**: trust a hit; a `None` is inconclusive unless a same-carrier positive-control embed first recovers >=44. Verified 2026-06-19; full caveat in `docs/watermarking-landscape.md`.
 - `trustmark_detector.py` — Adobe TrustMark open decoder (extra `trustmark`). Do NOT remove the JPEG re-encode false-positive gate — a lone TrustMark hit without it is almost always content noise.
- `noai/watermark_remover.py` — `WatermarkRemover` with three diffusion pipelines selected by the explicit `pipeline` ctor arg, never inferred from `model_id`: `sdxl` (plain SDXL img2img), `controlnet` (SDXL + canny ControlNet, **the DEFAULT since 2026-06-09**), and `qwen` (Qwen-Image 20B MMDiT img2img, Apache-2.0, CUDA/cloud-class — best **text** preservation (incl. CJK); `_load_qwen_pipeline`/`_run_qwen`, bf16, no MPS fallback; call shape in the pure `_build_qwen_kwargs` using `true_cfg_scale`). Removal comes from the img2img `strength`; ControlNet only preserves text/face STRUCTURE — SynthID CAN survive controlnet on photoreal content at low strength. Qwen CERTIFIED oracle floors (2026-06-20): OpenAI **0.10** (seed-robust, clean on seeds 0-4), Gemini **0.25** (seed 0 verified, pin a seed — Gemini oracle rate-limits volume; higher than the controlnet Gemini floor 0.15, so pass explicit `--strength 0.25` for Gemini on `qwen` until a Qwen ladder is wired). Fidelity measured by `scripts/fidelity_metrics.py` (OCR-CER / ArcFace / LPIPS / Laplacian-var, NOT eyeball), compared ONLY at each pipeline's oracle-confirmed scrub floor (where SynthID is removed in BOTH — equal-strength is invalid where it leaves one un-scrubbed): Qwen wins TEXT (incl. CJK), controlnet wins FACES (Qwen smooths faces more) — Qwen is the text-preserving remover, not a universal fidelity win. No face-restore extra ships, by validated decision (every restore approach looked MORE AI-generated). `remove_watermark(region=(x,y,w,h), region_feather=...)` runs the regeneration but feather-composites only the AI box back over the original (via `noai/tiling.feather_region_composite`), preserving the real photo elsewhere — the **AI-enhanced composite** path (`identify` `ai_source_kind == "enhanced"`); the box is supplied by the caller (a C2PA composite manifest carries no reliable machine-readable region, so we do not fabricate one).
+- `noai/watermark_remover.py` — `WatermarkRemover` with three diffusion pipelines selected by the explicit `pipeline` ctor arg, never inferred from `model_id`: `sdxl` (plain SDXL img2img), `controlnet` (SDXL + canny ControlNet, **the DEFAULT since 2026-06-09**), and `qwen` (Qwen-Image 20B MMDiT img2img, Apache-2.0, CUDA/cloud-class — best **text** preservation (incl. CJK); `_load_qwen_pipeline`/`_run_qwen`, bf16, no MPS fallback; call shape in the pure `_build_qwen_kwargs` using `true_cfg_scale`). Removal comes from the img2img `strength`; ControlNet only preserves text/face STRUCTURE — SynthID CAN survive controlnet on photoreal content at low strength. Qwen CERTIFIED oracle floors (2026-06-20): OpenAI **0.10** (seed-robust, clean on seeds 0-4), Gemini **0.25** (seed 0 verified, pin a seed — Gemini oracle rate-limits volume; higher than the controlnet Gemini floor 0.15). `resolve_strength(..., pipeline="qwen")` carries the Qwen ladder (`_QWEN_VENDOR_STRENGTH`), so `--pipeline qwen` gets the 0.25 Gemini floor automatically (the old manual `--strength 0.25` workaround is retired). `_build_qwen_kwargs` passes an explicit `height`/`width` from the input (floored to /16 via `_qwen_target_size`) — without it the pipeline defaults to a 1024x1024 SQUARE and silently squishes non-square inputs (fixed 2026-06-20). **`qwen` is a MANUAL opt-in only — there is NO auto-router.** Measured (`scripts/fidelity_metrics.py`, OCR-CER / ArcFace / LPIPS / Laplacian-var, NOT eyeball): qwen beats controlnet on ONE niche only — **clean body text on a plain background, no faces** (openai_1/2 CER 0.241 vs 0.385). controlnet wins FACES (it always has) AND **display/decorative text in a scene** (abba poster: controlnet CER 0.114 vs qwen 0.379 — canny holds letter shapes, qwen re-renders and garbles them). So a content `--pipeline auto` router and a faces+text **mixed dual-pass** were prototyped and **DROPPED** (2026-06-20): on the canonical faces+text case controlnet wins every metric incl. text, so mixed loses; and "text→qwen" can't be auto-decided (it is body-vs-display text that matters, undetectable cheaply). qwen stays for callers who KNOW their content is clean-text-heavy and face-free. No face-restore extra ships, by validated decision (every restore approach looked MORE AI-generated). `remove_watermark(region=(x,y,w,h), region_feather=...)` runs the regeneration but feather-composites only the AI box back over the original (via `noai/tiling.feather_region_composite`), preserving the real photo elsewhere — the **AI-enhanced composite** path (`identify` `ai_source_kind == "enhanced"`); the box is supplied by the caller (a C2PA composite manifest carries no reliable machine-readable region, so we do not fabricate one).
 - `noai/tiling.py` — sliding-window tiled diffusion for large inputs (CLI `--tile`). `WatermarkRemover.remove_watermark` branches to `run_tiled` when `tile` is set AND the long side exceeds `tile_size`, refactoring the single-pass `_generate` into a per-tile `_generate_one` (the ControlNet edge map is rebuilt per tile inside it). Pure helpers `plan_tiles` (uniform-size tiles, last one flush to the edge) and `feather_weights` (strictly-positive separable taper -> partition-of-unity blend) are unit-tested without the model. Also home to `feather_region_composite(base, regenerated, box, *, feather)` — the pure region-targeted compositor for **AI-enhanced composites** (`ai_source_kind == "enhanced"`): blends the regenerated AI box back over the original with a feathered seam, leaving the real photo OUTSIDE the box pixel-exact. It backs `WatermarkRemover.remove_watermark(region=...)` (regenerate ONLY the AI region, not the whole frame); the no-model lossless region path stays `region_eraser.erase`. New tile/region-blend tuning goes in these pure helpers; do not inline blend math into the runner.
 - `auto_config.py` + the content-detection layer were REMOVED 2026-06-09; `--auto` is a deprecated no-op (controlnet is the default pipeline and the adaptive polish is ON by default and self-gates to a no-op where there is no detail deficit).
 - `upscaler.py` — optional Real-ESRGAN pre-diffusion super-resolution for small inputs (extra `esrgan`, spandrel only). Manual opt-in; the default `--upscaler` stays `lanczos` and the engine always falls back to Lanczos on absence/error. ESRGAN can degrade faces and thin text.
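The tile plan described for `noai/tiling.py` (uniform-size tiles, last one flush to the edge) can be sketched in pure Python. This is an illustrative reimplementation, not the module's code: `plan_axis`/`plan_tiles_sketch` are hypothetical, tiles are `(x, y, w, h)` like the `region` tuple, and the defaults mirror the CLI's `--tile-size` 1024 and `--tile-overlap` 128.

```python
def plan_axis(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets along one axis: uniform tiles, the last one flush with the edge."""
    if length <= tile:
        return [0]
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile ends exactly at the far edge
    return starts


def plan_tiles_sketch(width: int, height: int, tile: int = 1024,
                      overlap: int = 128) -> list[tuple[int, int, int, int]]:
    """Row-major (x, y, w, h) tiles; every seam overlaps by at least `overlap` px."""
    tw, th = min(tile, width), min(tile, height)
    return [(x, y, tw, th)
            for y in plan_axis(height, tile, overlap)
            for x in plan_axis(width, tile, overlap)]
```

For the 2816x1536 abba input this yields three columns (x = 0, 896, 1792) by two rows (y = 0, 512). Because the flush tile can only move closer to its neighbour, the last seam never overlaps by less than `overlap`, which is what a feathered blend needs.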
@@ -144,8 +144,8 @@ The scrub still comes from the img2img `strength` (same lever as SDXL); the call
 - **Text:** Qwen wins on substantial Latin/mixed-script text -- OCR CER, controlnet vs Qwen: openai_1 (EN+RU+ZH, both 0.10) 0.385 vs **0.241**, openai_2 (EN, both 0.10) 0.341 vs **0.290**. On a SHORT CJK sign (gemini_1, cnet 0.15 / Qwen 0.25) it is a TIE (0.037 vs 0.037 -- both near-perfect; the earlier Qwen 0.000 was at the higher 0.30, not the certified floor).
 - **Faces:** controlnet wins -- gemini_3, 18 faces (cnet 0.15 / Qwen 0.25): ArcFace identity 0.546 vs 0.382, Laplacian-variance retention 0.62 vs 0.40, face LPIPS 0.09 vs 0.17 (Qwen smooths faces MORE; the gap narrows vs Qwen 0.30 but controlnet still wins clearly).

-**Conclusion: Qwen is the better TEXT-preserving remover (substantial Latin/mixed text), NOT a universal fidelity win — controlnet's canny edge map holds face skin detail better, so the path is a content-routed lane (text→qwen, faces→controlnet), not a blanket migration.** Caveat: `resolve_strength` is shared and pipeline-independent, so the Gemini default (0.15) UNDER-scrubs Gemini on `qwen` (floor 0.25) — pass `--strength 0.25` for Gemini on `qwen` until a Qwen ladder is wired. Flat-graphic content was not in the sample.
+**Conclusion: Qwen wins TEXT only for clean body text on a plain background with NO faces; controlnet wins faces AND display/decorative text in a scene. So `qwen` is a MANUAL `--pipeline qwen` opt-in, not a routed lane.** A content `--pipeline auto` router + a faces+text mixed dual-pass were prototyped and DROPPED (2026-06-20): on the canonical faces+text case (the abba poster, faces + display text) controlnet won EVERY metric incl. text (CER 0.114 vs qwen 0.379), so grafting qwen text only hurts; and "text→qwen" is undecidable cheaply (body-vs-display text is what matters). Strength: `resolve_strength(..., pipeline="qwen")` carries the Qwen ladder (`_QWEN_VENDOR_STRENGTH`, Gemini 0.25), so `--pipeline qwen` gets the 0.25 Gemini floor automatically; the old manual `--strength 0.25` workaround is retired. `_build_qwen_kwargs` now passes an explicit height/width (qwen squished non-square inputs to 1024² without it). Flat-graphic content was not in the sample.

-**Improving Qwen (ship vs improve):** the cited research on fixing the face-smoothing while keeping the text win (Qwen-Image ControlNet for structure conditioning, Qwen-Image-Edit, Z-Image-Turbo as a cheaper text-preserving substitute, non-regenerative detail restoration) lives in `docs/qwen-improvement-research.md` -- read it before extending the `qwen` pipeline. Verdict: shippable now as an opt-in text lane; the strongest improvement lead is adding a Qwen-Image ControlNet, but no improvement has measured face-fidelity at our floors yet (validate with `scripts/fidelity_metrics.py` first).
+**Improving Qwen (ship vs improve):** the cited research lives in `docs/qwen-improvement-research.md` -- read it before extending the `qwen` pipeline. Verdict: shippable as an opt-in text lane. **The "add a Qwen-Image ControlNet to fix face smoothing" lead was built, measured, and CLOSED (2026-06-20):** a DiffSynth-Studio Qwen + Apache-2.0 blockwise-canny ControlNet at the Gemini floor 0.25 did NOT restore face skin texture (face Laplacian-variance retention flat 0.40 -> 0.40, 13/16 faces within +-0.02; the SDXL+canny target 0.62 was not approached), because canny carries edges not skin grain and Qwen's higher Gemini floor (0.25 vs SDXL+canny 0.15) forces more smoothing -- and a deep-research sweep confirmed NO permissively-licensed Qwen tile/detail/realism/skin ControlNet exists anywhere (every Qwen conditioning is geometry). So **faces stay on SDXL+controlnet; Qwen is the text lane, not a face fix.** The strongest remaining lead is **Z-Image-Turbo** (6B, Apache-2.0, `ZImageImg2ImgPipeline`, scrub mechanism preserved) -- its own SynthID floor and face/text fidelity are UNMEASURED; that is the next experiment. Non-regenerative high-frequency detail re-injection is NOT safe by assumption (the "clean-output high frequencies do not carry the watermark" claim was refuted) -- it must be oracle-gated. Always validate any improvement at the certified floors with `scripts/fidelity_metrics.py` first.

 **Seed as a quality lever (measured, openai_1 at 0.10, seeds 0-4):** the seed barely moves whole-image fidelity (img LPIPS 0.062-0.065, SSIM 0.855-0.857, PSNR 28.5-28.7 — flat) but does shift TEXT legibility (OCR CER 0.241-0.290, ~17% spread) -- the seed changes WHICH details get regenerated, not the overall level. So a per-image best-of-N-seed selection is a WEAK, text-only lever (pick the lowest-CER seed that still scrubs; fidelity selection needs no oracle). Not worth the N× cost for general use -- pin one decent seed in prod; reserve best-of-N for text-heavy premium cases.
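The OCR-CER figures and the best-of-N seed rule can be made concrete. A minimal sketch, assuming CER is the standard character-level edit distance divided by the reference length (`scripts/fidelity_metrics.py` may normalise text differently); `pick_best_seed` is hypothetical, and its `scrubbed` flag stands in for an oracle verdict the caller must supply.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Character edit distance (insert/delete/substitute, unit cost), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate against the reference text."""
    if not reference:
        return 0.0 if not hypothesis else 1.0  # convention for an empty reference
    return levenshtein(reference, hypothesis) / len(reference)


def pick_best_seed(reference: str, runs: dict[int, tuple[str, bool]]) -> Optional[int]:
    """runs maps seed -> (OCR text of the output, oracle says scrubbed).

    Only scrubbed outputs are eligible; among them the lowest CER wins
    (ties go to the smaller seed). Returns None if no seed scrubbed.
    """
    scores = {seed: cer(reference, text) for seed, (text, ok) in runs.items() if ok}
    if not scores:
        return None
    return min(scores, key=lambda s: (scores[s], s))
```

The scrub check comes first on purpose: a perfectly legible output that still carries SynthID is not a candidate at all.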
@@ -185,7 +185,7 @@ Root cause: bad alpha (under-estimated, max ~0.65) + fixed-no-inpaint + tight bo

 **`sdxl`** (renamed from `default` 2026-06-09; `default` kept as a back-compat alias via `normalize_profile`) runs plain SDXL img2img (`_run_img2img`); it is the lighter opt-down alternative (no ControlNet weights).

-**`qwen`** (`_run_qwen`, `_load_qwen_pipeline`) runs `QwenImageImg2ImgPipeline` on `Qwen/Qwen-Image` (20B MMDiT, Apache-2.0 code AND weights). The scrub still comes from the img2img `strength`; Qwen's value is **text preservation** (incl. CJK and small text). **Metric-measured nuance (2026-06-19, `scripts/fidelity_metrics.py`, do NOT trust the eyeball here — it misled). Compare ONLY at each pipeline's oracle-confirmed scrub floor (outputs where SynthID is removed in BOTH — an equal-strength compare is invalid where it leaves one un-scrubbed; Qwen at 0.15 does not clear Gemini): Qwen wins TEXT (lower OCR CER across EN/RU/ZH, perfect Chinese) but controlnet wins FACES (higher Laplacian-variance retention and lower LPIPS — Qwen smooths faces MORE; ArcFace identity favors controlnet 0.546 vs 0.331 at the Gemini floors).** So Qwen is the better text-preserving remover, NOT a universal fidelity win — controlnet's canny edge map holds face skin detail better. Specifics: bf16 on CUDA (fp16 risks overflow on the 20B MMDiT — see the dtype branch in `__init__`); loads `QWEN_MODEL_ID` unless `--model` is overridden; the call shape lives in the pure module helper `_build_qwen_kwargs` (unit-tested without torch in `tests/test_platform.py::TestQwenKwargs`), which uses Qwen's `true_cfg_scale` (NOT SDXL's `guidance_scale` — the CLI `--guidance-scale` maps onto it; ~4.0 is typical, the SDXL default 7.5 is high for Qwen) and an explicit `negative_prompt` (`_QWEN_PROMPT`/`_QWEN_NEGATIVE`). It is CUDA/cloud-class (the 20B does not fit MPS), so `_run_qwen` has NO MPS->CPU fallback — an error propagates. `_load_qwen_pipeline` raises a clear ImportError if the installed diffusers lacks `QwenImageImg2ImgPipeline`. **CERTIFIED oracle floors (Modal A100-80GB, 2026-06-20): OpenAI 0.10 (seed-robust — clean on seeds 0-4), Gemini 0.25 (seed 0 verified on 2 images; the Gemini oracle rate-limits volume seed-repeat, so PIN a seed in prod). The Gemini floor (0.25) is HIGHER than the certified controlnet Gemini floor (0.15), and `resolve_strength` is shared/pipeline-independent, so pass an explicit `--strength 0.25` for Gemini content on `qwen` until a Qwen-specific ladder is wired into `resolve_strength`.** Fidelity vs controlnet was measured at the certified floors (`scripts/fidelity_metrics.py`), NOT eyeballed — see the metric nuance above: Qwen wins substantial text, controlnet wins faces.
+**`qwen`** (`_run_qwen`, `_load_qwen_pipeline`) runs `QwenImageImg2ImgPipeline` on `Qwen/Qwen-Image` (20B MMDiT, Apache-2.0 code AND weights). The scrub still comes from the img2img `strength`; Qwen's value is **text preservation** (incl. CJK and small text). **Metric-measured nuance (2026-06-19, `scripts/fidelity_metrics.py`, do NOT trust the eyeball here — it misled). Compare ONLY at each pipeline's oracle-confirmed scrub floor (outputs where SynthID is removed in BOTH — an equal-strength compare is invalid where it leaves one un-scrubbed; Qwen at 0.15 does not clear Gemini): Qwen wins TEXT (lower OCR CER across EN/RU/ZH, perfect Chinese) but controlnet wins FACES (higher Laplacian-variance retention and lower LPIPS — Qwen smooths faces MORE; ArcFace identity favors controlnet 0.546 vs 0.331 at the Gemini floors).** So Qwen is the better text-preserving remover, NOT a universal fidelity win — controlnet's canny edge map holds face skin detail better. Specifics: bf16 on CUDA (fp16 risks overflow on the 20B MMDiT — see the dtype branch in `__init__`); loads `QWEN_MODEL_ID` unless `--model` is overridden; the call shape lives in the pure module helper `_build_qwen_kwargs` (unit-tested without torch in `tests/test_platform.py::TestQwenKwargs`), which uses Qwen's `true_cfg_scale` (NOT SDXL's `guidance_scale` — the CLI `--guidance-scale` maps onto it; ~4.0 is typical, the SDXL default 7.5 is high for Qwen) and an explicit `negative_prompt` (`_QWEN_PROMPT`/`_QWEN_NEGATIVE`). It is CUDA/cloud-class (the 20B does not fit MPS), so `_run_qwen` has NO MPS->CPU fallback — an error propagates. `_load_qwen_pipeline` raises a clear ImportError if the installed diffusers lacks `QwenImageImg2ImgPipeline`. **CERTIFIED oracle floors (Modal A100-80GB, 2026-06-20): OpenAI 0.10 (seed-robust — clean on seeds 0-4), Gemini 0.25 (seed 0 verified on 2 images; the Gemini oracle rate-limits high-volume seed repeats, so PIN a seed in prod). The Gemini floor (0.25) is HIGHER than the certified controlnet Gemini floor (0.15); `resolve_strength(..., pipeline="qwen")` carries the Qwen ladder (`_QWEN_VENDOR_STRENGTH`), so `--pipeline qwen` gets the 0.25 Gemini floor automatically -- the old manual `--strength 0.25` workaround is retired. `_build_qwen_kwargs` passes an explicit `height`/`width` from the input (floored to /16 via the pure `_qwen_target_size`); WITHOUT it the img2img pipeline defaults to a 1024x1024 SQUARE and silently squishes non-square inputs (the abba 2816x1536 case came back 1024x1024, distorting the scene and garbling text — fixed 2026-06-20, tested in `TestQwenKwargs`).** Fidelity vs controlnet was measured at the certified floors (`scripts/fidelity_metrics.py`), NOT eyeballed. **`qwen` is a MANUAL opt-in only — there is NO auto-router (one was prototyped and DROPPED, see below).** It wins ONE niche: clean body text on a plain background, NO faces (openai_1/2 CER 0.241 vs 0.385). controlnet wins FACES and **display/decorative text in a scene** (abba poster: controlnet CER 0.114 vs qwen 0.379 — canny holds letter shapes; qwen re-renders and garbles them). **`--pipeline auto` + a faces+text mixed dual-pass were built and DROPPED (2026-06-20):** on the canonical faces+text case (abba) controlnet wins EVERY metric incl. text, so grafting qwen text would only hurt; and "text→qwen" is undecidable cheaply (it is body-vs-display text that matters). The router/detector/mixed modules were removed; the geometry fix + the Qwen strength ladder were kept (they make the manual `--pipeline qwen` correct). **Do NOT retry "add a Qwen ControlNet to close the face gap" — it was built, measured, and CLOSED 2026-06-20:** a DiffSynth blockwise-canny Qwen ControlNet did not restore face skin texture (lapvar flat 0.40, canny carries edges not skin grain) and no permissively-licensed Qwen tile/detail/skin ControlNet exists anywhere (all conditioning is geometry). Faces stay on controlnet; the next improvement lead is Z-Image-Turbo (Apache-2.0, unmeasured floor). Full record + the deep-research sweep in `docs/qwen-improvement-research.md`.

 **`controlnet`** (**the DEFAULT pipeline since 2026-06-09** for `invisible`/`all`/`batch` and both engine ctors; `_run_controlnet`, `_load_controlnet_pipeline`) runs `StableDiffusionXLControlNetImg2ImgPipeline` with the SDXL-native canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` (`watermark_profiles.CONTROLNET_CANNY_MODEL`): the control image is `cv2.Canny(gray, 100, 200)` stacked to 3 channels (`_CANNY_LOW`/`_CANNY_HIGH`, prompt `_CONTROLNET_PROMPT` / `_CONTROLNET_NEGATIVE`).
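The pipeline-aware strength lookup can be illustrated in a few lines. A minimal sketch, not `watermark_profiles.resolve_strength` itself: the ladders hold only the certified floors quoted in these notes (controlnet Gemini 0.15; Qwen OpenAI 0.10, Gemini 0.25), vendor keys are assumed to be lowercase names, `default` is left to the caller because the shared fallback values are not given here, and it is assumed that an explicit `--strength` still overrides the ladder (the retired workaround relied on that).

```python
from typing import Optional

# Only the certified floors quoted in these notes; other vendors fall back to `default`.
CONTROLNET_FLOORS = {"gemini": 0.15}            # shared SDXL/controlnet ladder (partial)
QWEN_FLOORS = {"openai": 0.10, "gemini": 0.25}  # plays the role of _QWEN_VENDOR_STRENGTH


def resolve_strength_sketch(vendor: Optional[str], *, pipeline: str, default: float,
                            explicit: Optional[float] = None) -> float:
    """Pick the img2img strength: an explicit --strength wins, else the pipeline's vendor floor."""
    if explicit is not None:
        return explicit
    ladder = QWEN_FLOORS if pipeline == "qwen" else CONTROLNET_FLOORS
    return ladder.get((vendor or "").lower(), default)
```

Keying the ladder on the pipeline is the whole fix: with one shared table, Gemini content on `qwen` silently got the controlnet 0.15, below its 0.25 floor.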

@@ -213,6 +213,15 @@ History: `auto_config.plan()` was a content-adaptive planner that detected faces

 **`--auto` is now a DEPRECATED no-op** (`cli._resolve_auto_polish`): controlnet is already the default pipeline AND the adaptive polish is ON by default, so `--auto` has nothing left to do — it only prints a deprecation warning and passes `adaptive_polish` through unchanged (an explicit `--no-adaptive-polish` still wins). (Originally it re-enabled the polish; once the polish default flipped to ON the same day, the parameter-source branch became dead and was dropped.) The **adaptive polish itself lives on** in `humanizer.adaptive_polish` (CLI `--adaptive-polish/--no-adaptive-polish`, **ON by default since 2026-06-09** — it self-gates to a no-op where there is no detail deficit, so default-on is safe; uses the full-res original as the detail reference) — see the `humanizer` test note. `batch` resolves the polish once before the loop (one warning) and caches the invisible engine per pipeline (`ctx.obj["_inv_engines"]`).

+## Content `--pipeline auto` router + faces+text mixed dual-pass — PROTOTYPED and DROPPED (2026-06-20)
+
+A `--pipeline auto` content router (`pipeline_router.py` + `content_detect.py`: Haar faces + MSER text → route text→qwen / faces→controlnet / both→mixed) and a faces+text **mixed dual-pass** (`mixed_pipeline.py`: scrub the whole frame on BOTH pipelines, then graft the qwen text regions onto the controlnet base via `tiling.feather_region_composite`) were built, run on Modal (the abba poster: faces + display text), measured, and **removed**. Why it failed:
+- On the canonical faces+text image **controlnet wins EVERY metric, including text** (CER 0.114 vs qwen 0.379; ID 0.64 vs 0.36; lapvar 0.71 vs 0.59) — canny holds the existing letter shapes, qwen re-renders display/decorative text and garbles it. So grafting qwen text onto the controlnet base only HURTS.
+- qwen beats controlnet on text ONLY for clean body text on a plain background with no faces (openai_1/2) — a niche where there are no faces to route around anyway, so `--pipeline qwen` alone covers it. The faces+clean-body-text intersection is near-empty.
+- "text→qwen" is not cheaply decidable: it is body-vs-display text that matters, which face/text detectors can't tell apart. MSER also over-fired (47% of the busy poster, incl. faces).
+
+KEPT from that work (independently valid for the manual `--pipeline qwen`): the qwen **geometry fix** (`_qwen_target_size` + `_build_qwen_kwargs` height/width — qwen squished non-square inputs to 1024² without it) and the **pipeline-aware `resolve_strength`** Qwen ladder (Gemini 0.25). Also kept: the `fidelity_metrics.py` one-to-one face matcher. The throwaway Modal eval scripts were removed after the run (findings recorded here and in `docs/qwen-improvement-research.md`).
+
 ## `upscaler.py`

 `upscaler.py` — optional Real-ESRGAN pre-diffusion super-resolution for small inputs (spandrel boundary, top-of-file pyright pragma). `is_available()` gates on spandrel+torch (via `importlib.util.find_spec`); `upscale(bgr, device=None)` loads a lazily-built spandrel `ImageModelDescriptor` singleton (double-checked lock) and upscales by the model's native factor (x2), with a non-CPU→CPU device fallback mirroring the diffusion engine's MPS→CPU retry. Weights (`RealESRGAN_x2plus.pth`, BSD-3-Clause) download on first use to the `torch.hub` checkpoints cache; never bundled. Used only when UPscaling to the `min_resolution` floor (a `max_resolution` downscale always uses Lanczos). The wiring is `InvisibleEngine._esrgan_upscale(pil, target)` — Real-ESRGAN at native factor, then a Lanczos resize to the exact target, falling back to a plain Lanczos resize if the extra is absent or the model errors (so an optional upscaler can never break removal). The default `--upscaler` is `lanczos` (cv2, no deps).
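The one-to-one face matcher kept from the router work exists because per-face nearest matching lets two reference faces claim the same output face when a variant drops one. A minimal greedy sketch of a collision-free assignment, assuming `(x, y, w, h)` boxes and IoU as the similarity; the real `assign_faces_one_to_one` may use a different cost or an optimal solver such as `scipy.optimize.linear_sum_assignment`, and the 0.3 IoU cut-off is a placeholder, not a tuned value.

```python
Box = tuple[int, int, int, int]  # (x, y, w, h)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def assign_one_to_one(ref: list[Box], out: list[Box], min_iou: float = 0.3) -> dict[int, int]:
    """Greedy collision-free matching: each reference and output face is used at most once."""
    pairs = [(iou(r, o), i, j) for i, r in enumerate(ref) for j, o in enumerate(out)]
    pairs.sort(key=lambda p: p[0], reverse=True)  # best overlaps claim first
    matches: dict[int, int] = {}
    taken: set[int] = set()
    for score, i, j in pairs:
        if score < min_iou:
            break  # everything left overlaps too little to be the same face
        if i in matches or j in taken:
            continue
        matches[i] = j
        taken.add(j)
    return matches
```

With reference faces at x=0 and x=5 and a single output face at x=4, nearest matching would pair both with it; this sketch gives it only to the better-overlapping reference (index 1) and leaves the other unmatched, so the caller can exclude that face instead of scoring it against another person.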
@@ -29,6 +29,73 @@ the 20B cost. None of the improvements has measured face-fidelity numbers at our
 scrub floors yet, so each must be validated with `scripts/fidelity_metrics.py` plus
 the oracle before shipping.

+## Follow-up: ControlNet experiment + deeper research (2026-06-20)
+
+The verdict's strongest lead -- adding a Qwen-Image ControlNet -- was **built, measured, and
+CLOSED**.
+
+**Experiment** (Modal A100-80GB; DiffSynth-Studio `QwenImagePipeline` + the Apache-2.0
+`DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny`; DiffSynth-Studio is the only framework
+exposing Qwen-Image + canny ControlNet + img2img `denoising_strength` in ONE call, since diffusers
+ships no `QwenImageControlNetImg2ImgPipeline` and its three Qwen ControlNet pipelines are txt2img only).
+Measured on `gemini_3` (18 faces) at the Gemini scrub floor 0.25 vs base-Qwen 0.25 with
+`scripts/fidelity_metrics.py`:
+- **The actual failure mode (face skin texture) was NOT restored:** Laplacian-variance
+  retention stayed flat (base 0.40 -> qwen+canny 0.40; per-face 13/16 within +-0.02 after a
+  one-to-one face match, sd 0.016 -- not an averaging artifact). The SDXL+canny target 0.62
+  was not approached.
+- Identity rose modestly and broadly (ArcFace 0.346 -> 0.415, 12/16 faces improved) but the
+  absolute stays ~0.42 ("a different person, slightly closer").
+- Mechanism (verified, not inferred): canny conditioning was applied fully (scale 1.0, full
+  denoise schedule); the canny edge map is clean facial geometry with BLANK skin (4.83% edge
+  density) -- canny carries edges, not skin grain. Root cause: Qwen's Gemini floor (0.25) is
+  higher than SDXL+canny's (0.15), forcing more denoising -> more smoothing; structure
+  conditioning cannot compensate for that.
+
+**Deeper research** (deep-research harness, 103 agents, 3-vote adversarial):
+- **[high, unanimous] No permissively-licensed Qwen-Image tile / detail / realism / skin
+  ControlNet exists anywhere** -- DiffSynth first-party is Canny/Depth/Inpaint only, InstantX
+  Union is canny/soft-edge/depth/pose, the official QwenLM repo ships none. Every Qwen
+  conditioning is GEOMETRY, the same class as the tested canny. **The "add a Qwen ControlNet to
+  fix faces" lead is closed for good.**
+- **[high, unanimous] Z-Image / Z-Image-Turbo (6B, Apache-2.0 on code AND weights, ~1/3 of
+  Qwen 20B)** ships a documented `ZImageImg2ImgPipeline` with standard strength denoising, so
+  it preserves the scrub mechanism. Its own SynthID scrub floor and face/text fidelity are
+  UNMEASURED -- this is the strongest concrete NEXT experiment.
+- **[medium] Lowering Qwen's scrub floor has no off-the-shelf SynthID answer:** the "partial
+  img2img ~0.3 breaks robust watermarks" literature tests open schemes
+  (StegaStamp/TrustMark/VINE), NEVER SynthID (proprietary decoder) -- analogy, not proof. No
+  minimal-strength SynthID attack under a named permissive license was found.
+- **REFUTED [0-3]:** "re-injecting high-frequency detail from a clean diffusion output would
+  not carry the watermark back." So non-regenerative detail transfer is NOT safe by
+  assumption -- the transferred high-frequency band must be gated against the SynthID oracle.
+
+**Net for the pipeline:** **faces stay on SDXL+controlnet**; there is no Qwen face-fix.
+The live frontier is Z-Image-Turbo (next experiment) and oracle-gated non-regenerative detail
+re-injection.
+
+**Follow-up (2026-06-20) — the content-routed lane / mixed dual-pass was tested and DROPPED.**
+A `--pipeline auto` router (Haar+MSER → text→qwen / faces→controlnet / both→mixed) and a
+faces+text mixed dual-pass (scrub the whole frame on both, graft qwen text regions onto the
+controlnet base) were built and run on Modal (the abba poster: faces + display text). On that
+canonical faces+text case **controlnet won EVERY metric, including text** (CER 0.114 vs qwen
+0.379; ID 0.64 vs 0.36) — canny holds existing letter shapes, qwen re-renders display text and
+garbles it, so grafting qwen text only hurts. Qwen beats controlnet on text ONLY for clean body
+text on a plain background with no faces (openai_1/2), a niche `--pipeline qwen` alone covers;
+the faces+clean-body-text intersection is near-empty, and "text→qwen" cannot be decided cheaply
+(body-vs-display text is what matters). So the router + mixed modules were removed and **`qwen`
+is a manual `--pipeline qwen` opt-in only.** KEPT (independently valid): the qwen geometry fix
+(it squished non-square inputs to 1024²), the pipeline-aware `resolve_strength` Qwen ladder, and
+the `fidelity_metrics.py` one-to-one face matcher below.
+
+**Tooling fix surfaced by this run:** `scripts/fidelity_metrics.py` face matching was changed
+from per-face nearest-center to a collision-free one-to-one assignment
+(`assign_faces_one_to_one`, gated by face size), after the 18-face `gemini_3` exposed
+collisions (the regenerated variants detected 17 faces, so two originals mapped to the same
+variant face, corrupting the identity metric). lapvar/LPIPS were always anchored to the
+original bbox and stayed collision-immune. Regression-guarded by
+`tests/test_fidelity_matching.py`.
+
 ## Findings

 1. **[high, 3-0] A permissively-licensed Qwen-Image ControlNet exists today and is
@@ -186,16 +186,50 @@ def _lap_var(bgr: np.ndarray) -> float:
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())


-def _match_face(orig_face: Any, variant_faces: list[Any]) -> Any:
-    """Nearest variant face to an original face by bbox-center distance (geometry kept)."""
-    ox, oy = (orig_face.bbox[0] + orig_face.bbox[2]) / 2, (orig_face.bbox[1] + orig_face.bbox[3]) / 2
-    best, best_d = None, 1e18
-    for vf in variant_faces:
-        vx, vy = (vf.bbox[0] + vf.bbox[2]) / 2, (vf.bbox[1] + vf.bbox[3]) / 2
-        d = (ox - vx) ** 2 + (oy - vy) ** 2
-        if d < best_d:
-            best, best_d = vf, d
-    return best
+def _bbox_center(bbox: Any) -> tuple[float, float]:
+    return (bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2
+
+
+def _bbox_diag(bbox: Any) -> float:
+    return float(((bbox[2] - bbox[0]) ** 2 + (bbox[3] - bbox[1]) ** 2) ** 0.5)
+
+
+def assign_faces_one_to_one(
+    ref_centers: list[tuple[float, float]],
+    var_centers: list[tuple[float, float]],
+    ref_diags: list[float],
+    max_frac: float = 0.6,
+) -> dict[int, int]:
+    """One-to-one nearest-center face assignment (pure; unit-tested without insightface).
+
+    Per-face nearest matching collides on multi-face images -- two original faces can both
+    pick the SAME variant face (e.g. when regeneration drops a face, so the variant has fewer
+    detections), corrupting the identity metric (the lapvar/LPIPS metrics are immune: they are
+    anchored to the ORIGINAL bbox on both images). This greedy-by-distance assignment is
+    collision-free: it walks candidate pairs nearest-first and never reuses a ref or a variant
+    face. When faces are well separated (each shifts far less than the inter-face spacing),
+    greedy equals the optimal (Hungarian) result without the scipy dependency. A pair is dropped
+    when its center distance exceeds ``max_frac`` of the original face diagonal (face was lost).
+
+    Returns a dict mapping ref-face index -> variant-face index for matched faces only.
+    """
+    pairs: list[tuple[float, int, int]] = []
+    for i, (rx, ry) in enumerate(ref_centers):
+        for j, (vx, vy) in enumerate(var_centers):
+            pairs.append((((rx - vx) ** 2 + (ry - vy) ** 2) ** 0.5, i, j))
+    pairs.sort()
+    used_ref: set[int] = set()
+    used_var: set[int] = set()
+    matched: dict[int, int] = {}
+    for dist, i, j in pairs:
+        if i in used_ref or j in used_var:
+            continue
+        if dist > max_frac * ref_diags[i]:
+            continue
+        matched[i] = j
+        used_ref.add(i)
+        used_var.add(j)
+    return matched


 def _cosine(a: np.ndarray, b: np.ndarray) -> float:
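A minimal standalone sketch of the collision this hunk fixes (not the script itself): `nearest_only` is a hypothetical restatement of the removed `_match_face` loop, and `one_to_one` mirrors the greedy nearest-first loop of `assign_faces_one_to_one`, using `math.dist` in place of the hand-written Euclidean distance. With two original faces and a variant that kept only one of them, the old strategy hands the same variant face to both originals.

```python
import math

Point = tuple[float, float]


def nearest_only(ref: list[Point], var: list[Point]) -> dict[int, int]:
    """Hypothetical restatement of the removed per-face matcher: every ref face
    independently picks its closest variant face, so two refs can pick the same one."""
    out: dict[int, int] = {}
    for i, r in enumerate(ref):
        if var:
            out[i] = min(range(len(var)), key=lambda j: math.dist(r, var[j]))
    return out


def one_to_one(ref: list[Point], var: list[Point], diags: list[float], max_frac: float = 0.6) -> dict[int, int]:
    """Greedy nearest-first assignment that never reuses a ref or a variant face and
    drops pairs farther apart than max_frac * the ref face diagonal."""
    pairs = sorted((math.dist(r, v), i, j) for i, r in enumerate(ref) for j, v in enumerate(var))
    used_ref: set[int] = set()
    used_var: set[int] = set()
    matched: dict[int, int] = {}
    for dist, i, j in pairs:
        if i in used_ref or j in used_var or dist > max_frac * diags[i]:
            continue
        matched[i] = j
        used_ref.add(i)
        used_var.add(j)
    return matched


ref = [(10.0, 10.0), (14.0, 10.0)]  # two original faces
var = [(12.0, 10.0)]                # the regenerated variant detected only one face
print(nearest_only(ref, var))             # {0: 0, 1: 0}  (variant 0 used twice)
print(one_to_one(ref, var, [50.0, 50.0]))  # {0: 0}
```

Equal distances are broken by the lower ref index because the `(dist, i, j)` tuples sort lexicographically; the committed function sorts the same tuple shape, so it should tie-break the same way.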
@@ -325,15 +359,19 @@ def compare(original: str, variants: tuple[str, ...], ocr_langs: str, ground_tru
        app.prepare(ctx_id=-1, det_size=(640, 640))
        ref_faces = app.get(ref)
        if ref_faces:
+            ref_centers = [_bbox_center(of.bbox) for of in ref_faces]
+            ref_diags = [_bbox_diag(of.bbox) for of in ref_faces]
            for label, img in parsed:
                vfaces = app.get(img)
                st = face_stats[label]
-                for of in ref_faces:
-                    vf = _match_face(of, vfaces)
-                    if vf is None:
-                        continue
+                # One-to-one assignment for identity (collision-free); lapvar/LPIPS stay
+                # anchored to the original bbox below, so they need no match.
+                matched = assign_faces_one_to_one(ref_centers, [_bbox_center(vf.bbox) for vf in vfaces], ref_diags)
+                for oi, of in enumerate(ref_faces):
                    st.n_faces += 1
-                    st.identity.append(_cosine(of.normed_embedding, vf.normed_embedding))
+                    vf = vfaces[matched[oi]] if oi in matched else None
+                    if vf is not None:
+                        st.identity.append(_cosine(of.normed_embedding, vf.normed_embedding))
                    oc, vc = _crop(ref, of.bbox), _crop(img, of.bbox)
                    if oc.size == 0 or vc.size == 0:
                        continue
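The comment in this hunk says lapvar/LPIPS need no face match because both crops use the ORIGINAL face's bbox. A numpy-only sketch of that idea, under two assumptions not shown in this diff: that "retention" means the variant's Laplacian variance divided by the original's on the same crop, and that skipping image borders is an acceptable stand-in for `cv2.Laplacian(gray, cv2.CV_64F).var()` (same 3x3 kernel OpenCV applies for the default `ksize=1`). `lap_var_np`, `crop` and `lapvar_retention` are hypothetical helpers.

```python
import numpy as np


def lap_var_np(gray: np.ndarray) -> float:
    # 4-neighbour Laplacian on the interior only (cv2's ksize=1 kernel, borders skipped).
    g = gray.astype(np.float64)
    lap = g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:] - 4.0 * g[1:-1, 1:-1]
    return float(lap.var())


def crop(img: np.ndarray, bbox: tuple[float, float, float, float]) -> np.ndarray:
    # Clamp an (x0, y0, x1, y1) box to the image and slice it out.
    x0, y0, x1, y1 = (int(round(v)) for v in bbox)
    h, w = img.shape[:2]
    return img[max(0, y0):min(h, y1), max(0, x0):min(w, x1)]


def lapvar_retention(orig: np.ndarray, variant: np.ndarray, bbox: tuple[float, float, float, float]) -> float:
    # Both crops use the ORIGINAL face's bbox, so no variant-face match is needed.
    return lap_var_np(crop(variant, bbox)) / lap_var_np(crop(orig, bbox))


rng = np.random.default_rng(0)
orig = rng.normal(128.0, 20.0, size=(64, 64))  # stand-in for fine skin grain
# Crude 2x2 box blur as a stand-in for diffusion smoothing.
smooth = (orig + np.roll(orig, 1, 0) + np.roll(orig, 1, 1) + np.roll(orig, 1, (0, 1))) / 4.0
box = (8, 8, 56, 56)
print(lapvar_retention(orig, orig, box))    # 1.0: texture unchanged
print(lapvar_retention(orig, smooth, box))  # well below 1: high-frequency detail lost
```

Because the crop box never depends on what the detector found in the variant, a dropped or shifted variant face cannot redirect this metric onto the wrong person; only the identity (ArcFace) comparison needed the new matcher.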
@@ -762,7 +762,7 @@ def cmd_invisible(
    vendor = vendor_for_strength(source)
    console.print(f"  Input:    {source.name}")
    console.print(f"  Pipeline: {pipeline}")
-    console.print(f"  Strength: {resolve_strength(strength, vendor)}  Steps: {steps}")
+    console.print(f"  Strength: {resolve_strength(strength, vendor, pipeline)}  Steps: {steps}")

    t0 = time.monotonic()
    result_path = engine.remove_watermark(
@@ -1075,7 +1075,7 @@ def cmd_all(
            # already lost its C2PA to the visible-removal pass, so reading it would
            # always resolve to the unknown-vendor default.
            vendor = vendor_for_strength(source)
-            console.print(f"    Strength: {resolve_strength(strength, vendor)}  Steps: {steps}")
+            console.print(f"    Strength: {resolve_strength(strength, vendor, pipeline)}  Steps: {steps}")
            inv_engine.remove_watermark(
                image_path=tmp_path,
                output_path=tmp_path,
@@ -18,9 +18,10 @@ DEFAULT_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
 # oracle floors (2026-06-20): OpenAI **0.10** (seed-robust -- clean on seeds 0-4) and
 # Google/Gemini **0.25** (seed 0 verified on 2 images; pin a seed in prod, the Gemini
 # oracle rate-limits volume seed-repeat). The Gemini floor (0.25) is HIGHER than the
-# certified controlnet Gemini floor (0.15), and ``resolve_strength`` is shared/
-# pipeline-independent, so pass an explicit ``--strength 0.25`` for Gemini content on
-# this pipeline until a Qwen-specific ladder is wired into ``resolve_strength``.
+# certified controlnet Gemini floor (0.15); ``resolve_strength(..., pipeline="qwen")``
+# now carries this via ``_QWEN_VENDOR_STRENGTH`` (below), so ``--pipeline qwen`` gets the
+# right floor automatically -- the old manual "pass --strength 0.25 for Gemini on qwen"
+# workaround is retired.
 # (Dispatch uses the bare "qwen" literal, matching the sdxl/controlnet sites, so there
 # is no QWEN_PROFILE constant -- only the model id is referenced from code.)
 QWEN_MODEL_ID = "Qwen/Qwen-Image"
@@ -90,6 +91,18 @@ DEFAULT_STRENGTH = UNKNOWN_STRENGTH
 # Detected-vendor -> default strength. Vendor strings come from `vendor_for_strength`.
 _VENDOR_STRENGTH = {"openai": OPENAI_STRENGTH, "google": GEMINI_STRENGTH}

+# Qwen has its OWN certified floors (Modal A100-80GB, 2026-06-20), DIFFERENT from the
+# SDXL ladder above: OpenAI 0.10 (seed-robust), Gemini 0.25 (HIGHER than controlnet's
+# 0.15 -- the 20B MMDiT perturbs less per denoising step, so it needs more strength to
+# clear Gemini SynthID). Unknown vendor tracks the higher (Gemini) value, safe-by-default.
+# `resolve_strength(..., pipeline="qwen")` uses this table so `--pipeline qwen` carries the
+# right floor automatically -- retiring the old manual "pass --strength 0.25 for Gemini on
+# qwen" workaround.
+QWEN_OPENAI_STRENGTH = 0.10
+QWEN_GEMINI_STRENGTH = 0.25
+QWEN_UNKNOWN_STRENGTH = 0.25
+_QWEN_VENDOR_STRENGTH = {"openai": QWEN_OPENAI_STRENGTH, "google": QWEN_GEMINI_STRENGTH}
+

 def strength_default_help() -> str:
    """One-line description of the vendor-adaptive default, derived from the constants.
@@ -103,20 +116,24 @@ def strength_default_help() -> str:
    )


-def resolve_strength(strength: float | None, vendor: str | None = None) -> float:
+def resolve_strength(strength: float | None, vendor: str | None = None, pipeline: str | None = None) -> float:
    """Resolve the denoising strength, applying the vendor default when unset.

    ``None`` means "the user did not pass ``--strength``", which resolves
    **vendor-adaptively**: ``vendor`` (``"openai"`` / ``"google"`` / None, from
-    ``vendor_for_strength``) selects ``OPENAI_STRENGTH`` / ``GEMINI_STRENGTH`` /
-    ``UNKNOWN_STRENGTH``. The same ladder applies to both pipelines (see the module
-    comment for why one ladder is correct). An explicit value always wins (including
+    ``vendor_for_strength``) selects the per-vendor floor. The ``sdxl`` and ``controlnet``
+    pipelines share ONE ladder (``OPENAI_STRENGTH`` / ``GEMINI_STRENGTH`` /
+    ``UNKNOWN_STRENGTH`` -- see the module comment for why); ``qwen`` has its OWN higher
+    ladder (``_QWEN_VENDOR_STRENGTH``, Gemini 0.25 vs controlnet 0.15), selected when
+    ``pipeline`` normalizes to ``"qwen"``. An explicit value always wins (including
    ``0.0`` -- the check is ``is None``, not falsiness). Shared by the CLI (for display)
    and the engine (for execution) so the two never disagree -- both must pass the SAME
-    ``vendor``.
+    ``vendor`` and ``pipeline``.
    """
    if strength is not None:
        return strength
+    if pipeline is not None and normalize_profile(pipeline) == "qwen":
+        return _QWEN_VENDOR_STRENGTH.get(vendor or "", QWEN_UNKNOWN_STRENGTH)
    return _VENDOR_STRENGTH.get(vendor or "", UNKNOWN_STRENGTH)


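A self-contained sketch of the dispatch this hunk adds, for reading it in isolation. The Qwen values (0.10 / 0.25 / unknown 0.25) come from this commit; the shared sdxl/controlnet values (OpenAI 0.10, Gemini 0.15, unknown equal to Gemini) are inferred from the tests and notes in this commit, and `_normalize` is a hypothetical stand-in for the real `normalize_profile`, which also handles aliases such as `default`.

```python
from __future__ import annotations

SHARED = {"openai": 0.10, "google": 0.15}  # sdxl/controlnet ladder (values inferred, see lead-in)
SHARED_UNKNOWN = 0.15
QWEN = {"openai": 0.10, "google": 0.25}    # mirrors _QWEN_VENDOR_STRENGTH
QWEN_UNKNOWN = 0.25                         # unknown vendor tracks the higher Gemini floor


def _normalize(pipeline: str) -> str:
    # Hypothetical simplification of normalize_profile: case-insensitive match only.
    return pipeline.strip().lower()


def resolve(strength: float | None, vendor: str | None = None, pipeline: str | None = None) -> float:
    if strength is not None:  # an explicit value always wins, including 0.0
        return strength
    if pipeline is not None and _normalize(pipeline) == "qwen":
        return QWEN.get(vendor or "", QWEN_UNKNOWN)
    return SHARED.get(vendor or "", SHARED_UNKNOWN)


print(resolve(None, "google", "qwen"))  # 0.25: qwen's own Gemini floor
print(resolve(None, "google"))          # 0.15: what a caller that forgot `pipeline` would get
```

The last two calls show why the CLI (display) and the engine (execution) must pass the same `pipeline`: a display call that omitted it would print 0.15 for Gemini content while a qwen run executes at 0.25.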
@@ -322,6 +322,15 @@ _QWEN_PROMPT = "high quality, sharp, detailed, faithful to the original"
 _QWEN_NEGATIVE = "blurry, lowres, distorted text, garbled text, artifacts"


+def _qwen_target_size(width: int, height: int) -> tuple[int, int]:
+    """Floor (width, height) to a multiple of 16 for Qwen's VAE/patchifier (>= 16).
+
+    Pure; unit-tested. Without explicit dims the img2img pipeline defaults to a 1024x1024
+    SQUARE and silently distorts any non-square input.
+    """
+    return max(16, (width // 16) * 16), max(16, (height // 16) * 16)
+
+
 def _build_qwen_kwargs(
    image: Image.Image, strength: float, num_inference_steps: int, true_cfg_scale: float, generator: Any
 ) -> dict[str, Any]:
@@ -329,7 +338,12 @@ def _build_qwen_kwargs(

    Qwen-Image uses ``true_cfg_scale`` (not SDXL's ``guidance_scale``) and takes an
    explicit ``negative_prompt``; the scrub still comes from the img2img ``strength``.
+    Passes an explicit ``height``/``width`` derived from the input (floored to /16): the
+    pipeline otherwise defaults to a 1024x1024 SQUARE, squishing any non-square input
+    (the abba mixed-seam test: a 2816x1536 poster came back 1024x1024, distorting the
+    scene and garbling text). So qwen regenerates at the input's own geometry.
    """
+    qw, qh = _qwen_target_size(image.width, image.height)
    return {
        "prompt": _QWEN_PROMPT,
        "negative_prompt": _QWEN_NEGATIVE,
@@ -338,6 +352,8 @@ def _build_qwen_kwargs(
        "num_inference_steps": num_inference_steps,
        "true_cfg_scale": true_cfg_scale,
        "generator": generator,
+        "height": qh,
+        "width": qw,
    }


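The arithmetic behind the squish, as a standalone sketch: `target_size` restates the `_qwen_target_size` rule from this hunk, and `axis_scales` is a hypothetical helper added only for illustration. Forcing the 2816x1536 poster into the old 1024x1024 default shrinks the horizontal axis about 1.83x more than the vertical one, whereas the /16 floor alone changes the aspect ratio by roughly half a percent on an awkward size such as 1122x1402.

```python
def target_size(width: int, height: int) -> tuple[int, int]:
    # Same rule as _qwen_target_size: floor each side to a multiple of 16, never below 16.
    return max(16, (width // 16) * 16), max(16, (height // 16) * 16)


def axis_scales(src: tuple[int, int], dst: tuple[int, int]) -> tuple[float, float]:
    # Per-axis resize factors; unequal factors mean the content is stretched or squished.
    return dst[0] / src[0], dst[1] / src[1]


poster = (2816, 1536)
sx, sy = axis_scales(poster, (1024, 1024))  # old behaviour: the pipeline's default square
print(round(sx, 3), round(sy, 3), round(sy / sx, 3))  # 0.364 0.667 1.833

sx, sy = axis_scales(poster, target_size(*poster))  # new behaviour: native geometry
print(sx, sy)  # 1.0 1.0

odd = (1122, 1402)
w, h = target_size(*odd)
drift = abs((w / h) / (odd[0] / odd[1]) - 1)  # aspect change caused by the /16 floor alone
print(w, h, round(drift, 4))  # 1120 1392 0.0054
```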
@@ -614,7 +630,7 @@ class WatermarkRemover:
        if output_path is None:
            output_path = image_path

-        strength = resolve_strength(strength, vendor)
+        strength = resolve_strength(strength, vendor, self.model_profile)

        if not 0.0 <= strength <= 1.0:
            raise ValueError(f"Strength must be between 0.0 and 1.0, got {strength}")
@@ -0,0 +1,76 @@
+"""Regression test for the one-to-one face matcher in ``scripts/fidelity_metrics.py``.
+
+The shipped per-face nearest matcher collided on multi-face images (two original faces
+both picking the same variant face when regeneration dropped a face), which inflated/
+corrupted the identity metric. ``assign_faces_one_to_one`` is the collision-free
+replacement. The function is pure (centers + diagonals in, index map out), so it is
+tested here without insightface / the heavy PEP723 env. Caught on the gemini_3 Qwen
+ControlNet experiment, where the original had 18 faces but the regenerated variants had
+17, producing two collisions under the old matcher.
+"""
+
+from __future__ import annotations
+
+import importlib.util
+import sys
+from pathlib import Path
+
+import pytest
+
+_SCRIPTS = Path(__file__).resolve().parent.parent / "scripts"
+
+
+def _load_assign():
+    # fidelity_metrics is a standalone PEP723 script, not an installed module; load it by
+    # path with scripts/ on sys.path so its `_plain_console` shim import resolves.
+    sys.path.insert(0, str(_SCRIPTS))
+    try:
+        spec = importlib.util.spec_from_file_location("fidelity_metrics", _SCRIPTS / "fidelity_metrics.py")
+        assert spec is not None
+        assert spec.loader is not None
+        mod = importlib.util.module_from_spec(spec)
+        sys.modules[spec.name] = mod  # @dataclass introspection needs the module registered
+        spec.loader.exec_module(mod)
+    except ImportError as exc:  # cv2/click absent in a bare env -> skip, not fail
+        pytest.skip(f"fidelity_metrics import deps missing: {exc}")
+    finally:
+        sys.path.remove(str(_SCRIPTS))
+    return mod.assign_faces_one_to_one
+
+
+def test_distinct_faces_match_nearest() -> None:
+    assign = _load_assign()
+    ref = [(0.0, 0.0), (100.0, 100.0)]
+    var = [(2.0, 1.0), (98.0, 102.0)]
+    diags = [50.0, 50.0]
+    assert assign(ref, var, diags) == {0: 0, 1: 1}
+
+
+def test_no_collision_when_variant_drops_a_face() -> None:
+    # Two original faces equidistant from the SAME lone variant face: the old nearest matcher
+    # mapped BOTH to index 0; one-to-one must match only one of them and drop the other.
+    assign = _load_assign()
+    ref = [(10.0, 10.0), (14.0, 10.0)]  # both close to the lone variant
+    var = [(12.0, 10.0)]
+    diags = [50.0, 50.0]
+    matched = assign(ref, var, diags)
+    assert sorted(matched.values()) == [0]  # variant 0 used at most once
+    assert len(matched) == 1
+
+
+def test_gate_drops_implausibly_far_match() -> None:
+    assign = _load_assign()
+    ref = [(0.0, 0.0)]
+    var = [(1000.0, 1000.0)]  # far beyond 0.6 * diag
+    diags = [50.0]
+    assert assign(ref, var, diags) == {}
+
+
+def test_assignment_is_one_to_one_over_many_faces() -> None:
+    assign = _load_assign()
+    ref = [(float(i * 100), 0.0) for i in range(18)]
+    var = [(float(i * 100) + 3.0, 0.0) for i in range(17)]  # one fewer, as in the experiment
+    diags = [50.0] * 18
+    matched = assign(ref, var, diags)
+    assert len(matched) == 17
+    assert len(set(matched.values())) == 17  # every variant used at most once
@@ -126,6 +126,14 @@ class TestModelProfiles:
        assert normalize_profile("CONTROLNET") == "controlnet"


+class _StubImage:
+    """Minimal PIL.Image stand-in: just the ``width``/``height`` the pure helper reads."""
+
+    def __init__(self, width: int, height: int) -> None:
+        self.width = width
+        self.height = height
+
+
 class TestQwenKwargs:
    """_build_qwen_kwargs is pure (no torch); guards the Qwen-Image call shape.

@@ -137,18 +145,37 @@ class TestQwenKwargs:
        from remove_ai_watermarks.noai.watermark_remover import _build_qwen_kwargs

        gen = object()
-        kwargs = _build_qwen_kwargs("IMG", strength=0.3, num_inference_steps=40, true_cfg_scale=4.0, generator=gen)
+        img = _StubImage(2816, 1536)
+        kwargs = _build_qwen_kwargs(img, strength=0.3, num_inference_steps=40, true_cfg_scale=4.0, generator=gen)
        # Qwen uses true_cfg_scale, NOT SDXL's guidance_scale.
        assert kwargs["true_cfg_scale"] == 4.0
        assert "guidance_scale" not in kwargs
        # The scrub still comes from strength; image + generator pass through.
        assert kwargs["strength"] == 0.3
-        assert kwargs["image"] == "IMG"
+        assert kwargs["image"] is img
        assert kwargs["generator"] is gen
        # Faithful-regeneration prompt + an explicit negative prompt.
        assert kwargs["prompt"]
        assert kwargs["negative_prompt"]

+    def test_passes_explicit_aspect_preserving_size(self):
+        # Without height/width the pipeline defaults to 1024x1024 and squishes non-square
+        # input (the abba mixed-seam regression). Both already multiples of 16 -> unchanged.
+        from remove_ai_watermarks.noai.watermark_remover import _build_qwen_kwargs
+
+        kwargs = _build_qwen_kwargs(
+            _StubImage(2816, 1536), strength=0.25, num_inference_steps=40, true_cfg_scale=4.0, generator=None
+        )
+        assert kwargs["width"] == 2816
+        assert kwargs["height"] == 1536
+
+    def test_qwen_target_size_floors_to_multiple_of_16(self):
+        from remove_ai_watermarks.noai.watermark_remover import _qwen_target_size
+
+        assert _qwen_target_size(2816, 1536) == (2816, 1536)  # already /16
+        assert _qwen_target_size(1122, 1402) == (1120, 1392)  # floored
+        assert _qwen_target_size(10, 10) == (16, 16)  # min clamp, never 0
+
    def test_qwen_model_id_is_qwen_image(self):
        from remove_ai_watermarks.noai.watermark_profiles import QWEN_MODEL_ID

@@ -159,15 +186,33 @@ class TestResolveStrength:
    """resolve_strength applies the vendor default only when strength is unset."""

    def test_none_is_vendor_adaptive(self):
-        # No vendor -> unknown default; OpenAI lower, Google == unknown. The SAME ladder
-        # applies to both pipelines (the certified controlnet floors), so there is no
-        # pipeline argument.
+        # No vendor -> unknown default; OpenAI lower, Google == unknown. The sdxl/controlnet
+        # pipelines share this ladder (the certified controlnet floors); qwen has its own
+        # (see test_qwen_pipeline_uses_its_own_higher_ladder).
        assert resolve_strength(None) == UNKNOWN_STRENGTH
        assert resolve_strength(None, "openai") == OPENAI_STRENGTH
        assert resolve_strength(None, "google") == GEMINI_STRENGTH
        assert resolve_strength(None, None) == UNKNOWN_STRENGTH
        # An unrecognized vendor string falls through to the unknown default.
        assert resolve_strength(None, "adobe") == UNKNOWN_STRENGTH
+        # sdxl/controlnet pipelines (and the "default" alias) use the same shared ladder.
+        assert resolve_strength(None, "google", "controlnet") == GEMINI_STRENGTH
+        assert resolve_strength(None, "google", "sdxl") == GEMINI_STRENGTH
+
+    def test_qwen_pipeline_uses_its_own_higher_ladder(self):
+        # Qwen's certified Gemini floor (0.25) is HIGHER than controlnet's (0.15); OpenAI
+        # matches (0.10). Unknown vendor on qwen tracks the higher Gemini value. This retires
+        # the old manual "pass --strength 0.25 for Gemini on qwen" workaround.
+        from remove_ai_watermarks.noai.watermark_profiles import QWEN_GEMINI_STRENGTH, QWEN_OPENAI_STRENGTH
+
+        assert QWEN_GEMINI_STRENGTH == 0.25
+        assert QWEN_OPENAI_STRENGTH == 0.10
+        assert resolve_strength(None, "google", "qwen") == QWEN_GEMINI_STRENGTH
+        assert resolve_strength(None, "openai", "qwen") == QWEN_OPENAI_STRENGTH
+        assert resolve_strength(None, None, "qwen") == QWEN_GEMINI_STRENGTH  # unknown -> higher floor
+        assert resolve_strength(None, "google", "qwen") > resolve_strength(None, "google", "controlnet")
+        # An explicit strength still wins on qwen.
+        assert resolve_strength(0.12, "google", "qwen") == 0.12

    def test_ladder_is_the_certified_controlnet_floors(self):
        # The unified ladder == the oracle-certified controlnet floors. Lowered on the