mirror of
https://github.com/wiltodelta/remove-ai-watermarks.git
synced 2026-07-04 15:37:49 +02:00
fix(qwen): native-geometry img2img + pipeline-aware strength; record dropped auto/mixed/Z-Image leads
- watermark_remover: _build_qwen_kwargs now passes explicit height/width (via _qwen_target_size, floored to /16). Without it QwenImageImg2ImgPipeline defaults to 1024x1024 and silently squishes non-square inputs, distorting the scene and garbling text.
- watermark_profiles: resolve_strength gains a `pipeline` arg + a Qwen strength ladder (_QWEN_VENDOR_STRENGTH, Gemini 0.25), so `--pipeline qwen` gets its certified floor automatically; retires the manual "pass --strength 0.25 for Gemini on qwen" workaround.
- fidelity_metrics: replace per-face nearest matching (collided on multi-face images when a variant dropped a face, corrupting the identity metric) with a collision-free one-to-one assignment (assign_faces_one_to_one). lapvar/LPIPS were always bbox-anchored and immune. Regression-guarded by tests/test_fidelity_matching.py.
- docs: record the measured outcomes of the qwen-improvement arc. The Qwen ControlNet face-fix is CLOSED (no permissive Qwen detail/tile ControlNet exists; canny carries edges, not skin grain). The `--pipeline auto` router + faces+text mixed dual-pass were prototyped and DROPPED (controlnet wins faces AND display text: abba CER 0.114 vs qwen 0.379). Z-Image-Turbo was tried and dropped (same regeneration limits). qwen stays a manual opt-in; controlnet is the default for everything.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
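The commit message does not say how `assign_faces_one_to_one` pairs faces. Below is a minimal sketch of a collision-free matcher. It assumes faces arrive as `(x, y, w, h)` boxes and that greedy highest-IoU-first pairing is good enough; an optimal alternative would be Hungarian assignment, for example SciPy's `linear_sum_assignment`. `iou`, `match_faces_one_to_one` and the `min_iou=0.3` threshold are hypothetical, not the repo's code.

```python
# Hypothetical sketch of one-to-one face matching; not the repo's implementation.
from typing import List, Optional, Set, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) face bounding box


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def match_faces_one_to_one(ref: List[Box], var: List[Box],
                           min_iou: float = 0.3) -> List[Optional[int]]:
    """For each reference face, the index of its variant face, or None.

    Pairs are claimed greedily from the highest IoU down, and each variant face
    can be claimed at most once. A dropped face therefore yields None instead of
    a second (colliding) match onto a neighbour.
    """
    pairs = sorted(
        ((iou(r, v), i, j) for i, r in enumerate(ref) for j, v in enumerate(var)),
        reverse=True,
    )
    result: List[Optional[int]] = [None] * len(ref)
    used: Set[int] = set()
    for score, i, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same face
        if result[i] is None and j not in used:
            result[i] = j
            used.add(j)
    return result


# The variant dropped the second face; nearest matching would reuse variant face 0 twice.
print(match_faces_one_to_one([(0, 0, 10, 10), (5, 0, 10, 10)], [(0, 0, 10, 10)]))
# [0, None]
```

A per-face nearest matcher would pair both reference faces with the single surviving variant face and score the same identity twice. The one-to-one version leaves the dropped face unmatched instead.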
@@ -50,3 +50,6 @@ data/samsung_capture/captures/samsung_content_*
# (GFPGAN wrote RetinaFace/parsing weights to a CWD ./gfpgan/weights/ working
# dir on first use). Runtime artifact, never committed.
gfpgan/

+# Qwen ControlNet experiment outputs (throwaway eval; never the committed corpus)
+scripts/_qwen_exp_out/

@@ -18,7 +18,7 @@ Consequences for contributors (do not drift back into the stock niche just becau
## How to run

- `uv run remove-ai-watermarks all <image.png> -o <output.png>` — full pipeline (visible + invisible + metadata). Same diffusion knobs as `invisible` below, plus the visible-pass `--inpaint/--no-inpaint`/`--inpaint-method`. **When the `[gpu]` extra is absent, step 2 (invisible/SynthID) is skipped** — `all` still writes an output (visible mark + metadata stripped) but prints a prominent end-of-run banner ("the invisible (SynthID) watermark was NOT removed") AND exits **non-zero** (1), so a skipped SynthID pass is not mistaken for a clean result (the recurring #14/#47 trap, where the old quiet inline warning was missed). `invisible` already hard-errors without the extra; only `all` continued, hence the loud end-banner. Regression-guarded by `tests/test_cli.py::TestAllCommand::test_all_loud_warning_and_nonzero_exit_when_gpu_missing`. **Test trap:** any `all` test that exercises the full pipeline MUST `patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True)` — CI installs core+dev only (no `[gpu]`), so an unpatched `all` test takes the skip branch and now hits the non-zero exit. This passed locally (gpu present → `is_available()` True) but red-failed every matrix cell on the v0.11.0 commit (`test_all_basic`/`test_all_visible_step_uses_registry` asserted exit 0); both now patch `is_available` True.
-- `uv run remove-ai-watermarks invisible <image.png> -o <out.png>` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet|qwen` (default `controlnet`), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default).
+- `uv run remove-ai-watermarks invisible <image.png> -o <out.png>` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet|qwen` (default `controlnet`; `qwen` is a manual opt-in only — see the qwen note in the module map), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default).
- `uv run remove-ai-watermarks visible <image.png> -o <out.png>` — known-visible-mark removal, CPU, no GPU. Reverse-alpha based: each mark is removed by inverting its captured alpha map. `--mark auto` (default) picks the strongest detected of the Gemini sparkle, the Doubao "豆包AI生成" text strip, the Jimeng "★ 即梦AI" wordmark, and the Samsung Galaxy AI "✦ Contenuti generati dall'AI" strip (bottom-LEFT, locale-specific — Italian variant calibrated); `--mark gemini` / `--mark doubao` / `--mark jimeng` / `--mark samsung` force one (choices come from the registry). Gemini/Doubao recover pixels exactly with no inpaint at native; **Jimeng and Samsung add an always-on thin residual inpaint over the glyph footprint** (their marks re-rasterize per image, so reverse-alpha alone leaves a faint outline). For arbitrary logos/objects use `erase`. **When `--mark auto` finds no known mark (the common case — ~74% of real uploads carry no registered visible mark), the command does NOT silently re-serve the input as a finished result.** It runs a cheap metadata-only `identify`, prints actionable guidance (if the image carries an invisible/metadata mark, e.g. an OpenAI/Gemini C2PA image, it points to `all`; otherwise it does NOT imply the image is clean -- it warns that an invisible pixel watermark like SynthID cannot be detected once the metadata proxy is gone and routes to both `all` and `erase --region`), writes NO output file, and exits **`EXIT_NO_VISIBLE_MARK` (2)** — distinct from success (0) and a hard error (1) so a wrapping service (raiw.cc) can surface the message instead of treating the unchanged image as done (the production "it didn't work" / score-0 trap). Same handling for an explicit `--mark <name>` that is not detected. Helper `cli._no_visible_mark_exit`; regression-guarded by `tests/test_cli.py::TestVisibleCommand::test_visible_auto_no_mark_exits_two_with_eraser_hint` and `test_visible_auto_no_mark_routes_to_all_when_metadata`. `--no-detect` still forces the gemini fallback and proceeds (exit 0).
- `uv run remove-ai-watermarks erase <image.png> --region x,y,w,h -o <out.png>` — universal region eraser (any logo/object, any position). `--backend cv2` (default, no deps) or `--backend lama` (big-LaMa via onnxruntime, extra `lama`); `--region` is repeatable.
- `uv run remove-ai-watermarks identify <image>` — provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector
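A wrapping service has to tell these outcomes apart. The minimal sketch below relies only on the exit codes stated above: 0 is success, 1 is a hard error or an `all` run that skipped SynthID, and 2 is `EXIT_NO_VISIBLE_MARK` from `visible`. `classify_exit` and `run_cli` are hypothetical helpers, not part of the package.

```python
# Hypothetical wrapper sketch; helper names are not part of remove-ai-watermarks.
import subprocess
from typing import List

EXIT_OK = 0
EXIT_NO_VISIBLE_MARK = 2  # documented for `visible` when no known mark is found


def classify_exit(command: str, returncode: int) -> str:
    """Map a remove-ai-watermarks exit status to a service-level outcome."""
    if returncode == EXIT_OK:
        return "done"
    if command == "visible" and returncode == EXIT_NO_VISIBLE_MARK:
        # No output file was written: show the CLI's guidance, don't re-serve the input.
        return "no_visible_mark"
    # Hard error, or `all` without the [gpu] extra (SynthID was NOT removed).
    return "failed"


def run_cli(command: str, args: List[str]) -> str:
    """Run the CLI through uv (assumes the project is installed) and classify the result."""
    proc = subprocess.run(
        ["uv", "run", "remove-ai-watermarks", command, *args],
        capture_output=True,
        text=True,
    )
    return classify_exit(command, proc.returncode)
```

One caveat, stated as an assumption: if the CLI is built on Click (the `ctx.obj` usage elsewhere in this doc suggests so), a command-line usage error also exits 2. A caller that needs certainty should inspect stderr before treating 2 as "no visible mark".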
@@ -61,7 +61,7 @@ Compact map. The full per-module detail (design decisions, tuned thresholds, cal
- `region_eraser.py` — universal region eraser (`erase` CLI): cv2 backend default (no deps), optional big-LaMa via onnxruntime (~3.5-4 GB peak RAM, ~5-6 s/call CPU — does not fit a minimal droplet).
- `invisible_watermark.py` — decodes the OPEN DWT-DCT watermarks (SD / SDXL / FLUX) via `imwatermark` (extra `detect`, pulls torch). Fragile two ways: (1) does not survive JPEG re-encode/resize; (2) **carrier-fragile on a broad class of pristine images** -- a clean encode->decode round-trip recovers 48/48 on chatgpt/firefly/random but FAILS (28-39/48, below the `_MATCH_48`=44 gate) on the FLUX fox, doubao, a flat FLUX generation, AND a clean synthetic flat fill with no watermark. The failure does NOT track texture; it goes with a degenerate **all-ones decode that is a CARRIER ARTIFACT, not a watermark** (synthetic clean image reproduces it). So `detect_invisible_watermark` is **positive-only**: trust a hit; a `None` is inconclusive unless a same-carrier positive-control embed first recovers >=44. Verified 2026-06-19; full caveat in `docs/watermarking-landscape.md`.
- `trustmark_detector.py` — Adobe TrustMark open decoder (extra `trustmark`). Do NOT remove the JPEG re-encode false-positive gate — a lone TrustMark hit without it is almost always content noise.
-- `noai/watermark_remover.py` — `WatermarkRemover` with three diffusion pipelines selected by the explicit `pipeline` ctor arg, never inferred from `model_id`: `sdxl` (plain SDXL img2img), `controlnet` (SDXL + canny ControlNet, **the DEFAULT since 2026-06-09**), and `qwen` (Qwen-Image 20B MMDiT img2img, Apache-2.0, CUDA/cloud-class — best **text** preservation (incl. CJK); `_load_qwen_pipeline`/`_run_qwen`, bf16, no MPS fallback; call shape in the pure `_build_qwen_kwargs` using `true_cfg_scale`). Removal comes from the img2img `strength`; ControlNet only preserves text/face STRUCTURE — SynthID CAN survive controlnet on photoreal content at low strength. Qwen CERTIFIED oracle floors (2026-06-20): OpenAI **0.10** (seed-robust, clean on seeds 0-4), Gemini **0.25** (seed 0 verified, pin a seed — Gemini oracle rate-limits volume; higher than the controlnet Gemini floor 0.15, so pass explicit `--strength 0.25` for Gemini on `qwen` until a Qwen ladder is wired). Fidelity measured by `scripts/fidelity_metrics.py` (OCR-CER / ArcFace / LPIPS / Laplacian-var, NOT eyeball), compared ONLY at each pipeline's oracle-confirmed scrub floor (where SynthID is removed in BOTH — equal-strength is invalid where it leaves one un-scrubbed): Qwen wins TEXT (incl. CJK), controlnet wins FACES (Qwen smooths faces more) — Qwen is the text-preserving remover, not a universal fidelity win. No face-restore extra ships, by validated decision (every restore approach looked MORE AI-generated). `remove_watermark(region=(x,y,w,h), region_feather=...)` runs the regeneration but feather-composites only the AI box back over the original (via `noai/tiling.feather_region_composite`), preserving the real photo elsewhere — the **AI-enhanced composite** path (`identify` `ai_source_kind == "enhanced"`); the box is supplied by the caller (a C2PA composite manifest carries no reliable machine-readable region, so we do not fabricate one).
+- `noai/watermark_remover.py` — `WatermarkRemover` with three diffusion pipelines selected by the explicit `pipeline` ctor arg, never inferred from `model_id`: `sdxl` (plain SDXL img2img), `controlnet` (SDXL + canny ControlNet, **the DEFAULT since 2026-06-09**), and `qwen` (Qwen-Image 20B MMDiT img2img, Apache-2.0, CUDA/cloud-class — best **text** preservation (incl. CJK); `_load_qwen_pipeline`/`_run_qwen`, bf16, no MPS fallback; call shape in the pure `_build_qwen_kwargs` using `true_cfg_scale`). Removal comes from the img2img `strength`; ControlNet only preserves text/face STRUCTURE — SynthID CAN survive controlnet on photoreal content at low strength. Qwen CERTIFIED oracle floors (2026-06-20): OpenAI **0.10** (seed-robust, clean on seeds 0-4), Gemini **0.25** (seed 0 verified, pin a seed — Gemini oracle rate-limits volume; higher than the controlnet Gemini floor 0.15). `resolve_strength(..., pipeline="qwen")` carries the Qwen ladder (`_QWEN_VENDOR_STRENGTH`), so `--pipeline qwen` gets the 0.25 Gemini floor automatically (the old manual `--strength 0.25` workaround is retired). `_build_qwen_kwargs` passes an explicit `height`/`width` from the input (floored to /16 via `_qwen_target_size`) — without it the pipeline defaults to a 1024x1024 SQUARE and silently squishes non-square inputs (fixed 2026-06-20). **`qwen` is a MANUAL opt-in only — there is NO auto-router.** Measured (`scripts/fidelity_metrics.py`, OCR-CER / ArcFace / LPIPS / Laplacian-var, NOT eyeball): qwen beats controlnet on ONE niche only — **clean body text on a plain background, no faces** (openai_1/2 CER 0.241 vs 0.385). controlnet wins FACES (it always has) AND **display/decorative text in a scene** (abba poster: controlnet CER 0.114 vs qwen 0.379 — canny holds letter shapes, qwen re-renders and garbles them). So a content `--pipeline auto` router and a faces+text **mixed dual-pass** were prototyped and **DROPPED** (2026-06-20): on the canonical faces+text case controlnet wins every metric incl. text, so mixed loses; and "text→qwen" can't be auto-decided (it is body-vs-display text that matters, undetectable cheaply). qwen stays for callers who KNOW their content is clean-text-heavy and face-free. No face-restore extra ships, by validated decision (every restore approach looked MORE AI-generated). `remove_watermark(region=(x,y,w,h), region_feather=...)` runs the regeneration but feather-composites only the AI box back over the original (via `noai/tiling.feather_region_composite`), preserving the real photo elsewhere — the **AI-enhanced composite** path (`identify` `ai_source_kind == "enhanced"`); the box is supplied by the caller (a C2PA composite manifest carries no reliable machine-readable region, so we do not fabricate one).
- `noai/tiling.py` — sliding-window tiled diffusion for large inputs (CLI `--tile`). `WatermarkRemover.remove_watermark` branches to `run_tiled` when `tile` is set AND the long side exceeds `tile_size`, refactoring the single-pass `_generate` into a per-tile `_generate_one` (the ControlNet edge map is rebuilt per tile inside it). Pure helpers `plan_tiles` (uniform-size tiles, last one flush to the edge) and `feather_weights` (strictly-positive separable taper -> partition-of-unity blend) are unit-tested without the model. Also home to `feather_region_composite(base, regenerated, box, *, feather)` — the pure region-targeted compositor for **AI-enhanced composites** (`ai_source_kind == "enhanced"`): blends the regenerated AI box back over the original with a feathered seam, leaving the real photo OUTSIDE the box pixel-exact. It backs `WatermarkRemover.remove_watermark(region=...)` (regenerate ONLY the AI region, not the whole frame); the no-model lossless region path stays `region_eraser.erase`. New tile/region-blend tuning goes in these pure helpers; do not inline blend math into the runner.
- `auto_config.py` + the content-detection layer were REMOVED 2026-06-09; `--auto` is a deprecated no-op (controlnet is the default pipeline and the adaptive polish is ON by default and self-gates to a no-op where there is no detail deficit).
- `upscaler.py` — optional Real-ESRGAN pre-diffusion super-resolution for small inputs (extra `esrgan`, spandrel only). Manual opt-in; the default `--upscaler` stays `lanczos` and the engine always falls back to Lanczos on absence/error. ESRGAN can degrade faces and thin text.
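The tiling helpers above are pure, so the idea is easy to show in one dimension. The sketch assumes a stride of `tile_size - tile_overlap` and a linear ramp as the taper; the repo's exact taper is not stated. `plan_starts`, `ramp_weights` and `blend_1d` are hypothetical stand-ins for `plan_tiles`/`feather_weights`. A 2-D weight map would be the outer product of a row ramp and a column ramp, which is what "separable" means here.

```python
# Hypothetical 1-D sketch of tile planning + feathered blending; not noai/tiling.py.
from typing import Callable, List


def plan_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Uniform-size tile offsets along one axis; the last tile sits flush to the edge."""
    if not 0 <= overlap < tile:
        raise ValueError("need 0 <= overlap < tile")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # range() stops short of this, so no duplicate
    return starts


def ramp_weights(size: int, overlap: int) -> List[float]:
    """Strictly positive taper: ramps up over `overlap` px at each end, 1.0 inside."""
    ramp = overlap + 1
    return [min(1.0, (i + 1) / ramp, (size - i) / ramp) for i in range(size)]


def blend_1d(length: int, tile: int, overlap: int,
             render: Callable[[int, int], List[float]]) -> List[float]:
    """Blend per-tile outputs; dividing by the summed weights is the partition of unity."""
    size = min(tile, length)
    weights = ramp_weights(size, overlap)
    acc = [0.0] * length
    wsum = [0.0] * length
    for start in plan_starts(length, tile, overlap):
        values = render(start, size)  # stand-in for one diffusion call on that tile
        for i in range(size):
            acc[start + i] += weights[i] * values[i]
            wsum[start + i] += weights[i]
    return [a / w for a, w in zip(acc, wsum)]  # w > 0 everywhere: weights never hit 0


print(plan_starts(2816, 1024, 128))  # [0, 896, 1792]
```

With the defaults (`--tile-size 1024`, `--tile-overlap 128`), this sketch would give a 2816x1536 input three tile columns and two tile rows. Every weight is strictly positive, so the normalizing division never divides by zero at the image border, where only one tile contributes.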
@@ -144,8 +144,8 @@ The scrub still comes from the img2img `strength` (same lever as SDXL); the call
- **Text:** Qwen wins on substantial Latin/mixed-script text -- OCR CER, controlnet vs Qwen: openai_1 (EN+RU+ZH, both 0.10) 0.385 vs **0.241**, openai_2 (EN, both 0.10) 0.341 vs **0.290**. On a SHORT CJK sign (gemini_1, cnet 0.15 / Qwen 0.25) it is a TIE (0.037 vs 0.037 -- both near-perfect; the earlier Qwen 0.000 was at the higher 0.30, not the certified floor).
- **Faces:** controlnet wins -- gemini_3, 18 faces (cnet 0.15 / Qwen 0.25): ArcFace identity 0.546 vs 0.382, Laplacian-variance retention 0.62 vs 0.40, face LPIPS 0.09 vs 0.17 (Qwen smooths faces MORE; the gap narrows vs Qwen 0.30 but controlnet still wins clearly).

-**Conclusion: Qwen is the better TEXT-preserving remover (substantial Latin/mixed text), NOT a universal fidelity win — controlnet's canny edge map holds face skin detail better, so the path is a content-routed lane (text→qwen, faces→controlnet), not a blanket migration.** Caveat: `resolve_strength` is shared and pipeline-independent, so the Gemini default (0.15) UNDER-scrubs Gemini on `qwen` (floor 0.25) — pass `--strength 0.25` for Gemini on `qwen` until a Qwen ladder is wired. Flat-graphic content was not in the sample.
+**Conclusion: Qwen wins TEXT only for clean body text on a plain background with NO faces; controlnet wins faces AND display/decorative text in a scene. So `qwen` is a MANUAL `--pipeline qwen` opt-in, not a routed lane.** A content `--pipeline auto` router + a faces+text mixed dual-pass were prototyped and DROPPED (2026-06-20): on the canonical faces+text case (the abba poster, faces + display text) controlnet won EVERY metric incl. text (CER 0.114 vs qwen 0.379), so grafting qwen text only hurts; and "text→qwen" is undecidable cheaply (body-vs-display text is what matters). Caveat: `resolve_strength(..., pipeline="qwen")` carries the Qwen ladder (`_QWEN_VENDOR_STRENGTH`, Gemini 0.25), so `--pipeline qwen` gets the 0.25 Gemini floor automatically — the old manual `--strength 0.25` workaround is retired. `_build_qwen_kwargs` now passes an explicit height/width (qwen squished non-square inputs to 1024² without it). Flat-graphic content was not in the sample.
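The pipeline-aware ladder can be pictured as one vendor table per pipeline. This minimal sketch uses only the certified floors quoted in this document: Gemini 0.15 on the shared SDXL/controlnet ladder, and OpenAI 0.10 and Gemini 0.25 on `qwen`. The other vendor defaults are not given here, so the caller passes `default`. `resolve_strength_sketch` and its signature are hypothetical and differ from the real `resolve_strength`.

```python
# Hypothetical sketch of a pipeline-aware strength ladder; not watermark_profiles.py.
from typing import Dict, Optional

# Only the certified floors quoted in this document; everything else is left out.
_BASE_FLOORS: Dict[str, float] = {"gemini": 0.15}  # shared ladder (sdxl/controlnet)
_QWEN_FLOORS: Dict[str, float] = {"openai": 0.10, "gemini": 0.25}


def resolve_strength_sketch(vendor: str, pipeline: str, *, default: float,
                            explicit: Optional[float] = None) -> float:
    """Pick an img2img strength: explicit --strength, else the pipeline's vendor floor."""
    if explicit is not None:  # a user-supplied --strength always wins
        return explicit
    ladder = _QWEN_FLOORS if pipeline == "qwen" else _BASE_FLOORS
    return ladder.get(vendor.lower(), default)


print(resolve_strength_sketch("gemini", "qwen", default=0.2))        # 0.25
print(resolve_strength_sketch("gemini", "controlnet", default=0.2))  # 0.15
```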

-**Improving Qwen (ship vs improve):** the cited research on fixing the face-smoothing while keeping the text win (Qwen-Image ControlNet for structure conditioning, Qwen-Image-Edit, Z-Image-Turbo as a cheaper text-preserving substitute, non-regenerative detail restoration) lives in `docs/qwen-improvement-research.md` -- read it before extending the `qwen` pipeline. Verdict: shippable now as an opt-in text lane; the strongest improvement lead is adding a Qwen-Image ControlNet, but no improvement has measured face-fidelity at our floors yet (validate with `scripts/fidelity_metrics.py` first).
+**Improving Qwen (ship vs improve):** the cited research lives in `docs/qwen-improvement-research.md` -- read it before extending the `qwen` pipeline. Verdict: shippable as an opt-in text lane. **The "add a Qwen-Image ControlNet to fix face smoothing" lead was built, measured, and CLOSED (2026-06-20):** a DiffSynth-Studio Qwen + Apache-2.0 blockwise-canny ControlNet at the Gemini floor 0.25 did NOT restore face skin texture (face Laplacian-variance retention flat 0.40 -> 0.40, 13/16 faces within +-0.02; the SDXL+canny target 0.62 was not approached), because canny carries edges not skin grain and Qwen's higher Gemini floor (0.25 vs SDXL+canny 0.15) forces more smoothing -- and a deep-research sweep confirmed NO permissively-licensed Qwen tile/detail/realism/skin ControlNet exists anywhere (every Qwen conditioning is geometry). So **faces stay on SDXL+controlnet; Qwen is the text lane, not a face fix.** The strongest remaining lead is **Z-Image-Turbo** (6B, Apache-2.0, `ZImageImg2ImgPipeline`, scrub mechanism preserved) -- its own SynthID floor and face/text fidelity are UNMEASURED; that is the next experiment. Non-regenerative high-frequency detail re-injection is NOT safe by assumption (the "clean-output high frequencies do not carry the watermark" claim was refuted) -- it must be oracle-gated. Always validate any improvement at the certified floors with `scripts/fidelity_metrics.py` first.
**Seed as a quality lever (measured, openai_1 at 0.10, seeds 0-4):** the seed barely moves whole-image fidelity (img LPIPS 0.062-0.065, SSIM 0.855-0.857, PSNR 28.5-28.7 — flat) but does shift TEXT legibility (OCR CER 0.241-0.290, ~17% spread) -- the seed changes WHICH details get regenerated, not the overall level. So a per-image best-of-N-seed selection is a WEAK, text-only lever (pick the lowest-CER seed that still scrubs; fidelity selection needs no oracle). Not worth the N× cost for general use -- pin one decent seed in prod; reserve best-of-N for text-heavy premium cases.
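Here is a sketch of that best-of-N rule. It assumes each candidate output already has its seed, an OCR transcript, and an oracle scrub verdict. CER is computed as character-level Levenshtein distance over the reference length. That is a common definition, but `scripts/fidelity_metrics.py` may normalize differently. `cer`, `Candidate` and `pick_seed` are hypothetical.

```python
# Hypothetical best-of-N seed selection sketch; not scripts/fidelity_metrics.py.
from dataclasses import dataclass
from typing import List, Optional


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / len(reference)."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    prev = list(range(len(hypothesis) + 1))  # distances for the empty reference prefix
    for i, rc in enumerate(reference, 1):
        cur = [i]
        for j, hc in enumerate(hypothesis, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (rc != hc)))   # substitution (or match)
        prev = cur
    return prev[-1] / len(reference)


@dataclass
class Candidate:
    seed: int
    ocr_text: str   # OCR transcript of the output image
    scrubbed: bool  # oracle verdict: watermark no longer detected


def pick_seed(reference: str, candidates: List[Candidate]) -> Optional[int]:
    """Lowest-CER seed among outputs that still scrub; None if no seed scrubs."""
    ok = [c for c in candidates if c.scrubbed]
    if not ok:
        return None
    return min(ok, key=lambda c: cer(reference, c.ocr_text)).seed


print(cer("hello", "hallo"))  # 0.2
```

Because the scrub verdict is checked before CER is compared, a legible but un-scrubbed seed can never win.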
@@ -185,7 +185,7 @@ Root cause: bad alpha (under-estimated, max ~0.65) + fixed-no-inpaint + tight bo
**`sdxl`** (renamed from `default` 2026-06-09; `default` kept as a back-compat alias via `normalize_profile`) runs plain SDXL img2img (`_run_img2img`); it is the lighter opt-down alternative (no ControlNet weights).

-**`qwen`** (`_run_qwen`, `_load_qwen_pipeline`) runs `QwenImageImg2ImgPipeline` on `Qwen/Qwen-Image` (20B MMDiT, Apache-2.0 code AND weights). The scrub still comes from the img2img `strength`; Qwen's value is **text preservation** (incl. CJK and small text). **Metric-measured nuance (2026-06-19, `scripts/fidelity_metrics.py`, do NOT trust the eyeball here — it misled). Compare ONLY at each pipeline's oracle-confirmed scrub floor (outputs where SynthID is removed in BOTH — an equal-strength compare is invalid where it leaves one un-scrubbed; Qwen at 0.15 does not clear Gemini): Qwen wins TEXT (lower OCR CER across EN/RU/ZH, perfect Chinese) but controlnet wins FACES (higher Laplacian-variance retention and lower LPIPS — Qwen smooths faces MORE; ArcFace identity favors controlnet 0.546 vs 0.331 at the Gemini floors).** So Qwen is the better text-preserving remover, NOT a universal fidelity win — controlnet's canny edge map holds face skin detail better. Specifics: bf16 on CUDA (fp16 risks overflow on the 20B MMDiT — see the dtype branch in `__init__`); loads `QWEN_MODEL_ID` unless `--model` is overridden; the call shape lives in the pure module helper `_build_qwen_kwargs` (unit-tested without torch in `tests/test_platform.py::TestQwenKwargs`), which uses Qwen's `true_cfg_scale` (NOT SDXL's `guidance_scale` — the CLI `--guidance-scale` maps onto it; ~4.0 is typical, the SDXL default 7.5 is high for Qwen) and an explicit `negative_prompt` (`_QWEN_PROMPT`/`_QWEN_NEGATIVE`). It is CUDA/cloud-class (the 20B does not fit MPS), so `_run_qwen` has NO MPS->CPU fallback — an error propagates. `_load_qwen_pipeline` raises a clear ImportError if the installed diffusers lacks `QwenImageImg2ImgPipeline`. **CERTIFIED oracle floors (Modal A100-80GB, 2026-06-20): OpenAI 0.10 (seed-robust — clean on seeds 0-4), Gemini 0.25 (seed 0 verified on 2 images; the Gemini oracle rate-limits volume seed-repeat, so PIN a seed in prod). The Gemini floor (0.25) is HIGHER than the certified controlnet Gemini floor (0.15), and `resolve_strength` is shared/pipeline-independent, so pass an explicit `--strength 0.25` for Gemini content on `qwen` until a Qwen-specific ladder is wired into `resolve_strength`.** Fidelity vs controlnet was measured at the certified floors (`scripts/fidelity_metrics.py`), NOT eyeballed — see the metric nuance above: Qwen wins substantial text, controlnet wins faces.
+**`qwen`** (`_run_qwen`, `_load_qwen_pipeline`) runs `QwenImageImg2ImgPipeline` on `Qwen/Qwen-Image` (20B MMDiT, Apache-2.0 code AND weights). The scrub still comes from the img2img `strength`; Qwen's value is **text preservation** (incl. CJK and small text). **Metric-measured nuance (2026-06-19, `scripts/fidelity_metrics.py`, do NOT trust the eyeball here — it misled). Compare ONLY at each pipeline's oracle-confirmed scrub floor (outputs where SynthID is removed in BOTH — an equal-strength compare is invalid where it leaves one un-scrubbed; Qwen at 0.15 does not clear Gemini): Qwen wins TEXT (lower OCR CER across EN/RU/ZH, perfect Chinese) but controlnet wins FACES (higher Laplacian-variance retention and lower LPIPS — Qwen smooths faces MORE; ArcFace identity favors controlnet 0.546 vs 0.331 at the Gemini floors).** So Qwen is the better text-preserving remover, NOT a universal fidelity win — controlnet's canny edge map holds face skin detail better. Specifics: bf16 on CUDA (fp16 risks overflow on the 20B MMDiT — see the dtype branch in `__init__`); loads `QWEN_MODEL_ID` unless `--model` is overridden; the call shape lives in the pure module helper `_build_qwen_kwargs` (unit-tested without torch in `tests/test_platform.py::TestQwenKwargs`), which uses Qwen's `true_cfg_scale` (NOT SDXL's `guidance_scale` — the CLI `--guidance-scale` maps onto it; ~4.0 is typical, the SDXL default 7.5 is high for Qwen) and an explicit `negative_prompt` (`_QWEN_PROMPT`/`_QWEN_NEGATIVE`). It is CUDA/cloud-class (the 20B does not fit MPS), so `_run_qwen` has NO MPS->CPU fallback — an error propagates. `_load_qwen_pipeline` raises a clear ImportError if the installed diffusers lacks `QwenImageImg2ImgPipeline`. **CERTIFIED oracle floors (Modal A100-80GB, 2026-06-20): OpenAI 0.10 (seed-robust — clean on seeds 0-4), Gemini 0.25 (seed 0 verified on 2 images; the Gemini oracle rate-limits volume seed-repeat, so PIN a seed in prod). The Gemini floor (0.25) is HIGHER than the certified controlnet Gemini floor (0.15); `resolve_strength(..., pipeline="qwen")` carries the Qwen ladder (`_QWEN_VENDOR_STRENGTH`), so `--pipeline qwen` gets the 0.25 Gemini floor automatically -- the old manual `--strength 0.25` workaround is retired. `_build_qwen_kwargs` passes an explicit `height`/`width` from the input (floored to /16 via the pure `_qwen_target_size`); WITHOUT it the img2img pipeline defaults to a 1024x1024 SQUARE and silently squishes non-square inputs (the abba 2816x1536 case came back 1024x1024, distorting the scene and garbling text — fixed 2026-06-20, tested in `TestQwenKwargs`).** Fidelity vs controlnet was measured at the certified floors (`scripts/fidelity_metrics.py`), NOT eyeballed. **`qwen` is a MANUAL opt-in only — there is NO auto-router (one was prototyped and DROPPED, see below).** It wins ONE niche: clean body text on a plain background, NO faces (openai_1/2 CER 0.241 vs 0.385). controlnet wins FACES and **display/decorative text in a scene** (abba poster: controlnet CER 0.114 vs qwen 0.379 — canny holds letter shapes; qwen re-renders and garbles them). **`--pipeline auto` + a faces+text mixed dual-pass were built and DROPPED (2026-06-20):** on the canonical faces+text case (abba) controlnet wins EVERY metric incl. text, so grafting qwen text would only hurt; and "text→qwen" is undecidable cheaply (it is body-vs-display text that matters). The router/detector/mixed modules were removed; the geometry fix + the Qwen strength ladder were kept (they make the manual `--pipeline qwen` correct). **Do NOT retry "add a Qwen ControlNet to close the face gap" — it was built, measured, and CLOSED 2026-06-20:** a DiffSynth blockwise-canny Qwen ControlNet did not restore face skin texture (lapvar flat 0.40, canny carries edges not skin grain) and no permissively-licensed Qwen tile/detail/skin ControlNet exists anywhere (all conditioning is geometry). Faces stay on controlnet; the next improvement lead is Z-Image-Turbo (Apache-2.0, unmeasured floor). Full record + the deep-research sweep in `docs/qwen-improvement-research.md`.
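Here is the geometry fix in sketch form. It assumes `_qwen_target_size` only floors each side to a multiple of 16 as described; the real helper may also apply resolution caps. `qwen_target_size` and `build_qwen_kwargs` are hypothetical stand-ins. The keyword names follow this section (`true_cfg_scale`, `negative_prompt`, `height`, `width`), but check the exact `QwenImageImg2ImgPipeline` call signature against the installed diffusers version.

```python
# Hypothetical sketch of the Qwen geometry fix; not watermark_remover.py itself.
from typing import Any, Dict, Tuple


def qwen_target_size(width: int, height: int, multiple: int = 16) -> Tuple[int, int]:
    """Floor each side to a multiple of 16 so the output keeps the input's geometry."""
    w = (width // multiple) * multiple
    h = (height // multiple) * multiple
    if w == 0 or h == 0:
        raise ValueError(f"{width}x{height} is smaller than {multiple}px on one side")
    return w, h


def build_qwen_kwargs(image: Any, width: int, height: int, *, prompt: str,
                      negative_prompt: str, strength: float,
                      true_cfg_scale: float) -> Dict[str, Any]:
    """Pure kwargs builder: no torch needed, so it can be unit-tested directly."""
    w, h = qwen_target_size(width, height)
    return {
        "image": image,
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "strength": strength,              # the scrub lever
        "true_cfg_scale": true_cfg_scale,  # Qwen's CFG knob, not SDXL's guidance_scale
        "height": h,                       # explicit size: without these the
        "width": w,                        # pipeline falls back to 1024x1024
    }


print(qwen_target_size(2816, 1536))  # (2816, 1536): already /16, aspect kept
print(qwen_target_size(1023, 777))   # (1008, 768)
```

Flooring can shave up to 15 px per side, so the input image may need a matching resize or crop before the call. The sketch leaves that step out.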
**`controlnet`** (**the DEFAULT pipeline since 2026-06-09** for `invisible`/`all`/`batch` and both engine ctors; `_run_controlnet`, `_load_controlnet_pipeline`) runs `StableDiffusionXLControlNetImg2ImgPipeline` with the SDXL-native canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` (`watermark_profiles.CONTROLNET_CANNY_MODEL`): the control image is `cv2.Canny(gray, 100, 200)` stacked to 3 channels (`_CANNY_LOW`/`_CANNY_HIGH`, prompt `_CONTROLNET_PROMPT` / `_CONTROLNET_NEGATIVE`).
@@ -213,6 +213,15 @@ History: `auto_config.plan()` was a content-adaptive planner that detected faces
**`--auto` is now a DEPRECATED no-op** (`cli._resolve_auto_polish`): controlnet is already the default pipeline AND the adaptive polish is ON by default, so `--auto` has nothing left to do — it only prints a deprecation warning and passes `adaptive_polish` through unchanged (an explicit `--no-adaptive-polish` still wins). (Originally it re-enabled the polish; once the polish default flipped to ON the same day, the parameter-source branch became dead and was dropped.) The **adaptive polish itself lives on** in `humanizer.adaptive_polish` (CLI `--adaptive-polish/--no-adaptive-polish`, **ON by default since 2026-06-09** — it self-gates to a no-op where there is no detail deficit, so default-on is safe; uses the full-res original as the detail reference) — see the `humanizer` test note. `batch` resolves the polish once before the loop (one warning) and caches the invisible engine per pipeline (`ctx.obj["_inv_engines"]`).
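The pass-through behaviour reduces to a few lines. The sketch below uses a hypothetical name and Python's `warnings` module; the real `cli._resolve_auto_polish` may print through the CLI instead.

```python
# Hypothetical sketch of the deprecated --auto pass-through; not cli.py itself.
import warnings


def resolve_auto_polish(auto: bool, adaptive_polish: bool) -> bool:
    """--auto only warns; the polish flag passes through unchanged."""
    if auto:
        warnings.warn(
            "--auto is deprecated and has no effect (adaptive polish is ON by default)",
            DeprecationWarning,
            stacklevel=2,
        )
    return adaptive_polish  # an explicit --no-adaptive-polish (False) still wins
```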
+## Content `--pipeline auto` router + faces+text mixed dual-pass — PROTOTYPED and DROPPED (2026-06-20)

+A `--pipeline auto` content router (`pipeline_router.py` + `content_detect.py`: Haar faces + MSER text → route text→qwen / faces→controlnet / both→mixed) and a faces+text **mixed dual-pass** (`mixed_pipeline.py`: scrub the whole frame on BOTH pipelines, then graft the qwen text regions onto the controlnet base via `tiling.feather_region_composite`) were built, run on Modal (the abba poster: faces + display text), measured, and **removed**. Why it failed:
+- On the canonical faces+text image **controlnet wins EVERY metric, including text** (CER 0.114 vs qwen 0.379; ID 0.64 vs 0.36; lapvar 0.71 vs 0.59) — canny holds the existing letter shapes, qwen re-renders display/decorative text and garbles it. So grafting qwen text onto the controlnet base only HURTS.
+- qwen beats controlnet on text ONLY for clean body text on a plain background with no faces (openai_1/2) — a niche where there are no faces to route around anyway, so `--pipeline qwen` alone covers it. The faces+clean-body-text intersection is near-empty.
+- "text→qwen" is not cheaply decidable: it is body-vs-display text that matters, which face/text detectors can't tell apart. MSER also over-fired (47% of the busy poster, incl. faces).
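The grafting step went through `tiling.feather_region_composite(base, regenerated, box, *, feather)`, which was kept. Below is a minimal sketch of a feathered region composite on nested lists (grayscale, for brevity). It assumes a linear ramp from `1/feather` on the box's outermost ring up to full weight `feather` px inside. `feather_composite_sketch` and the ramp shape are assumptions, not the repo's implementation. The property the text relies on holds by construction: pixels outside the box are copied from `base` unchanged.

```python
# Hypothetical feathered region-composite sketch; not noai/tiling.py.
from typing import List, Tuple

Image = List[List[float]]  # rows of grayscale pixel values


def feather_composite_sketch(base: Image, regenerated: Image,
                             box: Tuple[int, int, int, int], feather: int) -> Image:
    """Blend `regenerated` into `base` inside box=(x, y, w, h) with a feathered seam."""
    x, y, w, h = box
    out = [row[:] for row in base]  # copy: pixels outside the box stay exactly `base`
    for r in range(y, y + h):
        for c in range(x, x + w):
            # 1 on the box's outermost ring, growing by 1 per pixel toward the centre
            d = min(c - x + 1, x + w - c, r - y + 1, y + h - r)
            alpha = min(1.0, d / feather) if feather > 0 else 1.0
            out[r][c] = alpha * regenerated[r][c] + (1.0 - alpha) * base[r][c]
    return out
```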

+KEPT from that work (independently valid for the manual `--pipeline qwen`): the qwen **geometry fix** (`_qwen_target_size` + `_build_qwen_kwargs` height/width — qwen squished non-square inputs to 1024² without it) and the **pipeline-aware `resolve_strength`** Qwen ladder (Gemini 0.25). Also kept: the `fidelity_metrics.py` one-to-one face matcher. The throwaway Modal eval scripts were removed after the run (findings recorded here and in `docs/qwen-improvement-research.md`).

## `upscaler.py`
`upscaler.py` — optional Real-ESRGAN pre-diffusion super-resolution for small inputs (spandrel boundary, top-of-file pyright pragma). `is_available()` gates on spandrel+torch (via `importlib.util.find_spec`); `upscale(bgr, device=None)` loads a lazily-built spandrel `ImageModelDescriptor` singleton (double-checked lock) and upscales by the model's native factor (x2), with a non-CPU→CPU device fallback mirroring the diffusion engine's MPS→CPU retry. Weights (`RealESRGAN_x2plus.pth`, BSD-3-Clause) download on first use to the `torch.hub` checkpoints cache; never bundled. Used only when UPscaling to the `min_resolution` floor (a `max_resolution` downscale always uses Lanczos). The wiring is `InvisibleEngine._esrgan_upscale(pil, target)` — Real-ESRGAN at native factor, then a Lanczos resize to the exact target, falling back to a plain Lanczos resize if the extra is absent or the model errors (so an optional upscaler can never break removal). The default `--upscaler` is `lanczos` (cv2, no deps).
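The fallback contract can be shown without spandrel or torch by injecting the two resizers. Real-ESRGAN runs at its native factor, then Lanczos resizes to the exact target. If the extra is missing or the model raises, plain Lanczos is used instead. `upscale_with_fallback` and its callable parameters are hypothetical, not `InvisibleEngine._esrgan_upscale` itself.

```python
# Hypothetical sketch of the "optional upscaler can never break removal" contract.
from typing import Callable, Optional, Tuple, TypeVar

Img = TypeVar("Img")
Size = Tuple[int, int]


def upscale_with_fallback(img: Img, target: Size,
                          lanczos: Callable[[Img, Size], Img],
                          esrgan: Optional[Callable[[Img], Img]] = None) -> Img:
    """ESRGAN at native factor then Lanczos to `target`; plain Lanczos on any failure."""
    if esrgan is not None:  # None models "extra not installed"
        try:
            return lanczos(esrgan(img), target)  # native x2, then exact target size
        except Exception:
            pass  # model error: fall through to the plain Lanczos path
    return lanczos(img, target)
```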
|
||||
|
||||
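The fallback wiring described above reduces to a small control-flow pattern. Try the optional model, and on any failure keep the un-upscaled input so that the final resize still succeeds. The sketch below assumes injected callables. `upscale_to_target`, `sr_model` and `resize` are hypothetical names, and this is not `InvisibleEngine._esrgan_upscale` itself. Injection lets the flow run without torch, spandrel or cv2.

```python
from typing import Callable, Optional, Tuple, TypeVar

Img = TypeVar("Img")
Size = Tuple[int, int]


def upscale_to_target(
    image: Img,
    target: Size,
    resize: Callable[[Img, Size], Img],
    sr_model: Optional[Callable[[Img], Img]] = None,
) -> Img:
    """Super-resolve at the model's native factor, then resize to the exact target.

    `sr_model` stands in for an optional Real-ESRGAN pass (None = extra not installed)
    and `resize` for a Lanczos resize; both are hypothetical injected callables.
    """
    if sr_model is not None:
        try:
            image = sr_model(image)  # native-factor SR (e.g. x2)
        except Exception:
            pass  # the optional extra failed: fall through to a plain resize of the input
    # Either way the final step is one resize to the exact target size.
    return resize(image, target)
```

Keeping the `try` around the model call only, rather than around the final resize, means a genuine resize bug still surfaces. Only the optional step is allowed to fail silently.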
@@ -29,6 +29,73 @@ the 20B cost. None of the improvements has measured face-fidelity numbers at our
scrub floors yet, so each must be validated with `scripts/fidelity_metrics.py` plus
the oracle before shipping.

## Follow-up: ControlNet experiment + deeper research (2026-06-20)

The verdict's strongest lead -- adding a Qwen-Image ControlNet -- was **built, measured, and
CLOSED**.

**Experiment** (Modal A100-80GB; DiffSynth-Studio `QwenImagePipeline` + the Apache-2.0
`DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny` -- the only framework exposing
Qwen-Image + canny ControlNet + img2img `denoising_strength` in ONE call; diffusers ships no
`QwenImageControlNetImg2ImgPipeline`, and its three Qwen ControlNet pipelines are txt2img only).
Measured on `gemini_3` (18 faces), qwen+canny vs base Qwen, both at the Gemini scrub floor 0.25, with
`scripts/fidelity_metrics.py`:
- **The actual failure mode (face skin texture) was NOT restored:** Laplacian-variance
retention stayed flat (base 0.40 -> qwen+canny 0.40; per-face, 13/16 within ±0.02 after a
one-to-one face match, sd 0.016 -- not an averaging artifact). The SDXL+canny target of 0.62
was not approached.
- Identity rose modestly and broadly (ArcFace 0.346 -> 0.415, 12/16 faces improved), but the
absolute value stays ~0.42 ("a different person, slightly closer").
- Mechanism (verified, not inferred): canny conditioning was applied fully (scale 1.0, full
denoise schedule); the canny edge map is clean facial geometry with BLANK skin (4.83% edge
density) -- canny carries edges, not skin grain. Root cause: Qwen's Gemini floor (0.25) is
higher than SDXL+canny's (0.15), forcing more denoising and therefore more smoothing; structure
conditioning cannot compensate for that.

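To make the lapvar numbers concrete: `_lap_var` in `scripts/fidelity_metrics.py` is `cv2.Laplacian(gray, cv2.CV_64F).var()`. A retention ratio compares that sharpness score on the same original-bbox crop before and after regeneration. The dependency-free sketch below makes three assumptions. First, retention means variant / original; the notes report the ratio but not its exact formula. Second, border pixels are skipped instead of reflected as OpenCV does. Third, the helper names are illustrative. `edge_density` shows what the 4.83% figure measures: the fraction of non-zero pixels in a binary edge map.

```python
from statistics import pvariance
from typing import List

Gray = List[List[float]]


def laplacian_variance(gray: Gray) -> float:
    """Population variance of the 4-neighbour Laplacian over interior pixels.

    Same 3x3 kernel as cv2.Laplacian with its default ksize=1; the border is skipped here.
    Needs at least a 3x3 input.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def lapvar_retention(orig_crop: Gray, variant_crop: Gray) -> float:
    """Sharpness kept by the variant relative to the original (1.0 = unchanged).

    Undefined for a texture-free original; 0.0 is returned by convention (an assumption).
    """
    base = laplacian_variance(orig_crop)
    return laplacian_variance(variant_crop) / base if base > 0 else 0.0


def edge_density(edges: Gray) -> float:
    """Fraction of non-zero pixels in a binary edge map (e.g. a canny output)."""
    total = sum(len(row) for row in edges)
    return sum(1 for row in edges for v in row if v) / total
```

Halving a texture's contrast quarters its Laplacian variance, because variance scales with the square of amplitude. So a retention of 0.40 is a much larger visual loss of skin grain than "60% less" might suggest.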
**Deeper research** (deep-research harness, 103 agents, 3-vote adversarial):
- **[high, unanimous] No permissively-licensed Qwen-Image tile / detail / realism / skin
ControlNet exists anywhere** -- DiffSynth first-party is Canny/Depth/Inpaint only, InstantX
Union covers canny/soft-edge/depth/pose, and the official QwenLM repo ships none. Every Qwen
conditioning is GEOMETRY, the same class as the tested canny. **The "add a Qwen ControlNet to
fix faces" lead is closed for good.**
- **[high, unanimous] Z-Image / Z-Image-Turbo (6B, Apache-2.0 on code AND weights, ~1/3 of
Qwen 20B)** ships a documented `ZImageImg2ImgPipeline` with standard strength denoising, so
it preserves the scrub mechanism. Its own SynthID scrub floor and face/text fidelity are
UNMEASURED -- this is the strongest concrete NEXT experiment.
- **[medium] Lowering Qwen's scrub floor has no off-the-shelf SynthID answer:** the "partial
img2img ~0.3 breaks robust watermarks" literature tests open schemes
(StegaStamp/TrustMark/VINE), NEVER SynthID (proprietary decoder) -- analogy, not proof. No
minimal-strength SynthID attack under a named permissive license was found.
- **REFUTED [0-3]:** "re-injecting high-frequency detail from a clean diffusion output would
not carry the watermark back." So non-regenerative detail transfer is NOT safe by
assumption -- the transferred high-frequency band must be gated against the SynthID oracle.

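The REFUTED finding leaves one safe shape for detail re-injection: transfer a high-frequency band, then let the SynthID oracle veto it. The notes name this frontier but do not specify it, so everything below is hypothetical:

- `oracle_flags` stands in for whatever oracle check the project runs (True = watermark detected).
- The 3x3 box blur is a stand-in low/high-pass split.
- The gain ladder is illustrative.
- The choice of `detail_source` is left open.

```python
from typing import Callable, List, Sequence

Gray = List[List[float]]


def box_blur(img: Gray) -> Gray:
    """3x3 mean filter with clamped edges: a crude low-pass."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    acc += img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            out[y][x] = acc / 9.0
    return out


def reinject_detail(
    scrubbed: Gray,
    detail_source: Gray,
    oracle_flags: Callable[[Gray], bool],
    gains: Sequence[float] = (1.0, 0.5, 0.25),
) -> Gray:
    """Add gain * high-pass(detail_source) to `scrubbed`, keeping the first candidate the
    oracle does NOT flag; if every gain is flagged, return `scrubbed` unchanged."""
    low = box_blur(detail_source)
    high = [[s - l for s, l in zip(srow, lrow)] for srow, lrow in zip(detail_source, low)]
    for gain in gains:  # strongest transfer first, backing off after each flag
        candidate = [[b + gain * d for b, d in zip(brow, drow)] for brow, drow in zip(scrubbed, high)]
        if not oracle_flags(candidate):
            return candidate
    return scrubbed  # nothing cleared the oracle: ship the regenerated image as-is
```

The important property is the default. A candidate is only accepted after the oracle has cleared it, so a failed gate degrades to the plain scrubbed output rather than to a watermarked one.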
**Net for the pipeline:** **faces stay on SDXL+controlnet**; there is no Qwen face-fix.
The live frontier is Z-Image-Turbo (next experiment) and oracle-gated non-regenerative detail
re-injection.

**Follow-up (2026-06-20): the content-routed lane / mixed dual-pass was tested and DROPPED.**
A `--pipeline auto` router (Haar+MSER → text→qwen / faces→controlnet / both→mixed) and a
faces+text mixed dual-pass (scrub the whole frame with both pipelines, then graft qwen text regions onto the
controlnet base) were built and run on Modal (the abba poster: faces + display text). On that
canonical faces+text case **controlnet won EVERY metric, including text** (CER 0.114 vs qwen
0.379; ID 0.64 vs 0.36): canny holds existing letter shapes, while qwen re-renders display text and
garbles it, so grafting qwen text only hurts. Qwen beats controlnet on text ONLY for clean body
text on a plain background with no faces (openai_1/2), a niche `--pipeline qwen` alone covers;
the faces+clean-body-text intersection is near-empty, and "text→qwen" cannot be decided cheaply
(body-vs-display text is what matters). So the router + mixed modules were removed and **`qwen`
is a manual `--pipeline qwen` opt-in only.** KEPT (independently valid): the qwen geometry fix
(it squished non-square inputs to 1024²), the pipeline-aware `resolve_strength` Qwen ladder, and
the `fidelity_metrics.py` one-to-one face matcher below.

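For the record, the dropped design was simple to state. In the sketch below, Haar/MSER detection is abstracted to two booleans and a list of text boxes. The function names are hypothetical, and the removed modules are not reproduced. Routing the "neither" case to controlnet is an assumption, based on controlnet being the default pipeline. The measurements above show why `graft` only hurts on display text: it pastes qwen's re-rendered letters over controlnet's better-preserved ones.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel ranges
Grid = List[List[float]]


def route(has_faces: bool, has_text: bool) -> str:
    """The dropped `--pipeline auto` rule: text->qwen, faces->controlnet, both->mixed."""
    if has_faces and has_text:
        return "mixed"
    if has_text:
        return "qwen"
    return "controlnet"  # faces only, or neither (assumed: the default pipeline)


def graft(base: Grid, donor: Grid, text_boxes: Sequence[Box]) -> Grid:
    """Mixed dual-pass: copy the donor's pixels inside each text box onto a copy of the base."""
    out = [row[:] for row in base]
    for x0, y0, x1, y1 in text_boxes:
        for y in range(y0, y1):
            out[y][x0:x1] = donor[y][x0:x1]
    return out
```

This hard-edged copy ignores seam blending. Even a perfect seam could not rescue the mixed lane, though, because the donor text itself was worse (CER 0.379 vs 0.114).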
**Tooling fix surfaced by this run:** `scripts/fidelity_metrics.py` face matching was changed
from per-face nearest-center to a collision-free one-to-one assignment
(`assign_faces_one_to_one`, gated by face size), after the 18-face `gemini_3` exposed
collisions (the regenerated variants detected 17 faces, so two originals mapped to the same
variant face, corrupting the identity metric). lapvar/LPIPS were always anchored to the
original bbox and stayed collision-immune. Regression-guarded by
`tests/test_fidelity_matching.py`.

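A standalone sketch makes both halves of the fix visible. The old nearest-center rule collides when a variant drops a face, and a greedy one-to-one walk avoids that. The sketch re-implements the idea without the `max_frac` size gate, so it runs without `scripts/`. It also adds a brute-force optimum to check the docstring's claim that greedy matches the Hungarian result when faces are well separated. The names are illustrative and are not the script's API.

```python
from itertools import permutations
from math import dist
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def nearest_match(refs: List[Point], variants: List[Point]) -> Dict[int, int]:
    """Old behaviour: every ref independently takes its nearest variant (may collide)."""
    return {i: min(range(len(variants)), key=lambda j: dist(r, variants[j])) for i, r in enumerate(refs)}


def greedy_one_to_one(refs: List[Point], variants: List[Point]) -> Dict[int, int]:
    """Nearest-first greedy assignment that never reuses a ref or a variant (no size gate)."""
    pairs = sorted((dist(r, v), i, j) for i, r in enumerate(refs) for j, v in enumerate(variants))
    used_r, used_v, out = set(), set(), {}
    for _, i, j in pairs:
        if i not in used_r and j not in used_v:
            out[i] = j
            used_r.add(i)
            used_v.add(j)
    return out


def total_cost(refs: List[Point], variants: List[Point], matching: Dict[int, int]) -> float:
    """Sum of center distances over the matched pairs."""
    return sum(dist(refs[i], variants[j]) for i, j in matching.items())


def optimal_cost(refs: List[Point], variants: List[Point]) -> float:
    """Brute-force minimum total distance matching each variant to a distinct ref.

    Exponential; tiny inputs only. Assumes len(refs) >= len(variants).
    """
    return min(
        sum(dist(refs[i], v) for i, v in zip(choice, variants))
        for choice in permutations(range(len(refs)), len(variants))
    )
```

On a well-separated row of faces, greedy and brute force agree. On a crafted close pair (refs at x=0 and x=3, variants at x=2 and x=5), greedy grabs the single nearest pair and pays 6.0 against an optimum of 4.0. So the "greedy equals Hungarian" shortcut depends on faces being spatially separated, as the docstring states.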
## Findings

1. **[high, 3-0] A permissively-licensed Qwen-Image ControlNet exists today and is
@@ -186,16 +186,50 @@ def _lap_var(bgr: np.ndarray) -> float:
 return float(cv2.Laplacian(gray, cv2.CV_64F).var())


-def _match_face(orig_face: Any, variant_faces: list[Any]) -> Any:
-"""Nearest variant face to an original face by bbox-center distance (geometry kept)."""
-ox, oy = (orig_face.bbox[0] + orig_face.bbox[2]) / 2, (orig_face.bbox[1] + orig_face.bbox[3]) / 2
-best, best_d = None, 1e18
-for vf in variant_faces:
-vx, vy = (vf.bbox[0] + vf.bbox[2]) / 2, (vf.bbox[1] + vf.bbox[3]) / 2
-d = (ox - vx) ** 2 + (oy - vy) ** 2
-if d < best_d:
-best, best_d = vf, d
-return best
+def _bbox_center(bbox: Any) -> tuple[float, float]:
+return (bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2
+
+
+def _bbox_diag(bbox: Any) -> float:
+return float(((bbox[2] - bbox[0]) ** 2 + (bbox[3] - bbox[1]) ** 2) ** 0.5)
+
+
+def assign_faces_one_to_one(
+ref_centers: list[tuple[float, float]],
+var_centers: list[tuple[float, float]],
+ref_diags: list[float],
+max_frac: float = 0.6,
+) -> dict[int, int]:
+"""One-to-one nearest-center face assignment (pure; unit-tested without insightface).
+
+Per-face nearest matching collides on multi-face images -- two original faces can both
+pick the SAME variant face (e.g. when regeneration drops a face, so the variant has fewer
+detections), corrupting the identity metric (the lapvar/LPIPS metrics are immune: they are
+anchored to the ORIGINAL bbox on both images). This greedy-by-distance assignment is
+collision-free: it walks candidate pairs nearest-first and never reuses a ref or a variant
+face. Faces are spatially well-separated, so greedy equals the optimal (Hungarian) result
+here without the scipy dependency. A pair is dropped when the center distance exceeds
+``max_frac`` of the original face diagonal (no plausible match -- the face was lost).
+
+Returns a dict mapping ref-face index -> variant-face index for matched faces only.
+"""
+pairs: list[tuple[float, int, int]] = []
+for i, (rx, ry) in enumerate(ref_centers):
+for j, (vx, vy) in enumerate(var_centers):
+pairs.append((((rx - vx) ** 2 + (ry - vy) ** 2) ** 0.5, i, j))
+pairs.sort()
+used_ref: set[int] = set()
+used_var: set[int] = set()
+matched: dict[int, int] = {}
+for dist, i, j in pairs:
+if i in used_ref or j in used_var:
+continue
+if dist > max_frac * ref_diags[i]:
+continue
+matched[i] = j
+used_ref.add(i)
+used_var.add(j)
+return matched


 def _cosine(a: np.ndarray, b: np.ndarray) -> float:
@@ -325,15 +359,19 @@ def compare(original: str, variants: tuple[str, ...], ocr_langs: str, ground_tru
 app.prepare(ctx_id=-1, det_size=(640, 640))
 ref_faces = app.get(ref)
 if ref_faces:
+ref_centers = [_bbox_center(of.bbox) for of in ref_faces]
+ref_diags = [_bbox_diag(of.bbox) for of in ref_faces]
 for label, img in parsed:
 vfaces = app.get(img)
 st = face_stats[label]
-for of in ref_faces:
-vf = _match_face(of, vfaces)
-if vf is None:
-continue
+# One-to-one assignment for identity (collision-free); lapvar/LPIPS stay
+# anchored to the original bbox below, so they need no match.
+matched = assign_faces_one_to_one(ref_centers, [_bbox_center(vf.bbox) for vf in vfaces], ref_diags)
+for oi, of in enumerate(ref_faces):
 st.n_faces += 1
-st.identity.append(_cosine(of.normed_embedding, vf.normed_embedding))
+vf = vfaces[matched[oi]] if oi in matched else None
+if vf is not None:
+st.identity.append(_cosine(of.normed_embedding, vf.normed_embedding))
 oc, vc = _crop(ref, of.bbox), _crop(img, of.bbox)
 if oc.size == 0 or vc.size == 0:
 continue

@@ -762,7 +762,7 @@ def cmd_invisible(
 vendor = vendor_for_strength(source)
 console.print(f" Input: {source.name}")
 console.print(f" Pipeline: {pipeline}")
-console.print(f" Strength: {resolve_strength(strength, vendor)} Steps: {steps}")
+console.print(f" Strength: {resolve_strength(strength, vendor, pipeline)} Steps: {steps}")

 t0 = time.monotonic()
 result_path = engine.remove_watermark(
@@ -1075,7 +1075,7 @@ def cmd_all(
 # already lost its C2PA to the visible-removal pass, so reading it would
 # always resolve to the unknown-vendor default.
 vendor = vendor_for_strength(source)
-console.print(f" Strength: {resolve_strength(strength, vendor)} Steps: {steps}")
+console.print(f" Strength: {resolve_strength(strength, vendor, pipeline)} Steps: {steps}")
 inv_engine.remove_watermark(
 image_path=tmp_path,
 output_path=tmp_path,

@@ -18,9 +18,10 @@ DEFAULT_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
 # oracle floors (2026-06-20): OpenAI **0.10** (seed-robust -- clean on seeds 0-4) and
 # Google/Gemini **0.25** (seed 0 verified on 2 images; pin a seed in prod, the Gemini
 # oracle rate-limits volume seed-repeat). The Gemini floor (0.25) is HIGHER than the
-# certified controlnet Gemini floor (0.15), and ``resolve_strength`` is shared/
-# pipeline-independent, so pass an explicit ``--strength 0.25`` for Gemini content on
-# this pipeline until a Qwen-specific ladder is wired into ``resolve_strength``.
+# certified controlnet Gemini floor (0.15); ``resolve_strength(..., pipeline="qwen")``
+# now carries this via ``_QWEN_VENDOR_STRENGTH`` (below), so ``--pipeline qwen`` gets the
+# right floor automatically -- the old manual "pass --strength 0.25 for Gemini on qwen"
+# workaround is retired.
 # (Dispatch uses the bare "qwen" literal, matching the sdxl/controlnet sites, so there
 # is no QWEN_PROFILE constant -- only the model id is referenced from code.)
 QWEN_MODEL_ID = "Qwen/Qwen-Image"
@@ -90,6 +91,18 @@ DEFAULT_STRENGTH = UNKNOWN_STRENGTH
 # Detected-vendor -> default strength. Vendor strings come from `vendor_for_strength`.
 _VENDOR_STRENGTH = {"openai": OPENAI_STRENGTH, "google": GEMINI_STRENGTH}

+# Qwen has its OWN certified floors (Modal A100-80GB, 2026-06-20), DIFFERENT from the
+# SDXL ladder above: OpenAI 0.10 (seed-robust), Gemini 0.25 (HIGHER than controlnet's
+# 0.15 -- the 20B MMDiT perturbs less per denoising step, so it needs more strength to
+# clear Gemini SynthID). Unknown vendor tracks the higher (Gemini) value, safe-by-default.
+# `resolve_strength(..., pipeline="qwen")` uses this table so `--pipeline qwen` carries the
+# right floor automatically -- retiring the old manual "pass --strength 0.25 for Gemini on
+# qwen" workaround.
+QWEN_OPENAI_STRENGTH = 0.10
+QWEN_GEMINI_STRENGTH = 0.25
+QWEN_UNKNOWN_STRENGTH = 0.25
+_QWEN_VENDOR_STRENGTH = {"openai": QWEN_OPENAI_STRENGTH, "google": QWEN_GEMINI_STRENGTH}
+

 def strength_default_help() -> str:
 """One-line description of the vendor-adaptive default, derived from the constants.
@@ -103,20 +116,24 @@ def strength_default_help() -> str:
 )


-def resolve_strength(strength: float | None, vendor: str | None = None) -> float:
+def resolve_strength(strength: float | None, vendor: str | None = None, pipeline: str | None = None) -> float:
 """Resolve the denoising strength, applying the vendor default when unset.

 ``None`` means "the user did not pass ``--strength``", which resolves
 **vendor-adaptively**: ``vendor`` (``"openai"`` / ``"google"`` / None, from
-``vendor_for_strength``) selects ``OPENAI_STRENGTH`` / ``GEMINI_STRENGTH`` /
-``UNKNOWN_STRENGTH``. The same ladder applies to both pipelines (see the module
-comment for why one ladder is correct). An explicit value always wins (including
+``vendor_for_strength``) selects the per-vendor floor. The ``sdxl`` and ``controlnet``
+pipelines share ONE ladder (``OPENAI_STRENGTH`` / ``GEMINI_STRENGTH`` /
+``UNKNOWN_STRENGTH`` -- see the module comment for why); ``qwen`` has its OWN higher
+ladder (``_QWEN_VENDOR_STRENGTH``, Gemini 0.25 vs controlnet 0.15), selected when
+``pipeline`` normalizes to ``"qwen"``. An explicit value always wins (including
 ``0.0`` -- the check is ``is None``, not falsiness). Shared by the CLI (for display)
 and the engine (for execution) so the two never disagree -- both must pass the SAME
-``vendor``.
+``vendor`` and ``pipeline``.
 """
 if strength is not None:
 return strength
+if pipeline is not None and normalize_profile(pipeline) == "qwen":
+return _QWEN_VENDOR_STRENGTH.get(vendor or "", QWEN_UNKNOWN_STRENGTH)
 return _VENDOR_STRENGTH.get(vendor or "", UNKNOWN_STRENGTH)



@@ -322,6 +322,15 @@ _QWEN_PROMPT = "high quality, sharp, detailed, faithful to the original"
 _QWEN_NEGATIVE = "blurry, lowres, distorted text, garbled text, artifacts"


+def _qwen_target_size(width: int, height: int) -> tuple[int, int]:
+"""Floor (width, height) to a multiple of 16 for Qwen's VAE/patchifier (>= 16).
+
+Pure; unit-tested. Without explicit dims the img2img pipeline defaults to a 1024x1024
+SQUARE and silently distorts any non-square input.
+"""
+return max(16, (width // 16) * 16), max(16, (height // 16) * 16)
+
+
 def _build_qwen_kwargs(
 image: Image.Image, strength: float, num_inference_steps: int, true_cfg_scale: float, generator: Any
 ) -> dict[str, Any]:
@@ -329,7 +338,12 @@ def _build_qwen_kwargs(

 Qwen-Image uses ``true_cfg_scale`` (not SDXL's ``guidance_scale``) and takes an
 explicit ``negative_prompt``; the scrub still comes from the img2img ``strength``.
+Passes an explicit ``height``/``width`` derived from the input (floored to /16): the
+pipeline otherwise defaults to a 1024x1024 SQUARE, squishing any non-square input
+(the abba mixed-seam test: a 2816x1536 poster came back 1024x1024, distorting the
+scene and garbling text). So qwen regenerates at the input's own geometry.
 """
+qw, qh = _qwen_target_size(image.width, image.height)
 return {
 "prompt": _QWEN_PROMPT,
 "negative_prompt": _QWEN_NEGATIVE,
@@ -338,6 +352,8 @@ def _build_qwen_kwargs(
 "num_inference_steps": num_inference_steps,
 "true_cfg_scale": true_cfg_scale,
 "generator": generator,
+"height": qh,
+"width": qw,
 }


@@ -614,7 +630,7 @@ class WatermarkRemover:
 if output_path is None:
 output_path = image_path

-strength = resolve_strength(strength, vendor)
+strength = resolve_strength(strength, vendor, self.model_profile)

 if not 0.0 <= strength <= 1.0:
 raise ValueError(f"Strength must be between 0.0 and 1.0, got {strength}")

@@ -0,0 +1,76 @@
+"""Regression test for the one-to-one face matcher in ``scripts/fidelity_metrics.py``.
+
+The shipped per-face nearest matcher collided on multi-face images (two original faces
+both picking the same variant face when regeneration dropped a face), which inflated/
+corrupted the identity metric. ``assign_faces_one_to_one`` is the collision-free
+replacement. The function is pure (centers + diagonals in, index map out), so it is
+tested here without insightface / the heavy PEP723 env. Caught on the gemini_3 Qwen
+ControlNet experiment, where the original had 18 faces but the regenerated variants had
+17, producing two collisions under the old matcher.
+"""
+
+from __future__ import annotations
+
+import importlib.util
+import sys
+from pathlib import Path
+
+import pytest
+
+_SCRIPTS = Path(__file__).resolve().parent.parent / "scripts"
+
+
+def _load_assign():
+# fidelity_metrics is a standalone PEP723 script, not an installed module; load it by
+# path with scripts/ on sys.path so its `_plain_console` shim import resolves.
+sys.path.insert(0, str(_SCRIPTS))
+try:
+spec = importlib.util.spec_from_file_location("fidelity_metrics", _SCRIPTS / "fidelity_metrics.py")
+assert spec is not None
+assert spec.loader is not None
+mod = importlib.util.module_from_spec(spec)
+sys.modules[spec.name] = mod # @dataclass introspection needs the module registered
+spec.loader.exec_module(mod)
+except ImportError as exc: # cv2/click absent in a bare env -> skip, not fail
+pytest.skip(f"fidelity_metrics import deps missing: {exc}")
+finally:
+sys.path.remove(str(_SCRIPTS))
+return mod.assign_faces_one_to_one
+
+
+def test_distinct_faces_match_nearest() -> None:
+assign = _load_assign()
+ref = [(0.0, 0.0), (100.0, 100.0)]
+var = [(2.0, 1.0), (98.0, 102.0)]
+diags = [50.0, 50.0]
+assert assign(ref, var, diags) == {0: 0, 1: 1}
+
+
+def test_no_collision_when_variant_drops_a_face() -> None:
+# Two original faces near the SAME single variant face: the old nearest matcher mapped
+# BOTH to index 0; one-to-one must give the nearer ref the match and drop the other.
+assign = _load_assign()
+ref = [(10.0, 10.0), (14.0, 10.0)] # both close to the lone variant
+var = [(12.0, 10.0)]
+diags = [50.0, 50.0]
+matched = assign(ref, var, diags)
+assert sorted(matched.values()) == [0] # variant 0 used at most once
+assert len(matched) == 1
+
+
+def test_gate_drops_implausibly_far_match() -> None:
+assign = _load_assign()
+ref = [(0.0, 0.0)]
+var = [(1000.0, 1000.0)] # far beyond 0.6 * diag
+diags = [50.0]
+assert assign(ref, var, diags) == {}
+
+
+def test_assignment_is_one_to_one_over_many_faces() -> None:
+assign = _load_assign()
+ref = [(float(i * 100), 0.0) for i in range(18)]
+var = [(float(i * 100) + 3.0, 0.0) for i in range(17)] # one fewer, as in the experiment
+diags = [50.0] * 18
+matched = assign(ref, var, diags)
+assert len(matched) == 17
+assert len(set(matched.values())) == 17 # every variant used at most once
@@ -126,6 +126,14 @@ class TestModelProfiles:
 assert normalize_profile("CONTROLNET") == "controlnet"


+class _StubImage:
+"""Minimal PIL.Image stand-in: just the ``width``/``height`` the pure helper reads."""
+
+def __init__(self, width: int, height: int) -> None:
+self.width = width
+self.height = height
+
+
 class TestQwenKwargs:
 """_build_qwen_kwargs is pure (no torch); guards the Qwen-Image call shape.

@@ -137,18 +145,37 @@ class TestQwenKwargs:
 from remove_ai_watermarks.noai.watermark_remover import _build_qwen_kwargs

 gen = object()
-kwargs = _build_qwen_kwargs("IMG", strength=0.3, num_inference_steps=40, true_cfg_scale=4.0, generator=gen)
+img = _StubImage(2816, 1536)
+kwargs = _build_qwen_kwargs(img, strength=0.3, num_inference_steps=40, true_cfg_scale=4.0, generator=gen)
 # Qwen uses true_cfg_scale, NOT SDXL's guidance_scale.
 assert kwargs["true_cfg_scale"] == 4.0
 assert "guidance_scale" not in kwargs
 # The scrub still comes from strength; image + generator pass through.
 assert kwargs["strength"] == 0.3
-assert kwargs["image"] == "IMG"
+assert kwargs["image"] is img
 assert kwargs["generator"] is gen
 # Faithful-regeneration prompt + an explicit negative prompt.
 assert kwargs["prompt"]
 assert kwargs["negative_prompt"]

+def test_passes_explicit_aspect_preserving_size(self):
+# Without height/width the pipeline defaults to 1024x1024 and squishes non-square
+# input (the abba mixed-seam regression). Both already multiples of 16 -> unchanged.
+from remove_ai_watermarks.noai.watermark_remover import _build_qwen_kwargs
+
+kwargs = _build_qwen_kwargs(
+_StubImage(2816, 1536), strength=0.25, num_inference_steps=40, true_cfg_scale=4.0, generator=None
+)
+assert kwargs["width"] == 2816
+assert kwargs["height"] == 1536
+
+def test_qwen_target_size_floors_to_multiple_of_16(self):
+from remove_ai_watermarks.noai.watermark_remover import _qwen_target_size
+
+assert _qwen_target_size(2816, 1536) == (2816, 1536) # already /16
+assert _qwen_target_size(1122, 1402) == (1120, 1392) # floored
+assert _qwen_target_size(10, 10) == (16, 16) # min clamp, never 0
+
 def test_qwen_model_id_is_qwen_image(self):
 from remove_ai_watermarks.noai.watermark_profiles import QWEN_MODEL_ID

@@ -159,15 +186,33 @@ class TestResolveStrength:
 """resolve_strength applies the vendor default only when strength is unset."""

 def test_none_is_vendor_adaptive(self):
-# No vendor -> unknown default; OpenAI lower, Google == unknown. The SAME ladder
-# applies to both pipelines (the certified controlnet floors), so there is no
-# pipeline argument.
+# No vendor -> unknown default; OpenAI lower, Google == unknown. The sdxl/controlnet
+# pipelines share this ladder (the certified controlnet floors); qwen has its own
+# (see test_qwen_pipeline_uses_its_own_higher_ladder).
 assert resolve_strength(None) == UNKNOWN_STRENGTH
 assert resolve_strength(None, "openai") == OPENAI_STRENGTH
 assert resolve_strength(None, "google") == GEMINI_STRENGTH
 assert resolve_strength(None, None) == UNKNOWN_STRENGTH
 # An unrecognized vendor string falls through to the unknown default.
 assert resolve_strength(None, "adobe") == UNKNOWN_STRENGTH
+# sdxl/controlnet pipelines (and the "default" alias) use the same shared ladder.
+assert resolve_strength(None, "google", "controlnet") == GEMINI_STRENGTH
+assert resolve_strength(None, "google", "sdxl") == GEMINI_STRENGTH
+
+def test_qwen_pipeline_uses_its_own_higher_ladder(self):
+# Qwen's certified Gemini floor (0.25) is HIGHER than controlnet's (0.15); OpenAI
+# matches (0.10). Unknown vendor on qwen tracks the higher Gemini value. This retires
+# the old manual "pass --strength 0.25 for Gemini on qwen" workaround.
+from remove_ai_watermarks.noai.watermark_profiles import QWEN_GEMINI_STRENGTH, QWEN_OPENAI_STRENGTH
+
+assert QWEN_GEMINI_STRENGTH == 0.25
+assert QWEN_OPENAI_STRENGTH == 0.10
+assert resolve_strength(None, "google", "qwen") == QWEN_GEMINI_STRENGTH
+assert resolve_strength(None, "openai", "qwen") == QWEN_OPENAI_STRENGTH
+assert resolve_strength(None, None, "qwen") == QWEN_GEMINI_STRENGTH # unknown -> higher floor
+assert resolve_strength(None, "google", "qwen") > resolve_strength(None, "google", "controlnet")
+# An explicit strength still wins on qwen.
+assert resolve_strength(0.12, "google", "qwen") == 0.12

 def test_ladder_is_the_certified_controlnet_floors(self):
 # The unified ladder == the oracle-certified controlnet floors. Lowered on the