feat(invisible): add Qwen-Image img2img pipeline (--pipeline qwen)

A third diffusion pipeline alongside sdxl/controlnet: Qwen-Image (20B MMDiT, Apache-2.0 code AND weights) img2img. The scrub still comes from the img2img strength; Qwen preserves text (incl. CJK) and structure markedly better than SDXL at the scrub floor, so it over-regenerates real photos far less (directly targets the controlnet over-regeneration that degrades real uploads).

- watermark_profiles: QWEN_MODEL_ID, normalize_profile accepts "qwen".
- WatermarkRemover: _load_qwen_pipeline (bf16, loads Qwen base unless --model overridden, clear ImportError if diffusers lacks the class), _run_qwen (no MPS fallback -- 20B is CUDA/cloud-class), dispatch in _generate_one/preload, pure _build_qwen_kwargs (true_cfg_scale, not guidance_scale).
- Shared _base_load_kwargs() across all three loaders (dtype + token).
- CLI --pipeline gains "qwen"; invisible_engine threads it through.
- scripts/qwen_scrub_prototype.py: standalone PEP 723 GPU experiment.

Prototype oracle floors (Modal A100-80GB, single seed, controls SynthID-positive, PENDING seed-repeat cert): OpenAI clears at strength ~0.10, Gemini at ~0.30 (0.20 still detected), with CJK text + faces faithful where controlnet plasticizes. The Gemini floor is higher than the shared default ladder, so pass an explicit --strength for Gemini on this pipeline until a Qwen-specific ladder is certified.

The model-running path is CUDA-only (untestable locally); unit tests cover the pure call-shape (_build_qwen_kwargs) and profile normalization without torch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-05 07:57:50 +02:00 · 2026-06-19 20:44:36 -07:00
parent 0c0c6c6b03
commit 76e3d4154c
10 changed files with 309 additions and 24 deletions
@@ -18,7 +18,7 @@ Consequences for contributors (do not drift back into the stock niche just becau
 ## How to run

 - `uv run remove-ai-watermarks all <image.png> -o <output.png>` — full pipeline (visible + invisible + metadata). Same diffusion knobs as `invisible` below, plus the visible-pass `--inpaint/--no-inpaint`/`--inpaint-method`. **When the `[gpu]` extra is absent, step 2 (invisible/SynthID) is skipped** — `all` still writes an output (visible mark + metadata stripped) but prints a prominent end-of-run banner ("the invisible (SynthID) watermark was NOT removed") AND exits **non-zero** (1), so a skipped SynthID pass is not mistaken for a clean result (the recurring #14/#47 trap, where the old quiet inline warning was missed). `invisible` already hard-errors without the extra; only `all` continued, hence the loud end-banner. Regression-guarded by `tests/test_cli.py::TestAllCommand::test_all_loud_warning_and_nonzero_exit_when_gpu_missing`. **Test trap:** any `all` test that exercises the full pipeline MUST `patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True)` — CI installs core+dev only (no `[gpu]`), so an unpatched `all` test takes the skip branch and now hits the non-zero exit. This passed locally (gpu present → `is_available()` True) but red-failed every matrix cell on the v0.11.0 commit (`test_all_basic`/`test_all_visible_step_uses_registry` asserted exit 0); both now patch `is_available` True.
- `uv run remove-ai-watermarks invisible <image.png> -o <out.png>` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet` (default `controlnet`), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default).
+- `uv run remove-ai-watermarks invisible <image.png> -o <out.png>` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet|qwen` (default `controlnet`), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default).
 - `uv run remove-ai-watermarks visible <image.png> -o <out.png>` — known-visible-mark removal, CPU, no GPU. Reverse-alpha based: each mark is removed by inverting its captured alpha map. `--mark auto` (default) picks the strongest detected of the Gemini sparkle, the Doubao "豆包AI生成" text strip, the Jimeng "★ 即梦AI" wordmark, and the Samsung Galaxy AI "✦ Contenuti generati dall'AI" strip (bottom-LEFT, locale-specific — Italian variant calibrated); `--mark gemini` / `--mark doubao` / `--mark jimeng` / `--mark samsung` force one (choices come from the registry). Gemini/Doubao recover pixels exactly with no inpaint at native; **Jimeng and Samsung add an always-on thin residual inpaint over the glyph footprint** (their marks re-rasterize per image, so reverse-alpha alone leaves a faint outline). For arbitrary logos/objects use `erase`. **When `--mark auto` finds no known mark (the common case — ~74% of real uploads carry no registered visible mark), the command does NOT silently re-serve the input as a finished result.** It runs a cheap metadata-only `identify`, prints actionable guidance (if the image carries an invisible/metadata mark, e.g. an OpenAI/Gemini C2PA image, it points to `all`; otherwise it does NOT imply the image is clean -- it warns that an invisible pixel watermark like SynthID cannot be detected once the metadata proxy is gone and routes to both `all` and `erase --region`), writes NO output file, and exits **`EXIT_NO_VISIBLE_MARK` (2)** — distinct from success (0) and a hard error (1) so a wrapping service (raiw.cc) can surface the message instead of treating the unchanged image as done (the production "it didn't work" / score-0 trap). Same handling for an explicit `--mark <name>` that is not detected. Helper `cli._no_visible_mark_exit`; regression-guarded by `tests/test_cli.py::TestVisibleCommand::test_visible_auto_no_mark_exits_two_with_eraser_hint` and `test_visible_auto_no_mark_routes_to_all_when_metadata`. `--no-detect` still forces the gemini fallback and proceeds (exit 0).
 - `uv run remove-ai-watermarks erase <image.png> --region x,y,w,h -o <out.png>` — universal region eraser (any logo/object, any position). `--backend cv2` (default, no deps) or `--backend lama` (big-LaMa via onnxruntime, extra `lama`); `--region` is repeatable.
 - `uv run remove-ai-watermarks identify <image>` — provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector
@@ -61,7 +61,7 @@ Compact map. The full per-module detail (design decisions, tuned thresholds, cal
 - `region_eraser.py` — universal region eraser (`erase` CLI): cv2 backend default (no deps), optional big-LaMa via onnxruntime (~3.5-4 GB peak RAM, ~5-6 s/call CPU — does not fit a minimal droplet).
 - `invisible_watermark.py` — decodes the OPEN DWT-DCT watermarks (SD / SDXL / FLUX) via `imwatermark` (extra `detect`, pulls torch). Fragile two ways: (1) does not survive JPEG re-encode/resize; (2) **carrier-fragile on a broad class of pristine images** -- a clean encode->decode round-trip recovers 48/48 on chatgpt/firefly/random but FAILS (28-39/48, below the `_MATCH_48`=44 gate) on the FLUX fox, doubao, a flat FLUX generation, AND a clean synthetic flat fill with no watermark. The failure does NOT track texture; it goes with a degenerate **all-ones decode that is a CARRIER ARTIFACT, not a watermark** (synthetic clean image reproduces it). So `detect_invisible_watermark` is **positive-only**: trust a hit; a `None` is inconclusive unless a same-carrier positive-control embed first recovers >=44. Verified 2026-06-19; full caveat in `docs/watermarking-landscape.md`.
 - `trustmark_detector.py` — Adobe TrustMark open decoder (extra `trustmark`). Do NOT remove the JPEG re-encode false-positive gate — a lone TrustMark hit without it is almost always content noise.
- `noai/watermark_remover.py` — `WatermarkRemover` with two diffusion pipelines selected by the explicit `pipeline` ctor arg, never inferred from `model_id`: `sdxl` (plain SDXL img2img) and `controlnet` (SDXL + canny ControlNet, **the DEFAULT since 2026-06-09**). Removal comes from the img2img `strength`; ControlNet only preserves text/face STRUCTURE — SynthID CAN survive controlnet on photoreal content at low strength. No face-restore extra ships, by validated decision (every restore approach looked MORE AI-generated).
+- `noai/watermark_remover.py` — `WatermarkRemover` with three diffusion pipelines selected by the explicit `pipeline` ctor arg, never inferred from `model_id`: `sdxl` (plain SDXL img2img), `controlnet` (SDXL + canny ControlNet, **the DEFAULT since 2026-06-09**), and `qwen` (Qwen-Image 20B MMDiT img2img, Apache-2.0, CUDA/cloud-class — best text/structure preservation at the scrub floor; `_load_qwen_pipeline`/`_run_qwen`, bf16, no MPS fallback; call shape in the pure `_build_qwen_kwargs` using `true_cfg_scale`). Removal comes from the img2img `strength`; ControlNet only preserves text/face STRUCTURE — SynthID CAN survive controlnet on photoreal content at low strength. Qwen prototype oracle floors (single-seed, pending seed-repeat cert): OpenAI ~0.10, Gemini ~0.30 (higher than the controlnet Gemini floor — pass explicit `--strength` for Gemini on `qwen` until certified). No face-restore extra ships, by validated decision (every restore approach looked MORE AI-generated).
 - `noai/tiling.py` — sliding-window tiled diffusion for large inputs (CLI `--tile`). `WatermarkRemover.remove_watermark` branches to `run_tiled` when `tile` is set AND the long side exceeds `tile_size`, refactoring the single-pass `_generate` into a per-tile `_generate_one` (the ControlNet edge map is rebuilt per tile inside it). Pure helpers `plan_tiles` (uniform-size tiles, last one flush to the edge) and `feather_weights` (strictly-positive separable taper -> partition-of-unity blend) are unit-tested without the model. New tile-blend tuning goes in those pure helpers; do not inline blend math into the runner.
 - `auto_config.py` + the content-detection layer were REMOVED 2026-06-09; `--auto` is a deprecated no-op (controlnet is the default pipeline and the adaptive polish is ON by default and self-gates to a no-op where there is no detail deficit).
 - `upscaler.py` — optional Real-ESRGAN pre-diffusion super-resolution for small inputs (extra `esrgan`, spandrel only). Manual opt-in; the default `--upscaler` stays `lanczos` and the engine always falls back to Lanczos on absence/error. ESRGAN can degrade faces and thin text.
@@ -33,7 +33,7 @@ It does **not** target watermarks that protect someone else's paid or copyrighte
 - **AI metadata stripping** — EXIF, PNG text chunks, C2PA provenance manifests (PNG / JPEG / AVIF / HEIF / JPEG-XL, **MP4 / MOV / M4V / M4A** at the container level, and **WebM / MP3 / WAV / FLAC / OGG** losslessly via ffmpeg), XMP DigitalSourceType
 - **"Made with AI" label removal** — removes the AI-disclosure metadata that platforms read to apply automatic labels (useful for clearing a false-positive label from a human-edited photograph)
 - **Analog Humanizer** — optional film grain and chromatic aberration post-processing
- **Text and face preservation (default)** — the default pipeline is a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Use `--pipeline sdxl` for plain SDXL img2img (lighter, no extra model download) on inputs without text or faces. Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness). The library does not ship a face-restore extra: every approach evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned) regenerated the face via SDXL and made the output look more AI-generated than the cleaned image. The cleaned controlnet output is the least-AI face state achievable without re-introducing SynthID.
+- **Text and face preservation (default)** — the default pipeline is a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Use `--pipeline sdxl` for plain SDXL img2img (lighter, no extra model download) on inputs without text or faces. An experimental `--pipeline qwen` runs Qwen-Image (20B, Apache-2.0) img2img, which preserves text (including CJK) and structure better still at the scrub floor; it is CUDA/cloud-class (does not fit MPS), and its strength floors are not yet certified (pass an explicit `--strength`, especially for Gemini content). Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness). The library does not ship a face-restore extra: every approach evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned) regenerated the face via SDXL and made the output look more AI-generated than the cleaned image. The cleaned controlnet output is the least-AI face state achievable without re-introducing SynthID.
 - **Batch processing** — process entire directories
 - **Detection** — three-stage NCC watermark detection with confidence scoring
 - **Provenance detection (`identify`)** — aggregate C2PA issuer, the C2PA soft-binding forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, embedded SD/ComfyUI params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the China TC260 AIGC label (XMP, PNG chunk, EXIF, or JPEG segment), the HuggingFace `hf-job-id` job marker, the SynthID metadata proxy, the C2PA cloud-manifest reference (Adobe Durable Content Credentials, when the embedded manifest is stripped), the visible marks (Gemini sparkle plus the Doubao "豆包AI生成" / Jimeng "即梦AI" / Samsung Galaxy AI "Contenuti generati dall'AI" text marks), the open SD/SDXL/FLUX invisible watermark, and (with the `trustmark` extra) the open Adobe TrustMark watermark into one origin-platform + watermark-inventory verdict (`--json` for machine output)
@@ -131,3 +131,11 @@ See `docs/synthid.md` §5.5 + `docs/controlnet-removal-pipeline-research.md` (ce
 `controlnet_conditioning_scale` (CLI `--controlnet-scale`, default 1.0) is the structure-preservation knob (higher = closer to the original structure); fp32 on cpu/mps, fp16-fixed VAE on cuda/xpu. The `controlnet` profile is threaded explicitly (`WatermarkRemover(pipeline=...)` / `InvisibleEngine(pipeline=...)`), NOT inferred from `model_id`. This productionizes the `scripts/controlnet_sweep.py` prototype; see `docs/controlnet-removal-pipeline-research.md`.

 **Forensic-stealth caveat still applies** (arXiv:2605.09203): defeating the SynthID verifier is not forensic invisibility -- a "this image went through a removal pipeline" classifier can still flag the output.
+
+## `qwen` pipeline (experimental, Qwen-Image 20B, uncertified floors)
+
+`--pipeline qwen` runs `QwenImageImg2ImgPipeline` on `Qwen/Qwen-Image` (20B MMDiT, Apache-2.0 code AND weights), as an img2img alternative to the SDXL pipelines. Motivation: the controlnet over-regeneration problem above (it plasticizes real photos / loses fine text at the scrub floor). Qwen-Image renders text natively (incl. CJK) and preserves structure markedly better, so at the strength that removes SynthID it damages real content far less.
+
+The scrub still comes from the img2img `strength` (same lever as SDXL); the call shape lives in the pure `_build_qwen_kwargs` (uses Qwen's `true_cfg_scale`, not SDXL's `guidance_scale` — the CLI `--guidance-scale` maps onto it, and ~4.0 is typical vs the SDXL default 7.5). bf16 on CUDA. It is **CUDA/cloud-class — the 20B does not fit MPS — so `_run_qwen` has NO MPS→CPU fallback** (unlike the SDXL paths). Cost on Modal A100-80GB is ~$0.05-0.10/image vs SDXL.
+
+**Prototype oracle floors (Modal A100-80GB, single seed, 2026-06-19 — PENDING seed-repeat cert):** on native-resolution OpenAI and Gemini cert inputs (both controls SynthID-POSITIVE), OpenAI cleared at strength **0.10** and Gemini at **0.30** (0.20 still detected). At those floors CJK text and faces stayed faithful (the zoom comparison showed none of the controlnet-style plasticization). Two caveats before relying on it: (1) near-floor scrub is SEED-NON-DETERMINISTIC (the general known-limitation above), so these single-seed floors are NOT certified — run a seed-repeat sweep before trusting them; (2) `resolve_strength` is shared and pipeline-independent, so the Gemini default (0.15, the certified controlnet floor) UNDER-scrubs Gemini on `qwen` (whose floor is ~0.30) — **pass an explicit `--strength` for Gemini content on `qwen`** until a Qwen-specific ladder is certified. Flat-graphic content was not in the prototype sample.
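Reviewer sketch (not part of the commit): caveat (2) above can be made concrete with a small lookup. The names `FLOORS`, `pick_strength` and `under_scrubs` are hypothetical and do not exist in the package; the numbers are the ones quoted in this diff, and the `qwen` entries are the uncertified single-seed prototype values.

```python
from __future__ import annotations

# Floors quoted in the text above. The qwen entries are single-seed prototype
# numbers (NOT certified); the controlnet entries are the floors the docs cite.
FLOORS: dict[tuple[str, str], float] = {
    ("controlnet", "openai"): 0.10,
    ("controlnet", "gemini"): 0.15,
    ("qwen", "openai"): 0.10,
    ("qwen", "gemini"): 0.30,
}


def pick_strength(pipeline: str, vendor: str, explicit: float | None = None) -> float:
    """Return the explicit strength if given, else the quoted floor (hypothetical helper)."""
    if explicit is not None:
        return explicit
    return FLOORS[(pipeline, vendor)]


def under_scrubs(pipeline: str, vendor: str, shared_default: float) -> bool:
    """True when a pipeline-independent default sits below the quoted floor."""
    return shared_default < FLOORS[(pipeline, vendor)]
```

Under these assumptions the shared Gemini default of 0.15 is flagged for `qwen` but not for `controlnet`, which is the reason the text asks for an explicit `--strength`.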
@@ -177,10 +177,12 @@ Root cause: bad alpha (under-estimated, max ~0.65) + fixed-no-inpaint + tight bo

 ## `noai/watermark_remover.py`

-`noai/watermark_remover.py` — the `WatermarkRemover` class has two diffusion pipelines, selected by the explicit `pipeline` ctor arg (NOT inferred from `model_id` -- both use the same SDXL base, `DEFAULT_MODEL_ID`).
+`noai/watermark_remover.py` — the `WatermarkRemover` class has three diffusion pipelines, selected by the explicit `pipeline` ctor arg (NOT inferred from `model_id`). `sdxl`/`controlnet` share the SDXL base (`DEFAULT_MODEL_ID`); `qwen` is its own base (`QWEN_MODEL_ID`).

 **`sdxl`** (renamed from `default` 2026-06-09; `default` kept as a back-compat alias via `normalize_profile`) runs plain SDXL img2img (`_run_img2img`); it is the lighter opt-down alternative (no ControlNet weights).

+**`qwen`** (`_run_qwen`, `_load_qwen_pipeline`) runs `QwenImageImg2ImgPipeline` on `Qwen/Qwen-Image` (20B MMDiT, Apache-2.0 code AND weights). The scrub still comes from the img2img `strength`; Qwen's value is that it preserves text (incl. CJK) and structure markedly better than SDXL at the scrub floor, so it over-regenerates real photos far less (directly targets the controlnet over-regeneration problem). Specifics: bf16 on CUDA (fp16 risks overflow on the 20B MMDiT — see the dtype branch in `__init__`); loads `QWEN_MODEL_ID` unless `--model` is overridden; the call shape lives in the pure module helper `_build_qwen_kwargs` (unit-tested without torch in `tests/test_platform.py::TestQwenKwargs`), which uses Qwen's `true_cfg_scale` (NOT SDXL's `guidance_scale` — the CLI `--guidance-scale` maps onto it; ~4.0 is typical, the SDXL default 7.5 is high for Qwen) and an explicit `negative_prompt` (`_QWEN_PROMPT`/`_QWEN_NEGATIVE`). It is CUDA/cloud-class (the 20B does not fit MPS), so `_run_qwen` has NO MPS->CPU fallback — an error propagates. `_load_qwen_pipeline` raises a clear ImportError if the installed diffusers lacks `QwenImageImg2ImgPipeline`. **Prototype oracle floors (Modal A100-80GB, single seed, 2026-06-19, PENDING seed-repeat cert): OpenAI clears at strength ~0.10, Gemini at ~0.30 (0.20 still detected) — both controls were SynthID-positive; at those floors CJK text + faces stay faithful where controlnet plasticizes. The Gemini floor (0.30) is HIGHER than the certified controlnet Gemini floor (0.15), and `resolve_strength` is shared/pipeline-independent, so pass an explicit `--strength` for Gemini content on `qwen` until a Qwen-specific ladder is certified.**
+
 **`controlnet`** (**the DEFAULT pipeline since 2026-06-09** for `invisible`/`all`/`batch` and both engine ctors; `_run_controlnet`, `_load_controlnet_pipeline`) runs `StableDiffusionXLControlNetImg2ImgPipeline` with the SDXL-native canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` (`watermark_profiles.CONTROLNET_CANNY_MODEL`): the control image is `cv2.Canny(gray, 100, 200)` stacked to 3 channels (`_CANNY_LOW`/`_CANNY_HIGH`, prompt `_CONTROLNET_PROMPT` / `_CONTROLNET_NEGATIVE`).

 **Removal comes from the img2img regeneration (`strength`); the ControlNet only PRESERVES text and face STRUCTURE via the edge map.**
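Reviewer sketch (not part of the commit): the `qwen` paragraph in this hunk says `_load_qwen_pipeline` raises a clear ImportError when the installed diffusers lacks `QwenImageImg2ImgPipeline`. One way such a guard could look is shown below. `load_pipeline_class` is a hypothetical helper, not the package's code, and the error wording is illustrative.

```python
import importlib


def load_pipeline_class(module_name: str = "diffusers", class_name: str = "QwenImageImg2ImgPipeline"):
    """Resolve a pipeline class by name, failing with an actionable ImportError."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(f"{module_name} is not installed; it is required for this pipeline") from exc
    cls = getattr(module, class_name, None)
    if cls is None:
        # The module imported fine but is too old (or too new) to expose the class.
        raise ImportError(
            f"the installed {module_name} does not provide {class_name}; "
            "upgrade it before selecting this pipeline"
        )
    return cls
```

Resolving the class with `getattr` rather than a bare `from diffusers import ...` keeps the two failure modes (package missing, class missing) distinguishable in the message.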
@@ -0,0 +1,128 @@
+# /// script
+# requires-python = ">=3.10"
+# dependencies = [
+#   "diffusers>=0.35.0",
+#   "transformers>=4.51.0",
+#   "torch",
+#   "accelerate",
+#   "pillow",
+#   "click",
+# ]
+# ///
+"""Isolated GPU prototype: does a low-strength Qwen-Image img2img pass scrub the
+invisible watermark while keeping text/structure legible?
+
+This is the oracle-gated experiment behind Library roadmap P1#5 (migrate the
+invisible pipeline onto Qwen-Image-Edit). It is DELIBERATELY standalone:
+
+  * It is NOT imported by the package and NOT in ``uv.lock``. Qwen-Image needs a
+    newer ``diffusers``/``transformers`` (Qwen2.5-VL text encoder) than the SDXL
+    pipeline is pinned to, so wiring it into the locked env would risk the
+    certified SDXL/ControlNet pipeline (the ``cannot import Qwen3VL...`` trap).
+    PEP 723 inline metadata lets ``uv run`` build a throwaway env for it instead.
+  * Qwen-Image is ~20B, so it needs a real GPU (CUDA) -- it will not fit on MPS.
+
+Run (on a GPU box / Modal), then eyeball the outputs AND submit them to the
+matching oracle (openai.com/verify for OpenAI, the Gemini app for Google):
+
+    uv run scripts/qwen_scrub_prototype.py INPUT.png -o out/ --strengths 0.1,0.2,0.3,0.4
+
+What to look for:
+  * SCRUB: the oracle no longer reports the watermark at some strength.
+  * FIDELITY: text stays legible and faces/structure stay faithful at that same
+    strength -- the whole point of trying Qwen over SDXL (which garbles text).
+The smallest strength that clears the oracle while keeping fidelity is the result
+to compare against the SDXL/ControlNet floors (OpenAI 0.10 / Google 0.15).
+"""
+
+from __future__ import annotations
+
+import logging
+from pathlib import Path
+
+import click
+
+log = logging.getLogger("qwen_proto")
+
+# A neutral, faithful-regeneration prompt (we want to scrub, not restyle); mirrors
+# the intent of the SDXL controlnet prompt. Qwen renders text natively, so a light
+# pass should keep captions legible where SDXL would garble them.
+_PROMPT = "high quality, sharp, detailed, faithful to the original"
+_NEGATIVE = "blurry, lowres, distorted text, garbled text, artifacts"
+
+
+def _pick_device(requested: str) -> tuple[str, object]:
+    import torch
+
+    if requested != "auto":
+        device = requested
+    elif torch.cuda.is_available():
+        device = "cuda"
+    elif getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
+        device = "mps"
+    else:
+        device = "cpu"
+    # bf16 on CUDA (Qwen's reference dtype); fp32 elsewhere for numerical safety.
+    dtype = torch.bfloat16 if device == "cuda" else torch.float32
+    return device, dtype
+
+
+@click.command()
+@click.argument("source", type=click.Path(exists=True, path_type=Path))
+@click.option("-o", "--output-dir", type=click.Path(path_type=Path), default=Path("qwen_out"))
+@click.option("--strengths", default="0.1,0.2,0.3,0.4", help="Comma-separated img2img strengths to sweep.")
+@click.option("--steps", type=int, default=40, help="Inference steps.")
+@click.option("--cfg", type=float, default=4.0, help="true_cfg_scale (Qwen's CFG; reference default 4.0).")
+@click.option("--model", default="Qwen/Qwen-Image", help="HF model id (Qwen-Image img2img base).")
+@click.option("--device", default="auto", type=click.Choice(["auto", "cuda", "mps", "cpu"]))
+@click.option("--seed", type=int, default=0, help="Reproducible seed.")
+def main(
+    source: Path,
+    output_dir: Path,
+    strengths: str,
+    steps: int,
+    cfg: float,
+    model: str,
+    device: str,
+    seed: int,
+) -> None:
+    """Sweep Qwen-Image img2img strength over SOURCE and save one output per strength."""
+    logging.basicConfig(level=logging.INFO, format="%(message)s")
+    import torch
+    from diffusers import QwenImageImg2ImgPipeline
+    from PIL import Image
+
+    dev, dtype = _pick_device(device)
+    log.info("Loading %s on %s (%s)...", model, dev, dtype)
+    pipe = QwenImageImg2ImgPipeline.from_pretrained(model, torch_dtype=dtype)
+    pipe = pipe.to(dev)
+
+    init_image = Image.open(source).convert("RGB")
+    output_dir.mkdir(parents=True, exist_ok=True)
+    values = [float(s) for s in strengths.split(",") if s.strip()]
+
+    for strength in values:
+        generator = torch.Generator(device="cpu").manual_seed(seed)
+        log.info("Generating strength=%.2f ...", strength)
+        result = pipe(
+            prompt=_PROMPT,
+            negative_prompt=_NEGATIVE,
+            image=init_image,
+            strength=strength,
+            num_inference_steps=steps,
+            true_cfg_scale=cfg,
+            generator=generator,
+        )
+        out_path = output_dir / f"{source.stem}_qwen_s{strength:.2f}.png"
+        result.images[0].save(out_path)
+        log.info("  saved %s", out_path)
+
+    log.info(
+        "\nDone. Eyeball text/face fidelity, then submit each output to the matching oracle "
+        "(openai.com/verify / Gemini app). The smallest strength that clears the oracle while "
+        "keeping fidelity is the number to compare against the SDXL floors (OpenAI 0.10 / Google 0.15)."
+    )
+
+
+if __name__ == "__main__":
+    main()
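Reviewer sketch (not part of the commit): the script above sweeps strengths at a single seed, while the docs ask for a seed-repeat sweep before the floors are trusted. A minimal planning and scoring helper, assuming the oracle verdicts are entered by hand, might look like this. `sweep_plan`, `seed_stable_floor` and the `_seed{N}` filename suffix are hypothetical.

```python
from __future__ import annotations

from itertools import product


def sweep_plan(stem: str, strengths: list[float], seeds: list[int]) -> list[tuple[float, int, str]]:
    """One (strength, seed, output filename) entry per run of the repeat sweep."""
    return [
        (strength, seed, f"{stem}_qwen_s{strength:.2f}_seed{seed}.png")
        for strength, seed in product(strengths, seeds)
    ]


def seed_stable_floor(detected: dict[tuple[float, int], bool]) -> float | None:
    """Smallest strength such that it and every higher tested strength cleared on ALL seeds.

    ``detected[(strength, seed)]`` is True when the oracle still reports the watermark.
    Returns None when even the highest tested strength was detected on some seed.
    """
    by_strength: dict[float, list[bool]] = {}
    for (strength, _seed), hit in detected.items():
        by_strength.setdefault(strength, []).append(hit)
    floor: float | None = None
    for strength in sorted(by_strength, reverse=True):
        if any(by_strength[strength]):
            break  # a detection here invalidates this and all lower strengths
        floor = strength
    return floor
```

Walking from the highest strength downward is a deliberate choice: because near-floor scrub is seed-non-deterministic, a lower strength that happens to clear while a higher one fails is not treated as the floor.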
@@ -253,15 +253,16 @@ def _normalize_pipeline(ctx: click.Context, param: click.Parameter, value: str |
    return normalized


-# ``controlnet`` (the default-SELECTED value) and ``sdxl`` (plain SDXL img2img) are the
-# two current profiles; ``default`` is an OUTDATED back-compat alias for ``sdxl``
-# (warned + normalized away by _normalize_pipeline).
-_PIPELINE_CHOICES = ["sdxl", "controlnet", "default"]
+# ``controlnet`` (the default-SELECTED value), ``sdxl`` (plain SDXL img2img) and
+# ``qwen`` (Qwen-Image, CUDA/cloud-class) are the current profiles; ``default`` is an
+# OUTDATED back-compat alias for ``sdxl`` (warned + normalized away by _normalize_pipeline).
+_PIPELINE_CHOICES = ["sdxl", "controlnet", "qwen", "default"]
 _PIPELINE_HELP = (
    "Pipeline profile. controlnet (DEFAULT) = SDXL + canny ControlNet that preserves "
    "text/faces via edge conditioning while removing SynthID; sdxl = plain SDXL img2img "
-    "(lighter, no extra model download, but leaves SynthID on flat-graphic content). "
-    "('default' is an OUTDATED alias for 'sdxl' -- use sdxl or controlnet.)"
+    "(lighter, no extra model download, but leaves SynthID on flat-graphic content); "
+    "qwen = Qwen-Image (20B, Apache-2.0) img2img, best text/structure preservation but "
+    "CUDA/cloud-class (does not fit MPS). ('default' is an OUTDATED alias for 'sdxl'.)"
 )

 # Shared --pipeline / --strength decorators so the three diffusion commands
@@ -103,8 +103,9 @@ class InvisibleEngine:
            device: Device for inference (auto/cpu/mps/cuda/xpu). None = auto.
            pipeline: Pipeline profile. "controlnet" (DEFAULT; SDXL + canny ControlNet
                that preserves text/face structure via edge conditioning while removing
-                SynthID) or "sdxl" (plain SDXL img2img, lighter but leaves SynthID on
-                flat-graphic content). "default" is a back-compat alias for "sdxl".
+                SynthID), "sdxl" (plain SDXL img2img, lighter but leaves SynthID on
+                flat-graphic content), or "qwen" (Qwen-Image 20B img2img, best text/
+                structure preservation but CUDA/cloud-class). "default" aliases "sdxl".
            hf_token: HuggingFace API token.
            progress_callback: Optional callback for progress messages.
            controlnet_conditioning_scale: ControlNet structure-preservation
@@ -12,6 +12,17 @@ if TYPE_CHECKING:

 DEFAULT_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"

+# Qwen-Image (20B MMDiT, Apache-2.0 code AND weights) base for the ``qwen`` pipeline:
+# an img2img alternative to SDXL with native text rendering (incl. CJK). Loaded only
+# when ``--pipeline qwen`` is selected; CUDA/cloud-class (does not fit MPS). Prototype
+# oracle floors (single-seed, 2026-06-19, pending seed-repeat cert): OpenAI clears at
+# strength ~0.10, Google/Gemini at ~0.30 (0.20 still detected) -- the latter is HIGHER
+# than the certified controlnet Google floor (0.15), so pass an explicit ``--strength``
+# for Gemini content on this pipeline until a Qwen-specific ladder is certified.
+# (Dispatch uses the bare "qwen" literal, matching the sdxl/controlnet sites, so there
+# is no QWEN_PROFILE constant -- only the model id is referenced from code.)
+QWEN_MODEL_ID = "Qwen/Qwen-Image"
+
 # Canonical pipeline-profile names + the back-compat alias. The plain SDXL img2img
 # profile is ``sdxl``; ``default`` is kept as an accepted alias (it was the profile's
 # name before ``controlnet`` became the default-selected pipeline, 2026-06-09).
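Reviewer sketch (not part of the commit): the commit message says `normalize_profile` now accepts "qwen", and the comment above says `default` remains an alias for `sdxl`. The real function body is not shown in this diff, so the following is only a guess at the behaviour; `normalize_profile_sketch` is a hypothetical name, and the strip/lowercase step is an assumption.

```python
_CANONICAL = ("sdxl", "controlnet", "qwen")
_ALIASES = {"default": "sdxl"}  # back-compat alias described in the comment above


def normalize_profile_sketch(name: str) -> str:
    """Map a user-supplied profile name onto a canonical one, or raise ValueError."""
    key = name.strip().lower()  # assumption: the real helper may not normalize case
    key = _ALIASES.get(key, key)
    if key not in _CANONICAL:
        raise ValueError(f"unknown pipeline profile {name!r}; expected one of {_CANONICAL}")
    return key
```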
@@ -1,6 +1,14 @@
 """Watermark removal using diffusion model regeneration attack.

-Two pipelines:
+Three pipelines (selected by the explicit ``pipeline`` ctor arg):
+
+0. ``qwen`` -- Qwen-Image (20B MMDiT, Apache-2.0) img2img. The scrub still comes from
+   the img2img ``strength``; Qwen preserves text (incl. CJK) and structure markedly
+   better than SDXL at the scrub floor, so it over-regenerates real photos far less.
+   CUDA/cloud-class (does not fit MPS). See ``watermark_profiles`` for the prototype
+   oracle floors (pending seed-repeat cert).
+
+Two SDXL pipelines:
 1. ``controlnet`` (DEFAULT) -- SDXL img2img with a canny ControlNet. The watermark
   REMOVAL still comes from the img2img regeneration (``strength``); the ControlNet
   only PRESERVES structure (text/faces) by conditioning on the edge map. No original
@@ -36,6 +44,7 @@ from remove_ai_watermarks.noai.watermark_profiles import (
    CONTROLNET_CANNY_MODEL,
    DEFAULT_MODEL_ID,
    DEFAULT_STRENGTH,
+    QWEN_MODEL_ID,
    normalize_profile,
    resolve_strength,
 )
@@ -308,6 +317,29 @@ _CANNY_HIGH = 200
 _CONTROLNET_PROMPT = "best quality, high quality, sharp, detailed, photographic"
 _CONTROLNET_NEGATIVE = "blurry, lowres, deformed, distorted text, garbled text, watermark, jpeg artifacts"

+# Neutral prompts for the Qwen-Image img2img pass (faithful regeneration, not an edit).
+_QWEN_PROMPT = "high quality, sharp, detailed, faithful to the original"
+_QWEN_NEGATIVE = "blurry, lowres, distorted text, garbled text, artifacts"
+
+
+def _build_qwen_kwargs(
+    image: Image.Image, strength: float, num_inference_steps: int, true_cfg_scale: float, generator: Any
+) -> dict[str, Any]:
+    """Build the QwenImageImg2ImgPipeline call kwargs (pure; unit-tested without torch).
+
+    Qwen-Image uses ``true_cfg_scale`` (not SDXL's ``guidance_scale``) and takes an
+    explicit ``negative_prompt``; the scrub still comes from the img2img ``strength``.
+    """
+    return {
+        "prompt": _QWEN_PROMPT,
+        "negative_prompt": _QWEN_NEGATIVE,
+        "image": image,
+        "strength": strength,
+        "num_inference_steps": num_inference_steps,
+        "true_cfg_scale": true_cfg_scale,
+        "generator": generator,
+    }
+

 class WatermarkRemover:
    """Remove watermarks from images using diffusion model regeneration.
@@ -348,6 +380,11 @@ class WatermarkRemover:
        if torch_dtype is None:
            if self.device == "cpu" or self.device == "mps":
                self.torch_dtype = torch.float32  # type: ignore
+            elif self.model_profile == "qwen":
+                # Qwen-Image is published in bf16; fp16 risks overflow on the 20B MMDiT.
+                # This branch is only reached on non-cpu/mps devices (e.g. cuda/xpu): the
+                # guard above already forced fp32 there, and the 20B model does not fit MPS.
+                self.torch_dtype = torch.bfloat16  # type: ignore
            else:
                self.torch_dtype = torch.float16  # type: ignore
        else:
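A minimal sketch of the default-dtype branch order in the hunk above, written without torch so it runs anywhere. `pick_dtype_name` is a hypothetical helper that is not part of this patch, and the returned strings only stand in for `torch.float32`, `torch.bfloat16` and `torch.float16`.

```python
def pick_dtype_name(device: str, model_profile: str) -> str:
    """Mirror the branch order above (hypothetical helper; strings stand in for torch dtypes)."""
    if device in ("cpu", "mps"):
        # The cpu/mps guard is checked first, so it wins whatever the profile is.
        return "float32"
    if model_profile == "qwen":
        # Qwen-Image is published in bf16, so the qwen profile defaults to it.
        return "bfloat16"
    # Every other accelerator/profile combination keeps the fp16 default.
    return "float16"


print(pick_dtype_name("cuda", "qwen"))
print(pick_dtype_name("mps", "qwen"))
print(pick_dtype_name("cuda", "controlnet"))
```

The ordering matters: because the cpu/mps check comes first, selecting the qwen profile on those devices still yields fp32 rather than bf16.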
@@ -355,6 +392,7 @@ class WatermarkRemover:

        self._pipeline: AutoImg2ImgPipeline | None = None
        self._controlnet_pipeline: Any = None
+        self._qwen_pipeline: Any = None
        self._progress_callback = progress_callback
        self.hf_token: str | None = hf_token or os.environ.get("HF_TOKEN")

@@ -369,7 +407,9 @@ class WatermarkRemover:

    def preload(self) -> None:
        """Eagerly load the pipeline so download progress bars are visible."""
-        if self.model_profile == "controlnet":
+        if self.model_profile == "qwen":
+            self._load_qwen_pipeline()
+        elif self.model_profile == "controlnet":
            self._load_controlnet_pipeline()
        else:
            self._load_pipeline()
@@ -420,19 +460,27 @@ class WatermarkRemover:

        return pipeline

+    def _base_load_kwargs(self) -> dict[str, Any]:
+        """The ``from_pretrained`` kwargs shared by all three loaders (dtype + token).
+
+        Each loader adds its own extras (SDXL safety_checker + fp16 VAE, the ControlNet
+        model, etc.). Centralizing the dtype/token pair avoids the drift trap of three
+        copies (a token forgotten on one loader silently breaks gated downloads there).
+        """
+        load_kwargs: dict[str, Any] = {"torch_dtype": self.torch_dtype}
+        if self.hf_token:
+            load_kwargs["token"] = self.hf_token
+        return load_kwargs
+
    def _load_pipeline(self) -> AutoImg2ImgPipeline:
        """Load the plain SDXL img2img pipeline lazily."""
        if self._pipeline is None:
            logger.info("Loading model %s on %s...", self.model_id, self.device)
            self._set_progress(f"Loading model weights: {self.model_id}")

-            load_kwargs: dict[str, Any] = {
-                "torch_dtype": self.torch_dtype,
-                "safety_checker": None,
-                "requires_safety_checker": False,
-            }
-            if self.hf_token:
-                load_kwargs["token"] = self.hf_token
+            load_kwargs = self._base_load_kwargs()
+            load_kwargs["safety_checker"] = None
+            load_kwargs["requires_safety_checker"] = False
            self._maybe_add_fp16_vae(load_kwargs)

            pipeline = AutoImg2ImgPipeline.from_pretrained(self.model_id, **load_kwargs)  # type: ignore
@@ -458,9 +506,8 @@ class WatermarkRemover:
            self._set_progress(f"Loading ControlNet: {CONTROLNET_CANNY_MODEL}")
            controlnet = ControlNetModel.from_pretrained(CONTROLNET_CANNY_MODEL, torch_dtype=self.torch_dtype)

-            load_kwargs: dict[str, Any] = {"controlnet": controlnet, "torch_dtype": self.torch_dtype}
-            if self.hf_token:
-                load_kwargs["token"] = self.hf_token
+            load_kwargs = self._base_load_kwargs()
+            load_kwargs["controlnet"] = controlnet
            self._maybe_add_fp16_vae(load_kwargs)

            self._set_progress(f"Loading model weights: {self.model_id}")
@@ -474,6 +521,37 @@ class WatermarkRemover:

        return self._controlnet_pipeline

+    def _load_qwen_pipeline(self) -> Any:
+        """Load the Qwen-Image img2img pipeline lazily.
+
+        Qwen-Image is its OWN base model (not an SDXL add-on), so it loads
+        ``QWEN_MODEL_ID`` unless the caller passed a custom ``--model``. Needs a
+        diffusers build that ships ``QwenImageImg2ImgPipeline``; otherwise raises
+        ``ImportError`` with an upgrade hint. CUDA/cloud-class (the 20B MMDiT does not fit MPS).
+        """
+        if self._qwen_pipeline is None:
+            try:
+                from diffusers import QwenImageImg2ImgPipeline
+            except ImportError as exc:
+                raise ImportError(
+                    "The 'qwen' pipeline needs a diffusers version that ships "
+                    "QwenImageImg2ImgPipeline. Upgrade: pip install -U diffusers"
+                ) from exc
+
+            # Use the Qwen base unless the user explicitly overrode --model.
+            model = self.model_id if self.model_id != self.DEFAULT_MODEL_ID else QWEN_MODEL_ID
+            logger.info("Loading Qwen-Image (%s) on %s...", model, self.device)
+            self._set_progress(f"Loading model weights: {model}")
+            pipeline = QwenImageImg2ImgPipeline.from_pretrained(model, **self._base_load_kwargs())
+            pipeline = self._move_to_device_and_optimize(pipeline)
+            with contextlib.suppress(Exception):
+                pipeline.set_progress_bar_config(disable=True)
+
+            logger.info("Qwen-Image model loaded successfully")
+            self._qwen_pipeline = pipeline
+
+        return self._qwen_pipeline
+
    # ── Core removal ─────────────────────────────────────────────────

    def remove_watermark(
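The model-override rule in `_load_qwen_pipeline` can be sketched in isolation. `resolve_qwen_model` is a hypothetical helper, and `SDXL_DEFAULT` is a placeholder string standing in for the real `DEFAULT_MODEL_ID`, whose value this diff does not show. The `QWEN_MODEL_ID` value is the one asserted by the test added in this commit.

```python
QWEN_MODEL_ID = "Qwen/Qwen-Image"  # value asserted by test_qwen_model_id_is_qwen_image
SDXL_DEFAULT = "example/sdxl-base"  # hypothetical placeholder for DEFAULT_MODEL_ID


def resolve_qwen_model(model_id: str, default_model_id: str = SDXL_DEFAULT) -> str:
    """Use the Qwen base unless the caller overrode the model (hypothetical helper)."""
    # An id equal to the shared default is treated as "not overridden".
    return model_id if model_id != default_model_id else QWEN_MODEL_ID


print(resolve_qwen_model(SDXL_DEFAULT))
print(resolve_qwen_model("my-org/qwen-finetune"))
```

One consequence of comparing against the default id, as far as this sketch goes: a caller who explicitly passes the shared default id is indistinguishable from one who passed nothing, and gets the Qwen base in both cases.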
@@ -552,6 +630,8 @@ class WatermarkRemover:
        _total_start = time.monotonic()

        def _generate_one(img: Image.Image) -> Image.Image:
+            if self.model_profile == "qwen":
+                return self._run_qwen(img, strength, num_inference_steps, guidance_scale, generator)
            if self.model_profile == "controlnet":
                return self._run_controlnet(img, strength, num_inference_steps, guidance_scale, generator)
            return self._run_img2img(img, strength, num_inference_steps, guidance_scale, generator)
@@ -725,6 +805,30 @@ class WatermarkRemover:
        self._controlnet_pipeline = None
        return self._load_controlnet_pipeline()

+    # ── Qwen runner ──────────────────────────────────────────────────
+
+    def _run_qwen(
+        self,
+        init_image: Image.Image,
+        strength: float,
+        num_inference_steps: int,
+        guidance_scale: float,
+        generator: Any,
+    ) -> Image.Image:
+        """Run the Qwen-Image img2img pass.
+
+        Removal comes from the img2img ``strength`` (same lever as the SDXL paths);
+        Qwen-Image preserves text/structure markedly better at the scrub floor. The
+        CLI ``guidance_scale`` maps to Qwen's ``true_cfg_scale`` (~4.0 is typical;
+        the SDXL default of 7.5 is high for Qwen). No MPS->CPU fallback: the 20B MMDiT
+        is CUDA/cloud-class and does not run on MPS, so an error here propagates.
+        """
+        pipeline = self._load_qwen_pipeline()
+        self._set_progress(f"Running Qwen-Image img2img (strength={strength}, true_cfg={guidance_scale})...")
+        kwargs = _build_qwen_kwargs(init_image, strength, num_inference_steps, guidance_scale, generator)
+        result = pipeline(**kwargs)
+        return result.images[0]
+
    # ── Batch ────────────────────────────────────────────────────────

    def remove_watermark_batch(
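A small sketch of how `_run_qwen` hands the CLI's `guidance_scale` to the pipeline as `true_cfg_scale`. `build_kwargs` below is a stand-in that copies the shape of `_build_qwen_kwargs` from this diff so the example runs without torch or diffusers. The string `"IMG"` and the `None` generator are placeholders, in the same spirit as the diff's own unit test.

```python
from typing import Any

# Prompt strings copied from the diff above.
QWEN_PROMPT = "high quality, sharp, detailed, faithful to the original"
QWEN_NEGATIVE = "blurry, lowres, distorted text, garbled text, artifacts"


def build_kwargs(
    image: Any, strength: float, num_inference_steps: int, true_cfg_scale: float, generator: Any
) -> dict[str, Any]:
    """Stand-in for _build_qwen_kwargs: same keys, no torch needed."""
    return {
        "prompt": QWEN_PROMPT,
        "negative_prompt": QWEN_NEGATIVE,
        "image": image,
        "strength": strength,
        "num_inference_steps": num_inference_steps,
        "true_cfg_scale": true_cfg_scale,
        "generator": generator,
    }


# The caller's guidance_scale is passed positionally into the true_cfg_scale slot.
guidance_scale = 4.0
kwargs = build_kwargs("IMG", 0.3, 40, guidance_scale, None)
print(sorted(kwargs))
```

Keeping the kwargs builder pure is what lets the call shape be checked in a CI environment where torch is absent; only the final `pipeline(**kwargs)` call needs a GPU.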
@@ -115,6 +115,7 @@ class TestModelProfiles:
    def test_canonical_profiles_unchanged(self):
        assert normalize_profile("sdxl") == "sdxl"
        assert normalize_profile("controlnet") == "controlnet"
+        assert normalize_profile("qwen") == "qwen"

    def test_default_alias_resolves_to_sdxl(self):
        # "default" is the legacy alias for "sdxl" (back-compat for existing scripts).
@@ -125,6 +126,35 @@ class TestModelProfiles:
        assert normalize_profile("CONTROLNET") == "controlnet"


+class TestQwenKwargs:
+    """_build_qwen_kwargs is pure (no torch); guards the Qwen-Image call shape.
+
+    watermark_remover imports torch under a try/except, so the module (and this pure
+    helper) imports fine in the core+dev CI env where torch is absent.
+    """
+
+    def test_uses_true_cfg_not_guidance_scale(self):
+        from remove_ai_watermarks.noai.watermark_remover import _build_qwen_kwargs
+
+        gen = object()
+        kwargs = _build_qwen_kwargs("IMG", strength=0.3, num_inference_steps=40, true_cfg_scale=4.0, generator=gen)
+        # Qwen uses true_cfg_scale, NOT SDXL's guidance_scale.
+        assert kwargs["true_cfg_scale"] == 4.0
+        assert "guidance_scale" not in kwargs
+        # The scrub still comes from strength; image + generator pass through.
+        assert kwargs["strength"] == 0.3
+        assert kwargs["image"] == "IMG"
+        assert kwargs["generator"] is gen
+        # Faithful-regeneration prompt + an explicit negative prompt.
+        assert kwargs["prompt"]
+        assert kwargs["negative_prompt"]
+
+    def test_qwen_model_id_is_qwen_image(self):
+        from remove_ai_watermarks.noai.watermark_profiles import QWEN_MODEL_ID
+
+        assert QWEN_MODEL_ID == "Qwen/Qwen-Image"
+
+
 class TestResolveStrength:
    """resolve_strength applies the vendor default only when strength is unset."""