feat(invisible): add Qwen-Image img2img pipeline (--pipeline qwen)

A third diffusion pipeline alongside sdxl/controlnet: Qwen-Image (20B MMDiT,
Apache-2.0 code AND weights) img2img. The scrub still comes from the img2img
strength; Qwen preserves text (incl. CJK) and structure markedly better than
SDXL at the scrub floor, so it over-regenerates real photos far less (directly
targets the controlnet over-regeneration that degrades real uploads).

- watermark_profiles: QWEN_MODEL_ID, normalize_profile accepts "qwen".
- WatermarkRemover: _load_qwen_pipeline (bf16, loads Qwen base unless --model
  overridden, clear ImportError if diffusers lacks the class), _run_qwen (no
  MPS fallback -- 20B is CUDA/cloud-class), dispatch in _generate_one/preload,
  pure _build_qwen_kwargs (true_cfg_scale, not guidance_scale).
- Shared _base_load_kwargs() across all three loaders (dtype + token).
- CLI --pipeline gains "qwen"; invisible_engine threads it through.
- scripts/qwen_scrub_prototype.py: standalone PEP 723 GPU experiment.

Prototype oracle floors (Modal A100-80GB, single seed, controls SynthID-positive,
PENDING seed-repeat cert): OpenAI clears at strength ~0.10, Gemini at ~0.30 (0.20
still detected), with CJK text + faces faithful where controlnet plasticizes. The
Gemini floor is higher than the shared default ladder, so pass an explicit
--strength for Gemini on this pipeline until a Qwen-specific ladder is certified.

The model-running path is CUDA-only (untestable locally); unit tests cover the
pure call-shape (_build_qwen_kwargs) and profile normalization without torch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Victor Kuznetsov
2026-06-19 20:44:36 -07:00
parent 0c0c6c6b03
commit 76e3d4154c
10 changed files with 309 additions and 24 deletions
+2 -2
@@ -18,7 +18,7 @@ Consequences for contributors (do not drift back into the stock niche just becau
## How to run
- `uv run remove-ai-watermarks all <image.png> -o <output.png>` — full pipeline (visible + invisible + metadata). Same diffusion knobs as `invisible` below, plus the visible-pass `--inpaint/--no-inpaint`/`--inpaint-method`. **When the `[gpu]` extra is absent, step 2 (invisible/SynthID) is skipped**: `all` still writes an output (visible mark + metadata stripped) but prints a prominent end-of-run banner ("the invisible (SynthID) watermark was NOT removed") AND exits **non-zero** (1), so a skipped SynthID pass is not mistaken for a clean result (the recurring #14/#47 trap, where the old quiet inline warning was missed). `invisible` already hard-errors without the extra; only `all` continued, hence the loud end-banner. Regression-guarded by `tests/test_cli.py::TestAllCommand::test_all_loud_warning_and_nonzero_exit_when_gpu_missing`. **Test trap:** any `all` test that exercises the full pipeline MUST `patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True)` — CI installs core+dev only (no `[gpu]`), so an unpatched `all` test takes the skip branch and now hits the non-zero exit. This passed locally (gpu present → `is_available()` True) but red-failed every matrix cell on the v0.11.0 commit (`test_all_basic`/`test_all_visible_step_uses_registry` asserted exit 0); both now patch `is_available` True.
- `uv run remove-ai-watermarks invisible <image.png> -o <out.png>` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet` (default `controlnet`), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default).
- `uv run remove-ai-watermarks invisible <image.png> -o <out.png>` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet|qwen` (default `controlnet`), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default).
- `uv run remove-ai-watermarks visible <image.png> -o <out.png>` — known-visible-mark removal, CPU, no GPU. Reverse-alpha based: each mark is removed by inverting its captured alpha map. `--mark auto` (default) picks the strongest detected of the Gemini sparkle, the Doubao "豆包AI生成" text strip, the Jimeng "★ 即梦AI" wordmark, and the Samsung Galaxy AI "✦ Contenuti generati dall'AI" strip (bottom-LEFT, locale-specific — Italian variant calibrated); `--mark gemini` / `--mark doubao` / `--mark jimeng` / `--mark samsung` force one (choices come from the registry). Gemini/Doubao recover pixels exactly with no inpaint at native; **Jimeng and Samsung add an always-on thin residual inpaint over the glyph footprint** (their marks re-rasterize per image, so reverse-alpha alone leaves a faint outline). For arbitrary logos/objects use `erase`. **When `--mark auto` finds no known mark (the common case — ~74% of real uploads carry no registered visible mark), the command does NOT silently re-serve the input as a finished result.** It runs a cheap metadata-only `identify`, prints actionable guidance (if the image carries an invisible/metadata mark, e.g. an OpenAI/Gemini C2PA image, it points to `all`; otherwise it does NOT imply the image is clean -- it warns that an invisible pixel watermark like SynthID cannot be detected once the metadata proxy is gone and routes to both `all` and `erase --region`), writes NO output file, and exits **`EXIT_NO_VISIBLE_MARK` (2)** — distinct from success (0) and a hard error (1) so a wrapping service (raiw.cc) can surface the message instead of treating the unchanged image as done (the production "it didn't work" / score-0 trap). Same handling for an explicit `--mark <name>` that is not detected. Helper `cli._no_visible_mark_exit`; regression-guarded by `tests/test_cli.py::TestVisibleCommand::test_visible_auto_no_mark_exits_two_with_eraser_hint` and `test_visible_auto_no_mark_routes_to_all_when_metadata`. 
`--no-detect` still forces the gemini fallback and proceeds (exit 0).
- `uv run remove-ai-watermarks erase <image.png> --region x,y,w,h -o <out.png>` — universal region eraser (any logo/object, any position). `--backend cv2` (default, no deps) or `--backend lama` (big-LaMa via onnxruntime, extra `lama`); `--region` is repeatable.
- `uv run remove-ai-watermarks identify <image>` — provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector
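The three-way exit contract of `visible` described above (0 = success, 1 = hard error, 2 = `EXIT_NO_VISIBLE_MARK`) can be sketched from the caller's side. This is a minimal illustration, not code from the repository: `classify_exit` and `run_visible` are hypothetical names, and combining stdout with stderr is an assumption, since the text does not say which stream carries the guidance.

```python
from __future__ import annotations

import subprocess

EXIT_OK = 0
EXIT_ERROR = 1
EXIT_NO_VISIBLE_MARK = 2  # value quoted in the `visible` entry above


def classify_exit(code: int) -> str:
    """Map the documented exit codes onto a status a wrapping service can act on."""
    if code == EXIT_OK:
        return "done"
    if code == EXIT_NO_VISIBLE_MARK:
        # No output file was written; surface the printed guidance instead.
        return "no_visible_mark"
    return "error"


def run_visible(image: str, output: str) -> tuple[str, str]:
    """Run `visible` and return (status, printed text). Hypothetical wrapper."""
    proc = subprocess.run(
        ["uv", "run", "remove-ai-watermarks", "visible", image, "-o", output],
        capture_output=True,
        text=True,
        check=False,  # non-zero is an expected outcome here, not an exception
    )
    # Assumption: guidance may land on either stream, so return both.
    return classify_exit(proc.returncode), proc.stdout + proc.stderr
```

A wrapper would treat only `"done"` as a finished result and show the returned text to the user for `"no_visible_mark"`, rather than re-serving the unchanged input.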
@@ -61,7 +61,7 @@ Compact map. The full per-module detail (design decisions, tuned thresholds, cal
- `region_eraser.py` — universal region eraser (`erase` CLI): cv2 backend default (no deps), optional big-LaMa via onnxruntime (~3.5-4 GB peak RAM, ~5-6 s/call CPU — does not fit a minimal droplet).
- `invisible_watermark.py` — decodes the OPEN DWT-DCT watermarks (SD / SDXL / FLUX) via `imwatermark` (extra `detect`, pulls torch). Fragile two ways: (1) does not survive JPEG re-encode/resize; (2) **carrier-fragile on a broad class of pristine images** -- a clean encode->decode round-trip recovers 48/48 on chatgpt/firefly/random but FAILS (28-39/48, below the `_MATCH_48`=44 gate) on the FLUX fox, doubao, a flat FLUX generation, AND a clean synthetic flat fill with no watermark. The failure does NOT track texture; it goes with a degenerate **all-ones decode that is a CARRIER ARTIFACT, not a watermark** (synthetic clean image reproduces it). So `detect_invisible_watermark` is **positive-only**: trust a hit; a `None` is inconclusive unless a same-carrier positive-control embed first recovers >=44. Verified 2026-06-19; full caveat in `docs/watermarking-landscape.md`.
- `trustmark_detector.py` — Adobe TrustMark open decoder (extra `trustmark`). Do NOT remove the JPEG re-encode false-positive gate — a lone TrustMark hit without it is almost always content noise.
- `noai/watermark_remover.py` — `WatermarkRemover` with two diffusion pipelines selected by the explicit `pipeline` ctor arg, never inferred from `model_id`: `sdxl` (plain SDXL img2img) and `controlnet` (SDXL + canny ControlNet, **the DEFAULT since 2026-06-09**). Removal comes from the img2img `strength`; ControlNet only preserves text/face STRUCTURE — SynthID CAN survive controlnet on photoreal content at low strength. No face-restore extra ships, by validated decision (every restore approach looked MORE AI-generated).
- `noai/watermark_remover.py` — `WatermarkRemover` with three diffusion pipelines selected by the explicit `pipeline` ctor arg, never inferred from `model_id`: `sdxl` (plain SDXL img2img), `controlnet` (SDXL + canny ControlNet, **the DEFAULT since 2026-06-09**), and `qwen` (Qwen-Image 20B MMDiT img2img, Apache-2.0, CUDA/cloud-class — best text/structure preservation at the scrub floor; `_load_qwen_pipeline`/`_run_qwen`, bf16, no MPS fallback; call shape in the pure `_build_qwen_kwargs` using `true_cfg_scale`). Removal comes from the img2img `strength`; ControlNet only preserves text/face STRUCTURE — SynthID CAN survive controlnet on photoreal content at low strength. Qwen prototype oracle floors (single-seed, pending seed-repeat cert): OpenAI ~0.10, Gemini ~0.30 (higher than the controlnet Gemini floor — pass explicit `--strength` for Gemini on `qwen` until certified). No face-restore extra ships, by validated decision (every restore approach looked MORE AI-generated).
- `noai/tiling.py` — sliding-window tiled diffusion for large inputs (CLI `--tile`). `WatermarkRemover.remove_watermark` branches to `run_tiled` when `tile` is set AND the long side exceeds `tile_size`, refactoring the single-pass `_generate` into a per-tile `_generate_one` (the ControlNet edge map is rebuilt per tile inside it). Pure helpers `plan_tiles` (uniform-size tiles, last one flush to the edge) and `feather_weights` (strictly-positive separable taper -> partition-of-unity blend) are unit-tested without the model. New tile-blend tuning goes in those pure helpers; do not inline blend math into the runner.
- `auto_config.py` + the content-detection layer were REMOVED 2026-06-09; `--auto` is a deprecated no-op (controlnet is the default pipeline and the adaptive polish is ON by default and self-gates to a no-op where there is no detail deficit).
- `upscaler.py` — optional Real-ESRGAN pre-diffusion super-resolution for small inputs (extra `esrgan`, spandrel only). Manual opt-in; the default `--upscaler` stays `lanczos` and the engine always falls back to Lanczos on absence/error. ESRGAN can degrade faces and thin text.
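The tiling entry above names two pure helpers, `plan_tiles` (uniform-size tiles, last one flush to the edge) and `feather_weights` (strictly positive taper, partition-of-unity blend). The sketch below illustrates those two properties in one dimension only. It is not the repository's implementation: `plan_starts`, `feather_1d` and `blend_1d` are hypothetical names, the linear ramp is an assumed taper shape, and it assumes `0 < overlap < tile`.

```python
from __future__ import annotations


def plan_starts(length: int, tile: int, overlap: int) -> list[int]:
    """1-D tile start offsets: every tile has the same size, last one flush to the edge."""
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # shift the final tile back instead of shrinking it
    return starts


def feather_1d(tile: int, overlap: int) -> list[float]:
    """Linear ramp over `overlap` px at each end; never reaches zero."""
    return [min(i + 1, tile - i, overlap) / overlap for i in range(tile)]


def blend_1d(signal: list[float], tile: int, overlap: int) -> list[float]:
    """Blend tiles with normalized weights (each tile is passed through unchanged)."""
    size = min(tile, len(signal))
    weights = feather_1d(size, overlap)
    acc = [0.0] * len(signal)
    wsum = [0.0] * len(signal)
    for start in plan_starts(len(signal), tile, overlap):
        for i, w in enumerate(weights):
            acc[start + i] += w * signal[start + i]
            wsum[start + i] += w
    # Dividing by the summed weights is what makes the blend a partition of unity.
    return [a / w for a, w in zip(acc, wsum)]
```

Because every weight is strictly positive, the normalizing sum is never zero, which is why a taper that touches zero at the tile border would be unsafe for edge pixels covered by a single tile.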
+1 -1
@@ -33,7 +33,7 @@ It does **not** target watermarks that protect someone else's paid or copyrighte
- **AI metadata stripping** — EXIF, PNG text chunks, C2PA provenance manifests (PNG / JPEG / AVIF / HEIF / JPEG-XL, **MP4 / MOV / M4V / M4A** at the container level, and **WebM / MP3 / WAV / FLAC / OGG** losslessly via ffmpeg), XMP DigitalSourceType
- **"Made with AI" label removal** — removes the AI-disclosure metadata that platforms read to apply automatic labels (useful for clearing a false-positive label from a human-edited photograph)
- **Analog Humanizer** — optional film grain and chromatic aberration post-processing
- **Text and face preservation (default)** — the default pipeline is a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Use `--pipeline sdxl` for plain SDXL img2img (lighter, no extra model download) on inputs without text or faces. Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness). The library does not ship a face-restore extra: every approach evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned) regenerated the face via SDXL and made the output look more AI-generated than the cleaned image. The cleaned controlnet output is the least-AI face state achievable without re-introducing SynthID.
- **Text and face preservation (default)** — the default pipeline is a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Use `--pipeline sdxl` for plain SDXL img2img (lighter, no extra model download) on inputs without text or faces. An experimental `--pipeline qwen` runs Qwen-Image (20B, Apache-2.0) img2img, which preserves text (including CJK) and structure better still at the scrub floor; it is CUDA/cloud-class (does not fit MPS), and its strength floors are not yet certified (pass an explicit `--strength`, especially for Gemini content). Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness). The library does not ship a face-restore extra: every approach evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned) regenerated the face via SDXL and made the output look more AI-generated than the cleaned image. The cleaned controlnet output is the least-AI face state achievable without re-introducing SynthID.
- **Batch processing** — process entire directories
- **Detection** — three-stage NCC watermark detection with confidence scoring
- **Provenance detection (`identify`)** — aggregate C2PA issuer, the C2PA soft-binding forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, embedded SD/ComfyUI params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the China TC260 AIGC label (XMP, PNG chunk, EXIF, or JPEG segment), the HuggingFace `hf-job-id` job marker, the SynthID metadata proxy, the C2PA cloud-manifest reference (Adobe Durable Content Credentials, when the embedded manifest is stripped), the visible marks (Gemini sparkle plus the Doubao "豆包AI生成" / Jimeng "即梦AI" / Samsung Galaxy AI "Contenuti generati dall'AI" text marks), the open SD/SDXL/FLUX invisible watermark, and (with the `trustmark` extra) the open Adobe TrustMark watermark into one origin-platform + watermark-inventory verdict (`--json` for machine output)
+8
@@ -131,3 +131,11 @@ See `docs/synthid.md` §5.5 + `docs/controlnet-removal-pipeline-research.md` (ce
`controlnet_conditioning_scale` (CLI `--controlnet-scale`, default 1.0) is the structure-preservation knob (higher = closer to the original structure); fp32 on cpu/mps, fp16-fixed VAE on cuda/xpu. The `controlnet` profile is threaded explicitly (`WatermarkRemover(pipeline=...)` / `InvisibleEngine(pipeline=...)`), NOT inferred from `model_id`. This productionizes the `scripts/controlnet_sweep.py` prototype; see `docs/controlnet-removal-pipeline-research.md`.
**Forensic-stealth caveat still applies** (arXiv:2605.09203): defeating the SynthID verifier is not forensic invisibility -- a "this image went through a removal pipeline" classifier can still flag the output.
## `qwen` pipeline (experimental, Qwen-Image 20B, uncertified floors)
`--pipeline qwen` runs `QwenImageImg2ImgPipeline` on `Qwen/Qwen-Image` (20B MMDiT, Apache-2.0 code AND weights), as an img2img alternative to the SDXL pipelines. Motivation: the controlnet over-regeneration problem above (it plasticizes real photos / loses fine text at the scrub floor). Qwen-Image renders text natively (incl. CJK) and preserves structure markedly better, so at the strength that removes SynthID it damages real content far less.
The scrub still comes from the img2img `strength` (same lever as SDXL); the call shape lives in the pure `_build_qwen_kwargs` (uses Qwen's `true_cfg_scale`, not SDXL's `guidance_scale` — the CLI `--guidance-scale` maps onto it, and ~4.0 is typical vs the SDXL default 7.5). bf16 on CUDA. It is **CUDA/cloud-class — the 20B does not fit MPS — so `_run_qwen` has NO MPS→CPU fallback** (unlike the SDXL paths). Cost on Modal A100-80GB is ~$0.05-0.10/image vs SDXL.
**Prototype oracle floors (Modal A100-80GB, single seed, 2026-06-19 — PENDING seed-repeat cert):** on native-resolution OpenAI and Gemini cert inputs (both controls SynthID-POSITIVE), OpenAI cleared at strength **0.10** and Gemini at **0.30** (0.20 still detected). At those floors CJK text and faces stayed faithful (the zoom comparison showed controlnet-style plastication absent). Two caveats before relying on it: (1) near-floor scrub is SEED-NON-DETERMINISTIC (the general known-limitation above), so these single-seed floors are NOT certified — run a seed-repeat sweep before trusting them; (2) `resolve_strength` is shared and pipeline-independent, so the Gemini default (0.15, the certified controlnet floor) UNDER-scrubs Gemini on `qwen` (whose floor is ~0.30) — **pass an explicit `--strength` for Gemini content on `qwen`** until a Qwen-specific ladder is certified. Flat-graphic content was not in the prototype sample.
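Caveat (2) can be made concrete with a small lookup. The table and helpers below are hypothetical and not part of the package (whose `resolve_strength` is pipeline-independent); the numbers are the floors quoted in this document, and the `qwen` entries are single-seed prototype values that are not certified.

```python
from __future__ import annotations

# Hypothetical table, NOT package code. controlnet values are the certified
# floors quoted in this document; qwen values are single-seed prototype floors.
STRENGTH_FLOORS: dict[str, dict[str, float]] = {
    "controlnet": {"openai": 0.10, "gemini": 0.15},
    "qwen": {"openai": 0.10, "gemini": 0.30},
}


def pick_strength(pipeline: str, vendor: str, explicit: float | None = None) -> float:
    """Prefer an explicit --strength; otherwise fall back to the quoted floor."""
    if explicit is not None:
        return explicit
    return STRENGTH_FLOORS[pipeline][vendor]


def under_scrubs(pipeline: str, vendor: str, strength: float) -> bool:
    """True when `strength` is below the quoted floor for that pipeline and vendor."""
    return strength < STRENGTH_FLOORS[pipeline][vendor]


# The shared Gemini default (0.15) is fine for controlnet but too low for qwen.
print(under_scrubs("controlnet", "gemini", 0.15), under_scrubs("qwen", "gemini", 0.15))
```

Until a Qwen-specific ladder is certified, the equivalent on the command line is simply passing `--strength` explicitly alongside `--pipeline qwen`.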
+3 -1
@@ -177,10 +177,12 @@ Root cause: bad alpha (under-estimated, max ~0.65) + fixed-no-inpaint + tight bo
## `noai/watermark_remover.py`
`noai/watermark_remover.py` — the `WatermarkRemover` class has two diffusion pipelines, selected by the explicit `pipeline` ctor arg (NOT inferred from `model_id` -- both use the same SDXL base, `DEFAULT_MODEL_ID`).
`noai/watermark_remover.py` — the `WatermarkRemover` class has three diffusion pipelines, selected by the explicit `pipeline` ctor arg (NOT inferred from `model_id`). `sdxl`/`controlnet` share the SDXL base (`DEFAULT_MODEL_ID`); `qwen` is its own base (`QWEN_MODEL_ID`).
**`sdxl`** (renamed from `default` 2026-06-09; `default` kept as a back-compat alias via `normalize_profile`) runs plain SDXL img2img (`_run_img2img`); it is the lighter opt-down alternative (no ControlNet weights).
**`qwen`** (`_run_qwen`, `_load_qwen_pipeline`) runs `QwenImageImg2ImgPipeline` on `Qwen/Qwen-Image` (20B MMDiT, Apache-2.0 code AND weights). The scrub still comes from the img2img `strength`; Qwen's value is that it preserves text (incl. CJK) and structure markedly better than SDXL at the scrub floor, so it over-regenerates real photos far less (directly targets the controlnet over-regeneration problem). Specifics: bf16 on CUDA (fp16 risks overflow on the 20B MMDiT — see the dtype branch in `__init__`); loads `QWEN_MODEL_ID` unless `--model` is overridden; the call shape lives in the pure module helper `_build_qwen_kwargs` (unit-tested without torch in `tests/test_platform.py::TestQwenKwargs`), which uses Qwen's `true_cfg_scale` (NOT SDXL's `guidance_scale` — the CLI `--guidance-scale` maps onto it; ~4.0 is typical, the SDXL default 7.5 is high for Qwen) and an explicit `negative_prompt` (`_QWEN_PROMPT`/`_QWEN_NEGATIVE`). It is CUDA/cloud-class (the 20B does not fit MPS), so `_run_qwen` has NO MPS->CPU fallback — an error propagates. `_load_qwen_pipeline` raises a clear ImportError if the installed diffusers lacks `QwenImageImg2ImgPipeline`. **Prototype oracle floors (Modal A100-80GB, single seed, 2026-06-19, PENDING seed-repeat cert): OpenAI clears at strength ~0.10, Gemini at ~0.30 (0.20 still detected) — both controls were SynthID-positive; at those floors CJK text + faces stay faithful where controlnet plasticizes. The Gemini floor (0.30) is HIGHER than the certified controlnet Gemini floor (0.15), and `resolve_strength` is shared/pipeline-independent, so pass an explicit `--strength` for Gemini content on `qwen` until a Qwen-specific ladder is certified.**
**`controlnet`** (**the DEFAULT pipeline since 2026-06-09** for `invisible`/`all`/`batch` and both engine ctors; `_run_controlnet`, `_load_controlnet_pipeline`) runs `StableDiffusionXLControlNetImg2ImgPipeline` with the SDXL-native canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` (`watermark_profiles.CONTROLNET_CANNY_MODEL`): the control image is `cv2.Canny(gray, 100, 200)` stacked to 3 channels (`_CANNY_LOW`/`_CANNY_HIGH`, prompt `_CONTROLNET_PROMPT` / `_CONTROLNET_NEGATIVE`).
**Removal comes from the img2img regeneration (`strength`); the ControlNet only PRESERVES text and face STRUCTURE via the edge map.**
+128
@@ -0,0 +1,128 @@
# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "diffusers>=0.35.0",
#   "transformers>=4.51.0",
#   "torch",
#   "accelerate",
#   "pillow",
#   "click",
# ]
# ///
"""Isolated GPU prototype: does a low-strength Qwen-Image img2img pass scrub the
invisible watermark while keeping text/structure legible?

This is the oracle-gated experiment behind Library roadmap P1#5 (migrate the
invisible pipeline onto Qwen-Image-Edit). It is DELIBERATELY standalone:

  * It is NOT imported by the package and NOT in ``uv.lock``. Qwen-Image needs a
    newer ``diffusers``/``transformers`` (Qwen2.5-VL text encoder) than the SDXL
    pipeline is pinned to, so wiring it into the locked env would risk the
    certified SDXL/ControlNet pipeline (the ``cannot import Qwen3VL...`` trap).
    PEP 723 inline metadata lets ``uv run`` build a throwaway env for it instead.
  * Qwen-Image is ~20B, so it needs a real GPU (CUDA) -- it will not fit on MPS.

Run (on a GPU box / Modal), then eyeball the outputs AND submit them to the
matching oracle (openai.com/verify for OpenAI, the Gemini app for Google):

    uv run scripts/qwen_scrub_prototype.py INPUT.png -o out/ --strengths 0.1,0.2,0.3,0.4

What to look for:
  * SCRUB: the oracle no longer reports the watermark at some strength.
  * FIDELITY: text stays legible and faces/structure stay faithful at that same
    strength -- the whole point of trying Qwen over SDXL (which garbles text).

The smallest strength that clears the oracle while keeping fidelity is the result
to compare against the SDXL/ControlNet floors (OpenAI 0.10 / Google 0.15).
"""

from __future__ import annotations

import logging
from pathlib import Path

import click

log = logging.getLogger("qwen_proto")

# A neutral, faithful-regeneration prompt (we want to scrub, not restyle); mirrors
# the intent of the SDXL controlnet prompt. Qwen renders text natively, so a light
# pass should keep captions legible where SDXL would garble them.
_PROMPT = "high quality, sharp, detailed, faithful to the original"
_NEGATIVE = "blurry, lowres, distorted text, garbled text, artifacts"


def _pick_device(requested: str) -> tuple[str, object]:
    import torch
    if requested != "auto":
        device = requested
    elif torch.cuda.is_available():
        device = "cuda"
    elif getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
        device = "mps"
    else:
        device = "cpu"
    # bf16 on CUDA (Qwen's reference dtype); fp32 elsewhere for numerical safety.
    dtype = torch.bfloat16 if device == "cuda" else torch.float32
    return device, dtype


@click.command()
@click.argument("source", type=click.Path(exists=True, path_type=Path))
@click.option("-o", "--output-dir", type=click.Path(path_type=Path), default=Path("qwen_out"))
@click.option("--strengths", default="0.1,0.2,0.3,0.4", help="Comma-separated img2img strengths to sweep.")
@click.option("--steps", type=int, default=40, help="Inference steps.")
@click.option("--cfg", type=float, default=4.0, help="true_cfg_scale (Qwen's CFG; reference default 4.0).")
@click.option("--model", default="Qwen/Qwen-Image", help="HF model id (Qwen-Image img2img base).")
@click.option("--device", default="auto", type=click.Choice(["auto", "cuda", "mps", "cpu"]))
@click.option("--seed", type=int, default=0, help="Reproducible seed.")
def main(
    source: Path,
    output_dir: Path,
    strengths: str,
    steps: int,
    cfg: float,
    model: str,
    device: str,
    seed: int,
) -> None:
    """Sweep Qwen-Image img2img strength over SOURCE and save one output per strength."""
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    import torch
    from diffusers import QwenImageImg2ImgPipeline
    from PIL import Image

    dev, dtype = _pick_device(device)
    log.info("Loading %s on %s (%s)...", model, dev, dtype)
    pipe = QwenImageImg2ImgPipeline.from_pretrained(model, torch_dtype=dtype)
    pipe = pipe.to(dev)

    init_image = Image.open(source).convert("RGB")
    output_dir.mkdir(parents=True, exist_ok=True)

    values = [float(s) for s in strengths.split(",") if s.strip()]
    for strength in values:
        generator = torch.Generator(device="cpu").manual_seed(seed)
        log.info("Generating strength=%.2f ...", strength)
        result = pipe(
            prompt=_PROMPT,
            negative_prompt=_NEGATIVE,
            image=init_image,
            strength=strength,
            num_inference_steps=steps,
            true_cfg_scale=cfg,
            generator=generator,
        )
        out_path = output_dir / f"{source.stem}_qwen_s{strength:.2f}.png"
        result.images[0].save(out_path)
        log.info("  saved %s", out_path)

    log.info(
        "\nDone. Eyeball text/face fidelity, then submit each output to the matching oracle "
        "(openai.com/verify / Gemini app). The smallest strength that clears the oracle while "
        "keeping fidelity is the number to compare against the SDXL floors (OpenAI 0.10 / Google 0.15)."
    )


if __name__ == "__main__":
    main()
+7 -6
@@ -253,15 +253,16 @@ def _normalize_pipeline(ctx: click.Context, param: click.Parameter, value: str |
    return normalized


# ``controlnet`` (the default-SELECTED value) and ``sdxl`` (plain SDXL img2img) are the
# two current profiles; ``default`` is an OUTDATED back-compat alias for ``sdxl``
# (warned + normalized away by _normalize_pipeline).
_PIPELINE_CHOICES = ["sdxl", "controlnet", "default"]
# ``controlnet`` (the default-SELECTED value), ``sdxl`` (plain SDXL img2img) and
# ``qwen`` (Qwen-Image, CUDA/cloud-class) are the current profiles; ``default`` is an
# OUTDATED back-compat alias for ``sdxl`` (warned + normalized away by _normalize_pipeline).
_PIPELINE_CHOICES = ["sdxl", "controlnet", "qwen", "default"]
_PIPELINE_HELP = (
    "Pipeline profile. controlnet (DEFAULT) = SDXL + canny ControlNet that preserves "
    "text/faces via edge conditioning while removing SynthID; sdxl = plain SDXL img2img "
    "(lighter, no extra model download, but leaves SynthID on flat-graphic content). "
    "('default' is an OUTDATED alias for 'sdxl' -- use sdxl or controlnet.)"
    "(lighter, no extra model download, but leaves SynthID on flat-graphic content); "
    "qwen = Qwen-Image (20B, Apache-2.0) img2img, best text/structure preservation but "
    "CUDA/cloud-class (does not fit MPS). ('default' is an OUTDATED alias for 'sdxl'.)"
)

# Shared --pipeline / --strength decorators so the three diffusion commands
+3 -2
@@ -103,8 +103,9 @@ class InvisibleEngine:
device: Device for inference (auto/cpu/mps/cuda/xpu). None = auto.
pipeline: Pipeline profile. "controlnet" (DEFAULT; SDXL + canny ControlNet
that preserves text/face structure via edge conditioning while removing
SynthID) or "sdxl" (plain SDXL img2img, lighter but leaves SynthID on
flat-graphic content). "default" is a back-compat alias for "sdxl".
SynthID), "sdxl" (plain SDXL img2img, lighter but leaves SynthID on
flat-graphic content), or "qwen" (Qwen-Image 20B img2img, best text/
structure preservation but CUDA/cloud-class). "default" aliases "sdxl".
hf_token: HuggingFace API token.
progress_callback: Optional callback for progress messages.
controlnet_conditioning_scale: ControlNet structure-preservation
@@ -12,6 +12,17 @@ if TYPE_CHECKING:
DEFAULT_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
# Qwen-Image (20B MMDiT, Apache-2.0 code AND weights) base for the ``qwen`` pipeline:
# an img2img alternative to SDXL with native text rendering (incl. CJK). Loaded only
# when ``--pipeline qwen`` is selected; CUDA/cloud-class (does not fit MPS). Prototype
# oracle floors (single-seed, 2026-06-19, pending seed-repeat cert): OpenAI clears at
# strength ~0.10, Google/Gemini at ~0.30 (0.20 still detected) -- the latter is HIGHER
# than the certified controlnet Google floor (0.15), so pass an explicit ``--strength``
# for Gemini content on this pipeline until a Qwen-specific ladder is certified.
# (Dispatch uses the bare "qwen" literal, matching the sdxl/controlnet sites, so there
# is no QWEN_PROFILE constant -- only the model id is referenced from code.)
QWEN_MODEL_ID = "Qwen/Qwen-Image"
# Canonical pipeline-profile names + the back-compat alias. The plain SDXL img2img
# profile is ``sdxl``; ``default`` is kept as an accepted alias (it was the profile's
# name before ``controlnet`` became the default-selected pipeline, 2026-06-09).
@@ -1,6 +1,14 @@
"""Watermark removal using diffusion model regeneration attack.
Two pipelines:
Three pipelines (selected by the explicit ``pipeline`` ctor arg):
0. ``qwen`` -- Qwen-Image (20B MMDiT, Apache-2.0) img2img. The scrub still comes from
the img2img ``strength``; Qwen preserves text (incl. CJK) and structure markedly
better than SDXL at the scrub floor, so it over-regenerates real photos far less.
CUDA/cloud-class (does not fit MPS). See ``watermark_profiles`` for the prototype
oracle floors (pending seed-repeat cert).
Two SDXL pipelines:
1. ``controlnet`` (DEFAULT) -- SDXL img2img with a canny ControlNet. The watermark
REMOVAL still comes from the img2img regeneration (``strength``); the ControlNet
only PRESERVES structure (text/faces) by conditioning on the edge map. No original
@@ -36,6 +44,7 @@ from remove_ai_watermarks.noai.watermark_profiles import (
    CONTROLNET_CANNY_MODEL,
    DEFAULT_MODEL_ID,
    DEFAULT_STRENGTH,
    QWEN_MODEL_ID,
    normalize_profile,
    resolve_strength,
)
@@ -308,6 +317,29 @@ _CANNY_HIGH = 200
_CONTROLNET_PROMPT = "best quality, high quality, sharp, detailed, photographic"
_CONTROLNET_NEGATIVE = "blurry, lowres, deformed, distorted text, garbled text, watermark, jpeg artifacts"

# Neutral prompts for the Qwen-Image img2img pass (faithful regeneration, not an edit).
_QWEN_PROMPT = "high quality, sharp, detailed, faithful to the original"
_QWEN_NEGATIVE = "blurry, lowres, distorted text, garbled text, artifacts"


def _build_qwen_kwargs(
    image: Image.Image, strength: float, num_inference_steps: int, true_cfg_scale: float, generator: Any
) -> dict[str, Any]:
    """Build the QwenImageImg2ImgPipeline call kwargs (pure; unit-tested without torch).

    Qwen-Image uses ``true_cfg_scale`` (not SDXL's ``guidance_scale``) and takes an
    explicit ``negative_prompt``; the scrub still comes from the img2img ``strength``.
    """
    return {
        "prompt": _QWEN_PROMPT,
        "negative_prompt": _QWEN_NEGATIVE,
        "image": image,
        "strength": strength,
        "num_inference_steps": num_inference_steps,
        "true_cfg_scale": true_cfg_scale,
        "generator": generator,
    }


class WatermarkRemover:
    """Remove watermarks from images using diffusion model regeneration.
@@ -348,6 +380,11 @@ class WatermarkRemover:
         if torch_dtype is None:
             if self.device == "cpu" or self.device == "mps":
                 self.torch_dtype = torch.float32  # type: ignore
+            elif self.model_profile == "qwen":
+                # Qwen-Image is published in bf16; fp16 risks overflow on the 20B MMDiT.
+                # cuda/xpu-only by construction: the cpu/mps guard above already forced
+                # fp32, and the 20B model does not fit MPS anyway.
+                self.torch_dtype = torch.bfloat16  # type: ignore
             else:
                 self.torch_dtype = torch.float16  # type: ignore
         else:
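The branch order in the hunk above matters: the cpu/mps check runs first, so a `qwen` profile on those devices still gets fp32. A minimal sketch of that decision as a pure function follows. `pick_dtype_name` is a hypothetical helper (not part of the commit), and it returns dtype names as strings so it runs without torch.

```python
def pick_dtype_name(device: str, model_profile: str) -> str:
    """Return the dtype name chosen when the caller passes no explicit dtype.

    Hypothetical illustration of the branch order only; the real code assigns
    torch dtypes on the WatermarkRemover instance.
    """
    # cpu/mps are checked first, so they win over the profile.
    if device in ("cpu", "mps"):
        return "float32"
    # Only accelerators reach this point: Qwen-Image gets bf16.
    if model_profile == "qwen":
        return "bfloat16"
    # Everything else keeps the existing fp16 default.
    return "float16"
```

Keeping the decision in a helper like this would make the bf16 rule unit-testable in the torch-free CI environment, in the same spirit as `_build_qwen_kwargs`.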
@@ -355,6 +392,7 @@ class WatermarkRemover:
         self._pipeline: AutoImg2ImgPipeline | None = None
         self._controlnet_pipeline: Any = None
+        self._qwen_pipeline: Any = None
         self._progress_callback = progress_callback
         self.hf_token: str | None = hf_token or os.environ.get("HF_TOKEN")
@@ -369,7 +407,9 @@ class WatermarkRemover:
     def preload(self) -> None:
         """Eagerly load the pipeline so download progress bars are visible."""
-        if self.model_profile == "controlnet":
+        if self.model_profile == "qwen":
+            self._load_qwen_pipeline()
+        elif self.model_profile == "controlnet":
             self._load_controlnet_pipeline()
         else:
             self._load_pipeline()
@@ -420,19 +460,27 @@ class WatermarkRemover:
         return pipeline

+    def _base_load_kwargs(self) -> dict[str, Any]:
+        """The ``from_pretrained`` kwargs shared by all three loaders (dtype + token).
+
+        Each loader adds its own extras (SDXL safety_checker + fp16 VAE, the ControlNet
+        model, etc.). Centralizing the dtype/token pair avoids the drift trap of three
+        copies (a token forgotten on one loader silently breaks gated downloads there).
+        """
+        load_kwargs: dict[str, Any] = {"torch_dtype": self.torch_dtype}
+        if self.hf_token:
+            load_kwargs["token"] = self.hf_token
+        return load_kwargs
+
     def _load_pipeline(self) -> AutoImg2ImgPipeline:
         """Load the plain SDXL img2img pipeline lazily."""
         if self._pipeline is None:
             logger.info("Loading model %s on %s...", self.model_id, self.device)
             self._set_progress(f"Loading model weights: {self.model_id}")

-            load_kwargs: dict[str, Any] = {
-                "torch_dtype": self.torch_dtype,
-                "safety_checker": None,
-                "requires_safety_checker": False,
-            }
-            if self.hf_token:
-                load_kwargs["token"] = self.hf_token
+            load_kwargs = self._base_load_kwargs()
+            load_kwargs["safety_checker"] = None
+            load_kwargs["requires_safety_checker"] = False
             self._maybe_add_fp16_vae(load_kwargs)

             pipeline = AutoImg2ImgPipeline.from_pretrained(self.model_id, **load_kwargs)  # type: ignore
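The refactor in this hunk (one shared dtype/token base, with per-loader extras layered on top) can be sketched without torch or diffusers. The function names below are hypothetical stand-ins for the methods in the diff, and the dtype is passed as a plain string purely for illustration.

```python
from typing import Any, Optional


def base_load_kwargs(dtype: str, token: Optional[str]) -> dict[str, Any]:
    """Shared part: dtype always, token only when one is configured."""
    kwargs: dict[str, Any] = {"torch_dtype": dtype}
    if token:
        kwargs["token"] = token
    return kwargs


def sdxl_load_kwargs(dtype: str, token: Optional[str]) -> dict[str, Any]:
    """Plain SDXL loader: shared base plus the safety-checker switches."""
    kwargs = base_load_kwargs(dtype, token)
    kwargs["safety_checker"] = None
    kwargs["requires_safety_checker"] = False
    return kwargs


def controlnet_load_kwargs(dtype: str, token: Optional[str], controlnet: Any) -> dict[str, Any]:
    """ControlNet loader: shared base plus the already-loaded ControlNet model."""
    kwargs = base_load_kwargs(dtype, token)
    kwargs["controlnet"] = controlnet
    return kwargs
```

Because the base function builds a fresh dict on every call, each loader can add its extras without leaking them into the others.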
@@ -458,9 +506,8 @@ class WatermarkRemover:
             self._set_progress(f"Loading ControlNet: {CONTROLNET_CANNY_MODEL}")
             controlnet = ControlNetModel.from_pretrained(CONTROLNET_CANNY_MODEL, torch_dtype=self.torch_dtype)

-            load_kwargs: dict[str, Any] = {"controlnet": controlnet, "torch_dtype": self.torch_dtype}
-            if self.hf_token:
-                load_kwargs["token"] = self.hf_token
+            load_kwargs = self._base_load_kwargs()
+            load_kwargs["controlnet"] = controlnet
             self._maybe_add_fp16_vae(load_kwargs)

             self._set_progress(f"Loading model weights: {self.model_id}")
@@ -474,6 +521,37 @@ class WatermarkRemover:
         return self._controlnet_pipeline

+    def _load_qwen_pipeline(self) -> Any:
+        """Load the Qwen-Image img2img pipeline lazily.
+
+        Qwen-Image is its OWN base model (not an SDXL add-on), so it loads
+        ``QWEN_MODEL_ID`` unless the caller passed a custom ``--model``. Needs a
+        diffusers build that ships ``QwenImageImg2ImgPipeline``; raises a clear error
+        otherwise. CUDA/cloud-class (the 20B MMDiT does not fit MPS).
+        """
+        if self._qwen_pipeline is None:
+            try:
+                from diffusers import QwenImageImg2ImgPipeline
+            except ImportError as exc:
+                raise ImportError(
+                    "The 'qwen' pipeline needs a diffusers version that ships "
+                    "QwenImageImg2ImgPipeline. Upgrade: pip install -U diffusers"
+                ) from exc
+
+            # Use the Qwen base unless the user explicitly overrode --model.
+            model = self.model_id if self.model_id != self.DEFAULT_MODEL_ID else QWEN_MODEL_ID
+            logger.info("Loading Qwen-Image (%s) on %s...", model, self.device)
+            self._set_progress(f"Loading model weights: {model}")
+            pipeline = QwenImageImg2ImgPipeline.from_pretrained(model, **self._base_load_kwargs())
+            pipeline = self._move_to_device_and_optimize(pipeline)
+            with contextlib.suppress(Exception):
+                pipeline.set_progress_bar_config(disable=True)
+            logger.info("Qwen-Image model loaded successfully")
+            self._qwen_pipeline = pipeline
+        return self._qwen_pipeline
+
     # ── Core removal ─────────────────────────────────────────────────

     def remove_watermark(
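The model-selection rule in `_load_qwen_pipeline` (use the Qwen base unless `--model` was overridden) is a one-line comparison against the default id. A minimal sketch follows. `resolve_qwen_model` is a hypothetical helper, and the `DEFAULT_MODEL_ID` value here is a placeholder because the real default is not shown in this diff; the `QWEN_MODEL_ID` value is the one asserted in the commit's tests.

```python
# Placeholder value: the project's real default model id is not shown in this diff.
DEFAULT_MODEL_ID = "example/sdxl-default"
# Value asserted by test_qwen_model_id_is_qwen_image in this commit.
QWEN_MODEL_ID = "Qwen/Qwen-Image"


def resolve_qwen_model(model_id: str) -> str:
    """Pick the checkpoint for the qwen profile.

    A model id equal to the default is treated as "not overridden" and is
    swapped for the Qwen base; any other id is used as given.
    """
    return model_id if model_id != DEFAULT_MODEL_ID else QWEN_MODEL_ID
```

One consequence of comparing against the default value: a user who explicitly passes the default id via `--model` is indistinguishable from one who passed nothing, so both end up on the Qwen base. That seems the desirable outcome here, since an SDXL checkpoint would presumably not load into a Qwen pipeline class.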
@@ -552,6 +630,8 @@ class WatermarkRemover:
         _total_start = time.monotonic()

         def _generate_one(img: Image.Image) -> Image.Image:
+            if self.model_profile == "qwen":
+                return self._run_qwen(img, strength, num_inference_steps, guidance_scale, generator)
             if self.model_profile == "controlnet":
                 return self._run_controlnet(img, strength, num_inference_steps, guidance_scale, generator)
             return self._run_img2img(img, strength, num_inference_steps, guidance_scale, generator)
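The dispatch in `_generate_one` is an if-chain with plain SDXL img2img as the fallback. For comparison, here is a table-driven formulation that behaves the same way for the three profiles. This is an alternative sketch, not the commit's code: `pick_runner` and `RUNNERS` are hypothetical, and the lambdas stand in for the real `_run_*` methods.

```python
from typing import Callable, Dict

Runner = Callable[[str], str]

# Stand-ins for the real runner methods; each just tags its input.
RUNNERS: Dict[str, Runner] = {
    "qwen": lambda img: f"qwen:{img}",
    "controlnet": lambda img: f"controlnet:{img}",
    "sdxl": lambda img: f"sdxl:{img}",
}


def pick_runner(profile: str, runners: Dict[str, Runner]) -> Runner:
    """Return the runner for a profile, falling back to plain SDXL img2img."""
    return runners.get(profile, runners["sdxl"])
```

With only three profiles the if-chain in the diff is arguably just as readable; a table mainly pays off if `preload` and `_generate_one` should share a single source of truth for the profile list.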
@@ -725,6 +805,30 @@ class WatermarkRemover:
self._controlnet_pipeline = None
return self._load_controlnet_pipeline()
+    # ── Qwen runner ──────────────────────────────────────────────────
+
+    def _run_qwen(
+        self,
+        init_image: Image.Image,
+        strength: float,
+        num_inference_steps: int,
+        guidance_scale: float,
+        generator: Any,
+    ) -> Image.Image:
+        """Run the Qwen-Image img2img pass.
+
+        Removal comes from the img2img ``strength`` (same lever as the SDXL paths);
+        Qwen-Image preserves text/structure markedly better at the scrub floor. The
+        CLI ``guidance_scale`` maps to Qwen's ``true_cfg_scale`` (~4.0 is typical;
+        the SDXL default of 7.5 is high for Qwen). No MPS->CPU fallback: the 20B MMDiT
+        is CUDA/cloud-class and does not run on MPS, so an error here propagates.
+        """
+        pipeline = self._load_qwen_pipeline()
+        self._set_progress(f"Running Qwen-Image img2img (strength={strength}, true_cfg={guidance_scale})...")
+        kwargs = _build_qwen_kwargs(init_image, strength, num_inference_steps, guidance_scale, generator)
+        result = pipeline(**kwargs)
+        return result.images[0]
+
     # ── Batch ────────────────────────────────────────────────────────

     def remove_watermark_batch(
@@ -115,6 +115,7 @@ class TestModelProfiles:
     def test_canonical_profiles_unchanged(self):
         assert normalize_profile("sdxl") == "sdxl"
         assert normalize_profile("controlnet") == "controlnet"
+        assert normalize_profile("qwen") == "qwen"

     def test_default_alias_resolves_to_sdxl(self):
         # "default" is the legacy alias for "sdxl" (back-compat for existing scripts).
@@ -125,6 +126,35 @@ class TestModelProfiles:
         assert normalize_profile("CONTROLNET") == "controlnet"


+class TestQwenKwargs:
+    """_build_qwen_kwargs is pure (no torch); guards the Qwen-Image call shape.
+
+    watermark_remover imports torch under a try/except, so the module (and this pure
+    helper) imports fine in the core+dev CI env where torch is absent.
+    """
+
+    def test_uses_true_cfg_not_guidance_scale(self):
+        from remove_ai_watermarks.noai.watermark_remover import _build_qwen_kwargs
+
+        gen = object()
+        kwargs = _build_qwen_kwargs("IMG", strength=0.3, num_inference_steps=40, true_cfg_scale=4.0, generator=gen)
+        # Qwen uses true_cfg_scale, NOT SDXL's guidance_scale.
+        assert kwargs["true_cfg_scale"] == 4.0
+        assert "guidance_scale" not in kwargs
+        # The scrub still comes from strength; image + generator pass through.
+        assert kwargs["strength"] == 0.3
+        assert kwargs["image"] == "IMG"
+        assert kwargs["generator"] is gen
+        # Faithful-regeneration prompt + an explicit negative prompt.
+        assert kwargs["prompt"]
+        assert kwargs["negative_prompt"]
+
+    def test_qwen_model_id_is_qwen_image(self):
+        from remove_ai_watermarks.noai.watermark_profiles import QWEN_MODEL_ID
+
+        assert QWEN_MODEL_ID == "Qwen/Qwen-Image"
+
+
 class TestResolveStrength:
     """resolve_strength applies the vendor default only when strength is unset."""