Mirror of https://github.com/wiltodelta/remove-ai-watermarks.git (synced 2026-07-05 07:57:50 +02:00)
feat(invisible): add Qwen-Image img2img pipeline (--pipeline qwen)
A third diffusion pipeline alongside sdxl/controlnet: Qwen-Image (20B MMDiT, Apache-2.0 code AND weights) img2img. The scrub still comes from the img2img strength; Qwen preserves text (incl. CJK) and structure markedly better than SDXL at the scrub floor, so it over-regenerates real photos far less (directly targets the controlnet over-regeneration that degrades real uploads).

- watermark_profiles: QWEN_MODEL_ID, normalize_profile accepts "qwen".
- WatermarkRemover: _load_qwen_pipeline (bf16, loads Qwen base unless --model overridden, clear ImportError if diffusers lacks the class), _run_qwen (no MPS fallback -- 20B is CUDA/cloud-class), dispatch in _generate_one/preload, pure _build_qwen_kwargs (true_cfg_scale, not guidance_scale).
- Shared _base_load_kwargs() across all three loaders (dtype + token).
- CLI --pipeline gains "qwen"; invisible_engine threads it through.
- scripts/qwen_scrub_prototype.py: standalone PEP 723 GPU experiment.

Prototype oracle floors (Modal A100-80GB, single seed, controls SynthID-positive, PENDING seed-repeat cert): OpenAI clears at strength ~0.10, Gemini at ~0.30 (0.20 still detected), with CJK text + faces faithful where controlnet plasticizes. The Gemini floor is higher than the shared default ladder, so pass an explicit --strength for Gemini on this pipeline until a Qwen-specific ladder is certified.

The model-running path is CUDA-only (untestable locally); unit tests cover the pure call-shape (_build_qwen_kwargs) and profile normalization without torch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -18,7 +18,7 @@ Consequences for contributors (do not drift back into the stock niche just becau
## How to run
- `uv run remove-ai-watermarks all <image.png> -o <output.png>` — full pipeline (visible + invisible + metadata). Same diffusion knobs as `invisible` below, plus the visible-pass `--inpaint/--no-inpaint`/`--inpaint-method`. **When the `[gpu]` extra is absent, step 2 (invisible/SynthID) is skipped** — `all` still writes an output (visible mark + metadata stripped) but prints a prominent end-of-run banner ("the invisible (SynthID) watermark was NOT removed") AND exits **non-zero** (1), so a skipped SynthID pass is not mistaken for a clean result (the recurring #14/#47 trap, where the old quiet inline warning was missed). `invisible` already hard-errors without the extra; only `all` continued, hence the loud end-banner. Regression-guarded by `tests/test_cli.py::TestAllCommand::test_all_loud_warning_and_nonzero_exit_when_gpu_missing`. **Test trap:** any `all` test that exercises the full pipeline MUST `patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True)` — CI installs core+dev only (no `[gpu]`), so an unpatched `all` test takes the skip branch and now hits the non-zero exit. This passed locally (gpu present → `is_available()` True) but red-failed every matrix cell on the v0.11.0 commit (`test_all_basic`/`test_all_visible_step_uses_registry` asserted exit 0); both now patch `is_available` True.
- `uv run remove-ai-watermarks invisible <image.png> -o <out.png>` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet` (default `controlnet`), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default).
- `uv run remove-ai-watermarks invisible <image.png> -o <out.png>` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet|qwen` (default `controlnet`), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default).
- `uv run remove-ai-watermarks visible <image.png> -o <out.png>` — known-visible-mark removal, CPU, no GPU. Reverse-alpha based: each mark is removed by inverting its captured alpha map. `--mark auto` (default) picks the strongest detected of the Gemini sparkle, the Doubao "豆包AI生成" text strip, the Jimeng "★ 即梦AI" wordmark, and the Samsung Galaxy AI "✦ Contenuti generati dall'AI" strip (bottom-LEFT, locale-specific — Italian variant calibrated); `--mark gemini` / `--mark doubao` / `--mark jimeng` / `--mark samsung` force one (choices come from the registry). Gemini/Doubao recover pixels exactly with no inpaint at native; **Jimeng and Samsung add an always-on thin residual inpaint over the glyph footprint** (their marks re-rasterize per image, so reverse-alpha alone leaves a faint outline). For arbitrary logos/objects use `erase`. **When `--mark auto` finds no known mark (the common case — ~74% of real uploads carry no registered visible mark), the command does NOT silently re-serve the input as a finished result.** It runs a cheap metadata-only `identify`, prints actionable guidance (if the image carries an invisible/metadata mark, e.g. an OpenAI/Gemini C2PA image, it points to `all`; otherwise it does NOT imply the image is clean -- it warns that an invisible pixel watermark like SynthID cannot be detected once the metadata proxy is gone and routes to both `all` and `erase --region`), writes NO output file, and exits **`EXIT_NO_VISIBLE_MARK` (2)** — distinct from success (0) and a hard error (1) so a wrapping service (raiw.cc) can surface the message instead of treating the unchanged image as done (the production "it didn't work" / score-0 trap). Same handling for an explicit `--mark <name>` that is not detected. Helper `cli._no_visible_mark_exit`; regression-guarded by `tests/test_cli.py::TestVisibleCommand::test_visible_auto_no_mark_exits_two_with_eraser_hint` and `test_visible_auto_no_mark_routes_to_all_when_metadata`. 
`--no-detect` still forces the gemini fallback and proceeds (exit 0).
- `uv run remove-ai-watermarks erase <image.png> --region x,y,w,h -o <out.png>` — universal region eraser (any logo/object, any position). `--backend cv2` (default, no deps) or `--backend lama` (big-LaMa via onnxruntime, extra `lama`); `--region` is repeatable.
- `uv run remove-ai-watermarks identify <image>` — provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector
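
The `visible` exit-code contract described in the list above (0 = cleaned, 1 = hard error, 2 = `EXIT_NO_VISIBLE_MARK`, no output written) can be consumed by a wrapping service roughly as follows. This is a minimal sketch, not code from the package: the names `classify_exit` and `run_visible` and the status strings are hypothetical, and only the three documented exit codes are assumed.

```python
import subprocess

# Exit codes as documented above for the `visible` command.
EXIT_OK = 0
EXIT_ERROR = 1
EXIT_NO_VISIBLE_MARK = 2


def classify_exit(returncode: int) -> str:
    """Map a `visible` exit code to a wrapper-level status (hypothetical labels)."""
    if returncode == EXIT_OK:
        return "cleaned"
    if returncode == EXIT_NO_VISIBLE_MARK:
        # No output file was written; the CLI printed guidance instead.
        return "no_visible_mark"
    return "error"


def run_visible(src: str, dst: str) -> tuple[str, str]:
    """Run the CLI and return (status, text the CLI printed)."""
    proc = subprocess.run(
        ["uv", "run", "remove-ai-watermarks", "visible", src, "-o", dst],
        capture_output=True,
        text=True,
        check=False,  # non-zero exits are expected and classified above
    )
    return classify_exit(proc.returncode), proc.stdout + proc.stderr
```

Keeping the classification in a pure function means the wrapper logic can be tested without invoking the CLI; a service would surface the returned text to the user when the status is `no_visible_mark` rather than re-serving the unchanged input.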
@@ -61,7 +61,7 @@ Compact map. The full per-module detail (design decisions, tuned thresholds, cal
- `region_eraser.py` — universal region eraser (`erase` CLI): cv2 backend default (no deps), optional big-LaMa via onnxruntime (~3.5-4 GB peak RAM, ~5-6 s/call CPU — does not fit a minimal droplet).
- `invisible_watermark.py` — decodes the OPEN DWT-DCT watermarks (SD / SDXL / FLUX) via `imwatermark` (extra `detect`, pulls torch). Fragile two ways: (1) does not survive JPEG re-encode/resize; (2) **carrier-fragile on a broad class of pristine images** -- a clean encode->decode round-trip recovers 48/48 on chatgpt/firefly/random but FAILS (28-39/48, below the `_MATCH_48`=44 gate) on the FLUX fox, doubao, a flat FLUX generation, AND a clean synthetic flat fill with no watermark. The failure does NOT track texture; it goes with a degenerate **all-ones decode that is a CARRIER ARTIFACT, not a watermark** (synthetic clean image reproduces it). So `detect_invisible_watermark` is **positive-only**: trust a hit; a `None` is inconclusive unless a same-carrier positive-control embed first recovers >=44. Verified 2026-06-19; full caveat in `docs/watermarking-landscape.md`.
- `trustmark_detector.py` — Adobe TrustMark open decoder (extra `trustmark`). Do NOT remove the JPEG re-encode false-positive gate — a lone TrustMark hit without it is almost always content noise.
- `noai/watermark_remover.py` — `WatermarkRemover` with two diffusion pipelines selected by the explicit `pipeline` ctor arg, never inferred from `model_id`: `sdxl` (plain SDXL img2img) and `controlnet` (SDXL + canny ControlNet, **the DEFAULT since 2026-06-09**). Removal comes from the img2img `strength`; ControlNet only preserves text/face STRUCTURE — SynthID CAN survive controlnet on photoreal content at low strength. No face-restore extra ships, by validated decision (every restore approach looked MORE AI-generated).
- `noai/watermark_remover.py` — `WatermarkRemover` with three diffusion pipelines selected by the explicit `pipeline` ctor arg, never inferred from `model_id`: `sdxl` (plain SDXL img2img), `controlnet` (SDXL + canny ControlNet, **the DEFAULT since 2026-06-09**), and `qwen` (Qwen-Image 20B MMDiT img2img, Apache-2.0, CUDA/cloud-class — best text/structure preservation at the scrub floor; `_load_qwen_pipeline`/`_run_qwen`, bf16, no MPS fallback; call shape in the pure `_build_qwen_kwargs` using `true_cfg_scale`). Removal comes from the img2img `strength`; ControlNet only preserves text/face STRUCTURE — SynthID CAN survive controlnet on photoreal content at low strength. Qwen prototype oracle floors (single-seed, pending seed-repeat cert): OpenAI ~0.10, Gemini ~0.30 (higher than the controlnet Gemini floor — pass explicit `--strength` for Gemini on `qwen` until certified). No face-restore extra ships, by validated decision (every restore approach looked MORE AI-generated).
- `noai/tiling.py` — sliding-window tiled diffusion for large inputs (CLI `--tile`). `WatermarkRemover.remove_watermark` branches to `run_tiled` when `tile` is set AND the long side exceeds `tile_size`, refactoring the single-pass `_generate` into a per-tile `_generate_one` (the ControlNet edge map is rebuilt per tile inside it). Pure helpers `plan_tiles` (uniform-size tiles, last one flush to the edge) and `feather_weights` (strictly-positive separable taper -> partition-of-unity blend) are unit-tested without the model. New tile-blend tuning goes in those pure helpers; do not inline blend math into the runner.
- `auto_config.py` + the content-detection layer were REMOVED 2026-06-09; `--auto` is a deprecated no-op (controlnet is the default pipeline and the adaptive polish is ON by default and self-gates to a no-op where there is no detail deficit).
- `upscaler.py` — optional Real-ESRGAN pre-diffusion super-resolution for small inputs (extra `esrgan`, spandrel only). Manual opt-in; the default `--upscaler` stays `lanczos` and the engine always falls back to Lanczos on absence/error. ESRGAN can degrade faces and thin text.
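
The tiling bullet above describes two pure helpers: a tile planner (uniform-size tiles, last one flush to the edge) and a strictly positive feather taper that is normalized into a partition-of-unity blend. The idea can be sketched in one dimension as below. This is an illustrative sketch only: the function names, signatures, and the linear ramp are assumptions and are not the actual `plan_tiles` / `feather_weights` in `noai/tiling.py`; the real helpers work on 2D images, where a separable weight would be the outer product of two such 1D tapers.

```python
def plan_tiles_1d(length: int, tile_size: int, overlap: int) -> list[int]:
    """Start offsets of uniform-size tiles covering [0, length)."""
    if length <= tile_size:
        return [0]
    stride = tile_size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than tile_size")
    starts = list(range(0, length - tile_size, stride))
    starts.append(length - tile_size)  # final tile ends exactly at the edge
    return starts


def feather_weights_1d(tile_size: int, overlap: int) -> list[float]:
    """Linear taper over `overlap` samples at each end; never reaches zero."""
    ramp = max(overlap, 1)
    return [min(i + 1, tile_size - i, ramp) / ramp for i in range(tile_size)]


def blend_1d(length: int, tiles: list[tuple[int, list[float]]],
             tile_size: int, overlap: int) -> list[float]:
    """Weighted average of overlapping tiles (partition-of-unity blend)."""
    weights = feather_weights_1d(tile_size, overlap)
    acc = [0.0] * length
    norm = [0.0] * length
    for start, values in tiles:
        for i, value in enumerate(values):
            acc[start + i] += weights[i] * value
            norm[start + i] += weights[i]
    # Weights are strictly positive, so every covered sample has norm > 0.
    return [a / n for a, n in zip(acc, norm)]


starts = plan_tiles_1d(length=2500, tile_size=1024, overlap=128)
tiles = [(s, [1.0] * 1024) for s in starts]
blended = blend_1d(2500, tiles, tile_size=1024, overlap=128)
```

Because the weights never reach zero, dividing by the accumulated weight is always safe, and a constant input blends back to the same constant regardless of how the tiles overlap.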
@@ -33,7 +33,7 @@ It does **not** target watermarks that protect someone else's paid or copyrighte
- **AI metadata stripping** — EXIF, PNG text chunks, C2PA provenance manifests (PNG / JPEG / AVIF / HEIF / JPEG-XL, **MP4 / MOV / M4V / M4A** at the container level, and **WebM / MP3 / WAV / FLAC / OGG** losslessly via ffmpeg), XMP DigitalSourceType
- **"Made with AI" label removal** — removes the AI-disclosure metadata that platforms read to apply automatic labels (useful for clearing a false-positive label from a human-edited photograph)
- **Analog Humanizer** — optional film grain and chromatic aberration post-processing
- **Text and face preservation (default)** — the default pipeline is a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Use `--pipeline sdxl` for plain SDXL img2img (lighter, no extra model download) on inputs without text or faces. Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness). The library does not ship a face-restore extra: every approach evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned) regenerated the face via SDXL and made the output look more AI-generated than the cleaned image. The cleaned controlnet output is the least-AI face state achievable without re-introducing SynthID.
- **Text and face preservation (default)** — the default pipeline is a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Use `--pipeline sdxl` for plain SDXL img2img (lighter, no extra model download) on inputs without text or faces. An experimental `--pipeline qwen` runs Qwen-Image (20B, Apache-2.0) img2img, which preserves text (including CJK) and structure better still at the scrub floor; it is CUDA/cloud-class (does not fit MPS), and its strength floors are not yet certified (pass an explicit `--strength`, especially for Gemini content). Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness). The library does not ship a face-restore extra: every approach evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned) regenerated the face via SDXL and made the output look more AI-generated than the cleaned image. The cleaned controlnet output is the least-AI face state achievable without re-introducing SynthID.
- **Batch processing** — process entire directories
- **Detection** — three-stage NCC watermark detection with confidence scoring
- **Provenance detection (`identify`)** — aggregate C2PA issuer, the C2PA soft-binding forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, embedded SD/ComfyUI params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the China TC260 AIGC label (XMP, PNG chunk, EXIF, or JPEG segment), the HuggingFace `hf-job-id` job marker, the SynthID metadata proxy, the C2PA cloud-manifest reference (Adobe Durable Content Credentials, when the embedded manifest is stripped), the visible marks (Gemini sparkle plus the Doubao "豆包AI生成" / Jimeng "即梦AI" / Samsung Galaxy AI "Contenuti generati dall'AI" text marks), the open SD/SDXL/FLUX invisible watermark, and (with the `trustmark` extra) the open Adobe TrustMark watermark into one origin-platform + watermark-inventory verdict (`--json` for machine output)
@@ -131,3 +131,11 @@ See `docs/synthid.md` §5.5 + `docs/controlnet-removal-pipeline-research.md` (ce
`controlnet_conditioning_scale` (CLI `--controlnet-scale`, default 1.0) is the structure-preservation knob (higher = closer to the original structure); fp32 on cpu/mps, fp16-fixed VAE on cuda/xpu. The `controlnet` profile is threaded explicitly (`WatermarkRemover(pipeline=...)` / `InvisibleEngine(pipeline=...)`), NOT inferred from `model_id`. This productionizes the `scripts/controlnet_sweep.py` prototype; see `docs/controlnet-removal-pipeline-research.md`.
**Forensic-stealth caveat still applies** (arXiv:2605.09203): defeating the SynthID verifier is not forensic invisibility -- a "this image went through a removal pipeline" classifier can still flag the output.
## `qwen` pipeline (experimental, Qwen-Image 20B, uncertified floors)
`--pipeline qwen` runs `QwenImageImg2ImgPipeline` on `Qwen/Qwen-Image` (20B MMDiT, Apache-2.0 code AND weights), as an img2img alternative to the SDXL pipelines. Motivation: the controlnet over-regeneration problem above (it plasticizes real photos / loses fine text at the scrub floor). Qwen-Image renders text natively (incl. CJK) and preserves structure markedly better, so at the strength that removes SynthID it damages real content far less.
The scrub still comes from the img2img `strength` (same lever as SDXL); the call shape lives in the pure `_build_qwen_kwargs` (uses Qwen's `true_cfg_scale`, not SDXL's `guidance_scale` — the CLI `--guidance-scale` maps onto it, and ~4.0 is typical vs the SDXL default 7.5). bf16 on CUDA. It is **CUDA/cloud-class — the 20B does not fit MPS — so `_run_qwen` has NO MPS→CPU fallback** (unlike the SDXL paths). Cost on Modal A100-80GB is ~$0.05-0.10/image vs SDXL.
**Prototype oracle floors (Modal A100-80GB, single seed, 2026-06-19; PENDING seed-repeat cert):** on native-resolution OpenAI and Gemini cert inputs (both controls SynthID-POSITIVE), OpenAI cleared at strength **0.10** and Gemini at **0.30** (0.20 still detected). At those floors CJK text and faces stayed faithful (the zoom comparison showed no controlnet-style plasticization). Two caveats before relying on it: (1) near-floor scrub is SEED-NON-DETERMINISTIC (the general known limitation above), so these single-seed floors are NOT certified; run a seed-repeat sweep before trusting them; (2) `resolve_strength` is shared and pipeline-independent, so the Gemini default (0.15, the certified controlnet floor) UNDER-scrubs Gemini on `qwen` (whose floor is ~0.30), so **pass an explicit `--strength` for Gemini content on `qwen`** until a Qwen-specific ladder is certified. Flat-graphic content was not in the prototype sample.
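
Until a Qwen-specific ladder exists, a caller has to supply `--strength` itself when running `qwen` on Gemini content. One way to make that explicit is sketched below. The helper names and the floor table are hypothetical; the table only restates the uncertified single-seed prototype floors quoted above, and the command line uses only flags documented in this file (`invisible`, `-o`, `--pipeline`, `--strength`).

```python
from typing import Optional

# Single-seed prototype floors quoted above. NOT certified values.
QWEN_PROTOTYPE_FLOORS = {"openai": 0.10, "gemini": 0.30}


def explicit_strength_for_qwen(vendor: str, margin: float = 0.0) -> Optional[float]:
    """Return a strength to pass explicitly on `qwen`, or None for unknown vendors."""
    floor = QWEN_PROTOTYPE_FLOORS.get(vendor.lower())
    if floor is None:
        return None
    return round(min(floor + margin, 1.0), 2)


def build_invisible_argv(src: str, dst: str, vendor: str) -> list[str]:
    """Assemble an `invisible` command line that never relies on the shared default."""
    argv = ["remove-ai-watermarks", "invisible", src, "-o", dst, "--pipeline", "qwen"]
    strength = explicit_strength_for_qwen(vendor)
    if strength is not None:
        argv += ["--strength", str(strength)]
    return argv
```

The optional `margin` is there because near-floor scrub is seed-dependent; how much headroom is appropriate is exactly what the pending seed-repeat sweep would have to establish.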
@@ -177,10 +177,12 @@ Root cause: bad alpha (under-estimated, max ~0.65) + fixed-no-inpaint + tight bo
## `noai/watermark_remover.py`
`noai/watermark_remover.py` — the `WatermarkRemover` class has two diffusion pipelines, selected by the explicit `pipeline` ctor arg (NOT inferred from `model_id` -- both use the same SDXL base, `DEFAULT_MODEL_ID`).
`noai/watermark_remover.py` — the `WatermarkRemover` class has three diffusion pipelines, selected by the explicit `pipeline` ctor arg (NOT inferred from `model_id`). `sdxl`/`controlnet` share the SDXL base (`DEFAULT_MODEL_ID`); `qwen` is its own base (`QWEN_MODEL_ID`).
**`sdxl`** (renamed from `default` 2026-06-09; `default` kept as a back-compat alias via `normalize_profile`) runs plain SDXL img2img (`_run_img2img`); it is the lighter opt-down alternative (no ControlNet weights).
**`qwen`** (`_run_qwen`, `_load_qwen_pipeline`) runs `QwenImageImg2ImgPipeline` on `Qwen/Qwen-Image` (20B MMDiT, Apache-2.0 code AND weights). The scrub still comes from the img2img `strength`; Qwen's value is that it preserves text (incl. CJK) and structure markedly better than SDXL at the scrub floor, so it over-regenerates real photos far less (directly targets the controlnet over-regeneration problem). Specifics: bf16 on CUDA (fp16 risks overflow on the 20B MMDiT — see the dtype branch in `__init__`); loads `QWEN_MODEL_ID` unless `--model` is overridden; the call shape lives in the pure module helper `_build_qwen_kwargs` (unit-tested without torch in `tests/test_platform.py::TestQwenKwargs`), which uses Qwen's `true_cfg_scale` (NOT SDXL's `guidance_scale` — the CLI `--guidance-scale` maps onto it; ~4.0 is typical, the SDXL default 7.5 is high for Qwen) and an explicit `negative_prompt` (`_QWEN_PROMPT`/`_QWEN_NEGATIVE`). It is CUDA/cloud-class (the 20B does not fit MPS), so `_run_qwen` has NO MPS->CPU fallback — an error propagates. `_load_qwen_pipeline` raises a clear ImportError if the installed diffusers lacks `QwenImageImg2ImgPipeline`. **Prototype oracle floors (Modal A100-80GB, single seed, 2026-06-19, PENDING seed-repeat cert): OpenAI clears at strength ~0.10, Gemini at ~0.30 (0.20 still detected) — both controls were SynthID-positive; at those floors CJK text + faces stay faithful where controlnet plasticizes. The Gemini floor (0.30) is HIGHER than the certified controlnet Gemini floor (0.15), and `resolve_strength` is shared/pipeline-independent, so pass an explicit `--strength` for Gemini content on `qwen` until a Qwen-specific ladder is certified.**
**`controlnet`** (**the DEFAULT pipeline since 2026-06-09** for `invisible`/`all`/`batch` and both engine ctors; `_run_controlnet`, `_load_controlnet_pipeline`) runs `StableDiffusionXLControlNetImg2ImgPipeline` with the SDXL-native canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` (`watermark_profiles.CONTROLNET_CANNY_MODEL`): the control image is `cv2.Canny(gray, 100, 200)` stacked to 3 channels (`_CANNY_LOW`/`_CANNY_HIGH`, prompt `_CONTROLNET_PROMPT` / `_CONTROLNET_NEGATIVE`).
**Removal comes from the img2img regeneration (`strength`); the ControlNet only PRESERVES text and face STRUCTURE via the edge map.**
@@ -0,0 +1,128 @@
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "diffusers>=0.35.0",
#     "transformers>=4.51.0",
#     "torch",
#     "accelerate",
#     "pillow",
#     "click",
# ]
# ///
"""Isolated GPU prototype: does a low-strength Qwen-Image img2img pass scrub the
invisible watermark while keeping text/structure legible?

This is the oracle-gated experiment behind Library roadmap P1#5 (migrate the
invisible pipeline onto Qwen-Image-Edit). It is DELIBERATELY standalone:

* It is NOT imported by the package and NOT in ``uv.lock``. Qwen-Image needs a
  newer ``diffusers``/``transformers`` (Qwen2.5-VL text encoder) than the SDXL
  pipeline is pinned to, so wiring it into the locked env would risk the
  certified SDXL/ControlNet pipeline (the ``cannot import Qwen3VL...`` trap).
  PEP 723 inline metadata lets ``uv run`` build a throwaway env for it instead.
* Qwen-Image is ~20B, so it needs a real GPU (CUDA) -- it will not fit on MPS.

Run (on a GPU box / Modal), then eyeball the outputs AND submit them to the
matching oracle (openai.com/verify for OpenAI, the Gemini app for Google):

    uv run scripts/qwen_scrub_prototype.py INPUT.png -o out/ --strengths 0.1,0.2,0.3,0.4

What to look for:
* SCRUB: the oracle no longer reports the watermark at some strength.
* FIDELITY: text stays legible and faces/structure stay faithful at that same
  strength -- the whole point of trying Qwen over SDXL (which garbles text).
The smallest strength that clears the oracle while keeping fidelity is the result
to compare against the SDXL/ControlNet floors (OpenAI 0.10 / Google 0.15).
"""

from __future__ import annotations

import logging
from pathlib import Path

import click

log = logging.getLogger("qwen_proto")

# A neutral, faithful-regeneration prompt (we want to scrub, not restyle); mirrors
# the intent of the SDXL controlnet prompt. Qwen renders text natively, so a light
# pass should keep captions legible where SDXL would garble them.
_PROMPT = "high quality, sharp, detailed, faithful to the original"
_NEGATIVE = "blurry, lowres, distorted text, garbled text, artifacts"


def _pick_device(requested: str) -> tuple[str, object]:
    import torch

    if requested != "auto":
        device = requested
    elif torch.cuda.is_available():
        device = "cuda"
    elif getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
        device = "mps"
    else:
        device = "cpu"
    # bf16 on CUDA (Qwen's reference dtype); fp32 elsewhere for numerical safety.
    dtype = torch.bfloat16 if device == "cuda" else torch.float32
    return device, dtype


@click.command()
@click.argument("source", type=click.Path(exists=True, path_type=Path))
@click.option("-o", "--output-dir", type=click.Path(path_type=Path), default=Path("qwen_out"))
@click.option("--strengths", default="0.1,0.2,0.3,0.4", help="Comma-separated img2img strengths to sweep.")
@click.option("--steps", type=int, default=40, help="Inference steps.")
@click.option("--cfg", type=float, default=4.0, help="true_cfg_scale (Qwen's CFG; reference default 4.0).")
@click.option("--model", default="Qwen/Qwen-Image", help="HF model id (Qwen-Image img2img base).")
@click.option("--device", default="auto", type=click.Choice(["auto", "cuda", "mps", "cpu"]))
@click.option("--seed", type=int, default=0, help="Reproducible seed.")
def main(
    source: Path,
    output_dir: Path,
    strengths: str,
    steps: int,
    cfg: float,
    model: str,
    device: str,
    seed: int,
) -> None:
    """Sweep Qwen-Image img2img strength over SOURCE and save one output per strength."""
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    import torch
    from diffusers import QwenImageImg2ImgPipeline
    from PIL import Image

    dev, dtype = _pick_device(device)
    log.info("Loading %s on %s (%s)...", model, dev, dtype)
    pipe = QwenImageImg2ImgPipeline.from_pretrained(model, torch_dtype=dtype)
    pipe = pipe.to(dev)

    init_image = Image.open(source).convert("RGB")
    output_dir.mkdir(parents=True, exist_ok=True)
    values = [float(s) for s in strengths.split(",") if s.strip()]

    for strength in values:
        generator = torch.Generator(device="cpu").manual_seed(seed)
        log.info("Generating strength=%.2f ...", strength)
        result = pipe(
            prompt=_PROMPT,
            negative_prompt=_NEGATIVE,
            image=init_image,
            strength=strength,
            num_inference_steps=steps,
            true_cfg_scale=cfg,
            generator=generator,
        )
        out_path = output_dir / f"{source.stem}_qwen_s{strength:.2f}.png"
        result.images[0].save(out_path)
        log.info("  saved %s", out_path)

    log.info(
        "\nDone. Eyeball text/face fidelity, then submit each output to the matching oracle "
        "(openai.com/verify / Gemini app). The smallest strength that clears the oracle while "
        "keeping fidelity is the number to compare against the SDXL floors (OpenAI 0.10 / Google 0.15)."
    )


if __name__ == "__main__":
    main()
@@ -253,15 +253,16 @@ def _normalize_pipeline(ctx: click.Context, param: click.Parameter, value: str |
     return normalized


-# ``controlnet`` (the default-SELECTED value) and ``sdxl`` (plain SDXL img2img) are the
-# two current profiles; ``default`` is an OUTDATED back-compat alias for ``sdxl``
-# (warned + normalized away by _normalize_pipeline).
-_PIPELINE_CHOICES = ["sdxl", "controlnet", "default"]
+# ``controlnet`` (the default-SELECTED value), ``sdxl`` (plain SDXL img2img) and
+# ``qwen`` (Qwen-Image, CUDA/cloud-class) are the current profiles; ``default`` is an
+# OUTDATED back-compat alias for ``sdxl`` (warned + normalized away by _normalize_pipeline).
+_PIPELINE_CHOICES = ["sdxl", "controlnet", "qwen", "default"]
 _PIPELINE_HELP = (
     "Pipeline profile. controlnet (DEFAULT) = SDXL + canny ControlNet that preserves "
     "text/faces via edge conditioning while removing SynthID; sdxl = plain SDXL img2img "
-    "(lighter, no extra model download, but leaves SynthID on flat-graphic content). "
-    "('default' is an OUTDATED alias for 'sdxl' -- use sdxl or controlnet.)"
+    "(lighter, no extra model download, but leaves SynthID on flat-graphic content); "
+    "qwen = Qwen-Image (20B, Apache-2.0) img2img, best text/structure preservation but "
+    "CUDA/cloud-class (does not fit MPS). ('default' is an OUTDATED alias for 'sdxl'.)"
 )

 # Shared --pipeline / --strength decorators so the three diffusion commands
@@ -103,8 +103,9 @@ class InvisibleEngine:
             device: Device for inference (auto/cpu/mps/cuda/xpu). None = auto.
             pipeline: Pipeline profile. "controlnet" (DEFAULT; SDXL + canny ControlNet
                 that preserves text/face structure via edge conditioning while removing
-                SynthID) or "sdxl" (plain SDXL img2img, lighter but leaves SynthID on
-                flat-graphic content). "default" is a back-compat alias for "sdxl".
+                SynthID), "sdxl" (plain SDXL img2img, lighter but leaves SynthID on
+                flat-graphic content), or "qwen" (Qwen-Image 20B img2img, best text/
+                structure preservation but CUDA/cloud-class). "default" aliases "sdxl".
             hf_token: HuggingFace API token.
             progress_callback: Optional callback for progress messages.
             controlnet_conditioning_scale: ControlNet structure-preservation
@@ -12,6 +12,17 @@ if TYPE_CHECKING:
 DEFAULT_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"

+# Qwen-Image (20B MMDiT, Apache-2.0 code AND weights) base for the ``qwen`` pipeline:
+# an img2img alternative to SDXL with native text rendering (incl. CJK). Loaded only
+# when ``--pipeline qwen`` is selected; CUDA/cloud-class (does not fit MPS). Prototype
+# oracle floors (single-seed, 2026-06-19, pending seed-repeat cert): OpenAI clears at
+# strength ~0.10, Google/Gemini at ~0.30 (0.20 still detected) -- the latter is HIGHER
+# than the certified controlnet Google floor (0.15), so pass an explicit ``--strength``
+# for Gemini content on this pipeline until a Qwen-specific ladder is certified.
+# (Dispatch uses the bare "qwen" literal, matching the sdxl/controlnet sites, so there
+# is no QWEN_PROFILE constant -- only the model id is referenced from code.)
+QWEN_MODEL_ID = "Qwen/Qwen-Image"
+
 # Canonical pipeline-profile names + the back-compat alias. The plain SDXL img2img
 # profile is ``sdxl``; ``default`` is kept as an accepted alias (it was the profile's
 # name before ``controlnet`` became the default-selected pipeline, 2026-06-09).
@@ -1,6 +1,14 @@
 """Watermark removal using diffusion model regeneration attack.

-Two pipelines:
+Three pipelines (selected by the explicit ``pipeline`` ctor arg):
+
+0. ``qwen`` -- Qwen-Image (20B MMDiT, Apache-2.0) img2img. The scrub still comes from
+   the img2img ``strength``; Qwen preserves text (incl. CJK) and structure markedly
+   better than SDXL at the scrub floor, so it over-regenerates real photos far less.
+   CUDA/cloud-class (does not fit MPS). See ``watermark_profiles`` for the prototype
+   oracle floors (pending seed-repeat cert).
+
+Two SDXL pipelines:
 1. ``controlnet`` (DEFAULT) -- SDXL img2img with a canny ControlNet. The watermark
    REMOVAL still comes from the img2img regeneration (``strength``); the ControlNet
    only PRESERVES structure (text/faces) by conditioning on the edge map. No original
@@ -36,6 +44,7 @@ from remove_ai_watermarks.noai.watermark_profiles import (
     CONTROLNET_CANNY_MODEL,
     DEFAULT_MODEL_ID,
     DEFAULT_STRENGTH,
+    QWEN_MODEL_ID,
     normalize_profile,
     resolve_strength,
 )
@@ -308,6 +317,29 @@ _CANNY_HIGH = 200
 _CONTROLNET_PROMPT = "best quality, high quality, sharp, detailed, photographic"
 _CONTROLNET_NEGATIVE = "blurry, lowres, deformed, distorted text, garbled text, watermark, jpeg artifacts"

+# Neutral prompts for the Qwen-Image img2img pass (faithful regeneration, not an edit).
+_QWEN_PROMPT = "high quality, sharp, detailed, faithful to the original"
+_QWEN_NEGATIVE = "blurry, lowres, distorted text, garbled text, artifacts"
+
+
+def _build_qwen_kwargs(
+    image: Image.Image, strength: float, num_inference_steps: int, true_cfg_scale: float, generator: Any
+) -> dict[str, Any]:
+    """Build the QwenImageImg2ImgPipeline call kwargs (pure; unit-tested without torch).
+
+    Qwen-Image uses ``true_cfg_scale`` (not SDXL's ``guidance_scale``) and takes an
+    explicit ``negative_prompt``; the scrub still comes from the img2img ``strength``.
+    """
+    return {
+        "prompt": _QWEN_PROMPT,
+        "negative_prompt": _QWEN_NEGATIVE,
+        "image": image,
+        "strength": strength,
+        "num_inference_steps": num_inference_steps,
+        "true_cfg_scale": true_cfg_scale,
+        "generator": generator,
+    }
+

 class WatermarkRemover:
     """Remove watermarks from images using diffusion model regeneration.
@@ -348,6 +380,11 @@ class WatermarkRemover:
         if torch_dtype is None:
             if self.device == "cpu" or self.device == "mps":
                 self.torch_dtype = torch.float32  # type: ignore
+            elif self.model_profile == "qwen":
+                # Qwen-Image is published in bf16; fp16 risks overflow on the 20B MMDiT.
+                # cuda/xpu-only by construction: the cpu/mps guard above already forced
+                # fp32, and the 20B model does not fit MPS anyway.
+                self.torch_dtype = torch.bfloat16  # type: ignore
             else:
                 self.torch_dtype = torch.float16  # type: ignore
         else:
@@ -355,6 +392,7 @@ class WatermarkRemover:

         self._pipeline: AutoImg2ImgPipeline | None = None
         self._controlnet_pipeline: Any = None
+        self._qwen_pipeline: Any = None
         self._progress_callback = progress_callback
         self.hf_token: str | None = hf_token or os.environ.get("HF_TOKEN")

@@ -369,7 +407,9 @@ class WatermarkRemover:

     def preload(self) -> None:
         """Eagerly load the pipeline so download progress bars are visible."""
-        if self.model_profile == "controlnet":
+        if self.model_profile == "qwen":
+            self._load_qwen_pipeline()
+        elif self.model_profile == "controlnet":
             self._load_controlnet_pipeline()
         else:
             self._load_pipeline()
@@ -420,19 +460,27 @@ class WatermarkRemover:

         return pipeline

+    def _base_load_kwargs(self) -> dict[str, Any]:
+        """The ``from_pretrained`` kwargs shared by all three loaders (dtype + token).
+
+        Each loader adds its own extras (SDXL safety_checker + fp16 VAE, the ControlNet
+        model, etc.). Centralizing the dtype/token pair avoids the drift trap of three
+        copies (a token forgotten on one loader silently breaks gated downloads there).
+        """
+        load_kwargs: dict[str, Any] = {"torch_dtype": self.torch_dtype}
+        if self.hf_token:
+            load_kwargs["token"] = self.hf_token
+        return load_kwargs
+
     def _load_pipeline(self) -> AutoImg2ImgPipeline:
         """Load the plain SDXL img2img pipeline lazily."""
         if self._pipeline is None:
             logger.info("Loading model %s on %s...", self.model_id, self.device)
             self._set_progress(f"Loading model weights: {self.model_id}")

-            load_kwargs: dict[str, Any] = {
-                "torch_dtype": self.torch_dtype,
-                "safety_checker": None,
-                "requires_safety_checker": False,
-            }
-            if self.hf_token:
-                load_kwargs["token"] = self.hf_token
+            load_kwargs = self._base_load_kwargs()
+            load_kwargs["safety_checker"] = None
+            load_kwargs["requires_safety_checker"] = False
             self._maybe_add_fp16_vae(load_kwargs)

             pipeline = AutoImg2ImgPipeline.from_pretrained(self.model_id, **load_kwargs)  # type: ignore
@@ -458,9 +506,8 @@ class WatermarkRemover:
             self._set_progress(f"Loading ControlNet: {CONTROLNET_CANNY_MODEL}")
             controlnet = ControlNetModel.from_pretrained(CONTROLNET_CANNY_MODEL, torch_dtype=self.torch_dtype)

-            load_kwargs: dict[str, Any] = {"controlnet": controlnet, "torch_dtype": self.torch_dtype}
-            if self.hf_token:
-                load_kwargs["token"] = self.hf_token
+            load_kwargs = self._base_load_kwargs()
+            load_kwargs["controlnet"] = controlnet
             self._maybe_add_fp16_vae(load_kwargs)

             self._set_progress(f"Loading model weights: {self.model_id}")
@@ -474,6 +521,37 @@ class WatermarkRemover:

         return self._controlnet_pipeline

+    def _load_qwen_pipeline(self) -> Any:
+        """Load the Qwen-Image img2img pipeline lazily.
+
+        Qwen-Image is its OWN base model (not an SDXL add-on), so it loads
+        ``QWEN_MODEL_ID`` unless the caller passed a custom ``--model``. Needs a
+        diffusers build that ships ``QwenImageImg2ImgPipeline``; raises a clear error
+        otherwise. CUDA/cloud-class (the 20B MMDiT does not fit MPS).
+        """
+        if self._qwen_pipeline is None:
+            try:
+                from diffusers import QwenImageImg2ImgPipeline
+            except ImportError as exc:
+                raise ImportError(
+                    "The 'qwen' pipeline needs a diffusers version that ships "
+                    "QwenImageImg2ImgPipeline. Upgrade: pip install -U diffusers"
+                ) from exc
+
+            # Use the Qwen base unless the user explicitly overrode --model.
+            model = self.model_id if self.model_id != self.DEFAULT_MODEL_ID else QWEN_MODEL_ID
+            logger.info("Loading Qwen-Image (%s) on %s...", model, self.device)
+            self._set_progress(f"Loading model weights: {model}")
+            pipeline = QwenImageImg2ImgPipeline.from_pretrained(model, **self._base_load_kwargs())
+            pipeline = self._move_to_device_and_optimize(pipeline)
+            with contextlib.suppress(Exception):
+                pipeline.set_progress_bar_config(disable=True)
+
+            logger.info("Qwen-Image model loaded successfully")
+            self._qwen_pipeline = pipeline
+
+        return self._qwen_pipeline
+
     # ── Core removal ─────────────────────────────────────────────────

     def remove_watermark(
@@ -552,6 +630,8 @@ class WatermarkRemover:
         _total_start = time.monotonic()

         def _generate_one(img: Image.Image) -> Image.Image:
+            if self.model_profile == "qwen":
+                return self._run_qwen(img, strength, num_inference_steps, guidance_scale, generator)
             if self.model_profile == "controlnet":
                 return self._run_controlnet(img, strength, num_inference_steps, guidance_scale, generator)
             return self._run_img2img(img, strength, num_inference_steps, guidance_scale, generator)
@@ -725,6 +805,30 @@ class WatermarkRemover:
         self._controlnet_pipeline = None
         return self._load_controlnet_pipeline()

+    # ── Qwen runner ──────────────────────────────────────────────────
+
+    def _run_qwen(
+        self,
+        init_image: Image.Image,
+        strength: float,
+        num_inference_steps: int,
+        guidance_scale: float,
+        generator: Any,
+    ) -> Image.Image:
+        """Run the Qwen-Image img2img pass.
+
+        Removal comes from the img2img ``strength`` (same lever as the SDXL paths);
+        Qwen-Image preserves text/structure markedly better at the scrub floor. The
+        CLI ``guidance_scale`` maps to Qwen's ``true_cfg_scale`` (~4.0 is typical;
+        the SDXL default of 7.5 is high for Qwen). No MPS->CPU fallback: the 20B MMDiT
+        is CUDA/cloud-class and does not run on MPS, so an error here propagates.
+        """
+        pipeline = self._load_qwen_pipeline()
+        self._set_progress(f"Running Qwen-Image img2img (strength={strength}, true_cfg={guidance_scale})...")
+        kwargs = _build_qwen_kwargs(init_image, strength, num_inference_steps, guidance_scale, generator)
+        result = pipeline(**kwargs)
+        return result.images[0]
+
     # ── Batch ────────────────────────────────────────────────────────

     def remove_watermark_batch(

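The two selection rules this diff adds to `WatermarkRemover` (dtype by device and profile, and the Qwen base unless `--model` was overridden) can be sketched as pure functions. This is an illustration under stated assumptions, not the repository's code: `pick_dtype_name`, `pick_qwen_model`, and the placeholder `EXAMPLE_DEFAULT_MODEL_ID` are hypothetical, and dtype names are plain strings so the sketch runs without torch.

```python
QWEN_MODEL_ID = "Qwen/Qwen-Image"  # value taken from the diff above

# Hypothetical placeholder: the real DEFAULT_MODEL_ID value is not shown in this diff.
EXAMPLE_DEFAULT_MODEL_ID = "example/sdxl-base"


def pick_dtype_name(device: str, profile: str) -> str:
    """Mirror the dtype branch order: cpu/mps first, then qwen, then the fp16 default."""
    if device in ("cpu", "mps"):
        return "float32"
    if profile == "qwen":
        return "bfloat16"  # the diff notes fp16 risks overflow on the 20B MMDiT
    return "float16"


def pick_qwen_model(model_id: str, default_model_id: str = EXAMPLE_DEFAULT_MODEL_ID) -> str:
    """Use the Qwen base unless the caller passed a non-default model id."""
    return model_id if model_id != default_model_id else QWEN_MODEL_ID


if __name__ == "__main__":
    print(pick_dtype_name("cuda", "qwen"))
    print(pick_qwen_model(EXAMPLE_DEFAULT_MODEL_ID))
```

Because the cpu/mps check comes first, a `qwen` profile on those devices resolves to fp32 rather than bf16, which matches the comment in the dtype hunk.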
@@ -115,6 +115,7 @@ class TestModelProfiles:
     def test_canonical_profiles_unchanged(self):
         assert normalize_profile("sdxl") == "sdxl"
         assert normalize_profile("controlnet") == "controlnet"
+        assert normalize_profile("qwen") == "qwen"

     def test_default_alias_resolves_to_sdxl(self):
         # "default" is the legacy alias for "sdxl" (back-compat for existing scripts).
@@ -125,6 +126,35 @@ class TestModelProfiles:
         assert normalize_profile("CONTROLNET") == "controlnet"


+class TestQwenKwargs:
+    """_build_qwen_kwargs is pure (no torch); guards the Qwen-Image call shape.
+
+    watermark_remover imports torch under a try/except, so the module (and this pure
+    helper) imports fine in the core+dev CI env where torch is absent.
+    """
+
+    def test_uses_true_cfg_not_guidance_scale(self):
+        from remove_ai_watermarks.noai.watermark_remover import _build_qwen_kwargs
+
+        gen = object()
+        kwargs = _build_qwen_kwargs("IMG", strength=0.3, num_inference_steps=40, true_cfg_scale=4.0, generator=gen)
+        # Qwen uses true_cfg_scale, NOT SDXL's guidance_scale.
+        assert kwargs["true_cfg_scale"] == 4.0
+        assert "guidance_scale" not in kwargs
+        # The scrub still comes from strength; image + generator pass through.
+        assert kwargs["strength"] == 0.3
+        assert kwargs["image"] == "IMG"
+        assert kwargs["generator"] is gen
+        # Faithful-regeneration prompt + an explicit negative prompt.
+        assert kwargs["prompt"]
+        assert kwargs["negative_prompt"]
+
+    def test_qwen_model_id_is_qwen_image(self):
+        from remove_ai_watermarks.noai.watermark_profiles import QWEN_MODEL_ID
+
+        assert QWEN_MODEL_ID == "Qwen/Qwen-Image"
+
+
 class TestResolveStrength:
     """resolve_strength applies the vendor default only when strength is unset."""

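The `TestQwenKwargs` docstring relies on the module importing torch under a try/except so that pure helpers stay importable in a torch-free environment. A minimal sketch of that pattern follows. It is an assumption about the general shape, not the repository's module: `build_call_kwargs` and `require_torch` are hypothetical names.

```python
from typing import Any

try:  # optional heavy dependency; may be absent in a core-only environment
    import torch
except ImportError:
    torch = None  # the module still imports; only model-running paths need torch


def build_call_kwargs(image: Any, strength: float, steps: int, true_cfg_scale: float) -> dict[str, Any]:
    """Pure helper: touches no torch API, so it can be unit-tested without torch."""
    return {
        "image": image,
        "strength": strength,
        "num_inference_steps": steps,
        "true_cfg_scale": true_cfg_scale,  # Qwen's knob, in place of guidance_scale
    }


def require_torch() -> Any:
    """Fail only at the point where torch is really needed (hypothetical guard)."""
    if torch is None:
        raise ImportError("torch is required to run the model path")
    return torch


if __name__ == "__main__":
    kwargs = build_call_kwargs("IMG", 0.3, 40, 4.0)
    print(sorted(kwargs))
```

Keeping the call-shape builder free of torch is what lets the call shape be checked in CI even though the model-running path itself is CUDA-only.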