diff --git a/CLAUDE.md b/CLAUDE.md index 89c038b..18ae668 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -18,7 +18,7 @@ Consequences for contributors (do not drift back into the stock niche just becau ## How to run - `uv run remove-ai-watermarks all -o ` — full pipeline (visible + invisible + metadata). Same diffusion knobs as `invisible` below, plus the visible-pass `--inpaint/--no-inpaint`/`--inpaint-method`. **When the `[gpu]` extra is absent, step 2 (invisible/SynthID) is skipped** — `all` still writes an output (visible mark + metadata stripped) but prints a prominent end-of-run banner ("the invisible (SynthID) watermark was NOT removed") AND exits **non-zero** (1), so a skipped SynthID pass is not mistaken for a clean result (the recurring #14/#47 trap, where the old quiet inline warning was missed). `invisible` already hard-errors without the extra; only `all` continued, hence the loud end-banner. Regression-guarded by `tests/test_cli.py::TestAllCommand::test_all_loud_warning_and_nonzero_exit_when_gpu_missing`. **Test trap:** any `all` test that exercises the full pipeline MUST `patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True)` — CI installs core+dev only (no `[gpu]`), so an unpatched `all` test takes the skip branch and now hits the non-zero exit. This passed locally (gpu present → `is_available()` True) but red-failed every matrix cell on the v0.11.0 commit (`test_all_basic`/`test_all_visible_step_uses_registry` asserted exit 0); both now patch `is_available` True. -- `uv run remove-ai-watermarks invisible -o ` — diffusion SynthID removal. 
**Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet` (default `controlnet`), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default). +- `uv run remove-ai-watermarks invisible -o ` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet|qwen` (default `controlnet`), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. 
Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default). - `uv run remove-ai-watermarks visible -o ` — known-visible-mark removal, CPU, no GPU. Reverse-alpha based: each mark is removed by inverting its captured alpha map. `--mark auto` (default) picks the strongest detected of the Gemini sparkle, the Doubao "豆包AI生成" text strip, the Jimeng "★ 即梦AI" wordmark, and the Samsung Galaxy AI "✦ Contenuti generati dall'AI" strip (bottom-LEFT, locale-specific — Italian variant calibrated); `--mark gemini` / `--mark doubao` / `--mark jimeng` / `--mark samsung` force one (choices come from the registry). Gemini/Doubao recover pixels exactly with no inpaint at native; **Jimeng and Samsung add an always-on thin residual inpaint over the glyph footprint** (their marks re-rasterize per image, so reverse-alpha alone leaves a faint outline). For arbitrary logos/objects use `erase`. **When `--mark auto` finds no known mark (the common case — ~74% of real uploads carry no registered visible mark), the command does NOT silently re-serve the input as a finished result.** It runs a cheap metadata-only `identify`, prints actionable guidance (if the image carries an invisible/metadata mark, e.g. an OpenAI/Gemini C2PA image, it points to `all`; otherwise it does NOT imply the image is clean -- it warns that an invisible pixel watermark like SynthID cannot be detected once the metadata proxy is gone and routes to both `all` and `erase --region`), writes NO output file, and exits **`EXIT_NO_VISIBLE_MARK` (2)** — distinct from success (0) and a hard error (1) so a wrapping service (raiw.cc) can surface the message instead of treating the unchanged image as done (the production "it didn't work" / score-0 trap). Same handling for an explicit `--mark ` that is not detected. 
Helper `cli._no_visible_mark_exit`; regression-guarded by `tests/test_cli.py::TestVisibleCommand::test_visible_auto_no_mark_exits_two_with_eraser_hint` and `test_visible_auto_no_mark_routes_to_all_when_metadata`. `--no-detect` still forces the gemini fallback and proceeds (exit 0). - `uv run remove-ai-watermarks erase --region x,y,w,h -o ` — universal region eraser (any logo/object, any position). `--backend cv2` (default, no deps) or `--backend lama` (big-LaMa via onnxruntime, extra `lama`); `--region` is repeatable. - `uv run remove-ai-watermarks identify ` — provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector @@ -61,7 +61,7 @@ Compact map. The full per-module detail (design decisions, tuned thresholds, cal - `region_eraser.py` — universal region eraser (`erase` CLI): cv2 backend default (no deps), optional big-LaMa via onnxruntime (~3.5-4 GB peak RAM, ~5-6 s/call CPU — does not fit a minimal droplet). - `invisible_watermark.py` — decodes the OPEN DWT-DCT watermarks (SD / SDXL / FLUX) via `imwatermark` (extra `detect`, pulls torch). Fragile two ways: (1) does not survive JPEG re-encode/resize; (2) **carrier-fragile on a broad class of pristine images** -- a clean encode->decode round-trip recovers 48/48 on chatgpt/firefly/random but FAILS (28-39/48, below the `_MATCH_48`=44 gate) on the FLUX fox, doubao, a flat FLUX generation, AND a clean synthetic flat fill with no watermark. The failure does NOT track texture; it goes with a degenerate **all-ones decode that is a CARRIER ARTIFACT, not a watermark** (synthetic clean image reproduces it). So `detect_invisible_watermark` is **positive-only**: trust a hit; a `None` is inconclusive unless a same-carrier positive-control embed first recovers >=44. Verified 2026-06-19; full caveat in `docs/watermarking-landscape.md`. - `trustmark_detector.py` — Adobe TrustMark open decoder (extra `trustmark`). 
Do NOT remove the JPEG re-encode false-positive gate — a lone TrustMark hit without it is almost always content noise. -- `noai/watermark_remover.py` — `WatermarkRemover` with two diffusion pipelines selected by the explicit `pipeline` ctor arg, never inferred from `model_id`: `sdxl` (plain SDXL img2img) and `controlnet` (SDXL + canny ControlNet, **the DEFAULT since 2026-06-09**). Removal comes from the img2img `strength`; ControlNet only preserves text/face STRUCTURE — SynthID CAN survive controlnet on photoreal content at low strength. No face-restore extra ships, by validated decision (every restore approach looked MORE AI-generated). +- `noai/watermark_remover.py` — `WatermarkRemover` with three diffusion pipelines selected by the explicit `pipeline` ctor arg, never inferred from `model_id`: `sdxl` (plain SDXL img2img), `controlnet` (SDXL + canny ControlNet, **the DEFAULT since 2026-06-09**), and `qwen` (Qwen-Image 20B MMDiT img2img, Apache-2.0, CUDA/cloud-class — best text/structure preservation at the scrub floor; `_load_qwen_pipeline`/`_run_qwen`, bf16, no MPS fallback; call shape in the pure `_build_qwen_kwargs` using `true_cfg_scale`). Removal comes from the img2img `strength`; ControlNet only preserves text/face STRUCTURE — SynthID CAN survive controlnet on photoreal content at low strength. Qwen prototype oracle floors (single-seed, pending seed-repeat cert): OpenAI ~0.10, Gemini ~0.30 (higher than the controlnet Gemini floor — pass explicit `--strength` for Gemini on `qwen` until certified). No face-restore extra ships, by validated decision (every restore approach looked MORE AI-generated). - `noai/tiling.py` — sliding-window tiled diffusion for large inputs (CLI `--tile`). `WatermarkRemover.remove_watermark` branches to `run_tiled` when `tile` is set AND the long side exceeds `tile_size`, refactoring the single-pass `_generate` into a per-tile `_generate_one` (the ControlNet edge map is rebuilt per tile inside it). 
Pure helpers `plan_tiles` (uniform-size tiles, last one flush to the edge) and `feather_weights` (strictly-positive separable taper -> partition-of-unity blend) are unit-tested without the model. New tile-blend tuning goes in those pure helpers; do not inline blend math into the runner. - `auto_config.py` + the content-detection layer were REMOVED 2026-06-09; `--auto` is a deprecated no-op (controlnet is the default pipeline and the adaptive polish is ON by default and self-gates to a no-op where there is no detail deficit). - `upscaler.py` — optional Real-ESRGAN pre-diffusion super-resolution for small inputs (extra `esrgan`, spandrel only). Manual opt-in; the default `--upscaler` stays `lanczos` and the engine always falls back to Lanczos on absence/error. ESRGAN can degrade faces and thin text. diff --git a/README.md b/README.md index a1082d8..ea5ace5 100644 --- a/README.md +++ b/README.md @@ -33,7 +33,7 @@ It does **not** target watermarks that protect someone else's paid or copyrighte - **AI metadata stripping** — EXIF, PNG text chunks, C2PA provenance manifests (PNG / JPEG / AVIF / HEIF / JPEG-XL, **MP4 / MOV / M4V / M4A** at the container level, and **WebM / MP3 / WAV / FLAC / OGG** losslessly via ffmpeg), XMP DigitalSourceType - **"Made with AI" label removal** — removes the AI-disclosure metadata that platforms read to apply automatic labels (useful for clearing a false-positive label from a human-edited photograph) - **Analog Humanizer** — optional film grain and chromatic aberration post-processing -- **Text and face preservation (default)** — the default pipeline is a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Use `--pipeline sdxl` for plain SDXL img2img (lighter, no extra model download) on inputs without text or faces. Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness). 
The library does not ship a face-restore extra: every approach evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned) regenerated the face via SDXL and made the output look more AI-generated than the cleaned image. The cleaned controlnet output is the least-AI face state achievable without re-introducing SynthID. +- **Text and face preservation (default)** — the default pipeline is a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Use `--pipeline sdxl` for plain SDXL img2img (lighter, no extra model download) on inputs without text or faces. An experimental `--pipeline qwen` runs Qwen-Image (20B, Apache-2.0) img2img, which preserves text (including CJK) and structure better still at the scrub floor; it is CUDA/cloud-class (does not fit MPS), and its strength floors are not yet certified (pass an explicit `--strength`, especially for Gemini content). Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness). The library does not ship a face-restore extra: every approach evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned) regenerated the face via SDXL and made the output look more AI-generated than the cleaned image. The cleaned controlnet output is the least-AI face state achievable without re-introducing SynthID. 
- **Batch processing** — process entire directories - **Detection** — three-stage NCC watermark detection with confidence scoring - **Provenance detection (`identify`)** — aggregate C2PA issuer, the C2PA soft-binding forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, embedded SD/ComfyUI params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the China TC260 AIGC label (XMP, PNG chunk, EXIF, or JPEG segment), the HuggingFace `hf-job-id` job marker, the SynthID metadata proxy, the C2PA cloud-manifest reference (Adobe Durable Content Credentials, when the embedded manifest is stripped), the visible marks (Gemini sparkle plus the Doubao "豆包AI生成" / Jimeng "即梦AI" / Samsung Galaxy AI "Contenuti generati dall'AI" text marks), the open SD/SDXL/FLUX invisible watermark, and (with the `trustmark` extra) the open Adobe TrustMark watermark into one origin-platform + watermark-inventory verdict (`--json` for machine output) diff --git a/docs/known-limitations.md b/docs/known-limitations.md index d3d8e59..ec237e5 100644 --- a/docs/known-limitations.md +++ b/docs/known-limitations.md @@ -131,3 +131,11 @@ See `docs/synthid.md` §5.5 + `docs/controlnet-removal-pipeline-research.md` (ce `controlnet_conditioning_scale` (CLI `--controlnet-scale`, default 1.0) is the structure-preservation knob (higher = closer to the original structure); fp32 on cpu/mps, fp16-fixed VAE on cuda/xpu. The `controlnet` profile is threaded explicitly (`WatermarkRemover(pipeline=...)` / `InvisibleEngine(pipeline=...)`), NOT inferred from `model_id`. This productionizes the `scripts/controlnet_sweep.py` prototype; see `docs/controlnet-removal-pipeline-research.md`. **Forensic-stealth caveat still applies** (arXiv:2605.09203): defeating the SynthID verifier is not forensic invisibility -- a "this image went through a removal pipeline" classifier can still flag the output. 
+ +## `qwen` pipeline (experimental, Qwen-Image 20B, uncertified floors) + +`--pipeline qwen` runs `QwenImageImg2ImgPipeline` on `Qwen/Qwen-Image` (20B MMDiT, Apache-2.0 code AND weights), as an img2img alternative to the SDXL pipelines. Motivation: the controlnet over-regeneration problem above (it plasticizes real photos / loses fine text at the scrub floor). Qwen-Image renders text natively (incl. CJK) and preserves structure markedly better, so at the strength that removes SynthID it damages real content far less. + +The scrub still comes from the img2img `strength` (same lever as SDXL); the call shape lives in the pure `_build_qwen_kwargs` (uses Qwen's `true_cfg_scale`, not SDXL's `guidance_scale` — the CLI `--guidance-scale` maps onto it, and ~4.0 is typical vs the SDXL default 7.5). bf16 on CUDA. It is **CUDA/cloud-class — the 20B does not fit MPS — so `_run_qwen` has NO MPS→CPU fallback** (unlike the SDXL paths). Cost on Modal A100-80GB is ~$0.05-0.10/image vs SDXL. + +**Prototype oracle floors (Modal A100-80GB, single seed, 2026-06-19 — PENDING seed-repeat cert):** on native-resolution OpenAI and Gemini cert inputs (both controls SynthID-POSITIVE), OpenAI cleared at strength **0.10** and Gemini at **0.30** (0.20 still detected). At those floors CJK text and faces stayed faithful (the zoom comparison showed controlnet-style plastication absent). Two caveats before relying on it: (1) near-floor scrub is SEED-NON-DETERMINISTIC (the general known-limitation above), so these single-seed floors are NOT certified — run a seed-repeat sweep before trusting them; (2) `resolve_strength` is shared and pipeline-independent, so the Gemini default (0.15, the certified controlnet floor) UNDER-scrubs Gemini on `qwen` (whose floor is ~0.30) — **pass an explicit `--strength` for Gemini content on `qwen`** until a Qwen-specific ladder is certified. Flat-graphic content was not in the prototype sample. 
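Review note (not part of the patch): caveat (2) above says the shared, pipeline-independent `resolve_strength` default under-scrubs Gemini content on `qwen`. One way a caller could guard against that is sketched here. This is a minimal sketch under stated assumptions: `PROTOTYPE_QWEN_FLOORS` and `pick_strength` are hypothetical names that do not exist in the package, and the floor values are only the single-seed, uncertified prototype numbers quoted in this section.

```python
from typing import Optional

# Hypothetical helper, NOT part of remove_ai_watermarks.
# Values are the single-seed prototype floors quoted above (pending seed-repeat cert).
PROTOTYPE_QWEN_FLOORS = {"openai": 0.10, "gemini": 0.30}


def pick_strength(
    pipeline: str,
    vendor: str,
    shared_default: float,
    explicit: Optional[float] = None,
) -> float:
    """Choose the img2img strength to pass on the command line or to the engine.

    An explicit value always wins. On the qwen pipeline, the shared default is
    raised to the prototype floor for that vendor when one is known. For every
    other pipeline (or an unknown vendor) the shared default is returned as-is.
    """
    if explicit is not None:
        return explicit
    if pipeline == "qwen":
        floor = PROTOTYPE_QWEN_FLOORS.get(vendor.lower())
        if floor is not None:
            return max(shared_default, floor)
    return shared_default


if __name__ == "__main__":
    # Gemini on qwen: the shared 0.15 default is lifted to the 0.30 prototype floor.
    print(pick_strength("qwen", "gemini", shared_default=0.15))
    # Gemini on controlnet: the certified 0.15 default is left alone.
    print(pick_strength("controlnet", "gemini", shared_default=0.15))
```

Taking the maximum rather than overwriting keeps any future, higher shared default intact. Until a Qwen-specific ladder is certified, the result should be treated as a starting point for a seed-repeat sweep, not as a guaranteed scrub.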
diff --git a/docs/module-internals.md b/docs/module-internals.md index 292d57e..e68b2cc 100644 --- a/docs/module-internals.md +++ b/docs/module-internals.md @@ -177,10 +177,12 @@ Root cause: bad alpha (under-estimated, max ~0.65) + fixed-no-inpaint + tight bo ## `noai/watermark_remover.py` -`noai/watermark_remover.py` — the `WatermarkRemover` class has two diffusion pipelines, selected by the explicit `pipeline` ctor arg (NOT inferred from `model_id` -- both use the same SDXL base, `DEFAULT_MODEL_ID`). +`noai/watermark_remover.py` — the `WatermarkRemover` class has three diffusion pipelines, selected by the explicit `pipeline` ctor arg (NOT inferred from `model_id`). `sdxl`/`controlnet` share the SDXL base (`DEFAULT_MODEL_ID`); `qwen` is its own base (`QWEN_MODEL_ID`). **`sdxl`** (renamed from `default` 2026-06-09; `default` kept as a back-compat alias via `normalize_profile`) runs plain SDXL img2img (`_run_img2img`); it is the lighter opt-down alternative (no ControlNet weights). +**`qwen`** (`_run_qwen`, `_load_qwen_pipeline`) runs `QwenImageImg2ImgPipeline` on `Qwen/Qwen-Image` (20B MMDiT, Apache-2.0 code AND weights). The scrub still comes from the img2img `strength`; Qwen's value is that it preserves text (incl. CJK) and structure markedly better than SDXL at the scrub floor, so it over-regenerates real photos far less (directly targets the controlnet over-regeneration problem). Specifics: bf16 on CUDA (fp16 risks overflow on the 20B MMDiT — see the dtype branch in `__init__`); loads `QWEN_MODEL_ID` unless `--model` is overridden; the call shape lives in the pure module helper `_build_qwen_kwargs` (unit-tested without torch in `tests/test_platform.py::TestQwenKwargs`), which uses Qwen's `true_cfg_scale` (NOT SDXL's `guidance_scale` — the CLI `--guidance-scale` maps onto it; ~4.0 is typical, the SDXL default 7.5 is high for Qwen) and an explicit `negative_prompt` (`_QWEN_PROMPT`/`_QWEN_NEGATIVE`). 
It is CUDA/cloud-class (the 20B does not fit MPS), so `_run_qwen` has NO MPS->CPU fallback — an error propagates. `_load_qwen_pipeline` raises a clear ImportError if the installed diffusers lacks `QwenImageImg2ImgPipeline`. **Prototype oracle floors (Modal A100-80GB, single seed, 2026-06-19, PENDING seed-repeat cert): OpenAI clears at strength ~0.10, Gemini at ~0.30 (0.20 still detected) — both controls were SynthID-positive; at those floors CJK text + faces stay faithful where controlnet plasticizes. The Gemini floor (0.30) is HIGHER than the certified controlnet Gemini floor (0.15), and `resolve_strength` is shared/pipeline-independent, so pass an explicit `--strength` for Gemini content on `qwen` until a Qwen-specific ladder is certified.** + **`controlnet`** (**the DEFAULT pipeline since 2026-06-09** for `invisible`/`all`/`batch` and both engine ctors; `_run_controlnet`, `_load_controlnet_pipeline`) runs `StableDiffusionXLControlNetImg2ImgPipeline` with the SDXL-native canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` (`watermark_profiles.CONTROLNET_CANNY_MODEL`): the control image is `cv2.Canny(gray, 100, 200)` stacked to 3 channels (`_CANNY_LOW`/`_CANNY_HIGH`, prompt `_CONTROLNET_PROMPT` / `_CONTROLNET_NEGATIVE`). **Removal comes from the img2img regeneration (`strength`); the ControlNet only PRESERVES text and face STRUCTURE via the edge map.** diff --git a/scripts/qwen_scrub_prototype.py b/scripts/qwen_scrub_prototype.py new file mode 100644 index 0000000..5b6a86e --- /dev/null +++ b/scripts/qwen_scrub_prototype.py @@ -0,0 +1,128 @@ +# /// script +# requires-python = ">=3.10" +# dependencies = [ +# "diffusers>=0.35.0", +# "transformers>=4.51.0", +# "torch", +# "accelerate", +# "pillow", +# "click", +# ] +# /// +"""Isolated GPU prototype: does a low-strength Qwen-Image img2img pass scrub the +invisible watermark while keeping text/structure legible? 
+ +This is the oracle-gated experiment behind Library roadmap P1#5 (migrate the +invisible pipeline onto Qwen-Image-Edit). It is DELIBERATELY standalone: + + * It is NOT imported by the package and NOT in ``uv.lock``. Qwen-Image needs a + newer ``diffusers``/``transformers`` (Qwen2.5-VL text encoder) than the SDXL + pipeline is pinned to, so wiring it into the locked env would risk the + certified SDXL/ControlNet pipeline (the ``cannot import Qwen3VL...`` trap). + PEP 723 inline metadata lets ``uv run`` build a throwaway env for it instead. + * Qwen-Image is ~20B, so it needs a real GPU (CUDA) -- it will not fit on MPS. + +Run (on a GPU box / Modal), then eyeball the outputs AND submit them to the +matching oracle (openai.com/verify for OpenAI, the Gemini app for Google): + + uv run scripts/qwen_scrub_prototype.py INPUT.png -o out/ --strengths 0.1,0.2,0.3,0.4 + +What to look for: + * SCRUB: the oracle no longer reports the watermark at some strength. + * FIDELITY: text stays legible and faces/structure stay faithful at that same + strength -- the whole point of trying Qwen over SDXL (which garbles text). +The smallest strength that clears the oracle while keeping fidelity is the result +to compare against the SDXL/ControlNet floors (OpenAI 0.10 / Google 0.15). +""" + +from __future__ import annotations + +import logging +from pathlib import Path + +import click + +log = logging.getLogger("qwen_proto") + +# A neutral, faithful-regeneration prompt (we want to scrub, not restyle); mirrors +# the intent of the SDXL controlnet prompt. Qwen renders text natively, so a light +# pass should keep captions legible where SDXL would garble them. 
+_PROMPT = "high quality, sharp, detailed, faithful to the original" +_NEGATIVE = "blurry, lowres, distorted text, garbled text, artifacts" + + +def _pick_device(requested: str) -> tuple[str, object]: + import torch + + if requested != "auto": + device = requested + elif torch.cuda.is_available(): + device = "cuda" + elif getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available(): + device = "mps" + else: + device = "cpu" + # bf16 on CUDA (Qwen's reference dtype); fp32 elsewhere for numerical safety. + dtype = torch.bfloat16 if device == "cuda" else torch.float32 + return device, dtype + + +@click.command() +@click.argument("source", type=click.Path(exists=True, path_type=Path)) +@click.option("-o", "--output-dir", type=click.Path(path_type=Path), default=Path("qwen_out")) +@click.option("--strengths", default="0.1,0.2,0.3,0.4", help="Comma-separated img2img strengths to sweep.") +@click.option("--steps", type=int, default=40, help="Inference steps.") +@click.option("--cfg", type=float, default=4.0, help="true_cfg_scale (Qwen's CFG; reference default 4.0).") +@click.option("--model", default="Qwen/Qwen-Image", help="HF model id (Qwen-Image img2img base).") +@click.option("--device", default="auto", type=click.Choice(["auto", "cuda", "mps", "cpu"])) +@click.option("--seed", type=int, default=0, help="Reproducible seed.") +def main( + source: Path, + output_dir: Path, + strengths: str, + steps: int, + cfg: float, + model: str, + device: str, + seed: int, +) -> None: + """Sweep Qwen-Image img2img strength over SOURCE and save one output per strength.""" + logging.basicConfig(level=logging.INFO, format="%(message)s") + import torch + from diffusers import QwenImageImg2ImgPipeline + from PIL import Image + + dev, dtype = _pick_device(device) + log.info("Loading %s on %s (%s)...", model, dev, dtype) + pipe = QwenImageImg2ImgPipeline.from_pretrained(model, torch_dtype=dtype) + pipe = pipe.to(dev) + + init_image = 
Image.open(source).convert("RGB") + output_dir.mkdir(parents=True, exist_ok=True) + values = [float(s) for s in strengths.split(",") if s.strip()] + + for strength in values: + generator = torch.Generator(device="cpu").manual_seed(seed) + log.info("Generating strength=%.2f ...", strength) + result = pipe( + prompt=_PROMPT, + negative_prompt=_NEGATIVE, + image=init_image, + strength=strength, + num_inference_steps=steps, + true_cfg_scale=cfg, + generator=generator, + ) + out_path = output_dir / f"{source.stem}_qwen_s{strength:.2f}.png" + result.images[0].save(out_path) + log.info(" saved %s", out_path) + + log.info( + "\nDone. Eyeball text/face fidelity, then submit each output to the matching oracle " + "(openai.com/verify / Gemini app). The smallest strength that clears the oracle while " + "keeping fidelity is the number to compare against the SDXL floors (OpenAI 0.10 / Google 0.15)." + ) + + +if __name__ == "__main__": + main() diff --git a/src/remove_ai_watermarks/cli.py b/src/remove_ai_watermarks/cli.py index d88a009..667258d 100644 --- a/src/remove_ai_watermarks/cli.py +++ b/src/remove_ai_watermarks/cli.py @@ -253,15 +253,16 @@ def _normalize_pipeline(ctx: click.Context, param: click.Parameter, value: str | return normalized -# ``controlnet`` (the default-SELECTED value) and ``sdxl`` (plain SDXL img2img) are the -# two current profiles; ``default`` is an OUTDATED back-compat alias for ``sdxl`` -# (warned + normalized away by _normalize_pipeline). -_PIPELINE_CHOICES = ["sdxl", "controlnet", "default"] +# ``controlnet`` (the default-SELECTED value), ``sdxl`` (plain SDXL img2img) and +# ``qwen`` (Qwen-Image, CUDA/cloud-class) are the current profiles; ``default`` is an +# OUTDATED back-compat alias for ``sdxl`` (warned + normalized away by _normalize_pipeline). +_PIPELINE_CHOICES = ["sdxl", "controlnet", "qwen", "default"] _PIPELINE_HELP = ( "Pipeline profile. 
controlnet (DEFAULT) = SDXL + canny ControlNet that preserves " "text/faces via edge conditioning while removing SynthID; sdxl = plain SDXL img2img " - "(lighter, no extra model download, but leaves SynthID on flat-graphic content). " - "('default' is an OUTDATED alias for 'sdxl' -- use sdxl or controlnet.)" + "(lighter, no extra model download, but leaves SynthID on flat-graphic content); " + "qwen = Qwen-Image (20B, Apache-2.0) img2img, best text/structure preservation but " + "CUDA/cloud-class (does not fit MPS). ('default' is an OUTDATED alias for 'sdxl'.)" ) # Shared --pipeline / --strength decorators so the three diffusion commands diff --git a/src/remove_ai_watermarks/invisible_engine.py b/src/remove_ai_watermarks/invisible_engine.py index cb2076e..59af9d9 100644 --- a/src/remove_ai_watermarks/invisible_engine.py +++ b/src/remove_ai_watermarks/invisible_engine.py @@ -103,8 +103,9 @@ class InvisibleEngine: device: Device for inference (auto/cpu/mps/cuda/xpu). None = auto. pipeline: Pipeline profile. "controlnet" (DEFAULT; SDXL + canny ControlNet that preserves text/face structure via edge conditioning while removing - SynthID) or "sdxl" (plain SDXL img2img, lighter but leaves SynthID on - flat-graphic content). "default" is a back-compat alias for "sdxl". + SynthID), "sdxl" (plain SDXL img2img, lighter but leaves SynthID on + flat-graphic content), or "qwen" (Qwen-Image 20B img2img, best text/ + structure preservation but CUDA/cloud-class). "default" aliases "sdxl". hf_token: HuggingFace API token. progress_callback: Optional callback for progress messages. 
controlnet_conditioning_scale: ControlNet structure-preservation diff --git a/src/remove_ai_watermarks/noai/watermark_profiles.py b/src/remove_ai_watermarks/noai/watermark_profiles.py index 96c9995..8448f0e 100644 --- a/src/remove_ai_watermarks/noai/watermark_profiles.py +++ b/src/remove_ai_watermarks/noai/watermark_profiles.py @@ -12,6 +12,17 @@ if TYPE_CHECKING: DEFAULT_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0" +# Qwen-Image (20B MMDiT, Apache-2.0 code AND weights) base for the ``qwen`` pipeline: +# an img2img alternative to SDXL with native text rendering (incl. CJK). Loaded only +# when ``--pipeline qwen`` is selected; CUDA/cloud-class (does not fit MPS). Prototype +# oracle floors (single-seed, 2026-06-19, pending seed-repeat cert): OpenAI clears at +# strength ~0.10, Google/Gemini at ~0.30 (0.20 still detected) -- the latter is HIGHER +# than the certified controlnet Google floor (0.15), so pass an explicit ``--strength`` +# for Gemini content on this pipeline until a Qwen-specific ladder is certified. +# (Dispatch uses the bare "qwen" literal, matching the sdxl/controlnet sites, so there +# is no QWEN_PROFILE constant -- only the model id is referenced from code.) +QWEN_MODEL_ID = "Qwen/Qwen-Image" + # Canonical pipeline-profile names + the back-compat alias. The plain SDXL img2img # profile is ``sdxl``; ``default`` is kept as an accepted alias (it was the profile's # name before ``controlnet`` became the default-selected pipeline, 2026-06-09). diff --git a/src/remove_ai_watermarks/noai/watermark_remover.py b/src/remove_ai_watermarks/noai/watermark_remover.py index 666951a..3c4a2c6 100644 --- a/src/remove_ai_watermarks/noai/watermark_remover.py +++ b/src/remove_ai_watermarks/noai/watermark_remover.py @@ -1,6 +1,14 @@ """Watermark removal using diffusion model regeneration attack. -Two pipelines: +Three pipelines (selected by the explicit ``pipeline`` ctor arg): + +0. ``qwen`` -- Qwen-Image (20B MMDiT, Apache-2.0) img2img. 
The scrub still comes from + the img2img ``strength``; Qwen preserves text (incl. CJK) and structure markedly + better than SDXL at the scrub floor, so it over-regenerates real photos far less. + CUDA/cloud-class (does not fit MPS). See ``watermark_profiles`` for the prototype + oracle floors (pending seed-repeat cert). + +Two SDXL pipelines: 1. ``controlnet`` (DEFAULT) -- SDXL img2img with a canny ControlNet. The watermark REMOVAL still comes from the img2img regeneration (``strength``); the ControlNet only PRESERVES structure (text/faces) by conditioning on the edge map. No original @@ -36,6 +44,7 @@ from remove_ai_watermarks.noai.watermark_profiles import ( CONTROLNET_CANNY_MODEL, DEFAULT_MODEL_ID, DEFAULT_STRENGTH, + QWEN_MODEL_ID, normalize_profile, resolve_strength, ) @@ -308,6 +317,29 @@ _CANNY_HIGH = 200 _CONTROLNET_PROMPT = "best quality, high quality, sharp, detailed, photographic" _CONTROLNET_NEGATIVE = "blurry, lowres, deformed, distorted text, garbled text, watermark, jpeg artifacts" +# Neutral prompts for the Qwen-Image img2img pass (faithful regeneration, not an edit). +_QWEN_PROMPT = "high quality, sharp, detailed, faithful to the original" +_QWEN_NEGATIVE = "blurry, lowres, distorted text, garbled text, artifacts" + + +def _build_qwen_kwargs( + image: Image.Image, strength: float, num_inference_steps: int, true_cfg_scale: float, generator: Any +) -> dict[str, Any]: + """Build the QwenImageImg2ImgPipeline call kwargs (pure; unit-tested without torch). + + Qwen-Image uses ``true_cfg_scale`` (not SDXL's ``guidance_scale``) and takes an + explicit ``negative_prompt``; the scrub still comes from the img2img ``strength``. + """ + return { + "prompt": _QWEN_PROMPT, + "negative_prompt": _QWEN_NEGATIVE, + "image": image, + "strength": strength, + "num_inference_steps": num_inference_steps, + "true_cfg_scale": true_cfg_scale, + "generator": generator, + } + class WatermarkRemover: """Remove watermarks from images using diffusion model regeneration. 
@@ -348,6 +380,11 @@ class WatermarkRemover: if torch_dtype is None: if self.device == "cpu" or self.device == "mps": self.torch_dtype = torch.float32 # type: ignore + elif self.model_profile == "qwen": + # Qwen-Image is published in bf16; fp16 risks overflow on the 20B MMDiT. + # cuda/xpu-only by construction: the cpu/mps guard above already forced + # fp32, and the 20B model does not fit MPS anyway. + self.torch_dtype = torch.bfloat16 # type: ignore else: self.torch_dtype = torch.float16 # type: ignore else: @@ -355,6 +392,7 @@ class WatermarkRemover: self._pipeline: AutoImg2ImgPipeline | None = None self._controlnet_pipeline: Any = None + self._qwen_pipeline: Any = None self._progress_callback = progress_callback self.hf_token: str | None = hf_token or os.environ.get("HF_TOKEN") @@ -369,7 +407,9 @@ class WatermarkRemover: def preload(self) -> None: """Eagerly load the pipeline so download progress bars are visible.""" - if self.model_profile == "controlnet": + if self.model_profile == "qwen": + self._load_qwen_pipeline() + elif self.model_profile == "controlnet": self._load_controlnet_pipeline() else: self._load_pipeline() @@ -420,19 +460,27 @@ class WatermarkRemover: return pipeline + def _base_load_kwargs(self) -> dict[str, Any]: + """The ``from_pretrained`` kwargs shared by all three loaders (dtype + token). + + Each loader adds its own extras (SDXL safety_checker + fp16 VAE, the ControlNet + model, etc.). Centralizing the dtype/token pair avoids the drift trap of three + copies (a token forgotten on one loader silently breaks gated downloads there). 
+        """
+        load_kwargs: dict[str, Any] = {"torch_dtype": self.torch_dtype}
+        if self.hf_token:
+            load_kwargs["token"] = self.hf_token
+        return load_kwargs
+
     def _load_pipeline(self) -> AutoImg2ImgPipeline:
         """Load the plain SDXL img2img pipeline lazily."""
         if self._pipeline is None:
             logger.info("Loading model %s on %s...", self.model_id, self.device)
             self._set_progress(f"Loading model weights: {self.model_id}")
 
-            load_kwargs: dict[str, Any] = {
-                "torch_dtype": self.torch_dtype,
-                "safety_checker": None,
-                "requires_safety_checker": False,
-            }
-            if self.hf_token:
-                load_kwargs["token"] = self.hf_token
+            load_kwargs = self._base_load_kwargs()
+            load_kwargs["safety_checker"] = None
+            load_kwargs["requires_safety_checker"] = False
             self._maybe_add_fp16_vae(load_kwargs)
 
             pipeline = AutoImg2ImgPipeline.from_pretrained(self.model_id, **load_kwargs)  # type: ignore
@@ -458,9 +506,8 @@ class WatermarkRemover:
             self._set_progress(f"Loading ControlNet: {CONTROLNET_CANNY_MODEL}")
             controlnet = ControlNetModel.from_pretrained(CONTROLNET_CANNY_MODEL, torch_dtype=self.torch_dtype)
 
-            load_kwargs: dict[str, Any] = {"controlnet": controlnet, "torch_dtype": self.torch_dtype}
-            if self.hf_token:
-                load_kwargs["token"] = self.hf_token
+            load_kwargs = self._base_load_kwargs()
+            load_kwargs["controlnet"] = controlnet
             self._maybe_add_fp16_vae(load_kwargs)
 
             self._set_progress(f"Loading model weights: {self.model_id}")
@@ -474,6 +521,37 @@ class WatermarkRemover:
 
         return self._controlnet_pipeline
 
+    def _load_qwen_pipeline(self) -> Any:
+        """Load the Qwen-Image img2img pipeline lazily.
+
+        Qwen-Image is its OWN base model (not an SDXL add-on), so it loads
+        ``QWEN_MODEL_ID`` unless the caller passed a custom ``--model``. Needs a
+        diffusers build that ships ``QwenImageImg2ImgPipeline``; raises a clear error
+        otherwise. CUDA/cloud-class (the 20B MMDiT does not fit MPS).
+        """
+        if self._qwen_pipeline is None:
+            try:
+                from diffusers import QwenImageImg2ImgPipeline
+            except ImportError as exc:
+                raise ImportError(
+                    "The 'qwen' pipeline needs a diffusers version that ships "
+                    "QwenImageImg2ImgPipeline. Upgrade: pip install -U diffusers"
+                ) from exc
+
+            # Use the Qwen base unless the user explicitly overrode --model.
+            model = self.model_id if self.model_id != self.DEFAULT_MODEL_ID else QWEN_MODEL_ID
+            logger.info("Loading Qwen-Image (%s) on %s...", model, self.device)
+            self._set_progress(f"Loading model weights: {model}")
+            pipeline = QwenImageImg2ImgPipeline.from_pretrained(model, **self._base_load_kwargs())
+            pipeline = self._move_to_device_and_optimize(pipeline)
+            with contextlib.suppress(Exception):
+                pipeline.set_progress_bar_config(disable=True)
+
+            logger.info("Qwen-Image model loaded successfully")
+            self._qwen_pipeline = pipeline
+
+        return self._qwen_pipeline
+
     # ── Core removal ─────────────────────────────────────────────────
 
     def remove_watermark(
@@ -552,6 +630,8 @@ class WatermarkRemover:
         _total_start = time.monotonic()
 
         def _generate_one(img: Image.Image) -> Image.Image:
+            if self.model_profile == "qwen":
+                return self._run_qwen(img, strength, num_inference_steps, guidance_scale, generator)
             if self.model_profile == "controlnet":
                 return self._run_controlnet(img, strength, num_inference_steps, guidance_scale, generator)
             return self._run_img2img(img, strength, num_inference_steps, guidance_scale, generator)
@@ -725,6 +805,30 @@ class WatermarkRemover:
         self._controlnet_pipeline = None
         return self._load_controlnet_pipeline()
 
+    # ── Qwen runner ──────────────────────────────────────────────────
+
+    def _run_qwen(
+        self,
+        init_image: Image.Image,
+        strength: float,
+        num_inference_steps: int,
+        guidance_scale: float,
+        generator: Any,
+    ) -> Image.Image:
+        """Run the Qwen-Image img2img pass.
+
+        Removal comes from the img2img ``strength`` (same lever as the SDXL paths);
+        Qwen-Image preserves text/structure markedly better at the scrub floor. The
+        CLI ``guidance_scale`` maps to Qwen's ``true_cfg_scale`` (~4.0 is typical;
+        the SDXL default of 7.5 is high for Qwen). No MPS->CPU fallback: the 20B MMDiT
+        is CUDA/cloud-class and does not run on MPS, so an error here propagates.
+        """
+        pipeline = self._load_qwen_pipeline()
+        self._set_progress(f"Running Qwen-Image img2img (strength={strength}, true_cfg={guidance_scale})...")
+        kwargs = _build_qwen_kwargs(init_image, strength, num_inference_steps, guidance_scale, generator)
+        result = pipeline(**kwargs)
+        return result.images[0]
+
     # ── Batch ────────────────────────────────────────────────────────
 
     def remove_watermark_batch(
diff --git a/tests/test_platform.py b/tests/test_platform.py
index 463e62b..ae90334 100644
--- a/tests/test_platform.py
+++ b/tests/test_platform.py
@@ -115,6 +115,7 @@ class TestModelProfiles:
     def test_canonical_profiles_unchanged(self):
         assert normalize_profile("sdxl") == "sdxl"
         assert normalize_profile("controlnet") == "controlnet"
+        assert normalize_profile("qwen") == "qwen"
 
     def test_default_alias_resolves_to_sdxl(self):
         # "default" is the legacy alias for "sdxl" (back-compat for existing scripts).
@@ -125,6 +126,35 @@ class TestModelProfiles:
         assert normalize_profile("CONTROLNET") == "controlnet"
 
 
+class TestQwenKwargs:
+    """_build_qwen_kwargs is pure (no torch); guards the Qwen-Image call shape.
+
+    watermark_remover imports torch under a try/except, so the module (and this pure
+    helper) imports fine in the core+dev CI env where torch is absent.
+    """
+
+    def test_uses_true_cfg_not_guidance_scale(self):
+        from remove_ai_watermarks.noai.watermark_remover import _build_qwen_kwargs
+
+        gen = object()
+        kwargs = _build_qwen_kwargs("IMG", strength=0.3, num_inference_steps=40, true_cfg_scale=4.0, generator=gen)
+        # Qwen uses true_cfg_scale, NOT SDXL's guidance_scale.
+        assert kwargs["true_cfg_scale"] == 4.0
+        assert "guidance_scale" not in kwargs
+        # The scrub still comes from strength; image + generator pass through.
+        assert kwargs["strength"] == 0.3
+        assert kwargs["image"] == "IMG"
+        assert kwargs["generator"] is gen
+        # Faithful-regeneration prompt + an explicit negative prompt.
+        assert kwargs["prompt"]
+        assert kwargs["negative_prompt"]
+
+    def test_qwen_model_id_is_qwen_image(self):
+        from remove_ai_watermarks.noai.watermark_profiles import QWEN_MODEL_ID
+
+        assert QWEN_MODEL_ID == "Qwen/Qwen-Image"
+
+
 class TestResolveStrength:
     """resolve_strength applies the vendor default only when strength is unset."""