mirror of
https://github.com/wiltodelta/remove-ai-watermarks.git
synced 2026-06-10 04:43:54 +02:00
feat(invisible): controlnet default, unified strength, retire --auto, add --model/--guidance-scale
Overhaul the diffusion-removal surface around a single robust default and a complete, consistent CLI.

Pipeline + strength:
- controlnet is now the DEFAULT pipeline (CLI --pipeline + both engine ctors). With the certified higher strength it clears both photoreal and flat-graphic content, whereas plain SDXL left SynthID on flat graphics.
- Rename the plain-SDXL profile default -> sdxl; "default" stays as a back-compat alias (normalize_profile + a click callback that warns).
- Unify the strength ladder: resolve_strength applies ONE vendor-adaptive ladder (the certified controlnet floors OpenAI 0.20 / Google 0.30 / unknown 0.30) to both pipelines. sdxl is the weaker remover on its own hard case (flat fills), so the certified floor is the right floor for it too.

CLI completeness:
- Add --model (HF model id) to invisible + batch (was only on all) and --guidance-scale (CFG) to all three diffusion commands; both were library knobs the CLI did not expose.
- Flip --adaptive-polish to ON by default (it self-gates to a no-op where there is no detail deficit, so default-on is safe).
- Share --pipeline / --strength / --model / --guidance-scale as single decorators so invisible/all/batch keep an identical surface; the --strength help is derived from the strength constants (strength_default_help) so it can never drift from the ladder.

Removals:
- Delete the auto_config content-detection planner + its YuNet/DBNet assets (~2.6 MB): with controlnet always the pipeline and the polish self-gating, the face/text/edge detection no longer changed behavior. --auto is now a deprecated no-op that only warns (the polish it enabled is the default).

Docs (README, CLAUDE.md, docs/synthid.md) updated throughout; added an InvisibleEngine Python API example. Tests cover the alias warnings, the polish default, and the --model/--guidance-scale wiring.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
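The alias handling and the derived `--strength` help described in the commit message can be pictured roughly as follows. This is a minimal sketch, not the repository's code: the function names `normalize_profile` and `strength_default_help` and the strength values come from the message, but their signatures, the warning category, and the help wording are assumptions made for illustration.

```python
import warnings

# Strength floors quoted in the commit message.
OPENAI_STRENGTH = 0.20
GEMINI_STRENGTH = 0.30
UNKNOWN_STRENGTH = 0.30

# Assumed layout: legacy profile name -> current profile name.
_PROFILE_ALIASES = {"default": "sdxl"}
_PROFILES = ("controlnet", "sdxl")


def normalize_profile(name: str) -> str:
    """Return the current profile name, warning when a legacy alias is used."""
    key = name.strip().lower()
    if key in _PROFILE_ALIASES:
        current = _PROFILE_ALIASES[key]
        warnings.warn(
            f"pipeline {key!r} is a deprecated alias; use {current!r}",
            DeprecationWarning,
            stacklevel=2,
        )
        return current
    if key not in _PROFILES:
        raise ValueError(f"unknown pipeline {name!r}; expected one of {_PROFILES}")
    return key


def strength_default_help() -> str:
    """Build the --strength help text from the constants so the two cannot drift."""
    return (
        "Denoising strength (default: vendor-adaptive, "
        f"OpenAI {OPENAI_STRENGTH:.2f} / Google {GEMINI_STRENGTH:.2f} / "
        f"unknown {UNKNOWN_STRENGTH:.2f})"
    )
```

Generating the help string from the constants means a later change to the ladder updates the CLI text automatically, which is the drift protection the message refers to.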
@@ -23,7 +23,7 @@ If this tool saves you time, consider [sponsoring its development](https://githu
|
||||
- **AI metadata stripping** — EXIF, PNG text chunks, C2PA provenance manifests (PNG / JPEG / AVIF / HEIF / JPEG-XL, **MP4 / MOV / M4V / M4A** at the container level, and **WebM / MP3 / WAV / FLAC / OGG** losslessly via ffmpeg), XMP DigitalSourceType
|
||||
- **"Made with AI" label removal** — removes the AI-disclosure metadata that platforms read to apply automatic labels (useful for clearing a false-positive label from a human-edited photograph)
|
||||
- **Analog Humanizer** — optional film grain and chromatic aberration post-processing
|
||||
- **Text and face preservation (experimental)** — optional `--pipeline controlnet` adds a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness). The library does not ship a face-restore extra: every approach evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned) regenerated the face via SDXL and made the output look more AI-generated than the cleaned image. The cleaned controlnet output is the least-AI face state achievable without re-introducing SynthID.
|
||||
- **Text and face preservation (default)** — the default pipeline is a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Use `--pipeline sdxl` for plain SDXL img2img (lighter, no extra model download) on inputs without text or faces. Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness). The library does not ship a face-restore extra: every approach evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned) regenerated the face via SDXL and made the output look more AI-generated than the cleaned image. The cleaned controlnet output is the least-AI face state achievable without re-introducing SynthID.
|
||||
- **Batch processing** — process entire directories
|
||||
- **Detection** — three-stage NCC watermark detection with confidence scoring
|
||||
- **Provenance detection (`identify`)** — aggregate C2PA issuer, the C2PA soft-binding forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, embedded SD/ComfyUI params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the China TC260 AIGC label (XMP, PNG chunk, or EXIF), the HuggingFace `hf-job-id` job marker, the SynthID metadata proxy, the visible marks (Gemini sparkle plus the Doubao "豆包AI生成" / Jimeng "即梦AI" / Samsung Galaxy AI "Contenuti generati dall'AI" text marks), the open SD/SDXL/FLUX invisible watermark, and (with the `trustmark` extra) the open Adobe TrustMark watermark into one origin-platform + watermark-inventory verdict (`--json` for machine output)
|
||||
@@ -118,15 +118,16 @@ The removal pipeline (default profile, SDXL):
|
||||
image → encode to latent space (VAE) at native resolution
|
||||
→ add controlled noise (forward diffusion)
|
||||
→ denoise (reverse diffusion, ~50 steps; strength is vendor-adaptive:
|
||||
0.10 OpenAI / 0.15 Google / 0.15 unknown, override with --strength)
|
||||
0.20 OpenAI / 0.30 Google / 0.30 unknown, same for both pipelines;
|
||||
override with --strength)
|
||||
→ decode back to pixels (VAE)
|
||||
```
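To make the role of strength in the pipeline above concrete: in Hugging Face diffusers img2img pipelines, strength decides how many of the scheduled denoising steps actually run. The sketch below mirrors that rule under the assumption that this tool passes `--strength` through to such a pipeline unchanged, which the README does not state. The helper name `effective_denoise_steps` is hypothetical.

```python
def effective_denoise_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps an img2img pass runs for a given strength.

    Follows the diffusers img2img convention: the schedule is truncated to
    int(num_inference_steps * strength) steps, capped at the full schedule.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return min(int(num_inference_steps * strength), num_inference_steps)


# Vendor-adaptive defaults quoted in the README, on a 50-step schedule.
for vendor, strength in {"OpenAI": 0.20, "Google": 0.30, "unknown": 0.30}.items():
    steps = effective_denoise_steps(50, strength)
    print(f"{vendor}: strength {strength:.2f} -> {steps} of 50 steps")
```

Under that assumption, raising the default strength regenerates more of the image per pass, which is why the README treats strength as the removal knob.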
|
||||
|
||||
- Large inputs run at native resolution (no down-then-up round-trip, which was the main quality loss in issue #10); use `--max-resolution N` only to cap GPU/MPS memory on very large inputs. Small inputs (long side under 1024 px) are auto-upscaled to a 1024 px floor before diffusion, because SDXL distorts on a tiny latent, and the result is restored to the original size (a transparent quality boost). Disable the floor with `--min-resolution 0`. The floor upscale uses Lanczos by default; `--upscaler esrgan` (the `esrgan` extra) runs Real-ESRGAN first for sharper detail and falls back to Lanczos if the extra is absent. ESRGAN is a generic photo/texture GAN with no face/glyph prior, so it is best for photo/texture content -- it can degrade faces (the diffusion pass regenerates them, so the final recovers) and thin text; keep Lanczos for text-heavy inputs.
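The resolution rules in the bullet above can be sketched as a small pure function. This is an illustration under assumptions, not the library's implementation: `plan_working_size` is a hypothetical name, and any further rounding the real code applies (for example to a multiple of 8 for the VAE) is not modeled.

```python
def plan_working_size(
    width: int,
    height: int,
    min_resolution: int = 1024,
    max_resolution: int = 0,
) -> tuple[int, int]:
    """Return the (width, height) the diffusion pass would run at.

    min_resolution: floor for the long side (0 disables the floor upscale).
    max_resolution: cap for the long side (0 means native resolution).
    """
    long_side = max(width, height)
    scale = 1.0
    if min_resolution > 0 and long_side < min_resolution:
        scale = min_resolution / long_side  # upscale a small input to the floor
    elif max_resolution > 0 and long_side > max_resolution:
        scale = max_resolution / long_side  # downscale only to cap memory
    if scale == 1.0:
        return width, height
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 512x384 input would be processed at 1024x768 and then resized back to 512x384, while a 1600x1200 input runs at native size unless `max_resolution` is set.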
|
||||
|
||||
> **Default strength is vendor-adaptive (no flag needed).** The tool reads the C2PA issuer to detect which vendor's SynthID is present and picks the strength that clears it with the least quality loss: **OpenAI gpt-image → `0.10`**, **Google Gemini → `0.15`**, **unknown source → `0.15`**. An oracle-verified June 2026 study (clean pipeline, per-image openai.com/verify or Gemini app) found OpenAI's watermark clears at `0.05` across `1024`-`1600` px (resolution-independent) while Google's is ~3x more robust and needs `0.15`. The dominant factor is the vendor, not resolution. There is no local SynthID detector, so if the oracle still reads SynthID, raise `--strength`; if you care more about preserving fine text, lower it. (Caveat: Google's `0.15` was validated on the capped `--max-resolution 1536` path; a very large native Gemini image may need more.)
|
||||
> **Default strength is vendor-adaptive (no flag needed).** The tool reads the C2PA issuer to detect which vendor's SynthID is present and picks the strength accordingly: **OpenAI gpt-image → `0.20`**, **Google Gemini → `0.30`**, **unknown source → `0.30`**. The **same ladder applies to both pipelines** — these are the oracle-certified `controlnet` floors (June 2026 Modal cert, multi-seed). They also cover plain `sdxl`: the two pipelines have opposite hard cases (controlnet leaves SynthID on photoreal, sdxl on flat graphics), but on its own hard case sdxl is the weaker remover, so it needs at least controlnet's strength — using one certified ladder is the safe choice (margin-based for sdxl, not separately certified). The dominant factor is the vendor (Google's SynthID is ~3x more robust). There is no local SynthID detector, so if the oracle still reads SynthID, raise `--strength`; if you care more about preserving fine detail, lower it. (Caveat: Google's `0.30` was validated only at `--max-resolution 1536`; a very large native Gemini image may need ~`0.35`+.)
|
||||
>
|
||||
> **`--pipeline controlnet` preserves text and face structure (experimental, opt-in).** It runs the same SDXL img2img scrub but adds a canny ControlNet that conditions the regeneration on the image's edge map, so text and structure stay sharp at the strengths that remove SynthID. The watermark removal still comes from the img2img regeneration (`--strength`); the ControlNet only preserves structure — no original pixels are copied or frozen, so SynthID does not survive. `--controlnet-scale` tunes the preservation strength (higher = closer to the original structure). Runs fp32 on mps/cpu (fp16 only on cuda/xpu, where the fp16-fixed SDXL VAE is loaded automatically).
|
||||
> **The default pipeline is `controlnet` — it preserves text and face structure.** It runs the same SDXL img2img scrub but adds a canny ControlNet that conditions the regeneration on the image's edge map, so text and structure stay sharp at the strengths that remove SynthID. The watermark removal still comes from the img2img regeneration (`--strength`); the ControlNet only preserves structure — no original pixels are copied or frozen. The default strength ladder (OpenAI `0.20` / Google `0.30`) is the oracle-certified controlnet floor. `--controlnet-scale` tunes the preservation strength (higher = closer to the original structure). Runs fp32 on mps/cpu (fp16 only on cuda/xpu, where the fp16-fixed SDXL VAE is loaded automatically). Pass `--pipeline sdxl` for plain SDXL img2img (lighter, no extra model download) on inputs without text or faces.
|
||||
>
|
||||
> **No face-restore extra in the library.** Every ArcFace-based regeneration approach we evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned at three parameter sweeps, 2026-06-04 - 2026-06-08 Modal cert sweeps) regenerated the face via SDXL diffusion — the output face pixels were diffusion-fresh (SynthID not re-introduced), but the face inherently looked more AI-generated than the cleaned image (SDXL "clean skin" gloss, lost original identity precision). The cleaned image from the main controlnet 0.20 pass is the least-AI face state we can reach without re-introducing SynthID. Empirical conclusion in `docs/synthid-robust-identity-research-2026-06-08.md`.
|
||||
|
||||
@@ -136,7 +137,7 @@ SDXL is the default since May 2026: empirically defeats SynthID v2 on Gemini 3 P
|
||||
|
||||
> **Technical deep-dive:** see [`docs/synthid.md`](docs/synthid.md) for a primary-source-cited breakdown of how SynthID works mechanically (post-hoc encoder/decoder, 136-bit payload, pixel-space embedding), what it empirically survives (JPEG, crop, resize: ~99.98% TPR at 0.1% FPR from arXiv:2510.09263), what removes it, and the forensic-stealth tradeoff (all known removal attacks are detectable at >98% TPR@1%FPR per arXiv:2605.09203).
|
||||
|
||||
**Text and face preservation** (experimental, opt-in `--pipeline controlnet`): adds a canny ControlNet so text and face *structure* stay sharp through the removal pass, without copying or freezing any original pixels (so SynthID is still removed). Tune the preservation strength with `--controlnet-scale`. Canny preserves structure but not face *identity*: the regenerated face drifts in likeness. The library does not ship a face-restore extra (see the callout above).
|
||||
**Text and face preservation** (the default pipeline; `--pipeline sdxl` opts down to plain SDXL): a canny ControlNet keeps text and face *structure* sharp through the removal pass, without copying or freezing any original pixels (so SynthID is still removed). Tune the preservation strength with `--controlnet-scale`. Canny preserves structure but not face *identity*: the regenerated face drifts in likeness. The library does not ship a face-restore extra (see the callout above).
|
||||
|
||||
**Analog Humanizer**: optional film grain and chromatic aberration injection that mimics a photo of a screen, raising the bar for AI-generated image classifiers. (It frustrates generic classifiers but does not guarantee forensic invisibility — see the [arXiv:2605.09203](https://arxiv.org/abs/2605.09203) note above.)
|
||||
|
||||
@@ -292,14 +293,15 @@ remove-ai-watermarks invisible image.png -o clean.png --humanize 4.0 --unsharp 0
|
||||
# first (disable with --min-resolution 0); --upscaler esrgan uses Real-ESRGAN for
|
||||
# that floor upscale (needs the 'esrgan' extra). On a very large image that OOMs the
|
||||
# GPU/MPS, cap the long side: --max-resolution 2048
|
||||
# Strength is vendor-adaptive by default (OpenAI 0.10 / Google 0.15); override
|
||||
# with --strength. To preserve text/face structure, use --pipeline controlnet
|
||||
# Or let it choose: --auto picks the pipeline and an adaptive polish
|
||||
# from the image content (controlnet when there is text/structure, polish that
|
||||
# restores the input's detail level while sparing text). Every choice is
|
||||
# overridable: --pipeline and --no-adaptive-polish win over the auto pick.
|
||||
# Experimental.
|
||||
# (SDXL + canny ControlNet); tune preservation with --controlnet-scale. Add
|
||||
# Strength is vendor-adaptive by default (OpenAI 0.20 / Google 0.30, same
|
||||
# for both pipelines); override with --strength. controlnet (text/face
|
||||
# structure preservation) is the default pipeline; --pipeline sdxl opts down
|
||||
# to plain SDXL for non-structure inputs. Tune structure preservation with
|
||||
# --controlnet-scale, the CFG with --guidance-scale (default 7.5), and the
|
||||
# diffusion model with --model (default: SDXL base).
|
||||
# --adaptive-polish (ON by default) restores the input's detail level (sparing
|
||||
# text) to counter the over-smoothed look; it self-limits to a no-op where
|
||||
# there is no detail deficit. Disable with --no-adaptive-polish.
|
||||
|
||||
# Check / strip AI metadata (C2PA, EXIF, "Made with AI" labels)
|
||||
# --check also flags SynthID-bearing sources: a C2PA manifest signed by
|
||||
@@ -312,9 +314,9 @@ remove-ai-watermarks metadata image.png --remove
|
||||
# Batch with a specific mode
|
||||
remove-ai-watermarks batch ./images/ --mode visible
|
||||
|
||||
# Batch also accepts --auto (and --adaptive-polish): the plan is recomputed per
|
||||
# image, so a mixed directory routes each file to the right pipeline
|
||||
remove-ai-watermarks batch ./images/ --mode all --auto
|
||||
# Batch accepts the full invisible knob set (--strength/--guidance-scale/--model/
|
||||
# --pipeline/...); --adaptive-polish is on by default (--no-adaptive-polish to disable)
|
||||
remove-ai-watermarks batch ./images/ --mode all
|
||||
```
|
||||
|
||||
### Python API
|
||||
@@ -335,6 +337,30 @@ clean = engine.remove_watermark(image)
|
||||
cv2.imwrite("clean.png", clean)
|
||||
```
|
||||
|
||||
#### Invisible removal (diffusion)
|
||||
|
||||
```python
|
||||
from pathlib import Path
|
||||
from remove_ai_watermarks.invisible_engine import InvisibleEngine
|
||||
|
||||
# pipeline: "controlnet" (default, preserves text/face structure) or "sdxl" (plain).
|
||||
# model_id=None uses the SDXL base; controlnet_conditioning_scale tunes preservation.
|
||||
engine = InvisibleEngine(pipeline="controlnet")
|
||||
|
||||
engine.remove_watermark(
|
||||
Path("watermarked.png"),
|
||||
Path("clean.png"),
|
||||
strength=None, # None = vendor-adaptive default (OpenAI 0.20 / Google 0.30)
|
||||
num_inference_steps=50,
|
||||
guidance_scale=None, # None = the library default (7.5)
|
||||
seed=None, # set for reproducible output
|
||||
adaptive_polish=True, # detail-targeted polish, self-gating (default on in the CLI)
|
||||
min_resolution=1024, # upscale tiny inputs to this floor before diffusion
|
||||
max_resolution=0, # 0 = native; set only to cap GPU/MPS memory
|
||||
upscaler="lanczos", # or "esrgan" for the floor upscale (needs the 'esrgan' extra)
|
||||
)
|
||||
```
|
||||
|
||||
### Metadata stripping
|
||||
|
||||
```python
|
||||
|
||||
+30
-15
@@ -382,12 +382,10 @@ the payload, reconstituting SynthID in text. The lesson held and shaped the
|
||||
current design: **content is preserved by REGENERATING it under structural
|
||||
conditioning, never by copying original pixels.**
|
||||
|
||||
Both preservation features below are **EXPERIMENTAL and opt-in (off by default)**;
|
||||
the plain `default` SDXL img2img pass is the shippable path.
|
||||
|
||||
- **Text + structure:** `--pipeline controlnet` (SDXL img2img + a canny ControlNet,
|
||||
experimental/opt-in) conditions the regeneration on the edge map, so text and
|
||||
structure stay sharp while every pixel is still regenerated. Text legibility is
|
||||
- **Text + structure:** `--pipeline controlnet` (SDXL img2img + a canny ControlNet) is
|
||||
**THE DEFAULT pipeline since 2026-06-09** (`--pipeline default` opts down to plain
|
||||
SDXL img2img for inputs without text/faces). It conditions the regeneration on the
|
||||
edge map, so text and structure stay sharp while every pixel is still regenerated. Text legibility is
|
||||
better than plain img2img at the same strength (text stays readable where plain
|
||||
garbles it). **BUT removal efficacy at the low vendor-adaptive strength is CONTENT ×
|
||||
PIPELINE dependent and NEITHER pipeline clears all content -- oracle-validated
|
||||
@@ -407,7 +405,13 @@ the plain `default` SDXL img2img pass is the shippable path.
|
||||
removal guarantee at today's strength -- pick by what you must PRESERVE (controlnet
|
||||
for text/structure), then raise strength until the oracle reads clean. (The earlier
|
||||
"reads clean on the oracle" claim held only for the one flat/text-background case it
|
||||
was checked on; it does not generalize.)
|
||||
was checked on; it does not generalize.) **UPDATE 2026-06-09: the default strengths
|
||||
were raised and made pipeline-aware (controlnet ladder = the certified
|
||||
0.20/0.30/0.30 floors, applied to BOTH pipelines as a single ladder -- see §5.2 for
|
||||
why one ladder covers plain `sdxl` too) and controlnet is now the default pipeline.
|
||||
The plain-SDXL profile was also renamed `default` -> `sdxl` (`default` stays as an
|
||||
alias). The 0.10/0.15 numbers in this analysis are the PRE-raise values it was
|
||||
measured at. See §5.2.**
|
||||
- **Face identity:** canny holds face *structure* but not *identity*. Shipped as the
|
||||
optional `--restore-faces` GFPGAN post-pass (`face_restore.py`, the `restore`
|
||||
extra, experimental/opt-in, off by default). It runs GFPGAN on the ORIGINAL
|
||||
@@ -448,14 +452,25 @@ study (section 2.2) gives empirical floors:
|
||||
resolution stack). Use a GPU or `--max-resolution 1536`.
|
||||
|
||||
The default is **vendor-adaptive** (`watermark_profiles.resolve_strength` +
|
||||
`vendor_for_strength`): the tool reads the C2PA issuer on the original input and
|
||||
picks `OPENAI_STRENGTH` 0.10 / `GEMINI_STRENGTH` 0.15 / `UNKNOWN_STRENGTH` 0.15.
|
||||
This uses the vendor signal we DO have locally (the C2PA SynthID proxy) to avoid
|
||||
the overkill of a single high default on OpenAI images, without needing a local
|
||||
pixel detector. An explicit `--strength` always wins. If the watermark still
|
||||
survives (e.g. a large native Gemini beyond the capped-1536 validation), raise
|
||||
toward 0.30 then 0.35-0.40 (0.40 visibly corrupts dense text), using the lowest
|
||||
value that reads clean on the oracle.
|
||||
`vendor_for_strength`): the tool reads the C2PA issuer on the original input and picks
|
||||
`OPENAI_STRENGTH` 0.20 / `GEMINI_STRENGTH` 0.30 / `UNKNOWN_STRENGTH` 0.30. **The SAME
|
||||
ladder applies to both pipelines** (`sdxl` and `controlnet`) -- these are the
|
||||
oracle-certified controlnet floors (§5.5, the 2026-06-04 Modal cert). Why one ladder
|
||||
covers plain `sdxl` too: the certification was run on controlnet and does NOT transfer
|
||||
by symmetry (the two pipelines have OPPOSITE hard cases -- controlnet leaves SynthID on
|
||||
photoreal, `sdxl` on flat graphics, the §5.1 content-x-pipeline table), BUT on its own
|
||||
hard case (flat fills) `sdxl` is the WEAKER remover (plain img2img barely perturbs a
|
||||
flat region at low strength), so it needs AT LEAST controlnet's strength -- the
|
||||
certified floor is therefore the right floor for `sdxl` too. This is a MARGIN argument
|
||||
for `sdxl`, not a separate certification (no local SynthID detector to self-verify).
|
||||
The higher strength costs little quality where it matters, because `controlnet` is now
|
||||
the default pipeline, so `sdxl` is reached only via an explicit `--pipeline sdxl` (a
|
||||
deliberate opt-down), where over-regeneration has no faces/text to damage.
|
||||
This uses the vendor signal we DO have locally (the C2PA SynthID proxy) to avoid the
|
||||
overkill of a single high default on OpenAI images, without needing a local pixel
|
||||
detector. An explicit `--strength` always wins. If the watermark still survives (e.g. a
|
||||
large native Gemini beyond the capped-1536 validation), raise toward 0.35-0.40 (0.40
|
||||
visibly corrupts dense text), using the lowest value that reads clean on the oracle.
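The resolution order described above (explicit `--strength` first, then the vendor ladder, identical for both pipelines) can be sketched as follows. The constant names and the function names `resolve_strength` and `vendor_for_strength` appear in this section, but the signatures and the issuer string matching are assumptions for illustration only.

```python
from typing import Optional

OPENAI_STRENGTH = 0.20
GEMINI_STRENGTH = 0.30
UNKNOWN_STRENGTH = 0.30


def vendor_for_strength(c2pa_issuer: Optional[str]) -> str:
    """Map a C2PA issuer string to a vendor key (substring match is assumed)."""
    issuer = (c2pa_issuer or "").lower()
    if "openai" in issuer:
        return "openai"
    if "google" in issuer:
        return "google"
    return "unknown"


def resolve_strength(
    explicit: Optional[float],
    c2pa_issuer: Optional[str],
    pipeline: str = "controlnet",
) -> float:
    """Pick the denoising strength: an explicit value always wins."""
    if explicit is not None:
        return explicit
    # One ladder for both pipelines; `pipeline` is accepted but does not
    # change the result, matching the single-ladder design described above.
    ladder = {
        "openai": OPENAI_STRENGTH,
        "google": GEMINI_STRENGTH,
        "unknown": UNKNOWN_STRENGTH,
    }
    return ladder[vendor_for_strength(c2pa_issuer)]
```

Keeping `pipeline` in the signature while ignoring it makes the single-ladder decision visible at the call site and leaves room for a per-pipeline ladder if a separate `sdxl` certification is ever run.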
|
||||
|
||||
### 5.3 Test methodology
|
||||
|
||||
|
||||
Binary file not shown.
Binary file not shown.
@@ -1,270 +0,0 @@
"""Automatic pipeline planning for the ``--auto`` quality mode.

``plan(image_path)`` inspects the INPUT image (before the diffusion model loads)
and returns the quality modes to use, so the pipeline can adapt to content. It is
meant to run as the FIRST step of the invisible/all pipeline, wherever that pipeline
runs (locally, or the raiw.cc Modal GPU worker) -- never on a memory-constrained web
host (image work there OOM-crashes the container).

Routing is **quality-priority**: ControlNet (text/face-structure preservation) is the
default; it is only skipped for a clearly structure-less image (no face, no text,
near-zero edges), where plain SDXL is cheaper and just as good. A detected face only
routes to controlnet (canny preserves face STRUCTURE, not identity); there is no
identity restoration -- the whole face-restore family was removed (it regenerated the
face via SDXL and looked MORE AI-generated, see
docs/synthid-robust-identity-research-2026-06-08.md). When the controlnet smoothing
pass ran, the **adaptive polish** (``humanizer.adaptive_polish``) restores the input's
detail level -- a capped unsharp + edge-masked grain targeting the input's Laplacian
variance -- to counter the over-smoothed "AI look". It is self-limiting on
text/graphics (already high-frequency, so almost no polish) and spares text/edges by
masking the grain.

Detection is **cv2-only and torch-free**: OpenCV YuNet (``cv2.FaceDetectorYN``) for
faces -- a 232 KB MIT-licensed model bundled in ``assets/`` -- DBNet (PP-OCRv3
differentiable-binarization via ``cv2.dnn.TextDetectionModel_DB``, a 2.4 MB Apache-2.0
model bundled in ``assets/``) for text, and a Canny ``edge_density``. The whole planner
peaks ~100 MB RSS in a few ms, so it adds nothing meaningful to a GPU run and runs
anywhere the pipeline runs.

The text detector falls back to the old MSER region heuristic if the DBNet model can't
load. Either way text only ever ADDS controlnet, so a miss is backstopped by the
edge-density route and a false positive only costs a controlnet run.
"""

# cv2/numpy boundary: cv2 ships no usable element types; relax the unknown-type rules
# for this file only.
# pyright: reportUnknownMemberType=false, reportUnknownArgumentType=false, reportUnknownVariableType=false, reportUnknownParameterType=false, reportMissingTypeArgument=false, reportMissingTypeStubs=false, reportMissingImports=false, reportArgumentType=false, reportAssignmentType=false, reportReturnType=false, reportCallIssue=false, reportIndexIssue=false, reportOperatorIssue=false, reportOptionalMemberAccess=false, reportOptionalCall=false, reportOptionalSubscript=false, reportOptionalOperand=false, reportAttributeAccessIssue=false, reportPrivateImportUsage=false, reportPrivateUsage=false, reportInvalidTypeForm=false, reportConstantRedefinition=false, reportUnnecessaryComparison=false
from __future__ import annotations

import logging
from dataclasses import dataclass
from pathlib import Path
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from numpy.typing import NDArray

logger = logging.getLogger(__name__)

# ── Routing thresholds (tunable; quality-priority -> controlnet unless clearly flat) ──
# Canny edge-density below this, AND no face AND no text -> plain SDXL (nothing to
# preserve). The headshot measures ~0.022, a busy photo higher; only a near-flat
# gradient/solid image falls under 0.008.
_STRUCTURELESS_EDGE_MAX = 0.008
# MSER regions per megapixel above this -> likely text. The MSER path is now only the
# FALLBACK when the bundled DBNet model can't load; DBNet (below) is the primary text
# detector. Rough heuristic: a no-text portrait measures a few hundred/MP, dense text
# far more. Set high so it rarely false-fires; text only ever ADDS controlnet.
_TEXT_MSER_PER_MP = 1500.0
_FACE_SCORE = 0.6  # YuNet confidence for a face to count
# Downscale the long side to this for DETECTION only (faces stay detectable down to
# ~10px, and this bounds YuNet/DBNet/MSER cost on huge inputs). Removal runs at full res.
_DETECT_MAX_SIDE = 1024

# DBNet (PP-OCRv3 differentiable-binarization) text-region detector via cv2.dnn -- the
# primary "has meaningful text" signal. The model is the shared PP-OCRv3 detection net
# from OpenCV Zoo (Apache-2.0); en/cn variants are byte-identical, so it is bundled
# language-neutral. cv2.dnn is core OpenCV, so this adds NO new pip dependency.
_DBNET_ASSET = "text_detection_ppocrv3_2023may.onnx"  # Apache-2.0 (OpenCV Zoo PP-OCRv3 DB)
_DBNET_BINARY_THRESHOLD = 0.3
_DBNET_POLYGON_THRESHOLD = 0.5
_DBNET_MAX_CANDIDATES = 200
_DBNET_UNCLIP_RATIO = 2.0
_DBNET_INPUT_SIDE = 736  # square input, multiple of 32 (PP-OCRv3 default)
_DBNET_MEAN = (122.67891434, 116.66876762, 104.00698793)  # ImageNet mean * 255
_dbnet: Any = None  # lazy singleton; set to False after a load failure (-> MSER fallback)

# When the controlnet smoothing pass ran, the adaptive polish
# (humanizer.adaptive_polish) restores the input's detail level, sparing text --
# replacing the old fixed unsharp/grain which over-/under-corrected and speckled text.
_UPSCALE_FLOOR = 1024

_YUNET_ASSET = "face_detection_yunet_2023mar.onnx"  # MIT (Shiqi Yu), OpenCV Zoo
_yunet: Any = None  # lazy singleton


@dataclass(frozen=True)
class AutoConfig:
    """Resolved quality modes from content analysis (the ``--auto`` plan)."""

    pipeline: str  # "default" | "controlnet"
    adaptive_polish: bool  # restore the input's detail level (sharpen + masked grain), sparing text
    unsharp: float  # fixed-polish knobs, 0 in auto (the adaptive polish replaces them)
    humanize: float
    min_resolution: int
    # signals retained for logging / debugging a bad pick
    has_face: bool
    has_text: bool
    edge_density: float
    width: int
    height: int

    @property
    def reason(self) -> str:
        """One-line human-readable summary of the plan (logged per image)."""
        bits = ["face" if self.has_face else "no-face"]
        if self.has_text:
            bits.append("text")
        bits.append(f"edges={self.edge_density:.3f}")
        if self.adaptive_polish:
            polish = ", adaptive polish"
        elif self.unsharp or self.humanize:
            polish = f", unsharp {self.unsharp}/grain {self.humanize}"
        else:
            polish = ""
        return f"{'+'.join(bits)} -> {self.pipeline} pipeline{polish}"


def _to_bgr(image: NDArray[Any]) -> NDArray[Any]:
    """Normalize a 2D grayscale or 4-channel BGRA array to 3-channel BGR."""
    import cv2

    if image.ndim == 2:
        return cv2.cvtColor(image, cv2.COLOR_GRAY2BGR)
    if image.shape[2] == 4:
        return cv2.cvtColor(image, cv2.COLOR_BGRA2BGR)
    return image


def _to_gray(image: NDArray[Any]) -> NDArray[Any]:
    """Single-channel grayscale; passes a 2D (already-gray) input through unchanged."""
    import cv2

    if image.ndim == 3 and image.shape[2] >= 3:
        return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return image


def _downscale_for_detection(image: NDArray[Any]) -> NDArray[Any]:
    """Shrink the long side to ``_DETECT_MAX_SIDE`` for cheap, bounded detection."""
    import cv2

    h, w = image.shape[:2]
    long_side = max(h, w)
    if long_side <= _DETECT_MAX_SIDE:
        return image
    scale = _DETECT_MAX_SIDE / long_side
    return cv2.resize(image, (max(1, round(w * scale)), max(1, round(h * scale))), interpolation=cv2.INTER_AREA)


def detect_face(image: NDArray[Any]) -> bool:
    """True if OpenCV YuNet finds at least one face. cv2-only, torch-free."""
    import cv2

    global _yunet
    img = _to_bgr(image)
    h, w = img.shape[:2]
    if h < 1 or w < 1:
        return False
    try:
        if _yunet is None:
            model = Path(__file__).parent / "assets" / _YUNET_ASSET
            _yunet = cv2.FaceDetectorYN.create(str(model), "", (w, h), _FACE_SCORE, 0.3, 5000)
        _yunet.setInputSize((w, h))
        _, faces = _yunet.detect(img)
    except cv2.error as e:  # malformed input / model
        logger.debug("YuNet face detect failed (%s); assuming no face", e)
        return False
    return faces is not None and len(faces) > 0


def _detect_text_dbnet(image: NDArray[Any]) -> bool | None:
    """DBNet (PP-OCRv3) text-region presence via cv2.dnn.

    Returns True/False on a successful run, or None if the bundled model can't load
    (the caller then falls back to the MSER heuristic). Loads once, lazily.
    """
    import cv2

    global _dbnet
    if _dbnet is False:  # a prior load failed; skip straight to the MSER fallback
        return None
    img = _to_bgr(image)
    h, w = img.shape[:2]
    if h < 1 or w < 1:
        return False
    try:
        if _dbnet is None:
            model = Path(__file__).parent / "assets" / _DBNET_ASSET
            net = cv2.dnn.TextDetectionModel_DB(str(model))
            net.setBinaryThreshold(_DBNET_BINARY_THRESHOLD)
            net.setPolygonThreshold(_DBNET_POLYGON_THRESHOLD)
            net.setMaxCandidates(_DBNET_MAX_CANDIDATES)
            net.setUnclipRatio(_DBNET_UNCLIP_RATIO)
            net.setInputParams(1.0 / 255.0, (_DBNET_INPUT_SIDE, _DBNET_INPUT_SIDE), _DBNET_MEAN)
            _dbnet = net
        boxes, _ = _dbnet.detect(img)
    except Exception as e:  # model load / inference can raise cv2.error or others
        logger.debug("DBNet text detect failed (%s); falling back to MSER", e)
        _dbnet = False
        return None
    return boxes is not None and len(boxes) > 0


def _detect_text_mser(image: NDArray[Any]) -> bool:
    """Fallback MSER-based text-presence heuristic (used only if DBNet can't load)."""
    import cv2

    gray = _to_gray(image)
    h, w = gray.shape[:2]
    try:
        regions, _ = cv2.MSER_create().detectRegions(gray)
    except cv2.error:
        return False
    per_mp = len(regions) / max(1e-6, (h * w) / 1e6)
    return per_mp > _TEXT_MSER_PER_MP


def detect_text(image: NDArray[Any]) -> bool:
    """Text-presence: DBNet (cv2.dnn) when the bundled model loads, else the MSER heuristic."""
    dbnet = _detect_text_dbnet(image)
    return _detect_text_mser(image) if dbnet is None else dbnet


def edge_density(image: NDArray[Any]) -> float:
    """Fraction of Canny edge pixels -- a cheap 'has structure' proxy in [0, 1]."""
    import cv2

    gray = _to_gray(image)
    edges = cv2.Canny(gray, 100, 200)
    return float((edges > 0).mean())


|
||||
def plan(image_path: Path) -> AutoConfig | None:
|
||||
"""Inspect the input image and return the quality modes, or None if unreadable.
|
||||
|
||||
Pure analysis: loads the image, runs the cv2 detectors on a downscaled copy, and
|
||||
applies the quality-priority routing rules. Safe to call wherever the pipeline
|
||||
runs; no diffusion model is loaded.
|
||||
"""
|
||||
from remove_ai_watermarks import image_io
|
||||
|
||||
image = image_io.imread(image_path)
|
||||
if image is None:
|
||||
return None
|
||||
|
||||
h, w = image.shape[:2]
|
||||
small = _downscale_for_detection(image)
|
||||
gray = _to_gray(small) # convert once; edge density + the MSER fallback use gray
|
||||
has_face = detect_face(small) # YuNet needs the 3-channel image
|
||||
has_text = detect_text(small) # DBNet wants BGR; the MSER fallback grays it internally
|
||||
edges = edge_density(gray)
|
||||
|
||||
structureless = (not has_face) and (not has_text) and edges < _STRUCTURELESS_EDGE_MAX
|
||||
pipeline = "default" if structureless else "controlnet"
|
||||
smoothing = pipeline == "controlnet"
|
||||
|
||||
cfg = AutoConfig(
|
||||
pipeline=pipeline,
|
||||
adaptive_polish=smoothing, # adaptive (detail-targeted) polish when a smoothing pass ran
|
||||
unsharp=0.0,
|
||||
humanize=0.0,
|
||||
min_resolution=_UPSCALE_FLOOR,
|
||||
has_face=has_face,
|
||||
has_text=has_text,
|
||||
edge_density=edges,
|
||||
width=w,
|
||||
height=h,
|
||||
)
|
||||
logger.debug("auto plan for %s: %s", image_path, cfg.reason)
|
||||
return cfg
|
||||
+123
-81
@@ -18,7 +18,11 @@ from typing import TYPE_CHECKING, Any, Literal
|
||||
import click
|
||||
|
||||
from remove_ai_watermarks import __version__, watermark_registry
|
||||
from remove_ai_watermarks.noai.watermark_profiles import resolve_strength, vendor_for_strength
|
||||
from remove_ai_watermarks.noai.watermark_profiles import (
|
||||
resolve_strength,
|
||||
strength_default_help,
|
||||
vendor_for_strength,
|
||||
)
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Generator
|
||||
@@ -143,8 +147,8 @@ _controlnet_scale_option = click.option(
|
||||
"--controlnet-scale",
|
||||
type=float,
|
||||
default=1.0,
|
||||
help="ControlNet conditioning scale (structure/text preservation strength), controlnet pipeline "
|
||||
"only (EXPERIMENTAL).",
|
||||
help="ControlNet conditioning scale (structure/text preservation strength); "
|
||||
"applies to the controlnet pipeline (the default). Higher = closer to original structure.",
|
||||
)
|
||||
|
||||
_min_resolution_option = click.option(
|
||||
@@ -173,48 +177,103 @@ _auto_option = click.option(
|
||||
"--auto",
|
||||
is_flag=True,
|
||||
default=False,
|
||||
help="Auto-pick the pipeline and adaptive polish from image content. "
|
||||
"Every choice is overridable -- an explicit --pipeline / --adaptive-polish "
|
||||
"always wins. EXPERIMENTAL.",
|
||||
help="DEPRECATED no-op: controlnet is already the default pipeline and --adaptive-polish "
"is ON by default (the content detectors were removed), so --auto only prints a "
"deprecation warning. Use --no-adaptive-polish to turn the polish off.",
|
||||
)
|
||||
|
||||
_adaptive_polish_option = click.option(
|
||||
"--adaptive-polish/--no-adaptive-polish",
|
||||
default=False,
|
||||
default=True,
|
||||
help="Restore the input's detail level after removal (capped unsharp + edge-masked grain "
|
||||
"targeting the input's sharpness, sparing text). On by default under --auto; pass "
|
||||
"--no-adaptive-polish to disable it there, or --adaptive-polish to use it without --auto. "
|
||||
"Independent of the fixed --unsharp/--humanize. EXPERIMENTAL.",
|
||||
"targeting the input's sharpness, sparing text), countering the over-smoothed look. ON by "
|
||||
"default; it self-limits where there is no detail deficit (text/flat graphics), so it is a "
|
||||
"no-op there. Pass --no-adaptive-polish to disable. Independent of --unsharp/--humanize.",
|
||||
)
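The self-limiting behaviour described in the `--adaptive-polish` help can be sketched as a gate on Laplacian variance. This is an illustration only, not the engine's code: `laplacian_variance`, `detail_deficit` and `should_polish` are hypothetical names, and the pure-Python 4-neighbour Laplacian is an assumed stand-in for whatever the engine computes (the diff only says the polish targets the input's Laplacian variance).

```python
from __future__ import annotations

from statistics import pvariance

Grid = list[list[float]]  # hypothetical grayscale image as rows of pixel values


def laplacian_variance(img: Grid) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels (a sharpness proxy)."""
    h, w = len(img), len(img[0])
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses) if responses else 0.0


def detail_deficit(original: Grid, processed: Grid) -> float:
    """Sharpness lost by processing; 0.0 means there is nothing to restore."""
    return max(0.0, laplacian_variance(original) - laplacian_variance(processed))


def should_polish(original: Grid, processed: Grid) -> bool:
    """Gate: polish only when the output is softer than the input."""
    return detail_deficit(original, processed) > 0.0


flat = [[128.0] * 5 for _ in range(5)]
checker = [[float((x + y) % 2 * 255) for x in range(5)] for y in range(5)]
```

On a flat input the deficit is zero, so the gate stays closed and default-on is harmless; a textured input whose output came back flat opens it.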
|
||||
|
||||
# HuggingFace model + CFG knobs, shared by the diffusion commands (invisible/all/batch)
|
||||
# so the surface stays identical across them.
|
||||
_model_option = click.option(
|
||||
"--model",
|
||||
type=str,
|
||||
default=None,
|
||||
help="HuggingFace model ID for the diffusion pipeline. Default: the SDXL base checkpoint.",
|
||||
)
|
||||
_guidance_scale_option = click.option(
|
||||
"--guidance-scale",
|
||||
type=float,
|
||||
default=None,
|
||||
help="Classifier-free guidance scale (CFG). Default: 7.5 (the library default). "
|
||||
"Lower = follow the prompt less / stay closer to the input.",
|
||||
)
|
||||
|
||||
|
||||
def _apply_auto(
|
||||
ctx: click.Context,
|
||||
source: Path,
|
||||
pipeline: str,
|
||||
adaptive_polish: bool,
|
||||
) -> tuple[str, bool]:
|
||||
"""Resolve ``--auto``: plan the three content-adaptive modes (pipeline, face
|
||||
restore, adaptive polish) from the image, overriding only the ones the user left
|
||||
at their default (an explicit flag always wins). The fixed ``--unsharp``/
|
||||
``--humanize`` filters are independent and untouched. Prints the chosen plan.
|
||||
def _normalize_pipeline(ctx: click.Context, param: click.Parameter, value: str | None) -> str | None:
|
||||
"""Resolve the legacy ``default`` profile name to ``sdxl`` (click option callback).
|
||||
|
||||
Emits a one-line deprecation notice when the user explicitly passes the outdated
|
||||
``default`` value, pointing at the two current choices (``sdxl`` / ``controlnet``).
|
||||
"""
|
||||
from remove_ai_watermarks import auto_config
|
||||
if value is None:
|
||||
return None
|
||||
from remove_ai_watermarks.noai.watermark_profiles import normalize_profile
|
||||
|
||||
cfg = auto_config.plan(source)
|
||||
if cfg is None:
|
||||
console.print(" Auto: could not read image; using defaults")
|
||||
return pipeline, adaptive_polish
|
||||
normalized = normalize_profile(value)
|
||||
if value.strip().lower() == "default":
|
||||
click.echo(
|
||||
"Warning: --pipeline default is deprecated and maps to 'sdxl'. "
|
||||
"Use --pipeline sdxl (plain SDXL) or --pipeline controlnet (the default).",
|
||||
err=True,
|
||||
)
|
||||
return normalized
|
||||
|
||||
def _is_default(name: str) -> bool:
|
||||
return ctx.get_parameter_source(name) == click.core.ParameterSource.DEFAULT
|
||||
|
||||
if _is_default("pipeline"):
|
||||
pipeline = cfg.pipeline
|
||||
if _is_default("adaptive_polish"):
|
||||
adaptive_polish = cfg.adaptive_polish
|
||||
console.print(f" Auto: {cfg.reason}")
|
||||
return pipeline, adaptive_polish
|
||||
# ``controlnet`` (the default-SELECTED value) and ``sdxl`` (plain SDXL img2img) are the
|
||||
# two current profiles; ``default`` is an OUTDATED back-compat alias for ``sdxl``
|
||||
# (warned + normalized away by _normalize_pipeline).
|
||||
_PIPELINE_CHOICES = ["sdxl", "controlnet", "default"]
|
||||
_PIPELINE_HELP = (
|
||||
"Pipeline profile. controlnet (DEFAULT) = SDXL + canny ControlNet that preserves "
|
||||
"text/faces via edge conditioning while removing SynthID; sdxl = plain SDXL img2img "
|
||||
"(lighter, no extra model download, but leaves SynthID on flat-graphic content). "
|
||||
"('default' is an OUTDATED alias for 'sdxl' -- use sdxl or controlnet.)"
|
||||
)
|
||||
|
||||
# Shared --pipeline / --strength decorators so the three diffusion commands
|
||||
# (invisible/all/batch) keep an identical surface and the strength help can never
|
||||
# drift from the watermark_profiles constants (strength_default_help derives it).
|
||||
_pipeline_option = click.option(
|
||||
"--pipeline",
|
||||
type=click.Choice(_PIPELINE_CHOICES),
|
||||
default="controlnet",
|
||||
callback=_normalize_pipeline,
|
||||
help=_PIPELINE_HELP,
|
||||
)
|
||||
_strength_option = click.option(
|
||||
"--strength",
|
||||
type=float,
|
||||
default=None,
|
||||
help=f"Denoising strength (0.0-1.0). Default: {strength_default_help()}.",
|
||||
)
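The shared-option idea above (define each option once, attach it to every diffusion command, derive the help text from the constants) can be sketched without click. This swaps in a different technique, `argparse` parent parsers from the standard library, purely so the example runs with no dependencies; `build_parser` and the `demo` program name are hypothetical.

```python
import argparse

# Values copied from the strength ladder in this diff.
OPENAI_STRENGTH, GEMINI_STRENGTH, UNKNOWN_STRENGTH = 0.20, 0.30, 0.30


def strength_default_help() -> str:
    """Help text derived from the constants, so it cannot drift from them."""
    return (
        f"vendor-adaptive (OpenAI {OPENAI_STRENGTH} / Google {GEMINI_STRENGTH} / "
        f"unknown {UNKNOWN_STRENGTH})"
    )


def build_parser() -> argparse.ArgumentParser:
    # One parent parser holds the shared options; every subcommand inherits it.
    shared = argparse.ArgumentParser(add_help=False)
    shared.add_argument("--pipeline", choices=["sdxl", "controlnet", "default"], default="controlnet")
    shared.add_argument(
        "--strength",
        type=float,
        default=None,
        help=f"Denoising strength (0.0-1.0). Default: {strength_default_help()}.",
    )
    parser = argparse.ArgumentParser(prog="demo")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("invisible", "all", "batch"):
        sub.add_parser(name, parents=[shared])
    return parser


args = build_parser().parse_args(["batch", "--strength", "0.25"])
```

Leaving `--strength` at `None` rather than a number is what lets the library tell "not passed" apart from an explicit value.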
|
||||
|
||||
|
||||
def _resolve_auto_polish(auto: bool, adaptive_polish: bool) -> bool:
|
||||
"""Warn on the retired ``--auto`` flag, returning ``adaptive_polish`` unchanged.
|
||||
|
||||
``--auto`` used to plan the pipeline and polish from content detection. controlnet
is now the default pipeline and the adaptive polish is ON by default (it self-gates
by detail level), so the content detectors were removed and ``--auto`` is a no-op:
it only emits a deprecation warning and passes ``adaptive_polish`` through, so an
explicit ``--no-adaptive-polish`` still wins.
|
||||
"""
|
||||
if auto:
|
||||
click.echo(
|
||||
"Warning: --auto is deprecated and now does nothing (the adaptive polish it "
|
||||
"enabled is ON by default). Use --no-adaptive-polish to turn the polish off.",
|
||||
err=True,
|
||||
)
|
||||
return adaptive_polish
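The two deprecation paths in this file (the legacy `default` pipeline name and the retired `--auto` flag) follow the same pattern: warn on stderr, then return a value that keeps old invocations working. A minimal stdlib-only sketch, assuming hypothetical standalone functions rather than the click callbacks used in the diff:

```python
from __future__ import annotations

import sys

# Alias table as in the diff; the function names below are illustrative stand-ins.
_PROFILE_ALIASES = {"default": "sdxl"}


def normalize_pipeline(value: str | None) -> str | None:
    """Map the legacy ``default`` name to ``sdxl``, warning when it is used."""
    if value is None:
        return None
    key = value.strip().lower()
    if key in _PROFILE_ALIASES:
        print(
            f"Warning: --pipeline {key} is deprecated and maps to '{_PROFILE_ALIASES[key]}'.",
            file=sys.stderr,
        )
    return _PROFILE_ALIASES.get(key, key)


def resolve_auto_polish(auto: bool, adaptive_polish: bool) -> bool:
    """Retired flag: warn if it was passed, then return the polish setting untouched."""
    if auto:
        print("Warning: --auto is deprecated and now does nothing.", file=sys.stderr)
    return adaptive_polish
```

Because `resolve_auto_polish` never changes its second argument, `--auto --no-adaptive-polish` still disables the polish.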
|
||||
|
||||
|
||||
def _warn_if_esrgan_unavailable(upscaler: str) -> None:
|
||||
@@ -524,21 +583,9 @@ def cmd_erase(
|
||||
@click.option(
|
||||
"-o", "--output", type=click.Path(path_type=Path), default=None, help="Output path (default: <source>_clean.<ext>)."
|
||||
)
|
||||
@click.option(
|
||||
"--strength",
|
||||
type=float,
|
||||
default=None,
|
||||
help="Denoising strength (0.0-1.0). Default: vendor-adaptive (OpenAI 0.10 / Google 0.15 / "
|
||||
"unknown 0.15, from the C2PA issuer).",
|
||||
)
|
||||
@_strength_option
|
||||
@click.option("--steps", type=int, default=50, help="Number of denoising steps. Default: 50.")
|
||||
@click.option(
|
||||
"--pipeline",
|
||||
type=click.Choice(["default", "controlnet"]),
|
||||
default="default",
|
||||
help="Pipeline profile (default=SDXL img2img; controlnet=SDXL + canny ControlNet that preserves "
|
||||
"text/faces via edge conditioning while removing SynthID, EXPERIMENTAL).",
|
||||
)
|
||||
@_pipeline_option
|
||||
@click.option(
|
||||
"--device",
|
||||
type=click.Choice(["auto", "cpu", "mps", "cuda", "xpu"]),
|
||||
@@ -560,6 +607,8 @@ def cmd_erase(
|
||||
@_min_resolution_option
|
||||
@_unsharp_option
|
||||
@_upscaler_option
|
||||
@_model_option
|
||||
@_guidance_scale_option
|
||||
@_auto_option
|
||||
@_adaptive_polish_option
|
||||
@click.pass_context
|
||||
@@ -579,6 +628,8 @@ def cmd_invisible(
|
||||
min_resolution: int,
|
||||
controlnet_scale: float,
|
||||
upscaler: str,
|
||||
model: str | None,
|
||||
guidance_scale: float | None,
|
||||
auto: bool,
|
||||
adaptive_polish: bool,
|
||||
) -> None:
|
||||
@@ -599,8 +650,7 @@ def cmd_invisible(
|
||||
|
||||
source = _validate_image(source)
|
||||
_warn_if_esrgan_unavailable(upscaler)
|
||||
if auto:
|
||||
pipeline, adaptive_polish = _apply_auto(ctx, source, pipeline, adaptive_polish)
|
||||
adaptive_polish = _resolve_auto_polish(auto, adaptive_polish)
|
||||
if output is None:
|
||||
output = source.with_stem(source.stem + "_clean")
|
||||
|
||||
@@ -610,6 +660,7 @@ def cmd_invisible(
|
||||
console.print(f" {msg}")
|
||||
|
||||
engine = InvisibleEngine(
|
||||
model_id=model,
|
||||
device=device_str,
|
||||
pipeline=pipeline,
|
||||
hf_token=hf_token,
|
||||
@@ -630,7 +681,7 @@ def cmd_invisible(
|
||||
output_path=output,
|
||||
strength=strength,
|
||||
num_inference_steps=steps,
|
||||
guidance_scale=None,
|
||||
guidance_scale=guidance_scale,
|
||||
seed=seed,
|
||||
humanize=humanize,
|
||||
unsharp=unsharp,
|
||||
@@ -781,21 +832,10 @@ def cmd_identify(ctx: click.Context, source: Path, no_visible: bool, as_json: bo
|
||||
@click.option(
|
||||
"--inpaint-method", type=click.Choice(["ns", "telea", "gaussian"]), default="ns", help="Inpainting method."
|
||||
)
|
||||
@click.option(
|
||||
"--strength",
|
||||
type=float,
|
||||
default=None,
|
||||
help="Invisible watermark denoising strength. Default: vendor-adaptive (OpenAI 0.10 / Google 0.15 / unknown 0.15).",
|
||||
)
|
||||
@_strength_option
|
||||
@click.option("--steps", type=int, default=50, help="Number of denoising steps for invisible removal.")
|
||||
@click.option(
|
||||
"--pipeline",
|
||||
type=click.Choice(["default", "controlnet"]),
|
||||
default="default",
|
||||
help="Pipeline profile (default=SDXL img2img; controlnet=SDXL + canny ControlNet that preserves "
|
||||
"text/faces via edge conditioning while removing SynthID, EXPERIMENTAL).",
|
||||
)
|
||||
@click.option("--model", type=str, default=None, help="HuggingFace model ID for invisible removal.")
|
||||
@_pipeline_option
|
||||
@_model_option
|
||||
@click.option(
|
||||
"--device",
|
||||
type=click.Choice(["auto", "cpu", "mps", "cuda", "xpu"]),
|
||||
@@ -817,6 +857,7 @@ def cmd_identify(ctx: click.Context, source: Path, no_visible: bool, as_json: bo
|
||||
@_min_resolution_option
|
||||
@_unsharp_option
|
||||
@_upscaler_option
|
||||
@_guidance_scale_option
|
||||
@_auto_option
|
||||
@_adaptive_polish_option
|
||||
@click.pass_context
|
||||
@@ -839,6 +880,7 @@ def cmd_all(
|
||||
min_resolution: int,
|
||||
controlnet_scale: float,
|
||||
upscaler: str,
|
||||
guidance_scale: float | None,
|
||||
auto: bool,
|
||||
adaptive_polish: bool,
|
||||
) -> None:
|
||||
@@ -856,8 +898,7 @@ def cmd_all(
|
||||
_banner()
|
||||
source = _validate_image(source)
|
||||
_warn_if_esrgan_unavailable(upscaler)
|
||||
if auto:
|
||||
pipeline, adaptive_polish = _apply_auto(ctx, source, pipeline, adaptive_polish)
|
||||
adaptive_polish = _resolve_auto_polish(auto, adaptive_polish)
|
||||
|
||||
if output is None:
|
||||
output = source.with_stem(source.stem + "_clean")
|
||||
@@ -937,6 +978,7 @@ def cmd_all(
|
||||
output_path=tmp_path,
|
||||
strength=strength,
|
||||
num_inference_steps=steps,
|
||||
guidance_scale=guidance_scale,
|
||||
seed=seed,
|
||||
humanize=humanize,
|
||||
unsharp=unsharp,
|
||||
@@ -1001,7 +1043,8 @@ def _process_batch_image(
|
||||
min_resolution: int = 1024,
|
||||
controlnet_scale: float = 1.0,
|
||||
upscaler: str = "lanczos",
|
||||
auto: bool = False,
|
||||
model: str | None = None,
|
||||
guidance_scale: float | None = None,
|
||||
adaptive_polish: bool = False,
|
||||
) -> None:
|
||||
"""Process a single image for batch mode.
|
||||
@@ -1048,14 +1091,12 @@ def _process_batch_image(
|
||||
if invisible_available():
|
||||
from remove_ai_watermarks.invisible_engine import InvisibleEngine
|
||||
|
||||
# --auto re-plans the pipeline / face-restore / polish per image; only the
|
||||
# pipeline choice changes the engine ctor, so cache one engine per pipeline
|
||||
# (controlnet vs default) rather than a single shared instance.
|
||||
if auto:
|
||||
pipeline, adaptive_polish = _apply_auto(ctx, img_path, pipeline, adaptive_polish)
|
||||
# Cache the engine in ctx.obj so the batch builds it once (pipeline is a
|
||||
# single CLI value, constant across the run).
|
||||
engines = ctx.obj.setdefault("_inv_engines", {})
|
||||
if pipeline not in engines:
|
||||
engines[pipeline] = InvisibleEngine(
|
||||
model_id=model,
|
||||
device=None if device == "auto" else device,
|
||||
pipeline=pipeline,
|
||||
hf_token=hf_token,
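The build-once caching used here can be sketched in isolation. `get_engine` and `fake_factory` are hypothetical helpers, and a plain dict stands in for `ctx.obj`; the real constructor arguments are omitted.

```python
from __future__ import annotations

from typing import Any, Callable


def get_engine(cache: dict[str, Any], pipeline: str, factory: Callable[[str], Any]) -> Any:
    """Return the cached engine for ``pipeline``, building it on first use only."""
    if pipeline not in cache:
        cache[pipeline] = factory(pipeline)
    return cache[pipeline]


built: list[str] = []  # records every (expensive) construction


def fake_factory(pipeline: str) -> object:
    built.append(pipeline)
    return object()


ctx_obj: dict[str, Any] = {}  # stand-in for click's ctx.obj
engines = ctx_obj.setdefault("_inv_engines", {})
first = get_engine(engines, "controlnet", fake_factory)
second = get_engine(engines, "controlnet", fake_factory)
```

Keying the cache by pipeline name costs nothing when the pipeline is a single CLI value, and it keeps the lookup correct if that ever changes.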
|
||||
@@ -1067,6 +1108,7 @@ def _process_batch_image(
|
||||
out_path,
|
||||
strength=strength,
|
||||
num_inference_steps=steps,
|
||||
guidance_scale=guidance_scale,
|
||||
seed=seed,
|
||||
humanize=humanize,
|
||||
unsharp=unsharp,
|
||||
@@ -1104,19 +1146,13 @@ def _process_batch_image(
|
||||
@click.option(
|
||||
"--mode", type=click.Choice(["visible", "invisible", "metadata", "all"]), default="visible", help="Processing mode."
|
||||
)
|
||||
@click.option("--strength", type=float, default=None, help="Denoising strength (invisible mode).")
|
||||
@_strength_option
|
||||
@click.option("--steps", type=int, default=50, help="Number of denoising steps (invisible mode).")
|
||||
@click.option("--inpaint/--no-inpaint", default=True, help="Apply inpainting (visible mode).")
|
||||
@click.option(
|
||||
"--humanize", type=float, default=0.0, help="Analog Humanizer film grain intensity (0 = off, typical: 2.0-6.0)."
|
||||
)
|
||||
@click.option(
|
||||
"--pipeline",
|
||||
type=click.Choice(["default", "controlnet"]),
|
||||
default="default",
|
||||
help="Pipeline profile (default=SDXL img2img; controlnet=SDXL + canny ControlNet that preserves "
|
||||
"text/faces via edge conditioning while removing SynthID, EXPERIMENTAL).",
|
||||
)
|
||||
@_pipeline_option
|
||||
@click.option(
|
||||
"--device",
|
||||
type=click.Choice(["auto", "cpu", "mps", "cuda", "xpu"]),
|
||||
@@ -1135,6 +1171,8 @@ def _process_batch_image(
|
||||
@_unsharp_option
|
||||
@_upscaler_option
|
||||
@_controlnet_scale_option
|
||||
@_model_option
|
||||
@_guidance_scale_option
|
||||
@_auto_option
|
||||
@_adaptive_polish_option
|
||||
@click.pass_context
|
||||
@@ -1156,6 +1194,8 @@ def cmd_batch(
|
||||
min_resolution: int,
|
||||
controlnet_scale: float,
|
||||
upscaler: str,
|
||||
model: str | None,
|
||||
guidance_scale: float | None,
|
||||
auto: bool,
|
||||
adaptive_polish: bool,
|
||||
) -> None:
|
||||
@@ -1177,6 +1217,7 @@ def cmd_batch(
|
||||
console.print(f" Mode: {mode}")
|
||||
if mode in ("invisible", "all"):
|
||||
_warn_if_esrgan_unavailable(upscaler)
|
||||
adaptive_polish = _resolve_auto_polish(auto, adaptive_polish)
|
||||
|
||||
processed = 0
|
||||
errors = 0
|
||||
@@ -1214,7 +1255,8 @@ def cmd_batch(
|
||||
min_resolution=min_resolution,
|
||||
controlnet_scale=controlnet_scale,
|
||||
upscaler=upscaler,
|
||||
auto=auto,
|
||||
model=model,
|
||||
guidance_scale=guidance_scale,
|
||||
adaptive_polish=adaptive_polish,
|
||||
)
|
||||
processed += 1
|
||||
|
||||
@@ -89,7 +89,7 @@ class InvisibleEngine:
|
||||
self,
|
||||
model_id: str | None = None,
|
||||
device: str | None = None,
|
||||
pipeline: str = "default",
|
||||
pipeline: str = "controlnet",
|
||||
hf_token: str | None = None,
|
||||
progress_callback: Callable[[str], None] | None = None,
|
||||
controlnet_conditioning_scale: float = 1.0,
|
||||
@@ -99,9 +99,10 @@ class InvisibleEngine:
|
||||
Args:
|
||||
model_id: HuggingFace model ID. None = use the SDXL base default.
|
||||
device: Device for inference (auto/cpu/mps/cuda/xpu). None = auto.
|
||||
pipeline: Pipeline profile. "default" (plain SDXL img2img) or
|
||||
"controlnet" (SDXL + canny ControlNet that preserves text/face
|
||||
structure via edge conditioning while removing SynthID).
|
||||
pipeline: Pipeline profile. "controlnet" (DEFAULT; SDXL + canny ControlNet
|
||||
that preserves text/face structure via edge conditioning while removing
|
||||
SynthID) or "sdxl" (plain SDXL img2img, lighter but leaves SynthID on
|
||||
flat-graphic content). "default" is a back-compat alias for "sdxl".
|
||||
hf_token: HuggingFace API token.
|
||||
progress_callback: Optional callback for progress messages.
|
||||
controlnet_conditioning_scale: ControlNet structure-preservation
|
||||
@@ -182,12 +183,11 @@ class InvisibleEngine:
|
||||
unsharp: Final unsharp-mask sharpening strength (0 = off, default).
|
||||
Applied last to counter the soft / over-smoothed look of the
|
||||
diffusion pass; ~0.5-0.8 is a safe range, higher risks edge halos.
|
||||
adaptive_polish: When True (the --auto mode default), restore the input's
|
||||
detail level in the softened output instead of fixed unsharp/humanize:
|
||||
a capped unsharp + edge-masked grain targeting the input's Laplacian
|
||||
variance (self-limiting on text/graphics). Runs LAST, after face
|
||||
restoration. The fixed ``humanize``/``unsharp`` knobs are normally 0
|
||||
when this is on.
|
||||
adaptive_polish: When True (the CLI default), restore the input's detail
|
||||
level in the softened output: a capped unsharp + edge-masked grain
|
||||
targeting the input's Laplacian variance. Self-limiting -- a no-op when
|
||||
the output already meets the input's detail level (text/flat graphics),
|
||||
so it only acts on over-smoothed photo/face texture. Runs LAST.
|
||||
max_resolution: Cap the long side (px) before diffusion. 0 (default)
|
||||
= no cap. Set a positive value only to bound GPU/MPS memory on
|
||||
very large inputs (it reintroduces a lossy downscale->upscale
|
||||
@@ -316,8 +316,8 @@ class InvisibleEngine:
|
||||
self._progress_callback(f"Sharpening (unsharp mask: {unsharp})...")
|
||||
image_io.imwrite(out_path, unsharp_mask(out_cv, amount=unsharp))
|
||||
|
||||
# Adaptive polish (--auto): restore the input's detail level in the softened
|
||||
# output, sparing text/edges. Replaces the fixed unsharp/humanize knobs.
|
||||
# Adaptive polish (CLI default): restore the input's detail level in the
|
||||
# softened output, sparing text/edges. Self-limiting where there is no deficit.
|
||||
if adaptive_polish:
|
||||
import cv2
|
||||
import numpy as np
|
||||
|
||||
@@ -12,34 +12,56 @@ if TYPE_CHECKING:
|
||||
|
||||
DEFAULT_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
|
||||
|
||||
# Canonical pipeline-profile names + the back-compat alias. The plain SDXL img2img
|
||||
# profile is ``sdxl``; ``default`` is kept as an accepted alias (it was the profile's
|
||||
# name before ``controlnet`` became the default-selected pipeline, 2026-06-09).
|
||||
SDXL_PROFILE = "sdxl"
|
||||
CONTROLNET_PROFILE = "controlnet"
|
||||
_PROFILE_ALIASES = {"default": SDXL_PROFILE}
|
||||
|
||||
|
||||
def normalize_profile(profile: str) -> str:
|
||||
"""Canonicalize a pipeline-profile name, resolving the ``default`` -> ``sdxl`` alias."""
|
||||
normalized = profile.strip().lower()
|
||||
return _PROFILE_ALIASES.get(normalized, normalized)
|
||||
|
||||
|
||||
# The SDXL-native canny ControlNet used by the ``controlnet`` pipeline. The
|
||||
# ControlNet is an add-on to the SDXL base checkpoint (DEFAULT_MODEL_ID), not a
|
||||
# separate base model, so both the ``default`` and ``controlnet`` profiles load
|
||||
# the same base weights and share the same vendor-adaptive strength.
|
||||
# separate base model, so both the ``sdxl`` and ``controlnet`` profiles load the
|
||||
# same base weights and share the same vendor-adaptive strength ladder (see below).
|
||||
CONTROLNET_CANNY_MODEL = "xinsir/controlnet-canny-sdxl-1.0"
|
||||
|
||||
# Vendor-adaptive default denoising strength for the SDXL img2img scrub, overridable
|
||||
# from the CLI (`--strength`). The right strength depends on which vendor's SynthID is
|
||||
# present, detected from the C2PA issuer (metadata.synthid_source). Oracle-verified
|
||||
# controlled study (2026-06-01, clean v0.8.6, per-image openai.com/verify or Gemini-app
|
||||
# verdict; see docs/synthid.md section 2.2):
|
||||
# - OpenAI gpt-image: removed at 0.05 across 1024-1600 (n=4), resolution-independent.
|
||||
# OPENAI_STRENGTH 0.10 = the 0.05 floor plus a 2x margin (keeps quality high).
|
||||
# - Google Gemini: removed at 0.15 on the capped-1536 path (n=4); 0.05/0.10 do NOT
|
||||
# clear. GEMINI_STRENGTH 0.15. CAVEAT: 0.15 was validated only on
|
||||
# `--max-resolution 1536`; native 2816 (the default path) was not locally
|
||||
# measurable (OOM on Apple Silicon) and may need more -- pending GPU validation on
|
||||
# the raiw.cc backend. If a native large Gemini still verifies positive at 0.15,
|
||||
# raise `--strength`.
|
||||
# - Unknown vendor (metadata stripped, or non-OpenAI/Google C2PA): UNKNOWN_STRENGTH
|
||||
# 0.15, the safe middle that clears both vendors at the tested resolutions.
|
||||
# The dominant factor is VENDOR, not resolution: Google's SynthID is ~3x more robust
|
||||
# than OpenAI's. The ``controlnet`` pipeline shares these strengths (same SDXL base; the
|
||||
# canny ControlNet only preserves structure, the strength still drives removal).
|
||||
OPENAI_STRENGTH = 0.10
|
||||
GEMINI_STRENGTH = 0.15
|
||||
UNKNOWN_STRENGTH = 0.15
|
||||
# Backwards-compatible alias: the vendor-unknown default (what a caller gets without a
|
||||
# present (detected from the C2PA issuer, metadata.synthid_source). The SAME ladder
# applies to BOTH pipelines (`sdxl` plain img2img and `controlnet`) -- see "why one
# ladder" below.
#
# Data basis (see docs/synthid.md sections 2.2 / 5.5): the values are the
# ORACLE-CERTIFIED controlnet floors (2026-06-04, isolated Modal cert app, each vendor
# on its own verifier): OpenAI 0.20 (2 photoreal x 3 seeds = 6/6 clean,
# resolution-independent), Google 0.30 (clean on 2/2 seeds, validated ONLY at <= 1536 --
# Gemini is resolution-sensitive, native ~2816 likely needs ~0.35+). Unknown vendor gets
# the Google (more robust watermark) value: safe-by-default.
#
# Why ONE ladder for both pipelines (2026-06-09): the certification was run on
# controlnet, and it does NOT transfer to `sdxl` by symmetry -- the two pipelines have
# OPPOSITE hard cases (controlnet leaves SynthID on photoreal, `sdxl` leaves it on flat
# graphics; the content-x-pipeline table in docs/synthid.md §5.1). BUT on its OWN hard
# case (flat fills) `sdxl` is the WEAKER remover -- plain img2img at low strength barely
# perturbs a flat region -- so it needs AT LEAST as much strength as controlnet, not
# less. Hence the certified controlnet floor is the right floor for `sdxl` too. The
# higher strength costs little quality where it matters: `controlnet` is now the default
# pipeline, so `sdxl` is reached only through an explicit `--pipeline sdxl` (the retired
# `--auto` no longer routes to it), which is meant for inputs where over-regeneration
# has no faces/text to damage. NOTE: this is a MARGIN argument for `sdxl`, not a fresh
# certification -- there is no local SynthID detector, so if an oracle still reads
# SynthID on a flat `sdxl` output, raise `--strength`.
|
||||
OPENAI_STRENGTH = 0.20
|
||||
GEMINI_STRENGTH = 0.30
|
||||
UNKNOWN_STRENGTH = 0.30
|
||||
# Backwards-compatible alias: the vendor-unknown value (what a caller gets without a
|
||||
# detected vendor). Kept as DEFAULT_STRENGTH for existing references.
|
||||
DEFAULT_STRENGTH = UNKNOWN_STRENGTH
|
||||
|
||||
@@ -47,17 +69,29 @@ DEFAULT_STRENGTH = UNKNOWN_STRENGTH
|
||||
_VENDOR_STRENGTH = {"openai": OPENAI_STRENGTH, "google": GEMINI_STRENGTH}
|
||||
|
||||
|
||||
def strength_default_help() -> str:
|
||||
"""One-line description of the vendor-adaptive default, derived from the constants.
|
||||
|
||||
Single source of truth for the CLI ``--strength`` help so the numbers can never
|
||||
drift from the actual ladder (they did once when the per-pipeline split was unified).
|
||||
"""
|
||||
return (
|
||||
f"vendor-adaptive (OpenAI {OPENAI_STRENGTH} / Google {GEMINI_STRENGTH} / "
|
||||
f"unknown {UNKNOWN_STRENGTH}, from the C2PA issuer; same ladder for both pipelines)"
|
||||
)
|
||||
|
||||
|
||||
def resolve_strength(strength: float | None, vendor: str | None = None) -> float:
|
||||
"""Resolve the denoising strength, applying the vendor default when unset.
|
||||
|
||||
``None`` means "the user did not pass ``--strength``", which resolves
|
||||
**vendor-adaptively**: ``vendor`` (``"openai"`` / ``"google"`` / None, from
|
||||
``vendor_for_strength``) selects ``OPENAI_STRENGTH`` / ``GEMINI_STRENGTH`` /
|
||||
``UNKNOWN_STRENGTH``. An explicit value always wins (including ``0.0`` -- the check
|
||||
is ``is None``, not falsiness). The ``default`` and ``controlnet`` profiles share
|
||||
the same SDXL base (the ControlNet only preserves structure), so the default does
|
||||
NOT depend on the profile. Shared by the CLI (for display) and the engine (for
|
||||
execution) so the two never disagree -- both must pass the SAME ``vendor``.
|
||||
``UNKNOWN_STRENGTH``. The same ladder applies to both pipelines (see the module
|
||||
comment for why one ladder is correct). An explicit value always wins (including
|
||||
``0.0`` -- the check is ``is None``, not falsiness). Shared by the CLI (for display)
|
||||
and the engine (for execution) so the two never disagree -- both must pass the SAME
|
||||
``vendor``.
|
||||
"""
|
||||
if strength is not None:
|
||||
return strength
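The resolution rule in the docstring can be sketched end to end. The constants are copied from this diff; the fallback lookup on the last line is an ASSUMPTION about the part of `resolve_strength` the hunk does not show.

```python
from __future__ import annotations

OPENAI_STRENGTH = 0.20
GEMINI_STRENGTH = 0.30
UNKNOWN_STRENGTH = 0.30
_VENDOR_STRENGTH = {"openai": OPENAI_STRENGTH, "google": GEMINI_STRENGTH}


def resolve_strength(strength: float | None, vendor: str | None = None) -> float:
    """Explicit value wins; otherwise pick the vendor's floor, else the unknown floor."""
    if strength is not None:  # `is not None`, so an explicit 0.0 is respected
        return strength
    # ASSUMPTION: the elided remainder of the real function falls back like this.
    return _VENDOR_STRENGTH.get(vendor, UNKNOWN_STRENGTH)
```

A falsiness check (`if strength:`) would silently replace an explicit `0.0` with the vendor default, which is why the docstring calls out `is None`.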
|
||||
@@ -90,11 +124,11 @@ def vendor_for_strength(image_path: Path) -> Literal["openai", "google"] | None:
|
||||
def get_model_id_for_profile(profile: str) -> str:
|
||||
"""Map CLI model profile names to concrete Hugging Face model IDs.
|
||||
|
||||
Both ``default`` and ``controlnet`` use the SDXL base checkpoint -- the canny
|
||||
Both ``sdxl`` and ``controlnet`` use the SDXL base checkpoint -- the canny
|
||||
ControlNet (``CONTROLNET_CANNY_MODEL``) is an add-on loaded on top of it, not a
|
||||
separate base model.
|
||||
separate base model. The legacy ``default`` alias resolves to ``sdxl``.
|
||||
"""
|
||||
normalized = profile.strip().lower()
|
||||
if normalized in ("default", "controlnet"):
|
||||
normalized = normalize_profile(profile)
|
||||
if normalized in (SDXL_PROFILE, CONTROLNET_PROFILE):
|
||||
return DEFAULT_MODEL_ID
|
||||
raise ValueError(f"Unknown model profile '{profile}'. Use one of: default, controlnet.")
|
||||
raise ValueError(f"Unknown model profile '{profile}'. Use one of: sdxl, controlnet.")
|
||||
|
||||
@@ -1,13 +1,17 @@
|
||||
"""Watermark removal using diffusion model regeneration attack.
|
||||
|
||||
Two pipelines:
|
||||
1. ``default`` -- plain SDXL img2img. Partial-noise regeneration scrubs the
|
||||
invisible watermark; ``strength`` controls how much is regenerated.
|
||||
2. ``controlnet`` -- SDXL img2img with a canny ControlNet. The watermark REMOVAL
|
||||
still comes from the img2img regeneration (``strength``); the ControlNet only
|
||||
PRESERVES structure (text/faces) by conditioning on the edge map. No original
|
||||
pixels are ever copied or frozen, so SynthID does not survive.
|
||||
1. ``controlnet`` (DEFAULT) -- SDXL img2img with a canny ControlNet. The watermark
REMOVAL still comes from the img2img regeneration (``strength``); the ControlNet
only PRESERVES structure (text/faces) by conditioning on the edge map. No original
pixels are ever copied or frozen. Because the edge map keeps the regeneration
closer to the original, it needs a higher ``strength`` floor than ``sdxl`` to
destroy SynthID (the certified controlnet ladder; see ``watermark_profiles``).
``controlnet_conditioning_scale`` is the preservation knob.
2. ``sdxl`` (legacy alias: ``default``) -- plain SDXL img2img. Partial-noise
regeneration scrubs the invisible watermark; ``strength`` controls how much is
regenerated. Lighter (no ControlNet weights), but at the low default strength it
leaves SynthID on flat-graphic content -- use it for inputs without text/faces.
|
||||
"""
|
||||
|
||||
# torch/diffusers/cv2 boundary: these libs ship no usable types for the tensor and
|
||||
@@ -32,6 +36,7 @@ from remove_ai_watermarks.noai.watermark_profiles import (
|
||||
CONTROLNET_CANNY_MODEL,
|
||||
DEFAULT_MODEL_ID,
|
||||
DEFAULT_STRENGTH,
|
||||
normalize_profile,
|
||||
resolve_strength,
|
||||
)
|
||||
|
||||
@@ -323,13 +328,14 @@ class WatermarkRemover:
|
||||
torch_dtype: Any = None,
|
||||
progress_callback: Callable[[str], None] | None = None,
|
||||
hf_token: str | None = None,
|
||||
pipeline: str = "default",
|
||||
pipeline: str = "controlnet",
|
||||
controlnet_conditioning_scale: float = 1.0,
|
||||
) -> None:
|
||||
self.model_id = model_id or self.DEFAULT_MODEL_ID
|
||||
# The pipeline profile is threaded explicitly (not inferred from model_id):
|
||||
# both "default" and "controlnet" use the same SDXL base checkpoint.
|
||||
self.model_profile = pipeline
|
||||
# both "sdxl" and "controlnet" use the same SDXL base checkpoint. Normalize so
|
||||
# the legacy "default" alias resolves to "sdxl".
|
||||
self.model_profile = normalize_profile(pipeline)
|
||||
self.controlnet_conditioning_scale = controlnet_conditioning_scale
|
||||
|
||||
if not is_watermark_removal_available():
|
||||
|
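The hunk above routes the constructor's `pipeline` argument through `normalize_profile` so that the legacy "default" name resolves to "sdxl". A minimal sketch of how such an alias table could work, using only the standard library. The alias dictionary and the `normalize_with_warning` wrapper are hypothetical names for illustration; the project's real `normalize_profile` body and its click callback are not shown in this diff and may differ.

```python
import warnings

# Assumption: an illustrative alias table, not the project's actual data.
_PROFILE_ALIASES = {"default": "sdxl"}


def normalize_profile(profile: str) -> str:
    """Map a legacy profile name to its current name; pass others through."""
    return _PROFILE_ALIASES.get(profile, profile)


def normalize_with_warning(profile: str) -> str:
    """Hypothetical CLI-side wrapper: warn on a legacy alias, then normalize."""
    normalized = normalize_profile(profile)
    if normalized != profile:
        warnings.warn(
            f"--pipeline {profile!r} is deprecated; use {normalized!r}",
            DeprecationWarning,
            stacklevel=2,
        )
    return normalized
```

Keeping the pure mapping separate from the warning lets library callers normalize silently while the CLI layer decides whether to surface the deprecation.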
@@ -1,117 +0,0 @@
-"""Tests for the --auto pipeline planner (content-adaptive mode selection).
-
-Detection runs on synthetic images; the face-present routing is exercised by
-monkeypatching ``detect_face`` (a real detectable face fixture is private, never
-committed). The planner is cv2-only and torch-free.
-"""
-
-from __future__ import annotations
-
-import cv2
-import numpy as np
-
-from remove_ai_watermarks import auto_config, image_io
-
-
-def _write(img, tmp_path, name="x.png"):
-    p = tmp_path / name
-    image_io.imwrite(p, img)
-    return p
-
-
-class TestDetectors:
-    def test_detect_face_false_on_flat(self):
-        flat = np.full((200, 200, 3), 128, dtype=np.uint8)
-        assert auto_config.detect_face(flat) is False
-
-    def test_edge_density_flat_near_zero(self):
-        flat = np.full((200, 200, 3), 128, dtype=np.uint8)
-        assert auto_config.edge_density(flat) < 0.001
-
-    def test_edge_density_text_higher_than_blank(self):
-        blank = np.full((200, 400, 3), 255, dtype=np.uint8)
-        text = blank.copy()
-        cv2.putText(text, "HELLO AI TEXT", (10, 120), cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 0, 0), 3)
-        assert auto_config.edge_density(text) > auto_config.edge_density(blank)
-
-    def test_dbnet_detects_text_card(self):
-        """The bundled PP-OCRv3 DBNet model fires on a clear text card and not on flat."""
-        card = np.full((300, 500, 3), 255, dtype=np.uint8)
-        cv2.putText(card, "INVOICE TOTAL 1234", (10, 170), cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 0, 0), 4)
-        assert auto_config._detect_text_dbnet(card) is True
-        assert auto_config._detect_text_dbnet(np.full((300, 500, 3), 128, dtype=np.uint8)) is False
-
-    def test_detect_text_falls_back_to_mser_when_dbnet_unavailable(self, monkeypatch):
-        """If DBNet can't load (returns None), detect_text uses the MSER heuristic."""
-        monkeypatch.setattr(auto_config, "_detect_text_dbnet", lambda _img: None)
-        called = {}
-
-        def _fake_mser(_img):
-            called["mser"] = True
-            return True
-
-        monkeypatch.setattr(auto_config, "_detect_text_mser", _fake_mser)
-        assert auto_config.detect_text(np.full((100, 100, 3), 128, dtype=np.uint8)) is True
-        assert called.get("mser") is True
-
-
-class TestPlan:
-    def test_unreadable_returns_none(self, tmp_path):
-        assert auto_config.plan(tmp_path / "does_not_exist.png") is None
-
-    def test_flat_image_is_default_pipeline_no_polish(self, tmp_path):
-        flat = np.full((300, 300, 3), 128, dtype=np.uint8)
-        cfg = auto_config.plan(_write(flat, tmp_path))
-        assert cfg is not None
-        assert cfg.pipeline == "default"  # structure-less -> plain SDXL
-        assert cfg.adaptive_polish is False  # no smoothing pass -> no polish
-        assert cfg.unsharp == 0.0
-        assert cfg.humanize == 0.0
-        assert cfg.min_resolution == 1024
-
-    def test_text_image_uses_controlnet(self, tmp_path):
-        img = np.full((300, 500, 3), 255, dtype=np.uint8)
-        cv2.putText(img, "INVOICE TOTAL 1234", (10, 170), cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 0, 0), 4)
-        cfg = auto_config.plan(_write(img, tmp_path))
-        assert cfg is not None
-        # Text creates edges above the structure-less floor -> controlnet preserves them.
-        assert cfg.pipeline == "controlnet"
-
-    def test_face_routes_to_controlnet_and_polish(self, tmp_path, monkeypatch):
-        monkeypatch.setattr(auto_config, "detect_face", lambda _img: True)
-        flat = np.full((300, 300, 3), 128, dtype=np.uint8)
-        cfg = auto_config.plan(_write(flat, tmp_path))
-        assert cfg is not None
-        assert cfg.has_face
-        assert cfg.pipeline == "controlnet"
-        assert cfg.adaptive_polish  # smoothing pass ran -> adaptive polish on
-        assert cfg.unsharp == 0.0  # fixed knobs off; the adaptive polish replaces them
-        assert cfg.humanize == 0.0
-
-    def test_text_signal_forces_controlnet_on_flat(self, tmp_path, monkeypatch):
-        monkeypatch.setattr(auto_config, "detect_text", lambda _img: True)
-        flat = np.full((300, 300, 3), 128, dtype=np.uint8)
-        cfg = auto_config.plan(_write(flat, tmp_path))
-        assert cfg is not None
-        assert cfg.has_text
-        assert cfg.pipeline == "controlnet"
-
-
-class TestReason:
-    def test_reason_summarizes_plan(self):
-        cfg = auto_config.AutoConfig(
-            pipeline="controlnet",
-            adaptive_polish=True,
-            unsharp=0.0,
-            humanize=0.0,
-            min_resolution=1024,
-            has_face=True,
-            has_text=False,
-            edge_density=0.05,
-            width=800,
-            height=600,
-        )
-        r = cfg.reason
-        assert "controlnet" in r
-        assert "face" in r
-        assert "adaptive polish" in r

+71
-20
@@ -277,6 +277,72 @@ class TestInvisibleCommand:
         expected = sample_png.with_stem(sample_png.stem + "_clean")
         assert expected.exists()

+    def test_invisible_adaptive_polish_on_by_default(self, runner, sample_png):
+        mock_cls, mock_engine = _mock_invisible_engine()
+        with (
+            patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
+            patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
+            patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
+        ):
+            result = runner.invoke(main, ["invisible", str(sample_png)])
+        assert result.exit_code == 0, result.output
+        # adaptive_polish is ON by default (self-gating, so a no-op where not needed).
+        assert mock_engine.remove_watermark.call_args.kwargs["adaptive_polish"] is True
+        # Default model is None (the SDXL base) and CFG is None (the library's 7.5).
+        assert mock_cls.call_args.kwargs["model_id"] is None
+        assert mock_engine.remove_watermark.call_args.kwargs["guidance_scale"] is None
+
+    def test_invisible_no_adaptive_polish_disables(self, runner, sample_png):
+        mock_cls, mock_engine = _mock_invisible_engine()
+        with (
+            patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
+            patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
+            patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
+        ):
+            result = runner.invoke(main, ["invisible", str(sample_png), "--no-adaptive-polish"])
+        assert result.exit_code == 0, result.output
+        assert mock_engine.remove_watermark.call_args.kwargs["adaptive_polish"] is False
+
+    def test_invisible_model_and_guidance_scale_flow_to_engine(self, runner, sample_png):
+        mock_cls, mock_engine = _mock_invisible_engine()
+        with (
+            patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
+            patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
+            patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
+        ):
+            result = runner.invoke(
+                main,
+                ["invisible", str(sample_png), "--model", "org/custom-sdxl", "--guidance-scale", "5.5"],
+            )
+        assert result.exit_code == 0, result.output
+        assert mock_cls.call_args.kwargs["model_id"] == "org/custom-sdxl"
+        assert mock_engine.remove_watermark.call_args.kwargs["guidance_scale"] == 5.5
+
+    def test_pipeline_default_alias_warns_and_maps_to_sdxl(self, runner, sample_png):
+        mock_cls, _mock_engine = _mock_invisible_engine()
+        with (
+            patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
+            patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
+            patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
+        ):
+            result = runner.invoke(main, ["invisible", str(sample_png), "--pipeline", "default"])
+        assert result.exit_code == 0, result.output
+        # The legacy value warns and is normalized to "sdxl" before the engine is built.
+        assert "deprecated" in result.output.lower()
+        assert mock_cls.call_args.kwargs["pipeline"] == "sdxl"
+
+    def test_pipeline_sdxl_does_not_warn(self, runner, sample_png):
+        mock_cls, _mock_engine = _mock_invisible_engine()
+        with (
+            patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
+            patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
+            patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
+        ):
+            result = runner.invoke(main, ["invisible", str(sample_png), "--pipeline", "sdxl"])
+        assert result.exit_code == 0, result.output
+        assert "deprecated" not in result.output.lower()
+        assert mock_cls.call_args.kwargs["pipeline"] == "sdxl"
+
     def test_invisible_nonexistent_file(self, runner):
         result = runner.invoke(main, ["invisible", "/nonexistent/file.png"])
         assert result.exit_code != 0
@@ -514,32 +580,17 @@ class TestBatchCommand:
         assert out[0, 0, 3] == 0
         assert out[100, 100, 3] == 255

-    def test_batch_auto_plans_pipeline_per_image(self, runner, tmp_path):
-        """--auto in batch re-plans the pipeline/restore/polish per image and
-        builds one engine per resolved pipeline."""
-        from remove_ai_watermarks import auto_config
-
+    def test_batch_auto_is_deprecated_and_enables_polish(self, runner, tmp_path):
+        """--auto is retired: it warns and just enables the adaptive polish (the
+        pipeline is always the default controlnet now)."""
         input_dir = _make_batch_dir(tmp_path, count=2)
         output_dir = tmp_path / "output"
-        plan = auto_config.AutoConfig(
-            pipeline="controlnet",
-            adaptive_polish=True,
-            unsharp=0.0,
-            humanize=0.0,
-            min_resolution=1024,
-            has_face=True,
-            has_text=False,
-            edge_density=0.05,
-            width=200,
-            height=200,
-        )
         mock_cls, mock_engine = _mock_invisible_engine()
         with (
             patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
             patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
             patch("remove_ai_watermarks.cli.invisible_available", return_value=True, create=True),
             patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
-            patch("remove_ai_watermarks.auto_config.plan", return_value=plan),
         ):
             result = runner.invoke(
                 main,
@@ -547,9 +598,9 @@ class TestBatchCommand:
             )
         assert result.exit_code == 0, result.output
         assert "2 processed" in result.output
-        # Engine built with the auto-resolved controlnet pipeline.
+        assert "deprecated" in result.output.lower()
+        # Pipeline stays the default controlnet; --auto only turned the polish on.
         assert mock_cls.call_args.kwargs["pipeline"] == "controlnet"
-        # The auto plan's adaptive polish reached the engine call.
         assert mock_engine.remove_watermark.call_args.kwargs["adaptive_polish"] is True

     def test_batch_default_output_dir(self, runner, tmp_path):

+27
-4
@@ -21,7 +21,9 @@ from remove_ai_watermarks.noai.watermark_profiles import (
     OPENAI_STRENGTH,
     UNKNOWN_STRENGTH,
     get_model_id_for_profile,
+    normalize_profile,
     resolve_strength,
+    strength_default_help,
 )
 from remove_ai_watermarks.noai.watermark_remover import get_device, is_watermark_removal_available

@@ -111,8 +113,14 @@ class TestMpsErrorDetection:
 class TestModelProfiles:
     """Tests for watermark_profiles.py."""

-    def test_default_profile(self):
+    def test_sdxl_profile(self):
+        assert get_model_id_for_profile("sdxl") == "stabilityai/stable-diffusion-xl-base-1.0"
+
+    def test_default_alias_resolves_to_sdxl(self):
+        # "default" is the legacy alias for "sdxl" (back-compat for existing scripts).
         assert get_model_id_for_profile("default") == "stabilityai/stable-diffusion-xl-base-1.0"
+        assert normalize_profile("default") == "sdxl"
+        assert normalize_profile("controlnet") == "controlnet"

     def test_controlnet_profile(self):
         # controlnet shares the SDXL base checkpoint (the ControlNet is an add-on).
@@ -127,9 +135,9 @@ class TestResolveStrength:
     """resolve_strength applies the vendor default only when strength is unset."""

     def test_none_is_vendor_adaptive(self):
-        # No vendor -> unknown default; OpenAI lower, Google == unknown. The default
-        # is vendor-adaptive and does NOT depend on the pipeline profile (default and
-        # controlnet share the same SDXL base).
+        # No vendor -> unknown default; OpenAI lower, Google == unknown. The SAME ladder
+        # applies to both pipelines (the certified controlnet floors), so there is no
+        # pipeline argument.
         assert resolve_strength(None) == UNKNOWN_STRENGTH
         assert resolve_strength(None, "openai") == OPENAI_STRENGTH
         assert resolve_strength(None, "google") == GEMINI_STRENGTH
@@ -137,10 +145,25 @@ class TestResolveStrength:
         # An unrecognized vendor string falls through to the unknown default.
         assert resolve_strength(None, "adobe") == UNKNOWN_STRENGTH

+    def test_ladder_is_the_certified_controlnet_floors(self):
+        # The unified ladder == the oracle-certified controlnet floors (OpenAI 0.20,
+        # Google/unknown 0.30); Google is the more-robust watermark, so it is higher.
+        assert OPENAI_STRENGTH == 0.20
+        assert GEMINI_STRENGTH == 0.30
+        assert UNKNOWN_STRENGTH == 0.30
+        assert OPENAI_STRENGTH < GEMINI_STRENGTH
+
     def test_default_strength_alias_is_unknown_vendor_value(self):
         assert DEFAULT_STRENGTH == UNKNOWN_STRENGTH
         assert OPENAI_STRENGTH < UNKNOWN_STRENGTH

+    def test_strength_default_help_derives_from_constants(self):
+        # The CLI --strength help is built from this, so it can never drift from the ladder.
+        h = strength_default_help()
+        assert str(OPENAI_STRENGTH) in h
+        assert str(GEMINI_STRENGTH) in h
+        assert str(UNKNOWN_STRENGTH) in h
+
     def test_explicit_value_overrides_vendor(self):
         assert resolve_strength(0.3) == 0.3
         assert resolve_strength(0.3, "openai") == 0.3

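The `TestResolveStrength` cases above pin down the contract of the unified ladder: an explicit strength always wins, otherwise the vendor picks the floor, and anything unrecognized falls back to the unknown-vendor value. A minimal sketch that would satisfy those assertions, assuming a plain dictionary lookup. The constant values come from the tests; the function bodies, the `_VENDOR_STRENGTH` table, and the help-string wording are illustrative assumptions, not the project's actual `watermark_profiles` code.

```python
from __future__ import annotations

# Values asserted in the tests above.
OPENAI_STRENGTH = 0.20
GEMINI_STRENGTH = 0.30
UNKNOWN_STRENGTH = 0.30
DEFAULT_STRENGTH = UNKNOWN_STRENGTH

# Assumption: a hypothetical lookup table keyed by the vendor strings the tests use.
_VENDOR_STRENGTH = {"openai": OPENAI_STRENGTH, "google": GEMINI_STRENGTH}


def resolve_strength(strength: float | None, vendor: str | None = None) -> float:
    """Return the explicit strength if given, else the vendor-adaptive default."""
    if strength is not None:
        return strength
    # Missing or unrecognized vendors fall through to the unknown default.
    return _VENDOR_STRENGTH.get(vendor, UNKNOWN_STRENGTH)


def strength_default_help() -> str:
    """Build help text from the constants so it cannot drift from the ladder."""
    return (
        f"Default is vendor-adaptive: OpenAI {OPENAI_STRENGTH}, "
        f"Google {GEMINI_STRENGTH}, unknown {UNKNOWN_STRENGTH}."
    )
```

Because there is no pipeline parameter, both the controlnet and sdxl pipelines necessarily receive the same floor, which is the point of the unification described in the commit message.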