feat(invisible): controlnet default, unified strength, retire --auto, add --model/--guidance-scale

Overhaul the diffusion-removal surface around a single robust default and a
complete, consistent CLI.

Pipeline + strength:
- controlnet is now the DEFAULT pipeline (CLI --pipeline + both engine ctors).
  With the certified higher strength it clears both photoreal and flat-graphic
  content, whereas plain SDXL left SynthID on flat graphics.
- Rename the plain-SDXL profile default -> sdxl; "default" stays as a back-compat
  alias (normalize_profile + a click callback that warns).
- Unify the strength ladder: resolve_strength applies ONE vendor-adaptive ladder
  (the certified controlnet floors OpenAI 0.20 / Google 0.30 / unknown 0.30) to
  both pipelines. sdxl is the weaker remover on its own hard case (flat fills),
  so the certified floor is the right floor for it too.
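
A minimal sketch of the alias and the unified ladder described above. The
function names and the three constants come from this commit; the signatures,
the vendor keys ("openai", "google"), and the warning category are assumptions
made for illustration, not the library's actual code:

```python
import warnings
from typing import Optional

# Certified controlnet floors quoted in this commit message.
OPENAI_STRENGTH = 0.20
GEMINI_STRENGTH = 0.30
UNKNOWN_STRENGTH = 0.30

# Hypothetical vendor keys; the real vendor identifiers are not shown here.
_LADDER = {"openai": OPENAI_STRENGTH, "google": GEMINI_STRENGTH}


def normalize_profile(name: str) -> str:
    """Map the legacy "default" profile name onto "sdxl", warning on use."""
    if name == "default":
        warnings.warn(
            '"default" is a deprecated alias for "sdxl"',
            DeprecationWarning,
            stacklevel=2,
        )
        return "sdxl"
    return name


def resolve_strength(explicit: Optional[float], vendor: Optional[str]) -> float:
    """An explicit strength wins; otherwise use the one vendor ladder.

    The pipeline is deliberately not an input: the same ladder serves both.
    """
    if explicit is not None:
        return explicit
    return _LADDER.get(vendor or "", UNKNOWN_STRENGTH)
```

Keeping the pipeline out of the lookup is what makes the ladder "unified" in
this sketch: switching --pipeline cannot change the default strength.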

CLI completeness:
- Add --model (HF model id) to invisible + batch (was only on all) and
  --guidance-scale (CFG) to all three diffusion commands; both were library
  knobs the CLI did not expose.
- Flip --adaptive-polish to ON by default (it self-gates to a no-op where there
  is no detail deficit, so default-on is safe).
- Share --pipeline / --strength / --model / --guidance-scale as single
  decorators so invisible/all/batch keep an identical surface; the --strength
  help is derived from the strength constants (strength_default_help) so it can
  never drift from the ladder.
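
The shared-decorator pattern above can be sketched as follows. The option
names and strength_default_help come from this commit; the help wording, the
toy command bodies, and the helper's signature are assumptions, so treat this
as an illustration rather than the shipped cli.py:

```python
from typing import Optional

import click

OPENAI_STRENGTH = 0.20
GEMINI_STRENGTH = 0.30
UNKNOWN_STRENGTH = 0.30


def strength_default_help() -> str:
    """Build the --strength help suffix from the constants (assumed signature)."""
    return (
        f"Default: vendor-adaptive (OpenAI {OPENAI_STRENGTH:.2f} / "
        f"Google {GEMINI_STRENGTH:.2f} / unknown {UNKNOWN_STRENGTH:.2f})."
    )


# Each option is defined once and reused, so every command exposes the same flag.
_pipeline_option = click.option(
    "--pipeline",
    type=click.Choice(["controlnet", "sdxl"]),
    default="controlnet",
    help="Removal pipeline.",
)
_strength_option = click.option(
    "--strength",
    type=float,
    default=None,
    help="Denoising strength. " + strength_default_help(),
)


@click.group()
def cli() -> None:
    """Toy CLI with two commands sharing one option surface."""


@cli.command()
@_pipeline_option
@_strength_option
def invisible(pipeline: str, strength: Optional[float]) -> None:
    click.echo(f"invisible pipeline={pipeline} strength={strength}")


@cli.command()
@_pipeline_option
@_strength_option
def batch(pipeline: str, strength: Optional[float]) -> None:
    click.echo(f"batch pipeline={pipeline} strength={strength}")
```

Because the help string is formatted from the constants at import time,
changing a constant changes the help text in every command at once.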

Removals:
- Delete the auto_config content-detection planner + its YuNet/DBNet assets
  (~2.6 MB): with controlnet always the pipeline and the polish self-gating, the
  face/text/edge detection no longer changed behavior. --auto is now a deprecated
  no-op that only warns (the polish it enabled is the default).
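
One way to keep a retired flag accepted while making it inert is a click
callback that warns and is hidden from the command signature. This is a sketch
under assumptions: the callback name, the message text, and the use of
expose_value=False are illustrative and not taken from the actual cli.py:

```python
import click


def _warn_auto(ctx: click.Context, param: click.Parameter, value: bool) -> bool:
    """Warn when the deprecated flag is passed; change nothing else."""
    if value:
        click.echo(
            "warning: --auto is deprecated and has no effect "
            "(controlnet and the adaptive polish are already the defaults)",
            err=True,
        )
    return value


@click.command()
@click.option(
    "--auto",
    is_flag=True,
    default=False,
    callback=_warn_auto,
    expose_value=False,  # the command body never sees the flag
    help="DEPRECATED no-op.",
)
@click.option("--adaptive-polish/--no-adaptive-polish", default=True)
def invisible(adaptive_polish: bool) -> None:
    click.echo(f"adaptive_polish={adaptive_polish}")
```

In this sketch --auto cannot override an explicit --no-adaptive-polish, which
matches the "no-op that only warns" wording above.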

Docs (README, CLAUDE.md, docs/synthid.md) updated throughout; added an
InvisibleEngine Python API example. Tests cover the alias warnings, the
polish default, and the --model/--guidance-scale wiring.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Victor Kuznetsov
2026-06-09 12:40:45 -07:00
parent efc5b4a9af
commit b1189549b8
13 changed files with 395 additions and 584 deletions
+9 -8
File diff suppressed because one or more lines are too long
+42 -16
@@ -23,7 +23,7 @@ If this tool saves you time, consider [sponsoring its development](https://githu
- **AI metadata stripping** — EXIF, PNG text chunks, C2PA provenance manifests (PNG / JPEG / AVIF / HEIF / JPEG-XL, **MP4 / MOV / M4V / M4A** at the container level, and **WebM / MP3 / WAV / FLAC / OGG** losslessly via ffmpeg), XMP DigitalSourceType
- **"Made with AI" label removal** — removes the AI-disclosure metadata that platforms read to apply automatic labels (useful for clearing a false-positive label from a human-edited photograph)
- **Analog Humanizer** — optional film grain and chromatic aberration post-processing
- **Text and face preservation (experimental)** — optional `--pipeline controlnet` adds a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness). The library does not ship a face-restore extra: every approach evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned) regenerated the face via SDXL and made the output look more AI-generated than the cleaned image. The cleaned controlnet output is the least-AI face state achievable without re-introducing SynthID.
- **Text and face preservation (default)** — the default pipeline is a canny ControlNet that keeps text and face structure sharp through the removal pass (without copying original pixels, so SynthID is still removed). Use `--pipeline sdxl` for plain SDXL img2img (lighter, no extra model download) on inputs without text or faces. Canny preserves face *structure*, not *identity* (the regenerated face drifts in likeness). The library does not ship a face-restore extra: every approach evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned) regenerated the face via SDXL and made the output look more AI-generated than the cleaned image. The cleaned controlnet output is the least-AI face state achievable without re-introducing SynthID.
- **Batch processing** — process entire directories
- **Detection** — three-stage NCC watermark detection with confidence scoring
- **Provenance detection (`identify`)** — aggregate C2PA issuer, the C2PA soft-binding forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, ...), IPTC "Made with AI" plus the IPTC 2025.1 `AISystemUsed` field, embedded SD/ComfyUI params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the China TC260 AIGC label (XMP, PNG chunk, or EXIF), the HuggingFace `hf-job-id` job marker, the SynthID metadata proxy, the visible marks (Gemini sparkle plus the Doubao "豆包AI生成" / Jimeng "即梦AI" / Samsung Galaxy AI "Contenuti generati dall'AI" text marks), the open SD/SDXL/FLUX invisible watermark, and (with the `trustmark` extra) the open Adobe TrustMark watermark into one origin-platform + watermark-inventory verdict (`--json` for machine output)
@@ -118,15 +118,16 @@ The removal pipeline (default profile, SDXL):
image → encode to latent space (VAE) at native resolution
→ add controlled noise (forward diffusion)
→ denoise (reverse diffusion, ~50 steps; strength is vendor-adaptive:
0.10 OpenAI / 0.15 Google / 0.15 unknown, override with --strength)
0.20 OpenAI / 0.30 Google / 0.30 unknown, same for both pipelines;
override with --strength)
→ decode back to pixels (VAE)
```
- Large inputs run at native resolution (no down-then-up round-trip, which was the main quality loss in issue #10); use `--max-resolution N` only to cap GPU/MPS memory on very large inputs. Small inputs (long side under 1024 px) are auto-upscaled to a 1024 px floor before diffusion, because SDXL distorts on a tiny latent, and the result is restored to the original size (a transparent quality boost). Disable the floor with `--min-resolution 0`. The floor upscale uses Lanczos by default; `--upscaler esrgan` (the `esrgan` extra) runs Real-ESRGAN first for sharper detail and falls back to Lanczos if the extra is absent. ESRGAN is a generic photo/texture GAN with no face/glyph prior, so it is best for photo/texture content -- it can degrade faces (the diffusion pass regenerates them, so the final recovers) and thin text; keep Lanczos for text-heavy inputs.
> **Default strength is vendor-adaptive (no flag needed).** The tool reads the C2PA issuer to detect which vendor's SynthID is present and picks the strength that clears it with the least quality loss: **OpenAI gpt-image → `0.10`**, **Google Gemini → `0.15`**, **unknown source → `0.15`**. An oracle-verified June 2026 study (clean pipeline, per-image openai.com/verify or Gemini app) found OpenAI's watermark clears at `0.05` across `1024`-`1600` px (resolution-independent) while Google's is ~3x more robust and needs `0.15`. The dominant factor is the vendor, not resolution. There is no local SynthID detector, so if the oracle still reads SynthID, raise `--strength`; if you care more about preserving fine text, lower it. (Caveat: Google's `0.15` was validated on the capped `--max-resolution 1536` path; a very large native Gemini image may need more.)
> **Default strength is vendor-adaptive (no flag needed).** The tool reads the C2PA issuer to detect which vendor's SynthID is present and picks the strength accordingly: **OpenAI gpt-image → `0.20`**, **Google Gemini → `0.30`**, **unknown source → `0.30`**. The **same ladder applies to both pipelines** — these are the oracle-certified `controlnet` floors (June 2026 Modal cert, multi-seed). They also cover plain `sdxl`: the two pipelines have opposite hard cases (controlnet leaves SynthID on photoreal, sdxl on flat graphics), but on its own hard case sdxl is the weaker remover, so it needs at least controlnet's strength — using one certified ladder is the safe choice (margin-based for sdxl, not separately certified). The dominant factor is the vendor (Google's SynthID is ~3x more robust). There is no local SynthID detector, so if the oracle still reads SynthID, raise `--strength`; if you care more about preserving fine detail, lower it. (Caveat: Google's `0.30` was validated only at `--max-resolution 1536`; a very large native Gemini image may need ~`0.35`+.)
>
> **`--pipeline controlnet` preserves text and face structure (experimental, opt-in).** It runs the same SDXL img2img scrub but adds a canny ControlNet that conditions the regeneration on the image's edge map, so text and structure stay sharp at the strengths that remove SynthID. The watermark removal still comes from the img2img regeneration (`--strength`); the ControlNet only preserves structure — no original pixels are copied or frozen, so SynthID does not survive. `--controlnet-scale` tunes the preservation strength (higher = closer to the original structure). Runs fp32 on mps/cpu (fp16 only on cuda/xpu, where the fp16-fixed SDXL VAE is loaded automatically).
> **The default pipeline is `controlnet` — it preserves text and face structure.** It runs the same SDXL img2img scrub but adds a canny ControlNet that conditions the regeneration on the image's edge map, so text and structure stay sharp at the strengths that remove SynthID. The watermark removal still comes from the img2img regeneration (`--strength`); the ControlNet only preserves structure — no original pixels are copied or frozen. The default strength ladder (OpenAI `0.20` / Google `0.30`) is the oracle-certified controlnet floor. `--controlnet-scale` tunes the preservation strength (higher = closer to the original structure). Runs fp32 on mps/cpu (fp16 only on cuda/xpu, where the fp16-fixed SDXL VAE is loaded automatically). Pass `--pipeline sdxl` for plain SDXL img2img (lighter, no extra model download) on inputs without text or faces.
>
> **No face-restore extra in the library.** Every ArcFace-based regeneration approach we evaluated (GFPGAN-on-cleaned, PhotoMaker-V2, InstantID txt2img, InstantID img2img-on-cleaned at three parameter sweeps, 2026-06-04 - 2026-06-08 Modal cert sweeps) regenerated the face via SDXL diffusion — the output face pixels were diffusion-fresh (SynthID not re-introduced), but the face inherently looked more AI-generated than the cleaned image (SDXL "clean skin" gloss, lost original identity precision). The cleaned image from the main controlnet 0.20 pass is the least-AI face state we can reach without re-introducing SynthID. Empirical conclusion in `docs/synthid-robust-identity-research-2026-06-08.md`.
@@ -136,7 +137,7 @@ SDXL is the default since May 2026: empirically defeats SynthID v2 on Gemini 3 P
> **Technical deep-dive:** see [`docs/synthid.md`](docs/synthid.md) for a primary-source-cited breakdown of how SynthID works mechanically (post-hoc encoder/decoder, 136-bit payload, pixel-space embedding), what it empirically survives (JPEG, crop, resize: ~99.98% TPR at 0.1% FPR from arXiv:2510.09263), what removes it, and the forensic-stealth tradeoff (all known removal attacks are detectable at >98% TPR@1%FPR per arXiv:2605.09203).
**Text and face preservation** (experimental, opt-in `--pipeline controlnet`): adds a canny ControlNet so text and face *structure* stay sharp through the removal pass, without copying or freezing any original pixels (so SynthID is still removed). Tune the preservation strength with `--controlnet-scale`. Canny preserves structure but not face *identity*: the regenerated face drifts in likeness. The library does not ship a face-restore extra (see the callout above).
**Text and face preservation** (the default pipeline; `--pipeline sdxl` opts down to plain SDXL): a canny ControlNet keeps text and face *structure* sharp through the removal pass, without copying or freezing any original pixels (so SynthID is still removed). Tune the preservation strength with `--controlnet-scale`. Canny preserves structure but not face *identity*: the regenerated face drifts in likeness. The library does not ship a face-restore extra (see the callout above).
**Analog Humanizer**: optional film grain and chromatic aberration injection that mimics a photo of a screen, raising the bar for AI-generated image classifiers. (It frustrates generic classifiers but does not guarantee forensic invisibility — see the [arXiv:2605.09203](https://arxiv.org/abs/2605.09203) note above.)
@@ -292,14 +293,15 @@ remove-ai-watermarks invisible image.png -o clean.png --humanize 4.0 --unsharp 0
# first (disable with --min-resolution 0); --upscaler esrgan uses Real-ESRGAN for
# that floor upscale (needs the 'esrgan' extra). On a very large image that OOMs the
# GPU/MPS, cap the long side: --max-resolution 2048
# Strength is vendor-adaptive by default (OpenAI 0.10 / Google 0.15); override
# with --strength. To preserve text/face structure, use --pipeline controlnet
# Or let it choose: --auto picks the pipeline and an adaptive polish
# from the image content (controlnet when there is text/structure, polish that
# restores the input's detail level while sparing text). Every choice is
# overridable: --pipeline and --no-adaptive-polish win over the auto pick.
# Experimental.
# (SDXL + canny ControlNet); tune preservation with --controlnet-scale. Add
# Strength is vendor-adaptive by default (OpenAI 0.20 / Google 0.30, same
# for both pipelines); override with --strength. controlnet (text/face
# structure preservation) is the default pipeline; --pipeline sdxl opts down
# to plain SDXL for non-structure inputs. Tune structure preservation with
# --controlnet-scale, the CFG with --guidance-scale (default 7.5), and the
# diffusion model with --model (default: SDXL base).
# --adaptive-polish (ON by default) restores the input's detail level (sparing
# text) to counter the over-smoothed look; it self-limits to a no-op where
# there is no detail deficit. Disable with --no-adaptive-polish.
# Check / strip AI metadata (C2PA, EXIF, "Made with AI" labels)
# --check also flags SynthID-bearing sources: a C2PA manifest signed by
@@ -312,9 +314,9 @@ remove-ai-watermarks metadata image.png --remove
# Batch with a specific mode
remove-ai-watermarks batch ./images/ --mode visible
# Batch also accepts --auto (and --adaptive-polish): the plan is recomputed per
# image, so a mixed directory routes each file to the right pipeline
remove-ai-watermarks batch ./images/ --mode all --auto
# Batch accepts the full invisible knob set (--strength/--guidance-scale/--model/
# --pipeline/...); --adaptive-polish is on by default (--no-adaptive-polish to disable)
remove-ai-watermarks batch ./images/ --mode all
```
### Python API
@@ -335,6 +337,30 @@ clean = engine.remove_watermark(image)
cv2.imwrite("clean.png", clean)
```
#### Invisible removal (diffusion)
```python
from pathlib import Path
from remove_ai_watermarks.invisible_engine import InvisibleEngine
# pipeline: "controlnet" (default, preserves text/face structure) or "sdxl" (plain).
# model_id=None uses the SDXL base; controlnet_conditioning_scale tunes preservation.
engine = InvisibleEngine(pipeline="controlnet")
engine.remove_watermark(
    Path("watermarked.png"),
    Path("clean.png"),
    strength=None,  # None = vendor-adaptive default (OpenAI 0.20 / Google 0.30)
    num_inference_steps=50,
    guidance_scale=None,  # None = the library default (7.5)
    seed=None,  # set for reproducible output
    adaptive_polish=True,  # detail-targeted polish, self-gating (default on in the CLI)
    min_resolution=1024,  # upscale tiny inputs to this floor before diffusion
    max_resolution=0,  # 0 = native; set only to cap GPU/MPS memory
    upscaler="lanczos",  # or "esrgan" for the floor upscale (needs the 'esrgan' extra)
)
```
### Metadata stripping
```python
+30 -15
@@ -382,12 +382,10 @@ the payload, reconstituting SynthID in text. The lesson held and shaped the
current design: **content is preserved by REGENERATING it under structural
conditioning, never by copying original pixels.**
Both preservation features below are **EXPERIMENTAL and opt-in (off by default)**;
the plain `default` SDXL img2img pass is the shippable path.
- **Text + structure:** `--pipeline controlnet` (SDXL img2img + a canny ControlNet,
experimental/opt-in) conditions the regeneration on the edge map, so text and
structure stay sharp while every pixel is still regenerated. Text legibility is
- **Text + structure:** `--pipeline controlnet` (SDXL img2img + a canny ControlNet) is
**THE DEFAULT pipeline since 2026-06-09** (`--pipeline sdxl` opts down to plain
SDXL img2img for inputs without text/faces). It conditions the regeneration on the
edge map, so text and structure stay sharp while every pixel is still regenerated.
Text legibility is
better than plain img2img at the same strength (text stays readable where plain
garbles it). **BUT removal efficacy at the low vendor-adaptive strength is CONTENT ×
PIPELINE dependent and NEITHER pipeline clears all content -- oracle-validated
@@ -407,7 +405,13 @@ the plain `default` SDXL img2img pass is the shippable path.
removal guarantee at today's strength -- pick by what you must PRESERVE (controlnet
for text/structure), then raise strength until the oracle reads clean. (The earlier
"reads clean on the oracle" claim held only for the one flat/text-background case it
was checked on; it does not generalize.)
was checked on; it does not generalize.) **UPDATE 2026-06-09: the default strengths
were raised to the certified controlnet floors (0.20/0.30/0.30) and unified into a
single ladder applied to BOTH pipelines (§5.2 explains why one ladder covers plain
`sdxl` too), and controlnet is now the default pipeline. The plain-SDXL profile was
also renamed `default` -> `sdxl` (`default` stays as an alias). The 0.10/0.15 numbers
in this analysis are the PRE-raise values it was measured at.**
- **Face identity:** canny holds face *structure* but not *identity*. Shipped as the
optional `--restore-faces` GFPGAN post-pass (`face_restore.py`, the `restore`
extra, experimental/opt-in, off by default). It runs GFPGAN on the ORIGINAL
@@ -448,14 +452,25 @@ study (section 2.2) gives empirical floors:
resolution stack). Use a GPU or `--max-resolution 1536`.
The default is **vendor-adaptive** (`watermark_profiles.resolve_strength` +
`vendor_for_strength`): the tool reads the C2PA issuer on the original input and
picks `OPENAI_STRENGTH` 0.10 / `GEMINI_STRENGTH` 0.15 / `UNKNOWN_STRENGTH` 0.15.
This uses the vendor signal we DO have locally (the C2PA SynthID proxy) to avoid
the overkill of a single high default on OpenAI images, without needing a local
pixel detector. An explicit `--strength` always wins. If the watermark still
survives (e.g. a large native Gemini beyond the capped-1536 validation), raise
toward 0.30 then 0.35-0.40 (0.40 visibly corrupts dense text), using the lowest
value that reads clean on the oracle.
`vendor_for_strength`): the tool reads the C2PA issuer on the original input and picks
`OPENAI_STRENGTH` 0.20 / `GEMINI_STRENGTH` 0.30 / `UNKNOWN_STRENGTH` 0.30. **The SAME
ladder applies to both pipelines** (`sdxl` and `controlnet`) -- these are the
oracle-certified controlnet floors (§5.5, the 2026-06-04 Modal cert). Why one ladder
covers plain `sdxl` too: the certification was run on controlnet and does NOT transfer
by symmetry (the two pipelines have OPPOSITE hard cases -- controlnet leaves SynthID on
photoreal, `sdxl` on flat graphics, the §5.1 content-x-pipeline table), BUT on its own
hard case (flat fills) `sdxl` is the WEAKER remover (plain img2img barely perturbs a
flat region at low strength), so it needs AT LEAST controlnet's strength -- the
certified floor is therefore the right floor for `sdxl` too. This is a MARGIN argument
for `sdxl`, not a separate certification (no local SynthID detector to self-verify).
The higher strength costs little quality where it matters, because `controlnet` is now
the default pipeline, so `sdxl` is reached only via an explicit `--pipeline sdxl` (a
deliberate opt-down), where over-regeneration has no faces/text to damage.
This uses the vendor signal we DO have locally (the C2PA SynthID proxy) to avoid the
overkill of a single high default on OpenAI images, without needing a local pixel
detector. An explicit `--strength` always wins. If the watermark still survives (e.g. a
large native Gemini beyond the capped-1536 validation), raise toward 0.35-0.40 (0.40
visibly corrupts dense text), using the lowest value that reads clean on the oracle.
### 5.3 Test methodology
-270
@@ -1,270 +0,0 @@
"""Automatic pipeline planning for the ``--auto`` quality mode.
``plan(image_path)`` inspects the INPUT image (before the diffusion model loads)
and returns the quality modes to use, so the pipeline can adapt to content. It is
meant to run as the FIRST step of the invisible/all pipeline, wherever that pipeline
runs (locally, or the raiw.cc Modal GPU worker) -- never on a memory-constrained web
host (image work there OOM-crashes the container).
Routing is **quality-priority**: ControlNet (text/face-structure preservation) is the
default; it is only skipped for a clearly structure-less image (no face, no text,
near-zero edges), where plain SDXL is cheaper and just as good. A detected face only
routes to controlnet (canny preserves face STRUCTURE, not identity); there is no
identity restoration -- the whole face-restore family was removed (it regenerated the
face via SDXL and looked MORE AI-generated, see
docs/synthid-robust-identity-research-2026-06-08.md). When the controlnet smoothing
pass ran, the **adaptive polish** (``humanizer.adaptive_polish``) restores the input's
detail level -- a capped unsharp + edge-masked grain targeting the input's Laplacian
variance -- to counter the over-smoothed "AI look". It is self-limiting on
text/graphics (already high-frequency, so almost no polish) and spares text/edges by
masking the grain.
Detection is **cv2-only and torch-free**: OpenCV YuNet (``cv2.FaceDetectorYN``) for
faces -- a 232 KB MIT-licensed model bundled in ``assets/`` -- DBNet (PP-OCRv3
differentiable-binarization via ``cv2.dnn.TextDetectionModel_DB``, a 2.4 MB Apache-2.0
model bundled in ``assets/``) for text, and a Canny ``edge_density``. The whole planner
peaks ~100 MB RSS in a few ms, so it adds nothing meaningful to a GPU run and runs
anywhere the pipeline runs.
The text detector falls back to the old MSER region heuristic if the DBNet model can't
load. Either way text only ever ADDS controlnet, so a miss is backstopped by the
edge-density route and a false positive only costs a controlnet run.
"""
# cv2/numpy boundary: cv2 ships no usable element types; relax the unknown-type rules
# for this file only.
# pyright: reportUnknownMemberType=false, reportUnknownArgumentType=false, reportUnknownVariableType=false, reportUnknownParameterType=false, reportMissingTypeArgument=false, reportMissingTypeStubs=false, reportMissingImports=false, reportArgumentType=false, reportAssignmentType=false, reportReturnType=false, reportCallIssue=false, reportIndexIssue=false, reportOperatorIssue=false, reportOptionalMemberAccess=false, reportOptionalCall=false, reportOptionalSubscript=false, reportOptionalOperand=false, reportAttributeAccessIssue=false, reportPrivateImportUsage=false, reportPrivateUsage=false, reportInvalidTypeForm=false, reportConstantRedefinition=false, reportUnnecessaryComparison=false
from __future__ import annotations
import logging
from dataclasses import dataclass
from pathlib import Path
from typing import TYPE_CHECKING, Any
if TYPE_CHECKING:
    from numpy.typing import NDArray
logger = logging.getLogger(__name__)
# ── Routing thresholds (tunable; quality-priority -> controlnet unless clearly flat) ──
# Canny edge-density below this, AND no face AND no text -> plain SDXL (nothing to
# preserve). The headshot measures ~0.022, a busy photo higher; only a near-flat
# gradient/solid image falls under 0.008.
_STRUCTURELESS_EDGE_MAX = 0.008
# MSER regions per megapixel above this -> likely text. The MSER path is now only the
# FALLBACK when the bundled DBNet model can't load; DBNet (below) is the primary text
# detector. Rough heuristic: a no-text portrait measures a few hundred/MP, dense text
# far more. Set high so it rarely false-fires; text only ever ADDS controlnet.
_TEXT_MSER_PER_MP = 1500.0
_FACE_SCORE = 0.6 # YuNet confidence for a face to count
# Downscale the long side to this for DETECTION only (faces stay detectable down to
# ~10px, and this bounds YuNet/DBNet/MSER cost on huge inputs). Removal runs at full res.
_DETECT_MAX_SIDE = 1024
# DBNet (PP-OCRv3 differentiable-binarization) text-region detector via cv2.dnn -- the
# primary "has meaningful text" signal. The model is the shared PP-OCRv3 detection net
# from OpenCV Zoo (Apache-2.0); en/cn variants are byte-identical, so it is bundled
# language-neutral. cv2.dnn is core OpenCV, so this adds NO new pip dependency.
_DBNET_ASSET = "text_detection_ppocrv3_2023may.onnx" # Apache-2.0 (OpenCV Zoo PP-OCRv3 DB)
_DBNET_BINARY_THRESHOLD = 0.3
_DBNET_POLYGON_THRESHOLD = 0.5
_DBNET_MAX_CANDIDATES = 200
_DBNET_UNCLIP_RATIO = 2.0
_DBNET_INPUT_SIDE = 736 # square input, multiple of 32 (PP-OCRv3 default)
_DBNET_MEAN = (122.67891434, 116.66876762, 104.00698793) # ImageNet mean * 255
_dbnet: Any = None # lazy singleton; set to False after a load failure (-> MSER fallback)
# When the controlnet smoothing pass ran, the adaptive polish
# (humanizer.adaptive_polish) restores the input's detail level, sparing text --
# replacing the old fixed unsharp/grain which over-/under-corrected and speckled text.
_UPSCALE_FLOOR = 1024
_YUNET_ASSET = "face_detection_yunet_2023mar.onnx" # MIT (Shiqi Yu), OpenCV Zoo
_yunet: Any = None # lazy singleton
@dataclass(frozen=True)
class AutoConfig:
    """Resolved quality modes from content analysis (the ``--auto`` plan)."""

    pipeline: str  # "default" | "controlnet"
    adaptive_polish: bool  # restore the input's detail level (sharpen + masked grain), sparing text
    unsharp: float  # fixed-polish knobs, 0 in auto (the adaptive polish replaces them)
    humanize: float
    min_resolution: int
    # signals retained for logging / debugging a bad pick
    has_face: bool
    has_text: bool
    edge_density: float
    width: int
    height: int

    @property
    def reason(self) -> str:
        """One-line human-readable summary of the plan (logged per image)."""
        bits = ["face" if self.has_face else "no-face"]
        if self.has_text:
            bits.append("text")
        bits.append(f"edges={self.edge_density:.3f}")
        if self.adaptive_polish:
            polish = ", adaptive polish"
        elif self.unsharp or self.humanize:
            polish = f", unsharp {self.unsharp}/grain {self.humanize}"
        else:
            polish = ""
        return f"{'+'.join(bits)} -> {self.pipeline} pipeline{polish}"
def _to_bgr(image: NDArray[Any]) -> NDArray[Any]:
    """Normalize a 2D grayscale or 4-channel BGRA array to 3-channel BGR."""
    import cv2

    if image.ndim == 2:
        return cv2.cvtColor(image, cv2.COLOR_GRAY2BGR)
    if image.shape[2] == 4:
        return cv2.cvtColor(image, cv2.COLOR_BGRA2BGR)
    return image

def _to_gray(image: NDArray[Any]) -> NDArray[Any]:
    """Single-channel grayscale; passes a 2D (already-gray) input through unchanged."""
    import cv2

    if image.ndim == 3 and image.shape[2] >= 3:
        return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return image

def _downscale_for_detection(image: NDArray[Any]) -> NDArray[Any]:
    """Shrink the long side to ``_DETECT_MAX_SIDE`` for cheap, bounded detection."""
    import cv2

    h, w = image.shape[:2]
    long_side = max(h, w)
    if long_side <= _DETECT_MAX_SIDE:
        return image
    scale = _DETECT_MAX_SIDE / long_side
    return cv2.resize(image, (max(1, round(w * scale)), max(1, round(h * scale))), interpolation=cv2.INTER_AREA)
def detect_face(image: NDArray[Any]) -> bool:
    """True if OpenCV YuNet finds at least one face. cv2-only, torch-free."""
    import cv2

    global _yunet
    img = _to_bgr(image)
    h, w = img.shape[:2]
    if h < 1 or w < 1:
        return False
    try:
        if _yunet is None:
            model = Path(__file__).parent / "assets" / _YUNET_ASSET
            _yunet = cv2.FaceDetectorYN.create(str(model), "", (w, h), _FACE_SCORE, 0.3, 5000)
        _yunet.setInputSize((w, h))
        _, faces = _yunet.detect(img)
    except cv2.error as e:  # malformed input / model
        logger.debug("YuNet face detect failed (%s); assuming no face", e)
        return False
    return faces is not None and len(faces) > 0
def _detect_text_dbnet(image: NDArray[Any]) -> bool | None:
    """DBNet (PP-OCRv3) text-region presence via cv2.dnn.

    Returns True/False on a successful run, or None if the bundled model can't load
    (the caller then falls back to the MSER heuristic). Loads once, lazily.
    """
    import cv2

    global _dbnet
    if _dbnet is False:  # a prior load failed; skip straight to the MSER fallback
        return None
    img = _to_bgr(image)
    h, w = img.shape[:2]
    if h < 1 or w < 1:
        return False
    try:
        if _dbnet is None:
            model = Path(__file__).parent / "assets" / _DBNET_ASSET
            net = cv2.dnn.TextDetectionModel_DB(str(model))
            net.setBinaryThreshold(_DBNET_BINARY_THRESHOLD)
            net.setPolygonThreshold(_DBNET_POLYGON_THRESHOLD)
            net.setMaxCandidates(_DBNET_MAX_CANDIDATES)
            net.setUnclipRatio(_DBNET_UNCLIP_RATIO)
            net.setInputParams(1.0 / 255.0, (_DBNET_INPUT_SIDE, _DBNET_INPUT_SIDE), _DBNET_MEAN)
            _dbnet = net
        boxes, _ = _dbnet.detect(img)
    except Exception as e:  # model load / inference can raise cv2.error or others
        logger.debug("DBNet text detect failed (%s); falling back to MSER", e)
        _dbnet = False
        return None
    return boxes is not None and len(boxes) > 0
def _detect_text_mser(image: NDArray[Any]) -> bool:
    """Fallback MSER-based text-presence heuristic (used only if DBNet can't load)."""
    import cv2

    gray = _to_gray(image)
    h, w = gray.shape[:2]
    try:
        regions, _ = cv2.MSER_create().detectRegions(gray)
    except cv2.error:
        return False
    per_mp = len(regions) / max(1e-6, (h * w) / 1e6)
    return per_mp > _TEXT_MSER_PER_MP

def detect_text(image: NDArray[Any]) -> bool:
    """Text-presence: DBNet (cv2.dnn) when the bundled model loads, else the MSER heuristic."""
    dbnet = _detect_text_dbnet(image)
    return _detect_text_mser(image) if dbnet is None else dbnet

def edge_density(image: NDArray[Any]) -> float:
    """Fraction of Canny edge pixels -- a cheap 'has structure' proxy in [0, 1]."""
    import cv2

    gray = _to_gray(image)
    edges = cv2.Canny(gray, 100, 200)
    return float((edges > 0).mean())
def plan(image_path: Path) -> AutoConfig | None:
    """Inspect the input image and return the quality modes, or None if unreadable.

    Pure analysis: loads the image, runs the cv2 detectors on a downscaled copy, and
    applies the quality-priority routing rules. Safe to call wherever the pipeline
    runs; no diffusion model is loaded.
    """
    from remove_ai_watermarks import image_io

    image = image_io.imread(image_path)
    if image is None:
        return None
    h, w = image.shape[:2]
    small = _downscale_for_detection(image)
    gray = _to_gray(small)  # convert once; edge density + the MSER fallback use gray
    has_face = detect_face(small)  # YuNet needs the 3-channel image
    has_text = detect_text(small)  # DBNet wants BGR; the MSER fallback grays it internally
    edges = edge_density(gray)
    structureless = (not has_face) and (not has_text) and edges < _STRUCTURELESS_EDGE_MAX
    pipeline = "default" if structureless else "controlnet"
    smoothing = pipeline == "controlnet"
    cfg = AutoConfig(
        pipeline=pipeline,
        adaptive_polish=smoothing,  # adaptive (detail-targeted) polish when a smoothing pass ran
        unsharp=0.0,
        humanize=0.0,
        min_resolution=_UPSCALE_FLOOR,
        has_face=has_face,
        has_text=has_text,
        edge_density=edges,
        width=w,
        height=h,
    )
    logger.debug("auto plan for %s: %s", image_path, cfg.reason)
    return cfg
+123 -81
@@ -18,7 +18,11 @@ from typing import TYPE_CHECKING, Any, Literal
import click
from remove_ai_watermarks import __version__, watermark_registry
from remove_ai_watermarks.noai.watermark_profiles import resolve_strength, vendor_for_strength
from remove_ai_watermarks.noai.watermark_profiles import (
    resolve_strength,
    strength_default_help,
    vendor_for_strength,
)
if TYPE_CHECKING:
from collections.abc import Generator
@@ -143,8 +147,8 @@ _controlnet_scale_option = click.option(
"--controlnet-scale",
type=float,
default=1.0,
help="ControlNet conditioning scale (structure/text preservation strength), controlnet pipeline "
"only (EXPERIMENTAL).",
help="ControlNet conditioning scale (structure/text preservation strength); "
"applies to the controlnet pipeline (the default). Higher = closer to original structure.",
)
_min_resolution_option = click.option(
@@ -173,48 +177,103 @@ _auto_option = click.option(
"--auto",
is_flag=True,
default=False,
help="Auto-pick the pipeline and adaptive polish from image content. "
"Every choice is overridable -- an explicit --pipeline / --adaptive-polish "
"always wins. EXPERIMENTAL.",
help="DEPRECATED: a no-op that only prints a warning. controlnet is already the default "
"pipeline and --adaptive-polish is ON by default (the content detectors were removed). "
"Use --no-adaptive-polish to turn the polish off.",
)
_adaptive_polish_option = click.option(
"--adaptive-polish/--no-adaptive-polish",
default=False,
default=True,
help="Restore the input's detail level after removal (capped unsharp + edge-masked grain "
"targeting the input's sharpness, sparing text). On by default under --auto; pass "
"--no-adaptive-polish to disable it there, or --adaptive-polish to use it without --auto. "
"Independent of the fixed --unsharp/--humanize. EXPERIMENTAL.",
"targeting the input's sharpness, sparing text), countering the over-smoothed look. ON by "
"default; it self-limits where there is no detail deficit (text/flat graphics), so it is a "
"no-op there. Pass --no-adaptive-polish to disable. Independent of --unsharp/--humanize.",
)
# HuggingFace model + CFG knobs, shared by the diffusion commands (invisible/all/batch)
# so the surface stays identical across them.
_model_option = click.option(
"--model",
type=str,
default=None,
help="HuggingFace model ID for the diffusion pipeline. Default: the SDXL base checkpoint.",
)
_guidance_scale_option = click.option(
"--guidance-scale",
type=float,
default=None,
help="Classifier-free guidance scale (CFG). Default: 7.5 (the library default). "
"Lower = follow the prompt less / stay closer to the input.",
)
def _apply_auto(
ctx: click.Context,
source: Path,
pipeline: str,
adaptive_polish: bool,
) -> tuple[str, bool]:
"""Resolve ``--auto``: plan the three content-adaptive modes (pipeline, face
restore, adaptive polish) from the image, overriding only the ones the user left
at their default (an explicit flag always wins). The fixed ``--unsharp``/
``--humanize`` filters are independent and untouched. Prints the chosen plan.
def _normalize_pipeline(ctx: click.Context, param: click.Parameter, value: str | None) -> str | None:
"""Resolve the legacy ``default`` profile name to ``sdxl`` (click option callback).
Emits a one-line deprecation notice when the user explicitly passes the outdated
``default`` value, pointing at the two current choices (``sdxl`` / ``controlnet``).
"""
from remove_ai_watermarks import auto_config
if value is None:
return None
from remove_ai_watermarks.noai.watermark_profiles import normalize_profile
cfg = auto_config.plan(source)
if cfg is None:
console.print(" Auto: could not read image; using defaults")
return pipeline, adaptive_polish
normalized = normalize_profile(value)
if value.strip().lower() == "default":
click.echo(
"Warning: --pipeline default is deprecated and maps to 'sdxl'. "
"Use --pipeline sdxl (plain SDXL) or --pipeline controlnet (the default).",
err=True,
)
return normalized
def _is_default(name: str) -> bool:
return ctx.get_parameter_source(name) == click.core.ParameterSource.DEFAULT
if _is_default("pipeline"):
pipeline = cfg.pipeline
if _is_default("adaptive_polish"):
adaptive_polish = cfg.adaptive_polish
console.print(f" Auto: {cfg.reason}")
return pipeline, adaptive_polish
# ``controlnet`` (the default-SELECTED value) and ``sdxl`` (plain SDXL img2img) are the
# two current profiles; ``default`` is an OUTDATED back-compat alias for ``sdxl``
# (warned + normalized away by _normalize_pipeline).
_PIPELINE_CHOICES = ["sdxl", "controlnet", "default"]
_PIPELINE_HELP = (
"Pipeline profile. controlnet (DEFAULT) = SDXL + canny ControlNet that preserves "
"text/faces via edge conditioning while removing SynthID; sdxl = plain SDXL img2img "
"(lighter, no extra model download, but leaves SynthID on flat-graphic content). "
"('default' is an OUTDATED alias for 'sdxl' -- use sdxl or controlnet.)"
)
# Shared --pipeline / --strength decorators so the three diffusion commands
# (invisible/all/batch) keep an identical surface and the strength help can never
# drift from the watermark_profiles constants (strength_default_help derives it).
_pipeline_option = click.option(
"--pipeline",
type=click.Choice(_PIPELINE_CHOICES),
default="controlnet",
callback=_normalize_pipeline,
help=_PIPELINE_HELP,
)
_strength_option = click.option(
"--strength",
type=float,
default=None,
help=f"Denoising strength (0.0-1.0). Default: {strength_default_help()}.",
)
def _resolve_auto_polish(auto: bool, adaptive_polish: bool) -> bool:
"""Warn on the retired ``--auto`` flag, returning ``adaptive_polish`` unchanged.
``--auto`` used to plan the pipeline + polish from content detection, but the
pipeline is now always controlnet (the default) and the adaptive polish is ON by
default (it self-gates by detail level), so the content detectors were removed and
``--auto`` is now a no-op alias: the polish it used to enable is already the default,
and an explicit ``--no-adaptive-polish`` still wins. So it only emits a deprecation
warning and passes ``adaptive_polish`` through.
"""
if auto:
click.echo(
"Warning: --auto is deprecated and now does nothing (the adaptive polish it "
"enabled is ON by default). Use --no-adaptive-polish to turn the polish off.",
err=True,
)
return adaptive_polish
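The retired-flag behaviour described in the docstring above can be checked in isolation. The following is a minimal sketch using only the standard library (`print` to stderr instead of `click.echo`); the names `resolve_auto_polish` and `run` are illustrative stand-ins, not the CLI's actual helpers.

```python
import contextlib
import io
import sys


def resolve_auto_polish(auto: bool, adaptive_polish: bool) -> bool:
    """Warn when the retired flag is set; never change the polish setting."""
    if auto:
        print("Warning: --auto is deprecated and now does nothing.", file=sys.stderr)
    return adaptive_polish


def run(auto: bool, adaptive_polish: bool) -> tuple:
    """Hypothetical harness: returns (resolved polish, whether a warning was printed)."""
    buf = io.StringIO()
    with contextlib.redirect_stderr(buf):
        result = resolve_auto_polish(auto, adaptive_polish)
    return result, bool(buf.getvalue())
```

Under these assumptions, passing the flag together with an explicit opt-out still leaves the polish off: the flag only contributes the warning.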
def _warn_if_esrgan_unavailable(upscaler: str) -> None:
@@ -524,21 +583,9 @@ def cmd_erase(
@click.option(
"-o", "--output", type=click.Path(path_type=Path), default=None, help="Output path (default: <source>_clean.<ext>)."
)
@click.option(
"--strength",
type=float,
default=None,
help="Denoising strength (0.0-1.0). Default: vendor-adaptive (OpenAI 0.10 / Google 0.15 / "
"unknown 0.15, from the C2PA issuer).",
)
@_strength_option
@click.option("--steps", type=int, default=50, help="Number of denoising steps. Default: 50.")
@click.option(
"--pipeline",
type=click.Choice(["default", "controlnet"]),
default="default",
help="Pipeline profile (default=SDXL img2img; controlnet=SDXL + canny ControlNet that preserves "
"text/faces via edge conditioning while removing SynthID, EXPERIMENTAL).",
)
@_pipeline_option
@click.option(
"--device",
type=click.Choice(["auto", "cpu", "mps", "cuda", "xpu"]),
@@ -560,6 +607,8 @@ def cmd_erase(
@_min_resolution_option
@_unsharp_option
@_upscaler_option
@_model_option
@_guidance_scale_option
@_auto_option
@_adaptive_polish_option
@click.pass_context
@@ -579,6 +628,8 @@ def cmd_invisible(
min_resolution: int,
controlnet_scale: float,
upscaler: str,
model: str | None,
guidance_scale: float | None,
auto: bool,
adaptive_polish: bool,
) -> None:
@@ -599,8 +650,7 @@ def cmd_invisible(
source = _validate_image(source)
_warn_if_esrgan_unavailable(upscaler)
if auto:
pipeline, adaptive_polish = _apply_auto(ctx, source, pipeline, adaptive_polish)
adaptive_polish = _resolve_auto_polish(auto, adaptive_polish)
if output is None:
output = source.with_stem(source.stem + "_clean")
@@ -610,6 +660,7 @@ def cmd_invisible(
console.print(f" {msg}")
engine = InvisibleEngine(
model_id=model,
device=device_str,
pipeline=pipeline,
hf_token=hf_token,
@@ -630,7 +681,7 @@ def cmd_invisible(
output_path=output,
strength=strength,
num_inference_steps=steps,
guidance_scale=None,
guidance_scale=guidance_scale,
seed=seed,
humanize=humanize,
unsharp=unsharp,
@@ -781,21 +832,10 @@ def cmd_identify(ctx: click.Context, source: Path, no_visible: bool, as_json: bo
@click.option(
"--inpaint-method", type=click.Choice(["ns", "telea", "gaussian"]), default="ns", help="Inpainting method."
)
@click.option(
"--strength",
type=float,
default=None,
help="Invisible watermark denoising strength. Default: vendor-adaptive (OpenAI 0.10 / Google 0.15 / unknown 0.15).",
)
@_strength_option
@click.option("--steps", type=int, default=50, help="Number of denoising steps for invisible removal.")
@click.option(
"--pipeline",
type=click.Choice(["default", "controlnet"]),
default="default",
help="Pipeline profile (default=SDXL img2img; controlnet=SDXL + canny ControlNet that preserves "
"text/faces via edge conditioning while removing SynthID, EXPERIMENTAL).",
)
@click.option("--model", type=str, default=None, help="HuggingFace model ID for invisible removal.")
@_pipeline_option
@_model_option
@click.option(
"--device",
type=click.Choice(["auto", "cpu", "mps", "cuda", "xpu"]),
@@ -817,6 +857,7 @@ def cmd_identify(ctx: click.Context, source: Path, no_visible: bool, as_json: bo
@_min_resolution_option
@_unsharp_option
@_upscaler_option
@_guidance_scale_option
@_auto_option
@_adaptive_polish_option
@click.pass_context
@@ -839,6 +880,7 @@ def cmd_all(
min_resolution: int,
controlnet_scale: float,
upscaler: str,
guidance_scale: float | None,
auto: bool,
adaptive_polish: bool,
) -> None:
@@ -856,8 +898,7 @@ def cmd_all(
_banner()
source = _validate_image(source)
_warn_if_esrgan_unavailable(upscaler)
if auto:
pipeline, adaptive_polish = _apply_auto(ctx, source, pipeline, adaptive_polish)
adaptive_polish = _resolve_auto_polish(auto, adaptive_polish)
if output is None:
output = source.with_stem(source.stem + "_clean")
@@ -937,6 +978,7 @@ def cmd_all(
output_path=tmp_path,
strength=strength,
num_inference_steps=steps,
guidance_scale=guidance_scale,
seed=seed,
humanize=humanize,
unsharp=unsharp,
@@ -1001,7 +1043,8 @@ def _process_batch_image(
min_resolution: int = 1024,
controlnet_scale: float = 1.0,
upscaler: str = "lanczos",
auto: bool = False,
model: str | None = None,
guidance_scale: float | None = None,
adaptive_polish: bool = False,
) -> None:
"""Process a single image for batch mode.
@@ -1048,14 +1091,12 @@ def _process_batch_image(
if invisible_available():
from remove_ai_watermarks.invisible_engine import InvisibleEngine
# --auto re-plans the pipeline / face-restore / polish per image; only the
# pipeline choice changes the engine ctor, so cache one engine per pipeline
# (controlnet vs default) rather than a single shared instance.
if auto:
pipeline, adaptive_polish = _apply_auto(ctx, img_path, pipeline, adaptive_polish)
# Cache the engine in ctx.obj so the batch builds it once (pipeline is a
# single CLI value, constant across the run).
engines = ctx.obj.setdefault("_inv_engines", {})
if pipeline not in engines:
engines[pipeline] = InvisibleEngine(
model_id=model,
device=None if device == "auto" else device,
pipeline=pipeline,
hf_token=hf_token,
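The build-once engine cache in this hunk can be sketched without the real engine. This is an illustration only: `get_engine` and `fake_factory` are hypothetical names, and a plain dict stands in for `ctx.obj`.

```python
from __future__ import annotations

from typing import Any, Callable


def get_engine(cache: dict[str, Any], pipeline: str, factory: Callable[[str], Any]) -> Any:
    """Return the cached engine for ``pipeline``, building it on first use."""
    engines = cache.setdefault("_inv_engines", {})
    if pipeline not in engines:
        engines[pipeline] = factory(pipeline)
    return engines[pipeline]


built: list[str] = []


def fake_factory(pipeline: str) -> object:
    """Stand-in for the expensive engine constructor; records each build."""
    built.append(pipeline)
    return object()


ctx_obj: dict[str, Any] = {}
first = get_engine(ctx_obj, "controlnet", fake_factory)
second = get_engine(ctx_obj, "controlnet", fake_factory)
```

Keying the cache by pipeline name keeps the lookup correct even if a later change makes the pipeline vary per image again.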
@@ -1067,6 +1108,7 @@ def _process_batch_image(
out_path,
strength=strength,
num_inference_steps=steps,
guidance_scale=guidance_scale,
seed=seed,
humanize=humanize,
unsharp=unsharp,
@@ -1104,19 +1146,13 @@ def _process_batch_image(
@click.option(
"--mode", type=click.Choice(["visible", "invisible", "metadata", "all"]), default="visible", help="Processing mode."
)
@click.option("--strength", type=float, default=None, help="Denoising strength (invisible mode).")
@_strength_option
@click.option("--steps", type=int, default=50, help="Number of denoising steps (invisible mode).")
@click.option("--inpaint/--no-inpaint", default=True, help="Apply inpainting (visible mode).")
@click.option(
"--humanize", type=float, default=0.0, help="Analog Humanizer film grain intensity (0 = off, typical: 2.0-6.0)."
)
@click.option(
"--pipeline",
type=click.Choice(["default", "controlnet"]),
default="default",
help="Pipeline profile (default=SDXL img2img; controlnet=SDXL + canny ControlNet that preserves "
"text/faces via edge conditioning while removing SynthID, EXPERIMENTAL).",
)
@_pipeline_option
@click.option(
"--device",
type=click.Choice(["auto", "cpu", "mps", "cuda", "xpu"]),
@@ -1135,6 +1171,8 @@ def _process_batch_image(
@_unsharp_option
@_upscaler_option
@_controlnet_scale_option
@_model_option
@_guidance_scale_option
@_auto_option
@_adaptive_polish_option
@click.pass_context
@@ -1156,6 +1194,8 @@ def cmd_batch(
min_resolution: int,
controlnet_scale: float,
upscaler: str,
model: str | None,
guidance_scale: float | None,
auto: bool,
adaptive_polish: bool,
) -> None:
@@ -1177,6 +1217,7 @@ def cmd_batch(
console.print(f" Mode: {mode}")
if mode in ("invisible", "all"):
_warn_if_esrgan_unavailable(upscaler)
adaptive_polish = _resolve_auto_polish(auto, adaptive_polish)
processed = 0
errors = 0
@@ -1214,7 +1255,8 @@ def cmd_batch(
min_resolution=min_resolution,
controlnet_scale=controlnet_scale,
upscaler=upscaler,
auto=auto,
model=model,
guidance_scale=guidance_scale,
adaptive_polish=adaptive_polish,
)
processed += 1
+12 -12
@@ -89,7 +89,7 @@ class InvisibleEngine:
self,
model_id: str | None = None,
device: str | None = None,
pipeline: str = "default",
pipeline: str = "controlnet",
hf_token: str | None = None,
progress_callback: Callable[[str], None] | None = None,
controlnet_conditioning_scale: float = 1.0,
@@ -99,9 +99,10 @@ class InvisibleEngine:
Args:
model_id: HuggingFace model ID. None = use the SDXL base default.
device: Device for inference (auto/cpu/mps/cuda/xpu). None = auto.
pipeline: Pipeline profile. "default" (plain SDXL img2img) or
"controlnet" (SDXL + canny ControlNet that preserves text/face
structure via edge conditioning while removing SynthID).
pipeline: Pipeline profile. "controlnet" (DEFAULT; SDXL + canny ControlNet
that preserves text/face structure via edge conditioning while removing
SynthID) or "sdxl" (plain SDXL img2img, lighter but leaves SynthID on
flat-graphic content). "default" is a back-compat alias for "sdxl".
hf_token: HuggingFace API token.
progress_callback: Optional callback for progress messages.
controlnet_conditioning_scale: ControlNet structure-preservation
@@ -182,12 +183,11 @@ class InvisibleEngine:
unsharp: Final unsharp-mask sharpening strength (0 = off, default).
Applied last to counter the soft / over-smoothed look of the
diffusion pass; ~0.5-0.8 is a safe range, higher risks edge halos.
adaptive_polish: When True (the --auto mode default), restore the input's
detail level in the softened output instead of fixed unsharp/humanize:
a capped unsharp + edge-masked grain targeting the input's Laplacian
variance (self-limiting on text/graphics). Runs LAST, after face
restoration. The fixed ``humanize``/``unsharp`` knobs are normally 0
when this is on.
adaptive_polish: When True (the CLI default), restore the input's detail
level in the softened output: a capped unsharp + edge-masked grain
targeting the input's Laplacian variance. Self-limiting -- a no-op when
the output already meets the input's detail level (text/flat graphics),
so it only acts on over-smoothed photo/face texture. Runs LAST.
max_resolution: Cap the long side (px) before diffusion. 0 (default)
= no cap. Set a positive value only to bound GPU/MPS memory on
very large inputs (it reintroduces a lossy downscale->upscale
@@ -316,8 +316,8 @@ class InvisibleEngine:
self._progress_callback(f"Sharpening (unsharp mask: {unsharp})...")
image_io.imwrite(out_path, unsharp_mask(out_cv, amount=unsharp))
# Adaptive polish (--auto): restore the input's detail level in the softened
# output, sparing text/edges. Replaces the fixed unsharp/humanize knobs.
# Adaptive polish (CLI default): restore the input's detail level in the
# softened output, sparing text/edges. Self-limiting where there is no deficit.
if adaptive_polish:
import cv2
import numpy as np
@@ -12,34 +12,56 @@ if TYPE_CHECKING:
DEFAULT_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
# Canonical pipeline-profile names + the back-compat alias. The plain SDXL img2img
# profile is ``sdxl``; ``default`` is kept as an accepted alias (it was the profile's
# name before ``controlnet`` became the default-selected pipeline, 2026-06-09).
SDXL_PROFILE = "sdxl"
CONTROLNET_PROFILE = "controlnet"
_PROFILE_ALIASES = {"default": SDXL_PROFILE}
def normalize_profile(profile: str) -> str:
"""Canonicalize a pipeline-profile name, resolving the ``default`` -> ``sdxl`` alias."""
normalized = profile.strip().lower()
return _PROFILE_ALIASES.get(normalized, normalized)
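As a quick illustration of the alias handling, here is a self-contained sketch. `normalize_profile` mirrors the function in this hunk; `normalize_with_notice` is a hypothetical helper (not part of the change) showing how a caller could decide when to emit the deprecation notice.

```python
from __future__ import annotations

SDXL_PROFILE = "sdxl"
CONTROLNET_PROFILE = "controlnet"
_PROFILE_ALIASES = {"default": SDXL_PROFILE}


def normalize_profile(profile: str) -> str:
    """Canonicalize a profile name, resolving the legacy ``default`` alias."""
    normalized = profile.strip().lower()
    return _PROFILE_ALIASES.get(normalized, normalized)


def normalize_with_notice(value: str) -> tuple[str, str | None]:
    """Hypothetical helper: return (canonical name, deprecation notice or None)."""
    canonical = normalize_profile(value)
    notice = None
    if value.strip().lower() in _PROFILE_ALIASES:
        notice = f"--pipeline {value.strip().lower()} is deprecated and maps to '{canonical}'."
    return canonical, notice
```

Unknown names pass through unchanged in this sketch, so rejecting them is left to the caller.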
# The SDXL-native canny ControlNet used by the ``controlnet`` pipeline. The
# ControlNet is an add-on to the SDXL base checkpoint (DEFAULT_MODEL_ID), not a
# separate base model, so both the ``default`` and ``controlnet`` profiles load
# the same base weights and share the same vendor-adaptive strength.
# separate base model, so both the ``sdxl`` and ``controlnet`` profiles load the
# same base weights and share the same vendor-adaptive strength ladder (see below).
CONTROLNET_CANNY_MODEL = "xinsir/controlnet-canny-sdxl-1.0"
# Vendor-adaptive default denoising strength for the SDXL img2img scrub, overridable
# from the CLI (`--strength`). The right strength depends on which vendor's SynthID is
# present, detected from the C2PA issuer (metadata.synthid_source). Oracle-verified
# controlled study (2026-06-01, clean v0.8.6, per-image openai.com/verify or Gemini-app
# verdict; see docs/synthid.md section 2.2):
# - OpenAI gpt-image: removed at 0.05 across 1024-1600 (n=4), resolution-independent.
# OPENAI_STRENGTH 0.10 = the 0.05 floor plus a 2x margin (keeps quality high).
# - Google Gemini: removed at 0.15 on the capped-1536 path (n=4); 0.05/0.10 do NOT
# clear. GEMINI_STRENGTH 0.15. CAVEAT: 0.15 was validated only on
# `--max-resolution 1536`; native 2816 (the default path) was not locally
# measurable (OOM on Apple Silicon) and may need more -- pending GPU validation on
# the raiw.cc backend. If a native large Gemini still verifies positive at 0.15,
# raise `--strength`.
# - Unknown vendor (metadata stripped, or non-OpenAI/Google C2PA): UNKNOWN_STRENGTH
# 0.15, the safe middle that clears both vendors at the tested resolutions.
# The dominant factor is VENDOR, not resolution: Google's SynthID is ~3x more robust
# than OpenAI's. The ``controlnet`` pipeline shares these strengths (same SDXL base; the
# canny ControlNet only preserves structure, the strength still drives removal).
OPENAI_STRENGTH = 0.10
GEMINI_STRENGTH = 0.15
UNKNOWN_STRENGTH = 0.15
# Backwards-compatible alias: the vendor-unknown default (what a caller gets without a
# present (detected from the C2PA issuer, metadata.synthid_source). The SAME ladder
# applies to BOTH pipelines (`sdxl` plain img2img and `controlnet`) -- see "why one
# ladder" below.
#
# Data basis (see docs/synthid.md sections 2.2 / 5.5): the values are the ORACLE-
# CERTIFIED controlnet floors (2026-06-04, isolated Modal cert app, each vendor on its
# own verifier): OpenAI 0.20 (2 photoreal x 3 seeds = 6/6 clean, resolution-independent),
# Google 0.30 (clean on 2/2 seeds, validated ONLY at <= 1536 -- Gemini is resolution-
# sensitive, native ~2816 likely needs ~0.35+). Unknown vendor gets the Google (more
# robust watermark) value: safe-by-default.
#
# Why ONE ladder for both pipelines (2026-06-09): the certification was run on
# controlnet, and it does NOT transfer to `sdxl` by symmetry -- the two pipelines have
# OPPOSITE hard cases (controlnet leaves SynthID on photoreal, `sdxl` leaves it on flat
# graphics; the content-x-pipeline table in docs/synthid.md §5.1). BUT on its OWN hard
# case (flat fills) `sdxl` is the WEAKER remover -- plain img2img at low strength barely
# perturbs a flat region -- so it needs AT LEAST as much strength as controlnet, not
# less. Hence the certified controlnet floor is the right floor for `sdxl` too. The
# higher strength costs little quality where it matters: `controlnet` is now the default
# pipeline, so `sdxl` is reached only via an explicit `--pipeline sdxl`, which suits
# structure-less inputs where over-regeneration has no faces/text to damage. NOTE:
# this is a MARGIN argument for `sdxl`, not a fresh certification -- there is no local
# SynthID detector, so if an oracle still reads SynthID on a flat `sdxl` output, raise
# `--strength`.
OPENAI_STRENGTH = 0.20
GEMINI_STRENGTH = 0.30
UNKNOWN_STRENGTH = 0.30
# Backwards-compatible alias: the vendor-unknown value (what a caller gets without a
# detected vendor). Kept as DEFAULT_STRENGTH for existing references.
DEFAULT_STRENGTH = UNKNOWN_STRENGTH
@@ -47,17 +69,29 @@ DEFAULT_STRENGTH = UNKNOWN_STRENGTH
_VENDOR_STRENGTH = {"openai": OPENAI_STRENGTH, "google": GEMINI_STRENGTH}
def strength_default_help() -> str:
"""One-line description of the vendor-adaptive default, derived from the constants.
Single source of truth for the CLI ``--strength`` help so the numbers can never
drift from the actual ladder (they did once when the per-pipeline split was unified).
"""
return (
f"vendor-adaptive (OpenAI {OPENAI_STRENGTH} / Google {GEMINI_STRENGTH} / "
f"unknown {UNKNOWN_STRENGTH}, from the C2PA issuer; same ladder for both pipelines)"
)
def resolve_strength(strength: float | None, vendor: str | None = None) -> float:
"""Resolve the denoising strength, applying the vendor default when unset.
``None`` means "the user did not pass ``--strength``", which resolves
**vendor-adaptively**: ``vendor`` (``"openai"`` / ``"google"`` / None, from
``vendor_for_strength``) selects ``OPENAI_STRENGTH`` / ``GEMINI_STRENGTH`` /
``UNKNOWN_STRENGTH``. An explicit value always wins (including ``0.0`` -- the check
is ``is None``, not falsiness). The ``default`` and ``controlnet`` profiles share
the same SDXL base (the ControlNet only preserves structure), so the default does
NOT depend on the profile. Shared by the CLI (for display) and the engine (for
execution) so the two never disagree -- both must pass the SAME ``vendor``.
``UNKNOWN_STRENGTH``. The same ladder applies to both pipelines (see the module
comment for why one ladder is correct). An explicit value always wins (including
``0.0`` -- the check is ``is None``, not falsiness). Shared by the CLI (for display)
and the engine (for execution) so the two never disagree -- both must pass the SAME
``vendor``.
"""
if strength is not None:
return strength
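The ladder resolution can be sketched as follows. The constants are the ones in this diff; the fallback lookup after the `is None` check is not visible in the hunk, so the `dict.get` line here is an assumption about how the vendor value is selected.

```python
from __future__ import annotations

OPENAI_STRENGTH = 0.20
GEMINI_STRENGTH = 0.30
UNKNOWN_STRENGTH = 0.30
_VENDOR_STRENGTH = {"openai": OPENAI_STRENGTH, "google": GEMINI_STRENGTH}


def resolve_strength(strength: float | None, vendor: str | None = None) -> float:
    """Explicit value wins (even 0.0); otherwise pick the vendor's floor."""
    if strength is not None:
        return strength
    # Assumed fallback: unrecognized or missing vendors get the unknown-vendor value.
    return _VENDOR_STRENGTH.get(vendor, UNKNOWN_STRENGTH)
```

For example, under these assumptions `resolve_strength(None, "openai")` gives 0.20, while `resolve_strength(0.0, "google")` stays at 0.0 because the check is `is None` rather than truthiness.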
@@ -90,11 +124,11 @@ def vendor_for_strength(image_path: Path) -> Literal["openai", "google"] | None:
def get_model_id_for_profile(profile: str) -> str:
"""Map CLI model profile names to concrete Hugging Face model IDs.
Both ``default`` and ``controlnet`` use the SDXL base checkpoint -- the canny
Both ``sdxl`` and ``controlnet`` use the SDXL base checkpoint -- the canny
ControlNet (``CONTROLNET_CANNY_MODEL``) is an add-on loaded on top of it, not a
separate base model.
separate base model. The legacy ``default`` alias resolves to ``sdxl``.
"""
normalized = profile.strip().lower()
if normalized in ("default", "controlnet"):
normalized = normalize_profile(profile)
if normalized in (SDXL_PROFILE, CONTROLNET_PROFILE):
return DEFAULT_MODEL_ID
raise ValueError(f"Unknown model profile '{profile}'. Use one of: default, controlnet.")
raise ValueError(f"Unknown model profile '{profile}'. Use one of: sdxl, controlnet.")
@@ -1,13 +1,17 @@
"""Watermark removal using diffusion model regeneration attack.
Two pipelines:
1. ``default`` -- plain SDXL img2img. Partial-noise regeneration scrubs the
invisible watermark; ``strength`` controls how much is regenerated.
2. ``controlnet`` -- SDXL img2img with a canny ControlNet. The watermark REMOVAL
still comes from the img2img regeneration (``strength``); the ControlNet only
PRESERVES structure (text/faces) by conditioning on the edge map. No original
pixels are ever copied or frozen, so SynthID does not survive.
1. ``controlnet`` (DEFAULT) -- SDXL img2img with a canny ControlNet. The watermark
REMOVAL still comes from the img2img regeneration (``strength``); the ControlNet
only PRESERVES structure (text/faces) by conditioning on the edge map. No original
pixels are ever copied or frozen. Because the edge map keeps the regeneration
closer to the original, it needs a higher ``strength`` floor than plain ``sdxl`` to
destroy SynthID (the certified controlnet ladder; see ``watermark_profiles``).
``controlnet_conditioning_scale`` is the preservation knob.
2. ``sdxl`` (legacy alias: ``default``) -- plain SDXL img2img. Partial-noise regeneration scrubs the
invisible watermark; ``strength`` controls how much is regenerated. Lighter (no
ControlNet weights), but at the low default strength it leaves SynthID on
flat-graphic content -- use it for inputs without text/faces.
"""
# torch/diffusers/cv2 boundary: these libs ship no usable types for the tensor and
@@ -32,6 +36,7 @@ from remove_ai_watermarks.noai.watermark_profiles import (
CONTROLNET_CANNY_MODEL,
DEFAULT_MODEL_ID,
DEFAULT_STRENGTH,
normalize_profile,
resolve_strength,
)
@@ -323,13 +328,14 @@ class WatermarkRemover:
torch_dtype: Any = None,
progress_callback: Callable[[str], None] | None = None,
hf_token: str | None = None,
pipeline: str = "default",
pipeline: str = "controlnet",
controlnet_conditioning_scale: float = 1.0,
) -> None:
self.model_id = model_id or self.DEFAULT_MODEL_ID
# The pipeline profile is threaded explicitly (not inferred from model_id):
# both "default" and "controlnet" use the same SDXL base checkpoint.
self.model_profile = pipeline
# both "sdxl" and "controlnet" use the same SDXL base checkpoint. Normalize so
# the legacy "default" alias resolves to "sdxl".
self.model_profile = normalize_profile(pipeline)
self.controlnet_conditioning_scale = controlnet_conditioning_scale
if not is_watermark_removal_available():
-117
@@ -1,117 +0,0 @@
"""Tests for the --auto pipeline planner (content-adaptive mode selection).
Detection runs on synthetic images; the face-present routing is exercised by
monkeypatching ``detect_face`` (a real detectable face fixture is private, never
committed). The planner is cv2-only and torch-free.
"""
from __future__ import annotations
import cv2
import numpy as np
from remove_ai_watermarks import auto_config, image_io
def _write(img, tmp_path, name="x.png"):
p = tmp_path / name
image_io.imwrite(p, img)
return p
class TestDetectors:
def test_detect_face_false_on_flat(self):
flat = np.full((200, 200, 3), 128, dtype=np.uint8)
assert auto_config.detect_face(flat) is False
def test_edge_density_flat_near_zero(self):
flat = np.full((200, 200, 3), 128, dtype=np.uint8)
assert auto_config.edge_density(flat) < 0.001
def test_edge_density_text_higher_than_blank(self):
blank = np.full((200, 400, 3), 255, dtype=np.uint8)
text = blank.copy()
cv2.putText(text, "HELLO AI TEXT", (10, 120), cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 0, 0), 3)
assert auto_config.edge_density(text) > auto_config.edge_density(blank)
def test_dbnet_detects_text_card(self):
"""The bundled PP-OCRv3 DBNet model fires on a clear text card and not on flat."""
card = np.full((300, 500, 3), 255, dtype=np.uint8)
cv2.putText(card, "INVOICE TOTAL 1234", (10, 170), cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 0, 0), 4)
assert auto_config._detect_text_dbnet(card) is True
assert auto_config._detect_text_dbnet(np.full((300, 500, 3), 128, dtype=np.uint8)) is False
def test_detect_text_falls_back_to_mser_when_dbnet_unavailable(self, monkeypatch):
"""If DBNet can't load (returns None), detect_text uses the MSER heuristic."""
monkeypatch.setattr(auto_config, "_detect_text_dbnet", lambda _img: None)
called = {}
def _fake_mser(_img):
called["mser"] = True
return True
monkeypatch.setattr(auto_config, "_detect_text_mser", _fake_mser)
assert auto_config.detect_text(np.full((100, 100, 3), 128, dtype=np.uint8)) is True
assert called.get("mser") is True
class TestPlan:
def test_unreadable_returns_none(self, tmp_path):
assert auto_config.plan(tmp_path / "does_not_exist.png") is None
def test_flat_image_is_default_pipeline_no_polish(self, tmp_path):
flat = np.full((300, 300, 3), 128, dtype=np.uint8)
cfg = auto_config.plan(_write(flat, tmp_path))
assert cfg is not None
assert cfg.pipeline == "default" # structure-less -> plain SDXL
assert cfg.adaptive_polish is False # no smoothing pass -> no polish
assert cfg.unsharp == 0.0
assert cfg.humanize == 0.0
assert cfg.min_resolution == 1024
def test_text_image_uses_controlnet(self, tmp_path):
img = np.full((300, 500, 3), 255, dtype=np.uint8)
cv2.putText(img, "INVOICE TOTAL 1234", (10, 170), cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 0, 0), 4)
cfg = auto_config.plan(_write(img, tmp_path))
assert cfg is not None
# Text creates edges above the structure-less floor -> controlnet preserves them.
assert cfg.pipeline == "controlnet"
def test_face_routes_to_controlnet_and_polish(self, tmp_path, monkeypatch):
monkeypatch.setattr(auto_config, "detect_face", lambda _img: True)
flat = np.full((300, 300, 3), 128, dtype=np.uint8)
cfg = auto_config.plan(_write(flat, tmp_path))
assert cfg is not None
assert cfg.has_face
assert cfg.pipeline == "controlnet"
assert cfg.adaptive_polish # smoothing pass ran -> adaptive polish on
assert cfg.unsharp == 0.0 # fixed knobs off; the adaptive polish replaces them
assert cfg.humanize == 0.0
def test_text_signal_forces_controlnet_on_flat(self, tmp_path, monkeypatch):
monkeypatch.setattr(auto_config, "detect_text", lambda _img: True)
flat = np.full((300, 300, 3), 128, dtype=np.uint8)
cfg = auto_config.plan(_write(flat, tmp_path))
assert cfg is not None
assert cfg.has_text
assert cfg.pipeline == "controlnet"
class TestReason:
def test_reason_summarizes_plan(self):
cfg = auto_config.AutoConfig(
pipeline="controlnet",
adaptive_polish=True,
unsharp=0.0,
humanize=0.0,
min_resolution=1024,
has_face=True,
has_text=False,
edge_density=0.05,
width=800,
height=600,
)
r = cfg.reason
assert "controlnet" in r
assert "face" in r
assert "adaptive polish" in r
+71 -20
@@ -277,6 +277,72 @@ class TestInvisibleCommand:
expected = sample_png.with_stem(sample_png.stem + "_clean")
assert expected.exists()
def test_invisible_adaptive_polish_on_by_default(self, runner, sample_png):
mock_cls, mock_engine = _mock_invisible_engine()
with (
patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
):
result = runner.invoke(main, ["invisible", str(sample_png)])
assert result.exit_code == 0, result.output
# adaptive_polish is ON by default (self-gating, so a no-op where not needed).
assert mock_engine.remove_watermark.call_args.kwargs["adaptive_polish"] is True
# Default model is None (the SDXL base) and CFG is None (the library's 7.5).
assert mock_cls.call_args.kwargs["model_id"] is None
assert mock_engine.remove_watermark.call_args.kwargs["guidance_scale"] is None
def test_invisible_no_adaptive_polish_disables(self, runner, sample_png):
mock_cls, mock_engine = _mock_invisible_engine()
with (
patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
):
result = runner.invoke(main, ["invisible", str(sample_png), "--no-adaptive-polish"])
assert result.exit_code == 0, result.output
assert mock_engine.remove_watermark.call_args.kwargs["adaptive_polish"] is False
def test_invisible_model_and_guidance_scale_flow_to_engine(self, runner, sample_png):
mock_cls, mock_engine = _mock_invisible_engine()
with (
patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
):
result = runner.invoke(
main,
["invisible", str(sample_png), "--model", "org/custom-sdxl", "--guidance-scale", "5.5"],
)
assert result.exit_code == 0, result.output
assert mock_cls.call_args.kwargs["model_id"] == "org/custom-sdxl"
assert mock_engine.remove_watermark.call_args.kwargs["guidance_scale"] == 5.5
def test_pipeline_default_alias_warns_and_maps_to_sdxl(self, runner, sample_png):
mock_cls, _mock_engine = _mock_invisible_engine()
with (
patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
):
result = runner.invoke(main, ["invisible", str(sample_png), "--pipeline", "default"])
assert result.exit_code == 0, result.output
# The legacy value warns and is normalized to "sdxl" before the engine is built.
assert "deprecated" in result.output.lower()
assert mock_cls.call_args.kwargs["pipeline"] == "sdxl"
def test_pipeline_sdxl_does_not_warn(self, runner, sample_png):
mock_cls, _mock_engine = _mock_invisible_engine()
with (
patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
):
result = runner.invoke(main, ["invisible", str(sample_png), "--pipeline", "sdxl"])
assert result.exit_code == 0, result.output
assert "deprecated" not in result.output.lower()
assert mock_cls.call_args.kwargs["pipeline"] == "sdxl"
def test_invisible_nonexistent_file(self, runner):
result = runner.invoke(main, ["invisible", "/nonexistent/file.png"])
assert result.exit_code != 0
@@ -514,32 +580,17 @@ class TestBatchCommand:
assert out[0, 0, 3] == 0
assert out[100, 100, 3] == 255
-def test_batch_auto_plans_pipeline_per_image(self, runner, tmp_path):
-"""--auto in batch re-plans the pipeline/restore/polish per image and
-builds one engine per resolved pipeline."""
-from remove_ai_watermarks import auto_config
+def test_batch_auto_is_deprecated_and_enables_polish(self, runner, tmp_path):
+"""--auto is retired: it warns and just enables the adaptive polish (the
+pipeline is always the default controlnet now)."""
input_dir = _make_batch_dir(tmp_path, count=2)
output_dir = tmp_path / "output"
-plan = auto_config.AutoConfig(
-pipeline="controlnet",
-adaptive_polish=True,
-unsharp=0.0,
-humanize=0.0,
-min_resolution=1024,
-has_face=True,
-has_text=False,
-edge_density=0.05,
-width=200,
-height=200,
-)
mock_cls, mock_engine = _mock_invisible_engine()
with (
patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
patch("remove_ai_watermarks.cli.invisible_available", return_value=True, create=True),
patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
-patch("remove_ai_watermarks.auto_config.plan", return_value=plan),
):
result = runner.invoke(
main,
@@ -547,9 +598,9 @@ class TestBatchCommand:
)
assert result.exit_code == 0, result.output
assert "2 processed" in result.output
-# Engine built with the auto-resolved controlnet pipeline.
+assert "deprecated" in result.output.lower()
+# Pipeline stays the default controlnet; --auto only turned the polish on.
assert mock_cls.call_args.kwargs["pipeline"] == "controlnet"
-# The auto plan's adaptive polish reached the engine call.
assert mock_engine.remove_watermark.call_args.kwargs["adaptive_polish"] is True
def test_batch_default_output_dir(self, runner, tmp_path):
@@ -21,7 +21,9 @@ from remove_ai_watermarks.noai.watermark_profiles import (
OPENAI_STRENGTH,
UNKNOWN_STRENGTH,
get_model_id_for_profile,
+normalize_profile,
resolve_strength,
+strength_default_help,
)
from remove_ai_watermarks.noai.watermark_remover import get_device, is_watermark_removal_available
@@ -111,8 +113,14 @@ class TestMpsErrorDetection:
class TestModelProfiles:
"""Tests for watermark_profiles.py."""
-def test_default_profile(self):
+def test_sdxl_profile(self):
+assert get_model_id_for_profile("sdxl") == "stabilityai/stable-diffusion-xl-base-1.0"
+def test_default_alias_resolves_to_sdxl(self):
+# "default" is the legacy alias for "sdxl" (back-compat for existing scripts).
assert get_model_id_for_profile("default") == "stabilityai/stable-diffusion-xl-base-1.0"
+assert normalize_profile("default") == "sdxl"
+assert normalize_profile("controlnet") == "controlnet"
def test_controlnet_profile(self):
# controlnet shares the SDXL base checkpoint (the ControlNet is an add-on).
@@ -127,9 +135,9 @@ class TestResolveStrength:
"""resolve_strength applies the vendor default only when strength is unset."""
def test_none_is_vendor_adaptive(self):
-# No vendor -> unknown default; OpenAI lower, Google == unknown. The default
-# is vendor-adaptive and does NOT depend on the pipeline profile (default and
-# controlnet share the same SDXL base).
+# No vendor -> unknown default; OpenAI lower, Google == unknown. The SAME ladder
+# applies to both pipelines (the certified controlnet floors), so there is no
+# pipeline argument.
assert resolve_strength(None) == UNKNOWN_STRENGTH
assert resolve_strength(None, "openai") == OPENAI_STRENGTH
assert resolve_strength(None, "google") == GEMINI_STRENGTH
@@ -137,10 +145,25 @@ class TestResolveStrength:
# An unrecognized vendor string falls through to the unknown default.
assert resolve_strength(None, "adobe") == UNKNOWN_STRENGTH
+def test_ladder_is_the_certified_controlnet_floors(self):
+# The unified ladder == the oracle-certified controlnet floors (OpenAI 0.20,
+# Google/unknown 0.30); Google is the more-robust watermark, so it is higher.
+assert OPENAI_STRENGTH == 0.20
+assert GEMINI_STRENGTH == 0.30
+assert UNKNOWN_STRENGTH == 0.30
+assert OPENAI_STRENGTH < GEMINI_STRENGTH
def test_default_strength_alias_is_unknown_vendor_value(self):
assert DEFAULT_STRENGTH == UNKNOWN_STRENGTH
assert OPENAI_STRENGTH < UNKNOWN_STRENGTH
+def test_strength_default_help_derives_from_constants(self):
+# The CLI --strength help is built from this, so it can never drift from the ladder.
+h = strength_default_help()
+assert str(OPENAI_STRENGTH) in h
+assert str(GEMINI_STRENGTH) in h
+assert str(UNKNOWN_STRENGTH) in h
def test_explicit_value_overrides_vendor(self):
assert resolve_strength(0.3) == 0.3
assert resolve_strength(0.3, "openai") == 0.3