mirror of
https://github.com/wiltodelta/remove-ai-watermarks.git
synced 2026-06-10 04:43:54 +02:00
feat(auto): DBNet text detector, Real-ESRGAN upscaler, batch --auto
Three content-quality features for the invisible/all/batch pipeline.
DBNet text detector (auto_config): replace the MSER text heuristic with
PP-OCRv3 differentiable-binarization via cv2.dnn.TextDetectionModel_DB,
using a bundled 2.4 MB Apache-2.0 model (en/cn detection nets are
byte-identical, so it ships language-neutral). cv2.dnn is core OpenCV, so
no new pip dep. MSER stays as the fallback when the model can't load.
Validated on real images: matches MSER everywhere and additionally catches
the Doubao CJK mark MSER missed; routing decisions unchanged otherwise.
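
The primary-detector-with-fallback behaviour described above can be sketched in isolation. This is a minimal illustration, not the project's code: `make_primary`, `detect_text_with_fallback`, and the `load` callable are hypothetical names, and the detectors are plain callables standing in for DBNet and MSER. The primary returns `None` (not `False`) when its model cannot load, so a real "no text" verdict is never confused with "could not run".

```python
from typing import Any, Callable, Optional


def make_primary(load: Callable[[], Callable[[Any], bool]]) -> Callable[[Any], Optional[bool]]:
    """Wrap a lazily loaded detector (hypothetical helper).

    The wrapped detector returns True/False on success and None once loading
    or inference has failed, so the caller knows to use the fallback.
    """
    state: dict = {"model": None, "failed": False}

    def primary(image: Any) -> Optional[bool]:
        if state["failed"]:  # a prior attempt failed; do not retry
            return None
        try:
            if state["model"] is None:
                state["model"] = load()  # load once, on first use
            return bool(state["model"](image))
        except Exception:
            state["failed"] = True
            return None

    return primary


def detect_text_with_fallback(
    image: Any,
    primary: Callable[[Any], Optional[bool]],
    fallback: Callable[[Any], bool],
) -> bool:
    """Use the primary verdict when there is one, else the fallback heuristic."""
    verdict = primary(image)
    return fallback(image) if verdict is None else verdict
```

With a loader that raises, every call routes to the fallback; with a working loader, the fallback is never consulted, even when the primary answers `False`.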
Real-ESRGAN upscaler (new upscaler.py, esrgan extra): optional
pre-diffusion super-resolution for the min-resolution floor upscale, loaded
via spandrel (MIT, no basicsr) with BSD-3-Clause weights downloaded on
first use. New --upscaler {lanczos,esrgan} on invisible/all/batch; default
stays lanczos and the engine falls back to lanczos when the extra is absent
or the model errors (never breaks removal). It is a manual opt-in knob (the
auto plan never selects it) -- as a generic GAN it sharpens photo/texture
content strongly but can degrade faces (the diffusion pass regenerates
them) and thin text, documented accordingly.
batch --auto: wire the content-adaptive --auto (+ --adaptive-polish) into
cmd_batch. The plan is recomputed per image and the invisible engine is
cached per resolved pipeline (default/controlnet), so a mixed directory
builds at most one engine of each kind. Verified end-to-end: 3 mixed
images routed correctly with only 2 pipeline loads (controlnet reused).
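
The "one engine per resolved pipeline" caching can be illustrated with a small stand-alone sketch. `EngineCache` and its `build` callable are hypothetical names for illustration only; the actual change keeps a dict in `ctx.obj`, as the `cli.py` hunk in this diff shows.

```python
from typing import Any, Callable, Dict


class EngineCache:
    """Build at most one engine per pipeline name (illustrative sketch)."""

    def __init__(self, build: Callable[[str], Any]) -> None:
        self._build = build
        self._engines: Dict[str, Any] = {}
        self.loads = 0  # how many engines were actually constructed

    def get(self, pipeline: str) -> Any:
        if pipeline not in self._engines:
            self._engines[pipeline] = self._build(pipeline)
            self.loads += 1
        return self._engines[pipeline]


# Three images whose plans resolve to controlnet, default, controlnet:
cache = EngineCache(build=lambda name: object())
for planned in ["controlnet", "default", "controlnet"]:
    cache.get(planned)
print(cache.loads)  # 2: the controlnet engine is reused for the third image
```

Keying the cache on the pipeline name alone assumes, as the commit message states, that the pipeline is the only per-image decision that affects engine construction.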
ruff + strict pyright(src/) clean; 558 tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -113,7 +113,7 @@ image → encode to latent space (VAE) at native resolution
|
||||
→ decode back to pixels (VAE)
|
||||
```
|
||||
|
||||
- Large inputs run at native resolution (no down-then-up round-trip, which was the main quality loss in issue #10); use `--max-resolution N` only to cap GPU/MPS memory on very large inputs. Small inputs (long side under 1024 px) are auto-upscaled to a 1024 px floor before diffusion, because SDXL distorts on a tiny latent, and the result is restored to the original size (a transparent quality boost). Disable the floor with `--min-resolution 0`.
|
||||
- Large inputs run at native resolution (no down-then-up round-trip, which was the main quality loss in issue #10); use `--max-resolution N` only to cap GPU/MPS memory on very large inputs. Small inputs (long side under 1024 px) are auto-upscaled to a 1024 px floor before diffusion, because SDXL distorts on a tiny latent, and the result is restored to the original size (a transparent quality boost). Disable the floor with `--min-resolution 0`. The floor upscale uses Lanczos by default; `--upscaler esrgan` (the `esrgan` extra) runs Real-ESRGAN first for sharper detail and falls back to Lanczos if the extra is absent. ESRGAN is a generic photo/texture GAN with no face/glyph prior, so it is best for photo/texture content -- it can degrade faces (the diffusion pass regenerates them, so the final recovers) and thin text; keep Lanczos for text-heavy inputs.
|
||||
|
||||
> **Default strength is vendor-adaptive (no flag needed).** The tool reads the C2PA issuer to detect which vendor's SynthID is present and picks the strength that clears it with the least quality loss: **OpenAI gpt-image → `0.10`**, **Google Gemini → `0.15`**, **unknown source → `0.15`**. An oracle-verified June 2026 study (clean pipeline, per-image openai.com/verify or Gemini app) found OpenAI's watermark clears at `0.05` across `1024`-`1600` px (resolution-independent) while Google's is ~3x more robust and needs `0.15`. The dominant factor is the vendor, not resolution. There is no local SynthID detector, so if the oracle still reads SynthID, raise `--strength`; if you care more about preserving fine text, lower it. (Caveat: Google's `0.15` was validated on the capped `--max-resolution 1536` path; a very large native Gemini image may need more.)
|
||||
>
|
||||
@@ -213,6 +213,14 @@ After installation the `remove-ai-watermarks` command is available system-wide.
|
||||
> ```bash
|
||||
> pip install -e ".[restore]" # or: uv pip install -e ".[restore]"
|
||||
> ```
|
||||
>
|
||||
> For sharper upscaling of small inputs before diffusion (`--upscaler esrgan`,
|
||||
> Real-ESRGAN), install the `esrgan` extra. It loads via spandrel (MIT, no basicsr);
|
||||
> the Real-ESRGAN weights (BSD-3-Clause) download on first use:
|
||||
>
|
||||
> ```bash
|
||||
> pip install -e ".[esrgan]" # or: uv pip install -e ".[esrgan]"
|
||||
> ```
|
||||
|
||||
#### Invisible watermark removal
|
||||
|
||||
@@ -280,7 +288,8 @@ remove-ai-watermarks erase image.png --region 1640,1930,400,100 -o clean.png
|
||||
remove-ai-watermarks invisible image.png -o clean.png --humanize 4.0 --unsharp 0.5
|
||||
# --humanize adds film grain, --unsharp counters the soft "AI" look (both opt-in).
|
||||
# Large images run at native resolution; small ones are upscaled to a 1024 floor
|
||||
# first (disable with --min-resolution 0). On a very large image that OOMs the
|
||||
# first (disable with --min-resolution 0); --upscaler esrgan uses Real-ESRGAN for
|
||||
# that floor upscale (needs the 'esrgan' extra). On a very large image that OOMs the
|
||||
# GPU/MPS, cap the long side: --max-resolution 2048
|
||||
# Strength is vendor-adaptive by default (OpenAI 0.10 / Google 0.15); override
|
||||
# with --strength. To preserve text/face structure, use --pipeline controlnet
|
||||
@@ -301,6 +310,10 @@ remove-ai-watermarks metadata image.png --remove
|
||||
|
||||
# Batch with a specific mode
|
||||
remove-ai-watermarks batch ./images/ --mode visible
|
||||
|
||||
# Batch also accepts --auto (and --adaptive-polish): the plan is recomputed per
|
||||
# image, so a mixed directory routes each file to the right pipeline
|
||||
remove-ai-watermarks batch ./images/ --mode all --auto
|
||||
```
|
||||
|
||||
### Python API
|
||||
|
||||
@@ -92,6 +92,19 @@ restore = [
|
||||
"scipy<1.18",
|
||||
"numba<0.60",
|
||||
]
|
||||
# Optional pre-diffusion super-resolution for small inputs (Real-ESRGAN). Loaded via
|
||||
# spandrel (MIT) -- a pure model-loader with NO basicsr dependency (it pulls only
|
||||
# torch / torchvision / safetensors / numpy / einops), which sidesteps the
|
||||
# basicsr / torchvision.functional_tensor breakage that the `restore` extra fights.
|
||||
# The Real-ESRGAN weights (BSD-3-Clause) download on first use and are cached; they
|
||||
# are never bundled. CPU works but is slow on large inputs -- it is meant for the
|
||||
# pre-diffusion upscale of SMALL inputs (and the GPU worker). Guarded by
|
||||
# upscaler.is_available(); the default upscaler stays Lanczos (cv2, no deps). The
|
||||
# weights are fetched with torch.hub (bundled with spandrel's torch), so no extra
|
||||
# download dependency is needed.
|
||||
esrgan = [
|
||||
"spandrel>=0.3.0",
|
||||
]
|
||||
dev = [
|
||||
"pytest>=8.0.0",
|
||||
"pytest-cov>=4.1.0",
|
||||
|
||||
Binary file not shown.
@@ -17,14 +17,15 @@ text/graphics (already high-frequency, so almost no polish) and spares text/edge
|
||||
masking the grain.
|
||||
|
||||
Detection is **cv2-only and torch-free**: OpenCV YuNet (``cv2.FaceDetectorYN``) for
|
||||
faces -- a 232 KB MIT-licensed model bundled in ``assets/`` -- plus a Canny
|
||||
edge-density + MSER region heuristic for text/structure. The whole planner peaks
|
||||
~100 MB RSS in a few ms, so it adds nothing meaningful to a GPU run and runs anywhere
|
||||
the pipeline runs.
|
||||
faces -- a 232 KB MIT-licensed model bundled in ``assets/`` -- DBNet (PP-OCRv3
|
||||
differentiable-binarization via ``cv2.dnn.TextDetectionModel_DB``, a 2.4 MB Apache-2.0
|
||||
model bundled in ``assets/``) for text, and a Canny ``edge_density``. The whole planner
|
||||
peaks ~100 MB RSS in a few ms, so it adds nothing meaningful to a GPU run and runs
|
||||
anywhere the pipeline runs.
|
||||
|
||||
The text heuristic is a deliberately rough Phase-1 placeholder (DBNet via cv2.dnn is
|
||||
the planned precision upgrade); it only ever ADDS controlnet, so a miss is backstopped
|
||||
by the edge-density route and a false positive only costs a controlnet run.
|
||||
The text detector falls back to the old MSER region heuristic if the DBNet model can't
|
||||
load. Either way text only ever ADDS controlnet, so a miss is backstopped by the
|
||||
edge-density route and a false positive only costs a controlnet run.
|
||||
"""
|
||||
|
||||
# cv2/numpy boundary: cv2 ships no usable element types; relax the unknown-type rules
|
||||
@@ -47,15 +48,29 @@ logger = logging.getLogger(__name__)
|
||||
# preserve). The headshot measures ~0.022, a busy photo higher; only a near-flat
|
||||
# gradient/solid image falls under 0.008.
|
||||
_STRUCTURELESS_EDGE_MAX = 0.008
|
||||
# MSER regions per megapixel above this -> likely text. Rough Phase-1 heuristic: a
|
||||
# no-text portrait measures a few hundred/MP, dense text far more. Set high so it
|
||||
# rarely false-fires; it only ever ADDS controlnet so miscalibration is low-harm.
|
||||
# MSER regions per megapixel above this -> likely text. The MSER path is now only the
|
||||
# FALLBACK when the bundled DBNet model can't load; DBNet (below) is the primary text
|
||||
# detector. Rough heuristic: a no-text portrait measures a few hundred/MP, dense text
|
||||
# far more. Set high so it rarely false-fires; text only ever ADDS controlnet.
|
||||
_TEXT_MSER_PER_MP = 1500.0
|
||||
_FACE_SCORE = 0.6 # YuNet confidence for a face to count
|
||||
# Downscale the long side to this for DETECTION only (faces stay detectable down to
|
||||
# ~10px, and this bounds YuNet/MSER cost on huge inputs). Removal runs at full res.
|
||||
# ~10px, and this bounds YuNet/DBNet/MSER cost on huge inputs). Removal runs at full res.
|
||||
_DETECT_MAX_SIDE = 1024
|
||||
|
||||
# DBNet (PP-OCRv3 differentiable-binarization) text-region detector via cv2.dnn -- the
|
||||
# primary "has meaningful text" signal. The model is the shared PP-OCRv3 detection net
|
||||
# from OpenCV Zoo (Apache-2.0); en/cn variants are byte-identical, so it is bundled
|
||||
# language-neutral. cv2.dnn is core OpenCV, so this adds NO new pip dependency.
|
||||
_DBNET_ASSET = "text_detection_ppocrv3_2023may.onnx" # Apache-2.0 (OpenCV Zoo PP-OCRv3 DB)
|
||||
_DBNET_BINARY_THRESHOLD = 0.3
|
||||
_DBNET_POLYGON_THRESHOLD = 0.5
|
||||
_DBNET_MAX_CANDIDATES = 200
|
||||
_DBNET_UNCLIP_RATIO = 2.0
|
||||
_DBNET_INPUT_SIDE = 736 # square input, multiple of 32 (PP-OCRv3 default)
|
||||
_DBNET_MEAN = (122.67891434, 116.66876762, 104.00698793) # ImageNet mean * 255
|
||||
_dbnet: Any = None # lazy singleton; set to False after a load failure (-> MSER fallback)
|
||||
|
||||
# When a smoothing pass ran (controlnet or face restore), the adaptive polish
|
||||
# (humanizer.adaptive_polish) restores the input's detail level, sparing text --
|
||||
# replacing the old fixed unsharp/grain which over-/under-corrected and speckled text.
|
||||
@@ -152,8 +167,41 @@ def detect_face(image: NDArray[Any]) -> bool:
|
||||
return faces is not None and len(faces) > 0
|
||||
|
||||
|
||||
def detect_text(image: NDArray[Any]) -> bool:
|
||||
"""Rough MSER-based text-presence heuristic (Phase-1 placeholder for DBNet)."""
|
||||
def _detect_text_dbnet(image: NDArray[Any]) -> bool | None:
    """DBNet (PP-OCRv3) text-region presence via cv2.dnn.

    Returns True/False on a successful run, or None if the bundled model can't load
    (the caller then falls back to the MSER heuristic). Loads once, lazily.
    """
    import cv2

    global _dbnet
    if _dbnet is False:  # a prior load failed; skip straight to the MSER fallback
        return None
    img = _to_bgr(image)
    h, w = img.shape[:2]
    if h < 1 or w < 1:
        return False
    try:
        if _dbnet is None:
            model = Path(__file__).parent / "assets" / _DBNET_ASSET
            net = cv2.dnn.TextDetectionModel_DB(str(model))
            net.setBinaryThreshold(_DBNET_BINARY_THRESHOLD)
            net.setPolygonThreshold(_DBNET_POLYGON_THRESHOLD)
            net.setMaxCandidates(_DBNET_MAX_CANDIDATES)
            net.setUnclipRatio(_DBNET_UNCLIP_RATIO)
            net.setInputParams(1.0 / 255.0, (_DBNET_INPUT_SIDE, _DBNET_INPUT_SIDE), _DBNET_MEAN)
            _dbnet = net
        boxes, _ = _dbnet.detect(img)
    except Exception as e:  # model load / inference can raise cv2.error or others
        logger.debug("DBNet text detect failed (%s); falling back to MSER", e)
        _dbnet = False
        return None
    return boxes is not None and len(boxes) > 0


def _detect_text_mser(image: NDArray[Any]) -> bool:
    """Fallback MSER-based text-presence heuristic (used only if DBNet can't load)."""
    import cv2

    gray = _to_gray(image)
|
||||
@@ -166,6 +214,12 @@ def detect_text(image: NDArray[Any]) -> bool:
|
||||
return per_mp > _TEXT_MSER_PER_MP
|
||||
|
||||
|
||||
def detect_text(image: NDArray[Any]) -> bool:
    """Text-presence: DBNet (cv2.dnn) when the bundled model loads, else the MSER heuristic."""
    dbnet = _detect_text_dbnet(image)
    return _detect_text_mser(image) if dbnet is None else dbnet
|
||||
|
||||
|
||||
def edge_density(image: NDArray[Any]) -> float:
|
||||
"""Fraction of Canny edge pixels -- a cheap 'has structure' proxy in [0, 1]."""
|
||||
import cv2
|
||||
@@ -190,9 +244,9 @@ def plan(image_path: Path) -> AutoConfig | None:
|
||||
|
||||
h, w = image.shape[:2]
|
||||
small = _downscale_for_detection(image)
|
||||
gray = _to_gray(small) # convert once; the text/edge detectors pass a gray input through
|
||||
gray = _to_gray(small) # convert once; edge density + the MSER fallback use gray
|
||||
has_face = detect_face(small) # YuNet needs the 3-channel image
|
||||
has_text = detect_text(gray)
|
||||
has_text = detect_text(small) # DBNet wants BGR; the MSER fallback grays it internally
|
||||
edges = edge_density(gray)
|
||||
|
||||
structureless = (not has_face) and (not has_text) and edges < _STRUCTURELESS_EDGE_MAX
|
||||
|
||||
@@ -159,6 +159,16 @@ _unsharp_option = click.option(
|
||||
"--unsharp", type=float, default=0.0, help="Unsharp-mask sharpening strength (0 = off, typical: 0.3-0.8)."
|
||||
)
|
||||
|
||||
_upscaler_option = click.option(
    "--upscaler",
    type=click.Choice(["lanczos", "esrgan"]),
    default="lanczos",
    help="How to upscale a small input to the --min-resolution floor: lanczos (default, cv2, no deps) or "
    "esrgan (Real-ESRGAN via the 'esrgan' extra; better detail, slower on CPU). Best for photo/texture "
    "content -- as a generic GAN with no face/glyph prior it can degrade faces (diffusion mitigates) and "
    "thin text, so lanczos stays the default. Falls back to lanczos if the extra is absent. Only when upscaling.",
)
|
||||
|
||||
_auto_option = click.option(
|
||||
"--auto",
|
||||
is_flag=True,
|
||||
@@ -210,6 +220,21 @@ def _apply_auto(
|
||||
return pipeline, restore_faces, adaptive_polish
|
||||
|
||||
|
||||
def _warn_if_esrgan_unavailable(upscaler: str) -> None:
    """Tell the user once if ``--upscaler esrgan`` will silently fall back to Lanczos.

    The engine downgrades to Lanczos when the ``esrgan`` extra is absent (fail-safe, so
    a batch never breaks mid-run) -- but without this notice the user would believe
    Real-ESRGAN ran. Surfaced at the CLI layer, once per invocation (not per image).
    """
    if upscaler != "esrgan":
        return
    from remove_ai_watermarks import upscaler as _upscaler

    if not _upscaler.is_available():
        console.print(" Note: --upscaler esrgan needs the 'esrgan' extra; falling back to Lanczos.")
|
||||
|
||||
|
||||
def _restore_faces_options(f: Any) -> Any:
|
||||
"""Attach the shared GFPGAN face-restoration flags to an invisible-pipeline command."""
|
||||
restore_flag = click.option(
|
||||
@@ -557,6 +582,7 @@ def cmd_erase(
|
||||
@_restore_faces_options
|
||||
@_min_resolution_option
|
||||
@_unsharp_option
|
||||
@_upscaler_option
|
||||
@_auto_option
|
||||
@_adaptive_polish_option
|
||||
@click.pass_context
|
||||
@@ -577,6 +603,7 @@ def cmd_invisible(
|
||||
controlnet_scale: float,
|
||||
restore_faces: bool,
|
||||
restore_faces_weight: float,
|
||||
upscaler: str,
|
||||
auto: bool,
|
||||
adaptive_polish: bool,
|
||||
) -> None:
|
||||
@@ -596,6 +623,7 @@ def cmd_invisible(
|
||||
from remove_ai_watermarks.invisible_engine import InvisibleEngine
|
||||
|
||||
source = _validate_image(source)
|
||||
_warn_if_esrgan_unavailable(upscaler)
|
||||
if auto:
|
||||
pipeline, restore_faces, adaptive_polish = _apply_auto(ctx, source, pipeline, restore_faces, adaptive_polish)
|
||||
if output is None:
|
||||
@@ -634,6 +662,7 @@ def cmd_invisible(
|
||||
adaptive_polish=adaptive_polish,
|
||||
max_resolution=max_resolution,
|
||||
min_resolution=min_resolution,
|
||||
upscaler=upscaler,
|
||||
vendor=vendor,
|
||||
restore_faces=restore_faces,
|
||||
restore_faces_weight=restore_faces_weight,
|
||||
@@ -815,6 +844,7 @@ def cmd_identify(ctx: click.Context, source: Path, no_visible: bool, as_json: bo
|
||||
@_restore_faces_options
|
||||
@_min_resolution_option
|
||||
@_unsharp_option
|
||||
@_upscaler_option
|
||||
@_auto_option
|
||||
@_adaptive_polish_option
|
||||
@click.pass_context
|
||||
@@ -838,6 +868,7 @@ def cmd_all(
|
||||
controlnet_scale: float,
|
||||
restore_faces: bool,
|
||||
restore_faces_weight: float,
|
||||
upscaler: str,
|
||||
auto: bool,
|
||||
adaptive_polish: bool,
|
||||
) -> None:
|
||||
@@ -854,6 +885,7 @@ def cmd_all(
|
||||
|
||||
_banner()
|
||||
source = _validate_image(source)
|
||||
_warn_if_esrgan_unavailable(upscaler)
|
||||
if auto:
|
||||
pipeline, restore_faces, adaptive_polish = _apply_auto(ctx, source, pipeline, restore_faces, adaptive_polish)
|
||||
|
||||
@@ -941,6 +973,7 @@ def cmd_all(
|
||||
adaptive_polish=adaptive_polish,
|
||||
max_resolution=max_resolution,
|
||||
min_resolution=min_resolution,
|
||||
upscaler=upscaler,
|
||||
vendor=vendor,
|
||||
restore_faces=restore_faces,
|
||||
restore_faces_weight=restore_faces_weight,
|
||||
@@ -1001,6 +1034,9 @@ def _process_batch_image(
|
||||
restore_faces: bool = False,
|
||||
restore_faces_weight: float = 0.5,
|
||||
controlnet_scale: float = 1.0,
|
||||
upscaler: str = "lanczos",
|
||||
auto: bool = False,
|
||||
adaptive_polish: bool = False,
|
||||
) -> None:
|
||||
"""Process a single image for batch mode.
|
||||
|
||||
@@ -1046,14 +1082,22 @@ def _process_batch_image(
|
||||
if invisible_available():
|
||||
from remove_ai_watermarks.invisible_engine import InvisibleEngine
|
||||
|
||||
if "_inv_engine" not in ctx.obj:
|
||||
ctx.obj["_inv_engine"] = InvisibleEngine(
|
||||
# --auto re-plans the pipeline / face-restore / polish per image; only the
|
||||
# pipeline choice changes the engine ctor, so cache one engine per pipeline
|
||||
# (controlnet vs default) rather than a single shared instance.
|
||||
if auto:
|
||||
pipeline, restore_faces, adaptive_polish = _apply_auto(
|
||||
ctx, img_path, pipeline, restore_faces, adaptive_polish
|
||||
)
|
||||
engines = ctx.obj.setdefault("_inv_engines", {})
|
||||
if pipeline not in engines:
|
||||
engines[pipeline] = InvisibleEngine(
|
||||
device=None if device == "auto" else device,
|
||||
pipeline=pipeline,
|
||||
hf_token=hf_token,
|
||||
controlnet_conditioning_scale=controlnet_scale,
|
||||
)
|
||||
engine_inv = ctx.obj["_inv_engine"]
|
||||
engine_inv = engines[pipeline]
|
||||
engine_inv.remove_watermark(
|
||||
img_path if mode == "invisible" else out_path,
|
||||
out_path,
|
||||
@@ -1062,8 +1106,10 @@ def _process_batch_image(
|
||||
seed=seed,
|
||||
humanize=humanize,
|
||||
unsharp=unsharp,
|
||||
adaptive_polish=adaptive_polish,
|
||||
max_resolution=max_resolution,
|
||||
min_resolution=min_resolution,
|
||||
upscaler=upscaler,
|
||||
restore_faces=restore_faces,
|
||||
restore_faces_weight=restore_faces_weight,
|
||||
# Detect the vendor from the pristine original (`img_path`), not the
|
||||
@@ -1126,7 +1172,10 @@ def _process_batch_image(
|
||||
@_restore_faces_options
|
||||
@_min_resolution_option
|
||||
@_unsharp_option
|
||||
@_upscaler_option
|
||||
@_controlnet_scale_option
|
||||
@_auto_option
|
||||
@_adaptive_polish_option
|
||||
@click.pass_context
|
||||
def cmd_batch(
|
||||
ctx: click.Context,
|
||||
@@ -1147,6 +1196,9 @@ def cmd_batch(
|
||||
restore_faces: bool,
|
||||
restore_faces_weight: float,
|
||||
controlnet_scale: float,
|
||||
upscaler: str,
|
||||
auto: bool,
|
||||
adaptive_polish: bool,
|
||||
) -> None:
|
||||
"""Process all images in a directory."""
|
||||
_banner()
|
||||
@@ -1164,6 +1216,8 @@ def cmd_batch(
|
||||
console.print(f" Found {len(images)} images in {directory}")
|
||||
console.print(f" Output -> {output_dir}")
|
||||
console.print(f" Mode: {mode}")
|
||||
if mode in ("invisible", "all"):
|
||||
_warn_if_esrgan_unavailable(upscaler)
|
||||
|
||||
processed = 0
|
||||
errors = 0
|
||||
@@ -1202,6 +1256,9 @@ def cmd_batch(
|
||||
restore_faces=restore_faces,
|
||||
restore_faces_weight=restore_faces_weight,
|
||||
controlnet_scale=controlnet_scale,
|
||||
upscaler=upscaler,
|
||||
auto=auto,
|
||||
adaptive_polish=adaptive_polish,
|
||||
)
|
||||
processed += 1
|
||||
|
||||
|
||||
@@ -126,6 +126,32 @@ class InvisibleEngine:
|
||||
"""Eagerly load the pipeline so download progress is visible."""
|
||||
self._remover.preload()
|
||||
|
||||
    def _esrgan_upscale(self, image: Any, target: tuple[int, int]) -> Any:
        """Upscale a PIL image to ``target`` with Real-ESRGAN, else Lanczos.

        Runs Real-ESRGAN at its native factor (on the remover's device, CPU fallback),
        then resizes to the exact ``target`` with Lanczos. Falls back to a plain Lanczos
        resize when the ``esrgan`` extra is absent or the model errors.
        """
        import cv2
        import numpy as np
        from PIL import Image

        from remove_ai_watermarks import upscaler

        if not upscaler.is_available():
            logger.debug("esrgan upscaler requested but the extra is absent; using Lanczos")
            return image.resize(target, Image.Resampling.LANCZOS)
        try:
            bgr = cv2.cvtColor(np.array(image.convert("RGB")), cv2.COLOR_RGB2BGR)
            big = upscaler.upscale(bgr, device=self._remover.device)
            if (big.shape[1], big.shape[0]) != target:
                big = cv2.resize(big, target, interpolation=cv2.INTER_LANCZOS4)
            return Image.fromarray(cv2.cvtColor(big, cv2.COLOR_BGR2RGB))
        except Exception as e:  # never let an optional upscaler break removal
            logger.warning("Real-ESRGAN upscale failed (%s); using Lanczos", e)
            return image.resize(target, Image.Resampling.LANCZOS)
|
||||
|
||||
def remove_watermark(
|
||||
self,
|
||||
image_path: Path,
|
||||
@@ -142,6 +168,7 @@ class InvisibleEngine:
|
||||
restore_faces_weight: float = 0.5,
|
||||
unsharp: float = 0.0,
|
||||
adaptive_polish: bool = False,
|
||||
upscaler: str = "lanczos",
|
||||
) -> Path:
|
||||
"""Remove invisible watermark from an image.
|
||||
|
||||
@@ -180,6 +207,11 @@ class InvisibleEngine:
|
||||
(default) = on; 0 = off. The output is restored to the original
|
||||
input size, so this is a transparent quality boost; it adds time
|
||||
and memory on small inputs. Ignored on a min > max misconfig.
|
||||
upscaler: How to upscale a small input to the ``min_resolution`` floor:
|
||||
``"lanczos"`` (default, cv2, no deps) or ``"esrgan"`` (Real-ESRGAN
|
||||
via the ``esrgan`` extra). Only applies when UPscaling (the floor
|
||||
case); a ``max_resolution`` downscale always uses Lanczos. Falls back
|
||||
to Lanczos if the extra is absent.
|
||||
|
||||
Returns:
|
||||
Path to the cleaned image.
|
||||
@@ -202,8 +234,8 @@ class InvisibleEngine:
|
||||
|
||||
target = _target_size(image.width, image.height, max_resolution, min_resolution)
|
||||
if target is not None:
|
||||
upscaling = max(target) > max(image.width, image.height)
|
||||
if self._progress_callback:
|
||||
upscaling = max(target) > max(image.width, image.height)
|
||||
reason = (
|
||||
f"min-resolution floor {min_resolution}px"
|
||||
if upscaling
|
||||
@@ -211,7 +243,12 @@ class InvisibleEngine:
|
||||
)
|
||||
verb = "Upscaling" if upscaling else "Downscaling"
|
||||
self._progress_callback(f"{verb} {image.width}x{image.height} to {target[0]}x{target[1]} ({reason})...")
|
||||
image = image.resize(target, Image.Resampling.LANCZOS)
|
||||
# Real-ESRGAN only helps when UPscaling (the floor case); a downscale cap
|
||||
# always uses Lanczos. _esrgan_upscale falls back to Lanczos if the extra is absent.
|
||||
if upscaling and upscaler == "esrgan":
|
||||
image = self._esrgan_upscale(image, target)
|
||||
else:
|
||||
image = image.resize(target, Image.Resampling.LANCZOS)
|
||||
|
||||
# Always persist to a temp file, even without downscaling: WatermarkRemover
|
||||
# reloads by path, so the EXIF-transposed pixels must be saved or rotation
|
||||
|
||||
@@ -0,0 +1,125 @@
|
||||
"""Optional pre-diffusion super-resolution for small inputs (Real-ESRGAN via spandrel).

Mirrors ``region_eraser``'s optional-backend pattern: ``is_available()`` guards the
``spandrel`` import, a lazy singleton (double-checked lock) holds the loaded model, and
the weights download on first use (cached by ``torch.hub``) -- they are never bundled.

The DEFAULT upscaler stays Lanczos (cv2, no deps); this is opt-in via the ``esrgan``
extra and feeds the ``--upscaler esrgan`` path. ``spandrel`` is a pure model-loader
(MIT) with NO basicsr dependency -- it pulls only torch/torchvision/safetensors/numpy/
einops -- so it sidesteps the basicsr / ``torchvision.transforms.functional_tensor``
breakage that the ``restore`` (GFPGAN) extra has to shim. Real-ESRGAN weights are
BSD-3-Clause.

CPU works but is slow on large inputs, so this is meant for the pre-diffusion upscale of
SMALL inputs (and the GPU worker). On a memory-constrained host it is a no-op (the extra
is absent), and the caller falls back to Lanczos.
"""

# torch/spandrel boundary: these libs ship no usable element types; relax the
# unknown-type rules for this file only.
# pyright: reportUnknownMemberType=false, reportUnknownArgumentType=false, reportUnknownVariableType=false, reportUnknownParameterType=false, reportMissingTypeArgument=false, reportMissingTypeStubs=false, reportMissingImports=false, reportArgumentType=false, reportAssignmentType=false, reportReturnType=false, reportCallIssue=false, reportIndexIssue=false, reportOperatorIssue=false, reportAttributeAccessIssue=false, reportPrivateImportUsage=false
from __future__ import annotations

import importlib.util
import logging
import threading
from pathlib import Path
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from numpy.typing import NDArray

logger = logging.getLogger(__name__)

# Real-ESRGAN x2plus (BSD-3-Clause), official release. x2 is the right native factor for
# the pre-diffusion floor upscale (small inputs ~512 -> ~1024); spandrel infers the
# architecture and scale from the checkpoint, so swapping the URL is enough to change it.
_MODEL_URL = "https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.1/RealESRGAN_x2plus.pth"
_MODEL_FILENAME = "RealESRGAN_x2plus.pth"

_model: Any = None  # lazy singleton (spandrel ImageModelDescriptor)
_model_device: str = "cpu"
_lock = threading.Lock()


def is_available() -> bool:
    """True if the ``esrgan`` extra (spandrel + torch) is importable."""
    return importlib.util.find_spec("spandrel") is not None and importlib.util.find_spec("torch") is not None


def _model_cache_path() -> Path:
    """Path the weights are cached at (the torch.hub checkpoints dir)."""
    import torch

    cache_dir = Path(torch.hub.get_dir()) / "checkpoints"
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir / _MODEL_FILENAME


def _get_model(device: str) -> Any:
    """Load the Real-ESRGAN model once (downloading the weights on first use)."""
    global _model, _model_device
    if _model is not None and _model_device == device:
        return _model
    with _lock:
        if _model is None:
            import torch
            from spandrel import ImageModelDescriptor, ModelLoader

            dst = _model_cache_path()
            if not dst.exists():
                logger.info("Downloading Real-ESRGAN weights to %s", dst)
                torch.hub.download_url_to_file(_MODEL_URL, str(dst), progress=False)
            model = ModelLoader().load_from_file(str(dst))
            if not isinstance(model, ImageModelDescriptor):
                raise RuntimeError(f"Unexpected spandrel model type: {type(model).__name__}")
            _model = model.eval()
        if _model_device != device:
            _model.to(device)
            _model_device = device
        return _model


def scale() -> int:
    """The model's native upscale factor (e.g. 2 for x2plus). Loads the model if needed."""
    return int(_get_model("cpu").scale)


def upscale(image: NDArray[Any], device: str | None = None) -> NDArray[Any]:
    """Upscale a BGR uint8 image by the model's native factor with Real-ESRGAN.

    Returns a BGR uint8 array. Falls back to CPU if the requested device errors (an
    MPS/CUDA OOM or unsupported-op on the small pre-diffusion input), mirroring the
    diffusion engine's MPS->CPU fallback.

    Raises:
        RuntimeError: if the ``esrgan`` extra is not installed (guard with
            ``is_available()`` first).
    """
    if not is_available():
        raise RuntimeError("Real-ESRGAN upscaler needs the 'esrgan' extra (spandrel). Install it or use Lanczos.")
    import cv2
    import numpy as np
    import torch

    target_device = (device or "cpu").lower()
    if target_device not in {"cpu", "mps", "cuda", "xpu"}:
        target_device = "cpu"
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float().div(255.0).unsqueeze(0)

    def _run(dev: str) -> NDArray[Any]:
        model = _get_model(dev)
        with torch.no_grad():
            out = model(tensor.to(dev))
        arr = out.clamp(0.0, 1.0).squeeze(0).permute(1, 2, 0).cpu().numpy() * 255.0
        return cv2.cvtColor(arr.round().astype(np.uint8), cv2.COLOR_RGB2BGR)

    try:
        return _run(target_device)
    except Exception as e:  # GPU OOM / unsupported op: fall back to CPU
        if target_device == "cpu":
            raise
        logger.warning("Real-ESRGAN on %s failed (%s); retrying on CPU", target_device, e)
        return _run("cpu")
|
||||
@@ -34,6 +34,26 @@ class TestDetectors:
|
||||
cv2.putText(text, "HELLO AI TEXT", (10, 120), cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 0, 0), 3)
|
||||
assert auto_config.edge_density(text) > auto_config.edge_density(blank)
|
||||
|
||||
    def test_dbnet_detects_text_card(self):
        """The bundled PP-OCRv3 DBNet model fires on a clear text card and not on flat."""
        card = np.full((300, 500, 3), 255, dtype=np.uint8)
        cv2.putText(card, "INVOICE TOTAL 1234", (10, 170), cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 0, 0), 4)
        assert auto_config._detect_text_dbnet(card) is True
        assert auto_config._detect_text_dbnet(np.full((300, 500, 3), 128, dtype=np.uint8)) is False

    def test_detect_text_falls_back_to_mser_when_dbnet_unavailable(self, monkeypatch):
        """If DBNet can't load (returns None), detect_text uses the MSER heuristic."""
        monkeypatch.setattr(auto_config, "_detect_text_dbnet", lambda _img: None)
        called = {}

        def _fake_mser(_img):
            called["mser"] = True
            return True

        monkeypatch.setattr(auto_config, "_detect_text_mser", _fake_mser)
        assert auto_config.detect_text(np.full((100, 100, 3), 128, dtype=np.uint8)) is True
        assert called.get("mser") is True
|
||||
|
||||
|
||||
class TestPlan:
|
||||
def test_unreadable_returns_none(self, tmp_path):
|
||||
|
||||
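
The fallback test in the `TestDetectors` hunk above relies on a three-valued contract: the DBNet detector returns `True` or `False` when the model ran, and `None` when it could not load, in which case the MSER heuristic decides. The sketch below shows one way such a dispatcher could look. The detectors are passed in as parameters, which is an assumption made to keep the example self-contained; the real `detect_text` is not shown in this diff and may differ.

```python
from typing import Any, Callable, Optional


def detect_text(
    image: Any,
    primary: Callable[[Any], Optional[bool]],
    fallback: Callable[[Any], bool],
) -> bool:
    """Use the primary detector's verdict; defer to the fallback only on None."""
    verdict = primary(image)
    if verdict is None:  # primary detector unavailable (e.g. model failed to load)
        return fallback(image)
    return verdict


# Hypothetical stand-ins for the two detectors.
calls = []


def unavailable_dbnet(_img: Any) -> Optional[bool]:
    return None


def fake_mser(_img: Any) -> bool:
    calls.append("mser")
    return True


used_fallback = detect_text(object(), unavailable_dbnet, fake_mser)
```

Testing `is None` rather than truthiness matters here: a legitimate `False` from the primary detector must not trigger the fallback.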
@@ -514,6 +514,45 @@ class TestBatchCommand:
         assert out[0, 0, 3] == 0
         assert out[100, 100, 3] == 255

+    def test_batch_auto_plans_pipeline_per_image(self, runner, tmp_path):
+        """--auto in batch re-plans the pipeline/restore/polish per image and
+        builds one engine per resolved pipeline."""
+        from remove_ai_watermarks import auto_config
+
+        input_dir = _make_batch_dir(tmp_path, count=2)
+        output_dir = tmp_path / "output"
+        plan = auto_config.AutoConfig(
+            pipeline="controlnet",
+            restore_faces=True,
+            adaptive_polish=True,
+            unsharp=0.0,
+            humanize=0.0,
+            min_resolution=1024,
+            has_face=True,
+            has_text=False,
+            edge_density=0.05,
+            width=200,
+            height=200,
+        )
+        mock_cls, mock_engine = _mock_invisible_engine()
+        with (
+            patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
+            patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
+            patch("remove_ai_watermarks.cli.invisible_available", return_value=True, create=True),
+            patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
+            patch("remove_ai_watermarks.auto_config.plan", return_value=plan),
+        ):
+            result = runner.invoke(
+                main,
+                ["batch", str(input_dir), "-o", str(output_dir), "--mode", "invisible", "--auto"],
+            )
+        assert result.exit_code == 0, result.output
+        assert "2 processed" in result.output
+        # Engine built with the auto-resolved controlnet pipeline.
+        assert mock_cls.call_args.kwargs["pipeline"] == "controlnet"
+        # The auto plan's adaptive polish reached the engine call.
+        assert mock_engine.remove_watermark.call_args.kwargs["adaptive_polish"] is True
+
     def test_batch_default_output_dir(self, runner, tmp_path):
         input_dir = _make_batch_dir(tmp_path)
         result = runner.invoke(
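
The commit message states that in `batch --auto` the plan is recomputed per image while the invisible engine is cached per resolved pipeline, so a mixed directory builds at most one engine of each kind. The sketch below illustrates that caching idea only. `EngineCache` and `build_engine` are hypothetical names; the actual wiring inside `cmd_batch` is not shown in this diff.

```python
from typing import Callable, Dict, Generic, List, TypeVar

E = TypeVar("E")


class EngineCache(Generic[E]):
    """Build one engine per pipeline name, lazily, and reuse it afterwards."""

    def __init__(self, factory: Callable[[str], E]) -> None:
        self._factory = factory
        self._engines: Dict[str, E] = {}

    def get(self, pipeline: str) -> E:
        if pipeline not in self._engines:
            self._engines[pipeline] = self._factory(pipeline)
        return self._engines[pipeline]


built: List[str] = []


def build_engine(pipeline: str) -> str:
    # Hypothetical stand-in for an expensive model load.
    built.append(pipeline)
    return f"engine:{pipeline}"


cache = EngineCache(build_engine)
# Three images whose per-image plans resolve to two distinct pipelines.
engines = [cache.get(p) for p in ["controlnet", "default", "controlnet"]]
```

With three images and two distinct pipelines, the factory runs twice, which matches the "3 mixed images, 2 pipeline loads" figure reported in the commit message.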
@@ -101,3 +101,70 @@ class TestTargetSize:
         # min(1024) > max(800) is a misconfig: the floor must not upscale above the
         # cap, so it is skipped and the (within-cap) input stays native.
         assert _target_size(500, 400, 800, 1024) is None
+
+
+class TestEsrganUpscale:
+    """Branches of InvisibleEngine._esrgan_upscale (no diffusion model loaded).
+
+    A SimpleNamespace stands in for the engine so we exercise the helper without
+    constructing a real InvisibleEngine (which would load WatermarkRemover).
+    """
+
+    @staticmethod
+    def _fake_engine():
+        from types import SimpleNamespace
+
+        return SimpleNamespace(_remover=SimpleNamespace(device="cpu"))
+
+    @staticmethod
+    def _pil(w=120, h=80):
+        import numpy as np
+        from PIL import Image
+
+        return Image.fromarray(np.full((h, w, 3), 128, dtype=np.uint8))
+
+    def test_falls_back_to_lanczos_when_extra_absent(self, monkeypatch):
+        import numpy as np
+        from PIL import Image
+
+        from remove_ai_watermarks import upscaler
+
+        monkeypatch.setattr(upscaler, "is_available", lambda: False)
+        img = self._pil()
+        out = InvisibleEngine._esrgan_upscale(self._fake_engine(), img, (1024, 683))
+        assert out.size == (1024, 683)
+        # Identical to a plain Lanczos resize (the fallback path).
+        assert np.array_equal(np.asarray(out), np.asarray(img.resize((1024, 683), Image.Resampling.LANCZOS)))
+
+    def test_resizes_esrgan_output_to_exact_target(self, monkeypatch):
+        import cv2
+
+        from remove_ai_watermarks import upscaler
+
+        monkeypatch.setattr(upscaler, "is_available", lambda: True)
+
+        # Fake a 2x upscale that does NOT match the requested target; the helper must
+        # resize it to the exact target.
+        def _fake_upscale(bgr, device=None):
+            return cv2.resize(bgr, (bgr.shape[1] * 2, bgr.shape[0] * 2), interpolation=cv2.INTER_NEAREST)
+
+        monkeypatch.setattr(upscaler, "upscale", _fake_upscale)
+        out = InvisibleEngine._esrgan_upscale(self._fake_engine(), self._pil(), (1024, 683))
+        assert out.size == (1024, 683)
+
+    def test_falls_back_to_lanczos_when_upscale_raises(self, monkeypatch):
+        import numpy as np
+        from PIL import Image
+
+        from remove_ai_watermarks import upscaler
+
+        monkeypatch.setattr(upscaler, "is_available", lambda: True)
+
+        def _boom(bgr, device=None):
+            raise RuntimeError("model exploded")
+
+        monkeypatch.setattr(upscaler, "upscale", _boom)
+        img = self._pil()
+        out = InvisibleEngine._esrgan_upscale(self._fake_engine(), img, (512, 341))
+        assert out.size == (512, 341)
+        assert np.array_equal(np.asarray(out), np.asarray(img.resize((512, 341), Image.Resampling.LANCZOS)))
@@ -0,0 +1,32 @@
+"""Tests for the optional Real-ESRGAN upscaler (no model download).
+
+The model-running path is exercised manually (it downloads ~67 MB of BSD-3-Clause
+weights on first use); these tests cover the availability guard and the no-model
+control flow, mirroring the repo convention for ML-adjacent modules.
+"""
+
+from __future__ import annotations
+
+import numpy as np
+import pytest
+
+from remove_ai_watermarks import upscaler
+
+
+class TestIsAvailable:
+    def test_returns_bool(self):
+        assert isinstance(upscaler.is_available(), bool)
+
+
+class TestUpscaleGuard:
+    def test_raises_without_extra(self, monkeypatch):
+        monkeypatch.setattr(upscaler, "is_available", lambda: False)
+        with pytest.raises(RuntimeError, match="esrgan"):
+            upscaler.upscale(np.full((32, 32, 3), 128, dtype=np.uint8))
+
+
+class TestModelCachePath:
+    def test_cache_path_uses_model_filename(self):
+        if not upscaler.is_available():
+            pytest.skip("esrgan extra (torch) not installed")
+        assert upscaler._model_cache_path().name == upscaler._MODEL_FILENAME
@@ -3075,6 +3075,9 @@ dev = [
     { name = "pytest-cov" },
     { name = "ruff" },
 ]
+esrgan = [
+    { name = "spandrel" },
+]
 gpu = [
     { name = "accelerate" },
     { name = "diffusers" },
@@ -3125,12 +3128,13 @@ requires-dist = [
     { name = "ruff", marker = "extra == 'dev'", specifier = ">=0.4.0" },
     { name = "safetensors", marker = "extra == 'gpu'" },
     { name = "scipy", marker = "extra == 'restore'", specifier = "<1.18" },
+    { name = "spandrel", marker = "extra == 'esrgan'", specifier = ">=0.3.0" },
     { name = "tokenizers", marker = "extra == 'gpu'", specifier = ">=0.22,<0.23" },
     { name = "torch", marker = "extra == 'gpu'", specifier = ">=2.0.0" },
     { name = "transformers", marker = "extra == 'gpu'", specifier = ">=5,<6" },
     { name = "trustmark", marker = "extra == 'trustmark'", specifier = ">=0.8.0" },
 ]
-provides-extras = ["gpu", "detect", "trustmark", "lama", "restore", "dev", "all"]
+provides-extras = ["gpu", "detect", "trustmark", "lama", "restore", "esrgan", "dev", "all"]

 [[package]]
 name = "requests"
@@ -3494,6 +3498,23 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/b7/ce/149a00dd41f10bc29e5921b496af8b574d8413afcd5e30dfa0ed46c2cc5e/six-1.17.0-py2.py3-none-any.whl", hash = "sha256:4721f391ed90541fddacab5acf947aa0d3dc7d27b2e1e8eda2be8970586c3274", size = 11050, upload-time = "2024-12-04T17:35:26.475Z" },
 ]

+[[package]]
+name = "spandrel"
+version = "0.4.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "einops" },
+    { name = "numpy" },
+    { name = "safetensors" },
+    { name = "torch" },
+    { name = "torchvision" },
+    { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/2a/8f/ab4565c23dd67a036ab72101a830cebd7ca026b2fddf5771bbf6284f6228/spandrel-0.4.2.tar.gz", hash = "sha256:fefa4ea966c6a5b7721dcf24f3e2062a5a96a395c8bedcb570fb55971fdcbccb", size = 247544, upload-time = "2026-02-21T01:52:26.342Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/74/31/411ea965835534c43d4b98d451968354876e0e867ea1fd42669e4cca0732/spandrel-0.4.2-py3-none-any.whl", hash = "sha256:6c93e3ecbeb0e548fd2df45a605472b34c1614287c56b51bb33cdef7ae5235b5", size = 320811, upload-time = "2026-02-21T01:52:25.015Z" },
+]
+
 [[package]]
 name = "sympy"
 version = "1.14.0"