feat(auto): DBNet text detector, Real-ESRGAN upscaler, batch --auto

Three content-quality features for the invisible/all/batch pipeline.

DBNet text detector (auto_config): replace the MSER text heuristic with
PP-OCRv3 differentiable-binarization via cv2.dnn.TextDetectionModel_DB,
using a bundled 2.4 MB Apache-2.0 model (en/cn detection nets are
byte-identical, so it ships language-neutral). cv2.dnn is core OpenCV, so
no new pip dep. MSER stays as the fallback when the model can't load.
Validated on real images: matches MSER everywhere and additionally catches
the Doubao CJK mark MSER missed; routing decisions unchanged otherwise.
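
The primary-with-fallback behaviour described above can be sketched as
follows. This is a minimal illustration, not the code in auto_config:
`FallbackDetector`, `load_primary` and `fallback` are hypothetical names,
and plain callables stand in for the DBNet model and the MSER heuristic.

```python
from typing import Any, Callable, Optional

_UNSET = object()   # primary detector not loaded yet
_FAILED = object()  # a load or inference error was seen; never retry


class FallbackDetector:
    """Lazy primary detector that remembers a failure and defers to a fallback."""

    def __init__(
        self,
        load_primary: Callable[[], Callable[[Any], bool]],  # hypothetical loader
        fallback: Callable[[Any], bool],                    # hypothetical heuristic
    ) -> None:
        self._load_primary = load_primary
        self._fallback = fallback
        self._primary: Any = _UNSET
        self.load_attempts = 0

    def _primary_result(self, image: Any) -> Optional[bool]:
        """True/False from the primary detector, or None if it is unusable."""
        if self._primary is _FAILED:
            return None
        try:
            if self._primary is _UNSET:
                self.load_attempts += 1
                self._primary = self._load_primary()
            return bool(self._primary(image))
        except Exception:
            self._primary = _FAILED  # remember the failure for later calls
            return None

    def detect(self, image: Any) -> bool:
        result = self._primary_result(image)
        return self._fallback(image) if result is None else result
```

The sentinel keeps a broken model from being reloaded on every image, so
a batch pays the load cost (or the load failure) once.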

Real-ESRGAN upscaler (new upscaler.py, esrgan extra): optional
pre-diffusion super-resolution for the min-resolution floor upscale, loaded
via spandrel (MIT, no basicsr) with BSD-3-Clause weights downloaded on
first use. New --upscaler {lanczos,esrgan} on invisible/all/batch; default
stays lanczos and the engine falls back to lanczos when the extra is absent
or the model errors (it never breaks removal). It is a manual opt-in knob
(the auto plan never selects it): as a generic GAN it sharpens photo/texture
content strongly but can degrade thin text and faces (the diffusion pass
regenerates faces, so the final output recovers). The README and the
--upscaler help text document this trade-off.
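
The selection and fail-safe rules above can be sketched as follows. This
is a minimal sketch under stated assumptions: `pick_resampler` and
`resize_safely` are hypothetical helpers, and the resize callables are
placeholders rather than the engine's real Real-ESRGAN or Lanczos paths.

```python
from typing import Any, Callable, Tuple

Size = Tuple[int, int]  # (width, height)


def pick_resampler(src: Size, target: Size, upscaler: str, esrgan_available: bool) -> str:
    """Return which resampler a single resize would use.

    Real-ESRGAN is only considered when the long side grows (the
    min-resolution floor case); a downscale cap always uses Lanczos.
    """
    upscaling = max(target) > max(src)
    if upscaling and upscaler == "esrgan" and esrgan_available:
        return "esrgan"
    return "lanczos"


def resize_safely(
    image: Any,
    target: Size,
    primary: Callable[[Any, Size], Any],   # hypothetical optional backend
    fallback: Callable[[Any, Size], Any],  # hypothetical always-available resize
) -> Any:
    """Try the optional backend; any error downgrades to the fallback."""
    try:
        return primary(image, target)
    except Exception:
        return fallback(image, target)
```

Catching a broad `Exception` is deliberate in this sketch: the optional
backend is an enhancement, so its failure should cost quality, not the run.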

batch --auto: wire the content-adaptive --auto (+ --adaptive-polish) into
cmd_batch. The plan is recomputed per image and the invisible engine is
cached per resolved pipeline (default/controlnet), so a mixed directory
builds at most one engine of each kind. Verified end-to-end: 3 mixed
images routed correctly with only 2 pipeline loads (controlnet reused).
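
The per-pipeline engine cache can be sketched as follows. This is a
minimal illustration, assuming a hypothetical `EngineCache` wrapper and a
stand-in factory; the real code keeps a plain dict in the click context.

```python
from typing import Any, Callable, Dict


class EngineCache:
    """Build at most one engine per resolved pipeline name."""

    def __init__(self, factory: Callable[[str], Any]) -> None:
        self._factory = factory  # hypothetical: constructs an engine for a pipeline
        self._engines: Dict[str, Any] = {}
        self.builds = 0  # how many engines were actually constructed

    def get(self, pipeline: str) -> Any:
        if pipeline not in self._engines:
            self.builds += 1
            self._engines[pipeline] = self._factory(pipeline)
        return self._engines[pipeline]


# Three images whose per-image plans resolve to two distinct pipelines.
cache = EngineCache(lambda pipeline: {"pipeline": pipeline})
engines = [cache.get(p) for p in ["controlnet", "default", "controlnet"]]
print(cache.builds)  # 2
```

Keying the cache on the pipeline name works here because the pipeline is
the only per-image plan field that changes how the engine is constructed.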

ruff + strict pyright(src/) clean; 558 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Victor Kuznetsov
2026-06-04 16:04:33 -07:00
parent 4a6cd71ab2
commit 6d11c11b52
13 changed files with 507 additions and 27 deletions
+6 -4
File diff suppressed because one or more lines are too long
+15 -2
@@ -113,7 +113,7 @@ image → encode to latent space (VAE) at native resolution
→ decode back to pixels (VAE)
```
- Large inputs run at native resolution (no down-then-up round-trip, which was the main quality loss in issue #10); use `--max-resolution N` only to cap GPU/MPS memory on very large inputs. Small inputs (long side under 1024 px) are auto-upscaled to a 1024 px floor before diffusion, because SDXL distorts on a tiny latent, and the result is restored to the original size (a transparent quality boost). Disable the floor with `--min-resolution 0`.
- Large inputs run at native resolution (no down-then-up round-trip, which was the main quality loss in issue #10); use `--max-resolution N` only to cap GPU/MPS memory on very large inputs. Small inputs (long side under 1024 px) are auto-upscaled to a 1024 px floor before diffusion, because SDXL distorts on a tiny latent, and the result is restored to the original size (a transparent quality boost). Disable the floor with `--min-resolution 0`. The floor upscale uses Lanczos by default; `--upscaler esrgan` (the `esrgan` extra) runs Real-ESRGAN first for sharper detail and falls back to Lanczos if the extra is absent. ESRGAN is a generic photo/texture GAN with no face/glyph prior, so it is best for photo/texture content -- it can degrade faces (the diffusion pass regenerates them, so the final recovers) and thin text; keep Lanczos for text-heavy inputs.
> **Default strength is vendor-adaptive (no flag needed).** The tool reads the C2PA issuer to detect which vendor's SynthID is present and picks the strength that clears it with the least quality loss: **OpenAI gpt-image → `0.10`**, **Google Gemini → `0.15`**, **unknown source → `0.15`**. An oracle-verified June 2026 study (clean pipeline, per-image openai.com/verify or Gemini app) found OpenAI's watermark clears at `0.05` across `1024`-`1600` px (resolution-independent) while Google's is ~3x more robust and needs `0.15`. The dominant factor is the vendor, not resolution. There is no local SynthID detector, so if the oracle still reads SynthID, raise `--strength`; if you care more about preserving fine text, lower it. (Caveat: Google's `0.15` was validated on the capped `--max-resolution 1536` path; a very large native Gemini image may need more.)
>
@@ -213,6 +213,14 @@ After installation the `remove-ai-watermarks` command is available system-wide.
> ```bash
> pip install -e ".[restore]" # or: uv pip install -e ".[restore]"
> ```
>
> For sharper upscaling of small inputs before diffusion (`--upscaler esrgan`,
> Real-ESRGAN), install the `esrgan` extra. It loads via spandrel (MIT, no basicsr);
> the Real-ESRGAN weights (BSD-3-Clause) download on first use:
>
> ```bash
> pip install -e ".[esrgan]" # or: uv pip install -e ".[esrgan]"
> ```
#### Invisible watermark removal
@@ -280,7 +288,8 @@ remove-ai-watermarks erase image.png --region 1640,1930,400,100 -o clean.png
remove-ai-watermarks invisible image.png -o clean.png --humanize 4.0 --unsharp 0.5
# --humanize adds film grain, --unsharp counters the soft "AI" look (both opt-in).
# Large images run at native resolution; small ones are upscaled to a 1024 floor
# first (disable with --min-resolution 0). On a very large image that OOMs the
# first (disable with --min-resolution 0); --upscaler esrgan uses Real-ESRGAN for
# that floor upscale (needs the 'esrgan' extra). On a very large image that OOMs the
# GPU/MPS, cap the long side: --max-resolution 2048
# Strength is vendor-adaptive by default (OpenAI 0.10 / Google 0.15); override
# with --strength. To preserve text/face structure, use --pipeline controlnet
@@ -301,6 +310,10 @@ remove-ai-watermarks metadata image.png --remove
# Batch with a specific mode
remove-ai-watermarks batch ./images/ --mode visible
# Batch also accepts --auto (and --adaptive-polish): the plan is recomputed per
# image, so a mixed directory routes each file to the right pipeline
remove-ai-watermarks batch ./images/ --mode all --auto
```
### Python API
+13
@@ -92,6 +92,19 @@ restore = [
"scipy<1.18",
"numba<0.60",
]
# Optional pre-diffusion super-resolution for small inputs (Real-ESRGAN). Loaded via
# spandrel (MIT) -- a pure model-loader with NO basicsr dependency (it pulls only
# torch / torchvision / safetensors / numpy / einops), which sidesteps the
# basicsr / torchvision.functional_tensor breakage that the `restore` extra fights.
# The Real-ESRGAN weights (BSD-3-Clause) download on first use and are cached; they
# are never bundled. CPU works but is slow on large inputs -- it is meant for the
# pre-diffusion upscale of SMALL inputs (and the GPU worker). Guarded by
# upscaler.is_available(); the default upscaler stays Lanczos (cv2, no deps). The
# weights are fetched with torch.hub (part of torch, which spandrel already requires),
# so no extra download dependency is needed.
esrgan = [
"spandrel>=0.3.0",
]
dev = [
"pytest>=8.0.0",
"pytest-cov>=4.1.0",
+69 -15
@@ -17,14 +17,15 @@ text/graphics (already high-frequency, so almost no polish) and spares text/edge
masking the grain.
Detection is **cv2-only and torch-free**: OpenCV YuNet (``cv2.FaceDetectorYN``) for
faces -- a 232 KB MIT-licensed model bundled in ``assets/`` -- plus a Canny
edge-density + MSER region heuristic for text/structure. The whole planner peaks
~100 MB RSS in a few ms, so it adds nothing meaningful to a GPU run and runs anywhere
the pipeline runs.
faces -- a 232 KB MIT-licensed model bundled in ``assets/`` -- DBNet (PP-OCRv3
differentiable-binarization via ``cv2.dnn.TextDetectionModel_DB``, a 2.4 MB Apache-2.0
model bundled in ``assets/``) for text, and a Canny ``edge_density``. The whole planner
peaks ~100 MB RSS in a few ms, so it adds nothing meaningful to a GPU run and runs
anywhere the pipeline runs.
The text heuristic is a deliberately rough Phase-1 placeholder (DBNet via cv2.dnn is
the planned precision upgrade); it only ever ADDS controlnet, so a miss is backstopped
by the edge-density route and a false positive only costs a controlnet run.
The text detector falls back to the old MSER region heuristic if the DBNet model can't
load. Either way text only ever ADDS controlnet, so a miss is backstopped by the
edge-density route and a false positive only costs a controlnet run.
"""
# cv2/numpy boundary: cv2 ships no usable element types; relax the unknown-type rules
@@ -47,15 +48,29 @@ logger = logging.getLogger(__name__)
# preserve). The headshot measures ~0.022, a busy photo higher; only a near-flat
# gradient/solid image falls under 0.008.
_STRUCTURELESS_EDGE_MAX = 0.008
# MSER regions per megapixel above this -> likely text. Rough Phase-1 heuristic: a
# no-text portrait measures a few hundred/MP, dense text far more. Set high so it
# rarely false-fires; it only ever ADDS controlnet so miscalibration is low-harm.
# MSER regions per megapixel above this -> likely text. The MSER path is now only the
# FALLBACK when the bundled DBNet model can't load; DBNet (below) is the primary text
# detector. Rough heuristic: a no-text portrait measures a few hundred/MP, dense text
# far more. Set high so it rarely false-fires; text only ever ADDS controlnet.
_TEXT_MSER_PER_MP = 1500.0
_FACE_SCORE = 0.6 # YuNet confidence for a face to count
# Downscale the long side to this for DETECTION only (faces stay detectable down to
# ~10px, and this bounds YuNet/MSER cost on huge inputs). Removal runs at full res.
# ~10px, and this bounds YuNet/DBNet/MSER cost on huge inputs). Removal runs at full res.
_DETECT_MAX_SIDE = 1024
# DBNet (PP-OCRv3 differentiable-binarization) text-region detector via cv2.dnn -- the
# primary "has meaningful text" signal. The model is the shared PP-OCRv3 detection net
# from OpenCV Zoo (Apache-2.0); en/cn variants are byte-identical, so it is bundled
# language-neutral. cv2.dnn is core OpenCV, so this adds NO new pip dependency.
_DBNET_ASSET = "text_detection_ppocrv3_2023may.onnx" # Apache-2.0 (OpenCV Zoo PP-OCRv3 DB)
_DBNET_BINARY_THRESHOLD = 0.3
_DBNET_POLYGON_THRESHOLD = 0.5
_DBNET_MAX_CANDIDATES = 200
_DBNET_UNCLIP_RATIO = 2.0
_DBNET_INPUT_SIDE = 736 # square input, multiple of 32 (PP-OCRv3 default)
_DBNET_MEAN = (122.67891434, 116.66876762, 104.00698793) # ImageNet mean * 255
_dbnet: Any = None # lazy singleton; set to False after a load failure (-> MSER fallback)
# When a smoothing pass ran (controlnet or face restore), the adaptive polish
# (humanizer.adaptive_polish) restores the input's detail level, sparing text --
# replacing the old fixed unsharp/grain which over-/under-corrected and speckled text.
@@ -152,8 +167,41 @@ def detect_face(image: NDArray[Any]) -> bool:
return faces is not None and len(faces) > 0
def detect_text(image: NDArray[Any]) -> bool:
"""Rough MSER-based text-presence heuristic (Phase-1 placeholder for DBNet)."""
def _detect_text_dbnet(image: NDArray[Any]) -> bool | None:
    """DBNet (PP-OCRv3) text-region presence via cv2.dnn.

    Returns True/False on a successful run, or None if the bundled model can't load
    or inference raises (the caller then falls back to the MSER heuristic, and DBNet
    is not retried). Loads once, lazily.
    """
    import cv2

    global _dbnet
    if _dbnet is False:  # a prior load failed; skip straight to the MSER fallback
        return None
    img = _to_bgr(image)
    h, w = img.shape[:2]
    if h < 1 or w < 1:
        return False
    try:
        if _dbnet is None:
            model = Path(__file__).parent / "assets" / _DBNET_ASSET
            net = cv2.dnn.TextDetectionModel_DB(str(model))
            net.setBinaryThreshold(_DBNET_BINARY_THRESHOLD)
            net.setPolygonThreshold(_DBNET_POLYGON_THRESHOLD)
            net.setMaxCandidates(_DBNET_MAX_CANDIDATES)
            net.setUnclipRatio(_DBNET_UNCLIP_RATIO)
            net.setInputParams(1.0 / 255.0, (_DBNET_INPUT_SIDE, _DBNET_INPUT_SIDE), _DBNET_MEAN)
            _dbnet = net
        boxes, _ = _dbnet.detect(img)
    except Exception as e:  # model load / inference can raise cv2.error or others
        logger.debug("DBNet text detect failed (%s); falling back to MSER", e)
        _dbnet = False
        return None
    return boxes is not None and len(boxes) > 0
def _detect_text_mser(image: NDArray[Any]) -> bool:
"""Fallback MSER-based text-presence heuristic (used only if DBNet can't load)."""
import cv2
gray = _to_gray(image)
@@ -166,6 +214,12 @@ def detect_text(image: NDArray[Any]) -> bool:
return per_mp > _TEXT_MSER_PER_MP
def detect_text(image: NDArray[Any]) -> bool:
    """Text-presence: DBNet (cv2.dnn) when the bundled model loads, else the MSER heuristic."""
    dbnet = _detect_text_dbnet(image)
    return _detect_text_mser(image) if dbnet is None else dbnet
def edge_density(image: NDArray[Any]) -> float:
"""Fraction of Canny edge pixels -- a cheap 'has structure' proxy in [0, 1]."""
import cv2
@@ -190,9 +244,9 @@ def plan(image_path: Path) -> AutoConfig | None:
h, w = image.shape[:2]
small = _downscale_for_detection(image)
gray = _to_gray(small) # convert once; the text/edge detectors pass a gray input through
gray = _to_gray(small) # convert once; edge density + the MSER fallback use gray
has_face = detect_face(small) # YuNet needs the 3-channel image
has_text = detect_text(gray)
has_text = detect_text(small) # DBNet wants BGR; the MSER fallback grays it internally
edges = edge_density(gray)
structureless = (not has_face) and (not has_text) and edges < _STRUCTURELESS_EDGE_MAX
+60 -3
@@ -159,6 +159,16 @@ _unsharp_option = click.option(
"--unsharp", type=float, default=0.0, help="Unsharp-mask sharpening strength (0 = off, typical: 0.3-0.8)."
)
_upscaler_option = click.option(
"--upscaler",
type=click.Choice(["lanczos", "esrgan"]),
default="lanczos",
help="How to upscale a small input to the --min-resolution floor: lanczos (default, cv2, no deps) or "
"esrgan (Real-ESRGAN via the 'esrgan' extra; better detail, slower on CPU). Best for photo/texture "
"content -- as a generic GAN with no face/glyph prior it can degrade faces (diffusion mitigates) and "
"thin text, so lanczos stays the default. Falls back to lanczos if the extra is absent. Only when upscaling.",
)
_auto_option = click.option(
"--auto",
is_flag=True,
@@ -210,6 +220,21 @@ def _apply_auto(
return pipeline, restore_faces, adaptive_polish
def _warn_if_esrgan_unavailable(upscaler: str) -> None:
    """Tell the user once if ``--upscaler esrgan`` will silently fall back to Lanczos.

    The engine downgrades to Lanczos when the ``esrgan`` extra is absent (fail-safe, so
    a batch never breaks mid-run) -- but without this notice the user would believe
    Real-ESRGAN ran. Surfaced at the CLI layer, once per invocation (not per image).
    """
    if upscaler != "esrgan":
        return
    from remove_ai_watermarks import upscaler as _upscaler

    if not _upscaler.is_available():
        console.print(" Note: --upscaler esrgan needs the 'esrgan' extra; falling back to Lanczos.")
def _restore_faces_options(f: Any) -> Any:
"""Attach the shared GFPGAN face-restoration flags to an invisible-pipeline command."""
restore_flag = click.option(
@@ -557,6 +582,7 @@ def cmd_erase(
@_restore_faces_options
@_min_resolution_option
@_unsharp_option
@_upscaler_option
@_auto_option
@_adaptive_polish_option
@click.pass_context
@@ -577,6 +603,7 @@ def cmd_invisible(
controlnet_scale: float,
restore_faces: bool,
restore_faces_weight: float,
upscaler: str,
auto: bool,
adaptive_polish: bool,
) -> None:
@@ -596,6 +623,7 @@ def cmd_invisible(
from remove_ai_watermarks.invisible_engine import InvisibleEngine
source = _validate_image(source)
_warn_if_esrgan_unavailable(upscaler)
if auto:
pipeline, restore_faces, adaptive_polish = _apply_auto(ctx, source, pipeline, restore_faces, adaptive_polish)
if output is None:
@@ -634,6 +662,7 @@ def cmd_invisible(
adaptive_polish=adaptive_polish,
max_resolution=max_resolution,
min_resolution=min_resolution,
upscaler=upscaler,
vendor=vendor,
restore_faces=restore_faces,
restore_faces_weight=restore_faces_weight,
@@ -815,6 +844,7 @@ def cmd_identify(ctx: click.Context, source: Path, no_visible: bool, as_json: bo
@_restore_faces_options
@_min_resolution_option
@_unsharp_option
@_upscaler_option
@_auto_option
@_adaptive_polish_option
@click.pass_context
@@ -838,6 +868,7 @@ def cmd_all(
controlnet_scale: float,
restore_faces: bool,
restore_faces_weight: float,
upscaler: str,
auto: bool,
adaptive_polish: bool,
) -> None:
@@ -854,6 +885,7 @@ def cmd_all(
_banner()
source = _validate_image(source)
_warn_if_esrgan_unavailable(upscaler)
if auto:
pipeline, restore_faces, adaptive_polish = _apply_auto(ctx, source, pipeline, restore_faces, adaptive_polish)
@@ -941,6 +973,7 @@ def cmd_all(
adaptive_polish=adaptive_polish,
max_resolution=max_resolution,
min_resolution=min_resolution,
upscaler=upscaler,
vendor=vendor,
restore_faces=restore_faces,
restore_faces_weight=restore_faces_weight,
@@ -1001,6 +1034,9 @@ def _process_batch_image(
restore_faces: bool = False,
restore_faces_weight: float = 0.5,
controlnet_scale: float = 1.0,
upscaler: str = "lanczos",
auto: bool = False,
adaptive_polish: bool = False,
) -> None:
"""Process a single image for batch mode.
@@ -1046,14 +1082,22 @@ def _process_batch_image(
if invisible_available():
from remove_ai_watermarks.invisible_engine import InvisibleEngine
if "_inv_engine" not in ctx.obj:
ctx.obj["_inv_engine"] = InvisibleEngine(
# --auto re-plans the pipeline / face-restore / polish per image; only the
# pipeline choice changes the engine ctor, so cache one engine per pipeline
# (controlnet vs default) rather than a single shared instance.
if auto:
pipeline, restore_faces, adaptive_polish = _apply_auto(
ctx, img_path, pipeline, restore_faces, adaptive_polish
)
engines = ctx.obj.setdefault("_inv_engines", {})
if pipeline not in engines:
engines[pipeline] = InvisibleEngine(
device=None if device == "auto" else device,
pipeline=pipeline,
hf_token=hf_token,
controlnet_conditioning_scale=controlnet_scale,
)
engine_inv = ctx.obj["_inv_engine"]
engine_inv = engines[pipeline]
engine_inv.remove_watermark(
img_path if mode == "invisible" else out_path,
out_path,
@@ -1062,8 +1106,10 @@ def _process_batch_image(
seed=seed,
humanize=humanize,
unsharp=unsharp,
adaptive_polish=adaptive_polish,
max_resolution=max_resolution,
min_resolution=min_resolution,
upscaler=upscaler,
restore_faces=restore_faces,
restore_faces_weight=restore_faces_weight,
# Detect the vendor from the pristine original (`img_path`), not the
@@ -1126,7 +1172,10 @@ def _process_batch_image(
@_restore_faces_options
@_min_resolution_option
@_unsharp_option
@_upscaler_option
@_controlnet_scale_option
@_auto_option
@_adaptive_polish_option
@click.pass_context
def cmd_batch(
ctx: click.Context,
@@ -1147,6 +1196,9 @@ def cmd_batch(
restore_faces: bool,
restore_faces_weight: float,
controlnet_scale: float,
upscaler: str,
auto: bool,
adaptive_polish: bool,
) -> None:
"""Process all images in a directory."""
_banner()
@@ -1164,6 +1216,8 @@ def cmd_batch(
console.print(f" Found {len(images)} images in {directory}")
console.print(f" Output -> {output_dir}")
console.print(f" Mode: {mode}")
if mode in ("invisible", "all"):
_warn_if_esrgan_unavailable(upscaler)
processed = 0
errors = 0
@@ -1202,6 +1256,9 @@ def cmd_batch(
restore_faces=restore_faces,
restore_faces_weight=restore_faces_weight,
controlnet_scale=controlnet_scale,
upscaler=upscaler,
auto=auto,
adaptive_polish=adaptive_polish,
)
processed += 1
+39 -2
@@ -126,6 +126,32 @@ class InvisibleEngine:
"""Eagerly load the pipeline so download progress is visible."""
self._remover.preload()
    def _esrgan_upscale(self, image: Any, target: tuple[int, int]) -> Any:
        """Upscale a PIL image to ``target`` with Real-ESRGAN, else Lanczos.

        Runs Real-ESRGAN at its native factor (on the remover's device, CPU fallback),
        then resizes to the exact ``target`` with Lanczos. Falls back to a plain Lanczos
        resize when the ``esrgan`` extra is absent or the model errors.
        """
        import cv2
        import numpy as np
        from PIL import Image

        from remove_ai_watermarks import upscaler

        if not upscaler.is_available():
            logger.debug("esrgan upscaler requested but the extra is absent; using Lanczos")
            return image.resize(target, Image.Resampling.LANCZOS)
        try:
            bgr = cv2.cvtColor(np.array(image.convert("RGB")), cv2.COLOR_RGB2BGR)
            big = upscaler.upscale(bgr, device=self._remover.device)
            if (big.shape[1], big.shape[0]) != target:
                big = cv2.resize(big, target, interpolation=cv2.INTER_LANCZOS4)
            return Image.fromarray(cv2.cvtColor(big, cv2.COLOR_BGR2RGB))
        except Exception as e:  # never let an optional upscaler break removal
            logger.warning("Real-ESRGAN upscale failed (%s); using Lanczos", e)
            return image.resize(target, Image.Resampling.LANCZOS)
def remove_watermark(
self,
image_path: Path,
@@ -142,6 +168,7 @@ class InvisibleEngine:
restore_faces_weight: float = 0.5,
unsharp: float = 0.0,
adaptive_polish: bool = False,
upscaler: str = "lanczos",
) -> Path:
"""Remove invisible watermark from an image.
@@ -180,6 +207,11 @@ class InvisibleEngine:
(default) = on; 0 = off. The output is restored to the original
input size, so this is a transparent quality boost; it adds time
and memory on small inputs. Ignored on a min > max misconfig.
upscaler: How to upscale a small input to the ``min_resolution`` floor:
``"lanczos"`` (default, cv2, no deps) or ``"esrgan"`` (Real-ESRGAN
via the ``esrgan`` extra). Only applies when UPscaling (the floor
case); a ``max_resolution`` downscale always uses Lanczos. Falls back
to Lanczos if the extra is absent.
Returns:
Path to the cleaned image.
@@ -202,8 +234,8 @@ class InvisibleEngine:
target = _target_size(image.width, image.height, max_resolution, min_resolution)
if target is not None:
upscaling = max(target) > max(image.width, image.height)
if self._progress_callback:
upscaling = max(target) > max(image.width, image.height)
reason = (
f"min-resolution floor {min_resolution}px"
if upscaling
@@ -211,7 +243,12 @@ class InvisibleEngine:
)
verb = "Upscaling" if upscaling else "Downscaling"
self._progress_callback(f"{verb} {image.width}x{image.height} to {target[0]}x{target[1]} ({reason})...")
image = image.resize(target, Image.Resampling.LANCZOS)
# Real-ESRGAN only helps when UPscaling (the floor case); a downscale cap
# always uses Lanczos. _esrgan_upscale falls back to Lanczos if the extra is absent.
if upscaling and upscaler == "esrgan":
image = self._esrgan_upscale(image, target)
else:
image = image.resize(target, Image.Resampling.LANCZOS)
# Always persist to a temp file, even without downscaling: WatermarkRemover
# reloads by path, so the EXIF-transposed pixels must be saved or rotation
+125
@@ -0,0 +1,125 @@
"""Optional pre-diffusion super-resolution for small inputs (Real-ESRGAN via spandrel).
Mirrors ``region_eraser``'s optional-backend pattern: ``is_available()`` guards the
``spandrel`` import, a lazy singleton (double-checked lock) holds the loaded model, and
the weights download on first use (cached by ``torch.hub``) -- they are never bundled.
The DEFAULT upscaler stays Lanczos (cv2, no deps); this is opt-in via the ``esrgan``
extra and feeds the ``--upscaler esrgan`` path. ``spandrel`` is a pure model-loader
(MIT) with NO basicsr dependency -- it pulls only torch/torchvision/safetensors/numpy/
einops -- so it sidesteps the basicsr / ``torchvision.transforms.functional_tensor``
breakage that the ``restore`` (GFPGAN) extra has to shim. Real-ESRGAN weights are
BSD-3-Clause.
CPU works but is slow on large inputs, so this is meant for the pre-diffusion upscale of
SMALL inputs (and the GPU worker). On a host where the extra is not installed (for
example a memory-constrained one), ``is_available()`` returns False and the caller
falls back to Lanczos.
"""
# torch/spandrel boundary: these libs ship no usable element types; relax the
# unknown-type rules for this file only.
# pyright: reportUnknownMemberType=false, reportUnknownArgumentType=false, reportUnknownVariableType=false, reportUnknownParameterType=false, reportMissingTypeArgument=false, reportMissingTypeStubs=false, reportMissingImports=false, reportArgumentType=false, reportAssignmentType=false, reportReturnType=false, reportCallIssue=false, reportIndexIssue=false, reportOperatorIssue=false, reportAttributeAccessIssue=false, reportPrivateImportUsage=false
from __future__ import annotations
import importlib.util
import logging
import threading
from pathlib import Path
from typing import TYPE_CHECKING, Any
if TYPE_CHECKING:
from numpy.typing import NDArray
logger = logging.getLogger(__name__)
# Real-ESRGAN x2plus (BSD-3-Clause), official release. x2 is the right native factor for
# the pre-diffusion floor upscale (small inputs ~512 -> ~1024); spandrel infers the
# architecture and scale from the checkpoint, so swapping the URL is enough to change it.
_MODEL_URL = "https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.1/RealESRGAN_x2plus.pth"
_MODEL_FILENAME = "RealESRGAN_x2plus.pth"
_model: Any = None # lazy singleton (spandrel ImageModelDescriptor)
_model_device: str = "cpu"
_lock = threading.Lock()
def is_available() -> bool:
    """True if the ``esrgan`` extra (spandrel + torch) is importable."""
    return importlib.util.find_spec("spandrel") is not None and importlib.util.find_spec("torch") is not None


def _model_cache_path() -> Path:
    """Path the weights are cached at (the torch.hub checkpoints dir)."""
    import torch

    cache_dir = Path(torch.hub.get_dir()) / "checkpoints"
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir / _MODEL_FILENAME


def _get_model(device: str) -> Any:
    """Load the Real-ESRGAN model once (downloading the weights on first use)."""
    global _model, _model_device
    if _model is not None and _model_device == device:
        return _model
    with _lock:
        if _model is None:
            import torch
            from spandrel import ImageModelDescriptor, ModelLoader

            dst = _model_cache_path()
            if not dst.exists():
                logger.info("Downloading Real-ESRGAN weights to %s", dst)
                torch.hub.download_url_to_file(_MODEL_URL, str(dst), progress=False)
            model = ModelLoader().load_from_file(str(dst))
            if not isinstance(model, ImageModelDescriptor):
                raise RuntimeError(f"Unexpected spandrel model type: {type(model).__name__}")
            _model = model.eval()
        if _model_device != device:
            _model.to(device)
            _model_device = device
    return _model


def scale() -> int:
    """The model's native upscale factor (e.g. 2 for x2plus). Loads the model if needed."""
    return int(_get_model("cpu").scale)
def upscale(image: NDArray[Any], device: str | None = None) -> NDArray[Any]:
    """Upscale a BGR uint8 image by the model's native factor with Real-ESRGAN.

    Returns a BGR uint8 array. Falls back to CPU if the requested device errors (an
    MPS/CUDA OOM or unsupported-op on the small pre-diffusion input), mirroring the
    diffusion engine's MPS->CPU fallback.

    Raises:
        RuntimeError: if the ``esrgan`` extra is not installed (guard with
            ``is_available()`` first).
    """
    if not is_available():
        raise RuntimeError("Real-ESRGAN upscaler needs the 'esrgan' extra (spandrel). Install it or use Lanczos.")
    import cv2
    import numpy as np
    import torch

    target_device = (device or "cpu").lower()
    if target_device not in {"cpu", "mps", "cuda", "xpu"}:
        target_device = "cpu"
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float().div(255.0).unsqueeze(0)

    def _run(dev: str) -> NDArray[Any]:
        model = _get_model(dev)
        with torch.no_grad():
            out = model(tensor.to(dev))
        arr = out.clamp(0.0, 1.0).squeeze(0).permute(1, 2, 0).cpu().numpy() * 255.0
        return cv2.cvtColor(arr.round().astype(np.uint8), cv2.COLOR_RGB2BGR)

    try:
        return _run(target_device)
    except Exception as e:  # GPU OOM / unsupported op: fall back to CPU
        if target_device == "cpu":
            raise
        logger.warning("Real-ESRGAN on %s failed (%s); retrying on CPU", target_device, e)
        return _run("cpu")
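
The device handling in `upscale` above follows a retry-on-CPU pattern that
can be sketched in isolation. This is a minimal sketch, assuming a
hypothetical `run` callable in place of the model call; `normalize_device`
and `run_with_cpu_fallback` are illustrative names, not part of the module.

```python
from typing import Any, Callable, Optional

_KNOWN_DEVICES = {"cpu", "mps", "cuda", "xpu"}


def normalize_device(device: Optional[str]) -> str:
    """Lower-case the requested device and map anything unknown to CPU."""
    name = (device or "cpu").lower()
    return name if name in _KNOWN_DEVICES else "cpu"


def run_with_cpu_fallback(run: Callable[[str], Any], device: Optional[str]) -> Any:
    """Run on the requested device; on any error retry once on CPU.

    A failure that already happened on CPU is re-raised, because there is
    nothing left to fall back to.
    """
    target = normalize_device(device)
    try:
        return run(target)
    except Exception:
        if target == "cpu":
            raise
        return run("cpu")
```

Re-raising on CPU matters: it lets the caller (here, the engine) apply its
own fallback instead of silently receiving a bad result.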
+20
@@ -34,6 +34,26 @@ class TestDetectors:
cv2.putText(text, "HELLO AI TEXT", (10, 120), cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 0, 0), 3)
assert auto_config.edge_density(text) > auto_config.edge_density(blank)
    def test_dbnet_detects_text_card(self):
        """The bundled PP-OCRv3 DBNet model fires on a clear text card and not on flat."""
        card = np.full((300, 500, 3), 255, dtype=np.uint8)
        cv2.putText(card, "INVOICE TOTAL 1234", (10, 170), cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 0, 0), 4)
        assert auto_config._detect_text_dbnet(card) is True
        assert auto_config._detect_text_dbnet(np.full((300, 500, 3), 128, dtype=np.uint8)) is False

    def test_detect_text_falls_back_to_mser_when_dbnet_unavailable(self, monkeypatch):
        """If DBNet can't load (returns None), detect_text uses the MSER heuristic."""
        monkeypatch.setattr(auto_config, "_detect_text_dbnet", lambda _img: None)
        called = {}

        def _fake_mser(_img):
            called["mser"] = True
            return True

        monkeypatch.setattr(auto_config, "_detect_text_mser", _fake_mser)
        assert auto_config.detect_text(np.full((100, 100, 3), 128, dtype=np.uint8)) is True
        assert called.get("mser") is True
class TestPlan:
def test_unreadable_returns_none(self, tmp_path):
+39
@@ -514,6 +514,45 @@ class TestBatchCommand:
assert out[0, 0, 3] == 0
assert out[100, 100, 3] == 255
    def test_batch_auto_plans_pipeline_per_image(self, runner, tmp_path):
        """--auto in batch re-plans the pipeline/restore/polish per image and
        builds one engine per resolved pipeline."""
        from remove_ai_watermarks import auto_config

        input_dir = _make_batch_dir(tmp_path, count=2)
        output_dir = tmp_path / "output"
        plan = auto_config.AutoConfig(
            pipeline="controlnet",
            restore_faces=True,
            adaptive_polish=True,
            unsharp=0.0,
            humanize=0.0,
            min_resolution=1024,
            has_face=True,
            has_text=False,
            edge_density=0.05,
            width=200,
            height=200,
        )
        mock_cls, mock_engine = _mock_invisible_engine()
        with (
            patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
            patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
            patch("remove_ai_watermarks.cli.invisible_available", return_value=True, create=True),
            patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
            patch("remove_ai_watermarks.auto_config.plan", return_value=plan),
        ):
            result = runner.invoke(
                main,
                ["batch", str(input_dir), "-o", str(output_dir), "--mode", "invisible", "--auto"],
            )
        assert result.exit_code == 0, result.output
        assert "2 processed" in result.output
        # Engine built with the auto-resolved controlnet pipeline.
        assert mock_cls.call_args.kwargs["pipeline"] == "controlnet"
        # The auto plan's adaptive polish reached the engine call.
        assert mock_engine.remove_watermark.call_args.kwargs["adaptive_polish"] is True
def test_batch_default_output_dir(self, runner, tmp_path):
input_dir = _make_batch_dir(tmp_path)
result = runner.invoke(
+67
@@ -101,3 +101,70 @@ class TestTargetSize:
# min(1024) > max(800) is a misconfig: the floor must not upscale above the
# cap, so it is skipped and the (within-cap) input stays native.
assert _target_size(500, 400, 800, 1024) is None
class TestEsrganUpscale:
    """Branches of InvisibleEngine._esrgan_upscale (no diffusion model loaded).

    A SimpleNamespace stands in for the engine so we exercise the helper without
    constructing a real InvisibleEngine (which would load WatermarkRemover).
    """

    @staticmethod
    def _fake_engine():
        from types import SimpleNamespace

        return SimpleNamespace(_remover=SimpleNamespace(device="cpu"))

    @staticmethod
    def _pil(w=120, h=80):
        import numpy as np
        from PIL import Image

        return Image.fromarray(np.full((h, w, 3), 128, dtype=np.uint8))

    def test_falls_back_to_lanczos_when_extra_absent(self, monkeypatch):
        import numpy as np
        from PIL import Image

        from remove_ai_watermarks import upscaler

        monkeypatch.setattr(upscaler, "is_available", lambda: False)
        img = self._pil()
        out = InvisibleEngine._esrgan_upscale(self._fake_engine(), img, (1024, 683))
        assert out.size == (1024, 683)
        # Identical to a plain Lanczos resize (the fallback path).
        assert np.array_equal(np.asarray(out), np.asarray(img.resize((1024, 683), Image.Resampling.LANCZOS)))

    def test_resizes_esrgan_output_to_exact_target(self, monkeypatch):
        import cv2

        from remove_ai_watermarks import upscaler

        monkeypatch.setattr(upscaler, "is_available", lambda: True)

        # Fake a 2x upscale that does NOT match the requested target; the helper must
        # resize it to the exact target.
        def _fake_upscale(bgr, device=None):
            return cv2.resize(bgr, (bgr.shape[1] * 2, bgr.shape[0] * 2), interpolation=cv2.INTER_NEAREST)

        monkeypatch.setattr(upscaler, "upscale", _fake_upscale)
        out = InvisibleEngine._esrgan_upscale(self._fake_engine(), self._pil(), (1024, 683))
        assert out.size == (1024, 683)

    def test_falls_back_to_lanczos_when_upscale_raises(self, monkeypatch):
        import numpy as np
        from PIL import Image

        from remove_ai_watermarks import upscaler

        monkeypatch.setattr(upscaler, "is_available", lambda: True)

        def _boom(bgr, device=None):
            raise RuntimeError("model exploded")

        monkeypatch.setattr(upscaler, "upscale", _boom)
        img = self._pil()
        out = InvisibleEngine._esrgan_upscale(self._fake_engine(), img, (512, 341))
        assert out.size == (512, 341)
        assert np.array_equal(np.asarray(out), np.asarray(img.resize((512, 341), Image.Resampling.LANCZOS)))
+32
@@ -0,0 +1,32 @@
"""Tests for the optional Real-ESRGAN upscaler (no model download).
The model-running path is exercised manually (it downloads ~67 MB of BSD-3-Clause
weights on first use); these tests cover the availability guard and the no-model
control flow, mirroring the repo convention for ML-adjacent modules.
"""
from __future__ import annotations
import numpy as np
import pytest
from remove_ai_watermarks import upscaler
class TestIsAvailable:
    def test_returns_bool(self):
        assert isinstance(upscaler.is_available(), bool)


class TestUpscaleGuard:
    def test_raises_without_extra(self, monkeypatch):
        monkeypatch.setattr(upscaler, "is_available", lambda: False)
        with pytest.raises(RuntimeError, match="esrgan"):
            upscaler.upscale(np.full((32, 32, 3), 128, dtype=np.uint8))


class TestModelCachePath:
    def test_cache_path_uses_model_filename(self):
        if not upscaler.is_available():
            pytest.skip("esrgan extra (torch) not installed")
        assert upscaler._model_cache_path().name == upscaler._MODEL_FILENAME
Generated
+22 -1
@@ -3075,6 +3075,9 @@ dev = [
{ name = "pytest-cov" },
{ name = "ruff" },
]
esrgan = [
{ name = "spandrel" },
]
gpu = [
{ name = "accelerate" },
{ name = "diffusers" },
@@ -3125,12 +3128,13 @@ requires-dist = [
{ name = "ruff", marker = "extra == 'dev'", specifier = ">=0.4.0" },
{ name = "safetensors", marker = "extra == 'gpu'" },
{ name = "scipy", marker = "extra == 'restore'", specifier = "<1.18" },
{ name = "spandrel", marker = "extra == 'esrgan'", specifier = ">=0.3.0" },
{ name = "tokenizers", marker = "extra == 'gpu'", specifier = ">=0.22,<0.23" },
{ name = "torch", marker = "extra == 'gpu'", specifier = ">=2.0.0" },
{ name = "transformers", marker = "extra == 'gpu'", specifier = ">=5,<6" },
{ name = "trustmark", marker = "extra == 'trustmark'", specifier = ">=0.8.0" },
]
provides-extras = ["gpu", "detect", "trustmark", "lama", "restore", "dev", "all"]
provides-extras = ["gpu", "detect", "trustmark", "lama", "restore", "esrgan", "dev", "all"]
[[package]]
name = "requests"
@@ -3494,6 +3498,23 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/b7/ce/149a00dd41f10bc29e5921b496af8b574d8413afcd5e30dfa0ed46c2cc5e/six-1.17.0-py2.py3-none-any.whl", hash = "sha256:4721f391ed90541fddacab5acf947aa0d3dc7d27b2e1e8eda2be8970586c3274", size = 11050, upload-time = "2024-12-04T17:35:26.475Z" },
]
[[package]]
name = "spandrel"
version = "0.4.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "einops" },
{ name = "numpy" },
{ name = "safetensors" },
{ name = "torch" },
{ name = "torchvision" },
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/2a/8f/ab4565c23dd67a036ab72101a830cebd7ca026b2fddf5771bbf6284f6228/spandrel-0.4.2.tar.gz", hash = "sha256:fefa4ea966c6a5b7721dcf24f3e2062a5a96a395c8bedcb570fb55971fdcbccb", size = 247544, upload-time = "2026-02-21T01:52:26.342Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/74/31/411ea965835534c43d4b98d451968354876e0e867ea1fd42669e4cca0732/spandrel-0.4.2-py3-none-any.whl", hash = "sha256:6c93e3ecbeb0e548fd2df45a605472b34c1614287c56b51bb33cdef7ae5235b5", size = 320811, upload-time = "2026-02-21T01:52:25.015Z" },
]
[[package]]
name = "sympy"
version = "1.14.0"