mirror of
https://github.com/wiltodelta/remove-ai-watermarks.git
synced 2026-06-04 18:18:00 +02:00
feat(auto): content-adaptive --auto quality mode, Phase 1
Add `auto_config.plan(image_path) -> AutoConfig`, the first step of the invisible/all pipeline: it inspects the input image (before the diffusion model loads) and picks the quality modes so the run adapts to content.

Quality-priority routing -- ControlNet (text/face-structure preservation) is the default, skipped for plain SDXL only on a clearly structure-less image; GFPGAN face restore when a face is present; a mild sharpen + grain polish when a smoothing pass ran. Exposed as `--auto` on `all`/`invisible` (`_apply_auto`; explicit flags override via click's parameter source). Not wired into batch (its engine is cached per-mode).

Detection is cv2-only and torch-free (~100 MB peak RSS, a few ms): OpenCV YuNet (`cv2.FaceDetectorYN`, MIT, 232 KB model bundled in assets/) for faces, a Canny edge-density + MSER heuristic for text/structure (a rough Phase-1 placeholder; DBNet via cv2.dnn is the planned upgrade). ZERO new pip deps. Designed to run wherever the pipeline runs -- the raiw.cc Modal GPU worker -- never on the 512 MB web host.

Real-ESRGAN-via-Spandrel upscaling (a new `esrgan` extra) and an adaptive Laplacian-variance polish are deferred to later phases.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
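The routing rules described in the commit message can be sketched as a pure function, independent of cv2 and of the detectors. This is a minimal illustration, not the module itself: the names `Plan` and `route` are hypothetical, the detector results are passed in as plain arguments, and the constants are the values quoted in this commit (0.008 edge-density floor, 0.5 unsharp, 2.0 grain).

```python
from dataclasses import dataclass

# Values quoted from the commit; the names are illustrative, not the module's API.
STRUCTURELESS_EDGE_MAX = 0.008
AUTO_UNSHARP = 0.5
AUTO_HUMANIZE = 2.0


@dataclass(frozen=True)
class Plan:
    """Hypothetical stand-in for the resolved quality modes."""

    pipeline: str  # "default" (plain SDXL) or "controlnet"
    restore_faces: bool
    unsharp: float
    humanize: float


def route(has_face: bool, has_text: bool, edge_density: float) -> Plan:
    """Quality-priority routing: ControlNet unless the image is clearly flat."""
    structureless = (not has_face) and (not has_text) and edge_density < STRUCTURELESS_EDGE_MAX
    pipeline = "default" if structureless else "controlnet"
    restore_faces = has_face
    # Polish is added only when a smoothing pass (ControlNet or face restore) runs.
    smoothing = pipeline == "controlnet" or restore_faces
    return Plan(
        pipeline=pipeline,
        restore_faces=restore_faces,
        unsharp=AUTO_UNSHARP if smoothing else 0.0,
        humanize=AUTO_HUMANIZE if smoothing else 0.0,
    )


if __name__ == "__main__":
    print(route(has_face=False, has_text=False, edge_density=0.001))  # flat image
    print(route(has_face=True, has_text=False, edge_density=0.022))   # portrait
```

Because a detected face or text signal alone forces the ControlNet route, the edge-density threshold only matters for images where both detectors report nothing.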
@@ -45,6 +45,7 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r
- `trustmark_detector.py` — `detect_trustmark(path)` decodes the OPEN, keyless **Adobe TrustMark** watermark (the soft binding behind Adobe Durable Content Credentials, `alg` `com.adobe.trustmark.P`) via the optional `trustmark` package (extra `trustmark`; pulls torch, downloads model weights on first use). Mirrors `invisible_watermark.py` (lazy singleton guarded by a double-checked `threading.Lock` so concurrent callers do not double-download the weights, top-of-module pyright pragma, returns None when absent). It detects *provenance*, not AI origin as such (TrustMark also marks human-authored content), so `identify` lists it as a watermark without setting `is_ai_generated`. Other soft-binding vendors (Digimarc/Imatag/Steg.AI/...) have no public decoder — they are only *named* via the `C2PA_SOFT_BINDINGS` scan, not decoded. **False-positive gate (added 2026-05-29):** TrustMark's `wm_present` is a BCH error-correction validity flag that spuriously validates on a content-correlated fraction of un-watermarked images — AI-generated textures trip it far more than camera photos (verified 2026-05-29 on real files: it fires on Gemini/OpenAI/Doubao output that *cannot* carry Adobe's watermark, with a random-bytes decoded secret, while signal-free camera photos did not trip it). A genuine TrustMark is a *durable* soft binding engineered to survive re-encoding, so `detect_trustmark` re-decodes after a mild JPEG round-trip (`_survives_reencode`, `_REENCODE_QUALITY` 95) and requires the same schema both times; every observed false positive collapsed (none survived even q95), so the gate is the durability property the watermark guarantees. The second decode runs only on the rare initial hit, so the cost is negligible. Do NOT remove the gate to "catch more" — a lone TrustMark hit without it is almost always content noise.
- `noai/watermark_remover.py` — the `WatermarkRemover` class has two diffusion pipelines, selected by the explicit `pipeline` ctor arg (NOT inferred from `model_id` -- both use the same SDXL base, `DEFAULT_MODEL_ID`). **`default`** runs plain SDXL img2img (`_run_img2img`). **`controlnet`** (**EXPERIMENTAL, opt-in**; `_run_controlnet`, `_load_controlnet_pipeline`) runs `StableDiffusionXLControlNetImg2ImgPipeline` with the SDXL-native canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` (`watermark_profiles.CONTROLNET_CANNY_MODEL`): the control image is `cv2.Canny(gray, 100, 200)` stacked to 3 channels (`_CANNY_LOW`/`_CANNY_HIGH`, prompt `_CONTROLNET_PROMPT` / `_CONTROLNET_NEGATIVE`). **Removal still comes from the img2img regeneration (`strength`); the ControlNet only PRESERVES text and face STRUCTURE via the edge map -- no original pixels are copied or frozen, so SynthID does not survive.** Canny holds face STRUCTURE but NOT identity (the regenerated face drifts in likeness -- canny carries edges, not identity; face identity is preserved by the optional `--restore-faces` GFPGAN post-pass (EXPERIMENTAL, opt-in, OFF by default) -- see `face_restore.py`). `controlnet_conditioning_scale` (ctor arg, default 1.0) is the structure-preservation knob. Same dtype rule as `default` (fp32 on cpu/mps, fp16 only on cuda/xpu; the fp16-fixed SDXL VAE `_SDXL_FP16_VAE_ID` is swapped in on fp16 GPUs -- issue #29) and the same MPS->CPU fallback (reload on cpu/fp32, drop a non-cpu generator, retry once).
- `face_restore.py` — optional GFPGAN face-restoration post-pass (cv2/torch/gfpgan boundary, top-of-file pyright pragma). **EXPERIMENTAL, opt-in, OFF by default.** Runs AFTER the diffusion removal pass (`InvisibleEngine.remove_watermark`, params `restore_faces=False` / `restore_faces_weight=0.5`; CLI `--restore-faces`/`--no-restore-faces` + `--restore-faces-weight` on `invisible`/`all`/`batch`). **Restores face IDENTITY while still scrubbing the pixel watermark:** GFPGAN re-synthesizes each face from a StyleGAN2 prior (codebook/GAN pixels, NOT the original), so the composited face regions carry no watermark and no pixel-copy -- oracle-validated clean at weight 0.5 with identity preserved. Flow: GFPGANer.enhance runs on the ORIGINAL (watermarked) image -> identity faces + RetinaFace boxes (`restorer.face_helper.det_faces`); `_composite_faces` feather-composites those restored face REGIONS into the diffusion-cleaned image. `is_available()` gates on gfpgan + facexlib; lazily-built `GFPGANer` singleton forces CPU unless CUDA (the pip GFPGANer has an MPS device-mismatch bug; it is a cheap post-pass on a few face crops). `_apply_basicsr_shim()` recreates the removed `torchvision.transforms.functional_tensor` module that basicsr imports. The pure `_composite_faces` helper (Gaussian-feathered rectangular alpha per box, `out = restored*a + base*(1-a)`) is unit-tested without the model (`tests/test_face_restore.py`); the model-running path is gated behind `is_available()`. **Commercial-safe** (GFPGAN Apache-2.0 + RetinaFace MIT); the CodeFormer alternative is NON-COMMERCIAL and is NOT shipped. The `restore` extra (gfpgan/facexlib/basicsr) is kept OUT of `all` (heavy + the GFPGANv1.4 + RetinaFace weights download on first use, never bundled). 
**`restore` pins numpy<2** (same trap class as the removed faceid/insightface extra): basicsr/gfpgan/facexlib are an old ecosystem, so the extra caps `scipy<1.18` (>=1.18 uses `np.long`, gone in numpy 1.24-1.26) and `numba<0.60` to keep the whole env on one numpy 1.26 resolution; verified the `--extra dev --extra gpu` gate env stays numpy 1.26.4 + `diffusers.loaders.peft` importable with `restore` present. **basicsr 1.4.2 builds only on Python <3.13** (its `setup.py get_version()` uses `exec(...)` + `locals()['__version__']`, which the 3.13 fast-locals change broke -> `KeyError: '__version__'`), so the project is pinned to Python 3.12 via `.python-version` and `[tool.uv.extra-build-dependencies] basicsr = ["setuptools<69"]`. basicsr ships sdist-only (no wheel).
- `auto_config.py` — the `--auto` quality-mode planner (EXPERIMENTAL). `plan(image_path) -> AutoConfig | None` inspects the INPUT image (before the diffusion model loads) and picks the pipeline modes, so the run adapts to content. **Designed to run as the FIRST step of the invisible/all pipeline, wherever that runs** — locally or the raiw.cc Modal GPU worker — **never on the 512 MB web host** (image work there OOM-crashes the container; the planner is `_apply_auto` in `cli.py` for the CLI, and raiw-app would call `plan()` inside `RaiwProtect.remove`). **Quality-priority routing:** ControlNet (text/face-structure preservation) is the default; it is skipped for `default` (plain SDXL) only on a clearly structure-less image (`not has_face and not has_text and edge_density < _STRUCTURELESS_EDGE_MAX` 0.008). `restore_faces` is on when a face is present. A mild polish (`_AUTO_UNSHARP` 0.5, `_AUTO_HUMANIZE` 2.0) is added only when a smoothing pass (controlnet/restore) ran. **Detection is cv2-only and torch-free** (~100 MB peak RSS, a few ms — measured): OpenCV **YuNet** (`cv2.FaceDetectorYN`, MIT, 232 KB model bundled at `assets/face_detection_yunet_2023mar.onnx`) for faces, a Canny edge-density + MSER region heuristic for text/structure (the text part is a rough Phase-1 placeholder — DBNet via `cv2.dnn` is the planned precision upgrade; it only ever ADDS controlnet so a miss is backstopped by edge-density and a false positive only costs a controlnet run), and `edge_density`. `min_resolution` stays 1024. **`_apply_auto` (cli.py)** overrides only the flags the user left at their click default (`ctx.get_parameter_source(...) == DEFAULT`) — an explicit `--pipeline`/`--restore-faces`/`--unsharp`/`--humanize` always wins — and prints the chosen plan (`AutoConfig.reason`). Wired into `cmd_all`/`cmd_invisible` (not `batch` yet — its engine is cached per-mode, auto needs a per-image pipeline). 
**Phase 1 adds ZERO new pip deps** (all cv2 core + the bundled MIT model); Real-ESRGAN-via-Spandrel upscaling (a new `esrgan` extra) and an adaptive Laplacian-driven polish are deferred to later phases. Unit-tested without the model where possible (`tests/test_auto_config.py`): flat/text synthetic images for routing, monkeypatched `detect_face`/`detect_text` for the face/text branches (a real detectable-face fixture is private, never committed). Production adoption path for raiw.cc: validate (must keep SynthID removed, not hallucinate micro-text, beat plain SDXL on the real upload distribution), then bump the library SHA in `modal_app.py` and pass `auto=True`.
- `image_io.py` — Unicode-safe cv2 IO (issue #17). `imread(path, flags=None)` / `imwrite(path, img)` wrap `np.fromfile`+`cv2.imdecode` / `cv2.imencode`+`tofile` so non-ASCII paths work on Windows -- bare `cv2.imread`/`cv2.imwrite` use the platform ANSI code-page API there and fail (empty decode + `can't open/read file`) on Chinese/Cyrillic/accented filenames. `imread` keeps `cv2.imread` semantics (defaults to `IMREAD_COLOR`, returns `None` on missing/empty/undecodable). **Every cv2 file read/write in the package routes through here; do not call `cv2.imread`/`cv2.imwrite` directly.** `imwrite` returns `False` on an unwritable path (`OSError` caught) instead of raising, matching `cv2.imwrite` semantics. macOS/Linux already accept UTF-8 paths, so it is behavior-neutral there (the bug only reproduces on Windows). cv2/numpy are imported lazily inside the functions, so the module is cheap to import in a bare env.
### Doubao clean-reverse-alpha distillation (re-investigated 2026-05-29)
@@ -284,6 +284,9 @@ remove-ai-watermarks invisible image.png -o clean.png --humanize 4.0 --unsharp 0
# GPU/MPS, cap the long side: --max-resolution 2048
# Strength is vendor-adaptive by default (OpenAI 0.10 / Google 0.15); override
# with --strength. To preserve text/face structure, use --pipeline controlnet
# Or let it choose: --auto picks the pipeline, face restore, and polish from the
# image content (controlnet when there is text/structure, face restore when a face
# is present). Explicit flags override it. Experimental.
# (SDXL + canny ControlNet); tune preservation with --controlnet-scale. Add

# Check / strip AI metadata (C2PA, EXIF, "Made with AI" labels)
Binary file not shown.
@@ -0,0 +1,209 @@
"""Automatic pipeline planning for the ``--auto`` quality mode.

``plan(image_path)`` inspects the INPUT image (before the diffusion model loads)
and returns the quality modes to use, so the pipeline can adapt to content. It is
meant to run as the FIRST step of the invisible/all pipeline, wherever that pipeline
runs (locally, or the raiw.cc Modal GPU worker) -- never on a memory-constrained web
host (image work there OOM-crashes the container).

Routing is **quality-priority**: ControlNet (text/face-structure preservation) is the
default; it is only skipped for a clearly structure-less image (no face, no text,
near-zero edges), where plain SDXL is cheaper and just as good. GFPGAN face
restoration is enabled when a face is present. A mild sharpen + grain polish is added
when a smoothing pass (controlnet or face restore) ran, to counter the over-smoothed
"AI look".

Detection is **cv2-only and torch-free**: OpenCV YuNet (``cv2.FaceDetectorYN``) for
faces -- a 232 KB MIT-licensed model bundled in ``assets/`` -- plus a Canny
edge-density + MSER region heuristic for text/structure. The whole planner peaks
~100 MB RSS in a few ms, so it adds nothing meaningful to a GPU run and runs anywhere
the pipeline runs. (Phase 1 applies a fixed mild polish; an adaptive Laplacian-variance
polish that measures the OUTPUT is a later phase.)

The text heuristic is a deliberately rough Phase-1 placeholder (DBNet via cv2.dnn is
the planned precision upgrade); it only ever ADDS controlnet, so a miss is backstopped
by the edge-density route and a false positive only costs a controlnet run.
"""

# cv2/numpy boundary: cv2 ships no usable element types; relax the unknown-type rules
# for this file only.
# pyright: reportUnknownMemberType=false, reportUnknownArgumentType=false, reportUnknownVariableType=false, reportUnknownParameterType=false, reportMissingTypeArgument=false, reportMissingTypeStubs=false, reportMissingImports=false, reportArgumentType=false, reportAssignmentType=false, reportReturnType=false, reportCallIssue=false, reportIndexIssue=false, reportOperatorIssue=false, reportOptionalMemberAccess=false, reportOptionalCall=false, reportOptionalSubscript=false, reportOptionalOperand=false, reportAttributeAccessIssue=false, reportPrivateImportUsage=false, reportPrivateUsage=false, reportInvalidTypeForm=false, reportConstantRedefinition=false, reportUnnecessaryComparison=false
from __future__ import annotations

import logging
from dataclasses import dataclass
from pathlib import Path
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from numpy.typing import NDArray

logger = logging.getLogger(__name__)

# ── Routing thresholds (tunable; quality-priority -> controlnet unless clearly flat) ──
# Canny edge-density below this, AND no face AND no text -> plain SDXL (nothing to
# preserve). The headshot measures ~0.022, a busy photo higher; only a near-flat
# gradient/solid image falls under 0.008.
_STRUCTURELESS_EDGE_MAX = 0.008
# MSER regions per megapixel above this -> likely text. Rough Phase-1 heuristic: a
# no-text portrait measures a few hundred/MP, dense text far more. Set high so it
# rarely false-fires; it only ever ADDS controlnet so miscalibration is low-harm.
_TEXT_MSER_PER_MP = 1500.0
_FACE_SCORE = 0.6  # YuNet confidence for a face to count
# Downscale the long side to this for DETECTION only (faces stay detectable down to
# ~10px, and this bounds YuNet/MSER cost on huge inputs). Removal runs at full res.
_DETECT_MAX_SIDE = 1024

# Auto polish applied only when a smoothing pass ran (controlnet or face restore),
# to counter the soft "AI look". Conservative defaults; the user can override.
_AUTO_UNSHARP = 0.5
_AUTO_HUMANIZE = 2.0
_UPSCALE_FLOOR = 1024

_YUNET_ASSET = "face_detection_yunet_2023mar.onnx"  # MIT (Shiqi Yu), OpenCV Zoo
_yunet: Any = None  # lazy singleton


@dataclass(frozen=True)
class AutoConfig:
    """Resolved quality modes from content analysis (the ``--auto`` plan)."""

    pipeline: str  # "default" | "controlnet"
    restore_faces: bool
    unsharp: float
    humanize: float
    min_resolution: int
    # signals retained for logging / debugging a bad pick
    has_face: bool
    has_text: bool
    edge_density: float
    width: int
    height: int

    @property
    def reason(self) -> str:
        """One-line human-readable summary of the plan (logged per image)."""
        bits = ["face" if self.has_face else "no-face"]
        if self.has_text:
            bits.append("text")
        bits.append(f"edges={self.edge_density:.3f}")
        rf = ", face-restore on" if self.restore_faces else ""
        polish = f", unsharp {self.unsharp}/grain {self.humanize}" if (self.unsharp or self.humanize) else ""
        return f"{'+'.join(bits)} -> {self.pipeline} pipeline{rf}{polish}"


def _to_bgr(image: NDArray[Any]) -> NDArray[Any]:
    """Normalize a 2D grayscale or 4-channel BGRA array to 3-channel BGR."""
    import cv2

    if image.ndim == 2:
        return cv2.cvtColor(image, cv2.COLOR_GRAY2BGR)
    if image.shape[2] == 4:
        return cv2.cvtColor(image, cv2.COLOR_BGRA2BGR)
    return image


def _to_gray(image: NDArray[Any]) -> NDArray[Any]:
    """Single-channel grayscale; passes a 2D (already-gray) input through unchanged."""
    import cv2

    if image.ndim == 3 and image.shape[2] >= 3:
        return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return image


def _downscale_for_detection(image: NDArray[Any]) -> NDArray[Any]:
    """Shrink the long side to ``_DETECT_MAX_SIDE`` for cheap, bounded detection."""
    import cv2

    h, w = image.shape[:2]
    long_side = max(h, w)
    if long_side <= _DETECT_MAX_SIDE:
        return image
    scale = _DETECT_MAX_SIDE / long_side
    return cv2.resize(image, (max(1, round(w * scale)), max(1, round(h * scale))), interpolation=cv2.INTER_AREA)


def detect_face(image: NDArray[Any]) -> bool:
    """True if OpenCV YuNet finds at least one face. cv2-only, torch-free."""
    import cv2

    global _yunet
    img = _to_bgr(image)
    h, w = img.shape[:2]
    if h < 1 or w < 1:
        return False
    try:
        if _yunet is None:
            model = Path(__file__).parent / "assets" / _YUNET_ASSET
            _yunet = cv2.FaceDetectorYN.create(str(model), "", (w, h), _FACE_SCORE, 0.3, 5000)
        _yunet.setInputSize((w, h))
        _, faces = _yunet.detect(img)
    except cv2.error as e:  # malformed input / model
        logger.debug("YuNet face detect failed (%s); assuming no face", e)
        return False
    return faces is not None and len(faces) > 0


def detect_text(image: NDArray[Any]) -> bool:
    """Rough MSER-based text-presence heuristic (Phase-1 placeholder for DBNet)."""
    import cv2

    gray = _to_gray(image)
    h, w = gray.shape[:2]
    try:
        regions, _ = cv2.MSER_create().detectRegions(gray)
    except cv2.error:
        return False
    per_mp = len(regions) / max(1e-6, (h * w) / 1e6)
    return per_mp > _TEXT_MSER_PER_MP


def edge_density(image: NDArray[Any]) -> float:
    """Fraction of Canny edge pixels -- a cheap 'has structure' proxy in [0, 1]."""
    import cv2

    gray = _to_gray(image)
    edges = cv2.Canny(gray, 100, 200)
    return float((edges > 0).mean())


def plan(image_path: Path) -> AutoConfig | None:
    """Inspect the input image and return the quality modes, or None if unreadable.

    Pure analysis: loads the image, runs the cv2 detectors on a downscaled copy, and
    applies the quality-priority routing rules. Safe to call wherever the pipeline
    runs; no diffusion model is loaded.
    """
    from remove_ai_watermarks import image_io

    image = image_io.imread(image_path)
    if image is None:
        return None

    h, w = image.shape[:2]
    small = _downscale_for_detection(image)
    gray = _to_gray(small)  # convert once; the text/edge detectors pass a gray input through
    has_face = detect_face(small)  # YuNet needs the 3-channel image
    has_text = detect_text(gray)
    edges = edge_density(gray)

    structureless = (not has_face) and (not has_text) and edges < _STRUCTURELESS_EDGE_MAX
    pipeline = "default" if structureless else "controlnet"
    restore_faces = has_face
    smoothing = pipeline == "controlnet" or restore_faces

    cfg = AutoConfig(
        pipeline=pipeline,
        restore_faces=restore_faces,
        unsharp=_AUTO_UNSHARP if smoothing else 0.0,
        humanize=_AUTO_HUMANIZE if smoothing else 0.0,
        min_resolution=_UPSCALE_FLOOR,
        has_face=has_face,
        has_text=has_text,
        edge_density=edges,
        width=w,
        height=h,
    )
    logger.debug("auto plan for %s: %s", image_path, cfg.reason)
    return cfg
@@ -159,6 +159,48 @@ _unsharp_option = click.option(
    "--unsharp", type=float, default=0.0, help="Unsharp-mask sharpening strength (0 = off, typical: 0.3-0.8)."
)

_auto_option = click.option(
    "--auto",
    is_flag=True,
    default=False,
    help="Auto-pick quality modes (pipeline, face restore, sharpen/grain) from image content. "
    "Explicit flags override. EXPERIMENTAL.",
)


def _apply_auto(
    ctx: click.Context,
    source: Path,
    pipeline: str,
    restore_faces: bool,
    unsharp: float,
    humanize: float,
) -> tuple[str, bool, float, float]:
    """Resolve ``--auto``: plan modes from the image, overriding only the flags the
    user left at their default (an explicit flag always wins). Returns the resolved
    ``(pipeline, restore_faces, unsharp, humanize)`` and prints the chosen plan.
    """
    from remove_ai_watermarks import auto_config

    cfg = auto_config.plan(source)
    if cfg is None:
        console.print(" Auto: could not read image; using defaults")
        return pipeline, restore_faces, unsharp, humanize

    def _is_default(name: str) -> bool:
        return ctx.get_parameter_source(name) == click.core.ParameterSource.DEFAULT

    if _is_default("pipeline"):
        pipeline = cfg.pipeline
    if _is_default("restore_faces"):
        restore_faces = cfg.restore_faces
    if _is_default("unsharp"):
        unsharp = cfg.unsharp
    if _is_default("humanize"):
        humanize = cfg.humanize
    console.print(f" Auto: {cfg.reason}")
    return pipeline, restore_faces, unsharp, humanize


def _restore_faces_options(f: Any) -> Any:
    """Attach the shared GFPGAN face-restoration flags to an invisible-pipeline command."""
@@ -507,6 +549,7 @@ def cmd_erase(
@_restore_faces_options
@_min_resolution_option
@_unsharp_option
@_auto_option
@click.pass_context
def cmd_invisible(
    ctx: click.Context,
@@ -525,6 +568,7 @@ def cmd_invisible(
    controlnet_scale: float,
    restore_faces: bool,
    restore_faces_weight: float,
    auto: bool,
) -> None:
    """Remove invisible AI watermarks (SynthID, StableSignature, TreeRing).

@@ -542,6 +586,10 @@ def cmd_invisible(
    from remove_ai_watermarks.invisible_engine import InvisibleEngine

    source = _validate_image(source)
    if auto:
        pipeline, restore_faces, unsharp, humanize = _apply_auto(
            ctx, source, pipeline, restore_faces, unsharp, humanize
        )
    if output is None:
        output = source.with_stem(source.stem + "_clean")

@@ -758,6 +806,7 @@ def cmd_identify(ctx: click.Context, source: Path, no_visible: bool, as_json: bo
@_restore_faces_options
@_min_resolution_option
@_unsharp_option
@_auto_option
@click.pass_context
def cmd_all(
    ctx: click.Context,
@@ -779,6 +828,7 @@ def cmd_all(
    controlnet_scale: float,
    restore_faces: bool,
    restore_faces_weight: float,
    auto: bool,
) -> None:
    """Remove ALL watermarks: visible + invisible + metadata.

@@ -793,6 +843,10 @@ def cmd_all(

    _banner()
    source = _validate_image(source)
    if auto:
        pipeline, restore_faces, unsharp, humanize = _apply_auto(
            ctx, source, pipeline, restore_faces, unsharp, humanize
        )

    if output is None:
        output = source.with_stem(source.stem + "_clean")

@@ -0,0 +1,98 @@
"""Tests for the --auto pipeline planner (content-adaptive mode selection).

Detection runs on synthetic images; the face-present routing is exercised by
monkeypatching ``detect_face`` (a real detectable face fixture is private, never
committed). The planner is cv2-only and torch-free.
"""

from __future__ import annotations

import cv2
import numpy as np

from remove_ai_watermarks import auto_config, image_io


def _write(img, tmp_path, name="x.png"):
    p = tmp_path / name
    image_io.imwrite(p, img)
    return p


class TestDetectors:
    def test_detect_face_false_on_flat(self):
        flat = np.full((200, 200, 3), 128, dtype=np.uint8)
        assert auto_config.detect_face(flat) is False

    def test_edge_density_flat_near_zero(self):
        flat = np.full((200, 200, 3), 128, dtype=np.uint8)
        assert auto_config.edge_density(flat) < 0.001

    def test_edge_density_text_higher_than_blank(self):
        blank = np.full((200, 400, 3), 255, dtype=np.uint8)
        text = blank.copy()
        cv2.putText(text, "HELLO AI TEXT", (10, 120), cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 0, 0), 3)
        assert auto_config.edge_density(text) > auto_config.edge_density(blank)


class TestPlan:
    def test_unreadable_returns_none(self, tmp_path):
        assert auto_config.plan(tmp_path / "does_not_exist.png") is None

    def test_flat_image_is_default_pipeline_no_polish(self, tmp_path):
        flat = np.full((300, 300, 3), 128, dtype=np.uint8)
        cfg = auto_config.plan(_write(flat, tmp_path))
        assert cfg is not None
        assert cfg.pipeline == "default"  # structure-less -> plain SDXL
        assert cfg.restore_faces is False
        assert cfg.unsharp == 0.0  # no smoothing pass -> no polish
        assert cfg.humanize == 0.0
        assert cfg.min_resolution == 1024

    def test_text_image_uses_controlnet(self, tmp_path):
        img = np.full((300, 500, 3), 255, dtype=np.uint8)
        cv2.putText(img, "INVOICE TOTAL 1234", (10, 170), cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 0, 0), 4)
        cfg = auto_config.plan(_write(img, tmp_path))
        assert cfg is not None
        # Text creates edges above the structure-less floor -> controlnet preserves them.
        assert cfg.pipeline == "controlnet"

    def test_face_routes_to_restore_and_controlnet_and_polish(self, tmp_path, monkeypatch):
        monkeypatch.setattr(auto_config, "detect_face", lambda _img: True)
        flat = np.full((300, 300, 3), 128, dtype=np.uint8)
        cfg = auto_config.plan(_write(flat, tmp_path))
        assert cfg is not None
        assert cfg.has_face
        assert cfg.restore_faces
        assert cfg.pipeline == "controlnet"
        assert cfg.unsharp == 0.5  # smoothing pass ran -> polish on
        assert cfg.humanize == 2.0

    def test_text_signal_forces_controlnet_on_flat(self, tmp_path, monkeypatch):
        monkeypatch.setattr(auto_config, "detect_text", lambda _img: True)
        flat = np.full((300, 300, 3), 128, dtype=np.uint8)
        cfg = auto_config.plan(_write(flat, tmp_path))
        assert cfg is not None
        assert cfg.has_text
        assert cfg.pipeline == "controlnet"


class TestReason:
    def test_reason_summarizes_plan(self):
        cfg = auto_config.AutoConfig(
            pipeline="controlnet",
            restore_faces=True,
            unsharp=0.5,
            humanize=2.0,
            min_resolution=1024,
            has_face=True,
            has_text=False,
            edge_density=0.05,
            width=800,
            height=600,
        )
        r = cfg.reason
        assert "controlnet" in r
        assert "face" in r
        assert "face-restore on" in r
        assert "unsharp 0.5" in r
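The "explicit flag always wins" rule that `_apply_auto` applies through click's parameter source can be sketched without click at all. The following is a hedged, self-contained illustration: `Source` is a hypothetical stand-in for click's `ParameterSource`, and `resolve` is an invented helper name, not a function in this repository.

```python
from enum import Enum, auto
from typing import Any


class Source(Enum):
    """Hypothetical stand-in for click's ParameterSource (illustration only)."""

    COMMANDLINE = auto()
    DEFAULT = auto()


def resolve(values: dict[str, Any], sources: dict[str, Source], planned: dict[str, Any]) -> dict[str, Any]:
    """Adopt the planned value only for parameters the user left at their default."""
    resolved = dict(values)
    for name, planned_value in planned.items():
        if sources.get(name) is Source.DEFAULT:
            resolved[name] = planned_value
    return resolved


if __name__ == "__main__":
    # The user passed --unsharp 0.8 explicitly; everything else is at its default.
    values = {"pipeline": "default", "restore_faces": False, "unsharp": 0.8, "humanize": 0.0}
    sources = {
        "pipeline": Source.DEFAULT,
        "restore_faces": Source.DEFAULT,
        "unsharp": Source.COMMANDLINE,
        "humanize": Source.DEFAULT,
    }
    planned = {"pipeline": "controlnet", "restore_faces": True, "unsharp": 0.5, "humanize": 2.0}
    print(resolve(values, sources, planned))
```

Keying the decision on where a value came from, rather than on whether it equals the default, means a user who explicitly passes the default value (for example `--unsharp 0.0`) still keeps it.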