diff --git a/CLAUDE.md b/CLAUDE.md index 2e0e24b..ab64784 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -28,7 +28,7 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r - GPU/ML modules (invisible_engine, watermark_remover) are optional — guard imports with `is_available()` checks - Optional detection extras: `detect` (imwatermark — open SD/SDXL/FLUX watermark) and `trustmark` (Adobe TrustMark decoder; pulls torch + downloads weights). Both are guarded by `is_available()` and skipped by `identify` when absent. - Optional `restore` extra (gfpgan/facexlib/basicsr): the GFPGAN face-identity post-pass (`face_restore.py`, CLI `--restore-faces`, **EXPERIMENTAL, opt-in, OFF by default**). Guarded by `face_restore.is_available()`; when enabled it auto-skips with a debug log when the extra is absent or no face is detected. numpy<2-pinned and Python-3.12-pinned (see the `face_restore.py` Key-modules bullet). -- Tests for the *model-running* paths are limited to availability checks (multi-GB downloads). But the **pure helpers inside ML-adjacent modules are unit-tested without any download** and must stay that way: `_target_size` (native-vs-downscale-cap-vs-upscale-floor, `test_invisible_engine.py`), `humanizer.unsharp_mask` (`test_humanizer.py`), and the MPS->CPU fallback control flow via mocked pipelines (`test_img2img_runner.py`, 100% cover). Don't skip these as "ML, needs a model" — only `remove_watermark`/the diffusion bodies do. +- Tests for the *model-running* paths are limited to availability checks (multi-GB downloads). But the **pure helpers inside ML-adjacent modules are unit-tested without any download** and must stay that way: `_target_size` (native-vs-downscale-cap-vs-upscale-floor, `test_invisible_engine.py`), `humanizer.unsharp_mask`/`adaptive_polish` (`test_humanizer.py`), `auto_config.plan`/detectors (`test_auto_config.py`), and the MPS->CPU fallback control flow via mocked pipelines (`test_img2img_runner.py`, 100% cover). 
Don't skip these as "ML, needs a model" — only `remove_watermark`/the diffusion bodies do. ## Key modules @@ -45,7 +45,7 @@ You are a **principal Python engineer** maintaining a CLI tool and library for r - `trustmark_detector.py` — `detect_trustmark(path)` decodes the OPEN, keyless **Adobe TrustMark** watermark (the soft binding behind Adobe Durable Content Credentials, `alg` `com.adobe.trustmark.P`) via the optional `trustmark` package (extra `trustmark`; pulls torch, downloads model weights on first use). Mirrors `invisible_watermark.py` (lazy singleton guarded by a double-checked `threading.Lock` so concurrent callers do not double-download the weights, top-of-module pyright pragma, returns None when absent). It detects *provenance*, not AI origin as such (TrustMark also marks human-authored content), so `identify` lists it as a watermark without setting `is_ai_generated`. Other soft-binding vendors (Digimarc/Imatag/Steg.AI/...) have no public decoder — they are only *named* via the `C2PA_SOFT_BINDINGS` scan, not decoded. **False-positive gate (added 2026-05-29):** TrustMark's `wm_present` is a BCH error-correction validity flag that spuriously validates on a content-correlated fraction of un-watermarked images — AI-generated textures trip it far more than camera photos (verified 2026-05-29 on real files: it fires on Gemini/OpenAI/Doubao output that *cannot* carry Adobe's watermark, with a random-bytes decoded secret, while signal-free camera photos did not trip it). A genuine TrustMark is a *durable* soft binding engineered to survive re-encoding, so `detect_trustmark` re-decodes after a mild JPEG round-trip (`_survives_reencode`, `_REENCODE_QUALITY` 95) and requires the same schema both times; every observed false positive collapsed (none survived even q95), so the gate is the durability property the watermark guarantees. The second decode runs only on the rare initial hit, so the cost is negligible. 
Do NOT remove the gate to "catch more" — a lone TrustMark hit without it is almost always content noise. - `noai/watermark_remover.py` — the `WatermarkRemover` class has two diffusion pipelines, selected by the explicit `pipeline` ctor arg (NOT inferred from `model_id` -- both use the same SDXL base, `DEFAULT_MODEL_ID`). **`default`** runs plain SDXL img2img (`_run_img2img`). **`controlnet`** (**EXPERIMENTAL, opt-in**; `_run_controlnet`, `_load_controlnet_pipeline`) runs `StableDiffusionXLControlNetImg2ImgPipeline` with the SDXL-native canny ControlNet `xinsir/controlnet-canny-sdxl-1.0` (`watermark_profiles.CONTROLNET_CANNY_MODEL`): the control image is `cv2.Canny(gray, 100, 200)` stacked to 3 channels (`_CANNY_LOW`/`_CANNY_HIGH`, prompt `_CONTROLNET_PROMPT` / `_CONTROLNET_NEGATIVE`). **Removal still comes from the img2img regeneration (`strength`); the ControlNet only PRESERVES text and face STRUCTURE via the edge map -- no original pixels are copied or frozen, so SynthID does not survive.** Canny holds face STRUCTURE but NOT identity (the regenerated face drifts in likeness -- canny carries edges, not identity; face identity is preserved by the optional `--restore-faces` GFPGAN post-pass (EXPERIMENTAL, opt-in, OFF by default) -- see `face_restore.py`). `controlnet_conditioning_scale` (ctor arg, default 1.0) is the structure-preservation knob. Same dtype rule as `default` (fp32 on cpu/mps, fp16 only on cuda/xpu; the fp16-fixed SDXL VAE `_SDXL_FP16_VAE_ID` is swapped in on fp16 GPUs -- issue #29) and the same MPS->CPU fallback (reload on cpu/fp32, drop a non-cpu generator, retry once). - `face_restore.py` — optional GFPGAN face-restoration post-pass (cv2/torch/gfpgan boundary, top-of-file pyright pragma). 
**EXPERIMENTAL, opt-in, OFF by default.** Runs AFTER the diffusion removal pass (`InvisibleEngine.remove_watermark`, params `restore_faces=False` / `restore_faces_weight=0.5`; CLI `--restore-faces`/`--no-restore-faces` + `--restore-faces-weight` on `invisible`/`all`/`batch`). **Restores face IDENTITY while still scrubbing the pixel watermark:** GFPGAN re-synthesizes each face from a StyleGAN2 prior (codebook/GAN pixels, NOT the original), so the composited face regions carry no watermark and no pixel-copy -- oracle-validated clean at weight 0.5 with identity preserved. Flow: GFPGANer.enhance runs on the ORIGINAL (watermarked) image -> identity faces + RetinaFace boxes (`restorer.face_helper.det_faces`); `_composite_faces` feather-composites those restored face REGIONS into the diffusion-cleaned image. `is_available()` gates on gfpgan + facexlib; lazily-built `GFPGANer` singleton forces CPU unless CUDA (the pip GFPGANer has an MPS device-mismatch bug; it is a cheap post-pass on a few face crops). `_apply_basicsr_shim()` recreates the removed `torchvision.transforms.functional_tensor` module that basicsr imports. The pure `_composite_faces` helper (Gaussian-feathered rectangular alpha per box, `out = restored*a + base*(1-a)`) is unit-tested without the model (`tests/test_face_restore.py`); the model-running path is gated behind `is_available()`. **Commercial-safe** (GFPGAN Apache-2.0 + RetinaFace MIT); the CodeFormer alternative is NON-COMMERCIAL and is NOT shipped. The `restore` extra (gfpgan/facexlib/basicsr) is kept OUT of `all` (heavy + the GFPGANv1.4 + RetinaFace weights download on first use, never bundled). 
**`restore` pins numpy<2** (same trap class as the removed faceid/insightface extra): basicsr/gfpgan/facexlib are an old ecosystem, so the extra caps `scipy<1.18` (>=1.18 uses `np.long`, gone in numpy 1.24-1.26) and `numba<0.60` to keep the whole env on one numpy 1.26 resolution; verified the `--extra dev --extra gpu` gate env stays numpy 1.26.4 + `diffusers.loaders.peft` importable with `restore` present. **basicsr 1.4.2 builds only on Python <3.13** (its `setup.py get_version()` uses `exec(...)` + `locals()['__version__']`, which the 3.13 fast-locals change broke -> `KeyError: '__version__'`), so the project is pinned to Python 3.12 via `.python-version` and `[tool.uv.extra-build-dependencies] basicsr = ["setuptools<69"]`. basicsr ships sdist-only (no wheel). -- `auto_config.py` — the `--auto` quality-mode planner (EXPERIMENTAL). `plan(image_path) -> AutoConfig | None` inspects the INPUT image (before the diffusion model loads) and picks the pipeline modes, so the run adapts to content. **Designed to run as the FIRST step of the invisible/all pipeline, wherever that runs** — locally or the raiw.cc Modal GPU worker — **never on the 512 MB web host** (image work there OOM-crashes the container; the planner is `_apply_auto` in `cli.py` for the CLI, and raiw-app would call `plan()` inside `RaiwProtect.remove`). **Quality-priority routing:** ControlNet (text/face-structure preservation) is the default; it is skipped for `default` (plain SDXL) only on a clearly structure-less image (`not has_face and not has_text and edge_density < _STRUCTURELESS_EDGE_MAX` 0.008). `restore_faces` is on when a face is present. A mild polish (`_AUTO_UNSHARP` 0.5, `_AUTO_HUMANIZE` 2.0) is added only when a smoothing pass (controlnet/restore) ran. 
**Detection is cv2-only and torch-free** (~100 MB peak RSS, a few ms — measured): OpenCV **YuNet** (`cv2.FaceDetectorYN`, MIT, 232 KB model bundled at `assets/face_detection_yunet_2023mar.onnx`) for faces, a Canny edge-density + MSER region heuristic for text/structure (the text part is a rough Phase-1 placeholder — DBNet via `cv2.dnn` is the planned precision upgrade; it only ever ADDS controlnet so a miss is backstopped by edge-density and a false positive only costs a controlnet run), and `edge_density`. `min_resolution` stays 1024. **`_apply_auto` (cli.py)** overrides only the flags the user left at their click default (`ctx.get_parameter_source(...) == DEFAULT`) — an explicit `--pipeline`/`--restore-faces`/`--unsharp`/`--humanize` always wins — and prints the chosen plan (`AutoConfig.reason`). Wired into `cmd_all`/`cmd_invisible` (not `batch` yet — its engine is cached per-mode, auto needs a per-image pipeline). **Phase 1 adds ZERO new pip deps** (all cv2 core + the bundled MIT model); Real-ESRGAN-via-Spandrel upscaling (a new `esrgan` extra) and an adaptive Laplacian-driven polish are deferred to later phases. Unit-tested without the model where possible (`tests/test_auto_config.py`): flat/text synthetic images for routing, monkeypatched `detect_face`/`detect_text` for the face/text branches (a real detectable-face fixture is private, never committed). Production adoption path for raiw.cc: validate (must keep SynthID removed, not hallucinate micro-text, beat plain SDXL on the real upload distribution), then bump the library SHA in `modal_app.py` and pass `auto=True`. +- `auto_config.py` — the `--auto` quality-mode planner (EXPERIMENTAL). `plan(image_path) -> AutoConfig | None` inspects the INPUT image (before the diffusion model loads) and picks the pipeline modes, so the run adapts to content. 
**Designed to run as the FIRST step of the invisible/all pipeline, wherever that runs** — locally or the raiw.cc Modal GPU worker — **never on the 512 MB web host** (image work there OOM-crashes the container; the planner is `_apply_auto` in `cli.py` for the CLI, and raiw-app would call `plan()` inside `RaiwProtect.remove`). **Quality-priority routing:** ControlNet (text/face-structure preservation) is the default; it is skipped for `default` (plain SDXL) only on a clearly structure-less image (`not has_face and not has_text and edge_density < _STRUCTURELESS_EDGE_MAX` 0.008). `restore_faces` is on when a face is present. When a smoothing pass (controlnet/restore) ran, the **adaptive polish** (`humanizer.adaptive_polish`) is applied: it targets the input's Laplacian variance (detail level) with a capped unsharp + edge-masked grain, restoring photo/face texture while **sparing text** (text is already high-frequency, so the deficit is tiny and almost no polish lands -- the old fixed unsharp/grain speckled small text; validated 2026-06-03 on gemini_3 lap-var 84->334 toward the 592 original, openai_1 text near-untouched). **Detection is cv2-only and torch-free** (~100 MB peak RSS, a few ms — measured): OpenCV **YuNet** (`cv2.FaceDetectorYN`, MIT, 232 KB model bundled at `assets/face_detection_yunet_2023mar.onnx`) for faces, a Canny edge-density + MSER region heuristic for text/structure (the text part is a rough Phase-1 placeholder — DBNet via `cv2.dnn` is the planned precision upgrade; it only ever ADDS controlnet so a miss is backstopped by edge-density and a false positive only costs a controlnet run), and `edge_density`. `min_resolution` stays 1024. **Every auto decision is independently overridable** (interface principle): `_apply_auto` (cli.py) overrides only the three content-adaptive modes the user left at their click default (`ctx.get_parameter_source(...) 
== DEFAULT`) — `--pipeline`, `--restore-faces`/`--no-restore-faces`, and **`--adaptive-polish`/`--no-adaptive-polish`** always win; `--min-resolution`/`--strength`/`--unsharp`/`--humanize` are independent knobs. `--adaptive-polish` also works WITHOUT `--auto` (manual detail-targeted polish; the engine's `adaptive_polish` param uses the full-res original as the detail reference). Prints the chosen plan (`AutoConfig.reason`). Wired into `cmd_all`/`cmd_invisible` (not `batch` yet — its engine is cached per-mode, auto needs a per-image pipeline). **Adds ZERO new pip deps** (all cv2 core + the bundled MIT model + the cv2-only adaptive polish). Still deferred: Real-ESRGAN-via-Spandrel upscaling (a new `esrgan` extra) and a DBNet text detector (replacing the MSER heuristic). Unit-tested without the model where possible (`tests/test_auto_config.py`): flat/text synthetic images for routing, monkeypatched `detect_face`/`detect_text` for the face/text branches (a real detectable-face fixture is private, never committed). Production adoption path for raiw.cc: validate (must keep SynthID removed, not hallucinate micro-text, beat plain SDXL on the real upload distribution), then bump the library SHA in `modal_app.py` and pass `auto=True`. - `image_io.py` — Unicode-safe cv2 IO (issue #17). `imread(path, flags=None)` / `imwrite(path, img)` wrap `np.fromfile`+`cv2.imdecode` / `cv2.imencode`+`tofile` so non-ASCII paths work on Windows -- bare `cv2.imread`/`cv2.imwrite` use the platform ANSI code-page API there and fail (empty decode + `can't open/read file`) on Chinese/Cyrillic/accented filenames. `imread` keeps `cv2.imread` semantics (defaults to `IMREAD_COLOR`, returns `None` on missing/empty/undecodable). **Every cv2 file read/write in the package routes through here; do not call `cv2.imread`/`cv2.imwrite` directly.** `imwrite` returns `False` on an unwritable path (`OSError` caught) instead of raising, matching `cv2.imwrite` semantics. 
macOS/Linux already accept UTF-8 paths, so it is behavior-neutral there (the bug only reproduces on Windows). cv2/numpy are imported lazily inside the functions, so the module is cheap to import in a bare env. ### Doubao clean-reverse-alpha distillation (re-investigated 2026-05-29) diff --git a/README.md b/README.md index 8a94569..1d45978 100644 --- a/README.md +++ b/README.md @@ -284,9 +284,11 @@ remove-ai-watermarks invisible image.png -o clean.png --humanize 4.0 --unsharp 0 # GPU/MPS, cap the long side: --max-resolution 2048 # Strength is vendor-adaptive by default (OpenAI 0.10 / Google 0.15); override # with --strength. To preserve text/face structure, use --pipeline controlnet -# Or let it choose: --auto picks the pipeline, face restore, and polish from the -# image content (controlnet when there is text/structure, face restore when a face -# is present). Explicit flags override it. Experimental. +# Or let it choose: --auto picks the pipeline, face restore, and an adaptive polish +# from the image content (controlnet when there is text/structure, face restore when +# a face is present, polish that restores the input's detail level while sparing +# text). Every choice is overridable: --pipeline, --no-restore-faces, +# --no-adaptive-polish all win over the auto pick. Experimental. # (SDXL + canny ControlNet); tune preservation with --controlnet-scale. Add # Check / strip AI metadata (C2PA, EXIF, "Made with AI" labels) diff --git a/src/remove_ai_watermarks/auto_config.py b/src/remove_ai_watermarks/auto_config.py index 8516d3b..4e29975 100644 --- a/src/remove_ai_watermarks/auto_config.py +++ b/src/remove_ai_watermarks/auto_config.py @@ -9,16 +9,18 @@ host (image work there OOM-crashes the container). Routing is **quality-priority**: ControlNet (text/face-structure preservation) is the default; it is only skipped for a clearly structure-less image (no face, no text, near-zero edges), where plain SDXL is cheaper and just as good. 
GFPGAN face -restoration is enabled when a face is present. A mild sharpen + grain polish is added -when a smoothing pass (controlnet or face restore) ran, to counter the over-smoothed -"AI look". +restoration is enabled when a face is present. When a smoothing pass (controlnet or +face restore) ran, the **adaptive polish** (``humanizer.adaptive_polish``) restores +the input's detail level -- a capped unsharp + edge-masked grain targeting the input's +Laplacian variance -- to counter the over-smoothed "AI look". It is self-limiting on +text/graphics (already high-frequency, so almost no polish) and spares text/edges by +masking the grain. Detection is **cv2-only and torch-free**: OpenCV YuNet (``cv2.FaceDetectorYN``) for faces -- a 232 KB MIT-licensed model bundled in ``assets/`` -- plus a Canny edge-density + MSER region heuristic for text/structure. The whole planner peaks ~100 MB RSS in a few ms, so it adds nothing meaningful to a GPU run and runs anywhere -the pipeline runs. (Phase 1 applies a fixed mild polish; an adaptive Laplacian-variance -polish that measures the OUTPUT is a later phase.) +the pipeline runs. The text heuristic is a deliberately rough Phase-1 placeholder (DBNet via cv2.dnn is the planned precision upgrade); it only ever ADDS controlnet, so a miss is backstopped @@ -54,10 +56,9 @@ _FACE_SCORE = 0.6 # YuNet confidence for a face to count # ~10px, and this bounds YuNet/MSER cost on huge inputs). Removal runs at full res. _DETECT_MAX_SIDE = 1024 -# Auto polish applied only when a smoothing pass ran (controlnet or face restore), -# to counter the soft "AI look". Conservative defaults; the user can override. -_AUTO_UNSHARP = 0.5 -_AUTO_HUMANIZE = 2.0 +# When a smoothing pass ran (controlnet or face restore), the adaptive polish +# (humanizer.adaptive_polish) restores the input's detail level, sparing text -- +# replacing the old fixed unsharp/grain which over-/under-corrected and speckled text. 
_UPSCALE_FLOOR = 1024 _YUNET_ASSET = "face_detection_yunet_2023mar.onnx" # MIT (Shiqi Yu), OpenCV Zoo @@ -70,7 +71,8 @@ class AutoConfig: pipeline: str # "default" | "controlnet" restore_faces: bool - unsharp: float + adaptive_polish: bool # restore the input's detail level (sharpen + masked grain), sparing text + unsharp: float # fixed-polish knobs, 0 in auto (the adaptive polish replaces them) humanize: float min_resolution: int # signals retained for logging / debugging a bad pick @@ -88,7 +90,12 @@ class AutoConfig: bits.append("text") bits.append(f"edges={self.edge_density:.3f}") rf = ", face-restore on" if self.restore_faces else "" - polish = f", unsharp {self.unsharp}/grain {self.humanize}" if (self.unsharp or self.humanize) else "" + if self.adaptive_polish: + polish = ", adaptive polish" + elif self.unsharp or self.humanize: + polish = f", unsharp {self.unsharp}/grain {self.humanize}" + else: + polish = "" return f"{'+'.join(bits)} -> {self.pipeline} pipeline{rf}{polish}" @@ -196,8 +203,9 @@ def plan(image_path: Path) -> AutoConfig | None: cfg = AutoConfig( pipeline=pipeline, restore_faces=restore_faces, - unsharp=_AUTO_UNSHARP if smoothing else 0.0, - humanize=_AUTO_HUMANIZE if smoothing else 0.0, + adaptive_polish=smoothing, # adaptive (detail-targeted) polish when a smoothing pass ran + unsharp=0.0, + humanize=0.0, min_resolution=_UPSCALE_FLOOR, has_face=has_face, has_text=has_text, diff --git a/src/remove_ai_watermarks/cli.py b/src/remove_ai_watermarks/cli.py index 21aec82..5aad0be 100644 --- a/src/remove_ai_watermarks/cli.py +++ b/src/remove_ai_watermarks/cli.py @@ -163,8 +163,18 @@ _auto_option = click.option( "--auto", is_flag=True, default=False, - help="Auto-pick quality modes (pipeline, face restore, sharpen/grain) from image content. " - "Explicit flags override. EXPERIMENTAL.", + help="Auto-pick the pipeline, face restore, and adaptive polish from image content. 
" + "Every choice is overridable -- an explicit --pipeline / --restore-faces / --adaptive-polish " + "always wins. EXPERIMENTAL.", +) + +_adaptive_polish_option = click.option( + "--adaptive-polish/--no-adaptive-polish", + default=False, + help="Restore the input's detail level after removal (capped unsharp + edge-masked grain " + "targeting the input's sharpness, sparing text). On by default under --auto; pass " + "--no-adaptive-polish to disable it there, or --adaptive-polish to use it without --auto. " + "Independent of the fixed --unsharp/--humanize. EXPERIMENTAL.", ) @@ -173,19 +183,19 @@ def _apply_auto( source: Path, pipeline: str, restore_faces: bool, - unsharp: float, - humanize: float, -) -> tuple[str, bool, float, float]: - """Resolve ``--auto``: plan modes from the image, overriding only the flags the - user left at their default (an explicit flag always wins). Returns the resolved - ``(pipeline, restore_faces, unsharp, humanize)`` and prints the chosen plan. + adaptive_polish: bool, +) -> tuple[str, bool, bool]: + """Resolve ``--auto``: plan the three content-adaptive modes (pipeline, face + restore, adaptive polish) from the image, overriding only the ones the user left + at their default (an explicit flag always wins). The fixed ``--unsharp``/ + ``--humanize`` filters are independent and untouched. Prints the chosen plan. 
""" from remove_ai_watermarks import auto_config cfg = auto_config.plan(source) if cfg is None: console.print(" Auto: could not read image; using defaults") - return pipeline, restore_faces, unsharp, humanize + return pipeline, restore_faces, adaptive_polish def _is_default(name: str) -> bool: return ctx.get_parameter_source(name) == click.core.ParameterSource.DEFAULT @@ -194,12 +204,10 @@ def _apply_auto( pipeline = cfg.pipeline if _is_default("restore_faces"): restore_faces = cfg.restore_faces - if _is_default("unsharp"): - unsharp = cfg.unsharp - if _is_default("humanize"): - humanize = cfg.humanize + if _is_default("adaptive_polish"): + adaptive_polish = cfg.adaptive_polish console.print(f" Auto: {cfg.reason}") - return pipeline, restore_faces, unsharp, humanize + return pipeline, restore_faces, adaptive_polish def _restore_faces_options(f: Any) -> Any: @@ -550,6 +558,7 @@ def cmd_erase( @_min_resolution_option @_unsharp_option @_auto_option +@_adaptive_polish_option @click.pass_context def cmd_invisible( ctx: click.Context, @@ -569,6 +578,7 @@ def cmd_invisible( restore_faces: bool, restore_faces_weight: float, auto: bool, + adaptive_polish: bool, ) -> None: """Remove invisible AI watermarks (SynthID, StableSignature, TreeRing). 
@@ -587,9 +597,7 @@ def cmd_invisible( source = _validate_image(source) if auto: - pipeline, restore_faces, unsharp, humanize = _apply_auto( - ctx, source, pipeline, restore_faces, unsharp, humanize - ) + pipeline, restore_faces, adaptive_polish = _apply_auto(ctx, source, pipeline, restore_faces, adaptive_polish) if output is None: output = source.with_stem(source.stem + "_clean") @@ -623,6 +631,7 @@ def cmd_invisible( seed=seed, humanize=humanize, unsharp=unsharp, + adaptive_polish=adaptive_polish, max_resolution=max_resolution, min_resolution=min_resolution, vendor=vendor, @@ -807,6 +816,7 @@ def cmd_identify(ctx: click.Context, source: Path, no_visible: bool, as_json: bo @_min_resolution_option @_unsharp_option @_auto_option +@_adaptive_polish_option @click.pass_context def cmd_all( ctx: click.Context, @@ -829,6 +839,7 @@ def cmd_all( restore_faces: bool, restore_faces_weight: float, auto: bool, + adaptive_polish: bool, ) -> None: """Remove ALL watermarks: visible + invisible + metadata. 
@@ -844,9 +855,7 @@ def cmd_all( _banner() source = _validate_image(source) if auto: - pipeline, restore_faces, unsharp, humanize = _apply_auto( - ctx, source, pipeline, restore_faces, unsharp, humanize - ) + pipeline, restore_faces, adaptive_polish = _apply_auto(ctx, source, pipeline, restore_faces, adaptive_polish) if output is None: output = source.with_stem(source.stem + "_clean") @@ -929,6 +938,7 @@ def cmd_all( seed=seed, humanize=humanize, unsharp=unsharp, + adaptive_polish=adaptive_polish, max_resolution=max_resolution, min_resolution=min_resolution, vendor=vendor, diff --git a/src/remove_ai_watermarks/humanizer.py b/src/remove_ai_watermarks/humanizer.py index 2298fa7..2d653aa 100644 --- a/src/remove_ai_watermarks/humanizer.py +++ b/src/remove_ai_watermarks/humanizer.py @@ -82,3 +82,87 @@ def unsharp_mask(image: NDArray, amount: float = 0.5, sigma: float = 1.0) -> NDA blurred = cv2.GaussianBlur(img_f, (0, 0), sigmaX=sigma, sigmaY=sigma) sharpened = cv2.addWeighted(img_f, 1.0 + amount, blurred, -amount, 0.0) return np.clip(sharpened, 0, 255).astype(np.uint8) + + +# ── Adaptive polish (target the input's detail level; spare text) ────────────── +# A capped unsharp scaled to the sharpness deficit, then edge-masked grain to close +# the rest -- tunable constants. Validated 2026-06-03 on the spaces corpus: a soft +# gemini_3 face/photo (lap-var 84 vs the 592 of its original) is pulled up to ~327 +# with full polish, while a sharp openai_1 text card (1175 vs 1644) gets near-zero +# (the deficit is tiny) so text is left alone -- the polish self-limits on text. 
+_ADAPTIVE_MAX_UNSHARP = 1.0 +_ADAPTIVE_UNSHARP_GAIN = 0.4 # unsharp amount per unit of (deficit - 1), before the cap +_ADAPTIVE_MAX_GRAIN = 8.0 +_MASK_EDGE_PERCENTILE = 85.0 # local-energy percentile above which a pixel is an "edge/text" +_MASK_EDGE_DILATE = 5 # grow the edge mask so grain is suppressed in a margin around text +_MASK_GAMMA = 2.0 # push the smooth weight toward 0 except in genuinely flat areas + + +def _to_gray(image: NDArray) -> NDArray: + """Single-channel grayscale; passes a 2D (already-gray) input through unchanged.""" + return image if image.ndim == 2 else cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) + + +def _laplacian_variance(image: NDArray) -> float: + """Variance of the Laplacian -- a cheap proxy for high-frequency detail/sharpness.""" + return float(cv2.Laplacian(_to_gray(image), cv2.CV_64F).var()) + + +def _smooth_grain_mask(image: NDArray) -> NDArray: + """Per-pixel weight ~1 in flat/smooth regions, ~0 over text and hard edges. + + Grain in smooth ("AI-plastic") regions reads as natural sensor noise; grain over + text/edges just speckles them, so this masks grain to the smooth regions only. + """ + energy = cv2.GaussianBlur(np.abs(cv2.Laplacian(_to_gray(image).astype(np.float32), cv2.CV_32F)), (0, 0), sigmaX=2.0) + thr = float(np.percentile(energy, _MASK_EDGE_PERCENTILE)) + kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (_MASK_EDGE_DILATE, _MASK_EDGE_DILATE)) + edges = cv2.dilate((energy > thr).astype(np.uint8), kernel) + mask = np.clip(1.0 - energy / (thr + 1e-6), 0.0, 1.0) ** _MASK_GAMMA + mask[edges > 0] = 0.0 + return cv2.GaussianBlur(mask, (0, 0), sigmaX=1.5) + + +def adaptive_polish(image: NDArray, reference: NDArray, seed: int | None = None) -> NDArray: + """Restore the detail level of ``reference`` in a softened ``image``, sparing text. + + Diffusion + face restoration leave an over-smoothed "AI-plastic" look, worst on + photo/face regions. 
This targets the reference's Laplacian variance (the input's + detail level): a capped unsharp scaled to the deficit, then edge-masked grain + (smooth regions only) calibrated to close the remaining gap. **Self-limiting on + text/graphics** -- they are already high-frequency, so the deficit is small and + almost no polish is applied (text legibility is a generation-side concern, not a + filter one). No-op when the image already meets the reference's detail level. + + Args: + image: the cleaned BGR output (uint8). + reference: the original input BGR at the same resolution (the detail target). + seed: optional RNG seed for reproducible grain. + + Returns: + Polished BGR image (uint8). + """ + target = _laplacian_variance(reference) + current = _laplacian_variance(image) + if target <= 0.0 or current >= target: + return image.copy() + + deficit = target / max(current, 1.0) + amount = min(_ADAPTIVE_MAX_UNSHARP, _ADAPTIVE_UNSHARP_GAIN * (deficit - 1.0)) + work = unsharp_mask(image, amount=amount, sigma=1.2) if amount > 0.0 else image.copy() + if _laplacian_variance(work) >= target: + return work + + # Calibrate the grain sigma by a short search: its lap-var contribution depends on + # the per-pixel mask (no closed form), so step it up until the target is met. A few + # full-image Laplacians here are negligible against the diffusion pass that precedes. 
+ mask = _smooth_grain_mask(work) + noise = np.random.default_rng(seed).normal(0.0, 1.0, work.shape[:2]).astype(np.float32) * mask + best = work + sigma = 2.0 + while sigma <= _ADAPTIVE_MAX_GRAIN: + best = np.clip(work.astype(np.float32) + (noise * sigma)[:, :, np.newaxis], 0.0, 255.0).astype(np.uint8) + if _laplacian_variance(best) >= target: + break + sigma += 1.0 + return best diff --git a/src/remove_ai_watermarks/invisible_engine.py b/src/remove_ai_watermarks/invisible_engine.py index 8f8d068..f3a3b5d 100644 --- a/src/remove_ai_watermarks/invisible_engine.py +++ b/src/remove_ai_watermarks/invisible_engine.py @@ -141,6 +141,7 @@ class InvisibleEngine: restore_faces: bool = False, restore_faces_weight: float = 0.5, unsharp: float = 0.0, + adaptive_polish: bool = False, ) -> Path: """Remove invisible watermark from an image. @@ -163,6 +164,12 @@ class InvisibleEngine: Applied last (after face restoration) to counter the soft, over-smoothed look of the diffusion/GFPGAN passes; ~0.5-0.8 is a safe range, higher risks edge halos. + adaptive_polish: When True (the --auto mode default), restore the input's + detail level in the softened output instead of fixed unsharp/humanize: + a capped unsharp + edge-masked grain targeting the input's Laplacian + variance (self-limiting on text/graphics). Runs LAST, after face + restoration. The fixed ``humanize``/``unsharp`` knobs are normally 0 + when this is on. max_resolution: Cap the long side (px) before diffusion. 0 (default) = no cap. Set a positive value only to bound GPU/MPS memory on very large inputs (it reintroduces a lossy downscale->upscale @@ -189,6 +196,9 @@ class InvisibleEngine: image = Image.open(image_path) image = ImageOps.exif_transpose(image) orig_size = image.size # (width, height) + # Full-res original, kept for the adaptive-polish detail target (image is + # reassigned to the resized copy below; PIL resize returns a new object). 
+ reference_pil = image target = _target_size(image.width, image.height, max_resolution, min_resolution) if target is not None: @@ -287,6 +297,23 @@ class InvisibleEngine: self._progress_callback(f"Sharpening (unsharp mask: {unsharp})...") image_io.imwrite(out_path, unsharp_mask(out_cv, amount=unsharp)) + # Adaptive polish (--auto): restore the input's detail level in the softened + # output, sparing text/edges. Replaces the fixed unsharp/humanize knobs. + if adaptive_polish: + import cv2 + import numpy as np + + from remove_ai_watermarks import humanizer, image_io + + out_cv = image_io.imread(out_path, cv2.IMREAD_COLOR) + if out_cv is not None: + ref = cv2.cvtColor(np.array(reference_pil.convert("RGB")), cv2.COLOR_RGB2BGR) + if (ref.shape[1], ref.shape[0]) != (out_cv.shape[1], out_cv.shape[0]): + ref = cv2.resize(ref, (out_cv.shape[1], out_cv.shape[0]), interpolation=cv2.INTER_LANCZOS4) + if self._progress_callback: + self._progress_callback("Adaptive polish (sharpen + grain to the input's detail level)...") + image_io.imwrite(out_path, humanizer.adaptive_polish(out_cv, ref, seed=seed)) + return out_path finally: # _tmp_path is always set above (we persist the image unconditionally). 
diff --git a/tests/test_auto_config.py b/tests/test_auto_config.py index 838d9ee..3dadc10 100644 --- a/tests/test_auto_config.py +++ b/tests/test_auto_config.py @@ -45,7 +45,8 @@ class TestPlan: assert cfg is not None assert cfg.pipeline == "default" # structure-less -> plain SDXL assert cfg.restore_faces is False - assert cfg.unsharp == 0.0 # no smoothing pass -> no polish + assert cfg.adaptive_polish is False # no smoothing pass -> no polish + assert cfg.unsharp == 0.0 assert cfg.humanize == 0.0 assert cfg.min_resolution == 1024 @@ -65,8 +66,9 @@ class TestPlan: assert cfg.has_face assert cfg.restore_faces assert cfg.pipeline == "controlnet" - assert cfg.unsharp == 0.5 # smoothing pass ran -> polish on - assert cfg.humanize == 2.0 + assert cfg.adaptive_polish # smoothing pass ran -> adaptive polish on + assert cfg.unsharp == 0.0 # fixed knobs off; the adaptive polish replaces them + assert cfg.humanize == 0.0 def test_text_signal_forces_controlnet_on_flat(self, tmp_path, monkeypatch): monkeypatch.setattr(auto_config, "detect_text", lambda _img: True) @@ -82,8 +84,9 @@ class TestReason: cfg = auto_config.AutoConfig( pipeline="controlnet", restore_faces=True, - unsharp=0.5, - humanize=2.0, + adaptive_polish=True, + unsharp=0.0, + humanize=0.0, min_resolution=1024, has_face=True, has_text=False, @@ -95,4 +98,4 @@ class TestReason: assert "controlnet" in r assert "face" in r assert "face-restore on" in r - assert "unsharp 0.5" in r + assert "adaptive polish" in r diff --git a/tests/test_humanizer.py b/tests/test_humanizer.py index 9f52545..bed26ac 100644 --- a/tests/test_humanizer.py +++ b/tests/test_humanizer.py @@ -102,3 +102,48 @@ def test_unsharp_flat_image_is_a_noop(): img = np.full((30, 30, 3), 128, dtype=np.uint8) result = unsharp_mask(img, amount=0.8, sigma=1.0) assert np.array_equal(result, img) + + +class TestAdaptivePolish: + """Adaptive polish: target the reference's detail level, sparing text/edges.""" + + def test_noop_when_already_sharp(self): + from 
remove_ai_watermarks.humanizer import adaptive_polish + + rng = np.random.default_rng(1) + sharp = rng.integers(0, 256, (120, 120, 3), dtype=np.uint8) # high detail + soft_ref = np.full((120, 120, 3), 128, dtype=np.uint8) # flat -> low target + out = adaptive_polish(sharp, soft_ref) + assert np.array_equal(out, sharp) # current >= target -> unchanged copy + + def test_sharpens_a_soft_image_toward_reference(self): + import cv2 + + from remove_ai_watermarks.humanizer import _laplacian_variance, adaptive_polish + + rng = np.random.default_rng(2) + reference = rng.integers(0, 256, (160, 160, 3), dtype=np.uint8) # very high detail + soft = cv2.GaussianBlur(reference, (0, 0), sigmaX=4.0) # blurred -> low detail + out = adaptive_polish(soft, reference, seed=0) + assert _laplacian_variance(out) > _laplacian_variance(soft) # moved toward the target + + def test_mask_spares_edges(self): + from remove_ai_watermarks.humanizer import _smooth_grain_mask + + img = np.full((100, 100, 3), 128, dtype=np.uint8) + img[:, 50:] = 30 # a hard vertical edge down the middle + mask = _smooth_grain_mask(img) + # Flat far-left region keeps grain; the column at the edge is suppressed. + assert mask[:, :15].mean() > mask[:, 45:55].mean() + + def test_deterministic_with_seed(self): + import cv2 + + from remove_ai_watermarks.humanizer import adaptive_polish + + rng = np.random.default_rng(3) + reference = rng.integers(0, 256, (140, 140, 3), dtype=np.uint8) + soft = cv2.GaussianBlur(reference, (0, 0), sigmaX=3.0) + a = adaptive_polish(soft, reference, seed=7) + b = adaptive_polish(soft, reference, seed=7) + assert np.array_equal(a, b)
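
Review note: the self-limiting behavior claimed for `adaptive_polish` follows directly from the two constants in the `humanizer.py` hunk. The sketch below reproduces only that arithmetic in plain Python (no cv2/numpy), using the Laplacian-variance figures quoted in the diff's own comment. The function name `unsharp_amount` and the un-prefixed constant names are hypothetical and are not part of the patch. The grain search that follows the unsharp step is not modeled here.

```python
# Illustrative sketch only: mirrors the unsharp-amount arithmetic of
# humanizer.adaptive_polish from the diff above. `unsharp_amount` is a
# hypothetical helper name, not a function in the patch.

ADAPTIVE_MAX_UNSHARP = 1.0   # same value as _ADAPTIVE_MAX_UNSHARP in the diff
ADAPTIVE_UNSHARP_GAIN = 0.4  # same value as _ADAPTIVE_UNSHARP_GAIN in the diff


def unsharp_amount(current_lap_var: float, target_lap_var: float) -> float:
    """Return the unsharp amount for a (current, target) Laplacian-variance pair.

    0.0 means the unsharp step is skipped.
    """
    if target_lap_var <= 0.0 or current_lap_var >= target_lap_var:
        return 0.0  # output already meets the reference's detail level
    # "deficit" is a ratio: how many times sharper the reference is.
    deficit = target_lap_var / max(current_lap_var, 1.0)
    amount = min(ADAPTIVE_MAX_UNSHARP, ADAPTIVE_UNSHARP_GAIN * (deficit - 1.0))
    # The patch only sharpens when amount > 0.0; clamp to express that here.
    return max(0.0, amount)


if __name__ == "__main__":
    # Soft face/photo case from the diff comment: lap-var 84 vs 592 original.
    print(unsharp_amount(84.0, 592.0))
    # Sharp text-card case from the diff comment: lap-var 1175 vs 1644 original.
    print(unsharp_amount(1175.0, 1644.0))
```

With these inputs the soft photo hits the 1.0 cap (0.4 × (592/84 − 1) is well above it), while the text card gets roughly 0.16, which is consistent with the "near-zero on text" description in the hunk comment.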