feat(invisible): skip the diffusion scrub when no invisible watermark is detectable (P0#5)

Regenerating pixels removes SynthID / open watermarks but degrades a real
photo, so running it on a clean image is the dominant paid score-0 cause on
no-watermark uploads. Gate invisible/all/batch on identify.has_invisible_target:
when no invisible AI signal is locally detectable and --force is unset, skip the
regeneration. Per-command semantics:
  - invisible: write no output, exit EXIT_NO_INVISIBLE_SIGNAL (2)
  - all: skip step 2 but keep visible-removed pixels + strip metadata, exit 0
  - batch: skip the scrub; copy the input through in invisible mode
A skip never claims the image is clean (a pixel SynthID is undetectable once its
metadata proxy is gone); the message says so and routes to --force. The gate
fails safe (a detector error runs the removal).

has_invisible_target wraps identify(check_visible=False, check_invisible=True)
and returns the new ProvenanceReport.ai_from_metadata field (the confidence==high
union), so the raiw.cc worker can reuse the same gate. Gate placed before engine
construction so the skip path is cheap; shared via cli._should_skip_invisible_scrub.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Victor Kuznetsov
2026-06-22 11:36:54 -07:00
parent 5a612adfef
commit 19f9ab0947
8 changed files with 290 additions and 17 deletions
+3 -3
@@ -17,14 +17,14 @@ Consequences for contributors (do not drift back into the stock niche just becau
## How to run
- `uv run remove-ai-watermarks all <image.png> -o <output.png>` — full pipeline (visible + invisible + metadata). Same diffusion knobs as `invisible` below, plus the visible-pass `--inpaint/--no-inpaint`/`--inpaint-method`. **When the `[gpu]` extra is absent, step 2 (invisible/SynthID) is skipped**`all` still writes an output (visible mark + metadata stripped) but prints a prominent end-of-run banner ("the invisible (SynthID) watermark was NOT removed") AND exits **non-zero** (1), so a skipped SynthID pass is not mistaken for a clean result (the recurring #14/#47 trap, where the old quiet inline warning was missed). `invisible` already hard-errors without the extra; only `all` continued, hence the loud end-banner. Regression-guarded by `tests/test_cli.py::TestAllCommand::test_all_loud_warning_and_nonzero_exit_when_gpu_missing`. **Test trap:** any `all` test that exercises the full pipeline MUST `patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True)` — CI installs core+dev only (no `[gpu]`), so an unpatched `all` test takes the skip branch and now hits the non-zero exit. This passed locally (gpu present → `is_available()` True) but red-failed every matrix cell on the v0.11.0 commit (`test_all_basic`/`test_all_visible_step_uses_registry` asserted exit 0); both now patch `is_available` True.
- `uv run remove-ai-watermarks invisible <image.png> -o <out.png>` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet|qwen` (default `controlnet`; `qwen` is a manual opt-in only — see the qwen note in the module map), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default).
- `uv run remove-ai-watermarks all <image.png> -o <output.png>` — full pipeline (visible + invisible + metadata). Same diffusion knobs as `invisible` below, plus the visible-pass `--inpaint/--no-inpaint`/`--inpaint-method`. **When the `[gpu]` extra is absent, step 2 (invisible/SynthID) is skipped**`all` still writes an output (visible mark + metadata stripped) but prints a prominent end-of-run banner ("the invisible (SynthID) watermark was NOT removed") AND exits **non-zero** (1), so a skipped SynthID pass is not mistaken for a clean result (the recurring #14/#47 trap, where the old quiet inline warning was missed). `invisible` already hard-errors without the extra; only `all` continued, hence the loud end-banner. Regression-guarded by `tests/test_cli.py::TestAllCommand::test_all_loud_warning_and_nonzero_exit_when_gpu_missing`. **No-signal skip (P0#5):** step 2 also runs the same `has_invisible_target` gate (see `invisible` below) — when no invisible watermark is detectable and `--force` is not set, step 2 is skipped and the pixels are left intact, but unlike the GPU-missing skip this is a **SUCCESS (exit 0)**: the visible pass + metadata strip still ran and a file is written (the message says so without claiming the image is clean). Distinct exit semantics by design: GPU-missing = couldn't do the work (non-zero); no-signal = nothing to do (zero). Regression-guarded by `test_all_skips_invisible_on_no_signal_but_succeeds`. **Test trap:** any `all` test that exercises the full pipeline MUST `patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True)` — CI installs core+dev only (no `[gpu]`), so an unpatched `all` test takes the skip branch and now hits the non-zero exit. This passed locally (gpu present → `is_available()` True) but red-failed every matrix cell on the v0.11.0 commit (`test_all_basic`/`test_all_visible_step_uses_registry` asserted exit 0); both now patch `is_available` True.
- `uv run remove-ai-watermarks invisible <image.png> -o <out.png>` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet|qwen` (default `controlnet`; `qwen` is a manual opt-in only — see the qwen note in the module map), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default). **No-signal skip (P0#5, roadmap):** before the diffusion runs, the command checks `identify.has_invisible_target(source)` (the `ProvenanceReport.ai_from_metadata` union: C2PA AI-issuer / SynthID proxy, IPTC, AIGC, local gen params, EXIF/xAI, open DWT-DCT / TrustMark — visible marks do NOT count, they are a separate pass). When nothing is locally detectable it does NOT regenerate (that would only degrade a clean image — the dominant paid score-0 cause on no-watermark uploads): it writes NO output, prints guidance that does NOT claim the image is clean (a pixel SynthID is undetectable once its metadata proxy is gone), and exits **`EXIT_NO_INVISIBLE_SIGNAL` (2)** — same value/role as the visible `EXIT_NO_VISIBLE_MARK`. 
`--force/--no-force` (**default skip = ON**) runs the scrub regardless. The check fails SAFE (a detector exception → run, since leaving a watermark on a paid removal is worse than over-regenerating). Helpers `cli._no_invisible_signal_exit` + `identify.has_invisible_target`; regression-guarded by `tests/test_cli.py::TestInvisibleCommand::{test_invisible_no_signal_skips_and_exits_two,test_invisible_force_runs_scrub_on_no_signal,test_invisible_runs_without_force_when_signal_present}` and `tests/test_identify.py::TestHasInvisibleTargetFailSafe`. **Test trap:** any `invisible`/`all`/`batch` test that exercises the diffusion path on a signal-LESS fixture (e.g. the synthetic `sample_png`) MUST pass `--force`, or the new gate skips step 2 (so `mock_engine.remove_watermark` is never called / `invisible` exits 2).
- `uv run remove-ai-watermarks visible <image.png> -o <out.png>` — known-visible-mark removal, CPU, no GPU. Reverse-alpha based: each mark is removed by inverting its captured alpha map. `--mark auto` (default) picks the strongest detected of the Gemini sparkle, the Doubao "豆包AI生成" text strip, the Jimeng "★ 即梦AI" wordmark, and the Samsung Galaxy AI "✦ Contenuti generati dall'AI" strip (bottom-LEFT, locale-specific — Italian variant calibrated); `--mark gemini` / `--mark doubao` / `--mark jimeng` / `--mark samsung` force one (choices come from the registry). Gemini/Doubao recover pixels exactly with no inpaint at native; **Jimeng and Samsung add an always-on thin residual inpaint over the glyph footprint** (their marks re-rasterize per image, so reverse-alpha alone leaves a faint outline). For arbitrary logos/objects use `erase`. **When `--mark auto` finds no known mark (the common case — ~74% of real uploads carry no registered visible mark), the command does NOT silently re-serve the input as a finished result.** It runs a cheap metadata-only `identify`, prints actionable guidance (if the image carries an invisible/metadata mark, e.g. an OpenAI/Gemini C2PA image, it points to `all`; otherwise it does NOT imply the image is clean -- it warns that an invisible pixel watermark like SynthID cannot be detected once the metadata proxy is gone and routes to both `all` and `erase --region`), writes NO output file, and exits **`EXIT_NO_VISIBLE_MARK` (2)** — distinct from success (0) and a hard error (1) so a wrapping service (raiw.cc) can surface the message instead of treating the unchanged image as done (the production "it didn't work" / score-0 trap). Same handling for an explicit `--mark <name>` that is not detected. Helper `cli._no_visible_mark_exit`; regression-guarded by `tests/test_cli.py::TestVisibleCommand::test_visible_auto_no_mark_exits_two_with_eraser_hint` and `test_visible_auto_no_mark_routes_to_all_when_metadata`. 
`--no-detect` still forces the gemini fallback and proceeds (exit 0).
- `uv run remove-ai-watermarks erase <image.png> --region x,y,w,h -o <out.png>` — universal region eraser (any logo/object, any position). `--backend cv2` (default, no deps) or `--backend lama` (big-LaMa via onnxruntime, extra `lama`); `--region` is repeatable.
- `uv run remove-ai-watermarks identify <image>` — provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector
- `uv run remove-ai-watermarks metadata <image.png> --check` — inspect AI metadata (C2PA, EXIF, PNG chunks)
- `uv run remove-ai-watermarks metadata <image.png> --remove -o <out.png>` — strip all AI metadata
- `uv run remove-ai-watermarks batch <directory>` — process every supported image in a directory (output defaults to `<directory>_clean/`, set with `-o`). `--mode visible|invisible|metadata|all` (default `visible`); the invisible/all path reuses the **full `invisible` knob set above** (`--strength`/`--steps`/`--guidance-scale`/`--pipeline`/`--controlnet-scale`/`--model`/`--device`/`--max-resolution`/`--min-resolution`/`--upscaler`/`--seed`/`--hf-token`/`--humanize`/`--unsharp`/`--adaptive-polish`/`--tile`/`--tile-size`/`--tile-overlap`), plus `--inpaint/--no-inpaint` for the visible pass. `--adaptive-polish` is ON by default; `--auto` is deprecated and a no-op that only warns. One engine cached per pipeline; the polish is resolved once before the loop.
- `uv run remove-ai-watermarks batch <directory>` — process every supported image in a directory (output defaults to `<directory>_clean/`, set with `-o`). `--mode visible|invisible|metadata|all` (default `visible`); the invisible/all path reuses the **full `invisible` knob set above** (`--strength`/`--steps`/`--guidance-scale`/`--pipeline`/`--controlnet-scale`/`--model`/`--device`/`--max-resolution`/`--min-resolution`/`--upscaler`/`--seed`/`--hf-token`/`--humanize`/`--unsharp`/`--adaptive-polish`/`--tile`/`--tile-size`/`--tile-overlap`/`--force`), plus `--inpaint/--no-inpaint` for the visible pass. `--adaptive-polish` is ON by default; `--auto` is deprecated and a no-op that only warns. **No-signal skip (P0#5):** in invisible/all mode each image runs the same `has_invisible_target` gate — a signal-less image is skipped (no diffusion); in `invisible` mode the input is copied through to the output dir so it stays complete, in `all` mode the visible-removed result is kept and metadata is still stripped. `--force` scrubs every image regardless. One engine cached per pipeline; the polish is resolved once before the loop.
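A wrapping service can branch on the exit codes described above. The following is a minimal sketch, not the raiw.cc worker's actual code: `classify_exit`, `run_invisible`, and the outcome labels are hypothetical names, and it assumes `remove-ai-watermarks` is on `PATH`. Only the exit-code values (0 = success, 2 = skipped with no output, anything else = error) come from the documentation above.

```python
import subprocess
from pathlib import Path

# Exit-code values as documented above. EXIT_NO_INVISIBLE_SIGNAL shares its
# value with EXIT_NO_VISIBLE_MARK, so one branch covers both skips.
EXIT_OK = 0
EXIT_NO_INVISIBLE_SIGNAL = 2


def classify_exit(returncode: int) -> str:
    """Map a CLI return code to a coarse outcome label (labels are hypothetical)."""
    if returncode == EXIT_OK:
        return "done"
    if returncode == EXIT_NO_INVISIBLE_SIGNAL:
        return "skipped"  # no output file was written; surface the message
    return "error"


def run_invisible(source: Path, output: Path, force: bool = False) -> tuple[str, str]:
    """Run the `invisible` command and return (outcome, captured output)."""
    cmd = ["remove-ai-watermarks", "invisible", str(source), "-o", str(output)]
    if force:
        cmd.append("--force")
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return classify_exit(proc.returncode), proc.stdout + proc.stderr
```

On a `"skipped"` outcome the caller would show the captured message and offer a `--force` re-run instead of serving the unchanged input as a finished result. Note that this mapping fits `invisible` and `visible`; for `all`, a non-zero exit on the GPU-missing path still leaves an output file behind, so a wrapper would need to treat that case separately.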
## Test and lint
+6
@@ -365,6 +365,12 @@ remove-ai-watermarks invisible image.png -o clean.png --humanize 4.0 --unsharp 0
# --adaptive-polish (ON by default) restores the input's detail level (sparing
# text) to counter the over-smoothed look; it self-limits to a no-op where
# there is no detail deficit. Disable with --no-adaptive-polish.
# By default, if no invisible AI watermark is locally detectable, the diffusion
# scrub is SKIPPED (regenerating pixels would only degrade a clean image): for
# `invisible` that writes no output and exits 2, for `all` it skips step 2 but
# still strips metadata and exits 0. A skip never claims the image is clean
# (a pixel SynthID is undetectable once its metadata is gone). Pass --force to
# regenerate regardless when you know the image is AI-generated.
# Check / strip AI metadata (C2PA, EXIF, "Made with AI" labels)
# --check also flags SynthID-bearing sources: a C2PA manifest signed by
+2
@@ -66,6 +66,8 @@ There is no reliable *local* detector of the SynthID *pixel* watermark — Googl
**This explains the recurring "oracle says clean but `identify` still flags SynthID" report (#14):** the oracle reads the *pixel* watermark (gone after our SDXL pass), while `identify` reads the *C2PA-metadata proxy* (still present if the manifest survived). Different signals, not a contradiction -- strip the metadata too (`metadata --remove` / `all`) and the proxy goes quiet, but a quiet proxy is not proof the pixel watermark is gone.
**Consequence for the P0#5 no-signal skip (`has_invisible_target`, 2026-06-22):** `invisible`/`all`/`batch` skip the diffusion scrub by default when no invisible AI signal is *locally* detectable, to avoid degrading a clean image (`--force` overrides). Because SynthID detection is metadata-only, a real AI image whose C2PA was **already stripped** (e.g. a re-encoded download, or the API/playground surfaces above that never emit C2PA) reads as no-signal and is therefore **skipped** — leaving its pixel SynthID in place. This is the deliberate trade: the skip's message never claims the image is clean, and the user re-runs with `--force` when they know it is AI. The blind spot is the same metadata-only ceiling, not a new bug; the visible-sparkle path (`check_visible`) still catches the no-C2PA Gemini-playground case for the *visible* mark, but not the invisible one.
**SynthID is durable to JPEG re-encode by design, so a GitHub-recompressed issue attachment is still a valid SynthID test subject** (verified 2026-06-01 on issue #14's pic3: the GitHub-served JPEG survived re-encoding and openai.com/verify still detected SynthID). Do NOT dismiss issue-attachment JPEGs as "not faithful originals" when reproducing a SynthID-survival report: the recompression strips the **C2PA metadata** (so `identify` reads Unknown on the attachment) but NOT the **pixel watermark** that openai.com/verify reads. A true byte-original only matters for the metadata/C2PA path, not for the pixel-SynthID-removal test. (Contrast the open imwatermark above, which IS fragile to JPEG.) The spectral phase-coherence approach from `github.com/aloshdenny/reverse-SynthID` was evaluated (May 2026) and **does not work for real-content detection**: on its own shipped codebook + validation set, watermarked and cleaned images were indistinguishable (conf within noise, cleaned often higher); it only fires on pure-black 1024x1024 reference images at exact resolution (the controlled case it was calibrated on). The README's "90% / conf=0.91" reproduces only in that lab condition. Do not build a production detector on it; if revisited, it is experimental/diagnostic only and needs a per-resolution, per-model reference corpus. A from-scratch gpt-image pilot (2026-05-24) confirmed this independently: 5 independent solid-black gpt-image outputs share a near-identical fixed signature (pairwise residual correlation **0.92**, avg-template retains 97% energy), so the watermark/carrier IS strongly present and consistent on flat content — but the carrier frequencies extracted from it do NOT discriminate real content (carrier-to-random ratio: cleaned 1.86 > watermarked 1.53; a non-gpt-image image scored highest at 3.67). The signature drowns in content texture. Net: a perfectly consistent solid-color signature still yields no real-content pixel detector with magnitude/carrier methods. 
A corpus discrimination test (2026-05-24, `scripts/synthid_pixel_probe.py`, raw zero-mean residual NCC) independently re-confirms this: at matched resolution, SynthID positives do NOT cluster apart from negatives (within-Gemini 0.07; at 1024 px pos-vs-neg >= pos-vs-pos). The only high correlations were near-duplicate *content* (5 ChatGPT renders of one prompt at ~0.92, while a distinct ChatGPT image scored ~0 against them) — content, not a carrier. The probe is solid-fills-only and EXPERIMENTAL/DIAGNOSTIC; do not use it on real content.
**Correction (deeper re-examination 2026-05-25):** the carrier IS real on solid fills — the earlier "no carrier" was a *method* artifact of using spatial / FFT-magnitude NCC, which can't see it. The carrier is a fixed *phase* at specific low frequencies, so the right metric is **per-bin phase coherence**. On 8 white `gemini-2.5-flash-image` fills (generated via the reverse-SynthID trick: identity-edit prompt "Recreate this image exactly as it is" on a synthetic pure-white PNG — this bypasses the recitation block that rejects text prompts for pure colors), phase coherence at the white carriers `(0,±7..±12,±20..±23)` = **0.86** vs **0.31** random; single-image leave-one-out phase-match **+0.83** vs real photos **-0.24**. (Black `2.5-flash` fills clip to std≈0 — SynthID can't push values below 0, so no carrier in black; the repo's dark carriers come from nano-banana-pro.)
+2
@@ -55,6 +55,8 @@ module.
**High-precision by design:** only hard generator stamps feed it (C2PA-issuer when source is AI, SynthID, EXIF/XMP generator, IPTC `AISystemUsed`, xAI, AIGC); the fuzzy visible sparkle and the open invisible watermark are **excluded** (the latter can be a by-product of our own SDXL removal pass). The c2pa vendor is classified from the issuer attribution / generator, NOT the resolved `platform` (a camera label like "Google Pixel" would mis-normalize to "Google"). All real single-origin fixtures (chatgpt/firefly/doubao/grok/mj) verified to produce **zero** clashes (false-positive guard in `test_identify.py::TestRealSamplesHaveNoClash`).
**`ai_from_metadata` field + `has_invisible_target` helper (P0#5, 2026-06-22):** the high-confidence union (everything that sets `confidence == "high"`: C2PA AI-issuer / SynthID proxy, IPTC, AIGC, local gen params, EXIF/xAI, open DWT-DCT / TrustMark — the medium-confidence `hf_only`/`visible_only`/`samsung_only` are excluded) is now surfaced as the public `ProvenanceReport.ai_from_metadata` boolean, so callers gate on intent rather than on the `confidence` string. `has_invisible_target(path)` wraps `identify(path, check_visible=False, check_invisible=True)` and returns that field — it is the decision gate for the diffusion scrub (the CLI `invisible`/`all`/`batch` no-signal skip, `cli._no_invisible_signal_exit`): a visible-only or no-signal image has it False, so regeneration (which would only degrade a clean image) does not run. It fails SAFE — any detector exception returns True so the removal still runs (leaving a watermark on a paid removal is worse than over-regenerating). It does NOT prove a pixel SynthID is absent (SynthID is detectable only via its metadata proxy, gone once stripped), so a False means "no locally-detectable target", never "clean". Guarded by `test_identify.py::{TestIdentifyRealSamples::test_has_invisible_target_*,TestHasInvisibleTargetFailSafe}`.
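The fail-safe behaviour described above can be illustrated in isolation. This is a self-contained sketch of the pattern only: `has_target`, `should_skip_scrub`, and `broken_detector` are hypothetical stand-ins, and the detector is passed in as a plain callable rather than calling the real `identify`.

```python
from typing import Callable


def has_target(detect: Callable[[str], bool], path: str) -> bool:
    """Return the detector's verdict, or True if the detector itself fails."""
    try:
        return detect(path)
    except Exception:
        # Fail safe: an unreadable file or detector error must not cause a skip.
        return True


def should_skip_scrub(force: bool, detect: Callable[[str], bool], path: str) -> bool:
    """Skip only when not forced AND the detector positively reports no target."""
    if force:
        return False  # forcing short-circuits detection entirely
    return not has_target(detect, path)


def broken_detector(path: str) -> bool:
    """Stand-in for a detector that raises on every input."""
    raise OSError(f"cannot read {path}")


print(should_skip_scrub(False, lambda p: False, "photo.png"))  # True
print(should_skip_scrub(False, broken_detector, "photo.png"))  # False
print(should_skip_scrub(True, lambda p: False, "photo.png"))   # False
```

The asymmetry is the point: a skip requires a successful detection that found nothing, while every uncertain path (exception, forced run) falls through to running the removal.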
## `watermark_registry.py`
`watermark_registry.py`: **single catalog of known visible watermarks**, the unified "find known marks in their usual places, recognize, remove" entry.
+98 -1
@@ -281,6 +281,16 @@ _strength_option = click.option(
    default=None,
    help=f"Denoising strength (0.0-1.0). Default: {strength_default_help()}.",
)
_force_option = click.option(
    "--force/--no-force",
    default=False,
    help=(
        "Run the diffusion scrub even when no invisible AI watermark is locally "
        "detectable. Default: skip it (regeneration only degrades a clean image; a "
        "skip never claims the image is watermark-free -- a pixel SynthID is "
        "undetectable once its metadata proxy is gone)."
    ),
)
def _resolve_auto_polish(auto: bool, adaptive_polish: bool) -> bool:
@@ -388,6 +398,55 @@ def _no_visible_mark_exit(source: Path) -> NoReturn:
raise SystemExit(EXIT_NO_VISIBLE_MARK)
# Same value as EXIT_NO_VISIBLE_MARK (2): a distinct-from-success / distinct-from-
# error code that tells a wrapping service (raiw.cc) "the diffusion scrub was skipped
# because no invisible watermark was locally detectable", so it can surface the
# message instead of charging for and serving an unchanged image as done.
EXIT_NO_INVISIBLE_SIGNAL = 2
def _no_invisible_signal_exit(source: Path) -> NoReturn:
    """Explain why the diffusion scrub was skipped, then exit non-zero.
    The ``invisible`` command regenerates pixels to remove SynthID / open
    watermarks; that regeneration also degrades a real photo. When
    :func:`identify` finds no locally-detectable invisible AI signal, running it
    anyway would damage a clean image for nothing -- the dominant paid score-0
    cause on no-watermark uploads. So skip it, but do NOT imply the image is
    clean: a pixel SynthID is undetectable here once its metadata proxy is gone.
    Write no output and exit :data:`EXIT_NO_INVISIBLE_SIGNAL`; ``--force`` runs
    the scrub regardless.
    """
    console.print(
        " No invisible AI watermark detected (no C2PA/SynthID proxy, no open\n"
        " watermark). Skipped the diffusion scrub -- regenerating the pixels would\n"
        " only degrade the image with nothing to remove, so no output was written.\n"
        " This does NOT prove the image is clean: a pixel watermark such as SynthID\n"
        " cannot be detected here once its metadata proxy is absent (it may have\n"
        " been stripped earlier). If you know the image is AI-generated and want the\n"
        " pixels regenerated regardless, re-run with --force:\n"
        f" remove-ai-watermarks invisible {source.name} --force"
    )
    raise SystemExit(EXIT_NO_INVISIBLE_SIGNAL)
def _should_skip_invisible_scrub(force: bool, image_path: Path) -> bool:
    """True when the diffusion scrub should be skipped for *image_path*.
    The shared no-signal gate for ``invisible`` / ``all`` / ``batch``: skip when
    ``--force`` is not set AND no invisible AI watermark is locally detectable
    (regenerating pixels would only degrade a clean image -- the dominant paid
    score-0 cause). Centralizes the condition + the lazy ``has_invisible_target``
    import so the three call sites cannot drift. ``--force`` short-circuits the
    detection entirely.
    """
    if force:
        return False
    from remove_ai_watermarks.identify import has_invisible_target
    return not has_invisible_target(image_path)
def _read_bgr_and_alpha(path: Path) -> tuple[NDArray[Any] | None, NDArray[Any] | None]:
"""Read an image preserving its alpha channel separately.
@@ -697,6 +756,7 @@ def cmd_erase(
@_auto_option
@_adaptive_polish_option
@_tile_options
@_force_option
@click.pass_context
def cmd_invisible(
ctx: click.Context,
@@ -721,6 +781,7 @@ def cmd_invisible(
tile: bool,
tile_size: int,
tile_overlap: int,
force: bool,
) -> None:
"""Remove invisible AI watermarks (SynthID, StableSignature, TreeRing).
@@ -745,6 +806,13 @@ def cmd_invisible(
device_str = None if device == "auto" else device
# Gate BEFORE building the engine: skip the destructive regeneration when no
# invisible AI watermark is locally detectable (it would only degrade a clean
# image -- dominant paid score-0 cause), so the common skip path pays nothing for
# engine construction. A skip never claims the image is clean; --force overrides.
if _should_skip_invisible_scrub(force, source):
_no_invisible_signal_exit(source)
def progress_cb(msg: str) -> None:
console.print(f" {msg}")
@@ -960,6 +1028,7 @@ def cmd_identify(ctx: click.Context, source: Path, no_visible: bool, as_json: bo
@_auto_option
@_adaptive_polish_option
@_tile_options
@_force_option
@click.pass_context
def cmd_all(
ctx: click.Context,
@@ -986,6 +1055,7 @@ def cmd_all(
tile: bool,
tile_size: int,
tile_overlap: int,
force: bool,
) -> None:
"""Remove ALL watermarks: visible + invisible + metadata.
@@ -1054,6 +1124,18 @@ def cmd_all(
" Warning: Skipped - GPU dependencies not installed.\n"
" Install them with: pip install 'remove-ai-watermarks[gpu]'"
)
elif _should_skip_invisible_scrub(force, source):
# No locally-detectable invisible watermark -> skip the destructive
# regeneration (it would only degrade the image). The visible-removed
# pixels in tmp_path are kept and step 3 still strips metadata, so this
# is a SUCCESS (exit 0), unlike the GPU-missing skip above. Read the
# pristine `source`, not tmp_path whose C2PA the visible pass already
# dropped. Not a clean-image guarantee; --force overrides.
console.print(
" Skipped (no invisible AI watermark detected; pixels left intact).\n"
" Not a clean-image guarantee: a pixel SynthID is undetectable once its\n"
" metadata proxy is gone. Re-run with --force to scrub regardless."
)
else:
from remove_ai_watermarks.invisible_engine import InvisibleEngine
@@ -1173,6 +1255,7 @@ def _process_batch_image(
tile: bool = False,
tile_size: int = 1024,
tile_overlap: int = 128,
force: bool = False,
) -> None:
"""Process a single image for batch mode.
@@ -1203,7 +1286,11 @@ def _process_batch_image(
is_available as invisible_available,
)
if invisible_available():
# Skip the destructive regeneration when no invisible watermark is locally
# detectable (would only degrade a clean image). Read the pristine `img_path`;
# `out_path` may already be the visible-processed result. --force overrides.
skip_no_signal = _should_skip_invisible_scrub(force, img_path)
if invisible_available() and not skip_no_signal:
from remove_ai_watermarks.invisible_engine import InvisibleEngine
# Cache the engine in ctx.obj so the batch builds it once (pipeline is a
@@ -1238,6 +1325,13 @@ def _process_batch_image(
# visible-processed `out_path` whose C2PA is already gone.
vendor=vendor_for_strength(img_path),
)
elif skip_no_signal and mode == "invisible" and not out_path.exists():
# No invisible target and the visible/all pass did not write out_path
# (invisible mode): copy the input through so the output dir is complete
# with the pixels deliberately left intact.
src_bgr, src_alpha = _read_bgr_and_alpha(img_path)
if src_bgr is not None:
_write_bgr_with_alpha(out_path, src_bgr, src_alpha)
if mode in ("metadata", "all"):
from remove_ai_watermarks.metadata import remove_ai_metadata
@@ -1294,6 +1388,7 @@ def _process_batch_image(
@_auto_option
@_adaptive_polish_option
@_tile_options
@_force_option
@click.pass_context
def cmd_batch(
ctx: click.Context,
@@ -1320,6 +1415,7 @@ def cmd_batch(
tile: bool,
tile_size: int,
tile_overlap: int,
force: bool,
) -> None:
"""Process all images in a directory."""
_banner()
@@ -1383,6 +1479,7 @@ def cmd_batch(
tile=tile,
tile_size=tile_size,
tile_overlap=tile_overlap,
force=force,
)
processed += 1
+37
@@ -144,6 +144,14 @@ class ProvenanceReport:
# None -- no C2PA AI source-type (verdict, if AI, came from another
# signal: IPTC, AIGC, local gen params, xAI, ...).
ai_source_kind: str | None = None
# True when the AI verdict rests on a metadata or embedded-invisible signal
# (C2PA AI issuer / SynthID proxy, IPTC, AIGC, local gen params, EXIF/xAI, or
# an open DWT-DCT / TrustMark decode) -- as opposed to a visible mark or a
# weak medium-confidence hint (hf-job, Samsung genAIType). It is exactly the
# set of signals an invisible/diffusion scrub targets: a visible-only or
# no-signal image has it False. Equivalent to ``confidence == "high"``;
# surfaced as a field so callers gate on intent, not on the string.
ai_from_metadata: bool = False
watermarks: list[str] = field(default_factory=list[str])
signals: list[Signal] = field(default_factory=list["Signal"])
caveats: list[str] = field(default_factory=list[str])
@@ -758,8 +766,37 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
# Only meaningful when the AI verdict actually came from the C2PA source
# type; a non-C2PA AI signal (IPTC/AIGC/local gen) leaves it None.
ai_source_kind=c2pa_source_kind if (is_ai and has_c2pa) else None,
ai_from_metadata=ai_from_metadata,
watermarks=watermarks,
signals=signals,
caveats=caveats,
integrity_clashes=clashes,
)
def has_invisible_target(image_path: Path) -> bool:
    """True when a locally-detectable invisible/metadata AI signal is present.
    The decision gate for the diffusion scrub (``invisible`` / ``all`` / ``batch``):
    regenerating pixels removes an invisible watermark (SynthID, open DWT-DCT,
    TrustMark) but degrades a real photo, so it must not run when there is nothing
    to remove. Runs :func:`identify` with ``check_visible=False`` -- a visible mark
    is handled by the separate visible pass and is NOT a diffusion target -- and
    ``check_invisible=True`` so an open watermark counts. Returns
    ``report.ai_from_metadata`` (C2PA AI issuer / SynthID proxy, IPTC, AIGC, local
    gen params, EXIF/xAI, open DWT-DCT / TrustMark).
    IMPORTANT -- this cannot prove a pixel SynthID is absent: SynthID is detectable
    only through its C2PA proxy, so a metadata-stripped AI image reads as no signal
    here. A False therefore means "no locally-detectable invisible target", not
    "clean". Callers must NOT present a skip as a finished clean result.
    Fail-safe: any error resolves to True so the removal still runs -- leaving a
    watermark on a paid removal is worse than over-regenerating a clean image.
    """
    try:
        report = identify(image_path, check_visible=False, check_invisible=True)
    except Exception:  # unreadable / detector error -> do not skip the removal
        log.debug("has_invisible_target: identify failed, defaulting to run", exc_info=True)
        return True
    return report.ai_from_metadata
+104 -13
@@ -290,7 +290,7 @@ class TestInvisibleCommand:
):
result = runner.invoke(
main,
["invisible", str(sample_png), "-o", str(output)],
["invisible", str(sample_png), "-o", str(output), "--force"],
)
assert result.exit_code == 0, result.output
assert output.exists()
@@ -303,7 +303,7 @@ class TestInvisibleCommand:
patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
):
result = runner.invoke(main, ["invisible", str(sample_png)])
result = runner.invoke(main, ["invisible", str(sample_png), "--force"])
assert result.exit_code == 0, result.output
expected = sample_png.with_stem(sample_png.stem + "_clean")
assert expected.exists()
@@ -315,7 +315,7 @@ class TestInvisibleCommand:
patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
):
result = runner.invoke(main, ["invisible", str(sample_png)])
result = runner.invoke(main, ["invisible", str(sample_png), "--force"])
assert result.exit_code == 0, result.output
# adaptive_polish is ON by default (self-gating, so a no-op where not needed).
assert mock_engine.remove_watermark.call_args.kwargs["adaptive_polish"] is True
@@ -330,7 +330,7 @@ class TestInvisibleCommand:
patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
):
result = runner.invoke(main, ["invisible", str(sample_png), "--no-adaptive-polish"])
result = runner.invoke(main, ["invisible", str(sample_png), "--no-adaptive-polish", "--force"])
assert result.exit_code == 0, result.output
assert mock_engine.remove_watermark.call_args.kwargs["adaptive_polish"] is False
@@ -343,7 +343,7 @@ class TestInvisibleCommand:
):
result = runner.invoke(
main,
["invisible", str(sample_png), "--model", "org/custom-sdxl", "--guidance-scale", "5.5"],
["invisible", str(sample_png), "--model", "org/custom-sdxl", "--guidance-scale", "5.5", "--force"],
)
assert result.exit_code == 0, result.output
assert mock_cls.call_args.kwargs["model_id"] == "org/custom-sdxl"
@@ -356,7 +356,7 @@ class TestInvisibleCommand:
patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
):
result = runner.invoke(main, ["invisible", str(sample_png), "--pipeline", "default"])
result = runner.invoke(main, ["invisible", str(sample_png), "--pipeline", "default", "--force"])
assert result.exit_code == 0, result.output
# The legacy value warns and is normalized to "sdxl" before the engine is built.
assert "deprecated" in result.output.lower()
@@ -369,7 +369,7 @@ class TestInvisibleCommand:
patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
):
result = runner.invoke(main, ["invisible", str(sample_png), "--pipeline", "sdxl"])
result = runner.invoke(main, ["invisible", str(sample_png), "--pipeline", "sdxl", "--force"])
assert result.exit_code == 0, result.output
assert "deprecated" not in result.output.lower()
assert mock_cls.call_args.kwargs["pipeline"] == "sdxl"
@@ -378,6 +378,57 @@ class TestInvisibleCommand:
result = runner.invoke(main, ["invisible", "/nonexistent/file.png"])
assert result.exit_code != 0
def test_invisible_no_signal_skips_and_exits_two(self, runner, sample_png, tmp_path):
"""P0#5: when no invisible AI watermark is locally detectable, the diffusion
scrub must NOT run (it would only degrade a clean image). Mirrors the visible
no-mark contract: write no output, exit 2, and DO NOT imply the image is
clean (a stripped SynthID proxy is not proof of absence)."""
mock_cls, mock_engine = _mock_invisible_engine()
output = tmp_path / "clean.png"
with (
patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
):
result = runner.invoke(main, ["invisible", str(sample_png), "-o", str(output)])
assert result.exit_code == 2, result.output
assert not output.exists()
mock_engine.remove_watermark.assert_not_called()
assert "--force" in result.output
assert "SynthID" in result.output # the message must preserve removal uncertainty
def test_invisible_force_runs_scrub_on_no_signal(self, runner, sample_png, tmp_path):
"""--force overrides the no-signal skip: the scrub runs regardless."""
mock_cls, mock_engine = _mock_invisible_engine()
output = tmp_path / "clean.png"
with (
patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
):
result = runner.invoke(main, ["invisible", str(sample_png), "-o", str(output), "--force"])
assert result.exit_code == 0, result.output
mock_engine.remove_watermark.assert_called_once()
def test_invisible_runs_without_force_when_signal_present(self, runner, tmp_path):
"""An image carrying an AI metadata signal IS a scrub target, so the run
proceeds with no --force needed."""
img = Image.fromarray(np.random.default_rng(0).integers(0, 255, (200, 200, 3), dtype=np.uint8))
pnginfo = PngInfo()
pnginfo.add_text("parameters", "Steps: 20, Sampler: Euler, a test landscape")
src = tmp_path / "ai.png"
img.save(src, pnginfo=pnginfo)
output = tmp_path / "clean.png"
mock_cls, mock_engine = _mock_invisible_engine()
with (
patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
):
result = runner.invoke(main, ["invisible", str(src), "-o", str(output)])
assert result.exit_code == 0, result.output
mock_engine.remove_watermark.assert_called_once()
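
The per-command skip semantics exercised by these tests can be summarized as a small decision table. The sketch below is illustrative only: `SkipOutcome` and `outcome` are hypothetical names, and only the exit code value 2 for `EXIT_NO_INVISIBLE_SIGNAL` comes from the commit message.

```python
from dataclasses import dataclass

EXIT_OK = 0
EXIT_NO_INVISIBLE_SIGNAL = 2  # value stated in the commit message


@dataclass(frozen=True)
class SkipOutcome:
    run_scrub: bool      # does the diffusion scrub run?
    writes_output: bool  # is an output file produced?
    exit_code: int


def outcome(command: str, has_target: bool, force: bool) -> SkipOutcome:
    """Map (command, gate verdict, --force) to the behaviour described in the commit."""
    if command not in ("invisible", "all", "batch"):
        raise ValueError(f"unknown command: {command!r}")
    if has_target or force:
        # A detectable target, or an explicit override, always runs the scrub.
        return SkipOutcome(run_scrub=True, writes_output=True, exit_code=EXIT_OK)
    if command == "invisible":
        # Nothing to do: write no output and signal it with a distinct exit code.
        return SkipOutcome(False, False, EXIT_NO_INVISIBLE_SIGNAL)
    # `all` keeps the visible-removed pixels and strips metadata; `batch` copies
    # the input through. Both still produce a file and exit 0.
    return SkipOutcome(False, True, EXIT_OK)


if __name__ == "__main__":
    for cmd in ("invisible", "all", "batch"):
        print(cmd, outcome(cmd, has_target=False, force=False))
```

Only `invisible` turns a skip into a non-zero exit, because it is the only command whose sole job is the scrub; the other two still did useful work.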
class TestAllCommand:
"""Tests for the 'all' subcommand (full pipeline)."""
@@ -397,7 +448,7 @@ class TestAllCommand:
):
result = runner.invoke(
main,
["all", str(sample_png), "-o", str(output)],
["all", str(sample_png), "-o", str(output), "--force"],
)
assert result.exit_code == 0, result.output
assert output.exists()
@@ -418,10 +469,28 @@ class TestAllCommand:
patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
patch("remove_ai_watermarks.watermark_registry.best_auto_mark", return_value=None) as mock_best,
):
result = runner.invoke(main, ["all", str(sample_png), "-o", str(output)])
result = runner.invoke(main, ["all", str(sample_png), "-o", str(output), "--force"])
assert result.exit_code == 0, result.output
mock_best.assert_called() # the registry auto-detector drove the visible pass
def test_all_skips_invisible_on_no_signal_but_succeeds(self, runner, sample_png, tmp_path):
"""P0#5: with no detectable invisible watermark and no --force, `all` skips
the destructive step 2 (pixels left intact) but STILL succeeds (exit 0) --
visible removal + metadata strip ran and a file is written. Distinct from the
GPU-missing skip, which is a non-zero failure."""
mock_cls, mock_engine = _mock_invisible_engine()
output = tmp_path / "clean.png"
with (
patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
):
result = runner.invoke(main, ["all", str(sample_png), "-o", str(output)])
assert result.exit_code == 0, result.output
assert output.exists()
mock_engine.remove_watermark.assert_not_called()
assert "Skipped (no invisible" in result.output
def test_all_loud_warning_and_nonzero_exit_when_gpu_missing(self, runner, sample_png, tmp_path):
"""Regression (#14/#47): when the GPU extra is absent the invisible step is
skipped, but the output still looks processed -- the run must fail loudly
@@ -453,7 +522,7 @@ class TestAllCommand:
patch("remove_ai_watermarks.cli.invisible_available", return_value=True, create=True),
patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
):
result = runner.invoke(main, ["all", str(src), "-o", str(output)])
result = runner.invoke(main, ["all", str(src), "-o", str(output), "--force"])
assert result.exit_code == 0, result.output
out = cv2.imread(str(output), cv2.IMREAD_UNCHANGED)
@@ -580,6 +649,26 @@ class TestBatchCommand:
input_dir = _make_batch_dir(tmp_path)
output_dir = tmp_path / "output"
mock_cls, _mock_engine = _mock_invisible_engine()
with (
patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
patch("remove_ai_watermarks.cli.invisible_available", return_value=True, create=True),
patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
):
result = runner.invoke(
main,
["batch", str(input_dir), "-o", str(output_dir), "--mode", "invisible", "--force"],
)
assert result.exit_code == 0, result.output
assert "3 processed" in result.output
def test_batch_invisible_skips_no_signal_and_copies_through(self, runner, tmp_path):
"""P0#5: batch invisible mode skips the scrub on signal-less images (no
--force) and copies the input through, so the output dir is complete with the
pixels left intact and the engine never called."""
input_dir = _make_batch_dir(tmp_path)
output_dir = tmp_path / "output"
mock_cls, mock_engine = _mock_invisible_engine()
with (
patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
@@ -592,6 +681,8 @@ class TestBatchCommand:
)
assert result.exit_code == 0, result.output
assert "3 processed" in result.output
assert len(list(output_dir.glob("*.png"))) == 3 # inputs copied through
mock_engine.remove_watermark.assert_not_called()
def test_batch_all_mode(self, runner, tmp_path):
input_dir = _make_batch_dir(tmp_path)
@@ -605,7 +696,7 @@ class TestBatchCommand:
):
result = runner.invoke(
main,
["batch", str(input_dir), "-o", str(output_dir), "--mode", "all"],
["batch", str(input_dir), "-o", str(output_dir), "--mode", "all", "--force"],
)
assert result.exit_code == 0, result.output
assert "3 processed" in result.output
@@ -631,7 +722,7 @@ class TestBatchCommand:
):
result = runner.invoke(
main,
["batch", str(input_dir), "-o", str(output_dir), "--mode", "all"],
["batch", str(input_dir), "-o", str(output_dir), "--mode", "all", "--force"],
)
assert result.exit_code == 0, result.output
@@ -655,7 +746,7 @@ class TestBatchCommand:
):
result = runner.invoke(
main,
["batch", str(input_dir), "-o", str(output_dir), "--mode", "invisible", "--auto"],
["batch", str(input_dir), "-o", str(output_dir), "--mode", "invisible", "--auto", "--force"],
)
assert result.exit_code == 0, result.output
assert "2 processed" in result.output
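
The copy-through behaviour that `test_batch_invisible_skips_no_signal_and_copies_through` checks could look roughly like the loop below. This is a sketch under assumptions, not the actual batch implementation: `batch_invisible`, `has_target`, and `scrub` are hypothetical names, and the real command's file discovery and counters are not shown in this diff.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def batch_invisible(
    input_dir: Path,
    output_dir: Path,
    has_target: Callable[[Path], bool],
    scrub: Callable[[Path, Path], None],
    force: bool = False,
) -> tuple[int, int]:
    """Scrub images with a detectable target; copy the rest through unchanged.

    Returns (scrubbed, copied) so the caller can report both counts.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    scrubbed = copied = 0
    for src in sorted(input_dir.glob("*.png")):
        dst = output_dir / src.name
        if force or has_target(src):
            scrub(src, dst)
            scrubbed += 1
        else:
            # No detectable target: leave the pixels intact but keep the
            # output directory complete.
            shutil.copy2(src, dst)
            copied += 1
    return scrubbed, copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "in").mkdir()
        for i in range(3):
            (root / "in" / f"img{i}.png").write_bytes(b"placeholder")
        print(batch_invisible(root / "in", root / "out", lambda p: False, lambda s, d: None))
```

Copying rather than omitting the file means a downstream consumer can treat the output directory as a full mirror of the input, whichever images were scrubbed.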
+38
@@ -22,6 +22,7 @@ from remove_ai_watermarks.identify import (
_integrity_clashes,
_issuers_in,
_vendor_of,
has_invisible_target,
identify,
)
from remove_ai_watermarks.watermark_registry import GEMINI_SPARKLE_TRUST_CONF
@@ -292,6 +293,19 @@ class TestIdentifyRealSamples:
assert r.confidence == "none"
assert r.watermarks == []
def test_has_invisible_target_true_on_metadata_ai(self):
# The scrub gate: a C2PA/SynthID image and an IPTC "Made with AI" image are
# both invisible/metadata targets, so the diffusion scrub should run.
assert has_invisible_target(SAMPLES_DIR / "chatgpt-1.png") is True
assert has_invisible_target(SAMPLES_DIR / "mj-1.png") is True
# ai_from_metadata mirrors confidence == "high" and backs the helper.
assert identify(SAMPLES_DIR / "chatgpt-1.png", check_visible=False).ai_from_metadata is True
def test_has_invisible_target_false_on_clean_photo(self, clean_photo: Path):
# No detectable invisible signal -> skip the scrub (do not degrade a clean image).
assert has_invisible_target(clean_photo) is False
assert identify(clean_photo, check_visible=False).ai_from_metadata is False
def test_strip_caveat_always_present(self, clean_photo: Path):
r = identify(clean_photo, check_visible=False)
assert any("not proof" in c for c in r.caveats)
@@ -300,6 +314,30 @@ class TestIdentifyRealSamples:
assert isinstance(identify(SAMPLES_DIR / "firefly-1.png", check_visible=False), ProvenanceReport)
class TestHasInvisibleTargetFailSafe:
"""The scrub gate fails SAFE: when a detector errors, it runs the removal."""
def test_detector_error_defaults_to_run(self, tmp_path: Path):
# If identify raises (a detector crash), the gate must return True so the
# caller still attempts removal -- leaving a watermark on a paid removal is
# worse than over-regenerating. (Garbage bytes do NOT raise; identify returns
# a clean None verdict there, so that path correctly skips -- see below.)
bad = tmp_path / "x.png"
bad.write_bytes(b"not image bytes")
with patch("remove_ai_watermarks.identify.identify", side_effect=RuntimeError("boom")):
assert has_invisible_target(bad) is True
def test_unreadable_bytes_are_not_a_target(self, tmp_path: Path):
# No raise, no signal -> not a scrub target (the CLI rejects undecodable
# images earlier anyway; this only documents the gate's own verdict).
bad = tmp_path / "x.png"
bad.write_bytes(b"not image bytes")
assert has_invisible_target(bad) is False
def test_local_ai_params_are_a_target(self, tmp_png_with_ai_metadata: Path):
assert has_invisible_target(tmp_png_with_ai_metadata) is True
# ── Local diffusion parameters (Stable Diffusion / ComfyUI) ─────────