feat(invisible): skip the diffusion scrub when no invisible watermark is detectable (P0#5)

Regenerating pixels removes SynthID / open watermarks but degrades a real photo, so running it on a clean image is the dominant paid score-0 cause on no-watermark uploads. Gate invisible/all/batch on identify.has_invisible_target: when no invisible AI signal is locally detectable and --force is unset, skip the regeneration.

Per-command semantics:
- invisible: write no output, exit EXIT_NO_INVISIBLE_SIGNAL (2)
- all: skip step 2 but keep visible-removed pixels + strip metadata, exit 0
- batch: skip the scrub; copy the input through in invisible mode

A skip never claims the image is clean (a pixel SynthID is undetectable once its metadata proxy is gone); the message says so and routes to --force. The gate fails safe (a detector error runs the removal).

has_invisible_target wraps identify(check_visible=False, check_invisible=True) and returns the new ProvenanceReport.ai_from_metadata field (the confidence==high union), so the raiw.cc worker can reuse the same gate. Gate placed before engine construction so the skip path is cheap; shared via cli._should_skip_invisible_scrub.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-04 15:37:49 +02:00 · 2026-06-22 11:36:54 -07:00
parent 5a612adfef
commit 19f9ab0947
8 changed files with 290 additions and 17 deletions
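The exit-code contract in the commit message can be sketched from the point of view of a wrapping service. This is a minimal illustration, not code from the repository: `Outcome` and `classify_exit` are hypothetical names, and the mapping only restates what the commit message and docs say (0 = success, 2 = nothing to do for `invisible`/`visible`, anything else = failure).

```python
from enum import Enum

# Value shared by EXIT_NO_INVISIBLE_SIGNAL and EXIT_NO_VISIBLE_MARK per the diff.
EXIT_NOTHING_TO_DO = 2


class Outcome(Enum):  # hypothetical helper type, not part of the package
    DONE = "done"
    NOTHING_TO_DO = "nothing_to_do"
    FAILED = "failed"


def classify_exit(command: str, exit_code: int) -> Outcome:
    """Map a CLI exit code to what a wrapper should show the user."""
    if exit_code == 0:
        # Note: `all` also exits 0 when step 2 was skipped for lack of a signal,
        # so exit 0 from `all` does not by itself mean the pixels were regenerated.
        return Outcome.DONE
    if exit_code == EXIT_NOTHING_TO_DO and command in ("invisible", "visible"):
        # No output file was written; surface the CLI message instead of a result.
        return Outcome.NOTHING_TO_DO
    return Outcome.FAILED


print(classify_exit("invisible", 2))
print(classify_exit("all", 1))
```

A wrapper built this way would avoid serving (or charging for) an unchanged image when `invisible` exits 2, which is the failure mode the commit is aimed at.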
@@ -17,14 +17,14 @@ Consequences for contributors (do not drift back into the stock niche just becau

 ## How to run

- `uv run remove-ai-watermarks all <image.png> -o <output.png>` — full pipeline (visible + invisible + metadata). Same diffusion knobs as `invisible` below, plus the visible-pass `--inpaint/--no-inpaint`/`--inpaint-method`. **When the `[gpu]` extra is absent, step 2 (invisible/SynthID) is skipped** — `all` still writes an output (visible mark + metadata stripped) but prints a prominent end-of-run banner ("the invisible (SynthID) watermark was NOT removed") AND exits **non-zero** (1), so a skipped SynthID pass is not mistaken for a clean result (the recurring #14/#47 trap, where the old quiet inline warning was missed). `invisible` already hard-errors without the extra; only `all` continued, hence the loud end-banner. Regression-guarded by `tests/test_cli.py::TestAllCommand::test_all_loud_warning_and_nonzero_exit_when_gpu_missing`. **Test trap:** any `all` test that exercises the full pipeline MUST `patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True)` — CI installs core+dev only (no `[gpu]`), so an unpatched `all` test takes the skip branch and now hits the non-zero exit. This passed locally (gpu present → `is_available()` True) but red-failed every matrix cell on the v0.11.0 commit (`test_all_basic`/`test_all_visible_step_uses_registry` asserted exit 0); both now patch `is_available` True.
- `uv run remove-ai-watermarks invisible <image.png> -o <out.png>` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet|qwen` (default `controlnet`; `qwen` is a manual opt-in only — see the qwen note in the module map), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default).
+- `uv run remove-ai-watermarks all <image.png> -o <output.png>` — full pipeline (visible + invisible + metadata). Same diffusion knobs as `invisible` below, plus the visible-pass `--inpaint/--no-inpaint`/`--inpaint-method`. **When the `[gpu]` extra is absent, step 2 (invisible/SynthID) is skipped** — `all` still writes an output (visible mark + metadata stripped) but prints a prominent end-of-run banner ("the invisible (SynthID) watermark was NOT removed") AND exits **non-zero** (1), so a skipped SynthID pass is not mistaken for a clean result (the recurring #14/#47 trap, where the old quiet inline warning was missed). `invisible` already hard-errors without the extra; only `all` continued, hence the loud end-banner. Regression-guarded by `tests/test_cli.py::TestAllCommand::test_all_loud_warning_and_nonzero_exit_when_gpu_missing`. **No-signal skip (P0#5):** step 2 also runs the same `has_invisible_target` gate (see `invisible` below) — when no invisible watermark is detectable and `--force` is not set, step 2 is skipped and the pixels are left intact, but unlike the GPU-missing skip this is a **SUCCESS (exit 0)**: the visible pass + metadata strip still ran and a file is written (the message says so without claiming the image is clean). Distinct exit semantics by design: GPU-missing = couldn't do the work (non-zero); no-signal = nothing to do (zero). Regression-guarded by `test_all_skips_invisible_on_no_signal_but_succeeds`. **Test trap:** any `all` test that exercises the full pipeline MUST `patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True)` — CI installs core+dev only (no `[gpu]`), so an unpatched `all` test takes the skip branch and now hits the non-zero exit. This passed locally (gpu present → `is_available()` True) but red-failed every matrix cell on the v0.11.0 commit (`test_all_basic`/`test_all_visible_step_uses_registry` asserted exit 0); both now patch `is_available` True.
+- `uv run remove-ai-watermarks invisible <image.png> -o <out.png>` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet|qwen` (default `controlnet`; `qwen` is a manual opt-in only — see the qwen note in the module map), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default). **No-signal skip (P0#5, roadmap):** before the diffusion runs, the command checks `identify.has_invisible_target(source)` (the `ProvenanceReport.ai_from_metadata` union: C2PA AI-issuer / SynthID proxy, IPTC, AIGC, local gen params, EXIF/xAI, open DWT-DCT / TrustMark — visible marks do NOT count, they are a separate pass). When nothing is locally detectable it does NOT regenerate (that would only degrade a clean image — the dominant paid score-0 cause on no-watermark uploads): it writes NO output, prints guidance that does NOT claim the image is clean (a pixel SynthID is undetectable once its metadata proxy is gone), and exits **`EXIT_NO_INVISIBLE_SIGNAL` (2)** — same value/role as the visible `EXIT_NO_VISIBLE_MARK`. 
`--force` runs the scrub regardless (`--force/--no-force`, **default `--no-force`, so the skip is ON**). The check fails SAFE (a detector exception → run, since leaving a watermark on a paid removal is worse than over-regenerating). Helpers `cli._should_skip_invisible_scrub` (the shared gate) + `cli._no_invisible_signal_exit` + `identify.has_invisible_target`; regression-guarded by `tests/test_cli.py::TestInvisibleCommand::{test_invisible_no_signal_skips_and_exits_two,test_invisible_force_runs_scrub_on_no_signal,test_invisible_runs_without_force_when_signal_present}` and `tests/test_identify.py::TestHasInvisibleTargetFailSafe`. **Test trap:** any `invisible`/`all`/`batch` test that exercises the diffusion path on a signal-LESS fixture (e.g. the synthetic `sample_png`) MUST pass `--force`, or the new gate skips step 2 (so `mock_engine.remove_watermark` is never called / `invisible` exits 2).
 - `uv run remove-ai-watermarks visible <image.png> -o <out.png>` — known-visible-mark removal, CPU, no GPU. Reverse-alpha based: each mark is removed by inverting its captured alpha map. `--mark auto` (default) picks the strongest detected of the Gemini sparkle, the Doubao "豆包AI生成" text strip, the Jimeng "★ 即梦AI" wordmark, and the Samsung Galaxy AI "✦ Contenuti generati dall'AI" strip (bottom-LEFT, locale-specific — Italian variant calibrated); `--mark gemini` / `--mark doubao` / `--mark jimeng` / `--mark samsung` force one (choices come from the registry). Gemini/Doubao recover pixels exactly with no inpaint at native; **Jimeng and Samsung add an always-on thin residual inpaint over the glyph footprint** (their marks re-rasterize per image, so reverse-alpha alone leaves a faint outline). For arbitrary logos/objects use `erase`. **When `--mark auto` finds no known mark (the common case — ~74% of real uploads carry no registered visible mark), the command does NOT silently re-serve the input as a finished result.** It runs a cheap metadata-only `identify`, prints actionable guidance (if the image carries an invisible/metadata mark, e.g. an OpenAI/Gemini C2PA image, it points to `all`; otherwise it does NOT imply the image is clean -- it warns that an invisible pixel watermark like SynthID cannot be detected once the metadata proxy is gone and routes to both `all` and `erase --region`), writes NO output file, and exits **`EXIT_NO_VISIBLE_MARK` (2)** — distinct from success (0) and a hard error (1) so a wrapping service (raiw.cc) can surface the message instead of treating the unchanged image as done (the production "it didn't work" / score-0 trap). Same handling for an explicit `--mark <name>` that is not detected. Helper `cli._no_visible_mark_exit`; regression-guarded by `tests/test_cli.py::TestVisibleCommand::test_visible_auto_no_mark_exits_two_with_eraser_hint` and `test_visible_auto_no_mark_routes_to_all_when_metadata`. 
`--no-detect` still forces the gemini fallback and proceeds (exit 0).
 - `uv run remove-ai-watermarks erase <image.png> --region x,y,w,h -o <out.png>` — universal region eraser (any logo/object, any position). `--backend cv2` (default, no deps) or `--backend lama` (big-LaMa via onnxruntime, extra `lama`); `--region` is repeatable.
 - `uv run remove-ai-watermarks identify <image>` — provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector
 - `uv run remove-ai-watermarks metadata <image.png> --check` — inspect AI metadata (C2PA, EXIF, PNG chunks)
 - `uv run remove-ai-watermarks metadata <image.png> --remove -o <out.png>` — strip all AI metadata
- `uv run remove-ai-watermarks batch <directory>` — process every supported image in a directory (output defaults to `<directory>_clean/`, set with `-o`). `--mode visible|invisible|metadata|all` (default `visible`); the invisible/all path reuses the **full `invisible` knob set above** (`--strength`/`--steps`/`--guidance-scale`/`--pipeline`/`--controlnet-scale`/`--model`/`--device`/`--max-resolution`/`--min-resolution`/`--upscaler`/`--seed`/`--hf-token`/`--humanize`/`--unsharp`/`--adaptive-polish`/`--tile`/`--tile-size`/`--tile-overlap`), plus `--inpaint/--no-inpaint` for the visible pass. `--adaptive-polish` is ON by default; `--auto` is deprecated and a no-op that only warns. One engine cached per pipeline; the polish is resolved once before the loop.
+- `uv run remove-ai-watermarks batch <directory>` — process every supported image in a directory (output defaults to `<directory>_clean/`, set with `-o`). `--mode visible|invisible|metadata|all` (default `visible`); the invisible/all path reuses the **full `invisible` knob set above** (`--strength`/`--steps`/`--guidance-scale`/`--pipeline`/`--controlnet-scale`/`--model`/`--device`/`--max-resolution`/`--min-resolution`/`--upscaler`/`--seed`/`--hf-token`/`--humanize`/`--unsharp`/`--adaptive-polish`/`--tile`/`--tile-size`/`--tile-overlap`/`--force`), plus `--inpaint/--no-inpaint` for the visible pass. `--adaptive-polish` is ON by default; `--auto` is deprecated and a no-op that only warns. **No-signal skip (P0#5):** in invisible/all mode each image runs the same `has_invisible_target` gate — a signal-less image is skipped (no diffusion); in `invisible` mode the input is copied through to the output dir so it stays complete, in `all` mode the visible-removed result is kept and metadata is still stripped. `--force` scrubs every image regardless. One engine cached per pipeline; the polish is resolved once before the loop.
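The per-image batch behaviour described in the last bullet can be written out as a small decision function. This is a sketch of the branch as it reads in this commit's `_process_batch_image` hunk, not the function itself: `batch_invisible_action` and its string results are hypothetical names, and it assumes the branch applies only in `invisible`/`all` mode.

```python
def batch_invisible_action(
    mode: str,
    gpu_available: bool,
    has_signal: bool,
    force: bool,
    out_exists: bool,
) -> str:
    """Return what the invisible step does for one image in batch mode."""
    if mode not in ("invisible", "all"):
        return "not_applicable"
    # Same condition as the shared gate: skip only without --force and without
    # a locally detectable invisible signal.
    skip_no_signal = (not force) and (not has_signal)
    if gpu_available and not skip_no_signal:
        return "scrub"
    if skip_no_signal and mode == "invisible" and not out_exists:
        # Nothing else wrote the output, so copy the input through untouched.
        return "copy_through"
    # `all` mode keeps the visible-removed result; metadata is stripped later.
    return "keep"


print(batch_invisible_action("invisible", True, False, False, False))
print(batch_invisible_action("all", True, False, False, True))
```

Under these assumptions a signal-less image in `invisible` mode is copied through, while the same image in `all` mode keeps whatever the visible pass produced.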

 ## Test and lint

@@ -365,6 +365,12 @@ remove-ai-watermarks invisible image.png -o clean.png --humanize 4.0 --unsharp 0
 # --adaptive-polish (ON by default) restores the input's detail level (sparing
 # text) to counter the over-smoothed look; it self-limits to a no-op where
 # there is no detail deficit. Disable with --no-adaptive-polish.
+# By default, if no invisible AI watermark is locally detectable, the diffusion
+# scrub is SKIPPED (regenerating pixels would only degrade a clean image): for
+# `invisible` that writes no output and exits 2, for `all` it skips step 2 but
+# still strips metadata and exits 0. A skip never claims the image is clean
+# (a pixel SynthID is undetectable once its metadata proxy is gone). Pass --force to
+# regenerate regardless when you know the image is AI-generated.

 # Check / strip AI metadata (C2PA, EXIF, "Made with AI" labels)
 # --check also flags SynthID-bearing sources: a C2PA manifest signed by
@@ -66,6 +66,8 @@ There is no reliable *local* detector of the SynthID *pixel* watermark — Googl

 **This explains the recurring "oracle says clean but `identify` still flags SynthID" report (#14):** the oracle reads the *pixel* watermark (gone after our SDXL pass), while `identify` reads the *C2PA-metadata proxy* (still present if the manifest survived). Different signals, not a contradiction -- strip the metadata too (`metadata --remove` / `all`) and the proxy goes quiet, but a quiet proxy is not proof the pixel watermark is gone.

+**Consequence for the P0#5 no-signal skip (`has_invisible_target`, 2026-06-22):** `invisible`/`all`/`batch` skip the diffusion scrub by default when no invisible AI signal is *locally* detectable, to avoid degrading a clean image (`--force` overrides). Because SynthID detection is metadata-only, a real AI image whose C2PA was **already stripped** (e.g. a re-encoded download, or the API/playground surfaces above that never emit C2PA) reads as no-signal and is therefore **skipped** — leaving its pixel SynthID in place. This is the deliberate trade: the skip's message never claims the image is clean, and the user re-runs with `--force` when they know it is AI. The blind spot is the same metadata-only ceiling, not a new bug; the visible-sparkle path (`check_visible`) still catches the no-C2PA Gemini-playground case for the *visible* mark, but not the invisible one.
+
 **SynthID is durable to JPEG re-encode by design, so a GitHub-recompressed issue attachment is still a valid SynthID test subject** (verified 2026-06-01 on issue #14's pic3: the GitHub-served JPEG survived re-encoding and openai.com/verify still detected SynthID). Do NOT dismiss issue-attachment JPEGs as "not faithful originals" when reproducing a SynthID-survival report: the recompression strips the **C2PA metadata** (so `identify` reads Unknown on the attachment) but NOT the **pixel watermark** that openai.com/verify reads. A true byte-original only matters for the metadata/C2PA path, not for the pixel-SynthID-removal test. (Contrast the open imwatermark above, which IS fragile to JPEG.) The spectral phase-coherence approach from `github.com/aloshdenny/reverse-SynthID` was evaluated (May 2026) and **does not work for real-content detection**: on its own shipped codebook + validation set, watermarked and cleaned images were indistinguishable (conf within noise, cleaned often higher); it only fires on pure-black 1024x1024 reference images at exact resolution (the controlled case it was calibrated on). The README's "90% / conf=0.91" reproduces only in that lab condition. Do not build a production detector on it; if revisited, it is experimental/diagnostic only and needs a per-resolution, per-model reference corpus. A from-scratch gpt-image pilot (2026-05-24) confirmed this independently: 5 independent solid-black gpt-image outputs share a near-identical fixed signature (pairwise residual correlation **0.92**, avg-template retains 97% energy), so the watermark/carrier IS strongly present and consistent on flat content — but the carrier frequencies extracted from it do NOT discriminate real content (carrier-to-random ratio: cleaned 1.86 > watermarked 1.53; a non-gpt-image image scored highest at 3.67). The signature drowns in content texture. Net: a perfectly consistent solid-color signature still yields no real-content pixel detector with magnitude/carrier methods. 
A corpus discrimination test (2026-05-24, `scripts/synthid_pixel_probe.py`, raw zero-mean residual NCC) independently re-confirms this: at matched resolution, SynthID positives do NOT cluster apart from negatives (within-Gemini 0.07; at 1024 px pos-vs-neg >= pos-vs-pos). The only high correlations were near-duplicate *content* (5 ChatGPT renders of one prompt at ~0.92, while a distinct ChatGPT image scored ~0 against them) — content, not a carrier. The probe is solid-fills-only and EXPERIMENTAL/DIAGNOSTIC; do not use it on real content.

 **Correction (deeper re-examination 2026-05-25):** the carrier IS real on solid fills — the earlier "no carrier" was a *method* artifact of using spatial / FFT-magnitude NCC, which can't see it. The carrier is a fixed *phase* at specific low frequencies, so the right metric is **per-bin phase coherence**. On 8 white `gemini-2.5-flash-image` fills (generated via the reverse-SynthID trick: identity-edit prompt "Recreate this image exactly as it is" on a synthetic pure-white PNG — this bypasses the recitation block that rejects text prompts for pure colors), phase coherence at the white carriers `(0,±7..±12,±20..±23)` = **0.86** vs **0.31** random; single-image leave-one-out phase-match **+0.83** vs real photos **-0.24**. (Black `2.5-flash` fills clip to std≈0 — SynthID can't push values below 0, so no carrier in black; the repo's dark carriers come from nano-banana-pro.)
@@ -55,6 +55,8 @@ module.

 **High-precision by design:** only hard generator stamps feed it (C2PA-issuer when source is AI, SynthID, EXIF/XMP generator, IPTC `AISystemUsed`, xAI, AIGC); the fuzzy visible sparkle and the open invisible watermark are **excluded** (the latter can be a by-product of our own SDXL removal pass). The c2pa vendor is classified from the issuer attribution / generator, NOT the resolved `platform` (a camera label like "Google Pixel" would mis-normalize to "Google"). All real single-origin fixtures (chatgpt/firefly/doubao/grok/mj) verified to produce **zero** clashes (false-positive guard in `test_identify.py::TestRealSamplesHaveNoClash`).

+**`ai_from_metadata` field + `has_invisible_target` helper (P0#5, 2026-06-22):** the high-confidence union (everything that sets `confidence == "high"`: C2PA AI-issuer / SynthID proxy, IPTC, AIGC, local gen params, EXIF/xAI, open DWT-DCT / TrustMark; the medium-confidence `hf_only`/`visible_only`/`samsung_only` are excluded) is now surfaced as the public `ProvenanceReport.ai_from_metadata` boolean, so callers gate on intent rather than on the `confidence` string. `has_invisible_target(path)` wraps `identify(path, check_visible=False, check_invisible=True)` and returns that field. It is the decision gate for the diffusion scrub (the CLI `invisible`/`all`/`batch` no-signal skip, shared via `cli._should_skip_invisible_scrub`; `invisible` then exits through `cli._no_invisible_signal_exit`): a visible-only or no-signal image has it False, so regeneration (which would only degrade a clean image) does not run. It fails SAFE: any detector exception returns True so the removal still runs (leaving a watermark on a paid removal is worse than over-regenerating). It does NOT prove a pixel SynthID is absent (SynthID is detectable only via its metadata proxy, gone once stripped), so a False means "no locally-detectable target", never "clean". Guarded by `test_identify.py::{TestIdentifyRealSamples::test_has_invisible_target_*,TestHasInvisibleTargetFailSafe}`.
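The gate's three observable behaviours (skip on no signal, run on a signal, run on a detector error) can be exercised with stubs. The sketch below is a stand-in, assuming only what the paragraph above states: `StubReport` and the `detect_*` functions are hypothetical replacements for `ProvenanceReport` and `identify`, and the detector is passed in as an argument so the example runs without the package.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StubReport:  # stand-in for ProvenanceReport; only the gate field is modelled
    ai_from_metadata: bool = False


def has_target(detect: Callable[[str], StubReport], path: str) -> bool:
    """Fail-safe gate: any detector error counts as 'target present'."""
    try:
        return detect(path).ai_from_metadata
    except Exception:
        return True  # do not skip the removal when detection itself failed


def should_skip(force: bool, detect: Callable[[str], StubReport], path: str) -> bool:
    """Skip the scrub only without --force and without a detectable target."""
    if force:
        return False  # --force short-circuits detection entirely
    return not has_target(detect, path)


def detect_signal(path: str) -> StubReport:
    return StubReport(ai_from_metadata=True)   # e.g. C2PA proxy still present


def detect_nothing(path: str) -> StubReport:
    return StubReport(ai_from_metadata=False)  # clean photo OR metadata-stripped AI image


def detect_broken(path: str) -> StubReport:
    raise OSError("unreadable image")


for name, detect in [("signal", detect_signal), ("nothing", detect_nothing), ("broken", detect_broken)]:
    print(name, should_skip(False, detect, "image.png"))
```

Note that `detect_nothing` cannot distinguish a clean photo from an AI image whose metadata was stripped, which is why a skip is reported as "no locally-detectable target" rather than "clean".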
+
 ## `watermark_registry.py`

 `watermark_registry.py` — **single catalog of known visible watermarks**, the unified "find known marks in their usual places, recognize, remove" entry.
@@ -281,6 +281,16 @@ _strength_option = click.option(
    default=None,
    help=f"Denoising strength (0.0-1.0). Default: {strength_default_help()}.",
 )
+_force_option = click.option(
+    "--force/--no-force",
+    default=False,
+    help=(
+        "Run the diffusion scrub even when no invisible AI watermark is locally "
+        "detectable. Default: skip it (regeneration only degrades a clean image; a "
+        "skip never claims the image is watermark-free -- a pixel SynthID is "
+        "undetectable once its metadata proxy is gone)."
+    ),
+)


 def _resolve_auto_polish(auto: bool, adaptive_polish: bool) -> bool:
@@ -388,6 +398,55 @@ def _no_visible_mark_exit(source: Path) -> NoReturn:
    raise SystemExit(EXIT_NO_VISIBLE_MARK)


+# Same value as EXIT_NO_VISIBLE_MARK (2): a distinct-from-success / distinct-from-
+# error code that tells a wrapping service (raiw.cc) "the diffusion scrub was skipped
+# because no invisible watermark was locally detectable", so it can surface the
+# message instead of charging for and serving an unchanged image as done.
+EXIT_NO_INVISIBLE_SIGNAL = 2
+
+
+def _no_invisible_signal_exit(source: Path) -> NoReturn:
+    """Explain why the diffusion scrub was skipped, then exit non-zero.
+
+    The ``invisible`` command regenerates pixels to remove SynthID / open
+    watermarks; that regeneration also degrades a real photo. When
+    :func:`identify` finds no locally-detectable invisible AI signal, running it
+    anyway would damage a clean image for nothing -- the dominant paid score-0
+    cause on no-watermark uploads. So skip it, but do NOT imply the image is
+    clean: a pixel SynthID is undetectable here once its metadata proxy is gone.
+    Write no output and exit :data:`EXIT_NO_INVISIBLE_SIGNAL`; ``--force`` runs
+    the scrub regardless.
+    """
+    console.print(
+        "  No invisible AI watermark detected (no C2PA/SynthID proxy, no open\n"
+        "  watermark). Skipped the diffusion scrub -- regenerating the pixels would\n"
+        "  only degrade the image with nothing to remove, so no output was written.\n"
+        "  This does NOT prove the image is clean: a pixel watermark such as SynthID\n"
+        "  cannot be detected here once its metadata proxy is absent (it may have\n"
+        "  been stripped earlier). If you know the image is AI-generated and want the\n"
+        "  pixels regenerated regardless, re-run with --force:\n"
+        f"    remove-ai-watermarks invisible {source.name} --force"
+    )
+    raise SystemExit(EXIT_NO_INVISIBLE_SIGNAL)
+
+
+def _should_skip_invisible_scrub(force: bool, image_path: Path) -> bool:
+    """True when the diffusion scrub should be skipped for *image_path*.
+
+    The shared no-signal gate for ``invisible`` / ``all`` / ``batch``: skip when
+    ``--force`` is not set AND no invisible AI watermark is locally detectable
+    (regenerating pixels would only degrade a clean image -- the dominant paid
+    score-0 cause). Centralizes the condition + the lazy ``has_invisible_target``
+    import so the three call sites cannot drift. ``--force`` short-circuits the
+    detection entirely.
+    """
+    if force:
+        return False
+    from remove_ai_watermarks.identify import has_invisible_target
+
+    return not has_invisible_target(image_path)
+
+
 def _read_bgr_and_alpha(path: Path) -> tuple[NDArray[Any] | None, NDArray[Any] | None]:
    """Read an image preserving its alpha channel separately.

@@ -697,6 +756,7 @@ def cmd_erase(
@_auto_option
@_adaptive_polish_option
@_tile_options
+@_force_option
@click.pass_context
 def cmd_invisible(
    ctx: click.Context,
@@ -721,6 +781,7 @@ def cmd_invisible(
    tile: bool,
    tile_size: int,
    tile_overlap: int,
+    force: bool,
 ) -> None:
    """Remove invisible AI watermarks (SynthID, StableSignature, TreeRing).

@@ -745,6 +806,13 @@ def cmd_invisible(

    device_str = None if device == "auto" else device

+    # Gate BEFORE building the engine: skip the destructive regeneration when no
+    # invisible AI watermark is locally detectable (it would only degrade a clean
+    # image -- dominant paid score-0 cause), so the common skip path pays nothing for
+    # engine construction. A skip never claims the image is clean; --force overrides.
+    if _should_skip_invisible_scrub(force, source):
+        _no_invisible_signal_exit(source)
+
    def progress_cb(msg: str) -> None:
        console.print(f"  {msg}")

@@ -960,6 +1028,7 @@ def cmd_identify(ctx: click.Context, source: Path, no_visible: bool, as_json: bo
@_auto_option
@_adaptive_polish_option
@_tile_options
+@_force_option
@click.pass_context
 def cmd_all(
    ctx: click.Context,
@@ -986,6 +1055,7 @@ def cmd_all(
    tile: bool,
    tile_size: int,
    tile_overlap: int,
+    force: bool,
 ) -> None:
    """Remove ALL watermarks: visible + invisible + metadata.

@@ -1054,6 +1124,18 @@ def cmd_all(
                "    Warning: Skipped - GPU dependencies not installed.\n"
                "    Install them with: pip install 'remove-ai-watermarks[gpu]'"
            )
+        elif _should_skip_invisible_scrub(force, source):
+            # No locally-detectable invisible watermark -> skip the destructive
+            # regeneration (it would only degrade the image). The visible-removed
+            # pixels in tmp_path are kept and step 3 still strips metadata, so this
+            # is a SUCCESS (exit 0), unlike the GPU-missing skip above. Read the
+            # pristine `source`, not tmp_path whose C2PA the visible pass already
+            # dropped. Not a clean-image guarantee; --force overrides.
+            console.print(
+                "    Skipped (no invisible AI watermark detected; pixels left intact).\n"
+                "    Not a clean-image guarantee: a pixel SynthID is undetectable once its\n"
+                "    metadata proxy is gone. Re-run with --force to scrub regardless."
+            )
        else:
            from remove_ai_watermarks.invisible_engine import InvisibleEngine

@@ -1173,6 +1255,7 @@ def _process_batch_image(
    tile: bool = False,
    tile_size: int = 1024,
    tile_overlap: int = 128,
+    force: bool = False,
 ) -> None:
    """Process a single image for batch mode.

@@ -1203,7 +1286,11 @@ def _process_batch_image(
            is_available as invisible_available,
        )

-        if invisible_available():
+        # Skip the destructive regeneration when no invisible watermark is locally
+        # detectable (would only degrade a clean image). Read the pristine `img_path`;
+        # `out_path` may already be the visible-processed result. --force overrides.
+        skip_no_signal = _should_skip_invisible_scrub(force, img_path)
+        if invisible_available() and not skip_no_signal:
            from remove_ai_watermarks.invisible_engine import InvisibleEngine

            # Cache the engine in ctx.obj so the batch builds it once (pipeline is a
@@ -1238,6 +1325,13 @@ def _process_batch_image(
                # visible-processed `out_path` whose C2PA is already gone.
                vendor=vendor_for_strength(img_path),
            )
+        elif skip_no_signal and mode == "invisible" and not out_path.exists():
+            # No invisible target and the visible/all pass did not write out_path
+            # (invisible mode): copy the input through so the output dir is complete
+            # with the pixels deliberately left intact.
+            src_bgr, src_alpha = _read_bgr_and_alpha(img_path)
+            if src_bgr is not None:
+                _write_bgr_with_alpha(out_path, src_bgr, src_alpha)

    if mode in ("metadata", "all"):
        from remove_ai_watermarks.metadata import remove_ai_metadata
@@ -1294,6 +1388,7 @@ def _process_batch_image(
@_auto_option
@_adaptive_polish_option
@_tile_options
+@_force_option
@click.pass_context
 def cmd_batch(
    ctx: click.Context,
@@ -1320,6 +1415,7 @@ def cmd_batch(
    tile: bool,
    tile_size: int,
    tile_overlap: int,
+    force: bool,
 ) -> None:
    """Process all images in a directory."""
    _banner()
@@ -1383,6 +1479,7 @@ def cmd_batch(
                    tile=tile,
                    tile_size=tile_size,
                    tile_overlap=tile_overlap,
+                    force=force,
                )
                processed += 1

@@ -144,6 +144,14 @@ class ProvenanceReport:
    #   None        -- no C2PA AI source-type (verdict, if AI, came from another
    #                  signal: IPTC, AIGC, local gen params, xAI, ...).
    ai_source_kind: str | None = None
+    # True when the AI verdict rests on a metadata or embedded-invisible signal
+    # (C2PA AI issuer / SynthID proxy, IPTC, AIGC, local gen params, EXIF/xAI, or
+    # an open DWT-DCT / TrustMark decode) -- as opposed to a visible mark or a
+    # weak medium-confidence hint (hf-job, Samsung genAIType). It is exactly the
+    # set of signals an invisible/diffusion scrub targets: a visible-only or
+    # no-signal image has it False. Equivalent to ``confidence == "high"``;
+    # surfaced as a field so callers gate on intent, not on the string.
+    ai_from_metadata: bool = False
    watermarks: list[str] = field(default_factory=list[str])
    signals: list[Signal] = field(default_factory=list["Signal"])
    caveats: list[str] = field(default_factory=list[str])
@@ -758,8 +766,37 @@ def identify(image_path: Path, *, check_visible: bool = True, check_invisible: b
        # Only meaningful when the AI verdict actually came from the C2PA source
        # type; a non-C2PA AI signal (IPTC/AIGC/local gen) leaves it None.
        ai_source_kind=c2pa_source_kind if (is_ai and has_c2pa) else None,
+        ai_from_metadata=ai_from_metadata,
        watermarks=watermarks,
        signals=signals,
        caveats=caveats,
        integrity_clashes=clashes,
    )
+
+
+def has_invisible_target(image_path: Path) -> bool:
+    """True when a locally-detectable invisible/metadata AI signal is present.
+
+    The decision gate for the diffusion scrub (``invisible`` / ``all`` / ``batch``):
+    regenerating pixels removes an invisible watermark (SynthID, open DWT-DCT,
+    TrustMark) but degrades a real photo, so it must not run when there is nothing
+    to remove. Runs :func:`identify` with ``check_visible=False`` -- a visible mark
+    is handled by the separate visible pass and is NOT a diffusion target -- and
+    ``check_invisible=True`` so an open watermark counts. Returns
+    ``report.ai_from_metadata`` (C2PA AI issuer / SynthID proxy, IPTC, AIGC, local
+    gen params, EXIF/xAI, open DWT-DCT / TrustMark).
+
+    IMPORTANT -- this cannot prove a pixel SynthID is absent: SynthID is detectable
+    only through its C2PA proxy, so a metadata-stripped AI image reads as no signal
+    here. A False therefore means "no locally-detectable invisible target", not
+    "clean". Callers must NOT present a skip as a finished clean result.
+
+    Fail-safe: any error resolves to True so the removal still runs -- leaving a
+    watermark on a paid removal is worse than over-regenerating a clean image.
+    """
+    try:
+        report = identify(image_path, check_visible=False, check_invisible=True)
+    except Exception:  # detector error (identify raised) -> do not skip the removal
+        log.debug("has_invisible_target: identify failed, defaulting to run", exc_info=True)
+        return True
+    return report.ai_from_metadata
@@ -290,7 +290,7 @@ class TestInvisibleCommand:
        ):
            result = runner.invoke(
                main,
-                ["invisible", str(sample_png), "-o", str(output)],
+                ["invisible", str(sample_png), "-o", str(output), "--force"],
            )
        assert result.exit_code == 0, result.output
        assert output.exists()
@@ -303,7 +303,7 @@ class TestInvisibleCommand:
            patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
            patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
        ):
-            result = runner.invoke(main, ["invisible", str(sample_png)])
+            result = runner.invoke(main, ["invisible", str(sample_png), "--force"])
        assert result.exit_code == 0, result.output
        expected = sample_png.with_stem(sample_png.stem + "_clean")
        assert expected.exists()
@@ -315,7 +315,7 @@ class TestInvisibleCommand:
            patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
            patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
        ):
-            result = runner.invoke(main, ["invisible", str(sample_png)])
+            result = runner.invoke(main, ["invisible", str(sample_png), "--force"])
        assert result.exit_code == 0, result.output
        # adaptive_polish is ON by default (self-gating, so a no-op where not needed).
        assert mock_engine.remove_watermark.call_args.kwargs["adaptive_polish"] is True
@@ -330,7 +330,7 @@ class TestInvisibleCommand:
            patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
            patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
        ):
-            result = runner.invoke(main, ["invisible", str(sample_png), "--no-adaptive-polish"])
+            result = runner.invoke(main, ["invisible", str(sample_png), "--no-adaptive-polish", "--force"])
        assert result.exit_code == 0, result.output
        assert mock_engine.remove_watermark.call_args.kwargs["adaptive_polish"] is False

@@ -343,7 +343,7 @@ class TestInvisibleCommand:
        ):
            result = runner.invoke(
                main,
-                ["invisible", str(sample_png), "--model", "org/custom-sdxl", "--guidance-scale", "5.5"],
+                ["invisible", str(sample_png), "--model", "org/custom-sdxl", "--guidance-scale", "5.5", "--force"],
            )
        assert result.exit_code == 0, result.output
        assert mock_cls.call_args.kwargs["model_id"] == "org/custom-sdxl"
@@ -356,7 +356,7 @@ class TestInvisibleCommand:
            patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
            patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
        ):
-            result = runner.invoke(main, ["invisible", str(sample_png), "--pipeline", "default"])
+            result = runner.invoke(main, ["invisible", str(sample_png), "--pipeline", "default", "--force"])
        assert result.exit_code == 0, result.output
        # The legacy value warns and is normalized to "sdxl" before the engine is built.
        assert "deprecated" in result.output.lower()
@@ -369,7 +369,7 @@ class TestInvisibleCommand:
            patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
            patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
        ):
-            result = runner.invoke(main, ["invisible", str(sample_png), "--pipeline", "sdxl"])
+            result = runner.invoke(main, ["invisible", str(sample_png), "--pipeline", "sdxl", "--force"])
        assert result.exit_code == 0, result.output
        assert "deprecated" not in result.output.lower()
        assert mock_cls.call_args.kwargs["pipeline"] == "sdxl"
@@ -378,6 +378,57 @@ class TestInvisibleCommand:
        result = runner.invoke(main, ["invisible", "/nonexistent/file.png"])
        assert result.exit_code != 0

+    def test_invisible_no_signal_skips_and_exits_two(self, runner, sample_png, tmp_path):
+        """P0#5: when no invisible AI watermark is locally detectable, the diffusion
+        scrub must NOT run (it would only degrade a clean image). Mirrors the visible
+        no-mark contract: write no output, exit 2, and DO NOT imply the image is
+        clean (a stripped SynthID proxy is not proof of absence)."""
+        mock_cls, mock_engine = _mock_invisible_engine()
+        output = tmp_path / "clean.png"
+        with (
+            patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
+            patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
+            patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
+        ):
+            result = runner.invoke(main, ["invisible", str(sample_png), "-o", str(output)])
+        assert result.exit_code == 2, result.output
+        assert not output.exists()
+        mock_engine.remove_watermark.assert_not_called()
+        assert "--force" in result.output
+        assert "SynthID" in result.output  # the message must preserve removal uncertainty
+
+    def test_invisible_force_runs_scrub_on_no_signal(self, runner, sample_png, tmp_path):
+        """--force overrides the no-signal skip: the scrub runs regardless."""
+        mock_cls, mock_engine = _mock_invisible_engine()
+        output = tmp_path / "clean.png"
+        with (
+            patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
+            patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
+            patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
+        ):
+            result = runner.invoke(main, ["invisible", str(sample_png), "-o", str(output), "--force"])
+        assert result.exit_code == 0, result.output
+        mock_engine.remove_watermark.assert_called_once()
+
+    def test_invisible_runs_without_force_when_signal_present(self, runner, tmp_path):
+        """An image carrying an AI metadata signal IS a scrub target, so the run
+        proceeds with no --force needed."""
+        img = Image.fromarray(np.random.default_rng(0).integers(0, 255, (200, 200, 3), dtype=np.uint8))
+        pnginfo = PngInfo()
+        pnginfo.add_text("parameters", "Steps: 20, Sampler: Euler, a test landscape")
+        src = tmp_path / "ai.png"
+        img.save(src, pnginfo=pnginfo)
+        output = tmp_path / "clean.png"
+        mock_cls, mock_engine = _mock_invisible_engine()
+        with (
+            patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
+            patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
+            patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
+        ):
+            result = runner.invoke(main, ["invisible", str(src), "-o", str(output)])
+        assert result.exit_code == 0, result.output
+        mock_engine.remove_watermark.assert_called_once()
+

 class TestAllCommand:
    """Tests for the 'all' subcommand (full pipeline)."""
@@ -397,7 +448,7 @@ class TestAllCommand:
        ):
            result = runner.invoke(
                main,
-                ["all", str(sample_png), "-o", str(output)],
+                ["all", str(sample_png), "-o", str(output), "--force"],
            )
        assert result.exit_code == 0, result.output
        assert output.exists()
@@ -418,10 +469,28 @@ class TestAllCommand:
            patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
            patch("remove_ai_watermarks.watermark_registry.best_auto_mark", return_value=None) as mock_best,
        ):
-            result = runner.invoke(main, ["all", str(sample_png), "-o", str(output)])
+            result = runner.invoke(main, ["all", str(sample_png), "-o", str(output), "--force"])
        assert result.exit_code == 0, result.output
        mock_best.assert_called()  # the registry auto-detector drove the visible pass

+    def test_all_skips_invisible_on_no_signal_but_succeeds(self, runner, sample_png, tmp_path):
+        """P0#5: with no detectable invisible watermark and no --force, `all` skips
+        the destructive step 2 (pixels left intact) but STILL succeeds (exit 0) --
+        visible removal + metadata strip ran and a file is written. Distinct from the
+        GPU-missing skip, which is a non-zero failure."""
+        mock_cls, mock_engine = _mock_invisible_engine()
+        output = tmp_path / "clean.png"
+        with (
+            patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
+            patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
+            patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
+        ):
+            result = runner.invoke(main, ["all", str(sample_png), "-o", str(output)])
+        assert result.exit_code == 0, result.output
+        assert output.exists()
+        mock_engine.remove_watermark.assert_not_called()
+        assert "Skipped (no invisible" in result.output
+
    def test_all_loud_warning_and_nonzero_exit_when_gpu_missing(self, runner, sample_png, tmp_path):
        """Regression (#14/#47): when the GPU extra is absent the invisible step is
        skipped, but the output still looks processed -- the run must fail loudly
@@ -453,7 +522,7 @@ class TestAllCommand:
            patch("remove_ai_watermarks.cli.invisible_available", return_value=True, create=True),
            patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
        ):
-            result = runner.invoke(main, ["all", str(src), "-o", str(output)])
+            result = runner.invoke(main, ["all", str(src), "-o", str(output), "--force"])

        assert result.exit_code == 0, result.output
        out = cv2.imread(str(output), cv2.IMREAD_UNCHANGED)
@@ -580,6 +649,26 @@ class TestBatchCommand:
        input_dir = _make_batch_dir(tmp_path)
        output_dir = tmp_path / "output"
        mock_cls, _mock_engine = _mock_invisible_engine()
+        with (
+            patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
+            patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
+            patch("remove_ai_watermarks.cli.invisible_available", return_value=True, create=True),
+            patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True),
+        ):
+            result = runner.invoke(
+                main,
+                ["batch", str(input_dir), "-o", str(output_dir), "--mode", "invisible", "--force"],
+            )
+        assert result.exit_code == 0, result.output
+        assert "3 processed" in result.output
+
+    def test_batch_invisible_skips_no_signal_and_copies_through(self, runner, tmp_path):
+        """P0#5: batch invisible mode skips the scrub on signal-less images (no
+        --force) and copies the input through, so the output dir is complete with the
+        pixels left intact and the engine never called."""
+        input_dir = _make_batch_dir(tmp_path)
+        output_dir = tmp_path / "output"
+        mock_cls, mock_engine = _mock_invisible_engine()
        with (
            patch("remove_ai_watermarks.cli.InvisibleEngine", mock_cls, create=True),
            patch("remove_ai_watermarks.invisible_engine.InvisibleEngine", mock_cls),
@@ -592,6 +681,8 @@ class TestBatchCommand:
            )
        assert result.exit_code == 0, result.output
        assert "3 processed" in result.output
+        assert len(list(output_dir.glob("*.png"))) == 3  # inputs copied through
+        mock_engine.remove_watermark.assert_not_called()

    def test_batch_all_mode(self, runner, tmp_path):
        input_dir = _make_batch_dir(tmp_path)
@@ -605,7 +696,7 @@ class TestBatchCommand:
        ):
            result = runner.invoke(
                main,
-                ["batch", str(input_dir), "-o", str(output_dir), "--mode", "all"],
+                ["batch", str(input_dir), "-o", str(output_dir), "--mode", "all", "--force"],
            )
        assert result.exit_code == 0, result.output
        assert "3 processed" in result.output
@@ -631,7 +722,7 @@ class TestBatchCommand:
        ):
            result = runner.invoke(
                main,
-                ["batch", str(input_dir), "-o", str(output_dir), "--mode", "all"],
+                ["batch", str(input_dir), "-o", str(output_dir), "--mode", "all", "--force"],
            )
        assert result.exit_code == 0, result.output

@@ -655,7 +746,7 @@ class TestBatchCommand:
        ):
            result = runner.invoke(
                main,
-                ["batch", str(input_dir), "-o", str(output_dir), "--mode", "invisible", "--auto"],
+                ["batch", str(input_dir), "-o", str(output_dir), "--mode", "invisible", "--auto", "--force"],
            )
        assert result.exit_code == 0, result.output
        assert "2 processed" in result.output
@@ -22,6 +22,7 @@ from remove_ai_watermarks.identify import (
    _integrity_clashes,
    _issuers_in,
    _vendor_of,
+    has_invisible_target,
    identify,
 )
 from remove_ai_watermarks.watermark_registry import GEMINI_SPARKLE_TRUST_CONF
@@ -292,6 +293,19 @@ class TestIdentifyRealSamples:
        assert r.confidence == "none"
        assert r.watermarks == []

+    def test_has_invisible_target_true_on_metadata_ai(self):
+        # The scrub gate: a C2PA/SynthID image and an IPTC "Made with AI" image are
+        # both invisible/metadata targets, so the diffusion scrub should run.
+        assert has_invisible_target(SAMPLES_DIR / "chatgpt-1.png") is True
+        assert has_invisible_target(SAMPLES_DIR / "mj-1.png") is True
+        # ai_from_metadata mirrors confidence == "high" and backs the helper.
+        assert identify(SAMPLES_DIR / "chatgpt-1.png", check_visible=False).ai_from_metadata is True
+
+    def test_has_invisible_target_false_on_clean_photo(self, clean_photo: Path):
+        # No detectable invisible signal -> skip the scrub (do not degrade a clean image).
+        assert has_invisible_target(clean_photo) is False
+        assert identify(clean_photo, check_visible=False).ai_from_metadata is False
+
    def test_strip_caveat_always_present(self, clean_photo: Path):
        r = identify(clean_photo, check_visible=False)
        assert any("not proof" in c for c in r.caveats)
@@ -300,6 +314,30 @@ class TestIdentifyRealSamples:
        assert isinstance(identify(SAMPLES_DIR / "firefly-1.png", check_visible=False), ProvenanceReport)


+class TestHasInvisibleTargetFailSafe:
+    """The scrub gate fails SAFE: when a detector errors, it runs the removal."""
+
+    def test_detector_error_defaults_to_run(self, tmp_path: Path):
+        # If identify raises (a detector crash), the gate must return True so the
+        # caller still attempts removal -- leaving a watermark on a paid removal is
+        # worse than over-regenerating. (Garbage bytes do NOT raise; identify returns
+        # a clean None verdict there, so that path correctly skips -- see below.)
+        bad = tmp_path / "x.png"
+        bad.write_bytes(b"not image bytes")
+        with patch("remove_ai_watermarks.identify.identify", side_effect=RuntimeError("boom")):
+            assert has_invisible_target(bad) is True
+
+    def test_unreadable_bytes_are_not_a_target(self, tmp_path: Path):
+        # No raise, no signal -> not a scrub target (the CLI rejects undecodable
+        # images earlier anyway; this only documents the gate's own verdict).
+        bad = tmp_path / "x.png"
+        bad.write_bytes(b"not image bytes")
+        assert has_invisible_target(bad) is False
+
+    def test_local_ai_params_are_a_target(self, tmp_png_with_ai_metadata: Path):
+        assert has_invisible_target(tmp_png_with_ai_metadata) is True
+
+
 # ── Local diffusion parameters (Stable Diffusion / ComfyUI) ─────────