fix(cli): preserve SynthID uncertainty in no-visible-mark message

The 'no signal' branch of the visible no-mark path claimed 'No AI provenance signal found either', which reads as 'the image is clean'. A missing metadata proxy is not proof that an invisible pixel watermark (SynthID) is absent: the watermark cannot be detected once the metadata is gone, and the metadata may have been stripped upstream. The message now preserves that uncertainty and routes to both 'all' (regenerate pixels) and 'erase'. Regression-guarded by the SynthID/all asserts in test_cli.py. CLAUDE.md visible-command note updated to match.

Also adds a 'Scope and non-goals' section (CLAUDE.md + README): removing AI-provenance marks on the user's own content is in scope; stripping stock/paid-content watermarks (Shutterstock/Getty/iStock, classifieds) is out of scope by principle, not by difficulty.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 07:57:50 +02:00 · 2026-06-13 19:30:49 -07:00
parent d8cdc9f478
commit 41a2af2ecb
4 changed files with 31 additions and 3 deletions
@@ -2,11 +2,24 @@

 You are a **principal Python engineer** maintaining a CLI tool and library for removing visible and invisible AI watermarks from images.

+## Scope and non-goals
+
+The mission is removing **AI-provenance watermarks** that a platform stamps onto content the user generated themselves — SynthID, the Gemini / Nano Banana sparkle, the Doubao / Jimeng / Samsung visible AI labels, the Chinese TC260 "由…AI生成" label, and C2PA / IPTC / EXIF "Made with AI" metadata. The point is user autonomy over their own generated output.
+
+It deliberately does **not** remove watermarks that protect someone else's paid or copyrighted content — stock-agency overlays (Shutterstock, Getty, iStock, Adobe Stock), classifieds-site marks, or any tiled / diagonal "preview" watermark whose job is to gate a purchase. Stripping those makes a paid resource free off someone else's work; out of scope **by principle, not by technical difficulty**. The line: a visible mark is in scope when it labels the user's **own** AI generation, and out of scope when it protects a **third party's paid asset**.
+
+Consequences for contributors (do not drift back into the stock niche just because it is technically feasible):
+- Do not add stock / agency / classifieds watermark removal to `watermark_registry.py` or the eraser, and do not build tiled-overlay or multi-image watermark-estimation features aimed at them.
+- `erase --region` stays a generic **user-driven** tool (the user points at their own object); do not ship an *automatic* stock-watermark detector/remover on top of it.
+- New visible-mark templates are for **AI-generation labels only**.
+
+(Established 2026-06-13 by user instruction, translated from Russian: "I am trying to make paid resources free; that is not what we are fighting against.")
+
 ## How to run

 - `uv run remove-ai-watermarks all <image.png> -o <output.png>` — full pipeline (visible + invisible + metadata). Same diffusion knobs as `invisible` below, plus the visible-pass `--inpaint/--no-inpaint`/`--inpaint-method`. **When the `[gpu]` extra is absent, step 2 (invisible/SynthID) is skipped** — `all` still writes an output (visible mark + metadata stripped) but prints a prominent end-of-run banner ("the invisible (SynthID) watermark was NOT removed") AND exits **non-zero** (1), so a skipped SynthID pass is not mistaken for a clean result (the recurring #14/#47 trap, where the old quiet inline warning was missed). `invisible` already hard-errors without the extra; only `all` continued, hence the loud end-banner. Regression-guarded by `tests/test_cli.py::TestAllCommand::test_all_loud_warning_and_nonzero_exit_when_gpu_missing`. **Test trap:** any `all` test that exercises the full pipeline MUST `patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True)` — CI installs core+dev only (no `[gpu]`), so an unpatched `all` test takes the skip branch and now hits the non-zero exit. This passed locally (gpu present → `is_available()` True) but red-failed every matrix cell on the v0.11.0 commit (`test_all_basic`/`test_all_visible_step_uses_registry` asserted exit 0); both now patch `is_available` True.
 - `uv run remove-ai-watermarks invisible <image.png> -o <out.png>` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet` (default `controlnet`), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), and `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default).
- `uv run remove-ai-watermarks visible <image.png> -o <out.png>` — known-visible-mark removal, CPU, no GPU. Reverse-alpha based: each mark is removed by inverting its captured alpha map. `--mark auto` (default) picks the strongest detected of the Gemini sparkle, the Doubao "豆包AI生成" text strip, the Jimeng "★ 即梦AI" wordmark, and the Samsung Galaxy AI "✦ Contenuti generati dall'AI" strip (bottom-LEFT, locale-specific — Italian variant calibrated); `--mark gemini` / `--mark doubao` / `--mark jimeng` / `--mark samsung` force one (choices come from the registry). Gemini/Doubao recover pixels exactly with no inpaint at native; **Jimeng and Samsung add an always-on thin residual inpaint over the glyph footprint** (their marks re-rasterize per image, so reverse-alpha alone leaves a faint outline). For arbitrary logos/objects use `erase`. **When `--mark auto` finds no known mark (the common case — ~74% of real uploads carry no registered visible mark), the command does NOT silently re-serve the input as a finished result.** It runs a cheap metadata-only `identify`, prints actionable guidance (if the image carries an invisible/metadata mark, e.g. an OpenAI/Gemini C2PA image, it points to `all`; otherwise to `erase --region`), writes NO output file, and exits **`EXIT_NO_VISIBLE_MARK` (2)** — distinct from success (0) and a hard error (1) so a wrapping service (raiw.cc) can surface the message instead of treating the unchanged image as done (the production "it didn't work" / score-0 trap). Same handling for an explicit `--mark <name>` that is not detected. Helper `cli._no_visible_mark_exit`; regression-guarded by `tests/test_cli.py::TestVisibleCommand::test_visible_auto_no_mark_exits_two_with_eraser_hint` and `test_visible_auto_no_mark_routes_to_all_when_metadata`. `--no-detect` still forces the gemini fallback and proceeds (exit 0).
+- `uv run remove-ai-watermarks visible <image.png> -o <out.png>` — known-visible-mark removal, CPU, no GPU. Reverse-alpha based: each mark is removed by inverting its captured alpha map. `--mark auto` (default) picks the strongest detected of the Gemini sparkle, the Doubao "豆包AI生成" text strip, the Jimeng "★ 即梦AI" wordmark, and the Samsung Galaxy AI "✦ Contenuti generati dall'AI" strip (bottom-LEFT, locale-specific — Italian variant calibrated); `--mark gemini` / `--mark doubao` / `--mark jimeng` / `--mark samsung` force one (choices come from the registry). Gemini/Doubao recover pixels exactly with no inpaint at native; **Jimeng and Samsung add an always-on thin residual inpaint over the glyph footprint** (their marks re-rasterize per image, so reverse-alpha alone leaves a faint outline). For arbitrary logos/objects use `erase`. **When `--mark auto` finds no known mark (the common case — ~74% of real uploads carry no registered visible mark), the command does NOT silently re-serve the input as a finished result.** It runs a cheap metadata-only `identify`, prints actionable guidance (if the image carries an invisible/metadata mark, e.g. an OpenAI/Gemini C2PA image, it points to `all`; otherwise it does NOT imply the image is clean -- it warns that an invisible pixel watermark like SynthID cannot be detected once the metadata proxy is gone and routes to both `all` and `erase --region`), writes NO output file, and exits **`EXIT_NO_VISIBLE_MARK` (2)** — distinct from success (0) and a hard error (1) so a wrapping service (raiw.cc) can surface the message instead of treating the unchanged image as done (the production "it didn't work" / score-0 trap). Same handling for an explicit `--mark <name>` that is not detected. Helper `cli._no_visible_mark_exit`; regression-guarded by `tests/test_cli.py::TestVisibleCommand::test_visible_auto_no_mark_exits_two_with_eraser_hint` and `test_visible_auto_no_mark_routes_to_all_when_metadata`. `--no-detect` still forces the gemini fallback and proceeds (exit 0).
 - `uv run remove-ai-watermarks erase <image.png> --region x,y,w,h -o <out.png>` — universal region eraser (any logo/object, any position). `--backend cv2` (default, no deps) or `--backend lama` (big-LaMa via onnxruntime, extra `lama`); `--region` is repeatable.
 - `uv run remove-ai-watermarks identify <image>` — provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector
 - `uv run remove-ai-watermarks metadata <image.png> --check` — inspect AI metadata (C2PA, EXIF, PNG chunks)
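
The three-way exit contract that the `visible` note describes (0 success, 1 hard error, 2 no visible mark) could be consumed by a wrapper roughly as follows. This is a minimal sketch, not the project's code: `VisibleResult`, `interpret_visible_exit`, `EXIT_OK`, and `EXIT_ERROR` are hypothetical names introduced for illustration; only `EXIT_NO_VISIBLE_MARK` and the numeric values come from the note.

```python
from dataclasses import dataclass

# Numeric values follow the CLAUDE.md note. EXIT_OK and EXIT_ERROR are
# hypothetical labels for this sketch; EXIT_NO_VISIBLE_MARK is the name
# used in the note.
EXIT_OK = 0
EXIT_ERROR = 1
EXIT_NO_VISIBLE_MARK = 2


@dataclass
class VisibleResult:
    status: str         # "done", "no_visible_mark", or "error"
    serve_output: bool  # True only when an output file should be served
    message: str        # CLI text to surface to the user


def interpret_visible_exit(returncode: int, cli_output: str) -> VisibleResult:
    """Map a `visible` exit code to what a wrapper should do next."""
    if returncode == EXIT_OK:
        return VisibleResult("done", True, cli_output)
    if returncode == EXIT_NO_VISIBLE_MARK:
        # No output file was written, so surface the guidance instead of
        # re-serving the unchanged input as a finished result.
        return VisibleResult("no_visible_mark", False, cli_output)
    # Anything else is treated as a hard error in this sketch.
    return VisibleResult("error", False, cli_output)
```

A wrapper would typically obtain `returncode` and the captured text from `subprocess.run([...], capture_output=True, text=True)` and pass them in. Keeping the mapping in a pure function makes it testable without invoking the CLI.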
@@ -19,6 +19,12 @@ If this tool saves you time, consider [sponsoring its development](https://githu

 > **Intended for lawful use only.** Publishing and running this software is lawful; responsibility for any downstream use, and for compliance with local law, rests entirely with the user. Some jurisdictions restrict removing an AI label as such (see [Legal](#legal)). The authors do not condone use for deception, fraud, or any unlawful activity.

+## Scope
+
+This tool removes **AI-provenance watermarks** that a platform stamps onto content **you generated yourself** — SynthID, the Gemini / Nano Banana sparkle, the Doubao / Jimeng / Samsung visible AI labels, the Chinese TC260 "由…AI生成" label, and C2PA / IPTC / EXIF "Made with AI" metadata. The point is your autonomy over your own output.
+
+It does **not** target watermarks that protect someone else's paid or copyrighted content — stock-agency overlays (Shutterstock, Getty, iStock, Adobe Stock), classifieds-site marks, or any tiled "preview" watermark whose job is to gate a purchase. Removing those is out of scope by design. `erase` is a generic, user-driven region tool for your own objects, not an automatic stock-watermark remover.
+
 ## Features

 - **Visible watermark removal** — a registry of known marks in their usual places: the Gemini / Nano Banana sparkle, the Doubao "豆包AI生成" text strip, the Jimeng "★ 即梦AI" wordmark, and the Samsung Galaxy AI "✦ Contenuti generati dall'AI" strip (bottom-left, locale-specific). Each is removed by **reverse-alpha blending** against a captured alpha map (`original = (wm − α·logo)/(1−α)`), recovering the true pixels rather than inpainting a guess. The Gemini sparkle recovers cleanly on its own on bright backgrounds; it adapts the alpha to each image's sparkle opacity, so a more-opaque-than-captured sparkle is still fully removed (and on a dark background, where the fixed alpha would over-subtract and leave a dark spot, it automatically inpaints the small sparkle footprint instead); the Doubao, Jimeng, and Samsung text marks re-rasterize slightly per image, so a thin residual inpaint over the glyph footprint clears the leftover edges (the alpha maps are reproducibly rebuilt from controlled captures by `scripts/visible_alpha_solve.py`). Fast, offline, no GPU. `visible --mark auto` finds and removes the strongest detected mark. (For arbitrary logos/objects, see `erase`.)
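
The reverse-alpha formula quoted in the feature description can be illustrated on a single channel value. This is a sketch under stated assumptions, not the tool's implementation: `blend` and `reverse_alpha` are hypothetical helpers, the forward model `wm = α·logo + (1−α)·original` is the standard alpha composite that the quoted formula inverts, and the `max_alpha` guard and 0..255 clamp are choices made for this example.

```python
def blend(original: float, logo: float, alpha: float) -> float:
    """Forward model (assumed): composite a logo value over a pixel value."""
    return alpha * logo + (1.0 - alpha) * original


def reverse_alpha(wm: float, logo: float, alpha: float,
                  max_alpha: float = 0.99) -> float:
    """Invert the composite: original = (wm - alpha*logo) / (1 - alpha)."""
    # Guard against division by zero for a fully opaque mark; this cap is
    # an assumption of the sketch, not a documented behaviour of the tool.
    a = min(alpha, max_alpha)
    recovered = (wm - a * logo) / (1.0 - a)
    # Clamp to the 8-bit range so over-subtraction cannot go negative.
    return min(255.0, max(0.0, recovered))


# Round trip: with the correct alpha the original value is recovered.
watermarked = blend(original=100.0, logo=255.0, alpha=0.5)
restored = reverse_alpha(watermarked, logo=255.0, alpha=0.5)
```

When the alpha used for inversion is larger than the alpha actually applied, the subtraction overshoots and the clamp bottoms out at 0, which is one way to picture the dark-spot case the description mentions for dark backgrounds.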
@@ -349,8 +349,12 @@ def _no_visible_mark_exit(source: Path) -> NoReturn:
        )
    else:
        console.print(
-            "  No AI provenance signal found either. If there is a logo or object to remove,\n"
-            "  target it directly with the region eraser:\n"
+            "  No visible mark and no readable AI provenance signal. This does not prove\n"
+            "  the image is clean: an invisible pixel watermark such as SynthID cannot be\n"
+            "  detected here once the metadata proxy is absent (it may have been stripped\n"
+            "  earlier). If the image is AI-generated, regenerate the pixels with:\n"
+            f"    remove-ai-watermarks all {source.name}\n"
+            "  If instead there is a logo or object to remove, target it with the region eraser:\n"
            f"    remove-ai-watermarks erase {source.name} --region x,y,w,h"
        )
    raise SystemExit(EXIT_NO_VISIBLE_MARK)
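
One way to make the new wording easier to pin down in tests is to build the text in a pure function and assert on the full command strings. The sketch below is an assumption, not part of this commit: `build_no_signal_message` is a hypothetical helper that reuses the message text from the hunk above.

```python
def build_no_signal_message(source_name: str) -> str:
    """Return the 'no visible mark, no readable signal' guidance text."""
    return (
        "  No visible mark and no readable AI provenance signal. This does not prove\n"
        "  the image is clean: an invisible pixel watermark such as SynthID cannot be\n"
        "  detected here once the metadata proxy is absent (it may have been stripped\n"
        "  earlier). If the image is AI-generated, regenerate the pixels with:\n"
        f"    remove-ai-watermarks all {source_name}\n"
        "  If instead there is a logo or object to remove, target it with the region eraser:\n"
        f"    remove-ai-watermarks erase {source_name} --region x,y,w,h"
    )


message = build_no_signal_message("photo.png")

# Checking whole command lines is stricter than checking for the bare
# substring "all", which can also occur inside unrelated words.
routes_to_all = "remove-ai-watermarks all photo.png" in message
routes_to_erase = "remove-ai-watermarks erase photo.png --region" in message
```

A bare `"all" in output` check would also pass on words such as "fallback" or "install", so matching the full `remove-ai-watermarks all <name>` line gives a tighter regression guard.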
@@ -135,6 +135,11 @@ class TestVisibleCommand:
        assert result.exit_code == 2, result.output
        assert not output.exists()
        assert "erase" in result.output
+        # The "no signal" branch must NOT imply the image is clean: a missing
+        # metadata proxy is not proof an invisible pixel watermark (SynthID) is
+        # absent, so the message preserves that uncertainty and routes to 'all'.
+        assert "SynthID" in result.output
+        assert "all" in result.output

    def test_visible_auto_no_mark_routes_to_all_when_metadata(self, runner, tmp_path):
        # An image whose only signal is an invisible/metadata watermark (here SD