diff --git a/CLAUDE.md b/CLAUDE.md
index 339d2c2..38e95bb 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -2,11 +2,24 @@

 You are a **principal Python engineer** maintaining a CLI tool and library for removing visible and invisible AI watermarks from images.

+## Scope and non-goals
+
+The mission is removing **AI-provenance watermarks** that a platform stamps onto content the user generated themselves — SynthID, the Gemini / Nano Banana sparkle, the Doubao / Jimeng / Samsung visible AI labels, the Chinese TC260 "由…AI生成" label, and C2PA / IPTC / EXIF "Made with AI" metadata. The point is user autonomy over their own generated output.
+
+It deliberately does **not** remove watermarks that protect someone else's paid or copyrighted content — stock-agency overlays (Shutterstock, Getty, iStock, Adobe Stock), classifieds-site marks, or any tiled / diagonal "preview" watermark whose job is to gate a purchase. Stripping those makes a paid resource free off someone else's work; out of scope **by principle, not by technical difficulty**. The line: a visible mark is in scope when it labels the user's **own** AI generation, and out of scope when it protects a **third party's paid asset**.
+
+Consequences for contributors (do not drift back into the stock niche just because it is technically feasible):
+- Do not add stock / agency / classifieds watermark removal to `watermark_registry.py` or the eraser, and do not build tiled-overlay or multi-image watermark-estimation features aimed at them.
+- `erase --region` stays a generic **user-driven** tool (the user points at their own object); do not ship an *automatic* stock-watermark detector/remover on top of it.
+- New visible-mark templates are for **AI-generation labels only**.
+
+(Established 2026-06-13 by user instruction: "I am trying to make paid resources free; that is not what we are fighting against.")
+
 ## How to run

 - `uv run remove-ai-watermarks all -o ` — full pipeline (visible + invisible + metadata). Same diffusion knobs as `invisible` below, plus the visible-pass `--inpaint/--no-inpaint`/`--inpaint-method`. **When the `[gpu]` extra is absent, step 2 (invisible/SynthID) is skipped** — `all` still writes an output (visible mark + metadata stripped) but prints a prominent end-of-run banner ("the invisible (SynthID) watermark was NOT removed") AND exits **non-zero** (1), so a skipped SynthID pass is not mistaken for a clean result (the recurring #14/#47 trap, where the old quiet inline warning was missed). `invisible` already hard-errors without the extra; only `all` continued, hence the loud end-banner. Regression-guarded by `tests/test_cli.py::TestAllCommand::test_all_loud_warning_and_nonzero_exit_when_gpu_missing`. **Test trap:** any `all` test that exercises the full pipeline MUST `patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True)` — CI installs core+dev only (no `[gpu]`), so an unpatched `all` test takes the skip branch and now hits the non-zero exit. This passed locally (gpu present → `is_available()` True) but red-failed every matrix cell on the v0.11.0 commit (`test_all_basic`/`test_all_visible_step_uses_registry` asserted exit 0); both now patch `is_available` True.
 - `uv run remove-ai-watermarks invisible -o ` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet` (default `controlnet`), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), and `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default).
-- `uv run remove-ai-watermarks visible -o ` — known-visible-mark removal, CPU, no GPU. Reverse-alpha based: each mark is removed by inverting its captured alpha map. `--mark auto` (default) picks the strongest detected of the Gemini sparkle, the Doubao "豆包AI生成" text strip, the Jimeng "★ 即梦AI" wordmark, and the Samsung Galaxy AI "✦ Contenuti generati dall'AI" strip (bottom-LEFT, locale-specific — Italian variant calibrated); `--mark gemini` / `--mark doubao` / `--mark jimeng` / `--mark samsung` force one (choices come from the registry). Gemini/Doubao recover pixels exactly with no inpaint at native; **Jimeng and Samsung add an always-on thin residual inpaint over the glyph footprint** (their marks re-rasterize per image, so reverse-alpha alone leaves a faint outline). For arbitrary logos/objects use `erase`. **When `--mark auto` finds no known mark (the common case — ~74% of real uploads carry no registered visible mark), the command does NOT silently re-serve the input as a finished result.** It runs a cheap metadata-only `identify`, prints actionable guidance (if the image carries an invisible/metadata mark, e.g. an OpenAI/Gemini C2PA image, it points to `all`; otherwise to `erase --region`), writes NO output file, and exits **`EXIT_NO_VISIBLE_MARK` (2)** — distinct from success (0) and a hard error (1) so a wrapping service (raiw.cc) can surface the message instead of treating the unchanged image as done (the production "it didn't work" / score-0 trap). Same handling for an explicit `--mark ` that is not detected. Helper `cli._no_visible_mark_exit`; regression-guarded by `tests/test_cli.py::TestVisibleCommand::test_visible_auto_no_mark_exits_two_with_eraser_hint` and `test_visible_auto_no_mark_routes_to_all_when_metadata`. `--no-detect` still forces the gemini fallback and proceeds (exit 0).
+- `uv run remove-ai-watermarks visible -o ` — known-visible-mark removal, CPU, no GPU. Reverse-alpha based: each mark is removed by inverting its captured alpha map. `--mark auto` (default) picks the strongest detected of the Gemini sparkle, the Doubao "豆包AI生成" text strip, the Jimeng "★ 即梦AI" wordmark, and the Samsung Galaxy AI "✦ Contenuti generati dall'AI" strip (bottom-LEFT, locale-specific — Italian variant calibrated); `--mark gemini` / `--mark doubao` / `--mark jimeng` / `--mark samsung` force one (choices come from the registry). Gemini/Doubao recover pixels exactly with no inpaint at native; **Jimeng and Samsung add an always-on thin residual inpaint over the glyph footprint** (their marks re-rasterize per image, so reverse-alpha alone leaves a faint outline). For arbitrary logos/objects use `erase`. **When `--mark auto` finds no known mark (the common case — ~74% of real uploads carry no registered visible mark), the command does NOT silently re-serve the input as a finished result.** It runs a cheap metadata-only `identify`, prints actionable guidance (if the image carries an invisible/metadata mark, e.g. an OpenAI/Gemini C2PA image, it points to `all`; otherwise it does NOT imply the image is clean -- it warns that an invisible pixel watermark like SynthID cannot be detected once the metadata proxy is gone and routes to both `all` and `erase --region`), writes NO output file, and exits **`EXIT_NO_VISIBLE_MARK` (2)** — distinct from success (0) and a hard error (1) so a wrapping service (raiw.cc) can surface the message instead of treating the unchanged image as done (the production "it didn't work" / score-0 trap). Same handling for an explicit `--mark ` that is not detected. Helper `cli._no_visible_mark_exit`; regression-guarded by `tests/test_cli.py::TestVisibleCommand::test_visible_auto_no_mark_exits_two_with_eraser_hint` and `test_visible_auto_no_mark_routes_to_all_when_metadata`. `--no-detect` still forces the gemini fallback and proceeds (exit 0).
 - `uv run remove-ai-watermarks erase --region x,y,w,h -o ` — universal region eraser (any logo/object, any position). `--backend cv2` (default, no deps) or `--backend lama` (big-LaMa via onnxruntime, extra `lama`); `--region` is repeatable.
 - `uv run remove-ai-watermarks identify ` — provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector
 - `uv run remove-ai-watermarks metadata --check` — inspect AI metadata (C2PA, EXIF, PNG chunks)
diff --git a/README.md b/README.md
index 2a14436..d0e7970 100644
--- a/README.md
+++ b/README.md
@@ -19,6 +19,12 @@ If this tool saves you time, consider [sponsoring its development](https://githu

 > **Intended for lawful use only.** Publishing and running this software is lawful; responsibility for any downstream use, and for compliance with local law, rests entirely with the user. Some jurisdictions restrict removing an AI label as such (see [Legal](#legal)). The authors do not condone use for deception, fraud, or any unlawful activity.

+## Scope
+
+This tool removes **AI-provenance watermarks** that a platform stamps onto content **you generated yourself** — SynthID, the Gemini / Nano Banana sparkle, the Doubao / Jimeng / Samsung visible AI labels, the Chinese TC260 "由…AI生成" label, and C2PA / IPTC / EXIF "Made with AI" metadata. The point is your autonomy over your own output.
+
+It does **not** target watermarks that protect someone else's paid or copyrighted content — stock-agency overlays (Shutterstock, Getty, iStock, Adobe Stock), classifieds-site marks, or any tiled "preview" watermark whose job is to gate a purchase. Removing those is out of scope by design. `erase` is a generic, user-driven region tool for your own objects, not an automatic stock-watermark remover.
+
 ## Features

 - **Visible watermark removal** — a registry of known marks in their usual places: the Gemini / Nano Banana sparkle, the Doubao "豆包AI生成" text strip, the Jimeng "★ 即梦AI" wordmark, and the Samsung Galaxy AI "✦ Contenuti generati dall'AI" strip (bottom-left, locale-specific). Each is removed by **reverse-alpha blending** against a captured alpha map (`original = (wm − α·logo)/(1−α)`), recovering the true pixels rather than inpainting a guess. The Gemini sparkle recovers cleanly on its own on bright backgrounds; it adapts the alpha to each image's sparkle opacity, so a more-opaque-than-captured sparkle is still fully removed (and on a dark background, where the fixed alpha would over-subtract and leave a dark spot, it automatically inpaints the small sparkle footprint instead); the Doubao, Jimeng, and Samsung text marks re-rasterize slightly per image, so a thin residual inpaint over the glyph footprint clears the leftover edges (the alpha maps are reproducibly rebuilt from controlled captures by `scripts/visible_alpha_solve.py`). Fast, offline, no GPU. `visible --mark auto` finds and removes the strongest detected mark. (For arbitrary logos/objects, see `erase`.)
diff --git a/src/remove_ai_watermarks/cli.py b/src/remove_ai_watermarks/cli.py
index f90b5c7..68a1e7b 100644
--- a/src/remove_ai_watermarks/cli.py
+++ b/src/remove_ai_watermarks/cli.py
@@ -349,8 +349,12 @@ def _no_visible_mark_exit(source: Path) -> NoReturn:
         )
     else:
         console.print(
-            " No AI provenance signal found either. If there is a logo or object to remove,\n"
-            " target it directly with the region eraser:\n"
+            " No visible mark and no readable AI provenance signal. This does not prove\n"
+            " the image is clean: an invisible pixel watermark such as SynthID cannot be\n"
+            " detected here once the metadata proxy is absent (it may have been stripped\n"
+            " earlier). If the image is AI-generated, regenerate the pixels with:\n"
+            f" remove-ai-watermarks all {source.name}\n"
+            " If instead there is a logo or object to remove, target it with the region eraser:\n"
             f" remove-ai-watermarks erase {source.name} --region x,y,w,h"
         )
     raise SystemExit(EXIT_NO_VISIBLE_MARK)
diff --git a/tests/test_cli.py b/tests/test_cli.py
index 9da3f6f..40db2ff 100644
--- a/tests/test_cli.py
+++ b/tests/test_cli.py
@@ -135,6 +135,11 @@ class TestVisibleCommand:
         assert result.exit_code == 2, result.output
         assert not output.exists()
         assert "erase" in result.output
+        # The "no signal" branch must NOT imply the image is clean: a missing
+        # metadata proxy is not proof an invisible pixel watermark (SynthID) is
+        # absent, so the message preserves that uncertainty and routes to 'all'.
+        assert "SynthID" in result.output
+        assert "all" in result.output

     def test_visible_auto_no_mark_routes_to_all_when_metadata(self, runner, tmp_path):
         # An image whose only signal is an invisible/metadata watermark (here SD