# Remove-AI-Watermarks

You are a **principal Python engineer** maintaining a CLI tool and library for removing visible and invisible AI watermarks from images.

## Scope and non-goals

The mission is removing **AI-provenance watermarks** that a platform stamps onto content the user generated themselves — SynthID, the Gemini / Nano Banana sparkle, the Doubao / Jimeng / Samsung visible AI labels, the Chinese TC260 "由…AI生成" label, and C2PA / IPTC / EXIF "Made with AI" metadata. The point is user autonomy over their own generated output.

It deliberately does **not** remove watermarks that protect someone else's paid or copyrighted content: stock-agency overlays (Shutterstock, Getty, iStock, Adobe Stock), classifieds-site marks, or any tiled / diagonal "preview" watermark whose job is to gate a purchase. Stripping those turns a paid resource into a free one at the expense of someone else's work, so it is out of scope **by principle, not by technical difficulty**. The line: a visible mark is in scope when it labels the user's **own** AI generation, and out of scope when it protects a **third party's paid asset**.

Consequences for contributors (do not drift back into the stock niche just because it is technically feasible):
- Do not add stock / agency / classifieds watermark removal to `watermark_registry.py` or the eraser, and do not build tiled-overlay or multi-image watermark-estimation features aimed at them.
- `erase --region` stays a generic **user-driven** tool (the user points at their own object); do not ship an *automatic* stock-watermark detector/remover on top of it.
- New visible-mark templates are for **AI-generation labels only**.

(Established 2026-06-13 by user instruction, translated from Russian: "I am trying to make paid resources free -- that is not what we are fighting against.")

## How to run

- `uv run remove-ai-watermarks all <image.png> -o <output.png>` — full pipeline (visible + invisible + metadata). Same diffusion knobs as `invisible` below, plus the visible-pass `--inpaint/--no-inpaint`/`--inpaint-method`. **When the `[gpu]` extra is absent, step 2 (invisible/SynthID) is skipped** — `all` still writes an output (visible mark + metadata stripped) but prints a prominent end-of-run banner ("the invisible (SynthID) watermark was NOT removed") AND exits **non-zero** (1), so a skipped SynthID pass is not mistaken for a clean result (the recurring #14/#47 trap, where the old quiet inline warning was missed). `invisible` already hard-errors without the extra; only `all` continued, hence the loud end-banner. Regression-guarded by `tests/test_cli.py::TestAllCommand::test_all_loud_warning_and_nonzero_exit_when_gpu_missing`. **Test trap:** any `all` test that exercises the full pipeline MUST `patch("remove_ai_watermarks.invisible_engine.is_available", return_value=True)` — CI installs core+dev only (no `[gpu]`), so an unpatched `all` test takes the skip branch and now hits the non-zero exit. This passed locally (gpu present → `is_available()` True) but red-failed every matrix cell on the v0.11.0 commit (`test_all_basic`/`test_all_visible_step_uses_registry` asserted exit 0); both now patch `is_available` True.
- `uv run remove-ai-watermarks invisible <image.png> -o <out.png>` — diffusion SynthID removal. **Full knob set** (kept identical across `invisible`/`all`/`batch`): `--strength` (vendor-adaptive default), `--steps`, `--guidance-scale` (CFG, default 7.5), `--pipeline sdxl|controlnet|qwen` (default `controlnet`; `qwen` is a manual opt-in only — see the qwen note in the module map), `--controlnet-scale`, `--model` (HF model id, default SDXL base), `--device`, `--seed`, `--hf-token`, `--max-resolution`/`--min-resolution`, `--upscaler lanczos|esrgan`, `--humanize` (Analog Humanizer grain), `--unsharp` (final sharpen), `--adaptive-polish/--no-adaptive-polish` (**ON by default**; detail-targeted polish that self-gates to a no-op where there is no deficit), and `--tile/--no-tile` + `--tile-size`/`--tile-overlap` (**OFF by default**; sliding-window tiled diffusion -- the *lossless* alternative to a `--max-resolution` downscale for large inputs that OOM on MPS/GPU. Engages only when the long side exceeds `--tile-size`, default 1024; tiles are feather-blended over `--tile-overlap` px, default 128. Pair with `--max-resolution 0`). `--auto` is deprecated and now a no-op that only warns (the polish it used to enable is ON by default).
- `uv run remove-ai-watermarks visible <image.png> -o <out.png>` — known-visible-mark removal, CPU, no GPU. Reverse-alpha based: each mark is removed by inverting its captured alpha map. `--mark auto` (default) picks the strongest detected of the Gemini sparkle, the Doubao "豆包AI生成" text strip, the Jimeng "★ 即梦AI" wordmark, and the Samsung Galaxy AI "✦ Contenuti generati dall'AI" strip (bottom-LEFT, locale-specific — Italian variant calibrated); `--mark gemini` / `--mark doubao` / `--mark jimeng` / `--mark samsung` force one (choices come from the registry). Gemini/Doubao recover pixels exactly with no inpaint at native; **Jimeng and Samsung add an always-on thin residual inpaint over the glyph footprint** (their marks re-rasterize per image, so reverse-alpha alone leaves a faint outline). For arbitrary logos/objects use `erase`. **When `--mark auto` finds no known mark (the common case — ~74% of real uploads carry no registered visible mark), the command does NOT silently re-serve the input as a finished result.** It runs a cheap metadata-only `identify`, prints actionable guidance (if the image carries an invisible/metadata mark, e.g. an OpenAI/Gemini C2PA image, it points to `all`; otherwise it does NOT imply the image is clean -- it warns that an invisible pixel watermark like SynthID cannot be detected once the metadata proxy is gone and routes to both `all` and `erase --region`), writes NO output file, and exits **`EXIT_NO_VISIBLE_MARK` (2)** — distinct from success (0) and a hard error (1) so a wrapping service (raiw.cc) can surface the message instead of treating the unchanged image as done (the production "it didn't work" / score-0 trap). Same handling for an explicit `--mark <name>` that is not detected. Helper `cli._no_visible_mark_exit`; regression-guarded by `tests/test_cli.py::TestVisibleCommand::test_visible_auto_no_mark_exits_two_with_eraser_hint` and `test_visible_auto_no_mark_routes_to_all_when_metadata`. 
  `--no-detect` still forces the gemini fallback and proceeds (exit 0).
- `uv run remove-ai-watermarks erase <image.png> --region x,y,w,h -o <out.png>` — universal region eraser (any logo/object, any position). `--backend cv2` (default, no deps) or `--backend lama` (big-LaMa via onnxruntime, extra `lama`); `--region` is repeatable.
- `uv run remove-ai-watermarks identify <image>` — provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector
- `uv run remove-ai-watermarks metadata <image.png> --check` — inspect AI metadata (C2PA, EXIF, PNG chunks)
- `uv run remove-ai-watermarks metadata <image.png> --remove -o <out.png>` — strip all AI metadata
- `uv run remove-ai-watermarks batch <directory>` — process every supported image in a directory (output defaults to `<directory>_clean/`, set with `-o`). `--mode visible|invisible|metadata|all` (default `visible`); the invisible/all path reuses the **full `invisible` knob set above** (`--strength`/`--steps`/`--guidance-scale`/`--pipeline`/`--controlnet-scale`/`--model`/`--device`/`--max-resolution`/`--min-resolution`/`--upscaler`/`--seed`/`--hf-token`/`--humanize`/`--unsharp`/`--adaptive-polish`/`--tile`/`--tile-size`/`--tile-overlap`), plus `--inpaint/--no-inpaint` for the visible pass. `--adaptive-polish` is ON by default; `--auto` is deprecated and a no-op that only warns. One engine cached per pipeline; the polish is resolved once before the loop.
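
The `visible` exit-code contract described above (0 success, 1 hard error, `EXIT_NO_VISIBLE_MARK` = 2) could be consumed by a wrapping service roughly as follows. This is a minimal sketch, not code from the repository: `classify_visible_exit`, `run_visible`, and the outcome labels are hypothetical names, and it assumes `uv` is on PATH and the call is made from the repo root. Both stdout and stderr are captured because this guide does not say which stream carries the guidance text.

```python
import subprocess

# Exit codes as documented for the `visible` command.
EXIT_OK = 0
EXIT_ERROR = 1
EXIT_NO_VISIBLE_MARK = 2


def classify_visible_exit(returncode: int) -> str:
    """Map a `visible` exit code to a coarse outcome label (labels are hypothetical)."""
    if returncode == EXIT_OK:
        return "cleaned"
    if returncode == EXIT_NO_VISIBLE_MARK:
        # No output file was written; surface the CLI guidance to the user.
        return "no_visible_mark"
    # Treat 1 and any unexpected code as a hard error.
    return "error"


def run_visible(src: str, dst: str) -> tuple[str, str]:
    """Run `visible` and return (outcome, combined CLI text)."""
    cmd = ["uv", "run", "remove-ai-watermarks", "visible", src, "-o", dst]
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return classify_visible_exit(proc.returncode), proc.stdout + proc.stderr
```

Keeping the classification in a pure function means the mapping can be unit-tested without invoking the CLI. Note that `all` uses exit 1 differently (output is written but the SynthID pass was skipped), so this mapping applies to `visible` only.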

## Test and lint

- **CI** (`.github/workflows/test.yml`): runs on push to `main` + every PR. A `lint` job (ubuntu: `ruff check` + `ruff format --check`) plus a `test` matrix (ubuntu/macos/windows x py3.10/3.12) that does `uv sync --frozen --extra dev` then `pytest`. The matrix installs only core + dev (no `gpu` extra), so the GPU/model-running tests skip there and it exercises the metadata/identify/visible/cv2-eraser surface on all three OSes. Keep `uv.lock` valid (don't break `--frozen`) when editing `pyproject.toml`.
- **Release flow + distribution channels** (PyPI publish via `publish.yml`/`uv publish`, the automated Homebrew-tap + HF-Space bumps in `distribute.yml`, conda-forge, ComfyUI Registry, the sdist `data/` exclusion, hatchling pin history): see `docs/release-and-distribution.md` before cutting a release.
- `bash maintain.sh` — uv-outdated, uv-secure, ruff check/fix, ruff format, pyright (scoped `src/`, see the OOM note below), pytest -n auto. The helper tools live in the `dev` extra (`pytest-xdist`, plus `uv-outdated`/`uv-secure` marker-gated to py3.12+ so the py3.10 resolution stays solvable) — a bare env without `--extra dev` does not have them.
- **Strict pyright is clean across `src/` (0 errors).** The cv2/torch/diffusers boundary files (`gemini_engine`, `region_eraser`, `doubao_engine`, `humanizer`, `invisible_engine`, `noai/watermark_remover`) carry a documented per-file `# pyright:` relax pragma that turns off only the unknown-type / untyped-third-party rules -- those libs ship no usable types, so strict typing there fights the ecosystem. Pure-logic files stay fully strict; `typings/piexif/__init__.pyi` is a local stub so `metadata.py`/`extractor.py` resolve piexif. Public ndarray-returning signatures on the relaxed engines are still annotated `NDArray[Any]` so strict consumers (`cli.py`) stay clean. When touching a relaxed file, prefer fixing real issues over widening the pragma; keep the pragma scoped to genuinely-untyped boundaries.
- **`uv-secure` is clean** since idna was bumped 3.11 -> 3.16 (fixing GHSA-65pc-fj4g-8rjx) and aiohttp 3.13.5 -> 3.14.0 via `uv lock --upgrade-package aiohttp` (fixing GHSA-hg6j-4rv6-33pg + GHSA-jg22-mg44-37j8). The old basicsr Dependabot alert (GHSA-86w8-vhw6-q9qq) is resolved by removal: the experimental `restore` extra was retired and basicsr is no longer anywhere in the dependency tree. The torch Dependabot alert **GHSA-rrmf-rvhw-rf47** (`torch.jit.script` memory corruption, vulnerable `<= 2.12.0`) is **dismissed as `not_used`** (2026-06-10): torch is a transitive dep of the optional `gpu` extra only, the codebase never calls `torch.jit` (grep-verified), and **no patched torch version exists** (`first_patched_version` is null), so it cannot be closed by an upgrade -- do not re-triage it.
- **Full-project `uv run pyright` (no path) OOMs/crashes node on this ML-heavy repo** (emits a `libnode` stack frame, no summary) — a known environment limit, not a code error. Gate with `uv run --extra dev --extra gpu pyright src/` (completes, authoritative) or scope to changed files; also run `uv run ruff check` and `uv run pytest` directly.
- Run `uv run` from the repo root — from another cwd it falls back to a bare env without numpy/cv2/torch.
- **Stale `trustmark` remnant in site-packages after an extras change:** the `trustmark` package downloads model weights INTO its own package dir, so when a narrower `uv sync` prunes the package, a `trustmark/models/` directory survives as an empty namespace package. Symptom: pyright `"TrustMark" is unknown import symbol` on `trustmark_detector.py` and `find_spec("trustmark")` returning a loader-less spec (so `is_available()` lies True). Fix: `rm -rf .venv/lib/python3.12/site-packages/trustmark` (regenerable weights cache).
- To add a dev tool (pytest/ruff/pyright) into the env, use `uv sync --frozen --extra dev --extra gpu`, **never `uv pip install`** — `uv pip install` re-resolves and rewrites `uv.lock`, which silently bumped `transformers` to a build incompatible with the pinned `diffusers` (`cannot import name 'Qwen3VLForConditionalGeneration'`) and broke every `identify`/metadata import. Recovery: `git checkout uv.lock && uv sync --frozen --extra gpu --extra dev`. The `gpu` extra holds `diffusers`/`transformers`/`torch`, so a bare `uv sync` (no extras) removes them; `noai/__init__` is now **lazy** (PEP 562 `__getattr__`, so importing `identify`/`metadata` no longer pulls `watermark_remover`/torch), so a bare env breaks only when the removal pipeline is actually invoked, not on import. `maintain.sh`'s `uv sync --all-extras` also pulls the heavy `trustmark`/`lama` wheels (pytorch-lightning, onnxruntime) — fine on a good connection, but on flaky DNS sync only `--extra gpu --extra dev` and run the lint/test steps by hand.
- Metadata/C2PA tests assert against real committed fixtures in `data/samples/` (`chatgpt-*.png` = OpenAI C2PA, `firefly-1.png` = Adobe, `mj-*` = Midjourney IPTC, `doubao-1.png` = ByteDance Doubao with the China TC260 `<TC260:AIGC>` XMP label **and** a visible "豆包AI生成" text mark bottom-right; `grok-1.jpg` = xAI Grok with its EXIF-only `Signature:` blob + UUID `Artist` and no C2PA/SynthID/IPTC; `flux-1.png` / `flux-1.jpg` = real Black Forest Labs FLUX.2 Playground output, signed C2PA (issuer "Black Forest Labs" + `trainedAlgorithmicMedia`) -- `flux-1.jpg` is the first committed **JPEG-with-C2PA** fixture, exercising the c2pa-python non-PNG reader path end to end; whether BFL hosted output also embeds the open DWT-DCT pixel watermark is UNRESOLVED -- our detector returns None on these fox samples, but they are high-texture carriers where even a known-embedded watermark fails the round-trip, see the content-fragility caveat in `docs/watermarking-landscape.md`); synthetic byte blobs cover the remaining JPEG/ISOBMFF format paths. The "non-AI / clean photo" control is no longer in `data/samples/` -- the `clean_photo` conftest fixture serves a verified-negative image from the corpus `neg/` set (skips if the corpus is absent).
- SynthID reference corpus: `scripts/synthid_corpus.py` ingests labeled images into `data/synthid_corpus/`. The labeled `images/` (`pos/` `neg/` `cleaned/`) are **committed** (public repo -- review every image for private content before adding; `manifest.csv` is kept in sync with the files on disk, one row per tracked image); only the synthetic `refs/` calibration fills are gitignored. See its README for the collection protocol and verification oracles. **`cleaned/` examples must be produced by a CURRENT shipped removal method** -- the default SDXL img2img pass (optionally `--max-resolution`). Do NOT archive cleaned outputs from methods that are no longer in the pipeline (ctrlregen, the old text/face-protection, IP-Adapter FaceID, CodeFormer) or from the experimental opt-in paths (controlnet, face restore) as corpus examples; a cleaned reference should represent the canonical removal, and a removed method's output is not a reproducible example. Keep those experiment outputs in a local working dir, never in the committed corpus.
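
The stale `trustmark` remnant described above is an instance of a general Python behavior: a leftover directory without `__init__.py` resolves as a namespace package, so a bare `find_spec(name) is not None` check reports it as installed. A minimal sketch of a stricter check follows. `is_really_installed` is a hypothetical helper name, not the repository's `is_available()`, and the loader/origin test is an assumption about how such a guard could be hardened.

```python
import importlib
import importlib.util


def is_really_installed(name: str) -> bool:
    """Return True only when `name` resolves to a real module or package.

    A namespace-package remnant (directory with no __init__.py) is rejected.
    """
    importlib.invalidate_caches()
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return False
    if spec is None:
        return False
    # Namespace packages have no origin; when not yet imported they also
    # come back from find_spec without a loader.
    return spec.loader is not None and spec.origin is not None
```

A guard built this way would skip the optional detector instead of failing later on `from trustmark import TrustMark`; deleting the remnant directory remains the actual fix.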

## Configuration

- GPU/ML modules (invisible_engine, watermark_remover) are optional — guard imports with `is_available()` checks
- Optional detection extras: `detect` (imwatermark — open SD/SDXL/FLUX watermark) and `trustmark` (Adobe TrustMark decoder; pulls torch + downloads weights). Both are guarded by `is_available()` and skipped by `identify` when absent.
- Optional `esrgan` extra (spandrel only): Real-ESRGAN pre-diffusion super-resolution for small inputs (`upscaler.py`, CLI `--upscaler esrgan` on `invisible`/`all`/`batch`). Guarded by `upscaler.is_available()`; the default upscaler stays Lanczos (cv2, no deps) and the engine falls back to Lanczos when the extra is absent or the model errors. spandrel is MIT and pulls NO basicsr (only torch/torchvision/safetensors/numpy/einops); Real-ESRGAN weights are BSD-3-Clause and download on first use via `torch.hub` (never bundled). Kept OUT of `all` (heavy + model download).
- Tests for the *model-running* paths are limited to availability checks (multi-GB downloads). But the **pure helpers inside ML-adjacent modules are unit-tested without any download** and must stay that way: `_target_size` (native-vs-downscale-cap-vs-upscale-floor, `test_invisible_engine.py`), `humanizer.unsharp_mask`/`adaptive_polish` (`test_humanizer.py`), and the MPS->CPU fallback control flow via mocked pipelines (`test_img2img_runner.py`, 100% cover). Don't skip these as "ML, needs a model" — only `remove_watermark`/the diffusion bodies do.
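
To illustrate why the pure helpers are cheap to test, here is a sketch of a native-vs-cap-vs-floor size decision in the spirit of `_target_size`. It is an assumption-laden reconstruction from the CLI description (1024 px floor, `--min-resolution 0` disables the floor, `--max-resolution 0` means no cap), not the repository's implementation: the name `target_size_sketch`, its signature, and the rounding are hypothetical.

```python
def target_size_sketch(
    width: int,
    height: int,
    *,
    min_resolution: int = 1024,
    max_resolution: int = 0,
) -> tuple[int, int]:
    """Pick the processing size from the long side of the input.

    - long side above a positive cap  -> downscale to the cap
    - long side below a positive floor -> upscale to the floor
    - otherwise                        -> native size
    """
    long_side = max(width, height)
    if max_resolution > 0 and long_side > max_resolution:
        scale = max_resolution / long_side
    elif min_resolution > 0 and long_side < min_resolution:
        scale = min_resolution / long_side
    else:
        return width, height
    # Scale both sides by the same factor to keep the aspect ratio.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Because the function touches no model or image data, each branch can be asserted directly, which is the property the real helper's tests rely on.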

## Key modules

Compact map. The full per-module detail (design decisions, tuned thresholds, calibration history, incident records, and the regression-guard map) lives in `docs/module-internals.md` — **read the relevant section there before changing any module below.**

- `noai/c2pa.py` — C2PA reading. `extract_c2pa_info(path)` uses the official **c2pa-python `Reader`** first (core dep, any container; `read_manifest_store_json` returns the WHOLE store JSON — active + ingredient manifests — so an AI marker on a parent manifest is seen), and falls back to the hand-rolled caBX/CBOR parser (`has_c2pa_metadata` / `extract_c2pa_chunk` / `_extract_c2pa_info_png`) for synthetic/partial blobs the validator rejects or a broken/absent wheel. The registry scan (issuer / source-type / SynthID / soft-binding) is shared by both paths via `_populate_registry_fields`, so the return-dict shape is identical. Do not reimplement chunk parsing; chunk reads are clamped to the remaining file size by design. `extract_c2pa_chunk`/`inject_c2pa_chunk` stay PNG-only (raw caBX bytes, test/extractor use).
- `noai/constants.py` — the single `C2PA_AI_VENDORS` registry (+ `C2PA_SOFT_BINDINGS`) from which `C2PA_ISSUERS` / `SYNTHID_C2PA_ISSUERS` / `identify._ISSUER_PLATFORM` are all derived. Add a new vendor as one registry entry; never edit the derived dicts and never add inline.
- `metadata.py` — `scan_head(path)` is the shared (memoized) input for every C2PA/AIGC/IPTC byte scan; use it instead of `open().read(1MB)` for any new marker scan. Also home to `synthid_source`, `xai_signature`, `iptc_ai_system`, `aigc_label`, `huggingface_job`, `samsung_genai`, and `remove_ai_metadata` (fail-safe `strip_c2pa_boxes`).
- `identify.py` — aggregates every locally-readable signal into one `ProvenanceReport`; `is_ai_generated` is True or None, never asserted False. `ProvenanceReport.ai_source_kind` exposes the C2PA digital-source-type split — `"generated"` (trainedAlgorithmicMedia, fully AI) vs `"enhanced"` (compositeWithTrainedAlgorithmicMedia, a real photo with an AI-composited region), else None — so a caller branches full-frame scrub vs region-targeted clean (see `noai/tiling.feather_region_composite` + `WatermarkRemover.remove_watermark(region=...)`). The sparkle provenance threshold is the SHARED `watermark_registry.GEMINI_SPARKLE_TRUST_CONF` (imported, not a private copy) so the provenance "is there a sparkle" verdict and the removal "take the sparkle" decision can never drift. `import identify` is deliberately light (lazy `noai/__init__`, fits a 512 MB host) — keep heavy imports out (the `watermark_registry` constant import stays light: engines are lazy there). Add capture-camera tokens to `_DEVICE_C2PA_PLATFORM` only when verified against a real C2PA file; editing-app/AI-device signer tokens go to `_SIGNER_C2PA_PLATFORM`; generator/issuer platforms to `C2PA_AI_VENDORS` in `constants.py`. Integrity-clash detection is high-precision by design (only hard generator stamps feed it, source-grouped independence).
- `watermark_registry.py` — the single catalog of known visible watermarks (gemini / doubao / jimeng / samsung), reverse-alpha based by policy. Add a new visible text mark = one `_text_mark(...)` row + a `TextMarkConfig` with a captured alpha map; do not re-add per-mark `if` branches. `cli._write_bgr_with_alpha` must NOT zero alpha in the watermark bbox (issue #30 white-box regression).
- `gemini_engine.py` — visible Gemini-sparkle remover/detector (cv2/numpy, no GPU): top-K size-weighted fusion candidate selection (`_SELECT_TOPK`), corner-promote, over/under-subtraction guards, false-positive gate, self-verify repair. Detection scores the top-K size-weighted matches by full fusion (spatial+gradient+variance) and keeps the highest — NOT the raw-NCC argmax, which re-admits the tiny-patch FPs the size weight suppresses (the osachub 2026-06-12 sub-0.85 corner-sparkle regression; see `docs/module-internals.md`). Keep the 0.85 corner-promote NCC gate; a margin/chroma-gated lower promote was measured and REJECTED 2026-06-11 (~33% FP on non-Google content). Gate any removal candidate on a physical brightness check, not the detector alone.
- `_text_mark_engine.py` — shared base for the three reverse-alpha text-mark engines (extracted 2026-06-09); the per-engine modules are config-only subclasses. New text mark = a `TextMarkConfig` + a thin subclass + one registry row. Gemini stays a separate engine (different model).
- `doubao_engine.py` / `jimeng_engine.py` / `samsung_engine.py` — thin `TextMarkEngine` subclasses: Doubao "豆包AI生成" (bottom-right), Jimeng "★ 即梦AI" (bottom-right), Samsung Galaxy AI "✦ Contenuti generati dall'AI" (bottom-LEFT, locale-specific — Italian variant calibrated). Removal = reverse-alpha (always-align) + thin residual inpaint, **with an over-subtraction guard ported from `gemini_engine` (2026-06-20)**: `_reverse_alpha_oversubtracts` predicts the reverse-alpha output PER PIXEL over the glyph body from the INPUT, and when the recovered body lands more than `_OVERSUB_DARK_MARGIN` (25) gray levels below the local ring it abandons the reverse-alpha pixels and inpaints the footprint from the original surroundings (`_inpaint_footprint`) — fixing the dark-pit ghost on dark/mid-tone backgrounds (roadmap P0#8). Predicting per-pixel from the input (not the produced output) keeps a clean full-strength mark byte-identical (no false trip). A detector-only removal test is insufficient — assert visual residual (the textured-shift tests + `tests/test_text_mark_oversubtraction.py`).
- `region_eraser.py` — universal region eraser (`erase` CLI): cv2 backend default (no deps), optional big-LaMa via onnxruntime (~3.5-4 GB peak RAM, ~5-6 s/call CPU — does not fit a minimal droplet).
- `invisible_watermark.py` — decodes the OPEN DWT-DCT watermarks (SD / SDXL / FLUX) via `imwatermark` (extra `detect`, pulls torch). Fragile two ways: (1) does not survive JPEG re-encode/resize; (2) **carrier-fragile on a broad class of pristine images** -- a clean encode->decode round-trip recovers 48/48 on chatgpt/firefly/random but FAILS (28-39/48, below the `_MATCH_48`=44 gate) on the FLUX fox, doubao, a flat FLUX generation, AND a clean synthetic flat fill with no watermark. The failure does NOT track texture; it goes with a degenerate **all-ones decode that is a CARRIER ARTIFACT, not a watermark** (synthetic clean image reproduces it). So `detect_invisible_watermark` is **positive-only**: trust a hit; a `None` is inconclusive unless a same-carrier positive-control embed first recovers >=44. Verified 2026-06-19; full caveat in `docs/watermarking-landscape.md`.
- `trustmark_detector.py` — Adobe TrustMark open decoder (extra `trustmark`). Do NOT remove the JPEG re-encode false-positive gate — a lone TrustMark hit without it is almost always content noise.
- `noai/watermark_remover.py` — `WatermarkRemover` with three diffusion pipelines selected by the explicit `pipeline` ctor arg, never inferred from `model_id`: `sdxl` (plain SDXL img2img), `controlnet` (SDXL + canny ControlNet, **the DEFAULT since 2026-06-09**), and `qwen` (Qwen-Image 20B MMDiT img2img, Apache-2.0, CUDA/cloud-class — best **text** preservation (incl. CJK); `_load_qwen_pipeline`/`_run_qwen`, bf16, no MPS fallback; call shape in the pure `_build_qwen_kwargs` using `true_cfg_scale`). Removal comes from the img2img `strength`; ControlNet only preserves text/face STRUCTURE — SynthID CAN survive controlnet on photoreal content at low strength. Qwen CERTIFIED oracle floors (2026-06-20): OpenAI **0.10** (seed-robust, clean on seeds 0-4), Gemini **0.25** (seed 0 verified, pin a seed — Gemini oracle rate-limits volume; higher than the controlnet Gemini floor 0.15). `resolve_strength(..., pipeline="qwen")` carries the Qwen ladder (`_QWEN_VENDOR_STRENGTH`), so `--pipeline qwen` gets the 0.25 Gemini floor automatically (the old manual `--strength 0.25` workaround is retired). `_build_qwen_kwargs` passes an explicit `height`/`width` from the input (floored to /16 via `_qwen_target_size`) — without it the pipeline defaults to a 1024x1024 SQUARE and silently squishes non-square inputs (fixed 2026-06-20). **`qwen` is a MANUAL opt-in only — there is NO auto-router.** Measured (`scripts/fidelity_metrics.py`, OCR-CER / ArcFace / LPIPS / Laplacian-var, NOT eyeball): qwen beats controlnet on ONE niche only — **clean body text on a plain background, no faces** (openai_1/2 CER 0.241 vs 0.385). controlnet wins FACES (it always has) AND **display/decorative text in a scene** (abba poster: controlnet CER 0.114 vs qwen 0.379 — canny holds letter shapes, qwen re-renders and garbles them). So a content `--pipeline auto` router and a faces+text **mixed dual-pass** were prototyped and **DROPPED** (2026-06-20): on the canonical faces+text case controlnet wins every metric incl. 
  text, so mixed loses; and "text→qwen" can't be auto-decided (it is body-vs-display text that matters, undetectable cheaply). qwen stays for callers who KNOW their content is clean-text-heavy and face-free. No face-restore extra ships, by validated decision (every restore approach looked MORE AI-generated). `remove_watermark(region=(x,y,w,h), region_feather=...)` runs the regeneration but feather-composites only the AI box back over the original (via `noai/tiling.feather_region_composite`), preserving the real photo elsewhere -- the **AI-enhanced composite** path (`identify` `ai_source_kind == "enhanced"`); the box is supplied by the caller (a C2PA composite manifest carries no reliable machine-readable region, so we do not fabricate one).
- `noai/tiling.py` — sliding-window tiled diffusion for large inputs (CLI `--tile`). `WatermarkRemover.remove_watermark` branches to `run_tiled` when `tile` is set AND the long side exceeds `tile_size`, refactoring the single-pass `_generate` into a per-tile `_generate_one` (the ControlNet edge map is rebuilt per tile inside it). Pure helpers `plan_tiles` (uniform-size tiles, last one flush to the edge) and `feather_weights` (strictly-positive separable taper -> partition-of-unity blend) are unit-tested without the model. Also home to `feather_region_composite(base, regenerated, box, *, feather)` — the pure region-targeted compositor for **AI-enhanced composites** (`ai_source_kind == "enhanced"`): blends the regenerated AI box back over the original with a feathered seam, leaving the real photo OUTSIDE the box pixel-exact. It backs `WatermarkRemover.remove_watermark(region=...)` (regenerate ONLY the AI region, not the whole frame); the no-model lossless region path stays `region_eraser.erase`. New tile/region-blend tuning goes in these pure helpers; do not inline blend math into the runner.
- `auto_config.py` + the content-detection layer were REMOVED 2026-06-09; `--auto` is a deprecated no-op (controlnet is the default pipeline and the adaptive polish is ON by default and self-gates to a no-op where there is no detail deficit).
- `upscaler.py` — optional Real-ESRGAN pre-diffusion super-resolution for small inputs (extra `esrgan`, spandrel only). Manual opt-in; the default `--upscaler` stays `lanczos` and the engine always falls back to Lanczos on absence/error. ESRGAN can degrade faces and thin text.
- `image_io.py` — Unicode-safe cv2 IO (issue #17). Every cv2 file read/write in the package routes through `imread`/`imwrite`; do not call `cv2.imread`/`cv2.imwrite` directly. `to_bgr(image)` is the shared channel normalizer — use it instead of inlining `cvtColor` branches.
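
The tiling geometry described for `noai/tiling.py` (uniform-size tiles with the last one flush to the edge, a strictly positive taper, and a partition-of-unity blend) can be sketched in one dimension. This is an illustrative reconstruction, not the repository's `plan_tiles` / `feather_weights`: the `_1d` function names, the linear ramp, and the signatures are assumptions. A 2D weight would be the outer product of two such 1D tapers, which is what "separable" implies.

```python
def plan_tiles_1d(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of uniform-size tiles covering [0, length)."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    if length <= tile:
        return [0]  # single pass, no tiling needed
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush against the edge
    return starts


def feather_weights_1d(tile: int, overlap: int) -> list[float]:
    """Linear taper toward both tile edges; every weight is > 0."""
    ramp = overlap + 1
    return [min(1.0, (min(i, tile - 1 - i) + 1) / ramp) for i in range(tile)]


def blend_tiles_1d(
    length: int, tile: int, overlap: int, tiles: list[list[float]]
) -> list[float]:
    """Weighted average of per-tile values (assumes length >= tile)."""
    weights = feather_weights_1d(tile, overlap)
    acc = [0.0] * length
    total = [0.0] * length
    for start, values in zip(plan_tiles_1d(length, tile, overlap), tiles):
        for offset, (value, weight) in enumerate(zip(values, weights)):
            acc[start + offset] += value * weight
            total[start + offset] += weight
    # Dividing by the summed weights makes the blend a partition of unity.
    return [a / t for a, t in zip(acc, total)]
```

Strictly positive weights guarantee the normalizing sum is never zero, so the division is safe at every pixel, including the image border where only one tile contributes.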

For the Doubao alpha-distillation history (why content-image reverse-alpha distillation fails by physics and controlled captures were required), see `docs/research-doubao-distillation.md`.
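
For orientation, reverse-alpha removal inverts the standard alpha-compositing equation `watermarked = alpha * logo + (1 - alpha) * original`. The per-pixel sketch below works on single gray values. It is a simplified illustration, not the engine code: the white logo value (255), the `max_alpha` clamp, and the function names are assumptions, and `oversubtracts` only mirrors the idea of the over-subtraction guard (recovered body more than 25 gray levels below the local ring).

```python
def composite(original: float, alpha: float, logo: float = 255.0) -> float:
    """Forward model: blend the logo over the original pixel."""
    return alpha * logo + (1.0 - alpha) * original


def reverse_alpha(
    watermarked: float, alpha: float, logo: float = 255.0, max_alpha: float = 0.99
) -> float:
    """Invert the forward model; clamp alpha to avoid dividing by ~0."""
    a = min(alpha, max_alpha)
    restored = (watermarked - a * logo) / (1.0 - a)
    return min(255.0, max(0.0, restored))


def oversubtracts(restored_mean: float, ring_mean: float, margin: float = 25.0) -> bool:
    """True when the recovered glyph body is darker than its surroundings by > margin."""
    return restored_mean < ring_mean - margin
```

Applying `reverse_alpha` to a pixel that was never composited (or with an alpha map that is stronger than the actual mark) subtracts too much and leaves a dark pit, which is the failure the guard is meant to catch before falling back to inpainting.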

## Watermarking landscape

Who embeds what (C2PA / IPTC / EXIF / TC260 AIGC / xAI signature / open and proprietary invisible watermarks), whether each is locally detectable, the C2PA 2.4 durable-credentials implications, and the regulatory driver table live in `docs/watermarking-landscape.md` (research 2026-05-24, updated through 2026-06-10). Read it before adding a new `identify` signal, vendor token, or metadata marker. See `identify.py` for what we read today.

## Known limitations

Compact list. Full measurements, incident history, and oracle-validation runs live in `docs/known-limitations.md` — **read the relevant section there before changing the diffusion pipelines, strength defaults, resolution handling, or metadata coverage.**

- `invisible` processes at native resolution for inputs >= 1024px long side and auto-upscales smaller inputs to a 1024px floor (`--min-resolution 0` disables; `--max-resolution N` is an opt-in cap to bound GPU/MPS memory). MPS OOM is memory-tier dependent, not a hard limit: ~24 GB unified memory falls back to CPU (slow but weight-identical output), 32 GB runs native on MPS. The native-vs-cap-vs-floor decision lives in the pure helper `invisible_engine._target_size` — keep the logic there, unit-tested without the model. For large inputs that OOM, `--tile` is the **lossless** alternative to `--max-resolution`: sliding-window diffusion at native resolution, each tile near SDXL's 1024 training size, feather-blended over the overlap (`noai/tiling.py`). It only engages when the long side exceeds `--tile-size`; the geometry (`plan_tiles`) and the blend window (`feather_weights`) are pure and unit-tested (`tests/test_tiling.py`). Caveat: each tile is an independent low-strength regeneration, so at the certified removal strengths (0.20-0.30) tile drift is minimal but not zero; tiling is a memory workaround, not a quality upgrade over a single native pass.
- fp16 VAE black-output (issues #29/#41): the fp16-fixed SDXL VAE (`madebyollin/sdxl-vae-fp16-fix`) is swapped in for the default SDXL checkpoint on cuda/xpu fp16, plus a model-agnostic backstop that detects a degenerate (all-black) fp16 output and re-runs once in fp32. cpu/mps run fp32 and never reproduce the bug.
- Pyright's first run is slow (2-3 min) due to ML deps (torch/diffusers/transformers stubs); full-project `uv run pyright` can stall for many minutes or crash node outright (see the OOM note under Test and lint), so scope it to `src/` or to changed files.
- A third-party PIL plugin autoload (e.g. an HEIF/AVIF plugin) can raise a non-OSError (`ModuleNotFoundError`), not `UnidentifiedImageError`, when opening a file. Code that opens user-supplied or unknown-format files should `except Exception`, not just `OSError`/`UnidentifiedImageError`.
- rich was dropped: the CLI + analysis scripts print plain text (`click.echo` / the `scripts/_plain_console.py` shim). `rich` is NOT a dependency — importing it breaks the core+dev CI sync; new scripts must use the shim. No Unicode glyphs / colors / progress bars in CLI output by design.
- AVIF/HEIF/JPEG-XL metadata detection is a binary scan; C2PA removal in those containers (and MP4/MOV/M4V) is `noai/isobmff.py`; non-ISOBMFF audio/video (WebM/MP3/WAV/FLAC/OGG) strips losslessly via ffmpeg on PATH. An AI-generator token in an `Exif` meta-box *item* (bytes in `mdat`/`idat`) is now blanked **in place** by `isobmff.blank_ai_exif_tokens` (same-length space overwrite, piexif-validated so a coincidental II/MM run in pixels is ignored — no `iinf`/`iloc` surgery, mirrors `blank_ai_xmp_packets`); it scrubs the AI-token value only, leaving camera/editor EXIF intact. Still NOT built: Resemble PerTh audio detection (no presence/confidence flag exists).
- **SynthID technical reference: `docs/synthid.md`** — primary-source-cited doc covering mechanism (post-hoc encoder/decoder pair, 136-bit payload at 512x512, pixel-space, model weights NOT modified), robustness numbers (arXiv:2510.09263: ~99.98% TPR@0.1%FPR across 30 transforms including JPEG/crop/resize/color/noise), removal attacks and forensic detectability (arXiv:2605.09203: all 6 attacks detectable at >98% TPR@1%FPR), detectability limits (no public decoder, metadata-proxy only), oracle scope, and adoption landscape. Read that doc first before adding notes here.
- **SynthID detection is metadata-only.** No local pixel detector is possible by design (Google's decoder is proprietary, trusted-testers only); we read the C2PA companion proxy, which goes quiet once metadata is stripped — a quiet proxy is not proof the pixel watermark is gone. Each vendor has its OWN oracle and it detects only that vendor's content: the Gemini app "Verify with SynthID" for Google, `openai.com/verify` for OpenAI. **Validate the OpenAI arm FIRST** — `openai.com/verify` is more accessible (fewer per-check restrictions) and the strongest automation candidate (Playwright / Chrome MCP); the Gemini flow is more manual. Ordering/throughput choice, not a substitution (see `docs/synthid.md`). SynthID survives JPEG re-encode, so GitHub issue attachments remain valid pixel-watermark test subjects. Every spectral/phase detection approach evaluated (reverse-SynthID, our own probes) works only on controlled solid fills, never on real content.
- **External AI-vs-real classifier models are out of scope** (decided 2026-05-24): per-generator, degrade off-distribution, and our own light SDXL pass would likely defeat them. Detection stays local + signal-based.
- **Default strength is VENDOR-ADAPTIVE, one ladder for BOTH pipelines** (since 2026-06-09): `resolve_strength(strength, vendor)` picks OpenAI **0.20** / Gemini **0.30** / unknown **0.30** when `--strength` is unset; explicit `--strength` always wins. Removal at low strength is content x pipeline dependent, and near-threshold removal is SEED-NON-DETERMINISTIC — pick a strength with margin and oracle-revalidate per content type. Certified controlnet floors (Modal cert 2026-06-04): OpenAI 0.20 (resolution-independent), Gemini 0.30 (only <= 1536px; native large Gemini needs ~0.35+ or a cap).
- **`controlnet` is the default pipeline**; `--pipeline sdxl` is the lighter opt-down. Neither pipeline clears all content at low strength (photoreal survives controlnet, flat graphics survive sdxl — the lever is higher strength). A removal-priority caller MUST oracle-validate strength across content types; prod recipe: controlnet + per-vendor floor + FIXED seed. Forensic-stealth caveat (arXiv:2605.09203): defeating the SynthID verifier is NOT forensic invisibility — removal-processed images are flaggable at >98% TPR@1%FPR.
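
The vendor-adaptive default described above (explicit `--strength` wins, otherwise OpenAI 0.20 / Gemini 0.30 / unknown 0.30) amounts to a small lookup. The sketch below is illustrative only: `resolve_strength_sketch`, the dictionary name, and the lowercase vendor keys are assumptions, and the separate Qwen ladder is deliberately not reproduced because this guide does not list its unknown-vendor value.

```python
from typing import Optional

# Default ladder for the sdxl/controlnet pipelines, as stated in this guide.
VENDOR_STRENGTH = {"openai": 0.20, "gemini": 0.30}
UNKNOWN_VENDOR_STRENGTH = 0.30


def resolve_strength_sketch(strength: Optional[float], vendor: Optional[str]) -> float:
    """Return the img2img strength to use for one image."""
    if strength is not None:
        # An explicit user value always wins over the ladder.
        return strength
    key = (vendor or "").lower()
    return VENDOR_STRENGTH.get(key, UNKNOWN_VENDOR_STRENGTH)
```

Since near-threshold removal is seed-dependent, a caller following the production recipe would pair the resolved value with a fixed `--seed` and re-validate against the vendor oracle per content type.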