mirror of https://github.com/wiltodelta/remove-ai-watermarks.git synced 2026-06-04 18:18:00 +02:00


Victor Kuznetsov 4a6cd71ab2 Merge branch 'claude/silly-northcutt-c2bf06': unify C2PA vendor registry + code-health + uv publish

Brings in commit 5cf68a6 (single C2PA_AI_VENDORS registry, erase_lama
grayscale/BGRA support, batch device-cache clearing + --controlnet-scale,
uv publish via OIDC, hatchling pin <1.31). Auto-merged with no conflicts;
ruff/pytest(544)/pyright all clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-03 22:10:25 -07:00


Remove-AI-Watermarks

You are a principal Python engineer maintaining a CLI tool and library for removing visible and invisible AI watermarks from images.

How to run

uv run remove-ai-watermarks all <image.png> -o <output.png>
uv run remove-ai-watermarks visible <image.png> -o <out.png> — known-visible-mark removal, CPU only (no GPU). Reverse-alpha based: each mark is removed by inverting its captured alpha map. --mark auto (default) picks the highest-confidence detection among the Gemini sparkle, the Doubao "豆包AI生成" text strip, and the Jimeng "★ 即梦AI" wordmark; --mark gemini / --mark doubao / --mark jimeng force one. Gemini normally needs no inpaint: reverse-alpha alone recovers the pixels, and a footprint inpaint is only the fallback when reverse-alpha would over-subtract on a dark background. Doubao and Jimeng both add an always-on thin residual inpaint over the glyph footprint, because their text marks re-rasterize per image and reverse-alpha alone leaves a faint outline. For arbitrary logos/objects use erase.
uv run remove-ai-watermarks erase <image.png> --region x,y,w,h -o <out.png> — universal region eraser (any logo/object, any position). --backend cv2 (default, no deps) or --backend lama (big-LaMa via onnxruntime, extra lama); --region is repeatable.
uv run remove-ai-watermarks identify <image> — provenance verdict (platform + watermark inventory + confidence); --json for machine output, --no-visible to skip the cv2 visible-mark detectors (Gemini sparkle plus the Doubao/Jimeng text marks)
uv run remove-ai-watermarks metadata <image.png> --check — inspect AI metadata (C2PA, EXIF, PNG chunks)
uv run remove-ai-watermarks metadata <image.png> --remove -o <out.png> — strip all AI metadata
uv run remove-ai-watermarks batch <directory> — process every supported image in a directory (output defaults to <directory>_clean/, set with -o). --mode visible|invisible|metadata|all (default visible); the invisible/all path reuses the same --strength/--steps/--pipeline/--controlnet-scale/--device/--max-resolution/--min-resolution/--seed/--hf-token knobs as invisible, --inpaint/--no-inpaint for the visible pass, --humanize for the Analog Humanizer + --unsharp for the final sharpening post-filter, and --restore-faces/--no-restore-faces + --restore-faces-weight for the GFPGAN face-identity post-pass

Test and lint

CI (.github/workflows/test.yml): runs on push to main + every PR. A lint job (ubuntu: ruff check + ruff format --check) plus a test matrix (ubuntu/macos/windows x py3.10/3.12) that does uv sync --frozen --extra dev then pytest. The matrix installs only core + dev (no gpu extra), so the GPU/model-running tests skip there and it exercises the metadata/identify/visible/cv2-eraser surface on all three OSes. Keep uv.lock valid (don't break --frozen) when editing pyproject.toml.
publish.yml stays release-only. It verifies that the release tag matches the pyproject.toml version (failing the build on a mismatch) before building, then uploads via uv publish (PyPI trusted publishing over OIDC, no token). uv publish replaced the pypa/gh-action-pypi-publish action, so the upload no longer depends on that action's bundled twine accepting the Metadata-Version. The id-token: write permission, the pypi environment, and the workflow filename are unchanged, so PyPI's trusted-publisher entry still matches.
Release flow: bump the version in pyproject.toml + src/remove_ai_watermarks/__init__.py + uv.lock (the project's own [[package]] entry, ~line 2868), commit chore(release): vX.Y.Z, git tag -a vX.Y.Z -m vX.Y.Z (annotated; git tag without -m errors here), push main + the tag, then gh release create vX.Y.Z. PyPI publish triggers on the GitHub Release published event, NOT on the tag push, so the tag alone does not publish.
Sdist must exclude data/ ([tool.hatch.build.targets.sdist] exclude = ["/data"]). Hatchling's default sdist bundles all VCS-tracked files, so the committed data/ test corpora (the multi-hundred-MB synthid_corpus images + the visible-mark captures) pushed the 0.8.0 sdist past PyPI's per-project file-size limit (400 "File too large"). The wheel uploaded but the sdist was rejected, so 0.8.0 shipped wheel-only and 0.8.1 carried the fix. The wheel only ships src/ (via [tool.hatch.build.targets.wheel] packages), so it was never affected.
A failed PyPI upload of one artifact still leaves the other live and you cannot re-upload the same version — fix the build and cut the next patch. Build backend is pinned hatchling<1.31 ([build-system] requires): hatchling 1.30.0 made Metadata-Version 2.5 (PEP 794) the default, which the twine bundled in pypa/gh-action-pypi-publish@release/v1 rejects ("'2.5' is not a valid Metadata-Version") — this failed the v0.8.3 PyPI upload on 2026-06-01 (tag-match + build passed, the upload step failed; nothing was uploaded, so the version stayed empty on PyPI), when unpinned requires = ["hatchling"] pulled 1.30.0. hatchling 1.30.1 reverted the default back to 2.4 ("kept at 2.4 until more tools support 2.5"), and 1.27-1.29 always emitted 2.4 — so <1.31 keeps uv build on a 2.4-emitting hatchling (it resolves to the latest allowed, 1.30.1, which uploads fine). (The earlier "1.28+ emits 2.5" note was imprecise: the 2.5 default landed only in 1.30.0, verified against hatch's changelog.) The publish workflow now uses uv publish (its uploader handles 2.5), so this pin is no longer load-bearing — it stays as belt-and-suspenders so the first uv-publish release ships 2.4 metadata (isolating the uploader swap from the metadata-version bump); drop it to requires = ["hatchling"] once that release confirms the path.
bash maintain.sh — uv-outdated, uv-secure, ruff check/fix, ruff format, pyright, pytest -n auto
Strict pyright is clean across src/ (0 errors). The cv2/torch/diffusers boundary files (gemini_engine, region_eraser, doubao_engine, humanizer, invisible_engine, noai/watermark_remover) carry a documented per-file # pyright: relax pragma that turns off only the unknown-type / untyped-third-party rules — those libs ship no usable types, so strict typing there fights the ecosystem. Pure-logic files stay fully strict; typings/piexif/__init__.pyi is a local stub so metadata.py/extractor.py resolve piexif. Public ndarray-returning signatures on the relaxed engines are still annotated NDArray[Any] so strict consumers (cli.py) stay clean. When touching a relaxed file, prefer fixing real issues over widening the pragma; keep the pragma scoped to genuinely-untyped boundaries. (uv-secure is clean since idna was bumped 3.11 -> 3.16, fixing GHSA-65pc-fj4g-8rjx.)
Full-project uv run pyright (no path) OOMs/crashes node on this ML-heavy repo (emits a libnode stack frame, no summary) — a known environment limit, not a code error. Gate with uv run --extra dev --extra gpu pyright src/ (completes, authoritative) or scope to changed files; also run uv run ruff check and uv run pytest directly.
Run uv run from the repo root — from another cwd it falls back to a bare env without numpy/cv2/torch.
To add a dev tool (pytest/ruff/pyright) into the env, use uv sync --frozen --extra dev --extra gpu, never uv pip install — uv pip install re-resolves and rewrites uv.lock, which silently bumped transformers to a build incompatible with the pinned diffusers (cannot import name 'Qwen3VLForConditionalGeneration') and broke every identify/metadata import. Recovery: git checkout uv.lock && uv sync --frozen --extra gpu --extra dev. The gpu extra holds diffusers/transformers/torch, so a bare uv sync (no extras) removes them; noai/__init__ is now lazy (PEP 562 __getattr__, so importing identify/metadata no longer pulls watermark_remover/torch), so a bare env breaks only when the removal pipeline is actually invoked, not on import. maintain.sh's uv sync --all-extras also pulls the heavy trustmark/lama wheels (pytorch-lightning, onnxruntime) — fine on a good connection, but on flaky DNS sync only --extra gpu --extra dev and run the lint/test steps by hand.
Metadata/C2PA tests assert against real committed fixtures in data/samples/ (chatgpt-*.png = OpenAI C2PA, firefly-1.png = Adobe, mj-* = Midjourney IPTC, doubao-1.png = ByteDance Doubao with the China TC260 <TC260:AIGC> XMP label and a visible "豆包AI生成" text mark bottom-right; grok-1.jpg = xAI Grok with its EXIF-only Signature: blob + UUID Artist and no C2PA/SynthID/IPTC); synthetic byte blobs cover the JPEG/ISOBMFF format paths. The "non-AI / clean photo" control is no longer in data/samples/ -- the clean_photo conftest fixture serves a verified-negative image from the corpus neg/ set (skips if the corpus is absent).
SynthID reference corpus: scripts/synthid_corpus.py ingests labeled images into data/synthid_corpus/. The labeled images/ (pos/ neg/ cleaned/) are committed (public repo -- review every image for private content before adding; manifest.csv is kept in sync with the files on disk, one row per tracked image); only the synthetic refs/ calibration fills are gitignored. See its README for the collection protocol and verification oracles. cleaned/ examples must be produced by a CURRENT shipped removal method -- the default SDXL img2img pass (optionally --max-resolution). Do NOT archive cleaned outputs from methods that are no longer in the pipeline (ctrlregen, the old text/face-protection, IP-Adapter FaceID, CodeFormer) or from the experimental opt-in paths (controlnet, GFPGAN restore) as corpus examples; a cleaned reference should represent the canonical removal, and a removed method's output is not a reproducible example. Keep those experiment outputs in a local working dir, never in the committed corpus.

Configuration

GPU/ML modules (invisible_engine, watermark_remover) are optional — guard imports with is_available() checks
Optional detection extras: detect (imwatermark — open SD/SDXL/FLUX watermark) and trustmark (Adobe TrustMark decoder; pulls torch + downloads weights). Both are guarded by is_available() and skipped by identify when absent.
Optional restore extra (gfpgan/facexlib/basicsr): the GFPGAN face-identity post-pass (face_restore.py, CLI --restore-faces, EXPERIMENTAL, opt-in, OFF by default). Guarded by face_restore.is_available(); when enabled it auto-skips with a debug log when the extra is absent or no face is detected. numpy<2-pinned and Python-3.12-pinned (see the face_restore.py Key-modules bullet).
Tests for the model-running paths are limited to availability checks (multi-GB downloads). But the pure helpers inside ML-adjacent modules are unit-tested without any download and must stay that way: _target_size (native-vs-downscale-cap-vs-upscale-floor, test_invisible_engine.py), humanizer.unsharp_mask/adaptive_polish (test_humanizer.py), auto_config.plan/detectors (test_auto_config.py), and the MPS->CPU fallback control flow via mocked pipelines (test_img2img_runner.py, 100% cover). Don't skip these as "ML, needs a model" — only remove_watermark/the diffusion bodies do.
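Here is a sketch of the is_available() guard idea, assuming a find_spec-based probe. The real per-module is_available() functions take no arguments and may check differently (for example, by attempting the import), so the variadic signature here is hypothetical.

```python
import importlib.util
from functools import cache


@cache
def is_available(*modules: str) -> bool:
    """Return True only if every named top-level module is importable.

    importlib.util.find_spec locates a module without executing it, so
    probing for torch or gfpgan here does not pay their import cost.
    """
    return all(importlib.util.find_spec(name) is not None for name in modules)
```

A caller such as the face pass would check is_available("gfpgan", "facexlib"). When it returns False, the caller logs and skips; when it returns True, the caller imports the heavy package inside that branch only.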

Key modules

noai/c2pa.py — PNG chunk parser; use extract_c2pa_chunk(path) to get raw caBX payload, has_c2pa_metadata(path) to detect. Do not reimplement chunk parsing. extract_c2pa_info(path) sets synthid_watermark/synthid_vendors when the manifest is signed by a SynthID-using vendor, and soft_binding/soft_binding_vendors when a c2pa.soft-binding alg names a forensic-watermark vendor (soft_binding_vendors_in(buffer) is the shared byte-scan, used by both the PNG parser and the non-PNG binary path). PNG/caBX chunk reads are clamped to the remaining file size (safe_length = min(length, remaining); skipped chunks use seek) so a malformed huge length cannot drive a multi-GB allocation (shared safety discipline matching isobmff.scan_c2pa_region).
noai/constants.py — PNG_SIGNATURE, C2PA_CHUNK_TYPE, C2PA_SIGNATURES, C2PA_AI_VENDORS, and C2PA_SOFT_BINDINGS. C2PA_AI_VENDORS is the single C2paAiVendor registry of C2PA-signing vendors (issuer byte, resolved org name, the identify platform label, and a synthid flag). C2PA_ISSUERS, SYNTHID_C2PA_ISSUERS (issuers that pair SynthID with C2PA: Google, OpenAI), and identify._ISSUER_PLATFORM are all derived from it. C2PA_SOFT_BINDINGS maps a soft-binding alg prefix to its forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, Steg.AI, Microsoft, ...). Add a new C2PA vendor as one C2PA_AI_VENDORS entry (never edit the derived dicts) and a new soft-binding as one C2PA_SOFT_BINDINGS entry; never define either inline.
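Here is the registry-plus-derived-views layout described above, sketched with a frozen dataclass. The field names follow the description (issuer byte, org, platform label, synthid flag). The concrete entries, their byte tokens, and the dict shapes of the derived views are illustrative assumptions, not the contents of noai/constants.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class C2paAiVendor:
    issuer: bytes   # byte token looked for in the manifest
    org: str        # resolved organisation name
    platform: str   # label identify() would report
    synthid: bool   # vendor pairs a SynthID pixel watermark with C2PA


# Illustrative entries only; the real registry lives in noai/constants.py.
C2PA_AI_VENDORS: tuple[C2paAiVendor, ...] = (
    C2paAiVendor(b"Google", "Google LLC", "Google (Gemini)", synthid=True),
    C2paAiVendor(b"OpenAI", "OpenAI", "OpenAI", synthid=True),
    C2paAiVendor(b"Adobe", "Adobe", "Adobe Firefly", synthid=False),
)

# Derived views: rebuilt from the registry, never edited by hand,
# so adding one vendor entry updates all three consistently.
C2PA_ISSUERS = {v.issuer: v.org for v in C2PA_AI_VENDORS}
SYNTHID_C2PA_ISSUERS = {v.issuer: v.org for v in C2PA_AI_VENDORS if v.synthid}
ISSUER_PLATFORM = {v.issuer: v.platform for v in C2PA_AI_VENDORS}
```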
metadata.py — scan_head(path, size=1MB) is the shared input for every C2PA/AIGC/IPTC byte scan: the first size bytes plus the payloads of any provenance metadata found beyond that window. For ISOBMFF, these are the late provenance boxes from isobmff.scan_c2pa_region (catches a manifest after a large mdat); for PNG, the late tEXt/iTXt/zTXt/eXIf/iCCP chunks from _png_late_metadata (catches an XMP/EXIF packet appended after a large IDAT, e.g. a TC260 AIGC label at ~2.7 MB). It is behavior-neutral (plain f.read(size)) for inputs that are neither ISOBMFF nor PNG, and for any file that fits within size. Use it instead of open().read(1MB) for any new marker scan. synthid_source(path) returns the vendor name(s) if the C2PA manifest implies a SynthID pixel watermark, else None. Format-agnostic: PNG via the caBX parser, JPEG/WebP/AVIF/HEIF/JXL via a binary scan (C2PA marker + SynthID issuer + AI-source marker). get_ai_metadata surfaces the verdict, and metadata --check prints it as a callout. Both get_ai_metadata and has_ai_metadata guard the PIL open with except Exception (HEIC/unknown formats raise non-OSError) and fall through to the binary scan. xai_signature(path) detects xAI/Grok's EXIF-only scheme (ImageDescription = Signature: <base64> + UUID Artist); it feeds has_ai_metadata, get_ai_metadata (key xai_signature), and identify. iptc_ai_system(path) detects the IPTC Photo Metadata 2025.1 AI-disclosure XMP properties (IPTC_AI_FIELD_MARKERS = AISystemUsed/AISystemVersionUsed/AIPromptInformation/AIPromptWriterName) and returns the AISystemUsed generator name (or "fields present"). remove_ai_metadata routes ISOBMFF video (.mp4/.mov/.m4v) through the same isobmff.strip_c2pa_boxes as AVIF/HEIF (MP4 is ISOBMFF), and _scrub_ai_exif removes the xAI signature + AI-generator EXIF tags on JPEG output. strip_c2pa_boxes is fail-safe on a malformed box: it returns the original bytes unchanged with a logged warning instead of truncating the tail to EOF (detection-only scan_c2pa_region still stops at a malformed box).
_png_late_metadata clamps each late-chunk read to the remaining file size (safe_length = min(length, remaining)) so a malformed length cannot drive a multi-GB allocation.
identify.py — the OpenAI rollout caveat is keyed on _vendor_of(synthid) == "OpenAI" (not a raw substring over the issuer + verdict blob). identify(path) aggregates every locally-readable signal (C2PA issuer→platform, C2PA soft-binding forensic-watermark vendor, IPTC "Made with AI" + IPTC 2025.1 AISystemUsed, embedded SD/ComfyUI params, SynthID proxy, xAI/Grok EXIF signature via metadata.xai_signature, the China TC260 AIGC label via metadata.aigc_label, the HuggingFace hf-job-id job marker via metadata.huggingface_job, the Samsung Galaxy AI editing marker via metadata.samsung_genai, the visible marks — Gemini sparkle plus the ByteDance Doubao 豆包AI生成 / Jimeng 即梦AI text marks via the watermark_registry — open invisible watermark, Adobe TrustMark via trustmark_detector) into one ProvenanceReport. is_ai_generated is True or None (never asserted False — stripped metadata is not proof of clean origin). The hf_job, visible-mark, and Samsung samsung_genai signals are medium confidence: each lifts an otherwise-Unknown verdict to a tentative AI (hf_only / visible_only / samsung_only, parallel branches; visible_only fires on any visible_* signal) but is excluded from the high-confidence ai_from_metadata set, so none overrides a hard metadata signal. Visible-mark detection (check_visible, signals visible_sparkle / visible_doubao / visible_jimeng): the Gemini sparkle keeps its own file-level path (_visible_sparkle → gemini_engine.detect_sparkle_confidence, promoted only at confidence ≥ _SPARKLE_THRESHOLD 0.5; corpus-tuned to separate Gemini sparkles ≥0.56 from non-sparkle ≤0.49), while Doubao/Jimeng reuse the registry detectors (_visible_text_marks → watermark_registry), each gated by its own engine NCC threshold via MarkDetection.detected (Doubao 0.4, Jimeng 0.45). Doubao/Jimeng are normally also caught by the TC260 AIGC metadata label, so the visible path is their stripped-metadata fallback. 
Visible marks set platform only when no harder signal already did, and (like the sparkle) are excluded from integrity-clash vendor claims. The cv2 dependency lives in the engines, not here. import identify is deliberately light (~21 MB; ~36 MB with cv2 loaded by a visible-mark run, ~106 MB for a full check_visible run): it imports only the pure noai.c2pa/noai.constants submodules, and noai/__init__ is lazy (see "Test and lint"), so torch/diffusers are NOT pulled at import even in a full gpu/detect install — fits a 512 MB host. The heavy paths are opt-in: check_invisible=True needs the detect/trustmark extras (each pulls torch; TrustMark also downloads weights), so on a core-only deploy leave check_invisible off (it is a no-op there anyway). Before the lazy __init__, the mere presence of torch in the env inflated import identify to ~420 MB. C2PA platform attribution is device-token-first, issuer-scan fallback (_device_platform scans manifest bytes for _DEVICE_C2PA_PLATFORM tokens, then _attribute_platform/_ISSUER_PLATFORM). Why, verified on real signed files 2026-05-26: the old issuer-only byte-scan matched ANY issuer substring anywhere, so multi-entity manifests mis-attributed -- Leica→"Truepic" (a signing authority in the trust chain), Nikon→"Adobe Firefly" (XMP-toolkit "Adobe" + the sample's "Adobe_MAX" name), Pixel→"Google (Gemini)" ("Google LLC" cert org), Truepic→"Google". A distinctive device token wins instead. Token distinctiveness is load-bearing: bare b"Truepic" mis-fires (it appears in unrelated trust chains -- it mis-attributed the OpenAI chatgpt-1.png fixture), so the token is the specific b"Truepic_Lens" from the Lens SDK claim generator; likewise b"Pixel Camera" (cert CN) not bare b"Pixel". 
_DEVICE_C2PA_PLATFORM lists ONLY tokens verified against a real C2PA file: Leica (lc_c2pa/Leica Camera), Nikon (NIKON), Pixel (Pixel Camera -- from a real Pixel 10 Pro file attached to c2pa-rs issue #1609/#1554), Sony (sony.sig/sony.cert -- Sony's own C2PA assertion namespace, verified on a real Sony PXW-Z300 file; NOT bare "Sony" which is a common EXIF Make), Truepic (Truepic_Lens). Canon/Bria have no public direct-download C2PA sample (checked exhaustively: GitHub issue/PR attachments, contentcredentials gallery, HF datasets -- all upload-to-verify or token-gated; Canon's only public file was a self-signed hobbyist CR3, not factory), so they stay unmapped until a real file is captured (same fixture discipline as Grok/Doubao). The Sony sample is video (MP4) -- our ISOBMFF C2PA path detects it; Sony Alpha stills likely share the sony.* namespace but are not separately verified. Samsung Galaxy + ASUS Gallery live in a separate _SIGNER_C2PA_PLATFORM (scanned after _device_platform, before the issuer fallback), NOT in _DEVICE_C2PA_PLATFORM — verified on real signed files 2026-05-29. Reason: a Galaxy phone stamps BOTH its device cert AND a trainedAlgorithmicMedia/genAIType AI marker on a Generative-Edit image, so treating it as a "genuine camera capture" would false-fire integrity-clash rule 2 on every Galaxy AI edit. The signer tokens (b"Samsung Galaxy" cert org — distinct from the EXIF SM-xxxx model string on ordinary Samsung photos; b"com.asus.gallery" claim generator) only resolve the platform label; the AI verdict still comes from the source-type / genAIType. ASUS Gallery is a C2PA-signed edit with no AI marker, so it attributes the platform without asserting is_ai. 
Samsung's genAIType (in the proprietary PhotoEditor_Re_Edit_Data JSON) is an undocumented Galaxy-AI editing marker (metadata.samsung_genai, gated on the PhotoEditor_Re_Edit_Data container; non-zero value = AI tool used, values {1,5} observed): medium-confidence because the field has no public spec (verified 2026-05-29: absent from C2PA spec + Samsung docs), but it co-occurred with trainedAlgorithmicMedia in 3/3 verified files that record a source-type and was the SOLE AI marker on a Galaxy S24 file that omits the source type. Camera C2PA marks capture authenticity, not AI (Pixel carries computationalCapture, not trainedAlgorithmicMedia), so these never set is_ai -- that stays driven by digital-source-type. c2pa.cbor_text_after (now public) is best-effort for the generator detail string only and can be None when the manifest keys it claim_generator_info (Pixel). Issuer→generator mapping is is_ai-gated (_attribute_platform(issuers, is_ai=c2pa_is_ai)): a specific AI-generator platform is named only when the digital-source-type is trainedAlgorithmicMedia; on a non-AI source an issuer substring is treated as incidental (an "Adobe XMP" toolkit string in an unmapped Canon/Sony capture would otherwise mislabel it "Adobe Firefly"), so it degrades to the neutral "C2PA signer: X" label. Real Firefly/OpenAI/Google output carries the AI source-type, so it is unaffected (verified: chatgpt-1.png→OpenAI, firefly-1.png→Adobe Firefly still attribute). _attribute_platform defaults is_ai=True so the mapping stays unit-testable in isolation. Add capture-camera tokens to _DEVICE_C2PA_PLATFORM, editing-app/AI-device signer tokens to _SIGNER_C2PA_PLATFORM, generator/issuer platforms to the C2PA_AI_VENDORS registry in constants.py (which derives _ISSUER_PLATFORM), not inline. For non-PNG containers (JPEG/WebP/AVIF/HEIF/JXL) the caBX parser returns nothing, so issuer (_issuers_in) and generator (_ai_tools_in, reusing C2PA_AI_TOOLS) are recovered by binary-scanning the first MB. 
EXIF Software / Make / Artist / ImageDescription and XMP CreatorTool generator tags are read by metadata.exif_generator (PIL+piexif for any format PIL opens incl. AVIF, plus a container-agnostic XMP raw-byte scan that also covers HEIF/JXL), matched against AI_GENERATOR_TOKENS so ordinary editors (plain "Adobe Photoshop") and real-camera Make ("Apple"/"Canon") are not flagged. Ideogram tags its output with EXIF Make="Ideogram AI" (verified on a real download 2026-05-24) — that's why Make is read. Integrity-clash detection (_integrity_clashes, surfaced as ProvenanceReport.integrity_clashes, printed in red by identify and serialized to --json): contradictions between independent generator stamps are a laundering/spoofing tell. Two rules: (1) two or more distinct AI-origin vendors named by independent signals (e.g. C2PA OpenAI + EXIF Make="Ideogram AI"), and (2) a camera-capture C2PA device (_DEVICE_C2PA_PLATFORM) coexisting with any AI-generation marker. Independence is source-grouped (_CLASH_SOURCE, added 2026-06-02): the C2PA issuer attribution (c2pa) and the SynthID proxy (synthid) are NOT independent — the proxy is inferred from the same manifest — so they share one source and two vendors named within a single manifest do not clash. This killed a false-positive class found on the spaces corpus: legitimate multi-actor manifests where a product wraps another vendor's engine (Microsoft Designer on OpenAI → OpenAI, Microsoft; Microsoft on Google → Microsoft, Google LLC, Google C2PA Core Generator Library) or an edit chain re-signs (Adobe over a Gemini original → Adobe c2pa + Google synthid) — 19 such files across the 2026-06-01/02 batches read as clashes before the fix. 
Rule 1 still fires when a manifest vendor disagrees with a genuinely independent stamp (EXIF/XMP generator, IPTC AISystemUsed, AIGC, xAI); each non-c2pa/synthid family is its own source (test_identify.py::TestIntegrityClashes::{test_multi_actor_manifest_no_clash,test_manifest_vendor_vs_independent_signal_clashes}). Vendor normalization is _vendor_of over _AI_VENDOR_TOKENS (so a C2PA "Google (Gemini)" issuer and a SynthID-Google proxy agree, while different vendors clash). High-precision by design: only hard generator stamps feed it (C2PA-issuer when source is AI, SynthID, EXIF/XMP generator, IPTC AISystemUsed, xAI, AIGC); the fuzzy visible sparkle and the open invisible watermark are excluded (the latter can be a by-product of our own SDXL removal pass). The c2pa vendor is classified from the issuer attribution / generator, NOT the resolved platform (a camera label like "Google Pixel" would mis-normalize to "Google"). All real single-origin fixtures (chatgpt/firefly/doubao/grok/mj) verified to produce zero clashes (false-positive guard in test_identify.py::TestRealSamplesHaveNoClash).
watermark_registry.py — single catalog of known visible watermarks, the unified "find known marks in their usual places, recognize, remove" entry. Reverse-alpha based by policy: a mark is listed only once a real alpha map has been captured for it, and removal inverts that map (original = (wm - a*logo)/(1-a)) — Gemini recovers cleanly with no inpaint (its sparkle alpha comes from a pure-black capture, so it is near-exact), while Doubao and Jimeng both add an always-on THIN residual inpaint over the glyph footprint (their text marks re-rasterize + jitter a few px per image, so a single capture cannot pixel-cancel them; the inpaint blends into the reverse-alpha-recovered pixels). Arbitrary-region inpainting still lives in region_eraser/erase. Each KnownMark ties a key to {usual location, in_auto flag, recovery (="reverse-alpha"), a detect adapter → uniform MarkDetection, a remove adapter}. Entries today: gemini (bottom-right sparkle), doubao (bottom-right "豆包AI生成"), and jimeng (bottom-right "★ 即梦AI"). detect_marks scans all; best_auto_mark picks the highest-confidence detection. Cross-engine confidences aren't directly comparable, so the gemini adapter applies the corpus-validated 0.5 sparkle threshold (_GEMINI_AUTO_MIN_CONF) for its detected flag — otherwise the gemini engine's loose internal threshold weakly fires (~0.36) on the Doubao text and hijacks auto. The shape-keyed Doubao/Jimeng NCC detectors don't cross-fire (jimeng scores ~0.22 on the Doubao strip, well under its 0.45 threshold), so auto picks the right one on a Doubao vs Jimeng image. cli.cmd_visible is registry-driven: --mark auto → best_auto_mark, --mark <key> → that mark; --mark choices come from mark_keys(). _doubao_remove/_jimeng_remove apply reverse-alpha only when the mark is detected AND reverse_alpha_available; outside that, removal is skipped (not inpainted). Add a new visible mark = one KnownMark entry + its engine (with a captured alpha map); do not re-add per-mark if branches in the CLI. 
Alpha-on-save policy (issue #30): cli._write_bgr_with_alpha rejoins the input's alpha plane unchanged — it must NOT zero alpha in the watermark bbox. Reverse-alpha (and erase inpaint) recover real pixels there, so zeroing alpha punched a transparent hole that renders as a solid white box on any non-transparent viewer (Gemini app exports are opaque RGBA, so every user hit it; regression-guarded by test_visible_keeps_alpha_opaque_in_watermark_region). The registry remove() still returns its region (used for inpaint_residual positioning), but the CLI no longer uses it to clear alpha.
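Here is a numpy sketch of the reverse-alpha inversion, original = (wm - a*logo)/(1-a). It assumes a float image in [0, 255], an alpha map already aligned to the mark, and a pure-white logo. reverse_alpha and its max_alpha cap are hypothetical. The engines layer alignment, the over-subtraction guard, and the residual inpaint on top of this core.

```python
from __future__ import annotations

import numpy as np
from numpy.typing import NDArray


def reverse_alpha(
    watermarked: NDArray[np.float64],
    alpha: NDArray[np.float64],
    logo: float | NDArray[np.float64] = 255.0,
    max_alpha: float = 0.99,
) -> NDArray[np.float64]:
    """Invert wm = a*logo + (1 - a)*original for every pixel.

    watermarked: HxWxC image in [0, 255]; alpha: HxW map in [0, 1].
    max_alpha caps a so the division stays finite where the mark is opaque.
    """
    a = np.clip(alpha, 0.0, max_alpha)[..., None]  # broadcast over channels
    original = (watermarked - a * logo) / (1.0 - a)
    return np.clip(original, 0.0, 255.0)
```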
gemini_engine.py — visible Gemini-sparkle remover/detector (cv2/numpy, no GPU). detect_sparkle_confidence(path) is the file-level entry point used by identify.py. The public entry points normalize a grayscale (2D) or RGBA (4-channel) input to BGR up front so a non-BGR image does not crash the cv2 pipeline. Detection localization (issue #36): detect_watermark's global multi-scale NCC search applies a size weight ((scale/96)**0.5) that suppresses tiny-patch false positives but can let a larger, mediocre match (e.g. a bright collar in a portrait) outrank a small, near-perfect sparkle in the corner — so a faint sparkle on a busy background scored below threshold and read as clean (the regression osachub reported from widening the search window 256px->512px between v0.7.2 and v0.8.8). _corner_promote adds a bottom-right-corner raw-NCC pass on top of the global search: a match with raw NCC >= _CORNER_PROMOTE_NCC 0.85 that beats the global pick overrides it (it only ever replaces a lower-fidelity pick, so it cannot weaken an existing detection), rescuing the buried sparkle without reverting the wider window. The corner side is relative-clamped (_CORNER_PROMOTE_FRAC 0.20 of the short side, clamped to [_CORNER_PROMOTE_MIN 96, _CORNER_PROMOTE_MAX 384]): a fixed 256px is a true corner on a large image but covers ~70% of a small portrait, where a real photo raw-matches the star at ~0.81 (relative tightening drops that worst case to ~0.69, while the upper clamp stops the corner ballooning on huge images where a real photo reached ~0.83 at 512px). The 0.85 gate sits midway between the worst real-photo corner match (~0.78 across native + downscaled negatives) and a genuine faint sparkle (~0.93), so promotion adds true detections with zero corpus false positives (Gemini's sparkle sits ~60-160px from the corner at fixed margins, covered by the [96, 384] band at every measured size). Regression-guarded by test_gemini_engine.py::TestCornerPromotion. 
Removal is reverse-alpha with an over-subtraction guard (remove_watermark → _reverse_alpha_blend, else _inpaint_footprint): the sparkle alpha is computed (alpha = max(R,G,B)/255) from the bundled sparkle-on-black captures assets/gemini_bg_{96,48}.png (the capture max is ~130, NOT 255 — the sparkle is a ~51%-opaque white overlay, so alpha maxes at ~0.51, which is CORRECT for the capture, not under-exposed). The alpha is near-exact only when the real mark's effective opacity matches the capture, which holds on bright/flat backgrounds — re-verified clean on demo_banana_before.png 2026-05-31. Issue #30 (dark-background black pit): on a dark/textured background (e.g. grass, ~73) the real sparkle's effective opacity is LOWER than the captured 0.51, so the fixed-alpha reverse blend OVER-subtracts (watermarked - a*logo goes negative) and drives the footprint to black — the white sparkle becomes a black diamond. remove_watermark now detects this via _reverse_alpha_oversubtracts (fraction of footprint pixels with alpha >= _FOOTPRINT_ALPHA 0.1 whose numerator < 0 exceeds _OVERSUB_FOOTPRINT_FRAC 0.05) and inpaints the footprint (_inpaint_footprint, cv2 NS over the dilated alpha mask) from the surrounding pixels instead. Behavior-neutral on the working case: a bright background over-subtracts at ~0% so reverse-alpha is used and the output is byte-identical to before (verified: demo_banana 0.0 frac vs issue-#30 grass 0.61 frac; regression-guarded by test_gemini_engine.py::TestOverSubtractionGuard, which composites the sparkle at a reduced effective alpha to reproduce the mismatch). 
Under-subtraction (the symmetric case, fixed 2026-06-03): some real Gemini sparkles are rendered MORE opaque than the captured ~0.51, so the fixed-alpha reverse blend UNDER-subtracts and leaves a bright sparkle residual the detector still fires on (measured on the spaces corpus: a visible-removal audit through the registry path left a detectable sparkle on a meaningful fraction of marks, all under-removals, NOT a background-brightness class — failures and successes had the same input confidence and the same background-luma distribution; the discriminator was the removal delta itself). remove_watermark now estimates a per-image alpha gain (_estimate_alpha_gain: effective sparkle opacity at the bright core vs the local background ring, a_eff/a_cap, clamped [1.0, _ALPHA_GAIN_MAX 1.94]) and scales the alpha to match before the over-sub/blend branch. The gain cleanly separates on the corpus (under-removed marks ~1.47, cleanly-removed ~1.00), and a deadband (_ALPHA_GAIN_DEADBAND 1.05) keeps a matching sparkle byte-identical to the pre-fix output, so the fix is purely additive (0 regressions on the audit set; the over-sub guard still runs on the scaled alpha as the safety net for an over-shooting estimate). Regression-guarded by test_gemini_engine.py::TestUnderSubtractionGain (composites a more-opaque-than-capture sparkle; asserts on footprint pixels, NOT the detector — the detector's NCC is degenerate on a flat synthetic background, so a re-detect conf is meaningless there; the real corpus removal drops the detector from ~0.80 to ~0.27). False-positive gate (added 2026-06-03): detect_watermark's shape-only NCC (spatial*0.5 + gradient*0.3 + var*0.2) fires on ornate/flat content (text strips, banners, hatching) that coincidentally matches the diamond shape — a real Gemini sparkle is a bright WHITE overlay, so its core sits above the local background, but the NCC is contrast-invariant and cannot see that. 
The fusion now demotes (caps confidence to 0.30) any match that is BOTH low-confidence (< _SPARKLE_FP_CONF 0.65) AND has a low core-ring brightness margin (_core_ring_margin < _SPARKLE_FP_MARGIN 5). Real sparkles escape via EITHER high confidence (white-bg sparkles score ≥0.79 despite a low margin, because the NCC shape match is strong) OR high margin (dark/mid backgrounds, incl. the #36 faint-corner case, lift well clear), so a match is demoted only when BOTH fail. The gate is monotonic (it only ever removes detections, never adds them), so it cannot regress the verified-negative corpus (already 0 FPs). On the spaces corpus it demoted 16/495 flagged sparkles (13 carried no AI metadata, i.e. content FPs; the 3 with AI metadata were visual FPs or a near-invisible white-on-white sparkle whose AI verdict is carried by metadata anyway), and dropped the removal-audit failures 20→15 (post-removal flat footprints the NCC re-fired on). _core_ring_margin and _estimate_alpha_gain share the _core_and_bg helper (core 75th-pct brightness vs background-ring median). Regression-guarded by test_gemini_engine.py::TestSparkleFalsePositiveGate. The registry's optional inpaint_residual (edge cleanup) is a no-op on a clean reverse-alpha removal; an earlier "Gemini smears" reading was a misjudged soft-fur original, not an artifact. The bg assets are now rebuilt from OUR OWN controlled captures (data/gemini_capture/captures/, committed) by scripts/visible_alpha_solve.py gemini, which locates the 96px sparkle on the black capture and crops it to the two logo sizes; our capture matched the previously third-party-sourced gemini_bg_96.png to NCC 0.9998, validating the asset and making it reproducible. Gemini's multi-size fixed-slot model is genuinely different from the Doubao/Jimeng text-strip engines, so it stays a separate engine rather than joining the shared-base refactor.
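The Gemini path above combines a few small pieces of arithmetic: the black-capture alpha, the over-subtraction check that chooses between reverse-alpha and inpainting, the per-image alpha gain, and the two-condition false-positive demotion. The numpy sketch below is a reading of those descriptions, not the engine's code. Function names are illustrative and the constants mirror the named module constants. Three details are assumptions: a footprint pixel counts as over-subtracted when any channel's numerator is negative, a_cap is the 75th percentile of the captured alpha over the same core, and the hypothetical remove_sparkle returns None where the engine would call _inpaint_footprint.

```python
import numpy as np

LOGO = 255.0            # the sparkle is a white overlay
FOOTPRINT_ALPHA = 0.1   # mirrors _FOOTPRINT_ALPHA
OVERSUB_FRAC = 0.05     # mirrors _OVERSUB_FOOTPRINT_FRAC
GAIN_MAX = 1.94         # mirrors _ALPHA_GAIN_MAX
DEADBAND = 1.05         # mirrors _ALPHA_GAIN_DEADBAND
FP_CONF, FP_MARGIN, DEMOTED_CONF = 0.65, 5.0, 0.30  # mirror _SPARKLE_FP_CONF / _SPARKLE_FP_MARGIN


def alpha_from_black_capture(capture_bgr):
    """On black, watermarked = alpha * 255, so alpha = max(B, G, R) / 255."""
    return capture_bgr.max(axis=2).astype(np.float64) / 255.0


def oversubtracts(roi_bgr, alpha):
    """True if too many footprint pixels would go negative under this alpha."""
    numerator = roi_bgr.astype(np.float64) - alpha[..., None] * LOGO
    footprint = alpha >= FOOTPRINT_ALPHA
    if not footprint.any():
        return False
    # Assumption: a pixel counts if any channel's numerator is negative.
    negative = (numerator < 0).any(axis=2) & footprint
    return negative.sum() / footprint.sum() > OVERSUB_FRAC


def reverse_alpha_blend(roi_bgr, alpha):
    """Invert watermarked = (1 - a) * original + a * LOGO per pixel."""
    a = np.clip(alpha, 0.0, 0.99)[..., None]  # keep 1 - a away from zero
    original = (roi_bgr.astype(np.float64) - a * LOGO) / (1.0 - a)
    return np.clip(np.rint(original), 0, 255).astype(np.uint8)


def core_and_bg(gray, core_mask, ring_mask):
    """Core 75th-percentile brightness vs background-ring median."""
    return float(np.percentile(gray[core_mask], 75)), float(np.median(gray[ring_mask]))


def estimate_alpha_gain(gray, alpha, core_mask, ring_mask):
    core, bg = core_and_bg(gray, core_mask, ring_mask)
    a_cap = float(np.percentile(alpha[core_mask], 75))
    if bg >= LOGO - 1.0 or a_cap <= 0.0:
        return 1.0  # white-on-white or empty core: opacity is unidentifiable
    a_eff = (core - bg) / (LOGO - bg)  # solve core = (1 - a) * bg + a * LOGO for a
    gain = min(max(a_eff / a_cap, 1.0), GAIN_MAX)
    return 1.0 if gain < DEADBAND else gain  # deadband keeps matching marks byte-identical


def gate_sparkle(conf, margin):
    """Demote only when BOTH the NCC confidence and the core-ring margin are low."""
    if conf < FP_CONF and margin < FP_MARGIN:
        return min(conf, DEMOTED_CONF)
    return conf  # monotonic: the gate never raises a confidence


def remove_sparkle(roi_bgr, alpha, gray, core_mask, ring_mask):
    """Gain-scale the alpha, then reverse-blend unless that would over-subtract.

    Returns None where the engine would fall back to inpainting the footprint.
    """
    gain = estimate_alpha_gain(gray, alpha, core_mask, ring_mask)
    scaled = np.clip(alpha * gain, 0.0, 0.99)
    if oversubtracts(roi_bgr, scaled):
        return None
    return reverse_alpha_blend(roi_bgr, scaled)
```

Because the deadband returns exactly 1.0 for a sparkle that matches the capture, the scaled alpha equals the captured one and the output is unchanged; the over-subtraction check still runs on the scaled alpha as the safety net.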
doubao_engine.py — visible Doubao "豆包AI生成" remover/detector (cv2/numpy, no GPU). DoubaoEngine.locate anchors a bottom-right box by geometry (mark scales with image WIDTH), extract_mask pulls the light, low-chroma glyphs (the detection candidate) using a per-pixel channel-spread proxy sat = roi.max(axis=2) - roi.min(axis=2) (no HSV conversion). detect is shape-consistent: it matches the bundled alpha glyph silhouette (assets/doubao_alpha.png) against the candidate via zero-mean normalized correlation (_template_match_score, cv2 TM_CCOEFF_NORMED), gated at DETECT_NCC_THRESHOLD 0.4 over a small DETECT_MIN_COVERAGE floor. Keying on glyph SHAPE (not coverage heuristics) fixed #23 (corpus FP 7/1243). Removal = reverse-alpha + thin residual inpaint (remove_watermark_reverse_alpha): original = (wm - a*logo)/(1-a) from the bundled alpha map + _ALPHA_LOGO_BGR (pure white) + _ALPHA_*_FRAC geometry, then a deliberately THIN inpaint (_RESIDUAL_*, INPAINT_NS) over the glyph footprint clears leftover edges without smearing. Alpha is rebuilt by scripts/visible_alpha_solve.py (the careful gray-self solve: cubic background fit, mean over channels, full halo, unblurred), same recipe as Jimeng — the captures are committed in data/doubao_capture/captures/. Removal aligns ALWAYS (no _ALPHA_NATIVE_BAND fast-path): it tries fixed geometry AND _aligned_alpha_map's TM_CCOEFF_NORMED scale+position search and keeps the lower-residual one — the mark is re-rasterized and a few px off per image, so fixed geometry alone leaves a visible outline even at 2048. The locate box (WM_*) is generous (0.22 wide, margins 0.004) and reaches close to the corner — a tight box (the old 0.185 / margin 0.012) let a corner-ward shift fall OUTSIDE the alignment search, so the align missed and a readable outline survived; regression-guarded by test_recovers_shifted_mark_on_texture (composes the alpha shifted on a known texture; old box ~29 vs new ~1 mean residual). 
Issue #13 follow-up defect (found 2026-05-31): the SHIPPED Doubao removal left a clearly READABLE "豆包AI生成" outline on the real doubao-1.png sample, while detect returned conf 0.0 (it is fooled by a thin outline) so test_reverse_alpha_removes_mark passed and the old "56/56 clean" claim was detector-measured, not visual. Root cause: bad alpha (under-estimated, max ~0.65) + fixed-no-inpaint + tight box; the careful rebuild + always-align + thin inpaint + wide box takes it from a readable outline to faint texture-level traces (parity with Jimeng — a single capture cannot pixel-cancel a per-image re-rasterized mark). Lesson: a detector-only removal test is insufficient; assert visual residual (the textured-shift test). extract_mask guards a degenerate ROI (bh < 16 or bw < 16 -> empty mask, skips cv2): the always-align removal scores each placement with a residual detect(out), and on an extremely wide/short image (e.g. 2048x1, test_wide_short_does_not_raise) that fed cv2's GaussianBlur a ~1-px-tall ROI and faulted natively on Windows py3.12 (access violation, non-deterministic — one CI cell went red while a re-run passed); the old at-native path never ran detect on degenerate sizes. Real images always clear the guard (the WM_* box floors are max(16, …) height / max(40, …) width), so it only short-circuits slivers. reverse_alpha_available is just "asset present"; the registry gates removal on detect. The shipped third-party _refs/zhengsuanfa_doubao_alpha_120x20.png is NOT a usable alpha (verified 2026-05-29). Arbitrary-region inpainting is region_eraser/erase.
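A numpy sketch of the shape-consistent detector, assuming the candidate mask and the template silhouette are same-scale 2D arrays. best_zncc computes the same zero-mean normalized correlation that cv2.matchTemplate(..., cv2.TM_CCOEFF_NORMED) reports, written out so the formula is visible. The min_coverage default is hypothetical (the real DETECT_MIN_COVERAGE value is not stated here), and roi_is_degenerate mirrors the bh < 16 or bw < 16 guard.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

DETECT_NCC_THRESHOLD = 0.4  # mirrors doubao_engine.DETECT_NCC_THRESHOLD
MIN_ROI_SIDE = 16           # degenerate-ROI guard from extract_mask


def channel_spread(roi_bgr):
    """Per-pixel max - min over channels: a cheap saturation proxy (no HSV)."""
    roi = roi_bgr.astype(np.int16)
    return roi.max(axis=2) - roi.min(axis=2)


def roi_is_degenerate(roi):
    """Slivers (e.g. a 2048x1 image) get an empty result instead of reaching cv2."""
    return roi.shape[0] < MIN_ROI_SIDE or roi.shape[1] < MIN_ROI_SIDE


def best_zncc(candidate, template):
    """Max zero-mean normalized correlation over all placements (TM_CCOEFF_NORMED-style)."""
    tmpl = template.astype(np.float64)
    t = tmpl - tmpl.mean()
    windows = sliding_window_view(candidate.astype(np.float64), tmpl.shape)
    w = windows - windows.mean(axis=(-2, -1), keepdims=True)
    num = (w * t).sum(axis=(-2, -1))
    den = np.sqrt((w * w).sum(axis=(-2, -1)) * (t * t).sum())
    # Flat windows have zero energy: score them 0 instead of dividing by zero.
    scores = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(scores.max())


def detect(candidate_mask, template, min_coverage=0.01):
    """Coverage floor first, then the glyph-shape match against the threshold."""
    if roi_is_degenerate(candidate_mask):
        return False, 0.0
    coverage = np.count_nonzero(candidate_mask) / candidate_mask.size
    if coverage < min_coverage:
        return False, 0.0
    score = best_zncc(candidate_mask, template)
    return score >= DETECT_NCC_THRESHOLD, score
```

Because the score is zero-mean and normalized, a bright, low-saturation corner with the wrong glyph layout scores low even when its coverage is high, which is why keying on shape rather than coverage fixed #23.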
jimeng_engine.py — visible Jimeng / Dreamina "★ 即梦AI" remover/detector (cv2/numpy, no GPU), built 2026-05-30 from issue #13's solid captures (@powersee). Mirrors doubao_engine: locate anchors a bottom-right box by geometry (scales with WIDTH), extract_mask pulls the light low-chroma glyphs (white top-hat + grayish + min-luma), detect matches the bundled "即梦AI" glyph silhouette (assets/jimeng_alpha.png) via TM_CCOEFF_NORMED over a coverage floor. Threshold DETECT_NCC_THRESHOLD 0.45 cleanly separates real Jimeng marks (>=0.81) from the Doubao strip (0.21) and other AI output (0.0), so the two ByteDance marks don't cross-fire in --mark auto. Logo is pure white (255,255,255) (_ALPHA_LOGO_BGR; the white capture + an L-pair-solve confirm 254.6); compositing is sRGB, not linear (a linear-light solve tripled the cross-residual). Alpha rebuilt by scripts/visible_alpha_solve.py from the GRAY capture (data/jimeng_capture/captures/, the solid captures now committed): a = (I - B)/(255 - B), B a per-capture cubic background fit over the non-glyph pixels, **averaged over channels, full halo extent (down to a ≈ 0.02), unblurred**. Gray (bg ~132) is the deliberate choice over black: it is the best proxy for real content (the mark sits on bright photo areas, not on black), and the careful build drops the gray self-residual to ~1.3. The mask quality, not the method, was the earlier limit — a max-channel / quadratic-bg / blurred / halo-truncated build (and a black-dominated LS) left a visible outline (lesson from issue #13: when reverse-alpha leaves a ghost, suspect the captured alpha map before adding heuristics or switching method). Geometry emitted by the solver at _ALPHA_NATIVE_WIDTH 2048: _ALPHA_WIDTH_FRAC 0.202, _ALPHA_HEIGHT_FRAC 0.058, margins ~0.029. 
Removal = reverse-alpha + a deliberately THIN residual inpaint (remove_watermark_reverse_alpha, _RESIDUAL_DILATE 5 over the _RESIDUAL_ALPHA_FLOOR 0.05 footprint, _RESIDUAL_INPAINT_RADIUS 2, INPAINT_NS): a single 2048 alpha cannot pixel-cancel the mark re-rasterized at another resolution (alpha maps from independent captures correlate 0.998, not 1.0; off-native reverse-alpha alone only halves the mark), so a tight inpaint clears the residual edges WITHOUT the texture/edge smear a wide full-footprint pass caused. Placement ALWAYS tries fixed geometry AND _aligned_alpha_map's NCC scale+position search, keeping the lower-residual — the mark re-rasterizes + jitters a few px per image even at the captured width, so fixed geometry alone misses (there is no _ALPHA_NATIVE_BAND fast-path; the scale search _ALPHA_ALIGN_SEARCH is fine-stepped, and the WM_* locate box is generous so a corner-ward shift stays inside the search — the same widen that fixed Doubao). Verified clean on the solid captures (native 2048; faint self-residual ~1.3 visible only on a dead-flat field, hidden by real texture) and a real 1440-wide Jimeng download (off-native, table edge preserved). reverse_alpha_available is just "asset present"; the registry gates on detect. No committed real sample (the real content download stays gitignored; only the solid calibration captures are committed) — tests/test_jimeng_engine.py synthesizes a mark from the bundled alpha asset, and test_recovers_shifted_mark_on_texture guards the align-on-shift path that the Doubao defect exposed. Jimeng images are independently caught by the China TC260 AIGC label in metadata/identify, so this engine is the visible-mark removal path, not a new identify signal.
region_eraser.py — universal region eraser (erase CLI). erase(image, boxes=|mask=, backend=) accepts grayscale (2D) and RGBA (4-channel) inputs on both backends (erase_cv2 and erase_lama each split off any alpha plane and re-attach it unchanged, and promote grayscale to BGR for processing — LaMa would otherwise crash on grayscale and drop alpha on BGRA): boxes_to_mask → cv2.inpaint (cv2 backend, default, no deps) or big-LaMa via onnxruntime (lama backend, extra lama, Carve/LaMa-ONNX Apache-2.0 model downloaded on first use, never bundled). erase_lama crops a padded region around the mask, runs LaMa at its fixed 512² input, pastes only masked pixels back (untouched areas stay pixel-exact). Lazy _get_lama_session singleton; lama_available() guards the optional import. LaMa-ONNX costs ~3.5-4 GB peak RAM and ~5-6 s/call on CPU (FFC working set, not arena — enable_cpu_mem_arena=False does not help), so it does NOT fit a minimal droplet; the cv2 backend (tens of MB, ~30 ms) does. LaMa quality at low RAM = serverless/GPU, mirroring how raiw.cc offloads SDXL to fal.
invisible_watermark.py — detect_invisible_watermark(path) decodes the OPEN DWT-DCT watermarks (public decoder, no key) embedded by Stable Diffusion / SDXL / FLUX via the imwatermark library. Known fixed patterns (verified against upstream source) live in _BITS_48 (SDXL 48-bit, FLUX.2 48-bit) and _SD1_STRING ("StableDiffusionV1", SD 1.x/2.x). Optional dep (extra detect); returns None when absent. The detect extra pulls torch transitively (invisible-watermark declares torch a hard dep, and WatermarkDecoder eagerly imports rivaGan -> torch at import time), so detection needs torch present even though dwtDct runs CPU-only on cv2/numpy/pywavelets — no GPU and no separate gpu extra required. Unlike SynthID this is locally detectable, but the watermark is fragile (does not survive JPEG re-encode/resize — verified gone after JPEG q90), so it confirms origin only on pristine files. Add new known patterns here. The file carries a top-of-module pyright pragma because imwatermark/cv2 ship no type stubs.
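Matching a decoded payload against the known patterns is plain bit comparison. A sketch using the two 48-bit constants quoted in the watermarking-landscape notes; the Hamming tolerance is an assumption (the notes do not say whether _BITS_48 matching is exact). The commented decode call follows imwatermark's WatermarkDecoder(kind, length).decode(image, method) interface; KNOWN_48 and match_known_pattern are hypothetical names.

```python
# 48-bit patterns quoted in these notes (SDXL from diffusers, FLUX.2 from black-forest-labs/flux2).
KNOWN_48 = {
    "SDXL": 0b101100111110110010010000011110111011000110011110,
    "FLUX.2": 0b001010101111111010000111100111001111010100101110,
}


def to_bits(value: int, n: int = 48) -> list[int]:
    """Fixed-width bit list, most significant bit first."""
    return [int(b) for b in format(value, f"0{n}b")]


def match_known_pattern(decoded_bits, max_errors: int = 0):
    """Return the matching pattern name, allowing up to max_errors flipped bits."""
    bits = [int(b) for b in decoded_bits]
    for name, value in KNOWN_48.items():
        expected = to_bits(value)
        if len(bits) != len(expected):
            continue
        errors = sum(a != b for a, b in zip(bits, expected))
        if errors <= max_errors:
            return name
    return None


# Decoding itself needs the optional imwatermark package, for example:
#   from imwatermark import WatermarkDecoder
#   bits = WatermarkDecoder("bits", 48).decode(bgr_image, "dwtDct")
#   platform = match_known_pattern(bits)
```

The fixed-width formatting matters: bin() on the FLUX.2 constant yields only 46 digits because its two leading zero bits are dropped.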
trustmark_detector.py — detect_trustmark(path) decodes the OPEN, keyless Adobe TrustMark watermark (the soft binding behind Adobe Durable Content Credentials, alg com.adobe.trustmark.P) via the optional trustmark package (extra trustmark; pulls torch, downloads model weights on first use). Mirrors invisible_watermark.py (lazy singleton guarded by a double-checked threading.Lock so concurrent callers do not double-download the weights, top-of-module pyright pragma, returns None when absent). It detects provenance, not AI origin as such (TrustMark also marks human-authored content), so identify lists it as a watermark without setting is_ai_generated. Other soft-binding vendors (Digimarc/Imatag/Steg.AI/...) have no public decoder — they are only named via the C2PA_SOFT_BINDINGS scan, not decoded. False-positive gate (added 2026-05-29): TrustMark's wm_present is a BCH error-correction validity flag that spuriously validates on a content-correlated fraction of un-watermarked images — AI-generated textures trip it far more than camera photos (verified 2026-05-29 on real files: it fires on Gemini/OpenAI/Doubao output that cannot carry Adobe's watermark, with a random-bytes decoded secret, while signal-free camera photos did not trip it). A genuine TrustMark is a durable soft binding engineered to survive re-encoding, so detect_trustmark re-decodes after a mild JPEG round-trip (_survives_reencode, _REENCODE_QUALITY 95) and requires the same schema both times; every observed false positive collapsed (none survived even q95), so the gate is the durability property the watermark guarantees. The second decode runs only on the rare initial hit, so the cost is negligible. Do NOT remove the gate to "catch more" — a lone TrustMark hit without it is almost always content noise.
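Both mechanisms (the double-checked lazy singleton and the re-encode durability gate) are small enough to sketch with the standard library. decode and reencode are injected callables here so the logic is testable without the trustmark package; in the real detector they would wrap TrustMark's decoder and a JPEG round-trip at _REENCODE_QUALITY 95. All names are hypothetical.

```python
import io
import threading

_model = None
_model_lock = threading.Lock()


def get_model(loader):
    """Double-checked lazy singleton: concurrent first callers trigger one load."""
    global _model
    if _model is None:              # fast path once loaded: no lock taken
        with _model_lock:
            if _model is None:      # re-check: another thread may have loaded it
                _model = loader()
    return _model


def confirmed_schema(image, decode, reencode):
    """Keep a hit only if the same schema decodes again after a mild re-encode.

    decode(image) -> schema or None; reencode(image) -> re-encoded image.
    """
    first = decode(image)
    if first is None:
        return None                 # the common case: no second decode is paid
    second = decode(reencode(image))
    return first if second == first else None


def jpeg_roundtrip(pil_image, quality=95):
    """One possible reencode callable (needs Pillow)."""
    from PIL import Image

    buf = io.BytesIO()
    pil_image.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")
```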
noai/watermark_remover.py — the WatermarkRemover class has two diffusion pipelines, selected by the explicit pipeline ctor arg (NOT inferred from model_id -- both use the same SDXL base, DEFAULT_MODEL_ID). default runs plain SDXL img2img (_run_img2img). controlnet (EXPERIMENTAL, opt-in; _run_controlnet, _load_controlnet_pipeline) runs StableDiffusionXLControlNetImg2ImgPipeline with the SDXL-native canny ControlNet xinsir/controlnet-canny-sdxl-1.0 (watermark_profiles.CONTROLNET_CANNY_MODEL): the control image is cv2.Canny(gray, 100, 200) stacked to 3 channels (_CANNY_LOW/_CANNY_HIGH, prompt _CONTROLNET_PROMPT / _CONTROLNET_NEGATIVE). Removal still comes from the img2img regeneration (strength); the ControlNet only PRESERVES text and face STRUCTURE via the edge map -- no original pixels are copied or frozen, so SynthID does not survive. Canny holds face STRUCTURE but NOT identity (the regenerated face drifts in likeness -- canny carries edges, not identity; face identity is preserved by the optional --restore-faces GFPGAN post-pass (EXPERIMENTAL, opt-in, OFF by default) -- see face_restore.py). controlnet_conditioning_scale (ctor arg, default 1.0) is the structure-preservation knob. Same dtype rule as default (fp32 on cpu/mps, fp16 only on cuda/xpu; the fp16-fixed SDXL VAE _SDXL_FP16_VAE_ID is swapped in on fp16 GPUs -- issue #29) and the same MPS->CPU fallback (reload on cpu/fp32, drop a non-cpu generator, retry once).
face_restore.py — optional GFPGAN face-restoration post-pass (cv2/torch/gfpgan boundary, top-of-file pyright pragma). EXPERIMENTAL, opt-in, OFF by default. Runs AFTER the diffusion removal pass (InvisibleEngine.remove_watermark, params restore_faces=False / restore_faces_weight=0.5; CLI --restore-faces/--no-restore-faces + --restore-faces-weight on invisible/all/batch). Restores face IDENTITY while still scrubbing the pixel watermark: GFPGAN re-synthesizes each face from a StyleGAN2 prior (codebook/GAN pixels, NOT the original), so the composited face regions carry no watermark and no pixel-copy -- oracle-validated clean at weight 0.5 with identity preserved. Flow: GFPGANer.enhance runs on the ORIGINAL (watermarked) image -> identity faces + RetinaFace boxes (restorer.face_helper.det_faces); _composite_faces feather-composites those restored face REGIONS into the diffusion-cleaned image. is_available() gates on gfpgan + facexlib; lazily-built GFPGANer singleton forces CPU unless CUDA (the pip GFPGANer has an MPS device-mismatch bug; it is a cheap post-pass on a few face crops). _apply_basicsr_shim() recreates the removed torchvision.transforms.functional_tensor module that basicsr imports. The pure _composite_faces helper (Gaussian-feathered rectangular alpha per box, out = restored*a + base*(1-a)) is unit-tested without the model (tests/test_face_restore.py); the model-running path is gated behind is_available(). Commercial-safe (GFPGAN Apache-2.0 + RetinaFace MIT); the CodeFormer alternative is NON-COMMERCIAL and is NOT shipped. The restore extra (gfpgan/facexlib/basicsr) is kept OUT of all (heavy + the GFPGANv1.4 + RetinaFace weights download on first use, never bundled). 
restore pins numpy<2 (same trap class as the removed faceid/insightface extra): basicsr/gfpgan/facexlib are an old ecosystem, so the extra caps scipy<1.18 (>=1.18 uses np.long, gone in numpy 1.24-1.26) and numba<0.60 to keep the whole env on one numpy 1.26 resolution; verified the --extra dev --extra gpu gate env stays numpy 1.26.4 + diffusers.loaders.peft importable with restore present. basicsr 1.4.2 builds only on Python <3.13 (its setup.py get_version() uses exec(...) + locals()['__version__'], which the 3.13 fast-locals change broke -> KeyError: '__version__'), so the project is pinned to Python 3.12 via .python-version and [tool.uv.extra-build-dependencies] basicsr = ["setuptools<69"]. basicsr ships sdist-only (no wheel).
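The pure compositing step of face_restore can be sketched in numpy. Assumptions: boxes are (x0, y0, x1, y1[, score]) as RetinaFace-style detections are usually reported, the feather is a separable Gaussian blur of the box's 0/1 mask, and sigma is a free parameter (the helper's real feather width is not stated). Names are illustrative.

```python
import numpy as np


def _gaussian_kernel(sigma, n):
    """Normalized 1D Gaussian, never longer than an axis of length n."""
    radius = min(int(3 * sigma), (n - 1) // 2)
    if radius < 1:
        return np.ones(1)
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x**2) / (2.0 * sigma**2))
    return k / k.sum()


def feathered_box_alpha(shape_hw, box, sigma=4.0):
    """0/1 rectangle for box = (x0, y0, x1, y1, ...), blurred separably to feather it."""
    h, w = shape_hw
    x0, y0, x1, y1 = (int(round(v)) for v in box[:4])
    a = np.zeros((h, w), np.float64)
    a[max(0, y0):min(h, y1), max(0, x0):min(w, x1)] = 1.0
    kx, ky = _gaussian_kernel(sigma, w), _gaussian_kernel(sigma, h)
    a = np.apply_along_axis(lambda row: np.convolve(row, kx, mode="same"), 1, a)
    a = np.apply_along_axis(lambda col: np.convolve(col, ky, mode="same"), 0, a)
    return np.clip(a, 0.0, 1.0)


def composite_faces(base, restored, boxes, sigma=4.0):
    """out = restored * a + base * (1 - a), one feathered box at a time."""
    out = base.astype(np.float64)
    rest = restored.astype(np.float64)
    for box in boxes:
        a = feathered_box_alpha(base.shape[:2], box, sigma)
        if out.ndim == 3:
            a = a[..., None]
        out = rest * a + out * (1.0 - a)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Outside every box the weight is zero, so the diffusion-cleaned pixels pass through untouched; only the face regions take GFPGAN's re-synthesized pixels.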
auto_config.py — the --auto quality-mode planner (EXPERIMENTAL). plan(image_path) -> AutoConfig | None inspects the INPUT image (before the diffusion model loads) and picks the pipeline modes, so the run adapts to content. Designed to run as the FIRST step of the invisible/all pipeline, wherever that runs — locally or the raiw.cc Modal GPU worker — never on the 512 MB web host (image work there OOM-crashes the container; the planner is _apply_auto in cli.py for the CLI, and raiw-app would call plan() inside RaiwProtect.remove). Quality-priority routing: ControlNet (text/face-structure preservation) is the default; it is skipped for default (plain SDXL) only on a clearly structure-less image (not has_face and not has_text and edge_density < _STRUCTURELESS_EDGE_MAX 0.008). restore_faces is on when a face is present. When a smoothing pass (controlnet/restore) ran, the adaptive polish (humanizer.adaptive_polish) is applied: it targets the input's Laplacian variance (detail level) with a capped unsharp + edge-masked grain, restoring photo/face texture while sparing text (text is already high-frequency, so the deficit is tiny and almost no polish lands -- the old fixed unsharp/grain speckled small text; validated 2026-06-03 on gemini_3 lap-var 84->334 toward the 592 original, openai_1 text near-untouched). Detection is cv2-only and torch-free (~100 MB peak RSS, a few ms — measured): OpenCV YuNet (cv2.FaceDetectorYN, MIT, 232 KB model bundled at assets/face_detection_yunet_2023mar.onnx) for faces, a Canny edge-density + MSER region heuristic for text/structure (the text part is a rough Phase-1 placeholder — DBNet via cv2.dnn is the planned precision upgrade; it only ever ADDS controlnet so a miss is backstopped by edge-density and a false positive only costs a controlnet run), and edge_density. min_resolution stays 1024. 
Every auto decision is independently overridable (interface principle): _apply_auto (cli.py) overrides only the three content-adaptive modes the user left at their click default (ctx.get_parameter_source(...) == DEFAULT) — --pipeline, --restore-faces/--no-restore-faces, and --adaptive-polish/--no-adaptive-polish always win; --min-resolution/--strength/--unsharp/--humanize are independent knobs. --adaptive-polish also works WITHOUT --auto (manual detail-targeted polish; the engine's adaptive_polish param uses the full-res original as the detail reference). Prints the chosen plan (AutoConfig.reason). Wired into cmd_all/cmd_invisible (not batch yet — its engine is cached per-mode, auto needs a per-image pipeline). Adds ZERO new pip deps (all cv2 core + the bundled MIT model + the cv2-only adaptive polish). Still deferred: Real-ESRGAN-via-Spandrel upscaling (a new esrgan extra) and a DBNet text detector (replacing the MSER heuristic). Unit-tested without the model where possible (tests/test_auto_config.py): flat/text synthetic images for routing, monkeypatched detect_face/detect_text for the face/text branches (a real detectable-face fixture is private, never committed). Production adoption path for raiw.cc: validate (must keep SynthID removed, not hallucinate micro-text, beat plain SDXL on the real upload distribution), then bump the library SHA in modal_app.py and pass auto=True.
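The routing rule and the override rule are both pure logic. A sketch: PlanSketch stands in for AutoConfig, edge density is assumed to be the fraction of non-zero Canny pixels, and left_at_default is what a click command would compute by comparing ctx.get_parameter_source(name) with click.core.ParameterSource.DEFAULT (click 8.0+). Names are hypothetical.

```python
from dataclasses import dataclass

STRUCTURELESS_EDGE_MAX = 0.008  # mirrors _STRUCTURELESS_EDGE_MAX
CONTENT_ADAPTIVE = ("pipeline", "restore_faces", "adaptive_polish")


@dataclass(frozen=True)
class PlanSketch:
    pipeline: str
    restore_faces: bool
    adaptive_polish: bool
    reason: str


def edge_density(edge_map):
    """Fraction of non-zero pixels in a binary edge map (e.g. a Canny output)."""
    total = len(edge_map) * len(edge_map[0])
    return sum(1 for row in edge_map for v in row if v) / total


def route(has_face, has_text, density):
    """ControlNet by default; plain SDXL only for a clearly structure-less image."""
    structureless = not has_face and not has_text and density < STRUCTURELESS_EDGE_MAX
    pipeline = "default" if structureless else "controlnet"
    restore = has_face
    polish = pipeline == "controlnet" or restore  # polish only after a smoothing pass
    reason = f"pipeline={pipeline} face={has_face} text={has_text} edges={density:.4f}"
    return PlanSketch(pipeline, restore, polish, reason)


def apply_auto(plan, user, left_at_default):
    """Overlay the plan only onto content-adaptive options the user left at default."""
    merged = dict(user)
    for name in CONTENT_ADAPTIVE:
        if left_at_default.get(name, False):
            merged[name] = getattr(plan, name)
    return merged
```

Independent knobs such as strength never appear in CONTENT_ADAPTIVE, so they pass through apply_auto unchanged.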
image_io.py — Unicode-safe cv2 IO (issue #17). imread(path, flags=None) / imwrite(path, img) wrap np.fromfile+cv2.imdecode / cv2.imencode+tofile so non-ASCII paths work on Windows -- bare cv2.imread/cv2.imwrite use the platform ANSI code-page API there and fail (empty decode + can't open/read file) on Chinese/Cyrillic/accented filenames. imread keeps cv2.imread semantics (defaults to IMREAD_COLOR, returns None on missing/empty/undecodable). Every cv2 file read/write in the package routes through here; do not call cv2.imread/cv2.imwrite directly. imwrite returns False on an unwritable path (OSError caught) instead of raising, matching cv2.imwrite semantics. macOS/Linux already accept UTF-8 paths, so it is behavior-neutral there (the bug only reproduces on Windows). cv2/numpy are imported lazily inside the functions, so the module is cheap to import in a bare env.

Doubao clean-reverse-alpha distillation (re-investigated 2026-05-29)

RESOLVED 2026-05-29: black+gray Doubao captures were obtained and a reverse-alpha is built (doubao_engine.remove_watermark_reverse_alpha, assets/doubao_alpha.png; see the doubao_engine.py bullet above). The captures (data/doubao_capture/captures/, now committed) confirmed the alpha-composite model: on black captured = a*logo, logo pure white. UPDATE 2026-05-31 (issue #13 follow-up): the first build was NOT "exact" — it left a readable "豆包AI生成" outline on the real sample (the detector was fooled, conf 0.0). The alpha is now rebuilt by scripts/visible_alpha_solve.py (the careful gray-self solve shared with Jimeng), removal always-aligns + thin-inpaints, and the locate box was widened; see the doubao_engine.py bullet. The notes below (the failed content-image distillation) are retained as the record of why controlled captures were necessary.

Conclusion (historical): pure reverse-alpha distilled from content images does NOT work, and the blocker is the WRONG kind of data, not too little of it. The earlier framing ("need ~5-8 PRISTINE same-resolution originals") is obsolete -- data/spaces/originals/ holds plenty. Curate them with DoubaoEngine.detect + an NCC filter against a clean glyph template, keeping only marks at offset ≈ (0,0): that yields e.g. 15 pixel-aligned 2048² marks (sub-pixel drift, not the ±50 px the old lossy/mixed-res scrapes had), plus 1086x1448 / 1792x2400 clusters. With those, LaMa-clean O + weighted-LS (and per-pixel I-on-O regression) for α (+ logo colour) was tried end-to-end and still leaves a persistent ghost outline.

Diagnosed why, empirically (cached stacks, /tmp/doubao_distill): (1) the mark is a clean white overlay with no dark halo -- over glyph pixels ~54% are brighter than the clean bg, only ~4% darker -- so the white-logo model I=(1-α)O+α·255 is correct; (2) but content backgrounds are almost never dark under the mark (median darkest available bg over glyph pixels = 58/255; only ~13% of mark pixels are ever observed on a bg < 40), so on bright backgrounds the equation is ill-conditioned and α is unidentifiable; (3) LaMa's O is a plausible hallucination, not the true pre-mark background, which compounds the error, and per-pixel regression on ~15 obs overfits into colour noise.
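Point (2) shows up directly in the weighted least-squares estimate for a single pixel. Rearranging the white-logo model gives I - O = α(255 - O), so the estimate divides by Σw(255 - O)²; when every observed background is bright that denominator is small, and a single gray level of noise moves α a long way. A minimal numpy sketch (not the /tmp/doubao_distill code):

```python
import numpy as np


def alpha_ls(observed, background, weights=None, logo=255.0):
    """Weighted LS alpha for one pixel under obs = (1 - a) * bg + a * logo.

    Rearranged: obs - bg = a * (logo - bg), so
    a = sum w (obs - bg)(logo - bg) / sum w (logo - bg)^2.
    Returns (alpha, conditioning), conditioning being that denominator.
    """
    obs = np.asarray(observed, dtype=np.float64)
    bg = np.asarray(background, dtype=np.float64)
    w = np.ones_like(bg) if weights is None else np.asarray(weights, dtype=np.float64)
    d = logo - bg
    conditioning = float((w * d * d).sum())
    if conditioning == 0.0:
        return float("nan"), 0.0  # bg == logo everywhere: alpha is unidentifiable
    return float((w * (obs - bg) * d).sum() / conditioning), conditioning
```

With a dark observation (bg = 20) one gray level of noise shifts α by under 0.005; with bg = 250 the same noise shifts it by 0.2, before any error from an imperfect LaMa background (point 3) is added.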

Why Gemini's engine is clean (verified in GeminiWatermarkTool src/core/watermark_engine.cpp): its alpha map is the watermark stamped on a PURE-BLACK background, where watermarked = α·255 + (1-α)·0 = α·255, so alpha = capture/255 exactly -- no estimation. (gemini_bg_*.png is literally the sparkle in grey on black.) So the real Doubao unlock is the same controlled capture, not more content images. Black/white/gray seeds exist (data/doubao_capture/seeds/seed_*_1x1_2048x2048.png); a capture run (feed a black seed through doubao.com edit mode, download the original) was requested from the #13 reporter 2026-05-29. With ~2-3 black captures we get α = capture/255 for free, Gemini-quality.

Until black captures arrive, the shipped direction is precise canonical glyph mask + inpaint (cv2 default, lama optional), NOT reverse-alpha. The consensus glyph silhouette across the aligned marks distills cleanly (proto: a tight "豆包AI生成" strip, width ≈ 0.156 × image-width) and is good both as an exact inpaint mask and as an NCC localiser -- the latter also fixes the #23 detector false-positives (match the real glyph shape, not any bright low-saturation corner). Do not retry content-image reverse-alpha: it is data-limited by physics (no dark-background observations), not by effort.

Watermarking landscape (research 2026-05-24)

Who embeds what, and whether it is locally detectable (so we know which gaps are fillable). See identify.py for what we read.

Locally detectable (open decoder, no key/API): Stable Diffusion / SDXL / FLUX via imwatermark DWT-DCT (now covered by invisible_watermark.py). FLUX uses the same library (black-forest-labs/flux2 src/flux2/watermark.py, 48-bit 0b001010101111111010000111100111001111010100101110); SDXL is the diffusers WATERMARK_MESSAGE (0b101100111110110010010000011110111011000110011110). Caveat: fragile to re-encoding.
C2PA / IPTC (covered by the issuer/marker scan): OpenAI, Google, Adobe Firefly, Microsoft (Designer + Bing Image Creator — collected 2026-05-24; Bing now runs Microsoft's own MAI-Image model, signs C2PA as "Microsoft", NOT OpenAI/DALL-E), and Stability AI (collected from Brand Studio / DreamStudio successor; signs C2PA as "Stability AI Ltd", no SynthID, no imwatermark on its current Stable Image model — issuer added to C2PA_ISSUERS). Still unsampled: Canva (its downloads are re-encoded design exports that strip C2PA, so a Canva "positive" is inconclusive — skipped), Getty, Shutterstock. Midjourney embeds NO C2PA and no invisible watermark (our mj-* sample carried only the IPTC tag). Samsung Galaxy AI (Generative Edit / Sketch to Image / Portrait Studio on Galaxy S23 FE / S24 / S25, One UI 7+) signs C2PA as "Samsung Galaxy" with the standard trainedAlgorithmicMedia source type AND a proprietary genAIType marker; verified on real signed files 2026-05-29 (the standard scan catches the source type; genAIType additionally catches a Galaxy S24 file that omits it). ASUS Gallery also signs edited photos as C2PA (com.asus.gallery) but with no AI source type — a signer, not an AI marker. Black Forest Labs (FLUX) API output signs C2PA: claim_generator_info "Black Forest Labs API" + a c2pa.ai_generated_content assertion + trainedAlgorithmicMedia (issuer b"Black Forest Labs" added to C2PA_ISSUERS, platform "Black Forest Labs (FLUX)"). ByteDance Volcano Engine (Volcengine) — the cloud behind Doubao / Jimeng — signs its AI image output with a cert from certificate_center@volcengine.com + trainedAlgorithmicMedia (issuer b"volcengine" → "ByteDance (Volcano Engine)", platform "ByteDance (Doubao / Jimeng / Volcano Engine)"); note this is the C2PA-signed surface, distinct from the XMP/PNG TC260 AIGC label Doubao also uses. All three verified on real signed files 2026-05-29.
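A sketch of the byte-level issuer scan, using only needles named in these notes. ISSUER_PLATFORMS and scan_c2pa_head are hypothetical; the labels for Black Forest Labs and Volcano Engine are quoted from above, the others are illustrative, and the real C2PA_ISSUERS table and its decision logic may differ. A signer with no AI source-type marker (the ASUS Gallery case) is reported as provenance only.

```python
ISSUER_PLATFORMS = {
    b"Black Forest Labs": "Black Forest Labs (FLUX)",
    b"volcengine": "ByteDance (Doubao / Jimeng / Volcano Engine)",
    b"Stability AI Ltd": "Stability AI",
    b"Samsung Galaxy": "Samsung Galaxy",
}
AI_SOURCE_MARKERS = (b"trainedAlgorithmicMedia", b"genAIType")


def scan_c2pa_head(head: bytes) -> tuple[list[str], bool]:
    """Name the signer(s) and report whether an AI source-type marker is present."""
    platforms = [name for needle, name in ISSUER_PLATFORMS.items() if needle in head]
    has_ai_marker = any(marker in head for marker in AI_SOURCE_MARKERS)
    return platforms, has_ai_marker
```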
EXIF/XMP generator tag (caught by exif_generator): Ideogram writes EXIF Make="Ideogram AI" (collected 2026-05-24 — no C2PA, no SynthID, no imwatermark; the Make tag is the only signal).
xAI / Grok — its own EXIF signature scheme, NOT C2PA (DETECTED by metadata.xai_signature, built 2026-05-26). Grok JPEG downloads (Aurora model) carry no C2PA, no XMP, no SynthID, no IPTC — only EXIF Artist = a UUID and EXIF ImageDescription = Signature: <base64> (a crypto signature, unverifiable locally without xAI's public key). This empirically kills the earlier unverified "xAI signs C2PA as xAI" lead — xAI is not even a C2PA member. exif_generator misses it (neither field holds an AI_GENERATOR_TOKENS token), so a dedicated detector xai_signature(path) matches the pair (ImageDescription ~ ^Signature: [A-Za-z0-9+/=]{64,} AND UUID Artist); wired into has_ai_metadata, get_ai_metadata (key xai_signature), and identify (signal xai_signature, platform "xAI (Grok / Aurora)"). Format confirmed stable across n=3 genuine generations: exactly three EXIF tags (Artist, ExifOffset, ImageDescription), Signature: prefix constant, base64 payload 300-1004 chars. Two capture facts: (a) the Artist UUID equals the public image id in the asset URL (https://imagine-public.x.ai/imagine-public/images/<uuid>.jpg), so it is NOT a private per-user secret — only the Signature blob is; (b) the Grok web-UI image is a re-encoded WebP with no signature — the EXIF survives only in the original JPEG (download button or that public tokenless URL), which is why screenshots / re-encodes are metadata-stripped. A real fixture data/samples/grok-1.jpg plus synthetic JPEG fixtures (fake UUID + fake Signature: blob) cover the detector; never add a real Grok image carrying private content (the repo is public). Stripped on removal too: remove_ai_metadata now calls _scrub_ai_exif on the JPEG EXIF, which deletes the xAI Signature+UUID-Artist pair and any Software/Make/Artist/ImageDescription tag holding an AI_GENERATOR_TOKENS token (so Ideogram's Make="Ideogram AI" is scrubbed too), while keeping genuine camera/editor EXIF. 
The shared _is_xai_signature_pair helper (module-level compiled regexes) is the single source of truth for the pattern, used by both xai_signature and _scrub_ai_exif. (AVIF/HEIF/JXL still strip only C2PA boxes via isobmff, not EXIF — unchanged.)
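A sketch of the pair check and the EXIF scrub over a tag-name to string mapping (reading the tags out of the JPEG is out of scope here). The signature pattern is the one quoted above; the UUID regex, AI_TOKENS, and the function names are assumptions standing in for _is_xai_signature_pair, AI_GENERATOR_TOKENS, and _scrub_ai_exif.

```python
import re

_SIG_RE = re.compile(r"^Signature: [A-Za-z0-9+/=]{64,}")
_UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)
AI_TOKENS = ("ideogram",)  # hypothetical lowercase subset of AI_GENERATOR_TOKENS
SCRUB_FIELDS = ("Software", "Make", "Artist", "ImageDescription")


def is_xai_signature_pair(description, artist) -> bool:
    """Both halves must match: a Signature: blob AND a UUID Artist."""
    if not isinstance(description, str) or not isinstance(artist, str):
        return False
    return bool(_SIG_RE.match(description.strip()) and _UUID_RE.match(artist.strip()))


def scrub_ai_exif(tags: dict) -> dict:
    """Drop the xAI pair and AI-token generator fields; keep ordinary camera EXIF."""
    out = dict(tags)
    if is_xai_signature_pair(out.get("ImageDescription"), out.get("Artist")):
        out.pop("ImageDescription")
        out.pop("Artist")
    for field in SCRUB_FIELDS:
        value = out.get(field)
        if isinstance(value, str) and any(tok in value.lower() for tok in AI_TOKENS):
            del out[field]
    return out
```

Requiring both halves is what keeps a human photographer's name in Artist, or an ordinary caption in ImageDescription, from being scrubbed.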
China TC260 AIGC label (caught by AIGC_MARKERS / metadata.aigc_label, surfaced by identify as the aigc signal): China-served generators embed an XMP <TC260:AIGC>{"Label":"1","ContentProducer":...} block — China's mandatory AI-content labeling (TC260 namespace tc260.org.cn/ns/AIGC). Doubao (ByteDance) uses it (verified on the real #13 sample 2026-05-25; ContentProducer 001191110102MACQD9K64010000, no C2PA/SynthID/imwatermark — the XMP block is the only signal; GitHub attachment upload did NOT strip it). The same standard is mandatory for Jimeng/Kling/Qwen/Ernie etc., so the one marker covers the whole China-AIGC-labeled ecosystem. aigc_label reads three serializations through a shared _parse helper: the HTML-entity-encoded XMP TC260:AIGC block in either RDF form — the nested element <TC260:AIGC>{...}</TC260:AIGC> (Doubao) or the attribute TC260:AIGC="{...}" (PicWish, ContentProducer="picwish", verified on the corpus 2026-05-30) — via a container-agnostic raw-byte scan (any JSON object accepted), a raw-JSON PNG AIGC tEXt chunk (Doubao also writes the label this way, no namespaced marker at all — confirmed on the corpus 2026-05-28, ContentProducer="doubao"), and a bare raw-JSON {"AIGC":{...}} object embedded in JPEG EXIF (UserComment) by some China-served generators, brace-matched from the scan head with json.JSONDecoder().raw_decode (no namespaced marker, no PNG chunk — confirmed on the corpus 2026-05-30, ContentProducer="001191440300708461136T1308L"). Both generic forms (the PNG chunk and the bare {"AIGC":...} object) are gated on at least one TC260 field (_TC260_FIELDS) so a generic AIGC key cannot false-positive; the namespaced XMP element is unambiguous and needs no gate. In identify, aigc fires on the parsed label or the AIGC_MARKERS byte scan (the latter preserves the laundering-tell case where the JSON payload is truncated).
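The bare-object path can be sketched with json.JSONDecoder().raw_decode, which parses one JSON value from the start of a string and ignores whatever follows it. TC260_FIELDS is an assumed subset of _TC260_FIELDS (only Label and ContentProducer are named above), and searching for the literal {"AIGC" with no whitespace tolerance is a simplification.

```python
import json

TC260_FIELDS = ("Label", "ContentProducer")  # assumed subset of _TC260_FIELDS


def find_bare_aigc(head: bytes):
    """Brace-match a raw {"AIGC": {...}} object out of arbitrary bytes.

    Returns the inner AIGC dict, or None. The object is accepted only when it
    carries at least one TC260 field, so a generic "AIGC" key cannot false-positive.
    """
    text = head.decode("utf-8", errors="replace")
    decoder = json.JSONDecoder()
    start = text.find('{"AIGC"')
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text[start:])
        except json.JSONDecodeError:
            obj = None  # truncated or not JSON: try the next occurrence
        label = obj.get("AIGC") if isinstance(obj, dict) else None
        if isinstance(label, dict) and any(f in label for f in TC260_FIELDS):
            return label
        start = text.find('{"AIGC"', start + 1)
    return None
```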
HuggingFace-hosted job (caught by metadata.huggingface_job, surfaced by identify as the hf_job signal, MEDIUM confidence): HuggingFace Jobs / Spaces stamp generated PNGs with an hf-job-id tEXt chunk holding the job UUID (3 on the corpus 2026-05-28, no other signal). It marks the hosting job, not a model — most commonly diffusion output — so it lifts an Unknown verdict to a tentative AI via hf_only (parallel to the visible sparkle) but never overrides a hard metadata signal; _HF_JOB_CAVEAT states the limit (job, not model; not proof of AI pixels). Stripped on removal (the PNG save whitelist keeps only STANDARD_METADATA_KEYS, so hf-job-id and the AIGC chunk are both dropped). The exact writer is not authoritatively documented (HF Jobs are generic GPU jobs), hence medium not high.
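A stdlib sketch of reading the hf-job-id tEXt chunk and of a keyword whitelist. It works on raw chunk bytes rather than re-saving through an image library as the real PNG writer does, and the keep set stands in for STANDARD_METADATA_KEYS; function names are hypothetical.

```python
import struct
import zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Length + type + body + CRC32(type + body), the PNG chunk layout."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


def iter_chunks(data: bytes):
    """Yield (type, body, raw_chunk_bytes) after the signature, up to IEND."""
    pos = len(PNG_SIG)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos : pos + 8])
        yield ctype, data[pos + 8 : pos + 8 + length], data[pos : pos + 12 + length]
        if ctype == b"IEND":
            return
        pos += 12 + length


def text_chunks(data: bytes) -> dict:
    """Map tEXt keyword -> value (Latin-1, per the PNG spec)."""
    if not data.startswith(PNG_SIG):
        return {}
    out = {}
    for ctype, body, _raw in iter_chunks(data):
        if ctype == b"tEXt":
            key, _, value = body.partition(b"\x00")
            out[key.decode("latin-1")] = value.decode("latin-1")
    return out


def strip_text_except(data: bytes, keep: set) -> bytes:
    """Rebuild the PNG, dropping tEXt chunks whose keyword is not whitelisted."""
    if not data.startswith(PNG_SIG):
        return data
    parts = [PNG_SIG]
    for ctype, body, raw in iter_chunks(data):
        if ctype == b"tEXt" and body.partition(b"\x00")[0].decode("latin-1") not in keep:
            continue
        parts.append(raw)
    return b"".join(parts)
```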
No detectable signal on download (correctly reported unknown): Recraft (PNG export is a re-encoded design export — strips everything), Krea hosting FLUX 2 (no imwatermark despite FLUX — the host omits the encoder, same as Stability's hosted SDXL), and Midjourney (embeds nothing). Lesson: the imwatermark detector only fires on pristine output from a pipeline that runs the encoder (diffusers default, official BFL), not from re-hosts (Krea/Stability) or re-encoded exports (Recraft/Canva).
Invisible but NOT locally detectable (proprietary, API/oracle only — same wall as SynthID): Amazon Titan Image Generator + Nova Canvas (Bedrock DetectGeneratedContent API), Kakao (new SynthID image adopter, May 2026), NVIDIA Cosmos (SynthID video). No local detector possible; treat like SynthID.
C2PA 2.4 "Durable Content Credentials" (April 2026; verified against the spec) raise the bar for metadata stripping. 2.4 defines soft bindings (an invisible watermark or a content fingerprint) plus a server-side manifest repository and a new c2pa.repository-receipt assertion. Per the spec: "if a C2PA manifest is removed from an asset, but a copy of that manifest remains in a provenance store elsewhere, the manifest and asset may be matched using available soft bindings." So our local metadata --remove deletes the embedded manifest, but a fingerprint/watermark soft binding can still re-link the image to its manifest in a repository server-side. Stripping the file is becoming necessary-but-not-sufficient against durable provenance. (Our parsers target the stable embedded-manifest format documented in C2PA 2.1 §11; that format is unchanged in 2.4 -- the new pieces are repository/soft-binding infra, not the on-file box layout, so no parser change is implied.) Spec: https://spec.c2pa.org/specifications/specifications/2.4/specs/C2PA_Specification.html We now READ the soft-binding alg (C2PA_SOFT_BINDINGS / soft_binding_vendors_in) to name the forensic-watermark vendor, and locally DECODE the one open scheme, Adobe TrustMark (trustmark_detector); the rest (Digimarc/Imatag/Steg.AI/...) stay name-only (proprietary decoders).
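Naming the forensic-watermark vendor from a soft binding amounts to a prefix lookup over the alg strings found in the manifest's soft-binding assertions. A minimal sketch, assuming a hypothetical prefix table: the real C2PA_SOFT_BINDINGS / soft_binding_vendors_in contents may differ, and the exact alg identifiers should be checked against the C2PA soft-binding algorithm list before relying on them.

```python
from typing import List, Tuple

# Hypothetical prefix table standing in for C2PA_SOFT_BINDINGS (illustrative prefixes only).
SOFT_BINDING_VENDORS: List[Tuple[str, str, bool]] = [
    # (alg prefix, vendor name, open local decoder available?)
    ("com.adobe.trustmark", "Adobe TrustMark", True),
    ("com.digimarc", "Digimarc", False),
]


def soft_binding_vendors(algs: List[str]) -> List[Tuple[str, bool]]:
    """Map each soft-binding alg string to (vendor, locally_decodable); unknown algs keep their raw name."""
    result: List[Tuple[str, bool]] = []
    for alg in algs:
        for prefix, vendor, decodable in SOFT_BINDING_VENDORS:
            if alg == prefix or alg.startswith(prefix + "."):
                result.append((vendor, decodable))
                break
        else:
            result.append((alg, False))  # name-only: no public decoder assumed
    return result
```

Only the TrustMark row is marked decodable, matching the note that the other vendors stay name-only.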
Built 2026-05-26 (this batch): soft-binding alg vendor detection; IPTC Photo Metadata 2025.1 AI-disclosure fields (AISystemUsed etc.); video C2PA metadata detect + strip for MP4/MOV/M4V (free — isobmff.py is format-agnostic, MP4 is ISOBMFF); Adobe TrustMark open decoder. NOT done (out of cheap reach, per the feasibility review): visible video-logo removal (needs a video frame pipeline) and audio (SynthID/ElevenLabs/Resemble/Suno all oracle-only or unmarked). Box detection window — now handled (v0.6.8): detection no longer relies on a fixed first-MB read. metadata.scan_head(path, size) reads the first size bytes and, for ISOBMFF, appends the payloads of late provenance boxes found by isobmff.scan_c2pa_region (a file-seeking top-level box walker that skips past mdat by size without reading it), so a C2PA/AIGC/IPTC manifest placed AFTER a large mdat in a streaming/non-faststart MP4 is now caught. Every C2PA/marker byte scan (has_ai_metadata, aigc_label, iptc_ai_system, synthid_source, exif_generator XMP, get_ai_metadata soft-binding, and identify) goes through scan_head; it is behavior-neutral for non-ISOBMFF inputs (exactly f.read(size)). Meta-box XMP removal — now handled (v0.6.9): an AI-label XMP packet stored as a meta-box mime item (HEIF/AVIF; out of reach of the top-level box stripper) is blanked in place by isobmff.blank_ai_xmp_packets — it locates the packet by its <?xpacket begin … end?> delimiters and, if it carries an AI marker (_AI_LABEL_MARKERS), overwrites it with spaces of the SAME length, so box sizes / iloc offsets stay valid and the coded image is untouched (selective: plain non-AI XMP is left alone, mirroring the top-level uuid logic). Wired into remove_ai_metadata's ISOBMFF branch after strip_c2pa_boxes. The remaining gap is an Exif meta-box item (rare; the AI labels are XMP) — still needs iinf/iloc surgery or exiftool.
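The scan_head idea (read a fixed head, then append late provenance boxes found by a size-skipping top-level walker) can be sketched with the stdlib. This version takes an open binary file rather than a path, treats a leading ftyp box as the ISOBMFF sniff, and assumes provenance sits in top-level uuid/jumb boxes; iter_top_level_boxes and scan_head here are hypothetical, not the actual isobmff.scan_c2pa_region / metadata.scan_head.

```python
import io
import struct
from typing import BinaryIO, Iterator, List, Tuple

# Assumption: provenance lives in top-level 'uuid' (C2PA/XMP) or 'jumb' boxes.
PROVENANCE_BOX_TYPES = {b"uuid", b"jumb"}


def iter_top_level_boxes(f: BinaryIO) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (type, payload_offset, payload_size) per top-level box, seeking past payloads."""
    f.seek(0, io.SEEK_END)
    file_size = f.tell()
    pos = 0
    while pos + 8 <= file_size:
        f.seek(pos)
        size, box_type = struct.unpack(">I4s", f.read(8))
        header_len = 8
        if size == 1:  # 64-bit largesize follows the type
            ext = f.read(8)
            if len(ext) < 8:
                break
            (size,) = struct.unpack(">Q", ext)
            header_len = 16
        elif size == 0:  # box runs to end of file
            size = file_size - pos
        if size < header_len:
            break  # corrupt size field: stop walking
        yield box_type, pos + header_len, size - header_len
        pos += size  # skip the payload (e.g. a huge mdat) without reading it


def scan_head(f: BinaryIO, size: int) -> bytes:
    """First `size` bytes, plus payloads of provenance boxes that end beyond that window."""
    f.seek(0)
    head = f.read(size)
    if head[4:8] != b"ftyp":  # not ISOBMFF: identical to f.read(size)
        return head
    late: List[bytes] = []
    for box_type, offset, length in iter_top_level_boxes(f):
        if box_type in PROVENANCE_BOX_TYPES and offset + length > size:
            f.seek(offset)
            late.append(f.read(length))
    return head + b"".join(late)
```

Because the walker seeks by declared box size, a multi-GB mdat costs one seek, which is what makes the late-manifest case cheap.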
Regulatory driver (context, not a code change): AI-content labeling mandates are expanding, which pushes more generators toward exactly the C2PA + watermark signals we read. The full per-jurisdiction table lives in README "## Legal" -- keep it there, not duplicated here. Newly added + primary-source verified 2026-05-26: EU AI Act Article 50 machine-readable marking applicable 2026-08-02 (verified against the article text); South Korea AI Framework Act Art. 31(3) in force since 22 January 2026 (verified via Kim & Chang + FPF/Korea Times; Enforcement Decree accepts an invisible-watermark label); California AB 853 (amends the CA AI Transparency Act) latent-disclosure duty operative 2026-08-02, requiring a disclosure "permanent or extraordinarily difficult to remove" (verified against the leginfo bill text -- this is the exact disclosure our tool strips); India IT Amendment Rules 2026 in force 2026-02-20 (verified via Chambers), which require a prominent label and a permanent provenance identifier on all synthetic media AND expressly prohibit removing/suppressing the label or metadata -- the first major all-content removal ban outside China. Removal liability (README "## Legal" disclaimer): the tool is lawful general-purpose software; liability sits with the remover and is intent-gated, arising from downstream acts (fraud/deception/IP), plus US DMCA 17 USC 1202 (removing copyright-management info to conceal infringement), plus the removal-as-such bans in China + India. When extending the README table, verify each date/article against the statute/bill text before committing, not against search summaries.

Known limitations

invisible pipeline processes at native resolution for inputs whose long side is >= 1024px, and auto-upscales smaller inputs UP to a 1024px floor (min_resolution=1024, the default; --min-resolution 0 disables) before diffusion -- SDXL img2img distorts badly on a tiny latent (a 381x512 portrait is wrecked at native resolution, the #36 follow-up), and the output is restored to the original input size, so the floor is a transparent quality boost (at the cost of extra time/memory on small inputs). max_resolution=0 (default) means no downscale cap, matching the hosted raiw.cc backend (fal fast-sdxl, no pre-downscale). The old forced downscale-to-1024 -> upscale-back round-trip for LARGE images was the main quality loss (issue #10) and is gone; at strength ~0.05 SDXL img2img does not need a downscale. Final --unsharp post-filter (humanizer.unsharp_mask, opt-in, default 0): it is applied LAST (after the GFPGAN face pass, which would otherwise smooth it over) to counter the soft, over-smoothed look that diffusion + restoration leave (an AI tell); ~0.5-0.8 is safe, higher values risk halos. It pairs with --humanize (grain adds sensor-noise texture, unsharp adds crispness). --max-resolution N re-introduces an opt-in long-side cap purely to bound GPU/MPS memory on very large inputs (at the cost of the lossy round-trip). For huge images that OOM at native resolution, tile-based diffusion is still the proper long-term fix. Concrete MPS data points (the OOM is memory-tier-dependent, NOT a hard MPS limit): on a ~24 GB unified-memory machine (verified 2026-05-25, 1254x1254 gpt-image SDXL, fp32) native res OOMs at the UNet step (peak ~17 GiB), not only at the VAE decode, and the auto-fallback in img2img_runner reloads on CPU and finishes (slow, ~13 min) -- the output is still weight-identical and defeats SynthID, so "looks hung/crashed" on Mac is usually this CPU fallback, not a pipeline error.
On a 32 GB unified-memory machine the same default SDXL pass runs entirely on MPS with no CPU fallback (verified 2026-05-31, 1122x1402 gpt-image, all/default, ~155 s end-to-end), so 32 GB clears the native-res UNet peak that 24 GB could not. Adding enable_vae_tiling() alone does NOT prevent the 24 GB OOM (the peak is the UNet, not the VAE). The fast Mac workarounds for memory-constrained machines are fp16 on MPS (roughly halves memory) or --max-resolution to cap the long side; neither is wired as the default. The controlnet pipeline adds the canny ControlNet weights on top of SDXL, so its peak is a bit higher than the plain default pass; the same MPS->CPU fallback covers an OOM. The native-vs-cap-vs-floor decision lives in the pure helper invisible_engine._target_size(w, h, max_resolution, min_resolution) (returns None for native, a target tuple for a downscale cap OR an upscale floor; cap takes precedence, the floor is skipped on a min>max misconfig) so it is unit-tested (tests/test_invisible_engine.py::TestTargetSize, the #10/#15/#36 regression guard) without loading the model -- keep that logic in the helper, don't re-inline it.
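The native-vs-cap-vs-floor contract of invisible_engine._target_size can be sketched as a pure function. This is a hypothetical reimplementation from the rules stated here (None means native; the cap takes precedence; the floor is skipped on a min>max misconfig); rounding to the nearest pixel is an assumption, and the real helper may snap to a multiple of 8 for the SDXL latent.

```python
from typing import Optional, Tuple


def target_size(w: int, h: int, max_resolution: int = 0,
                min_resolution: int = 1024) -> Optional[Tuple[int, int]]:
    """Return None to run at native size, else the (w, h) to resize to before diffusion."""
    long_side = max(w, h)
    # 1. Opt-in downscale cap (--max-resolution) takes precedence over the floor.
    if max_resolution > 0 and long_side > max_resolution:
        return (max(1, round(w * max_resolution / long_side)),
                max(1, round(h * max_resolution / long_side)))
    # 2. Upscale floor for small inputs (--min-resolution 0 disables it).
    if min_resolution > 0 and long_side < min_resolution:
        if max_resolution > 0 and min_resolution > max_resolution:
            return None  # min > max misconfiguration: skip the floor
        return (round(w * min_resolution / long_side),
                round(h * min_resolution / long_side))
    return None
```

For example, target_size(381, 512) asks for (762, 1024), the #36 case; the caller resizes back to 381x512 after diffusion.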
fp16 VAE black-output fix (issue #29, 2026-05-30): on a CUDA/XPU fp16 backend the stock SDXL VAE overflows to NaN and the plain img2img path decodes to an all-black image (reproduced on the raiw.cc result: a 1086x1448 input -> a uniformly black 4.6 KB PNG, mean 0). watermark_remover._load_pipeline / _load_controlnet_pipeline swap in the fp16-fixed SDXL VAE (madebyollin/sdxl-vae-fp16-fix = _SDXL_FP16_VAE_ID) when _needs_fp16_vae_fix(model_id, DEFAULT_MODEL_ID, is_fp16) is true, i.e. only for the default SDXL checkpoint on fp16. cpu/mps run fp32, where the stock VAE is fine, which is why the bug never reproduces on Mac. A custom non-SDXL model_id keeps its own VAE (the fp16-fix VAE is SDXL-architecture-specific). The decision is a pure helper, unit-tested without a download (tests/test_platform.py::TestFp16VaeFix); the actual black->clean recovery can only be observed on a CUDA GPU, and was confirmed there on 2026-06-03: running all on a 1086x1448 OpenAI gpt-image (the #29 repro size) at fp16 produced a normal (non-black) output, so the fp16-fix VAE swap resolves the all-black decode.
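The VAE-swap decision is small enough to sketch in full. The function names below are hypothetical stand-ins for _needs_fp16_vae_fix and the device-to-dtype choice; the diffusers loading calls are shown only as comments (not executed) because they need the model weights.

```python
# IDs as named in these notes; the function names are hypothetical.
DEFAULT_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
SDXL_FP16_VAE_ID = "madebyollin/sdxl-vae-fp16-fix"


def runs_fp16(device: str) -> bool:
    """CUDA/XPU load the pipeline in fp16; cpu/mps stay fp32 (the stock VAE is fine there)."""
    return device in ("cuda", "xpu")


def needs_fp16_vae_fix(model_id: str, default_model_id: str, is_fp16: bool) -> bool:
    """Swap the VAE only for the default SDXL checkpoint on fp16; custom models keep their own VAE."""
    return is_fp16 and model_id == default_model_id


# When the helper returns True, the loader would do roughly (diffusers, not executed here):
#   vae = AutoencoderKL.from_pretrained(SDXL_FP16_VAE_ID, torch_dtype=torch.float16)
#   pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
#       model_id, vae=vae, torch_dtype=torch.float16)
```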
Pyright first run is slow (2-3 min) due to ML deps (torch/diffusers/transformers stubs); full-project uv run pyright can stall for many minutes — scope it to changed files.
A third-party PIL plugin autoload (e.g. an HEIF/AVIF plugin) can raise a non-OSError (ModuleNotFoundError), not UnidentifiedImageError, when opening a file. Code that opens user-supplied or unknown-format files should except Exception, not just OSError/UnidentifiedImageError.
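A defensive opener following that rule might look like the sketch below. safe_open_image is a hypothetical helper, and the injectable opener parameter is an assumption added so the error path can be exercised without a broken plugin installed; by default it lazily uses Pillow's Image.open. The caller is responsible for closing the returned image.

```python
from typing import Any, Callable, Optional


def safe_open_image(path: str, opener: Optional[Callable[[str], Any]] = None) -> Optional[Any]:
    """Open and decode an image, returning None on ANY failure (not only OSError)."""
    if opener is None:
        from PIL import Image  # lazy: Pillow is only needed for the real opener
        opener = Image.open
    try:
        img = opener(path)
        img.load()  # force decode so plugin errors surface here, not later
        return img
    except Exception:  # a plugin autoload can raise ModuleNotFoundError, not UnidentifiedImageError
        return None
```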
rich was dropped (CLI + scripts print plain text via click.echo). cli.py renders through small _Console/_Table/_Progress shims; the analysis scripts (scripts/synthid_corpus.py, synthid_pixel_probe.py, text_detection_benchmark.py, corpus_gap_scan.py) import Console/Table from the shared scripts/_plain_console.py shim (markup like [bold]/[/] is stripped, tables render aligned). Consequences: (1) rich is NOT a dependency, so anything that imports it breaks a clean uv sync --frozen (CI installs core+dev only); this exact gap turned CI red after the refactor, when those 4 scripts still imported rich. If you add a script, use the _plain_console shim, not rich. (2) The old [gpu]-bracket-eaten bug (#19) is gone: plain click.echo prints pip install 'remove-ai-watermarks[gpu]' verbatim, no escaping needed (regression-guarded by tests/test_cli.py::TestGpuHintMarkup). (3) No Unicode glyphs / colors / progress bars in CLI output by design.
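A plain-text Console/Table shim in the spirit of scripts/_plain_console.py could look like the sketch below. It is hypothetical, not the repo's shim: it mimics the subset of rich's API the scripts use (Table.add_column/add_row, Console.print), and it assumes a fixed list of style tags so that only known markup is stripped and a literal [gpu] survives.

```python
import re
from typing import List, Optional

# Assumed tag list: only known style tags are stripped, so "remove-ai-watermarks[gpu]" survives.
_STYLE_TAG = re.compile(r"\[/?(?:bold|dim|italic|underline|red|green|yellow|blue|cyan|magenta)?\]")


def strip_markup(text: str) -> str:
    return _STYLE_TAG.sub("", text)


class Table:
    def __init__(self, title: Optional[str] = None) -> None:
        self.title = title
        self.headers: List[str] = []
        self.rows: List[List[str]] = []

    def add_column(self, header: str, **_style: object) -> None:
        self.headers.append(strip_markup(header))  # rich style kwargs are accepted and ignored

    def add_row(self, *cells: object) -> None:
        self.rows.append([strip_markup(str(c)) for c in cells])

    def render(self) -> str:
        grid = [self.headers] + self.rows
        widths = [max(len(row[i]) for row in grid if i < len(row)) for i in range(len(self.headers))]
        lines = ["  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip() for row in grid]
        if self.title:
            lines.insert(0, strip_markup(self.title))
        return "\n".join(lines)


class Console:
    def print(self, obj: object = "") -> None:
        print(obj.render() if isinstance(obj, Table) else strip_markup(str(obj)))
```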
Metadata detection for AVIF/HEIF/JPEG-XL relies on a binary scan for C2PA_UUID + IPTC_AI_MARKERS, plus EXIF Software / XMP CreatorTool generator tags via metadata.exif_generator (validated with synthesized AVIF/JPEG fixtures + an XMP raw-scan fixture). C2PA removal in those containers is implemented via noai/isobmff.py (top-level uuid / jumb box stripper, no re-encoding), which now also drops a top-level XMP uuid box that carries an AI label (matched by AI-marker content, not by the XMP UUID, so byte-order-robust) and covers MP4/MOV/M4V/M4A by content sniff. Non-ISOBMFF audio/video removal is via ffmpeg (_FFMPEG_STRIP_EXTS -> _strip_with_ffmpeg): WebM/Matroska (EBML), MP3 (ID3), WAV/FLAC/OGG (RIFF/Vorbis) are stripped losslessly with ffmpeg -map_metadata -1 -map_chapters -1 -c copy (codec data untouched). Requires ffmpeg on PATH; raises RuntimeError if absent or if ffmpeg can't parse the file. Verified end-to-end (a real ffmpeg-made WAV/MP3 with a title=Suno AI tag -> tag gone, audio bytes preserved). Meta-box XMP now handled (isobmff.blank_ai_xmp_packets, v0.6.9): an AI-label XMP packet stored as a meta-box mime item (AVIF/HEIF) is blanked in place (overwritten with spaces of the same length, so iloc offsets and the coded image stay valid). Still NOT built: an Exif item inside the meta box (rare -- AI labels are XMP) needs full iinf/iloc surgery (offset rewrite) with corruption risk -- exiftool (R/W/C for HEIC/AVIF EXIF+XMP, verified on exiftool.org 2026-05-27) would do it but is a non-installed binary dep, so it stays a documented gap. Audio watermark DETECTION (Resemble PerTh) was evaluated and NOT built (2026-05-26): resemble-perth's PerthImplicitWatermarker.get_watermark() returns a raw bit-array with no presence/confidence flag (clean audio decodes to arbitrary bits too), so reliably distinguishing watermarked-from-clean needs either Resemble's fixed payload or a confidence API -- neither is public, and there's no real Resemble sample to calibrate against. 
Same wall-class as the SynthID pixel detector: the decode exists, reliable presence-detection does not. (perth's top-level PerthImplicitWatermarker is also gated to None unless librosa is importable.)
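The same-length blanking trick for meta-box XMP can be sketched on raw bytes: find each <?xpacket begin ... end?> packet, and if it carries an AI marker, overwrite it with spaces so every offset after it stays valid. The function name and the marker tuple are hypothetical stand-ins for isobmff.blank_ai_xmp_packets and _AI_LABEL_MARKERS; this sketch scans the whole buffer rather than locating the mime item first.

```python
import re
from typing import Tuple

# Hypothetical stand-in for _AI_LABEL_MARKERS.
AI_LABEL_MARKERS = (b"trainedAlgorithmicMedia", b"TC260:AIGC")

# One XMP packet, from its begin processing instruction to the first end instruction.
_XPACKET = re.compile(rb"<\?xpacket begin=.*?<\?xpacket end=[^>]*\?>", re.DOTALL)


def blank_ai_xmp_packets(data: bytes) -> Tuple[bytes, int]:
    """Blank AI-labelled XMP packets in place (same length); plain XMP is left alone."""
    buf = bytearray(data)
    blanked = 0
    for m in _XPACKET.finditer(data):
        if any(marker in m.group(0) for marker in AI_LABEL_MARKERS):
            buf[m.start():m.end()] = b" " * (m.end() - m.start())  # keeps box sizes/iloc offsets valid
            blanked += 1
    return bytes(buf), blanked
```

Blanking with spaces leaves a syntactically odd but harmless item; the alternative (deleting bytes) is exactly the iinf/iloc offset rewrite the notes flag as risky.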
SynthID technical reference: docs/synthid.md — primary-source-cited doc covering mechanism (post-hoc encoder/decoder pair, 136-bit payload at 512x512, pixel-space, model weights NOT modified), robustness numbers (arXiv:2510.09263: ~99.98% TPR@0.1%FPR across 30 transforms including JPEG/crop/resize/color/noise), removal attacks and forensic detectability (arXiv:2605.09203: all 6 attacks detectable at >98% TPR@1%FPR), detectability limits (no public decoder, metadata-proxy only), oracle scope, and adoption landscape. Read that doc first before adding notes here.
SynthID detection is metadata-only. There is no reliable local detector of the SynthID pixel watermark — Google's decoder is proprietary, no public spec or API (only a waitlisted portal). Authoritative confirmation: Google DeepMind's own paper "SynthID-Image: Image watermarking at internet scale" (Gowal et al., arXiv:2510.09263) states the verification service is restricted to "trusted testers" and does not release detector weights or a reproducible algorithm — so a local pixel detector is infeasible by design, not just unbuilt. https://arxiv.org/abs/2510.09263 We detect SynthID by its C2PA companion (synthid_source / SYNTHID_C2PA_ISSUERS), which is reliable while the manifest is intact but says nothing once C2PA is stripped. Surface-dependent blind spot (verified 2026-05-24): the same Google model emits different metadata per surface -- the Gemini app wraps outputs in Google C2PA, but the API/playground (AI Studio, Nano Banana / gemini-2.5-flash-image) emits the SynthID pixel watermark (confirmed via the Gemini-app oracle) + the visible sparkle but no C2PA/IPTC at all, so synthid_source returns None despite SynthID being present. Only the pixel oracle or the visible-sparkle detector catches those. (Meta AI is another surface mismatch: it writes the IPTC digitalSourceType=trainedAlgorithmicMedia marker, not C2PA and not SynthID.) Google→SynthID is long-standing; OpenAI→SynthID is confirmed by OpenAI's Help Center (ChatGPT/Codex/API "include both C2PA metadata and SynthID watermarks", updated 2026-05-21) but time-gated (pre-rollout OpenAI images carry C2PA without SynthID), so the OpenAI verdict is hedged "likely". Oracles: Gemini app "Verify with SynthID" (Google), openai.com/verify (OpenAI). 
Each vendor's oracle detects only its OWN content (verified on the page 2026-05-31): openai.com/research/verify states verbatim "OpenAI generation signals will only be detected if the image was generated with our tools" and "Content could also still be AI-generated by another company's model, which the tool currently does not detect" -- SynthID is shared tech but the verifier is keyed to its own vendor's payload, so a Google-SynthID image reads clean on OpenAI's verifier and vice-versa. This explains the recurring "oracle says clean but identify still flags SynthID" report (#14): the oracle reads the pixel watermark (gone after our SDXL pass), while identify reads the C2PA-metadata proxy (still present if the manifest survived). Different signals, not a contradiction -- strip the metadata too (metadata --remove / all) and the proxy goes quiet, but a quiet proxy is not proof the pixel watermark is gone. SynthID is durable to JPEG re-encode by design, so a GitHub-recompressed issue attachment is still a valid SynthID test subject (verified 2026-06-01 on issue #14's pic3: the GitHub-served JPEG survived re-encoding and openai.com/verify still detected SynthID). Do NOT dismiss issue-attachment JPEGs as "not faithful originals" when reproducing a SynthID-survival report: the recompression strips the C2PA metadata (so identify reads Unknown on the attachment) but NOT the pixel watermark that openai.com/verify reads. A true byte-original only matters for the metadata/C2PA path, not for the pixel-SynthID-removal test. (Contrast the open imwatermark above, which IS fragile to JPEG.) 
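The metadata proxy reduces to an issuer lookup, and the important property is what a miss means. A minimal sketch with a hypothetical issuer table (the real SYNTHID_C2PA_ISSUERS strings and synthid_source return values may differ):

```python
from typing import Optional

# Hypothetical issuer table standing in for SYNTHID_C2PA_ISSUERS.
SYNTHID_C2PA_ISSUERS = {"Google LLC": "google", "OpenAI": "openai"}


def synthid_from_c2pa(issuer: Optional[str]) -> Optional[str]:
    """Infer SynthID from the C2PA companion only; None means 'no evidence', never 'clean'."""
    vendor = SYNTHID_C2PA_ISSUERS.get(issuer) if issuer else None
    if vendor == "google":
        return "SynthID (Google)"
    if vendor == "openai":
        # Time-gated rollout: pre-rollout OpenAI images carry C2PA without SynthID.
        return "likely SynthID (OpenAI)"
    return None
```

A None result on a C2PA-less AI Studio / Nano Banana image is the surface blind spot described above: the pixel watermark may still be present.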
The spectral phase-coherence approach from github.com/aloshdenny/reverse-SynthID was evaluated (May 2026) and does not work for real-content detection: on its own shipped codebook + validation set, watermarked and cleaned images were indistinguishable (conf within noise, cleaned often higher); it only fires on pure-black 1024x1024 reference images at exact resolution (the controlled case it was calibrated on). The README's "90% / conf=0.91" reproduces only in that lab condition. Do not build a production detector on it; if revisited, it is experimental/diagnostic only and needs a per-resolution, per-model reference corpus. A from-scratch gpt-image pilot (2026-05-24) confirmed this independently: 5 independent solid-black gpt-image outputs share a near-identical fixed signature (pairwise residual correlation 0.92, avg-template retains 97% energy), so the watermark/carrier IS strongly present and consistent on flat content — but the carrier frequencies extracted from it do NOT discriminate real content (carrier-to-random ratio: cleaned 1.86 > watermarked 1.53; a non-gpt-image image scored highest at 3.67). The signature drowns in content texture. Net: a perfectly consistent solid-color signature still yields no real-content pixel detector with magnitude/carrier methods. A corpus discrimination test (2026-05-24, scripts/synthid_pixel_probe.py, raw zero-mean residual NCC) independently re-confirms this: at matched resolution, SynthID positives do NOT cluster apart from negatives (within-Gemini 0.07; at 1024 px pos-vs-neg >= pos-vs-pos). The only high correlations were near-duplicate content (5 ChatGPT renders of one prompt at ~0.92, while a distinct ChatGPT image scored ~0 against them) — content, not a carrier. The probe is solid-fills-only and EXPERIMENTAL/DIAGNOSTIC; do not use it on real content. 
Correction (deeper re-examination 2026-05-25): the carrier IS real on solid fills — the earlier "no carrier" was a method artifact of using spatial / FFT-magnitude NCC, which can't see it. The carrier is a fixed phase at specific low frequencies, so the right metric is per-bin phase coherence. On 8 white gemini-2.5-flash-image fills (generated via the reverse-SynthID trick: identity-edit prompt "Recreate this image exactly as it is" on a synthetic pure-white PNG — this bypasses the recitation block that rejects text prompts for pure colors), phase coherence at the white carriers (0,±7..±12,±20..±23) = 0.86 vs 0.31 random; single-image leave-one-out phase-match +0.83 vs real photos -0.24. (Black 2.5-flash fills clip to std≈0 — SynthID can't push values below 0, so no carrier in black; the repo's dark carriers come from nano-banana-pro.) But it does not generalize: (a) carriers are model-version + resolution + color specific — the repo's v4 codebook (built for gemini-3.1-flash-image-preview + nano-banana-pro-preview) scores ~0.527 on my 2.5-flash white fills, indistinguishable from negatives (~0.50), i.e. carriers shift across model versions and need a per-model codebook; (b) on real content (30 2.5-flash images) the carrier collapses — set phase coherence at carriers 0.37 ≈ random 0.42, and the repo's v4 detector gives content 0.518 ≈ negatives 0.504 (no separation; a faint +0.24 single-image lean is likely a brightness confound). Net: the spectral/phase approach is a real controlled-fill characterizer, NOT an arbitrary-real-content detector, and is brittle to model version. Metadata proxy + visible sparkle + online oracles remain the ceiling for real content.
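The per-bin phase-coherence metric can be sketched with numpy: for each FFT bin, keep only the phase of every image's spectrum and take the magnitude of the mean unit phasor. This is an illustrative reimplementation, not scripts/synthid_pixel_probe.py or the reverse-SynthID codebook code; the function name, the (ky, kx) bin convention, and the synthetic carrier used to exercise it are assumptions.

```python
from typing import Iterable, Sequence, Tuple

import numpy as np


def phase_coherence(images: Sequence[np.ndarray], bins: Iterable[Tuple[int, int]]) -> np.ndarray:
    """Per-bin phase coherence |mean_k exp(i * phase_k)| across same-size grayscale images.

    1.0 means every image has the same phase at that bin (a fixed carrier); for
    unrelated images it drops to the order of 1/sqrt(len(images)).
    """
    stack = np.stack([np.asarray(im, dtype=np.float64) for im in images])
    stack -= stack.mean(axis=(1, 2), keepdims=True)  # remove per-image brightness (DC)
    spectra = np.fft.fft2(stack)  # 2-D FFT over the last two axes, one spectrum per image
    out = []
    for ky, kx in bins:  # negative indices address the mirrored frequencies, numpy-style
        z = spectra[:, ky, kx]
        unit = z / np.maximum(np.abs(z), 1e-12)  # keep phase only, discard magnitude
        out.append(np.abs(unit.mean()))
    return np.asarray(out)
```

On a flat fill, a fixed-phase carrier dominates its bin and coherence approaches 1; on real content, texture adds large random-phase energy at the same bins, which is how coherence collapses toward chance.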
External AI-vs-real classifier models are out of scope (decided 2026-05-24). Generic HuggingFace detectors (Organika/sdxl-detector Swin Transformer, umm-maybe/AI-image-detector, and fine-tunes) exist and report ~0.98 on their own SDXL-vs-real validation sets, but they are per-generator and the model cards themselves note degraded accuracy off-distribution; they are untested on gpt-image / Gemini Nano Banana (the metadata-stripped surfaces we care about), and our own light SDXL pass would likely defeat them the same way it defeats SynthID. Detection here stays local + signal-based (metadata + visible sparkle); do not add a bundled classifier dependency.
DEFAULT STRENGTH IS NOW VENDOR-ADAPTIVE (2026-06-01, SUPERSEDES every fixed-default claim in this bullet and the next). resolve_strength(strength, profile, vendor) + vendor_for_strength(path) (watermark_profiles.py) read the C2PA issuer (metadata.synthid_source) on the ORIGINAL input and pick OPENAI_STRENGTH 0.10 / GEMINI_STRENGTH 0.15 / UNKNOWN_STRENGTH 0.15 when --strength is unset; explicit --strength always wins. The CLI detects the vendor from the pristine source (before the visible pass / metadata-strip removes C2PA from the temp file) and passes it to the engine, so display and execution agree; cmd_invisible/cmd_all/batch + the module-level remove_watermark all thread vendor. This replaces the single 0.30 default AND the prior "do NOT build a vendor-adaptive default" policy -- both came from the now-debunked region-rescrub-contaminated study (the per-region re-scrub that contaminated those numbers was removed in the controlnet refactor). Basis: the oracle-verified June 2026 controlled study (clean v0.8.6, protect OFF): OpenAI clears at 0.05 across 1024-1600 (n=4, resolution-independent); Google needs 0.15 on the capped-1536 path (n=4). docs/synthid.md §2.2 (data) + §5.2 (the adaptive default) are authoritative. CAVEAT: Google's 0.15 was validated only on --max-resolution 1536; native large Gemini (2816) was not locally measurable (OOM on M-series) and is pending GPU validation on raiw.cc -- if SynthID survives 0.15 at native resolution, raise --strength. Any fixed 0.10/0.30 default mentioned elsewhere in these notes is HISTORICAL; trust the vendor-adaptive constants + docs/synthid.md.
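The resolution order (explicit value first, then the vendor table, then the unknown fallback) fits in a few lines. A minimal sketch using the constants named above; the vendor keys "openai"/"google" are hypothetical assumptions about what vendor_for_strength returns, and this is not the code in watermark_profiles.py.

```python
from typing import Optional

OPENAI_STRENGTH = 0.10
GEMINI_STRENGTH = 0.15
UNKNOWN_STRENGTH = 0.15

# Hypothetical vendor keys; the real vendor_for_strength() return values may differ.
_VENDOR_STRENGTH = {"openai": OPENAI_STRENGTH, "google": GEMINI_STRENGTH}


def resolve_strength(strength: Optional[float], profile: str, vendor: Optional[str]) -> float:
    """Explicit --strength wins; otherwise use the vendor default from the pristine input's C2PA issuer."""
    if strength is not None:
        return strength
    # 'default' and 'controlnet' share the SDXL base, so both use the same table;
    # profile stays in the signature only to mirror the call shape.
    del profile
    return _VENDOR_STRENGTH.get(vendor or "", UNKNOWN_STRENGTH)
```

The vendor has to be detected on the original file before the metadata strip; otherwise the C2PA issuer is gone and every image falls back to UNKNOWN_STRENGTH.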
SynthID removal: strength + oracle scope. Default strength is vendor-adaptive (see the bullet above); docs/synthid.md §2.2 is authoritative for the numbers. Oracle scope (load-bearing): the Gemini app "Verify with SynthID" is the ONLY valid SynthID oracle (detects Google's mark on any image); openai.com/verify is scoped to OpenAI provenance (its own C2PA), NOT a SynthID oracle -- a negative there is meaningless for SynthID. There is no local SynthID detector, so the tool cannot self-check; if the oracle still reads SynthID, raise --strength to the lowest value that verifies clean. Only the default (plain SDXL img2img) and controlnet (SDXL + canny ControlNet) profiles exist; the local invisible default is weight-for-weight identical to raiw.cc prod (fal-ai/fast-sdxl = stabilityai/stable-diffusion-xl-base-1.0, runtime-downloaded, not bundled). Forensic-stealth caveat (arXiv:2605.09203): defeating the SynthID verifier is NOT forensic invisibility -- independent detectors flag removal-processed images vs genuinely-clean ones at >98% TPR@1%FPR, so do not over-claim "indistinguishable from a real photo".
controlnet pipeline (text/face STRUCTURE preservation, EXPERIMENTAL, opt-in --pipeline controlnet). SDXL + the canny ControlNet xinsir/controlnet-canny-sdxl-1.0 via StableDiffusionXLControlNetImg2ImgPipeline (watermark_remover._run_controlnet / _load_controlnet_pipeline). Removal still comes from the img2img regeneration (strength); the ControlNet only PRESERVES text and face STRUCTURE by conditioning on the canny edge map (cv2.Canny(gray, 100, 200), 3-channel). Canny preserves edges, NOT face identity (a regenerated face drifts in likeness); face identity is preserved by the optional --restore-faces GFPGAN post-pass (EXPERIMENTAL, opt-in, OFF by default -- see face_restore.py, the restore extra), which re-synthesizes each face from a StyleGAN2 prior so the composited face pixels carry no watermark. The CodeFormer alternative stays NON-COMMERCIAL and is not shipped. The earlier --face-id IP-Adapter FaceID layer was REMOVED (footgun: it needs high strength and corrupts faces at the low removal strength). No original pixels are copied or frozen, so SynthID does not survive -- unlike the deleted text/face-protection subsystems, which restored or re-scrubbed original pixels and could shield the watermark. controlnet_conditioning_scale (CLI --controlnet-scale, default 1.0) is the structure-preservation knob (higher = closer to the original structure). It shares the SDXL base, so it uses the SAME vendor-adaptive strength as default (resolve_strength); fp32 on cpu/mps, fp16-fixed VAE on cuda/xpu. The controlnet profile is threaded explicitly (WatermarkRemover(pipeline=...) / InvisibleEngine(pipeline=...)), NOT inferred from model_id. This productionizes the scripts/controlnet_sweep.py prototype; see docs/controlnet-removal-pipeline-research.md. Forensic-stealth caveat still applies (arXiv:2605.09203): defeating the SynthID verifier is not forensic invisibility -- a "this image went through a removal pipeline" classifier can still flag the output.
