Both text and face protection were shielding SynthID from removal. The text-protection high-res re-scrub regenerates pixels at an upscaled resolution where the per-region pass may not be strong enough to re-destroy the SynthID payload, allowing it to survive in text areas. Face protection has an even more direct mechanism: it pastes back the original (pre-diffusion, watermarked) face pixels after the global pass, guaranteeing SynthID survives in face regions regardless of strength. Both --protect-text and --protect-faces are now off by default and opt-in. Rename from --no-protect-text / --no-protect-faces to --protect-text / --protect-faces. Extract shared click.option decorators to module-level constants (_protect_text_option, _protect_faces_option) to eliminate copy-paste between cmd_invisible and cmd_all. Add docs/synthid.md: primary-source-cited technical reference for SynthID-Image covering mechanism (post-hoc encoder/decoder, 136-bit payload, pixel-space, no model-weight modification), robustness numbers (arXiv:2510.09263: ~99.98% TPR at 0.1% FPR across 30 transforms), removal attacks and forensic detectability (arXiv:2605.09203: all 6 attacks detectable >98% TPR@1%FPR), detectability limits, oracle scope, adoption landscape, and practical implications including the protect-text/faces SynthID-preservation finding. Verified June 2026 on gpt-image 1600x1600 via openai.com/verify: with --protect-text SynthID detected; without, SynthID removed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Remove-AI-Watermarks
You are a principal Python engineer maintaining a CLI tool and library for removing visible and invisible AI watermarks from images.
How to run
- `uv run remove-ai-watermarks all <image.png> -o <output.png>`
- `uv run remove-ai-watermarks visible <image.png> -o <out.png>`: known-visible-mark removal, CPU, no GPU. Reverse-alpha based: each mark is removed by inverting its captured alpha map. `--mark auto` (default) picks the strongest detected of the Gemini sparkle, the Doubao "豆包AI生成" text strip, and the Jimeng "★ 即梦AI" wordmark; `--mark gemini` / `--mark doubao` / `--mark jimeng` force one. Gemini/Doubao recover pixels exactly with no inpaint at native; Jimeng adds an always-on residual inpaint over the glyph footprint (its mark re-rasterizes per image, so reverse-alpha alone leaves a faint outline). For arbitrary logos/objects use `erase`.
- `uv run remove-ai-watermarks erase <image.png> --region x,y,w,h -o <out.png>`: universal region eraser (any logo/object, any position). `--backend cv2` (default, no deps) or `--backend lama` (big-LaMa via onnxruntime, extra `lama`); `--region` is repeatable.
- `uv run remove-ai-watermarks identify <image>`: provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector
- `uv run remove-ai-watermarks metadata <image.png> --check`: inspect AI metadata (C2PA, EXIF, PNG chunks)
- `uv run remove-ai-watermarks metadata <image.png> --remove -o <out.png>`: strip all AI metadata
- `uv run remove-ai-watermarks batch <directory>`: process every supported image in a directory (output defaults to `<directory>_clean/`, set with `-o`). `--mode visible|invisible|metadata|all` (default `visible`); the invisible/all path reuses the same `--strength`/`--steps`/`--pipeline`/`--device`/`--max-resolution`/`--seed`/`--hf-token` knobs as `invisible`, `--inpaint`/`--no-inpaint` for the visible pass, and `--humanize` for the Analog Humanizer
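The reverse-alpha idea behind `visible` can be sketched for a single channel value. This is a minimal illustration under stated assumptions, not the engines' code: `composite`, `reverse_alpha`, and the `max_alpha` guard are hypothetical names, and the shipped engines operate on whole image arrays with cv2/numpy rather than on scalars. The inversion itself follows the formula this guide gives for the registry, `original = (wm - a*logo)/(1-a)`.

```python
def composite(original: float, alpha: float, logo: float = 255.0) -> float:
    """Forward model: blend the logo over the original at opacity alpha."""
    return alpha * logo + (1.0 - alpha) * original


def reverse_alpha(wm: float, alpha: float, logo: float = 255.0,
                  max_alpha: float = 0.99) -> float:
    """Invert the blend: original = (wm - a*logo) / (1 - a).

    max_alpha is an assumed guard (not taken from the project) that keeps
    the denominator away from zero on near-opaque pixels.
    """
    a = min(alpha, max_alpha)
    value = (wm - a * logo) / (1.0 - a)
    # Clamp to the valid 8-bit range after inversion.
    return min(255.0, max(0.0, value))


# Round trip on one pixel value: watermark it, then recover it.
marked = composite(100.0, 0.4)
recovered = reverse_alpha(marked, 0.4)
```

The round trip is exact only when the alpha used for inversion matches the alpha that produced the mark, which is why the guide stresses the quality of the captured alpha map.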
Test and lint
- CI (`.github/workflows/test.yml`): runs on push to `main` + every PR. A `lint` job (ubuntu: `ruff check` + `ruff format --check`) plus a `test` matrix (ubuntu/macos/windows x py3.10/3.12) that does `uv sync --frozen --extra dev` then `pytest`. The matrix installs only core + dev (no `gpu` extra), so the GPU/model-running tests skip there and it exercises the metadata/identify/visible/cv2-eraser surface on all three OSes. Keep `uv.lock` valid (don't break `--frozen`) when editing `pyproject.toml`. `publish.yml` stays release-only and now verifies the release tag matches the `pyproject.toml` version (fails the build on a mismatch) before building.
- Release flow: bump the version in `pyproject.toml` + `src/remove_ai_watermarks/__init__.py` + `uv.lock` (the project's own `[[package]]` entry, ~line 2868), commit `chore(release): vX.Y.Z`, `git tag -a vX.Y.Z -m vX.Y.Z` (annotated; `git tag` without `-m` errors here), push `main` + the tag, then `gh release create vX.Y.Z`. PyPI publish triggers on the GitHub Release `published` event, NOT on the tag push, so the tag alone does not publish.
- Sdist must exclude `data/` (`[tool.hatch.build.targets.sdist] exclude = ["/data"]`): hatchling's default sdist bundles all VCS-tracked files, so the committed `data/` test corpora (synthid_corpus images ~65 MB + the visible-mark captures) pushed the 0.8.0 sdist past PyPI's per-project file-size limit (400 "File too large"). The wheel uploaded but the sdist was rejected, so 0.8.0 shipped wheel-only and 0.8.1 carried the fix. The wheel only ships `src/` (via `[tool.hatch.build.targets.wheel] packages`), so it was never affected. A failed PyPI upload of one artifact still leaves the other live and you cannot re-upload the same version, so fix the build and cut the next patch.
- Build backend is pinned `hatchling<1.28` (`[build-system] requires`): hatchling 1.28+ emits Metadata-Version 2.5 (PEP 639), which the twine bundled in `pypa/gh-action-pypi-publish@release/v1` rejects ("'2.5' is not a valid Metadata-Version"). This failed the v0.8.3 PyPI upload on 2026-06-01 (tag-match + build passed, the upload step failed; nothing was uploaded, so the version stayed empty on PyPI). 1.27.x emits 2.4, which uploads fine (0.8.2 shipped on it). An unpinned `requires = ["hatchling"]` is no longer safe because `uv build` pulls the latest hatchling. Lift the pin only once the publish action's twine is ≥ 6.1.0 (2.5-aware) or the workflow moves to `uv publish`.
- `bash maintain.sh`: uv-outdated, uv-secure, ruff check/fix, ruff format, pyright, pytest -n auto
- Strict pyright is clean across `src/` (0 errors). The cv2/torch/diffusers boundary files (`gemini_engine`, `region_eraser`, `doubao_engine`, `face_protector`, `humanizer`, `invisible_engine`, `noai/watermark_remover`, and the whole `noai/ctrlregen/` subpackage) carry a documented per-file `# pyright:` relax pragma (or, for `ctrlregen`, a `tool.pyright.executionEnvironments` entry) that turns off only the unknown-type / untyped-third-party rules; those libs ship no usable types, so strict typing there fights the ecosystem. Pure-logic files stay fully strict; `typings/piexif/__init__.pyi` is a local stub so `metadata.py`/`extractor.py` resolve piexif. Public ndarray-returning signatures on the relaxed engines are still annotated `NDArray[Any]` so strict consumers (`cli.py`) stay clean. When touching a relaxed file, prefer fixing real issues over widening the pragma; keep the pragma scoped to genuinely-untyped boundaries. (`uv-secure` is clean since idna was bumped 3.11 -> 3.16, fixing GHSA-65pc-fj4g-8rjx.)
- Full-project `uv run pyright` (no path) OOMs/crashes node on this ML-heavy repo (emits a `libnode` stack frame, no summary). This is a known environment limit, not a code error. Gate with `uv run --extra dev --extra gpu pyright src/` (completes, authoritative) or scope to changed files; also run `uv run ruff check` and `uv run pytest` directly.
- Run `uv run` from the repo root; from another cwd it falls back to a bare env without numpy/cv2/torch.
- To add a dev tool (pytest/ruff/pyright) into the env, use `uv sync --frozen --extra dev --extra gpu`, never `uv pip install`: `uv pip install` re-resolves and rewrites `uv.lock`, which silently bumped `transformers` to a build incompatible with the pinned `diffusers` (`cannot import name 'Qwen3VLForConditionalGeneration'`) and broke every `identify`/metadata import. Recovery: `git checkout uv.lock && uv sync --frozen --extra gpu --extra dev`. The `gpu` extra holds `diffusers`/`transformers`/`torch`, so a bare `uv sync` (no extras) removes them; `noai/__init__` is now lazy (PEP 562 `__getattr__`, so importing `identify`/`metadata` no longer pulls `watermark_remover`/torch), so a bare env breaks only when the removal pipeline is actually invoked, not on import. `maintain.sh`'s `uv sync --all-extras` also pulls the heavy `trustmark`/`lama` wheels (pytorch-lightning, onnxruntime): fine on a good connection, but on flaky DNS sync only `--extra gpu --extra dev` and run the lint/test steps by hand.
- Metadata/C2PA tests assert against real committed fixtures in `data/samples/` (`chatgpt-*.png` = OpenAI C2PA, `firefly-1.png` = Adobe, `mj-*` = Midjourney IPTC, `doubao-1.png` = ByteDance Doubao with the China TC260 `<TC260:AIGC>` XMP label and a visible "豆包AI生成" text mark bottom-right; `grok-1.jpg` = xAI Grok with its EXIF-only `Signature:` blob + UUID `Artist` and no C2PA/SynthID/IPTC); synthetic byte blobs cover the JPEG/ISOBMFF format paths. The "non-AI / clean photo" control is no longer in `data/samples/` -- the `clean_photo` conftest fixture serves a verified-negative image from the corpus `neg/` set (skips if the corpus is absent).
- SynthID reference corpus: `scripts/synthid_corpus.py` ingests labeled images into `data/synthid_corpus/`. The labeled `images/` (`pos/neg/cleaned/`) are committed (public repo -- review every image for private content before adding; `manifest.csv` is kept in sync with the files on disk, one row per tracked image); only the synthetic `refs/` calibration fills are gitignored. See its README for the collection protocol and verification oracles.
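The tag-versus-version gate described for `publish.yml` can be sketched as a small check. This is an illustration only, not the workflow's actual step: `pyproject_version` and `tag_matches_version` are hypothetical helpers, the `vX.Y.Z` tag shape is taken from the release flow above, and the line-based parse assumes the version is a plain quoted string inside the `[project]` table.

```python
import re


def pyproject_version(pyproject_text: str) -> str:
    """Return the version string from the [project] table of a pyproject.toml."""
    in_project = False
    for line in pyproject_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track which table we are in; only [project] is of interest.
            in_project = stripped == "[project]"
            continue
        if in_project:
            match = re.match(r'version\s*=\s*"([^"]+)"', stripped)
            if match:
                return match.group(1)
    raise ValueError("no version found in the [project] table")


def tag_matches_version(tag: str, pyproject_text: str) -> bool:
    """True when a release tag such as v0.8.3 matches the declared version."""
    return tag == f"v{pyproject_version(pyproject_text)}"


SAMPLE = '''[build-system]
requires = ["hatchling<1.28"]

[project]
name = "remove-ai-watermarks"
version = "0.8.3"
'''

ok = tag_matches_version("v0.8.3", SAMPLE)
stale = tag_matches_version("v0.8.2", SAMPLE)
```

Running a check like this before building means a forgotten version bump fails early instead of publishing an artifact under the wrong number.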
Configuration
- GPU/ML modules (invisible_engine, ctrlregen, watermark_remover) are optional; guard imports with `is_available()` checks
- Optional detection extras: `detect` (imwatermark, the open SD/SDXL/FLUX watermark) and `trustmark` (Adobe TrustMark decoder; pulls torch + downloads weights). Both are guarded by `is_available()` and skipped by `identify` when absent.
- Tests for the model-running paths are limited to availability checks (multi-GB downloads). But the pure helpers inside ML-adjacent modules are unit-tested without any download and must stay that way: `_target_size` (native-vs-downscale, `test_invisible_engine.py`), the MPS->CPU fallback control flow via mocked pipelines (`test_img2img_runner.py`, 100% cover), and the tiling math `tile_positions` (now raises `ValueError` when not `0 <= overlap < tile`) / `make_blend_weight` / `resize_center_crop` (`test_tiling.py`; `pytest.importorskip("torch")` since `tiling.py` imports torch at module top). Don't skip these as "ML, needs a model"; only `run_tiled` / `remove_watermark` / the diffusion bodies do.
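The optional-import guard pattern above can be sketched as follows. This is a generic illustration, not the project's code: each module in the repo exposes its own argument-less `is_available()`, whereas the parameterized `is_available(module_name)` and the `load_optional` helper here are hypothetical and exist only to keep the example self-contained.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Optional


def is_available(module_name: str) -> bool:
    """True when a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def load_optional(module_name: str) -> Optional[ModuleType]:
    """Import an optional dependency lazily, or return None when it is absent."""
    if not is_available(module_name):
        return None
    return importlib.import_module(module_name)


# A missing extra degrades to None instead of raising at import time.
missing = load_optional("surely_missing_module_xyz")
present = load_optional("json")
```

Checking with `find_spec` keeps the availability test cheap, so a heavy dependency such as torch is only loaded when the guarded code path actually runs.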
Key modules
- `noai/c2pa.py`: PNG chunk parser; use `extract_c2pa_chunk(path)` to get raw caBX payload, `has_c2pa_metadata(path)` to detect. Do not reimplement chunk parsing. `extract_c2pa_info(path)` sets `synthid_watermark` / `synthid_vendors` when the manifest is signed by a SynthID-using vendor, and `soft_binding` / `soft_binding_vendors` when a `c2pa.soft-binding` `alg` names a forensic-watermark vendor (`soft_binding_vendors_in(buffer)` is the shared byte-scan, used by both the PNG parser and the non-PNG binary path). PNG/caBX chunk reads are clamped to the remaining file size (`safe_length = min(length, remaining)`; skipped chunks use seek) so a malformed huge `length` cannot drive a multi-GB allocation (shared safety discipline matching `isobmff.scan_c2pa_region`).
- `noai/constants.py`: PNG_SIGNATURE, C2PA_CHUNK_TYPE, C2PA_SIGNATURES, C2PA_ISSUERS, `SYNTHID_C2PA_ISSUERS` (issuers that pair SynthID with C2PA: Google, OpenAI), and `C2PA_SOFT_BINDINGS` (soft-binding `alg` prefix → forensic-watermark vendor: Adobe TrustMark, Digimarc, Imatag, Steg.AI, Microsoft, ...). Add a new issuer/binding here, not inline.
- `metadata.py`: `scan_head(path, size=1MB)` is the shared input for every C2PA/AIGC/IPTC byte scan: first `size` bytes plus the payloads of any provenance metadata found beyond that window. For ISOBMFF, that is the late provenance boxes from `isobmff.scan_c2pa_region` (catches a manifest after a large `mdat`); for PNG, the late `tEXt`/`iTXt`/`zTXt`/`eXIf`/`iCCP` chunks from `_png_late_metadata` (catches an XMP/EXIF packet appended after a large `IDAT`, e.g. a TC260 AIGC label at ~2.7 MB). Behavior-neutral (`f.read(size)`) for non-ISOBMFF inputs and for any file that fits within `size`. Use it instead of `open().read(1MB)` for any new marker scan. `synthid_source(path)` returns the vendor name(s) if the C2PA manifest implies a SynthID pixel watermark, else None. Format-agnostic: PNG via the caBX parser, JPEG/WebP/AVIF/HEIF/JXL via a binary scan (C2PA marker + SynthID issuer + AI-source marker). `get_ai_metadata` surfaces the verdict, and `metadata --check` prints it as a callout. 
Bothget_ai_metadataandhas_ai_metadataguard the PIL open withexcept Exception(HEIC/unknown formats raise non-OSError) and fall through to the binary scan.xai_signature(path)detects xAI/Grok's EXIF-only scheme (ImageDescription=Signature: <base64>+ UUIDArtist); it feedshas_ai_metadata,get_ai_metadata(keyxai_signature), andidentify.iptc_ai_system(path)detects the IPTC Photo Metadata 2025.1 AI-disclosure XMP properties (IPTC_AI_FIELD_MARKERS=AISystemUsed/AISystemVersionUsed/AIPromptInformation/AIPromptWriterName) and returns theAISystemUsedgenerator name (or"fields present").remove_ai_metadataroutes ISOBMFF video (.mp4/.mov/.m4v) through the sameisobmff.strip_c2pa_boxesas AVIF/HEIF (MP4 is ISOBMFF), and_scrub_ai_exifremoves the xAI signature + AI-generator EXIF tags on JPEG output.strip_c2pa_boxesis fail-safe on a malformed box: it returns the original bytes unchanged with a logged warning instead of truncating the tail to EOF (detection-onlyscan_c2pa_regionstill stops at a malformed box)._png_late_metadataclamps each late-chunk read to the remaining file size (safe_length = min(length, remaining)) so a malformedlengthcannot drive a multi-GB allocation.identify.py— the OpenAI rollout caveat is keyed on_vendor_of(synthid) == "OpenAI"(not a raw substring over the issuer + verdict blob).identify(path)aggregates every locally-readable signal (C2PA issuer→platform, C2PA soft-binding forensic-watermark vendor, IPTC "Made with AI" + IPTC 2025.1AISystemUsed, embedded SD/ComfyUI params, SynthID proxy, xAI/Grok EXIF signature viametadata.xai_signature, the China TC260 AIGC label viametadata.aigc_label, the HuggingFacehf-job-idjob marker viametadata.huggingface_job, the Samsung Galaxy AI editing marker viametadata.samsung_genai, the visible marks — Gemini sparkle plus the ByteDance Doubao 豆包AI生成 / Jimeng 即梦AI text marks via thewatermark_registry— open invisible watermark, Adobe TrustMark viatrustmark_detector) into oneProvenanceReport.is_ai_generatedis True or None (never 
asserted False — stripped metadata is not proof of clean origin). Thehf_job, visible-mark, and Samsungsamsung_genaisignals are medium confidence: each lifts an otherwise-Unknown verdict to a tentative AI (hf_only/visible_only/samsung_only, parallel branches;visible_onlyfires on anyvisible_*signal) but is excluded from the high-confidenceai_from_metadataset, so none overrides a hard metadata signal. Visible-mark detection (check_visible, signalsvisible_sparkle/visible_doubao/visible_jimeng): the Gemini sparkle keeps its own file-level path (_visible_sparkle→gemini_engine.detect_sparkle_confidence, promoted only at confidence ≥_SPARKLE_THRESHOLD0.5; corpus-tuned to separate Gemini sparkles ≥0.56 from non-sparkle ≤0.49), while Doubao/Jimeng reuse the registry detectors (_visible_text_marks→watermark_registry), each gated by its own engine NCC threshold viaMarkDetection.detected(Doubao 0.4, Jimeng 0.45). Doubao/Jimeng are normally also caught by the TC260 AIGC metadata label, so the visible path is their stripped-metadata fallback. Visible marks setplatformonly when no harder signal already did, and (like the sparkle) are excluded from integrity-clash vendor claims. The cv2 dependency lives in the engines, not here.import identifyis deliberately light (~21 MB; ~36 MB with cv2 loaded by a visible-mark run, ~106 MB for a fullcheck_visiblerun): it imports only the purenoai.c2pa/noai.constantssubmodules, andnoai/__init__is lazy (see "Test and lint"), so torch/diffusers are NOT pulled at import even in a fullgpu/detectinstall — fits a 512 MB host. The heavy paths are opt-in:check_invisible=Trueneeds thedetect/trustmarkextras (each pulls torch; TrustMark also downloads weights), so on a core-only deploy leavecheck_invisibleoff (it is a no-op there anyway). Before the lazy__init__, the mere presence of torch in the env inflatedimport identifyto ~420 MB. 
C2PA platform attribution is device-token-first, issuer-scan fallback (_device_platformscans manifest bytes for_DEVICE_C2PA_PLATFORMtokens, then_attribute_platform/_ISSUER_PLATFORM). Why, verified on real signed files 2026-05-26: the old issuer-only byte-scan matched ANY issuer substring anywhere, so multi-entity manifests mis-attributed -- Leica→"Truepic" (a signing authority in the trust chain), Nikon→"Adobe Firefly" (XMP-toolkit "Adobe" + the sample's "Adobe_MAX" name), Pixel→"Google (Gemini)" ("Google LLC" cert org), Truepic→"Google". A distinctive device token wins instead. Token distinctiveness is load-bearing: bareb"Truepic"mis-fires (it appears in unrelated trust chains -- it mis-attributed the OpenAIchatgpt-1.pngfixture), so the token is the specificb"Truepic_Lens"from the Lens SDK claim generator; likewiseb"Pixel Camera"(cert CN) not bareb"Pixel"._DEVICE_C2PA_PLATFORMlists ONLY tokens verified against a real C2PA file: Leica (lc_c2pa/Leica Camera), Nikon (NIKON), Pixel (Pixel Camera-- from a real Pixel 10 Pro file attached to c2pa-rs issue #1609/#1554), Sony (sony.sig/sony.cert-- Sony's own C2PA assertion namespace, verified on a real Sony PXW-Z300 file; NOT bare "Sony" which is a common EXIF Make), Truepic (Truepic_Lens). Canon/Bria have no public direct-download C2PA sample (checked exhaustively: GitHub issue/PR attachments, contentcredentials gallery, HF datasets -- all upload-to-verify or token-gated; Canon's only public file was a self-signed hobbyist CR3, not factory), so they stay unmapped until a real file is captured (same fixture discipline as Grok/Doubao). The Sony sample is video (MP4) -- our ISOBMFF C2PA path detects it; Sony Alpha stills likely share thesony.*namespace but are not separately verified. Samsung Galaxy + ASUS Gallery live in a separate_SIGNER_C2PA_PLATFORM(scanned after_device_platform, before the issuer fallback), NOT in_DEVICE_C2PA_PLATFORM— verified on real signed files 2026-05-29. 
Reason: a Galaxy phone stamps BOTH its device cert AND atrainedAlgorithmicMedia/genAIType AI marker on a Generative-Edit image, so treating it as a "genuine camera capture" would false-fire integrity-clash rule 2 on every Galaxy AI edit. The signer tokens (b"Samsung Galaxy"cert org — distinct from the EXIFSM-xxxxmodel string on ordinary Samsung photos;b"com.asus.gallery"claim generator) only resolve the platform label; the AI verdict still comes from the source-type / genAIType. ASUS Gallery is a C2PA-signed edit with no AI marker, so it attributes the platform without assertingis_ai. Samsung'sgenAIType(in the proprietaryPhotoEditor_Re_Edit_DataJSON) is an undocumented Galaxy-AI editing marker (metadata.samsung_genai, gated on thePhotoEditor_Re_Edit_Datacontainer; non-zero value = AI tool used, values {1,5} observed): medium-confidence because the field has no public spec (verified 2026-05-29: absent from C2PA spec + Samsung docs), but it co-occurred withtrainedAlgorithmicMediain 3/3 verified files that record a source-type and was the SOLE AI marker on a Galaxy S24 file that omits the source type. Camera C2PA marks capture authenticity, not AI (Pixel carriescomputationalCapture, nottrainedAlgorithmicMedia), so these never setis_ai-- that stays driven by digital-source-type.c2pa.cbor_text_after(now public) is best-effort for thegeneratordetail string only and can be None when the manifest keys itclaim_generator_info(Pixel). Issuer→generator mapping isis_ai-gated (_attribute_platform(issuers, is_ai=c2pa_is_ai)): a specific AI-generator platform is named only when the digital-source-type istrainedAlgorithmicMedia; on a non-AI source an issuer substring is treated as incidental (an "Adobe XMP" toolkit string in an unmapped Canon/Sony capture would otherwise mislabel it "Adobe Firefly"), so it degrades to the neutral "C2PA signer: X" label. 
Real Firefly/OpenAI/Google output carries the AI source-type, so it is unaffected (verified: chatgpt-1.png→OpenAI, firefly-1.png→Adobe Firefly still attribute)._attribute_platformdefaultsis_ai=Trueso the mapping stays unit-testable in isolation. Add capture-camera tokens to_DEVICE_C2PA_PLATFORM, editing-app/AI-device signer tokens to_SIGNER_C2PA_PLATFORM, generator/issuer platforms to_ISSUER_PLATFORM, not inline. For non-PNG containers (JPEG/WebP/AVIF/HEIF/JXL) the caBX parser returns nothing, so issuer (_issuers_in) and generator (_ai_tools_in, reusingC2PA_AI_TOOLS) are recovered by binary-scanning the first MB. EXIFSoftware/Make/Artist/ImageDescriptionand XMPCreatorToolgenerator tags are read bymetadata.exif_generator(PIL+piexif for any format PIL opens incl. AVIF, plus a container-agnostic XMP raw-byte scan that also covers HEIF/JXL), matched againstAI_GENERATOR_TOKENSso ordinary editors (plain "Adobe Photoshop") and real-cameraMake("Apple"/"Canon") are not flagged. Ideogram tags its output with EXIFMake="Ideogram AI"(verified on a real download 2026-05-24) — that's whyMakeis read. Integrity-clash detection (_integrity_clashes, surfaced asProvenanceReport.integrity_clashes, printed in red byidentifyand serialized to--json): contradictions between independent generator stamps are a laundering/spoofing tell. Two rules: (1) two or more distinct AI-origin vendors named by independent signals (e.g. C2PA OpenAI + EXIFMake="Ideogram AI"), and (2) a camera-capture C2PA device (_DEVICE_C2PA_PLATFORM) coexisting with any AI-generation marker. Vendor normalization is_vendor_ofover_AI_VENDOR_TOKENS(so a C2PA "Google (Gemini)" issuer and a SynthID-Google proxy agree, while different vendors clash). 
High-precision by design: only hard generator stamps feed it (C2PA-issuer when source is AI, SynthID, EXIF/XMP generator, IPTCAISystemUsed, xAI, AIGC); the fuzzy visible sparkle and the open invisible watermark are excluded (the latter can be a by-product of our own SDXL removal pass). The c2pa vendor is classified from the issuer attribution / generator, NOT the resolvedplatform(a camera label like "Google Pixel" would mis-normalize to "Google"). All real single-origin fixtures (chatgpt/firefly/doubao/grok/mj) verified to produce zero clashes (false-positive guard intest_identify.py::TestRealSamplesHaveNoClash).watermark_registry.py— single catalog of known visible watermarks, the unified "find known marks in their usual places, recognize, remove" entry. Reverse-alpha based by policy: a mark is listed only once a real alpha map has been captured for it, and removal inverts that map (original = (wm - a*logo)/(1-a)) — Gemini recovers cleanly with no inpaint (its sparkle alpha comes from a pure-black capture, so it is near-exact), while Doubao and Jimeng both add an always-on THIN residual inpaint over the glyph footprint (their text marks re-rasterize + jitter a few px per image, so a single capture cannot pixel-cancel them; the inpaint blends into the reverse-alpha-recovered pixels). Arbitrary-region inpainting still lives inregion_eraser/erase. EachKnownMarkties a key to {usuallocation,in_autoflag,recovery(="reverse-alpha"), adetectadapter → uniformMarkDetection, aremoveadapter}. Entries today:gemini(bottom-right sparkle),doubao(bottom-right "豆包AI生成"), andjimeng(bottom-right "★ 即梦AI").detect_marksscans all;best_auto_markpicks the highest-confidence detection. Cross-engine confidences aren't directly comparable, so the gemini adapter applies the corpus-validated 0.5 sparkle threshold (_GEMINI_AUTO_MIN_CONF) for itsdetectedflag — otherwise the gemini engine's loose internal threshold weakly fires (~0.36) on the Doubao text and hijacksauto. 
The shape-keyed Doubao/Jimeng NCC detectors don't cross-fire (jimeng scores ~0.22 on the Doubao strip, well under its 0.45 threshold), soautopicks the right one on a Doubao vs Jimeng image.cli.cmd_visibleis registry-driven:--mark auto→best_auto_mark,--mark <key>→ that mark;--markchoices come frommark_keys()._doubao_remove/_jimeng_removeapply reverse-alpha only when the mark is detected ANDreverse_alpha_available; outside that, removal is skipped (not inpainted). Add a new visible mark = oneKnownMarkentry + its engine (with a captured alpha map); do not re-add per-markifbranches in the CLI. Alpha-on-save policy (issue #30):cli._write_bgr_with_alpharejoins the input's alpha plane unchanged — it must NOT zero alpha in the watermark bbox. Reverse-alpha (anderaseinpaint) recover real pixels there, so zeroing alpha punched a transparent hole that renders as a solid white box on any non-transparent viewer (Gemini app exports are opaque RGBA, so every user hit it; regression-guarded bytest_visible_keeps_alpha_opaque_in_watermark_region). The registryremove()still returns its region (used forinpaint_residualpositioning), but the CLI no longer uses it to clear alpha.gemini_engine.py— visible Gemini-sparkle remover/detector (cv2/numpy, no GPU).detect_sparkle_confidence(path)is the file-level entry point used byidentify.py. The public entry points normalize a grayscale (2D) or RGBA (4-channel) input to BGR up front so a non-BGR image does not crash the cv2 pipeline. Removal is reverse-alpha with NO inpaint (remove_watermark→_reverse_alpha_blend): the sparkle alpha is computed (alpha = max(R,G,B)/255) from the bundled sparkle-on-black capturesassets/gemini_bg_{96,48}.png, which are PURE-BLACK so the alpha is near-exact — re-verified clean ondemo_banana_before.png2026-05-31 (the registry's optionalinpaint_residualis a no-op on a clean removal; an earlier "Gemini smears" read was a misjudged soft-fur original, not an artifact). 
The bg assets are now rebuilt from OUR OWN controlled captures (data/gemini_capture/captures/, committed) byscripts/visible_alpha_solve.py gemini, which locates the 96px sparkle on the black capture and crops it to the two logo sizes; our capture matched the previously third-party-sourcedgemini_bg_96.pngto NCC 0.9998, validating the asset and making it reproducible. Gemini's multi-size fixed-slot model is genuinely different from the Doubao/Jimeng text-strip engines (so it stays a separate engine, not part of the shared-base refactor).doubao_engine.py— visible Doubao "豆包AI生成" remover/detector (cv2/numpy, no GPU).DoubaoEngine.locateanchors a bottom-right box by geometry (mark scales with image WIDTH),extract_maskpulls the light, low-chroma glyphs (the detection candidate) using a per-pixel channel-spread proxysat = roi.max(axis=2) - roi.min(axis=2)(no HSV conversion).detectis shape-consistent: it matches the bundled alpha glyph silhouette (assets/doubao_alpha.png) against the candidate via zero-mean normalized correlation (_template_match_score, cv2TM_CCOEFF_NORMED), gated atDETECT_NCC_THRESHOLD0.4 over a smallDETECT_MIN_COVERAGEfloor. Keying on glyph SHAPE (not coverage heuristics) fixed #23 (corpus FP 7/1243). Removal = reverse-alpha + thin residual inpaint (remove_watermark_reverse_alpha):original = (wm - a*logo)/(1-a)from the bundled alpha map +_ALPHA_LOGO_BGR(pure white) +_ALPHA_*_FRACgeometry, then a deliberately THIN inpaint (_RESIDUAL_*,INPAINT_NS) over the glyph footprint clears leftover edges without smearing. Alpha is rebuilt byscripts/visible_alpha_solve.py(the careful gray-self solve: cubic background fit, mean over channels, full halo, unblurred), same recipe as Jimeng — the captures are committed indata/doubao_capture/captures/. 
Removal aligns ALWAYS (no_ALPHA_NATIVE_BANDfast-path): it tries fixed geometry AND_aligned_alpha_map'sTM_CCOEFF_NORMEDscale+position search and keeps the lower-residual one — the mark is re-rasterized and a few px off per image, so fixed geometry alone leaves a visible outline even at 2048. The locate box (WM_*) is generous (0.22 wide, margins 0.004) and reaches close to the corner — a tight box (the old 0.185 / margin 0.012) let a corner-ward shift fall OUTSIDE the alignment search, so the align missed and a readable outline survived; regression-guarded bytest_recovers_shifted_mark_on_texture(composes the alpha shifted on a known texture; old box ~29 vs new ~1 mean residual). Issue #13 follow-up defect (found 2026-05-31): the SHIPPED Doubao removal left a clearly READABLE "豆包AI生成" outline on the realdoubao-1.pngsample, whiledetectreturned conf 0.0 (it is fooled by a thin outline) sotest_reverse_alpha_removes_markpassed and the old "56/56 clean" claim was detector-measured, not visual. Root cause: bad alpha (under-estimated, max ~0.65) + fixed-no-inpaint + tight box; the careful rebuild + always-align + thin inpaint + wide box takes it from a readable outline to faint texture-level traces (parity with Jimeng — a single capture cannot pixel-cancel a per-image re-rasterized mark). Lesson: a detector-only removal test is insufficient; assert visual residual (the textured-shift test).extract_maskguards a degenerate ROI (bh < 16 or bw < 16-> empty mask, skips cv2): the always-align removal scores each placement with a residualdetect(out), and on an extremely wide/short image (e.g. 2048x1,test_wide_short_does_not_raise) that fed cv2's GaussianBlur a ~1-px-tall ROI and faulted natively on Windows py3.12 (access violation, non-deterministic — one CI cell went red while a re-run passed); the old at-native path never randetecton degenerate sizes. 
Real images always clear the guard (the `WM_*` box floors are `max(16, …)` height / `max(40, …)` width), so it only short-circuits slivers. `reverse_alpha_available` is just "asset present"; the registry gates removal on `detect`. The shipped third-party `_refs/zhengsuanfa_doubao_alpha_120x20.png` is NOT a usable alpha (verified 2026-05-29). Arbitrary-region inpainting is `region_eraser` / `erase`.
- `jimeng_engine.py`: visible Jimeng / Dreamina "★ 即梦AI" remover/detector (cv2/numpy, no GPU), built 2026-05-30 from issue #13's solid captures (@powersee). Mirrors `doubao_engine`: `locate` anchors a bottom-right box by geometry (scales with WIDTH), `extract_mask` pulls the light low-chroma glyphs (white top-hat + grayish + min-luma), `detect` matches the bundled "即梦AI" glyph silhouette (`assets/jimeng_alpha.png`) via `TM_CCOEFF_NORMED` over a coverage floor. Threshold `DETECT_NCC_THRESHOLD` 0.45 cleanly separates real Jimeng marks (>=0.81) from the Doubao strip (0.21) and other AI output (0.0), so the two ByteDance marks don't cross-fire in `--mark auto`. Logo is pure white (255,255,255) (`_ALPHA_LOGO_BGR`; the white capture + an L-pair-solve confirm 254.6); compositing is sRGB, not linear (a linear-light solve tripled the cross-residual). Alpha rebuilt by `scripts/visible_alpha_solve.py` from the GRAY capture (`data/jimeng_capture/captures/`, the solid captures now committed): `a = (I - B)/(255 - B)`, B a per-capture cubic background fit over the non-glyph pixels, **averaged over channels, full halo extent (down to alpha 0.02), unblurred**. Gray (bg ~132) is the deliberate choice over black: it is the best proxy for real content (the mark sits on bright photo areas, not on black), and the careful build drops the gray self-residual to ~1.3. The mask quality, not the method, was the earlier limit: a max-channel / quadratic-bg / blurred / halo-truncated build (and a black-dominated LS) left a visible outline (lesson from issue #13: when reverse-alpha leaves a ghost, suspect the captured alpha map before adding heuristics or switching method). 
Geometry emitted by the solver at `_ALPHA_NATIVE_WIDTH` 2048: `_ALPHA_WIDTH_FRAC` 0.202, `_ALPHA_HEIGHT_FRAC` 0.058, margins ~0.029. Removal = reverse-alpha + a deliberately THIN residual inpaint (`remove_watermark_reverse_alpha`, `_RESIDUAL_DILATE` 5 over the `_RESIDUAL_ALPHA_FLOOR` 0.05 footprint, `_RESIDUAL_INPAINT_RADIUS` 2, `INPAINT_NS`): a single 2048 alpha cannot pixel-cancel the mark re-rasterized at another resolution (alpha maps from independent captures correlate 0.998, not 1.0; off-native reverse-alpha alone only halves the mark), so a tight inpaint clears the residual edges WITHOUT the texture/edge smear a wide full-footprint pass caused. Placement ALWAYS tries fixed geometry AND `_aligned_alpha_map`'s NCC scale+position search, keeping the lower-residual: the mark re-rasterizes + jitters a few px per image even at the captured width, so fixed geometry alone misses (there is no `_ALPHA_NATIVE_BAND` fast-path; the scale search `_ALPHA_ALIGN_SEARCH` is fine-stepped, and the `WM_*` locate box is generous so a corner-ward shift stays inside the search, the same widen that fixed Doubao). Verified clean on the solid captures (native 2048; faint self-residual ~1.3 visible only on a dead-flat field, hidden by real texture) and a real 1440-wide Jimeng download (off-native, table edge preserved). `reverse_alpha_available` is just "asset present"; the registry gates on `detect`. No committed real sample (the real content download stays gitignored; only the solid calibration captures are committed): `tests/test_jimeng_engine.py` synthesizes a mark from the bundled alpha asset, and `test_recovers_shifted_mark_on_texture` guards the align-on-shift path that the Doubao defect exposed. 
Jimeng images are independently caught by the China TC260 AIGC label inmetadata/identify, so this engine is the visible-mark removal path, not a newidentifysignal.region_eraser.py— universal region eraser (eraseCLI).erase(image, boxes=|mask=, backend=)normalizes grayscale (2D) and RGBA (4-channel) inputs up front (erase_cv2splits off any alpha plane and re-attaches it on the result):boxes_to_mask→cv2.inpaint(cv2backend, default, no deps) or big-LaMa via onnxruntime (lamabackend, extralama,Carve/LaMa-ONNXApache-2.0 model downloaded on first use, never bundled).erase_lamacrops a padded region around the mask, runs LaMa at its fixed 512² input, pastes only masked pixels back (untouched areas stay pixel-exact). Lazy_get_lama_sessionsingleton;lama_available()guards the optional import. LaMa-ONNX costs ~3.5-4 GB peak RAM and ~5-6 s/call on CPU (FFC working set, not arena —enable_cpu_mem_arena=Falsedoes not help), so it does NOT fit a minimal droplet; the cv2 backend (tens of MB, ~30 ms) does. LaMa quality at low RAM = serverless/GPU, mirroring how raiw.cc offloads SDXL to fal.invisible_watermark.py—detect_invisible_watermark(path)decodes the OPEN DWT-DCT watermarks (public decoder, no key) embedded by Stable Diffusion / SDXL / FLUX via theimwatermarklibrary. Known fixed patterns (verified against upstream source) live in_BITS_48(SDXL 48-bit, FLUX.2 48-bit) and_SD1_STRING("StableDiffusionV1", SD 1.x/2.x). Optional dep (extradetect); returns None when absent. Thedetectextra pulls torch transitively (invisible-watermark declares torch a hard dep, andWatermarkDecodereagerly importsrivaGan->torchat import time), so detection needs torch present even though dwtDct runs CPU-only on cv2/numpy/pywavelets — no GPU and no separategpuextra required. Unlike SynthID this is locally detectable, but the watermark is fragile (does not survive JPEG re-encode/resize — verified gone after JPEG q90), so it confirms origin only on pristine files. Add new known patterns here. 
The file carries a top-of-module pyright pragma because imwatermark/cv2 ship no type stubs.trustmark_detector.py—detect_trustmark(path)decodes the OPEN, keyless Adobe TrustMark watermark (the soft binding behind Adobe Durable Content Credentials,algcom.adobe.trustmark.P) via the optionaltrustmarkpackage (extratrustmark; pulls torch, downloads model weights on first use). Mirrorsinvisible_watermark.py(lazy singleton guarded by a double-checkedthreading.Lockso concurrent callers do not double-download the weights, top-of-module pyright pragma, returns None when absent). It detects provenance, not AI origin as such (TrustMark also marks human-authored content), soidentifylists it as a watermark without settingis_ai_generated. Other soft-binding vendors (Digimarc/Imatag/Steg.AI/...) have no public decoder — they are only named via theC2PA_SOFT_BINDINGSscan, not decoded. False-positive gate (added 2026-05-29): TrustMark'swm_presentis a BCH error-correction validity flag that spuriously validates on a content-correlated fraction of un-watermarked images — AI-generated textures trip it far more than camera photos (verified 2026-05-29 on real files: it fires on Gemini/OpenAI/Doubao output that cannot carry Adobe's watermark, with a random-bytes decoded secret, while signal-free camera photos did not trip it). A genuine TrustMark is a durable soft binding engineered to survive re-encoding, sodetect_trustmarkre-decodes after a mild JPEG round-trip (_survives_reencode,_REENCODE_QUALITY95) and requires the same schema both times; every observed false positive collapsed (none survived even q95), so the gate is the durability property the watermark guarantees. The second decode runs only on the rare initial hit, so the cost is negligible. 
Do NOT remove the gate to "catch more" — a lone TrustMark hit without it is almost always content noise.text_protector.py— text-region protection for theinvisibleSDXL img2img pass (issue #21: CJK/small text deforms at watermark-removal strengths).is_available()gates oncv2.dnn.TextDetectionModel_DB;TextProtector.detect_text_boxes(bgr)runs the PP-OCRv3 DB ONNX detector (~2.4 MB, Apache-2.0, opencv_zoo, returns rotated quad polygons) — downloaded+cached to~/.cache/remove-ai-watermarkson first use via atomic temp-rename, never bundled, no torch (cv2.dnn only). Detection is script-agnostic (DB segments text regions, not characters), so Latin / Cyrillic / CJK / Hangul / Arabic / digits all detect identically — language was never the recall lever, resolution was._detection_input_size(h, w)(pure, unit-tested) detects at the native long side capped at_DET_MAX_LONG_SIDE(1536), never upscaled: the old fixed 736 downscaled large canvases so small text fell below the detector and was missed (issue #14, e.g. ~16 px text on a 2048 image).scripts/text_detection_benchmark.pymeasures recall across scripts × sizes × canvas: the cap fix lifts overall hit-rate 0.91 → 1.00 (worst cell 2048/16 px: 0.06 → 1.00) at ~100 ms CPU. Very large canvases with tiny text may still need tiling (documented limit, not built).build_change_map(boxes, h, w, preserve=0.9, feather=15)paints a Differential-Diffusion change map. Polarity (verified empirically): white(1.0)=PRESERVE original pixels, black(0.0)=MAX change; map is black bg +preserveinside text polygons, Gaussian-feathered edges, clipped to [0,1].preservestays below a hard 1.0 freeze by default so text still scrubs lightly (SynthID survives cropping). Default text protection iswatermark_remover._run_region_hires, NOT the differential change map. 
Differential Diffusion froze text in latent space (preserve < 1.0), so the watermark survived inside text, violating the "remove SynthID everywhere" requirement; and the SDXL VAE's 8px latent cell softens sub-8px strokes regardless of preserve (architectural limit, confirmed by the DD authors -- see docs/text-protection-research.md). _run_region_hires instead: (1) scrubs the whole image (plain img2img), (2) RE-scrubs each detected text block at HIGH resolution and feather-composites it back. merge_text_regions(boxes,h,w) groups boxes into local blocks; each crop is upscaled by _REGION_HIRES_SCALE 3.0 (applied as an integer factor via int(...), capped so a region stays under _REGION_MAX_MEGAPIXELS 1.3 to avoid OOM; skipped if it can't reach 2x -- very large text areas then fall back to the global scrub, tiling is the future fix), img2img-scrubbed, downscaled, phase-correlated back to the original crop to null the ~1-2px round-trip offset, then feather_pasted. The phase correlation matters because a sub-pixel shift garbles the composite even when text is crisp; integer scale alone did NOT fix it because the diffusion pipeline rounds dims to a multiple of 8. The shift is applied only on a confident, small correlation (response > 0.3 and |shift| < 4), so a spurious large offset on a flat crop no longer garbles the composite. After a CPU fallback the generator is dropped before the per-region passes to avoid an MPS-vs-CPU generator device mismatch. Every pixel is regenerated, which was meant to remove the watermark everywhere AND keep small text crisp (high-res strokes span >1 latent cell); the 2026-06-01 oracle check recorded later in this bullet shows SynthID can still survive in text regions. Validated on synthetic 18px multilingual text: text-region SSIM 0.28 (plain) → 0.48 (region-hires), visually garbled → readable across Latin/Cyrillic/CJK, residual shift ~0.5px. Gated to the SDXL DEFAULT_MODEL_ID + detector (_can_protect_text); no text → plain global scrub (text-free inputs pay only the cheap cv2 detection).
CLI opt-in--protect-textoninvisible/all(OFF by default — see SynthID bullet).merge_text_regions+feather_pasteare pure, unit-tested without a model (tests/test_text_protector.py). The high-res re-scrub can shield SynthID in text regions (verified 2026-06-01: same gpt-image, with--protect-text→ SynthID detected by oracle; without → SynthID removed). The mechanism: the global pass at step 1 removes SynthID everywhere, but the per-region high-res re-scrub at step 2 regenerates those pixels from a higher-resolution crop -- if the per-region strength is insufficient at the effective upscaled resolution, SynthID can reconstitute. Until this is resolved,protect_textis EXPERIMENTAL and OFF by default. The legacy_run_differential/build_change_map/_load_differential_pipeline(communitypipeline_stable_diffusion_xl_differential_img2img,custom_revision="0.38.0") remain in the file but are no longer the default; the diff pipeline upcasts the VAE to fp32 internally, so do not addupcast_vae()/enable_attention_slicingthere (NaN/black on fp16 MPS).build_change_mapis still unit-tested.face_protector.py— YOLO detect + soft-blend pattern; mirror this for any "protect region during diffusion" features. The expensive extract+blend already runs only when a face is found, but the YOLO detector itself always loads+runs to decide; CLI opt-in--protect-facesoninvisible/all(OFF by default, experimental). Face protection has an even more direct SynthID-preservation mechanism than text protection: it extracts face regions from the ORIGINAL (watermarked) image BEFORE the diffusion pass, then blends those original pixels BACK after the global pass. Those restored pixels are the unprocessed originals -- SynthID is guaranteed to survive in face regions (not just possibly, as with text re-scrub). Any image with faces processed with--protect-faceswill have SynthID intact in the face areas regardless of strength.humanizer.py— optional post-process "humanize" effects (cv2/numpy). 
The chromatic-shift step replicates the border instead of wrapping opposite-edge pixels, so a shifted channel no longer bleeds the far edge into the near one.
- image_io.py -- Unicode-safe cv2 IO (issue #17). imread(path, flags=None) / imwrite(path, img) wrap np.fromfile + cv2.imdecode / cv2.imencode + tofile so non-ASCII paths work on Windows -- bare cv2.imread / cv2.imwrite use the platform ANSI code-page API there and fail (empty decode + "can't open/read file") on Chinese/Cyrillic/accented filenames. imread keeps cv2.imread semantics (defaults to IMREAD_COLOR, returns None on missing/empty/undecodable). Every cv2 file read/write in the package routes through here; do not call cv2.imread / cv2.imwrite directly. imwrite returns False on an unwritable path (OSError caught) instead of raising, matching cv2.imwrite semantics. macOS/Linux already accept UTF-8 paths, so it is behavior-neutral there (the bug only reproduces on Windows). cv2/numpy are imported lazily inside the functions, so the module is cheap to import in a bare env.
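The two numeric gates in the text_protector bullet above (the capped integer upscale and the phase-correlation shift gate) can be sketched as pure functions. This is a minimal sketch, not the code in watermark_remover: the function names are hypothetical, the step-down loop is one assumed way to honor the megapixel cap, and reading "|shift| < 4" per axis is also an assumption.

```python
from typing import Optional, Tuple

REGION_HIRES_SCALE = 3.0      # value quoted in the notes (_REGION_HIRES_SCALE)
REGION_MAX_MEGAPIXELS = 1.3   # value quoted in the notes (_REGION_MAX_MEGAPIXELS)
MIN_USEFUL_SCALE = 2          # the notes skip a region that cannot reach 2x


def pick_region_scale(crop_w: int, crop_h: int) -> Optional[int]:
    """Hypothetical helper: largest integer upscale that fits the pixel budget.

    Returns None when even 2x would exceed the budget, in which case the
    region would fall back to the global scrub.
    """
    scale = int(REGION_HIRES_SCALE)  # applied as an integer factor
    budget = REGION_MAX_MEGAPIXELS * 1_000_000
    while scale >= MIN_USEFUL_SCALE and (crop_w * scale) * (crop_h * scale) > budget:
        scale -= 1
    return scale if scale >= MIN_USEFUL_SCALE else None


def accept_shift(dx: float, dy: float, response: float) -> Tuple[float, float]:
    """Hypothetical helper: keep a phase-correlation shift only when it is
    confident (response > 0.3) and small (each axis under 4 px, an assumed
    reading of |shift| < 4); otherwise apply no shift at all."""
    if response > 0.3 and abs(dx) < 4 and abs(dy) < 4:
        return (dx, dy)
    return (0.0, 0.0)
```

Keeping both decisions free of cv2 and torch means they can be unit-tested without a model, in the same spirit as merge_text_regions and feather_paste.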
Doubao clean-reverse-alpha distillation (re-investigated 2026-05-29)
RESOLVED 2026-05-29: black+gray Doubao captures were obtained and a reverse-alpha is built (doubao_engine.remove_watermark_reverse_alpha, assets/doubao_alpha.png; see the doubao_engine.py bullet above). The captures (data/doubao_capture/captures/, now committed) confirmed the alpha-composite model: on black captured = a*logo, logo pure white. UPDATE 2026-05-31 (issue #13 follow-up): the first build was NOT "exact" — it left a readable "豆包AI生成" outline on the real sample (the detector was fooled, conf 0.0). The alpha is now rebuilt by scripts/visible_alpha_solve.py (the careful gray-self solve shared with Jimeng), removal always-aligns + thin-inpaints, and the locate box was widened; see the doubao_engine.py bullet. The notes below (the failed content-image distillation) are retained as the record of why controlled captures were necessary.
Conclusion (historical): pure reverse-alpha distilled from content images does NOT work, and the blocker is the WRONG kind of data, not too little of it. The earlier framing ("need ~5-8 PRISTINE same-resolution originals") is obsolete -- data/spaces/originals/ holds plenty. Curate them with DoubaoEngine.detect + an NCC filter against a clean glyph template, keeping only marks at offset ≈ (0,0): that yields e.g. 15 pixel-aligned 2048² marks (sub-pixel drift, not the ±50 px the old lossy/mixed-res scrapes had), plus 1086x1448 / 1792x2400 clusters. With those, LaMa-clean O + weighted-LS (and per-pixel I-on-O regression) for α (+ logo colour) was tried end-to-end and still leaves a persistent ghost outline.
Diagnosed why, empirically (cached stacks, /tmp/doubao_distill): (1) the mark is a clean white overlay with no dark halo -- over glyph pixels ~54% are brighter than the clean bg, only ~4% darker -- so the white-logo model I=(1-α)O+α·255 is correct; (2) but content backgrounds are almost never dark under the mark (median darkest available bg over glyph pixels = 58/255; only ~13% of mark pixels are ever observed on a bg < 40), so on bright backgrounds the equation is ill-conditioned and α is unidentifiable; (3) LaMa's O is a plausible hallucination, not the true pre-mark background, which compounds the error, and per-pixel regression on ~15 obs overfits into colour noise.
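Point (2) can be made concrete with one line of arithmetic. Solving the white-logo model I = (1-α)O + α·255 for α gives α = (I - O)/(255 - O), so any error in I (or in the estimated O) is divided by 255 - O. The sketch below is illustrative only: the function names and the 2-level noise figure are assumptions, not values from the distillation run.

```python
def estimate_alpha(observed: float, background: float, logo: float = 255.0) -> float:
    """Invert I = (1 - a) * O + a * logo for a, given I and an estimate of O."""
    return (observed - background) / (logo - background)


def alpha_error(background: float, true_alpha: float = 0.5, noise: float = 2.0) -> float:
    """Absolute alpha error caused by `noise` grey levels of error in I."""
    clean = (1.0 - true_alpha) * background + true_alpha * 255.0
    return abs(estimate_alpha(clean + noise, background) - true_alpha)


if __name__ == "__main__":
    # Dark, typical (median darkest bg from the notes), and bright backgrounds.
    for bg in (20.0, 58.0, 240.0):
        print(f"bg={bg:5.0f}  alpha error={alpha_error(bg):.4f}")
```

The error grows as the background approaches white, which is why marks seen only over bright content cannot pin down α however many samples are stacked.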
Why Gemini's engine is clean (verified in GeminiWatermarkTool src/core/watermark_engine.cpp): its alpha map is the watermark stamped on a PURE-BLACK background, where watermarked = α·255 + (1-α)·0 = α·255, so alpha = capture/255 exactly -- no estimation. (gemini_bg_*.png is literally the sparkle in grey on black.) So the real Doubao unlock is the same controlled capture, not more content images. Black/white/gray seeds exist (data/doubao_capture/seeds/seed_*_1x1_2048x2048.png); a capture run (feed a black seed through doubao.com edit mode, download the original) was requested from the #13 reporter 2026-05-29. With ~2-3 black captures we get α = capture/255 for free, Gemini-quality.
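The black-capture identity and the removal step it enables, written out per pixel. This is a scalar sketch under the same pure-white-logo model; the function names are hypothetical and the real engines operate on whole arrays.

```python
def alpha_from_black_capture(capture: float) -> float:
    """On a pure-black background, watermarked = a * 255, so a = capture / 255."""
    return capture / 255.0


def reverse_alpha(watermarked: float, alpha: float, logo: float = 255.0) -> float:
    """Recover the original pixel O from I = (1 - a) * O + a * logo."""
    if alpha >= 1.0:
        raise ValueError("fully opaque mark: the original pixel is not recoverable")
    return (watermarked - alpha * logo) / (1.0 - alpha)


if __name__ == "__main__":
    a = alpha_from_black_capture(51.0)          # 0.2
    composite = (1.0 - a) * 100.0 + a * 255.0   # stamp the mark over O = 100
    print(reverse_alpha(composite, a))          # recovers O up to float rounding
```

No background estimate appears anywhere in the alpha step, which is the whole advantage of a controlled capture over content images.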
Until black captures arrive, the shipped direction is precise canonical glyph mask + inpaint (cv2 default, lama optional), NOT reverse-alpha. The consensus glyph silhouette across the aligned marks distills cleanly (proto: a tight "豆包AI生成" strip, width ≈ 0.156 × image-width) and is good both as an exact inpaint mask and as an NCC localiser -- the latter also fixes the #23 detector false-positives (match the real glyph shape, not any bright low-saturation corner). Do not retry content-image reverse-alpha: it is data-limited by physics (no dark-background observations), not by effort.
Watermarking landscape (research 2026-05-24)
Who embeds what, and whether it is locally detectable (so we know which gaps are fillable). See identify.py for what we read.
- Locally detectable (open decoder, no key/API): Stable Diffusion / SDXL / FLUX via imwatermark DWT-DCT (now covered by invisible_watermark.py). FLUX uses the same library (black-forest-labs/flux2 src/flux2/watermark.py, 48-bit 0b001010101111111010000111100111001111010100101110); SDXL is the diffusers WATERMARK_MESSAGE (0b101100111110110010010000011110111011000110011110). Caveat: fragile to re-encoding.
- C2PA / IPTC (covered by the issuer/marker scan): OpenAI, Google, Adobe Firefly, Microsoft (Designer + Bing Image Creator -- collected 2026-05-24; Bing now runs Microsoft's own MAI-Image model, signs C2PA as "Microsoft", NOT OpenAI/DALL-E), and Stability AI (collected from Brand Studio / DreamStudio successor; signs C2PA as "Stability AI Ltd", no SynthID, no imwatermark on its current Stable Image model -- issuer added to
C2PA_ISSUERS). Still unsampled: Canva (its downloads are re-encoded design exports that strip C2PA, so a Canva "positive" is inconclusive -- skipped), Getty, Shutterstock. Midjourney embeds NO C2PA and no invisible watermark (our mj-* sample carried only the IPTC tag). Samsung Galaxy AI (Generative Edit / Sketch to Image / Portrait Studio on Galaxy S23 FE / S24 / S25, One UI 7+) signs C2PA as "Samsung Galaxy" with the standard trainedAlgorithmicMedia source type AND a proprietary genAIType marker; verified on real signed files 2026-05-29 (the standard scan catches the source type; genAIType additionally catches a Galaxy S24 file that omits it). ASUS Gallery also signs edited photos as C2PA (com.asus.gallery) but with no AI source type -- a signer, not an AI marker. Black Forest Labs (FLUX) API output signs C2PA: claim_generator_info "Black Forest Labs API" + a c2pa.ai_generated_content assertion + trainedAlgorithmicMedia (issuer b"Black Forest Labs" added to C2PA_ISSUERS, platform "Black Forest Labs (FLUX)"). ByteDance Volcano Engine (Volcengine), the cloud behind Doubao / Jimeng, signs its AI image output with a cert from certificate_center@volcengine.com + trainedAlgorithmicMedia (issuer b"volcengine" → "ByteDance (Volcano Engine)", platform "ByteDance (Doubao / Jimeng / Volcano Engine)"); note this is the C2PA-signed surface, distinct from the XMP/PNG TC260 AIGC label Doubao also uses. All three verified on real signed files 2026-05-29.
- EXIF/XMP generator tag (caught by exif_generator): Ideogram writes EXIF Make="Ideogram AI" (collected 2026-05-24 -- no C2PA, no SynthID, no imwatermark; the Make tag is the only signal).
- xAI / Grok -- its own EXIF signature scheme, NOT C2PA (DETECTED by
metadata.xai_signature, built 2026-05-26). Grok JPEG downloads (Aurora model) carry no C2PA, no XMP, no SynthID, no IPTC -- only EXIF Artist = a UUID and EXIF ImageDescription = Signature: <base64> (a crypto signature, unverifiable locally without xAI's public key). This empirically kills the earlier unverified "xAI signs C2PA as xAI" lead -- xAI is not even a C2PA member. exif_generator misses it (neither field holds an AI_GENERATOR_TOKENS token), so a dedicated detector xai_signature(path) matches the pair (ImageDescription ~ ^Signature: [A-Za-z0-9+/=]{64,} AND UUID Artist); wired into has_ai_metadata, get_ai_metadata (key xai_signature), and identify (signal xai_signature, platform "xAI (Grok / Aurora)"). Format confirmed stable across n=3 genuine generations: exactly three EXIF tags (Artist, ExifOffset, ImageDescription), Signature: prefix constant, base64 payload 300-1004 chars. Two capture facts: (a) the Artist UUID equals the public image id in the asset URL (https://imagine-public.x.ai/imagine-public/images/<uuid>.jpg), so it is NOT a private per-user secret -- only the Signature blob is; (b) the Grok web-UI image is a re-encoded WebP with no signature -- the EXIF survives only in the original JPEG (download button or that public tokenless URL), which is why screenshots / re-encodes are metadata-stripped. A real fixture data/samples/grok-1.jpg plus synthetic JPEG fixtures (fake UUID + fake Signature: blob) cover the detector; never add a real Grok image carrying private content (the repo is public). Stripped on removal too: remove_ai_metadata now calls _scrub_ai_exif on the JPEG EXIF, which deletes the xAI Signature+UUID-Artist pair and any Software / Make / Artist / ImageDescription tag holding an AI_GENERATOR_TOKENS token (so Ideogram's Make="Ideogram AI" is scrubbed too), while keeping genuine camera/editor EXIF. The shared _is_xai_signature_pair helper (module-level compiled regexes) is the single source of truth for the pattern, used by both xai_signature and _scrub_ai_exif.
(AVIF/HEIF/JXL still strip only C2PA boxes via isobmff, not EXIF -- unchanged.)
- China TC260 AIGC label (caught by
AIGC_MARKERS/metadata.aigc_label, surfaced byidentifyas theaigcsignal): China-served generators embed an XMP<TC260:AIGC>{"Label":"1","ContentProducer":...}block — China's mandatory AI-content labeling (TC260 namespacetc260.org.cn/ns/AIGC). Doubao (ByteDance) uses it (verified on the real #13 sample 2026-05-25;ContentProducer001191110102MACQD9K64010000, no C2PA/SynthID/imwatermark — the XMP block is the only signal; GitHub attachment upload did NOT strip it). The same standard is mandatory for Jimeng/Kling/Qwen/Ernie etc., so the one marker covers the whole China-AIGC-labeled ecosystem.aigc_labelreads three serializations through a shared_parsehelper: the HTML-entity-encoded XMPTC260:AIGCblock in either RDF form — the nested element<TC260:AIGC>{...}</TC260:AIGC>(Doubao) or the attributeTC260:AIGC="{...}"(PicWish,ContentProducer="picwish", verified on the corpus 2026-05-30) — via a container-agnostic raw-byte scan (any JSON object accepted), a raw-JSON PNGAIGCtEXt chunk (Doubao also writes the label this way, no namespaced marker at all — confirmed on the corpus 2026-05-28,ContentProducer="doubao"), and a bare raw-JSON{"AIGC":{...}}object embedded in JPEG EXIF (UserComment) by some China-served generators, brace-matched from the scan head withjson.JSONDecoder().raw_decode(no namespaced marker, no PNG chunk — confirmed on the corpus 2026-05-30,ContentProducer="001191440300708461136T1308L"). Both generic forms (the PNG chunk and the bare{"AIGC":...}object) are gated on at least one TC260 field (_TC260_FIELDS) so a genericAIGCkey cannot false-positive; the namespaced XMP element is unambiguous and needs no gate. Inidentify,aigcfires on the parsed label or theAIGC_MARKERSbyte scan (the latter preserves the laundering-tell case where the JSON payload is truncated). - HuggingFace-hosted job (caught by
metadata.huggingface_job, surfaced by identify as the hf_job signal, MEDIUM confidence): HuggingFace Jobs / Spaces stamp generated PNGs with an hf-job-id tEXt chunk holding the job UUID (3 on the corpus 2026-05-28, no other signal). It marks the hosting job, not a model (most commonly diffusion output), so it lifts an Unknown verdict to a tentative AI via hf_only (parallel to the visible sparkle) but never overrides a hard metadata signal; _HF_JOB_CAVEAT states the limit (job, not model; not proof of AI pixels). Stripped on removal (the PNG save whitelist keeps only STANDARD_METADATA_KEYS, so hf-job-id and the AIGC chunk are both dropped). The exact writer is not authoritatively documented (HF Jobs are generic GPU jobs), hence medium not high.
- No detectable signal on download (correctly reported unknown): Recraft (PNG export is a re-encoded design export -- strips everything), Krea hosting FLUX 2 (no imwatermark despite FLUX -- the host omits the encoder, same as Stability's hosted SDXL), and Midjourney (embeds nothing). Lesson: the imwatermark detector only fires on pristine output from a pipeline that runs the encoder (diffusers default, official BFL), not from re-hosts (Krea/Stability) or re-encoded exports (Recraft/Canva).
- Invisible but NOT locally detectable (proprietary, API/oracle only -- same wall as SynthID): Amazon Titan Image Generator + Nova Canvas (Bedrock DetectGeneratedContent API), Kakao (new SynthID image adopter, May 2026), NVIDIA Cosmos (SynthID video). No local detector possible; treat like SynthID.
- C2PA 2.4 "Durable Content Credentials" (April 2026; verified against the spec) raise the bar for metadata stripping. 2.4 defines soft bindings (an invisible watermark or a content fingerprint) plus a server-side manifest repository and a new
c2pa.repository-receiptassertion. Per the spec: "if a C2PA manifest is removed from an asset, but a copy of that manifest remains in a provenance store elsewhere, the manifest and asset may be matched using available soft bindings." So our localmetadata --removedeletes the embedded manifest, but a fingerprint/watermark soft binding can still re-link the image to its manifest in a repository server-side. Stripping the file is becoming necessary-but-not-sufficient against durable provenance. (Our parsers target the stable embedded-manifest format documented in C2PA 2.1 §11; that format is unchanged in 2.4 -- the new pieces are repository/soft-binding infra, not the on-file box layout, so no parser change is implied.) Spec: https://spec.c2pa.org/specifications/specifications/2.4/specs/C2PA_Specification.html We now READ the soft-bindingalg(C2PA_SOFT_BINDINGS/soft_binding_vendors_in) to name the forensic-watermark vendor, and locally DECODE the one open scheme, Adobe TrustMark (trustmark_detector); the rest (Digimarc/Imatag/Steg.AI/...) stay name-only (proprietary decoders). - Built 2026-05-26 (this batch): soft-binding
algvendor detection; IPTC Photo Metadata 2025.1 AI-disclosure fields (AISystemUsedetc.); video C2PA metadata detect + strip for MP4/MOV/M4V (free —isobmff.pyis format-agnostic, MP4 is ISOBMFF); Adobe TrustMark open decoder. NOT done (out of cheap reach, per the feasibility review): visible video-logo removal (needs a video frame pipeline) and audio (SynthID/ElevenLabs/Resemble/Suno all oracle-only or unmarked). Box detection window — now handled (v0.6.8): detection no longer relies on a fixed first-MB read.metadata.scan_head(path, size)reads the firstsizebytes and, for ISOBMFF, appends the payloads of late provenance boxes found byisobmff.scan_c2pa_region(a file-seeking top-level box walker that skips pastmdatby size without reading it), so a C2PA/AIGC/IPTC manifest placed AFTER a largemdatin a streaming/non-faststart MP4 is now caught. Every C2PA/marker byte scan (has_ai_metadata,aigc_label,iptc_ai_system,synthid_source,exif_generatorXMP,get_ai_metadatasoft-binding, andidentify) goes throughscan_head; it is behavior-neutral for non-ISOBMFF inputs (exactlyf.read(size)). Meta-box XMP removal — now handled (v0.6.9): an AI-label XMP packet stored as a meta-boxmimeitem (HEIF/AVIF; out of reach of the top-level box stripper) is blanked in place byisobmff.blank_ai_xmp_packets— it locates the packet by its<?xpacket begin … end?>delimiters and, if it carries an AI marker (_AI_LABEL_MARKERS), overwrites it with spaces of the SAME length, so box sizes /ilocoffsets stay valid and the coded image is untouched (selective: plain non-AI XMP is left alone, mirroring the top-level uuid logic). Wired intoremove_ai_metadata's ISOBMFF branch afterstrip_c2pa_boxes. The remaining gap is anExifmeta-box item (rare; the AI labels are XMP) — still needsiinf/ilocsurgery or exiftool. - Regulatory driver (context, not a code change): AI-content labeling mandates are expanding, which pushes more generators toward exactly the C2PA + watermark signals we read. 
The full per-jurisdiction table lives in README "## Legal" -- keep it there, not duplicated here. Newly added + primary-source verified 2026-05-26: EU AI Act Article 50 machine-readable marking applicable 2026-08-02 (verified against the article text); South Korea AI Framework Act Art. 31(3) in force since 22 January 2026 (verified via Kim & Chang + FPF/Korea Times; Enforcement Decree accepts an invisible-watermark label); California AB 853 (amends the CA AI Transparency Act) latent-disclosure duty operative 2026-08-02, requiring a disclosure "permanent or extraordinarily difficult to remove" (verified against the leginfo bill text -- this is the exact disclosure our tool strips); India IT Amendment Rules 2026 in force 2026-02-20 (verified via Chambers), which prominently-label + permanent-provenance-id all synthetic media AND expressly prohibit removing/suppressing the label or metadata -- the first major all-content removal ban outside China. Removal liability (README "## Legal" disclaimer): the tool is lawful general-purpose software; liability sits with the remover and is intent-gated -- downstream acts (fraud/deception/IP), plus US DMCA 17 USC 1202 (removing copyright-management info to conceal infringement), plus the removal-as-such bans in China + India. When extending the README table, verify each date/article against the statute/bill text before committing, not against search summaries.
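Two of the metadata signals in the list above reduce to small string checks: the xAI Signature + UUID-Artist pair, and the bare {"AIGC": {...}} object brace-matched with json.JSONDecoder().raw_decode. The sketch below is a hedged illustration, not the code in metadata.py: the function names are hypothetical, the UUID regex assumes the canonical 8-4-4-4-12 hex form, and TC260_FIELDS is an assumed subset built from the two field names quoted in the notes.

```python
import json
import re
from typing import Optional

# Signature pattern as quoted in the notes; the UUID shape is an assumption.
SIGNATURE_RE = re.compile(r"^Signature: [A-Za-z0-9+/=]{64,}")
UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)

# Assumed subset of the real _TC260_FIELDS gate.
TC260_FIELDS = frozenset({"Label", "ContentProducer"})


def is_xai_signature_pair(artist: Optional[str], description: Optional[str]) -> bool:
    """True only when BOTH fields match: a UUID Artist and a Signature: blob."""
    if not artist or not description:
        return False
    return bool(UUID_RE.match(artist.strip()) and SIGNATURE_RE.match(description))


def find_bare_aigc_label(text: str) -> Optional[dict]:
    """Brace-match a raw {"AIGC": {...}} object anywhere in `text`.

    raw_decode parses one JSON value starting at the given index and ignores
    whatever follows it, which is what makes it usable on a scan buffer.
    The result is gated on at least one TC260 field so a generic AIGC key
    does not false-positive.
    """
    decoder = json.JSONDecoder()
    start = text.find('{"AIGC"')
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            label = obj.get("AIGC")
            if isinstance(label, dict) and TC260_FIELDS.intersection(label):
                return label
        start = text.find('{"AIGC"', start + 1)
    return None
```

Requiring the pair (not either field alone) is what keeps an ordinary photographer's Artist tag from tripping the xAI check.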
Known limitations
-
invisiblepipeline processes at native resolution by default (max_resolution=0), matching the hosted raiw.cc backend (fal fast-sdxl, no pre-downscale). The old forced downscale-to-1024 -> upscale-back round-trip was the main quality loss (issue #10) and is gone; at strength ~0.05 SDXL img2img does not need the ~1024 downscale.--max-resolution Nre-introduces an opt-in long-side cap purely to bound GPU/MPS memory on very large inputs (it reintroduces the lossy round-trip). For huge images that OOM at native, tile-based diffusion is still the proper long-term fix. Concrete MPS data points (the OOM is memory-tier-dependent, NOT a hard MPS limit): on a ~24 GB unified-memory machine (verified 2026-05-25, 1254x1254 gpt-image SDXL, fp32) native res OOMs at the UNet step (peak ~17 GiB), not only the VAE decode, and the auto-fallback inimg2img_runnerreloads on CPU and finishes (slow, ~13 min) -- the output is still weight-identical and defeats SynthID, so "looks hung/crashed" on Mac is usually this CPU fallback, not a pipeline error. On a 32 GB unified-memory machine the same default SDXL pass runs entirely on MPS with no CPU fallback (verified 2026-05-31, 1122x1402 gpt-image,all/default, ~155 s end-to-end), so 32 GB clears the native-res UNet peak that 24 GB could not. Addingenable_vae_tiling()alone does NOT prevent the 24 GB OOM (the peak is the UNet, not the VAE). The fast Mac workarounds for memory-constrained machines are fp16 on MPS (roughly halves memory) or--max-resolutionto cap the long side; neither is wired as the default. ctrlregen is compute-bound, not memory-bound, on MPS: the clean-noise profile tiles to 512px (e.g. 
~12 tiles for a 1122x1402 image) and runs the full step count per tile (strength 1.0 -> ~50 effective steps/tile), so on a base-tier Apple-GPU laptop it is ~25-30 min/image after the one-time DINOv2-giant download, regardless of having 32 GB -- a discrete CUDA GPU (fp16, no tiling) is the right place for ctrlregen, while the default SDXL pass is comfortable on a 32 GB Mac. The native-vs-downscale decision lives in the pure helperinvisible_engine._target_size(w, h, max_resolution)(returnsNonefor native, a clamped target tuple otherwise) so it is unit-tested (tests/test_invisible_engine.py::TestTargetSize, the #10/#15 regression guard) without loading the model -- keep that logic in the helper, don't re-inline it. -
fp16 VAE black-output fix (issue #29, 2026-05-30): on a CUDA/XPU fp16 backend the stock SDXL VAE overflows to NaN and the plain img2img path decodes to an all-black image (reproduced on the raiw.cc result: a 1086x1448 input -> a uniformly black 4.6 KB PNG, mean 0). watermark_remover._load_pipeline now swaps in the fp16-fixed SDXL VAE (madebyollin/sdxl-vae-fp16-fix = _SDXL_FP16_VAE_ID) when _needs_fp16_vae_fix(model_id, DEFAULT_MODEL_ID, is_fp16) is true -- only the default SDXL checkpoint on fp16. cpu/mps run fp32 (the stock VAE is fine there, which is why the bug never reproduces on Mac), and the differential / region-hires pipeline already upcasts the VAE itself (see the text_protector bullet). A custom non-SDXL model_id keeps its own VAE (the fp16-fix VAE is SDXL-architecture-specific). The decision is a pure helper, unit-tested without a download (tests/test_platform.py::TestFp16VaeFix); the actual black->clean recovery needs a CUDA GPU and was NOT verifiable on this MPS machine -- confirm on the backend / an NVIDIA box.
-
Pyright first run is slow (2-3 min) due to ML deps (torch/diffusers/transformers stubs); full-project uv run pyright can stall for many minutes -- scope it to changed files.
- ultralytics monkey-patches PIL.Image.open and tries to autoload pi_heif. When pi_heif is missing, opening files raises ModuleNotFoundError, not UnidentifiedImageError. Code that opens user-supplied or unknown-format files should except Exception, not just OSError / UnidentifiedImageError.
-
rich was dropped (CLI + scripts print plain text via click.echo). cli.py renders through small _Console / _Table / _Progress shims; the analysis scripts (scripts/synthid_corpus.py, synthid_pixel_probe.py, text_detection_benchmark.py, corpus_gap_scan.py) import Console / Table from the shared scripts/_plain_console.py shim (markup like [bold] / [/] is stripped, tables render aligned). Consequences: (1) rich is NOT a dependency, so anything that imports it breaks a clean uv sync --frozen (CI installs core+dev only) -- this exact gap red-failed CI after the refactor when those 4 scripts still imported rich; if you add a script, use the _plain_console shim, not rich. (2) The old [gpu]-bracket-eaten bug (#19) is gone -- plain click.echo prints pip install 'remove-ai-watermarks[gpu]' verbatim, no escaping needed (regression-guarded by tests/test_cli.py::TestGpuHintMarkup). (3) No Unicode glyphs / colors / progress bars in CLI output by design.
-
Metadata detection for AVIF/HEIF/JPEG-XL relies on a binary scan for C2PA_UUID + IPTC_AI_MARKERS, plus EXIF Software / XMP CreatorTool generator tags via metadata.exif_generator (validated with synthesized AVIF/JPEG fixtures + an XMP raw-scan fixture).
C2PA removal in those containers is implemented via noai/isobmff.py (top-level uuid/jumb box stripper, no re-encoding), which now also drops a top-level XMP uuid box that carries an AI label (matched by AI-marker content, not by the XMP UUID, so byte-order-robust) and covers MP4/MOV/M4V/M4A by content sniff.
Non-ISOBMFF audio/video removal is via ffmpeg (_FFMPEG_STRIP_EXTS -> _strip_with_ffmpeg): WebM/Matroska (EBML), MP3 (ID3), WAV/FLAC/OGG (RIFF/Vorbis) are stripped losslessly with ffmpeg -map_metadata -1 -map_chapters -1 -c copy (codec data untouched). Requires ffmpeg on PATH; raises RuntimeError if absent or if ffmpeg can't parse the file. Verified end-to-end (a real ffmpeg-made WAV/MP3 with a title=Suno AI tag -> tag gone, audio bytes preserved).
Meta-box XMP now handled (isobmff.blank_ai_xmp_packets, v0.6.9): an AI-label XMP packet stored as a meta-box mime item (AVIF/HEIF) is blanked in place (overwritten with spaces of the same length, so iloc offsets and the coded image stay valid).
Still NOT built: an Exif item inside the meta box (rare -- AI labels are XMP) needs full iinf/iloc surgery (offset rewrite) with corruption risk -- exiftool (R/W/C for HEIC/AVIF EXIF+XMP, verified on exiftool.org 2026-05-27) would do it but is a non-installed binary dep, so it stays a documented gap.
Audio watermark DETECTION (Resemble PerTh) was evaluated and NOT built (2026-05-26): resemble-perth's PerthImplicitWatermarker.get_watermark() returns a raw bit-array with no presence/confidence flag (clean audio decodes to arbitrary bits too), so reliably distinguishing watermarked-from-clean needs either Resemble's fixed payload or a confidence API -- neither is public, and there's no real Resemble sample to calibrate against. Same wall-class as the SynthID pixel detector: the decode exists, reliable presence-detection does not. (perth's top-level PerthImplicitWatermarker is also gated to None unless librosa is importable.)
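
The two removal primitives described above (the lossless ffmpeg strip and the same-length in-place blanking) can be sketched as follows. This is a minimal illustration, not the code in noai/isobmff.py or the real _strip_with_ffmpeg: the function names build_ffmpeg_strip_cmd, strip_with_ffmpeg and blank_span are hypothetical, the -y overwrite flag is an added assumption, and locating the XMP packet's byte range is assumed to happen elsewhere.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_strip_cmd(src: Path, dst: Path) -> list[str]:
    """Build an ffmpeg command that drops metadata and chapters without re-encoding."""
    return [
        "ffmpeg", "-y", "-i", str(src),   # -y (overwrite output) is an assumption
        "-map_metadata", "-1",            # drop container/stream metadata tags
        "-map_chapters", "-1",            # drop chapter markers
        "-c", "copy",                     # stream copy: codec data is not re-encoded
        str(dst),
    ]


def strip_with_ffmpeg(src: Path, dst: Path) -> None:
    """Hypothetical wrapper: RuntimeError if ffmpeg is missing or cannot process src."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    result = subprocess.run(build_ffmpeg_strip_cmd(src, dst), capture_output=True)
    if result.returncode != 0:
        raise RuntimeError(f"ffmpeg could not process {src}")


def blank_span(data: bytes, start: int, length: int) -> bytes:
    """Overwrite data[start:start+length] with ASCII spaces, keeping the total size.

    Because the byte count is unchanged, offsets that point past the span
    (such as iloc entries in an ISOBMFF file) remain valid.
    """
    if start < 0 or length < 0 or start + length > len(data):
        raise ValueError("span out of range")
    return data[:start] + b" " * length + data[start + length:]
```

Blanking instead of deleting is the design point: removing the bytes would shift every later offset and force the iinf/iloc rewrite that the notes flag as a corruption risk.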
- SynthID technical reference: docs/synthid.md -- primary-source-cited doc covering mechanism (post-hoc encoder/decoder pair, 136-bit payload at 512x512, pixel-space, model weights NOT modified), robustness numbers (arXiv:2510.09263: ~99.98% TPR@0.1%FPR across 30 transforms including JPEG/crop/resize/color/noise), removal attacks and forensic detectability (arXiv:2605.09203: all 6 attacks detectable at >98% TPR@1%FPR), detectability limits (no public decoder, metadata-proxy only), oracle scope, and adoption landscape. Read that doc first before adding notes here.
- SynthID detection is metadata-only. There is no reliable local detector of the SynthID pixel watermark: Google's decoder is proprietary, with no public spec or API (only a waitlisted portal). Authoritative confirmation: Google DeepMind's own paper "SynthID-Image: Image watermarking at internet scale" (Gowal et al., arXiv:2510.09263, https://arxiv.org/abs/2510.09263) states the verification service is restricted to "trusted testers" and does not release detector weights or a reproducible algorithm -- so a local pixel detector is infeasible by design, not just unbuilt. We detect SynthID by its C2PA companion (synthid_source / SYNTHID_C2PA_ISSUERS), which is reliable while the manifest is intact but says nothing once C2PA is stripped. Surface-dependent blind spot (verified 2026-05-24): the same Google model emits different metadata per surface -- the Gemini app wraps outputs in Google C2PA, but the API/playground (AI Studio, Nano Banana / gemini-2.5-flash-image) emits the SynthID pixel watermark (confirmed via the Gemini-app oracle) + the visible sparkle but no C2PA/IPTC at all, so synthid_source returns None despite SynthID being present. Only the pixel oracle or the visible-sparkle detector catches those. (Meta AI is another surface mismatch: it writes the IPTC digitalSourceType=trainedAlgorithmicMedia marker, not C2PA and not SynthID.) Google→SynthID is long-standing; OpenAI→SynthID is confirmed by OpenAI's Help Center (ChatGPT/Codex/API "include both C2PA metadata and SynthID watermarks", updated 2026-05-21) but time-gated (pre-rollout OpenAI images carry C2PA without SynthID), so the OpenAI verdict is hedged "likely". Oracles: Gemini app "Verify with SynthID" (Google), openai.com/verify (OpenAI). Each vendor's oracle detects only its OWN content (verified on the page 2026-05-31): openai.com/research/verify states verbatim "OpenAI generation signals will only be detected if the image was generated with our tools" and "Content could also still be AI-generated by another company's model, which the tool currently does not detect" -- SynthID is shared tech but the verifier is keyed to its own vendor's payload, so a Google-SynthID image reads clean on OpenAI's verifier and vice-versa. This explains the recurring "oracle says clean but identify still flags SynthID" report (#14): the oracle reads the pixel watermark (gone after our SDXL pass), while identify reads the C2PA-metadata proxy (still present if the manifest survived).
Different signals, not a contradiction -- strip the metadata too (metadata --remove / all) and the proxy goes quiet, but a quiet proxy is not proof the pixel watermark is gone. SynthID is durable to JPEG re-encode by design, so a GitHub-recompressed issue attachment is still a valid SynthID test subject (verified 2026-06-01 on issue #14's pic3: the GitHub-served JPEG survived re-encoding and openai.com/verify still detected SynthID). Do NOT dismiss issue-attachment JPEGs as "not faithful originals" when reproducing a SynthID-survival report: the recompression strips the C2PA metadata (so identify reads Unknown on the attachment) but NOT the pixel watermark that openai.com/verify reads. A true byte-original only matters for the metadata/C2PA path, not for the pixel-SynthID-removal test. (Contrast the open imwatermark above, which IS fragile to JPEG.) The spectral phase-coherence approach from github.com/aloshdenny/reverse-SynthID was evaluated (May 2026) and does not work for real-content detection: on its own shipped codebook + validation set, watermarked and cleaned images were indistinguishable (conf within noise, cleaned often higher); it only fires on pure-black 1024x1024 reference images at exact resolution (the controlled case it was calibrated on). The README's "90% / conf=0.91" reproduces only in that lab condition. Do not build a production detector on it; if revisited, it is experimental/diagnostic only and needs a per-resolution, per-model reference corpus. A from-scratch gpt-image pilot (2026-05-24) confirmed this independently: 5 independent solid-black gpt-image outputs share a near-identical fixed signature (pairwise residual correlation 0.92, avg-template retains 97% energy), so the watermark/carrier IS strongly present and consistent on flat content -- but the carrier frequencies extracted from it do NOT discriminate real content (carrier-to-random ratio: cleaned 1.86 > watermarked 1.53; a non-gpt-image image scored highest at 3.67).
The signature drowns in content texture. Net: a perfectly consistent solid-color signature still yields no real-content pixel detector with magnitude/carrier methods. A corpus discrimination test (2026-05-24, scripts/synthid_pixel_probe.py, raw zero-mean residual NCC) independently re-confirms this: at matched resolution, SynthID positives do NOT cluster apart from negatives (within-Gemini 0.07; at 1024 px pos-vs-neg >= pos-vs-pos). The only high correlations were near-duplicate content (5 ChatGPT renders of one prompt at ~0.92, while a distinct ChatGPT image scored ~0 against them) -- content, not a carrier. The probe is solid-fills-only and EXPERIMENTAL/DIAGNOSTIC; do not use it on real content. Correction (deeper re-examination 2026-05-25): the carrier IS real on solid fills -- the earlier "no carrier" was a method artifact of using spatial / FFT-magnitude NCC, which can't see it. The carrier is a fixed phase at specific low frequencies, so the right metric is per-bin phase coherence. On 8 white gemini-2.5-flash-image fills (generated via the reverse-SynthID trick: identity-edit prompt "Recreate this image exactly as it is" on a synthetic pure-white PNG -- this bypasses the recitation block that rejects text prompts for pure colors), phase coherence at the white carriers (0,±7..±12,±20..±23) = 0.86 vs 0.31 random; single-image leave-one-out phase-match +0.83 vs real photos -0.24. (Black 2.5-flash fills clip to std≈0 -- SynthID can't push values below 0, so no carrier in black; the repo's dark carriers come from nano-banana-pro.) But it does not generalize: (a) carriers are model-version + resolution + color specific -- the repo's v4 codebook (built for gemini-3.1-flash-image-preview + nano-banana-pro-preview) scores ~0.527 on my 2.5-flash white fills, indistinguishable from negatives (~0.50), i.e. carriers shift across model versions and need a per-model codebook; (b) on real content (30 2.5-flash images) the carrier collapses -- set phase coherence at carriers 0.37 ≈ random 0.42, and the repo's v4 detector gives content 0.518 ≈ negatives 0.504 (no separation; a faint +0.24 single-image lean is likely a brightness confound). Net: the spectral/phase approach is a real controlled-fill characterizer, NOT an arbitrary-real-content detector, and is brittle to model version. Metadata proxy + visible sparkle + online oracles remain the ceiling for real content.
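
The notes above name per-bin phase coherence as the metric but do not define it. One common definition, assumed here, is the mean resultant length of the unit phasors taken from the same frequency bin across a set of images: 1.0 when every image has the same phase at that bin, near 0 when phases are scattered. The sketch below uses a 1-D signal and a naive single-bin DFT to stay dependency-free; the real probe works on 2-D image residuals, and the names phase_coherence and bin_phase are hypothetical.

```python
import cmath
import math


def bin_phase(samples: list[float], k: int) -> float:
    """Phase (radians) of DFT bin k of a 1-D real signal, via a naive single-bin DFT."""
    n = len(samples)
    coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
    return cmath.phase(coeff)


def phase_coherence(phases: list[float]) -> float:
    """Mean resultant length of unit phasors: 1.0 = identical phase, ~0 = scattered."""
    if not phases:
        raise ValueError("need at least one phase")
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(resultant)


def carrier(n: int, k: int, phi: float, amp: float) -> list[float]:
    """Synthetic stand-in for a fixed-phase carrier at bin k (illustration only)."""
    return [amp * math.cos(2 * math.pi * k * i / n + phi) for i in range(n)]


# Three "images" sharing a carrier phase but differing in amplitude are fully coherent.
shared = [bin_phase(carrier(64, 7, 0.5, a), 7) for a in (0.5, 1.0, 2.0)]
coherent_score = phase_coherence(shared)

# Two signals with opposite phase at the same bin cancel out.
opposed = [bin_phase(carrier(64, 7, p, 1.0), 7) for p in (0.0, math.pi)]
scattered_score = phase_coherence(opposed)
```

Because the metric discards magnitude, it can register a weak but phase-locked carrier that a magnitude-based correlation would miss, which matches the method-artifact explanation given above.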
- External AI-vs-real classifier models are out of scope (decided 2026-05-24). Generic HuggingFace detectors (Organika/sdxl-detector Swin Transformer, umm-maybe/AI-image-detector, and fine-tunes) exist and report ~0.98 on their own SDXL-vs-real validation sets, but they are per-generator and the model cards themselves note degraded accuracy off-distribution; they are untested on gpt-image / Gemini Nano Banana (the metadata-stripped surfaces we care about), and our own light SDXL pass would likely defeat them the same way it defeats SynthID. Detection here stays local + signal-based (metadata + visible sparkle); do not add a bundled classifier dependency.
- SynthID v2 vs default pipeline: CORRECTION (2026-05-31, oracle-verified GPU study, SUPERSEDES the 0.10 claim below): the current Gemini SynthID survives 0.10/0.15/0.2 and is REMOVED only at strength 0.3 (Modal A100, native res, Gemini-app "Verify with SynthID", n=3 FRESH Gemini images, protect_text/faces OFF; 0.2 still present, 0.3 removed). DEFAULT_STRENGTH was raised 0.10 -> 0.30 to match. The "0.10 removes it" finding below was n=1 and is now stale -- the threshold has climbed 0.05 -> 0.10 -> ~0.3 as Google hardens SynthID, so re-test against fresh Gemini periodically (moving target). 0.3 costs SSIM ~0.97 vs original (modest) but softens dense/fine typography, and is overkill for non-SynthID sources (OpenAI/ChatGPT carry C2PA, NOT Google SynthID -- their pixels read negative on the Gemini verifier at every strength, so 0.10 is plenty there; a per-source strength via identify was considered and deferred in favour of the simpler single 0.3 default). protect_text is OFF by default (CORRECTED 2026-06-01, supersedes earlier A/B findings): verified on a gpt-image at 1600x1600 (issue #14, June 2026 oracle study): same image, with --protect-text → SynthID detected by openai.com/verify; without → SynthID removed. The 2026-05-31 A/B finding ("protect_text does not block removal") was Gemini-SynthID-only and did not generalize to OpenAI gpt-image at 1600x1600. Mechanism: the global pass removes SynthID everywhere, but the per-region hires re-scrub regenerates those pixels from an upscaled crop -- at that effective resolution the per-region pass may be insufficient to re-destroy the payload. Both protect_text and protect_faces are now EXPERIMENTAL, opt-in (--protect-text / --protect-faces), OFF by default. Oracle scope (load-bearing): the Gemini app "Verify with SynthID" is the ONLY valid SynthID oracle (it detects Google's mark on any image); openai.com/verify is scoped to OpenAI provenance (its own C2PA) and is NOT a SynthID oracle, so a negative there is meaningless for SynthID and the older "OpenAI cleared at 0.05 on openai.com/verify" notes below are about provenance, not a pixel-SynthID measurement. CORRECTION (2026-05-30): strength 0.05 does NOT remove the CURRENT Google SynthID (Nano Banana / Gemini 3).
Re-verified via the Gemini "Verify with SynthID" oracle on a real image: at 0.05 SynthID is still detected; at 0.10 it is removed (OpenAI's SynthID was already cleared at 0.05). So the default strength was raised 0.05 -> 0.10 (DEFAULT_STRENGTH in watermark_profiles.py; CLI --strength defaults to 0.10), and that higher strength is exactly why text protection (_run_region_hires) runs by default (text deforms more at 0.10). Caveat: n=1 Google + n=1 OpenAI image so far -- broad oracle validation across the corpus is pending (different images may need a different strength). Resolution dependence confirmed by a user report (#14, qw1212ss, 2026-05-31): on 1600x1600 gpt-image outputs checked via openai.com/verify, 0.05 left SynthID detected on 7/8 images, while small images (376x429) cleared at ~100% -- so "OpenAI cleared at 0.05" was a low-resolution result; a larger canvas carries a stronger watermark and needs more strength. Policy (do NOT chase a single magic number or build resolution/vendor-adaptive defaults): 0.10 is the default because it is what clears the watermark today; if the oracle still reads SynthID, the guidance is simply to raise --strength (0.12, then 0.15), using the lowest value that verifies clean. There is no local SynthID detector, so the tool cannot self-check and auto-tune; both vendors tighten the watermark over time, so any fixed value is a moving target. README "Removing SynthID" documents the strength-ladder guidance for users. The original claim below (0.05 defeats SynthID v2) held for the specific May-2026 Gemini output tested then but is stale for current Google SynthID. Verified end-to-end (May 2026): local SDXL run on a Gemini 3 Pro output, checked via the Gemini app's "Verify with SynthID" feature, returned "no SynthID watermark detected".
Also confirmed against OpenAI's SynthID (2026-05-23): a fresh ChatGPT/gpt-image output read "SynthID detected" on openai.com/verify before the local SDXL run and "SynthID not detected" after (corpus regression chain: pos 4ef377bd -> cleaned 47188e88). The same configuration is used in raiw-app production (fal-ai/fast-sdxl/image-to-image, strength 0.05, steps 50, guidance 7.5, no pre-downscale). fal's own llms.txt for fast-sdxl names the base checkpoint as stabilityai/stable-diffusion-xl-base-1.0 (verified 2026-05-25) -- the exact checkpoint the local CLI defaults to (DEFAULT_MODEL_ID). So the local invisible default is weight-for-weight identical to prod; "fast-sdxl" is fal's optimized serving, not different weights. After the native-resolution fix the local pipeline matches prod on weights + strength + steps + guidance + resolution. SD-1.5 dreamshaper at 768 px was previously the default and does NOT defeat v2 -- verified empirically against the same feature (strength 0.04, 0.10, and elastic warp α∈{5,8} all flagged positive). That SD-1.5 path was removed; only default (SDXL) and ctrlregen profiles remain. Scope of the claim: defeating the SynthID verifier is NOT the same as forensic invisibility. "Removing the Watermark Is Not Enough: Forensic Stealth in Generative-AI Watermark Removal" (arXiv:2605.09203, 2026-05) shows that six removal attacks across four families (UnMarker, CtrlRegen+, WatermarkAttacker, etc.) all leave forensic traces: independent detectors flag removal-processed images vs genuinely-clean ones at >98% TPR at 1% FPR. So our SDXL pass makes the oracle read "SynthID not detected," but the output can still be classifiable as "an image that went through a removal pipeline." Do not over-claim "indistinguishable from a real photo." https://arxiv.org/abs/2605.09203
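
The strength-ladder guidance above (try increasing strengths and keep the lowest one that verifies clean) can be sketched as below. Since there is no local SynthID detector, the check itself has to be a manual oracle lookup; here it is modelled as a caller-supplied callback. The helper names ladder_commands and lowest_clean_strength and the out_s*.png output naming are hypothetical, and the ladder values are whatever the caller passes in, not a recommended default.

```python
import shlex
from typing import Callable, Iterable, Optional


def ladder_commands(image: str, strengths: Iterable[float]) -> list[str]:
    """Return one CLI invocation per strength, lowest first (output names are made up)."""
    commands = []
    for s in sorted(set(strengths)):
        commands.append(
            f"uv run remove-ai-watermarks all {shlex.quote(image)} "
            f"-o out_s{s:.2f}.png --strength {s:.2f}"
        )
    return commands


def lowest_clean_strength(
    strengths: Iterable[float],
    verifies_clean: Callable[[float], bool],
) -> Optional[float]:
    """Walk the ladder upward and return the first strength the oracle reports clean.

    verifies_clean stands in for a manual check of that strength's output against
    the vendor oracle; it returns True when no SynthID is detected.
    """
    for s in sorted(set(strengths)):
        if verifies_clean(s):
            return s
    return None  # nothing on the ladder cleared the watermark


for line in ladder_commands("input.png", [0.10, 0.12, 0.15]):
    print(line)
```

Stopping at the first clean value keeps image degradation as low as possible, which is the reason the notes prefer a ladder over a single high setting.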
- CtrlRegen profile uses a clean-noise default strength, NOT the SDXL 0.10 (fixed 2026-05-31). CORRECTION (2026-05-31, same oracle-verified GPU study): ctrlregen at its clean-noise strength DESTROYS real images -- smooth/background regions fill with hallucinated micro-text garbage; the pipeline is binary (low strength = no-op, high = destroy, no usable middle) and heavy (~8.5 min / ~$0.30 vs ~25 s / ~$0.02 for SDXL). So the literature's "clean-noise is the lever" (detailed below) did NOT survive empirical testing on real content. ctrlregen is now flagged EXPERIMENTAL (CLI --pipeline help, README, and the watermark_profiles comment) and is NOT for production -- SDXL img2img at ~0.3 is the shippable path. The clean-noise-default plumbing below is kept (so the profile at least does real work if anyone opts in), but do not recommend ctrlregen. --pipeline ctrlregen no longer inherits the SDXL img2img --strength default. resolve_strength(strength, profile) (watermark_profiles.py, pure + unit-tested in test_platform.py::TestResolveStrength) resolves an unset --strength to CTRLREGEN_DEFAULT_STRENGTH (1.0) for ctrlregen and DEFAULT_STRENGTH (0.10) for the SDXL default; an explicit --strength always wins (including 0.0 -- the resolver checks is None, not falsiness, so it does not repeat the old strength or DEFAULT bug). CLI --strength for invisible/all now defaults to None (batch already did); the display (cli.py) and the engine (watermark_remover.remove_watermark) both route through resolve_strength so they never disagree. Why (deep-research pass 2026-05-31, primary sources): CtrlRegen's removal power comes from regenerating from (near) clean Gaussian noise, not the light partial-noise img2img the SDXL pass uses. CtrlRegen (ICLR 2025, arXiv:2410.05470) diagnoses verbatim that prior partial-noise regeneration "struggles with high-perturbation watermarks" because a small noise step "retains" watermark info that diffuses back into the output; the fix is a clean-noise start, which with StableDiffusionControlNetImg2ImgPipeline maps to strength ~1.0 (image structure held by the canny ControlNet + DINOv2 IP-Adapter, not by the watermarked latent). Before the fix --pipeline ctrlregen ran at 0.10 -- a near-identity pass that loaded ControlNet + DINOv2-giant and then barely changed the image (a removal no-op).
NOT yet oracle-verified that clean-noise ctrlregen clears the stubborn high-texture gpt-image class that 0.20 SDXL img2img could not (issue #14, qw1212ss: pic3/6/7 survived SynthID through 0.05->0.20); that is the pending controlled test (via openai.com/verify with the IP-country rate-limit bypass). Forensic-stealth caveat applies harder here: regeneration-family removal is the MOST detectable as "an image that went through a removal pipeline" (CtrlRegen+ 99.97% TPR@1%FPR, arXiv:2605.09203). Two #14-investigation hypotheses the literature did NOT confirm: (1) our "VAE round-trip drives removal, denoising strength does not" framing is only PARTIALLY supported -- arXiv:2510.09263 confirms SynthID was hardened against weak VAE re-generation (explaining survival) but does not name the VAE round-trip as the removal vector; (2) our "survival correlates with high-frequency CONTENT texture (Laplacian 466 vs 236)" is unconfirmed by any primary source -- the literature establishes watermark-perturbation-strength dependence (a different axis), so the texture correlation stays our own unverified observation, not a literature-backed fact.
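
The unset-strength resolution described for resolve_strength in this item can be illustrated with a small stand-alone sketch. It is not the code in watermark_profiles.py: the constant values are the ones quoted in this item (other notes in this file cite a later raise of DEFAULT_STRENGTH), the profile names "default" and "ctrlregen" are taken from the text, and legacy_resolve is a hypothetical reconstruction of the old strength-or-default bug.

```python
from typing import Optional

# Values as quoted in this item; treat them as illustrative, not authoritative.
DEFAULT_STRENGTH = 0.10
CTRLREGEN_DEFAULT_STRENGTH = 1.0


def resolve_strength(strength: Optional[float], profile: str) -> float:
    """Return the explicit strength if one was given, else the profile's default."""
    if strength is not None:      # identity check, so an explicit 0.0 is honoured
        return strength
    if profile == "ctrlregen":
        return CTRLREGEN_DEFAULT_STRENGTH
    return DEFAULT_STRENGTH


def legacy_resolve(strength: Optional[float]) -> float:
    """Hypothetical version of the old bug: falsy 0.0 is replaced by the default."""
    return strength or DEFAULT_STRENGTH


print(resolve_strength(None, "ctrlregen"))   # unset flag, ctrlregen profile
print(resolve_strength(None, "default"))     # unset flag, SDXL profile
print(resolve_strength(0.0, "default"))      # explicit zero is kept
print(legacy_resolve(0.0))                   # explicit zero is lost
```

Routing both the CLI display and the engine through one pure function like this is what keeps the printed strength and the applied strength from diverging.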