Remove-AI-Watermarks
You are a principal Python engineer maintaining a CLI tool and library for removing visible and invisible AI watermarks from images.
How to run
- `uv run remove-ai-watermarks all <image.png> -o <output.png>`
- `uv run remove-ai-watermarks visible <image.png> -o <out.png>` -- visible-mark removal, CPU, no GPU. `--mark auto` (default) routes between the Gemini sparkle and the Doubao "豆包AI生成" text strip by detector confidence; `--mark gemini` / `--mark doubao` force one.
- `uv run remove-ai-watermarks erase <image.png> --region x,y,w,h -o <out.png>` -- universal region eraser (any logo/object, any position). `--backend cv2` (default, no deps) or `--backend lama` (big-LaMa via onnxruntime, extra `lama`); `--region` is repeatable.
- `uv run remove-ai-watermarks identify <image>` -- provenance verdict (platform + watermark inventory + confidence); `--json` for machine output, `--no-visible` to skip the cv2 sparkle detector
- `uv run remove-ai-watermarks metadata <image.png> --check` -- inspect AI metadata (C2PA, EXIF, PNG chunks)
- `uv run remove-ai-watermarks metadata <image.png> --remove -o <out.png>` -- strip all AI metadata
Test and lint
- CI (`.github/workflows/test.yml`): runs on push to `main` + every PR. A `lint` job (ubuntu: `ruff check` + `ruff format --check`) plus a `test` matrix (ubuntu/macos/windows x py3.10/3.12) that does `uv sync --frozen --extra dev` then `pytest`. The matrix installs only core + dev (no `gpu` extra), so the GPU/model-running tests skip there and it exercises the metadata/identify/visible/cv2-eraser surface on all three OSes. Keep `uv.lock` valid (don't break `--frozen`) when editing `pyproject.toml`. `publish.yml` stays release-only.
- `bash maintain.sh` -- uv-outdated, uv-secure, ruff check/fix, ruff format, pyright, pytest -n auto
- Strict pyright is clean across `src/` (0 errors). The cv2/torch/diffusers boundary files (`gemini_engine`, `region_eraser`, `doubao_engine`, `face_protector`, `humanizer`, `invisible_engine`, `noai/watermark_remover`, and the whole `noai/ctrlregen/` subpackage) carry a documented per-file `# pyright:` relax pragma (or, for `ctrlregen`, a `tool.pyright.executionEnvironments` entry) that turns off only the unknown-type / untyped-third-party rules -- those libs ship no usable types, so strict typing there fights the ecosystem. Pure-logic files stay fully strict; `typings/piexif/__init__.pyi` is a local stub so `metadata.py` / `extractor.py` resolve piexif. Public ndarray-returning signatures on the relaxed engines are still annotated `NDArray[Any]` so strict consumers (`cli.py`) stay clean. When touching a relaxed file, prefer fixing real issues over widening the pragma; keep the pragma scoped to genuinely-untyped boundaries. (`uv-secure` is clean since idna was bumped 3.11 -> 3.16, fixing GHSA-65pc-fj4g-8rjx.)
- Full-project `uv run pyright` (no path) OOMs/crashes node on this ML-heavy repo (emits a `libnode` stack frame, no summary) -- a known environment limit, not a code error. Gate with `uv run --extra dev --extra gpu pyright src/` (completes, authoritative) or scope to changed files; also run `uv run ruff check` and `uv run pytest` directly.
- Run `uv run` from the repo root -- from another cwd it falls back to a bare env without numpy/cv2/torch.
- To add a dev tool (pytest/ruff/pyright) into the env, use `uv sync --frozen --extra dev --extra gpu`, never `uv pip install` -- `uv pip install` re-resolves and rewrites `uv.lock`, which silently bumped `transformers` to a build incompatible with the pinned `diffusers` (`cannot import name 'Qwen3VLForConditionalGeneration'`) and broke every `identify` / `metadata` import. Recovery: `git checkout uv.lock && uv sync --frozen --extra gpu --extra dev`. The `gpu` extra holds `diffusers` / `transformers` / `torch`, so a bare `uv sync` (no extras) removes them and `noai/__init__` (eager pipeline import) then fails. `maintain.sh`'s `uv sync --all-extras` also pulls the heavy `trustmark` / `lama` wheels (pytorch-lightning, onnxruntime) -- fine on a good connection, but on flaky DNS sync only `--extra gpu --extra dev` and run the lint/test steps by hand.
- Metadata/C2PA tests assert against real committed fixtures in `data/samples/` (`chatgpt-*.png` = OpenAI C2PA, `firefly-1.png` = Adobe, `mj-*` = Midjourney IPTC, `doubao-1.png` = ByteDance Doubao with the China TC260 `<TC260:AIGC>` XMP label and a visible "豆包AI生成" text mark bottom-right; `grok-1.jpg` = xAI Grok with its EXIF-only `Signature:` blob + UUID `Artist` and no C2PA/SynthID/IPTC); synthetic byte blobs cover the JPEG/ISOBMFF format paths. The "non-AI / clean photo" control is no longer in `data/samples/` -- the `clean_photo` conftest fixture serves a verified-negative image from the corpus `neg/` set (skips if the corpus is absent).
- SynthID reference corpus: `scripts/synthid_corpus.py` ingests labeled images into `data/synthid_corpus/`. The labeled `images/` (`pos/neg/cleaned/`) are committed (public repo -- review every image for private content before adding; `manifest.csv` is kept in sync with the files on disk, one row per tracked image); only the synthetic `refs/` calibration fills are gitignored. See its README for the collection protocol and verification oracles.
Configuration
- GPU/ML modules (invisible_engine, ctrlregen, watermark_remover) are optional -- guard imports with `is_available()` checks
- Optional detection extras: `detect` (imwatermark -- open SD/SDXL/FLUX watermark) and `trustmark` (Adobe TrustMark decoder; pulls torch + downloads weights). Both are guarded by `is_available()` and skipped by `identify` when absent.
- Tests for the model-running paths are limited to availability checks (multi-GB downloads). But the pure helpers inside ML-adjacent modules are unit-tested without any download and must stay that way: `_target_size` (native-vs-downscale, `test_invisible_engine.py`), the MPS->CPU fallback control flow via mocked pipelines (`test_img2img_runner.py`, 100% cover), and the tiling math `tile_positions` / `make_blend_weight` / `resize_center_crop` (`test_tiling.py`; `pytest.importorskip("torch")` since `tiling.py` imports torch at module top). Don't skip these as "ML, needs a model" -- only `run_tiled` / `remove_watermark` / the diffusion bodies do.
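
The optional-dependency guard described in this section can be sketched as below. This is a minimal illustration, not the repo's implementation: `module_available` and `detect_or_none` are hypothetical names, and the real `is_available()` helpers may check more than importability (the text-protection one, for instance, is described as gating on a specific cv2.dnn class).

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module could be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def detect_or_none(path: str, backend_module: str = "imwatermark"):
    """Hypothetical caller: skip an optional detector when its extra is absent."""
    if not module_available(backend_module):
        return None  # mirrors the "returns None when absent" contract
    # The heavy import stays inside the guarded branch, so importing this
    # file in a bare environment never fails.
    module = importlib.import_module(backend_module)
    return module.__name__  # placeholder for a real detection result
```

Keeping the import inside the guarded branch is what lets availability-only tests run on the core + dev CI matrix without the heavy extras installed.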
Key modules
- `noai/c2pa.py` -- PNG chunk parser; use `extract_c2pa_chunk(path)` to get raw caBX payload, `has_c2pa_metadata(path)` to detect. Do not reimplement chunk parsing. `extract_c2pa_info(path)` sets `synthid_watermark` / `synthid_vendors` when the manifest is signed by a SynthID-using vendor, and `soft_binding` / `soft_binding_vendors` when a `c2pa.soft-binding` `alg` names a forensic-watermark vendor (`soft_binding_vendors_in(buffer)` is the shared byte-scan, used by both the PNG parser and the non-PNG binary path).
- `noai/constants.py` -- PNG_SIGNATURE, C2PA_CHUNK_TYPE, C2PA_SIGNATURES, C2PA_ISSUERS, `SYNTHID_C2PA_ISSUERS` (issuers that pair SynthID with C2PA: Google, OpenAI), and `C2PA_SOFT_BINDINGS` (soft-binding `alg` prefix → forensic-watermark vendor: Adobe TrustMark, Digimarc, Imatag, Steg.AI, Microsoft, ...). Add a new issuer/binding here, not inline.
- `metadata.py` -- `scan_head(path, size=1MB)` is the shared input for every C2PA/AIGC/IPTC byte scan: first `size` bytes plus, for ISOBMFF, the late provenance-box payloads from `isobmff.scan_c2pa_region` (catches a manifest after a large `mdat`); behavior-neutral (`f.read(size)`) for non-ISOBMFF. Use it instead of `open().read(1MB)` for any new marker scan. `synthid_source(path)` returns the vendor name(s) if the C2PA manifest implies a SynthID pixel watermark, else None. Format-agnostic: PNG via the caBX parser, JPEG/WebP/AVIF/HEIF/JXL via a binary scan (C2PA marker + SynthID issuer + AI-source marker). `get_ai_metadata` surfaces the verdict, and `metadata --check` prints it as a callout.
  Both `get_ai_metadata` and `has_ai_metadata` guard the PIL open with `except Exception` (HEIC/unknown formats raise non-OSError) and fall through to the binary scan. `xai_signature(path)` detects xAI/Grok's EXIF-only scheme (`ImageDescription` = `Signature: <base64>` + UUID `Artist`); it feeds `has_ai_metadata`, `get_ai_metadata` (key `xai_signature`), and `identify`. `iptc_ai_system(path)` detects the IPTC Photo Metadata 2025.1 AI-disclosure XMP properties (`IPTC_AI_FIELD_MARKERS` = `AISystemUsed` / `AISystemVersionUsed` / `AIPromptInformation` / `AIPromptWriterName`) and returns the `AISystemUsed` generator name (or `"fields present"`). `remove_ai_metadata` routes ISOBMFF video (`.mp4` / `.mov` / `.m4v`) through the same `isobmff.strip_c2pa_boxes` as AVIF/HEIF (MP4 is ISOBMFF), and `_scrub_ai_exif` removes the xAI signature + AI-generator EXIF tags on JPEG output.
- `identify.py` -- `identify(path)` aggregates every locally-readable signal (C2PA issuer→platform, C2PA soft-binding forensic-watermark vendor, IPTC "Made with AI" + IPTC 2025.1 `AISystemUsed`, embedded SD/ComfyUI params, SynthID proxy, xAI/Grok EXIF signature via `metadata.xai_signature`, the China TC260 AIGC label via `metadata.aigc_label`, the HuggingFace `hf-job-id` job marker via `metadata.huggingface_job`, visible Gemini sparkle, open invisible watermark, Adobe TrustMark via `trustmark_detector`) into one `ProvenanceReport`. `is_ai_generated` is True or None (never asserted False -- stripped metadata is not proof of clean origin). The `hf_job` and visible-sparkle signals are medium confidence: each lifts an otherwise-Unknown verdict to a tentative AI (`hf_only` / `visible_only`, parallel branches) but is excluded from the high-confidence `ai_from_metadata` set, so neither overrides a hard metadata signal. Visible-sparkle is promoted only at confidence ≥ `_SPARKLE_THRESHOLD` (0.5; corpus-tuned to separate Gemini sparkles ≥0.56 from non-sparkle ≤0.49). The cv2 dependency lives in `gemini_engine.detect_sparkle_confidence`, not here.
  C2PA platform attribution is device-token-first, issuer-scan fallback (`_device_platform` scans manifest bytes for `_DEVICE_C2PA_PLATFORM` tokens, then `_attribute_platform` / `_ISSUER_PLATFORM`). Why, verified on real signed files 2026-05-26: the old issuer-only byte-scan matched ANY issuer substring anywhere, so multi-entity manifests mis-attributed -- Leica→"Truepic" (a signing authority in the trust chain), Nikon→"Adobe Firefly" (XMP-toolkit "Adobe" + the sample's "Adobe_MAX" name), Pixel→"Google (Gemini)" ("Google LLC" cert org), Truepic→"Google". A distinctive device token wins instead. Token distinctiveness is load-bearing: bare `b"Truepic"` mis-fires (it appears in unrelated trust chains -- it mis-attributed the OpenAI `chatgpt-1.png` fixture), so the token is the specific `b"Truepic_Lens"` from the Lens SDK claim generator; likewise `b"Pixel Camera"` (cert CN) not bare `b"Pixel"`. `_DEVICE_C2PA_PLATFORM` lists ONLY tokens verified against a real C2PA file: Leica (`lc_c2pa` / `Leica Camera`), Nikon (`NIKON`), Pixel (`Pixel Camera` -- from a real Pixel 10 Pro file attached to c2pa-rs issue #1609/#1554), Sony (`sony.sig` / `sony.cert` -- Sony's own C2PA assertion namespace, verified on a real Sony PXW-Z300 file; NOT bare "Sony" which is a common EXIF Make), Truepic (`Truepic_Lens`). Canon/Samsung/Bria have no public direct-download C2PA sample (checked exhaustively: GitHub issue/PR attachments, contentcredentials gallery, HF datasets -- all upload-to-verify or token-gated; Canon's only public file was a self-signed hobbyist CR3, not factory), so they stay unmapped until a real file is captured (same fixture discipline as Grok/Doubao). The Sony sample is video (MP4) -- our ISOBMFF C2PA path detects it; Sony Alpha stills likely share the `sony.*` namespace but are not separately verified.
  Camera C2PA marks capture authenticity, not AI (Pixel carries `computationalCapture`, not `trainedAlgorithmicMedia`), so these never set `is_ai` -- that stays driven by digital-source-type. `c2pa.cbor_text_after` (now public) is best-effort for the `generator` detail string only and can be None when the manifest keys it `claim_generator_info` (Pixel). Issuer→generator mapping is `is_ai`-gated (`_attribute_platform(issuers, is_ai=c2pa_is_ai)`): a specific AI-generator platform is named only when the digital-source-type is `trainedAlgorithmicMedia`; on a non-AI source an issuer substring is treated as incidental (an "Adobe XMP" toolkit string in an unmapped Canon/Sony capture would otherwise mislabel it "Adobe Firefly"), so it degrades to the neutral "C2PA signer: X" label. Real Firefly/OpenAI/Google output carries the AI source-type, so it is unaffected (verified: chatgpt-1.png→OpenAI, firefly-1.png→Adobe Firefly still attribute). `_attribute_platform` defaults `is_ai=True` so the mapping stays unit-testable in isolation. Add device tokens to `_DEVICE_C2PA_PLATFORM`, generator/issuer platforms to `_ISSUER_PLATFORM`, not inline. For non-PNG containers (JPEG/WebP/AVIF/HEIF/JXL) the caBX parser returns nothing, so issuer (`_issuers_in`) and generator (`_ai_tools_in`, reusing `C2PA_AI_TOOLS`) are recovered by binary-scanning the first MB. EXIF `Software` / `Make` / `Artist` / `ImageDescription` and XMP `CreatorTool` generator tags are read by `metadata.exif_generator` (PIL+piexif for any format PIL opens incl. AVIF, plus a container-agnostic XMP raw-byte scan that also covers HEIF/JXL), matched against `AI_GENERATOR_TOKENS` so ordinary editors (plain "Adobe Photoshop") and real-camera `Make` ("Apple"/"Canon") are not flagged. Ideogram tags its output with EXIF `Make="Ideogram AI"` (verified on a real download 2026-05-24) -- that's why `Make` is read.
  Integrity-clash detection (`_integrity_clashes`, surfaced as `ProvenanceReport.integrity_clashes`, printed in red by `identify` and serialized to `--json`): contradictions between independent generator stamps are a laundering/spoofing tell. Two rules: (1) two or more distinct AI-origin vendors named by independent signals (e.g. C2PA OpenAI + EXIF `Make="Ideogram AI"`), and (2) a camera-capture C2PA device (`_DEVICE_C2PA_PLATFORM`) coexisting with any AI-generation marker. Vendor normalization is `_vendor_of` over `_AI_VENDOR_TOKENS` (so a C2PA "Google (Gemini)" issuer and a SynthID-Google proxy agree, while different vendors clash). High-precision by design: only hard generator stamps feed it (C2PA-issuer when source is AI, SynthID, EXIF/XMP generator, IPTC `AISystemUsed`, xAI, AIGC); the fuzzy visible sparkle and the open invisible watermark are excluded (the latter can be a by-product of our own SDXL removal pass). The c2pa vendor is classified from the issuer attribution / generator, NOT the resolved `platform` (a camera label like "Google Pixel" would mis-normalize to "Google").
  All real single-origin fixtures (chatgpt/firefly/doubao/grok/mj) verified to produce zero clashes (false-positive guard in `test_identify.py::TestRealSamplesHaveNoClash`).
- `gemini_engine.py` -- visible Gemini-sparkle remover/detector (cv2/numpy, no GPU). `detect_sparkle_confidence(path)` is the file-level entry point used by `identify.py`.
- `doubao_engine.py` -- visible Doubao "豆包AI生成" remover/detector (cv2/numpy, no GPU). `DoubaoEngine.locate` anchors a bottom-right box by geometry (mark scales with image WIDTH, fractions in module constants; no bundled template), `extract_mask` pulls the light low-saturation glyphs with a polarity-aware white top-hat (brighter-than-blurred-local-bg, so white-paper documents are left untouched instead of smeared), `detect` thresholds glyph coverage (`DETECT_MIN_COVERAGE` 0.16 separates real marks ≥0.20 from corner noise, which stays ≤0.06 on large images but can spike to ~0.15 on tiny ones), `remove_watermark` inpaints (cv2 Telea/NS) and bails when coverage > `MAX_INPAINT_COVERAGE` 0.50 (dense-text background → would smear). Wired into `visible --mark` via `cli._run_doubao_if_selected`. Logo is near-white (~253), not the gray some third-party tools assume. Best on photo/illustration backgrounds; high-contrast edges leave faint residue (cv2-inpaint limit). Clean per-pixel reverse-alpha (Gemini-style) needs a black-background capture (alpha = capture/255), not more content images -- content-image distillation was tried and fails; see "Doubao clean-reverse-alpha distillation" below.
- `region_eraser.py` -- universal region eraser (`erase` CLI). `erase(image, boxes=|mask=, backend=)`: `boxes_to_mask` → `cv2.inpaint` (`cv2` backend, default, no deps) or big-LaMa via onnxruntime (`lama` backend, extra `lama`, `Carve/LaMa-ONNX` Apache-2.0 model downloaded on first use, never bundled). `erase_lama` crops a padded region around the mask, runs LaMa at its fixed 512² input, pastes only masked pixels back (untouched areas stay pixel-exact). Lazy `_get_lama_session` singleton; `lama_available()` guards the optional import.
  LaMa-ONNX costs ~3.5-4 GB peak RAM and ~5-6 s/call on CPU (FFC working set, not arena -- `enable_cpu_mem_arena=False` does not help), so it does NOT fit a minimal droplet; the cv2 backend (tens of MB, ~30 ms) does. LaMa quality at low RAM = serverless/GPU, mirroring how raiw.cc offloads SDXL to fal.
- `invisible_watermark.py` -- `detect_invisible_watermark(path)` decodes the OPEN DWT-DCT watermarks (public decoder, no key) embedded by Stable Diffusion / SDXL / FLUX via the `imwatermark` library. Known fixed patterns (verified against upstream source) live in `_BITS_48` (SDXL 48-bit, FLUX.2 48-bit) and `_SD1_STRING` ("StableDiffusionV1", SD 1.x/2.x). Optional dep (extra `detect`); returns None when absent. The `detect` extra pulls torch transitively (invisible-watermark declares torch a hard dep, and `WatermarkDecoder` eagerly imports `rivaGan` -> `torch` at import time), so detection needs torch present even though dwtDct runs CPU-only on cv2/numpy/pywavelets -- no GPU and no separate `gpu` extra required. Unlike SynthID this is locally detectable, but the watermark is fragile (does not survive JPEG re-encode/resize -- verified gone after JPEG q90), so it confirms origin only on pristine files. Add new known patterns here. The file carries a top-of-module pyright pragma because imwatermark/cv2 ship no type stubs.
- `trustmark_detector.py` -- `detect_trustmark(path)` decodes the OPEN, keyless Adobe TrustMark watermark (the soft binding behind Adobe Durable Content Credentials, `alg` `com.adobe.trustmark.P`) via the optional `trustmark` package (extra `trustmark`; pulls torch, downloads model weights on first use). Mirrors `invisible_watermark.py` (lazy singleton, top-of-module pyright pragma, returns None when absent). It detects provenance, not AI origin as such (TrustMark also marks human-authored content), so `identify` lists it as a watermark without setting `is_ai_generated`. Other soft-binding vendors (Digimarc/Imatag/Steg.AI/...)
  have no public decoder -- they are only named via the `C2PA_SOFT_BINDINGS` scan, not decoded.
- `text_protector.py` -- text-region protection for the `invisible` SDXL img2img pass (issue #21: CJK/small text deforms at watermark-removal strengths). `is_available()` gates on `cv2.dnn.TextDetectionModel_DB`; `TextProtector.detect_text_boxes(bgr)` runs the PP-OCRv3 DB ONNX detector (~2.4 MB, Apache-2.0, opencv_zoo, returns rotated quad polygons) -- downloaded+cached to `~/.cache/remove-ai-watermarks` on first use via atomic temp-rename, never bundled, no torch (cv2.dnn only). Detection is script-agnostic (DB segments text regions, not characters), so Latin / Cyrillic / CJK / Hangul / Arabic / digits all detect identically -- language was never the recall lever, resolution was. `_detection_input_size(h, w)` (pure, unit-tested) detects at the native long side capped at `_DET_MAX_LONG_SIDE` (1536), never upscaled: the old fixed 736 downscaled large canvases so small text fell below the detector and was missed (issue #14, e.g. ~16 px text on a 2048 image). `scripts/text_detection_benchmark.py` measures recall across scripts × sizes × canvas: the cap fix lifts overall hit-rate 0.91 → 1.00 (worst cell 2048/16 px: 0.06 → 1.00) at ~100 ms CPU. Very large canvases with tiny text may still need tiling (documented limit, not built). `build_change_map(boxes, h, w, preserve=0.9, feather=15)` paints a Differential-Diffusion change map. Polarity (verified empirically): white(1.0)=PRESERVE original pixels, black(0.0)=MAX change; map is black bg + `preserve` inside text polygons, Gaussian-feathered edges, clipped to [0,1]. `preserve` stays below a hard 1.0 freeze by default so text still scrubs lightly (SynthID survives cropping). Wired into `watermark_remover._run_differential` via the community `pipeline_stable_diffusion_xl_differential_img2img` (loaded with `custom_revision="0.38.0"` -- HF resolves the PyPI version string, not the `v0.38.0` git tag); gated to the SDXL `DEFAULT_MODEL_ID` only (`_can_protect_text`), falls back to plain img2img otherwise.
  Autonomous by default (`protect_text=True` in `invisible_engine` / `watermark_remover`, mirroring `protect_faces`): the detector runs per image and `_run_differential` falls back to plain img2img when no boxes are found, so text-free inputs pay only the cheap cv2 detection (no differential-pipeline load). CLI exposes a single off-switch `--no-protect-text` on `invisible` / `all` (passed as `protect_text=not no_protect_text`); the unavailable-model case logs at debug, not warning, since it is now the default path. The diff pipeline upcasts the VAE to fp32 internally, so do not add `upcast_vae()` / `enable_attention_slicing` (both produced NaN/black on fp16 MPS). `build_change_map` is unit-tested without any model download (`tests/test_text_protector.py`).
- `face_protector.py` -- YOLO detect + soft-blend pattern; mirror this for any "protect region during diffusion" features
- `image_io.py` -- Unicode-safe cv2 IO (issue #17). `imread(path, flags=None)` / `imwrite(path, img)` wrap `np.fromfile` + `cv2.imdecode` / `cv2.imencode` + `tofile` so non-ASCII paths work on Windows -- bare `cv2.imread` / `cv2.imwrite` use the platform ANSI code-page API there and fail (empty decode + `can't open/read file`) on Chinese/Cyrillic/accented filenames. `imread` keeps `cv2.imread` semantics (defaults to `IMREAD_COLOR`, returns `None` on missing/empty/undecodable). Every cv2 file read/write in the package routes through here; do not call `cv2.imread` / `cv2.imwrite` directly. macOS/Linux already accept UTF-8 paths, so it is behavior-neutral there (the bug only reproduces on Windows). cv2/numpy are imported lazily inside the functions, so the module is cheap to import in a bare env.
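
Returning to the `text_protector.py` entry: the resolution rule (native long side, capped at 1536, never upscaled) and the reason the old fixed 736 lost small text can be sketched as below. This is an illustration under stated assumptions, not the repo's `_detection_input_size`: the function names, the `(h, w)` return order, and the snap-down to a multiple of 32 (a common requirement for DB-style detectors) are all assumptions.

```python
DET_MAX_LONG_SIDE = 1536   # cap quoted in the text
OLD_FIXED_LONG_SIDE = 736  # previous fixed detector size quoted in the text


def detection_input_size_sketch(
    h: int, w: int, max_long_side: int = DET_MAX_LONG_SIDE, multiple: int = 32
) -> tuple[int, int]:
    """Hypothetical helper: downscale only when the long side exceeds the cap."""
    scale = min(1.0, max_long_side / max(h, w))  # scale <= 1.0, so no upscaling

    def snap(side: int) -> int:
        # Assumption: round DOWN to a multiple of 32; sides under 32 px are
        # bumped up to 32, a small exception to "never upscaled".
        return max(multiple, int(side * scale) // multiple * multiple)

    return snap(h), snap(w)


def text_px_at_detector(text_px: float, canvas_long_side: int, det_long_side: int) -> float:
    """Glyph height the detector actually sees after the canvas is resized."""
    return text_px * min(1.0, det_long_side / canvas_long_side)


print(detection_input_size_sketch(2048, 2048))
print(text_px_at_detector(16, 2048, OLD_FIXED_LONG_SIDE))
print(text_px_at_detector(16, 2048, DET_MAX_LONG_SIDE))
```

For the issue #14 case, 16 px text on a 2048 canvas shrinks to under 6 px at the old 736 size but stays at 12 px under the 1536 cap, which is consistent with the recall jump the benchmark reports for that cell.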
Doubao clean-reverse-alpha distillation (re-investigated 2026-05-29)
Conclusion: pure reverse-alpha distilled from content images does NOT work, and the blocker is the WRONG kind of data, not too little of it. The earlier framing ("need ~5-8 PRISTINE same-resolution originals") is obsolete -- data/spaces/originals/ holds plenty. Curate them with DoubaoEngine.detect + an NCC filter against a clean glyph template, keeping only marks at offset ≈ (0,0): that yields e.g. 15 pixel-aligned 2048² marks (sub-pixel drift, not the ±50 px the old lossy/mixed-res scrapes had), plus 1086x1448 / 1792x2400 clusters. With those, LaMa-clean O + weighted-LS (and per-pixel I-on-O regression) for α (+ logo colour) was tried end-to-end and still leaves a persistent ghost outline.
Diagnosed why, empirically (cached stacks, /tmp/doubao_distill): (1) the mark is a clean white overlay with no dark halo -- over glyph pixels ~54% are brighter than the clean bg, only ~4% darker -- so the white-logo model I=(1-α)O+α·255 is correct; (2) but content backgrounds are almost never dark under the mark (median darkest available bg over glyph pixels = 58/255; only ~13% of mark pixels are ever observed on a bg < 40), so on bright backgrounds the equation is ill-conditioned and α is unidentifiable; (3) LaMa's O is a plausible hallucination, not the true pre-mark background, which compounds the error, and per-pixel regression on ~15 obs overfits into colour noise.
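
The conditioning argument in point (2) can be made concrete by inverting the white-logo model: α = (I − O) / (255 − O), so any error in the observed pixel is divided by (255 − O). The sketch below is illustrative only; the function names and the ±2 gray-level noise figure are assumptions, not measurements from the cached stacks.

```python
def alpha_from_pair(observed: float, background: float) -> float:
    """Invert I = (1 - a) * O + a * 255 for a, given one (I, O) pair."""
    return (observed - background) / (255.0 - background)


def alpha_error_bound(background: float, noise: float = 2.0) -> float:
    """Worst-case alpha error from +/- `noise` gray levels in the observed pixel."""
    return noise / (255.0 - background)


true_alpha = 0.5
for bg in (0.0, 128.0, 240.0):
    observed = (1 - true_alpha) * bg + true_alpha * 255.0
    recovered = alpha_from_pair(observed + 2.0, bg)  # inject +2 levels of noise
    print(f"bg={bg:5.0f}  recovered={recovered:.3f}  bound=+/-{alpha_error_bound(bg):.3f}")
```

The amplification factor 1 / (255 − O) is smallest on a black background and grows without bound as the background approaches white. An error in O itself (as with a hallucinated LaMa background) enters through the same denominator.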
Why Gemini's engine is clean (verified in GeminiWatermarkTool src/core/watermark_engine.cpp): its alpha map is the watermark stamped on a PURE-BLACK background, where watermarked = α·255 + (1-α)·0 = α·255, so alpha = capture/255 exactly -- no estimation. (gemini_bg_*.png is literally the sparkle in grey on black.) So the real Doubao unlock is the same controlled capture, not more content images. Black/white/gray seeds exist (data/doubao_capture/seeds/seed_*_1x1_2048x2048.png); a capture run (feed a black seed through doubao.com edit mode, download the original) was requested from the #13 reporter 2026-05-29. With ~2-3 black captures we get α = capture/255 for free, Gemini-quality.
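
A per-pixel sketch of that controlled-capture route, under the white-overlay model above: read α directly from a black-background capture, then invert the blend on a content image. The function names and the `max_alpha` clamp (to avoid dividing by zero where the logo is fully opaque) are assumptions for illustration, not the shipped engine.

```python
def alpha_from_black_capture(capture: float) -> float:
    """On a pure-black background, watermarked = a * 255, so a = capture / 255."""
    return capture / 255.0


def remove_white_overlay(watermarked: float, alpha: float, max_alpha: float = 0.99) -> float:
    """Invert I = (1 - a) * O + a * 255 for O, clamped to the valid pixel range."""
    a = min(alpha, max_alpha)  # assumed guard: fully opaque pixels are unrecoverable
    restored = (watermarked - a * 255.0) / (1.0 - a)
    return min(255.0, max(0.0, restored))


# Round trip on one pixel: black capture reads 102, content pixel was 80.
alpha = alpha_from_black_capture(102.0)
watermarked = (1 - alpha) * 80.0 + alpha * 255.0
print(remove_white_overlay(watermarked, alpha))
```

No estimation step appears anywhere in this path, which is the point of the black capture: α is measured once and reused for every content image at that resolution.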
Until black captures arrive, the shipped direction is precise canonical glyph mask + inpaint (cv2 default, lama optional), NOT reverse-alpha. The consensus glyph silhouette across the aligned marks distills cleanly (proto: a tight "豆包AI生成" strip, width ≈ 0.156 × image-width) and is good both as an exact inpaint mask and as an NCC localiser -- the latter also fixes the #23 detector false-positives (match the real glyph shape, not any bright low-saturation corner). Do not retry content-image reverse-alpha: it is data-limited by physics (no dark-background observations), not by effort.
Watermarking landscape (research 2026-05-24)
Who embeds what, and whether it is locally detectable (so we know which gaps are fillable). See identify.py for what we read.
- Locally detectable (open decoder, no key/API): Stable Diffusion / SDXL / FLUX via `imwatermark` DWT-DCT (now covered by `invisible_watermark.py`). FLUX uses the same library (`black-forest-labs/flux2` `src/flux2/watermark.py`, 48-bit `0b001010101111111010000111100111001111010100101110`); SDXL is the diffusers `WATERMARK_MESSAGE` (`0b101100111110110010010000011110111011000110011110`). Caveat: fragile to re-encoding.
- C2PA / IPTC (covered by the issuer/marker scan): OpenAI, Google, Adobe Firefly, Microsoft (Designer + Bing Image Creator -- collected 2026-05-24; Bing now runs Microsoft's own MAI-Image model, signs C2PA as "Microsoft", NOT OpenAI/DALL-E), and Stability AI (collected from Brand Studio / DreamStudio successor; signs C2PA as "Stability AI Ltd", no SynthID, no imwatermark on its current Stable Image model -- issuer added to `C2PA_ISSUERS`). Still unsampled: Canva (its downloads are re-encoded design exports that strip C2PA, so a Canva "positive" is inconclusive -- skipped), Getty, Shutterstock. Midjourney embeds NO C2PA and no invisible watermark (our `mj-*` sample carried only the IPTC tag).
- EXIF/XMP generator tag (caught by `exif_generator`): Ideogram writes EXIF `Make="Ideogram AI"` (collected 2026-05-24 -- no C2PA, no SynthID, no imwatermark; the Make tag is the only signal).
- xAI / Grok -- its own EXIF signature scheme, NOT C2PA (DETECTED by `metadata.xai_signature`, built 2026-05-26). Grok JPEG downloads (Aurora model) carry no C2PA, no XMP, no SynthID, no IPTC -- only EXIF `Artist` = a UUID and EXIF `ImageDescription` = `Signature: <base64>` (a crypto signature, unverifiable locally without xAI's public key). This empirically kills the earlier unverified "xAI signs C2PA as xAI" lead -- xAI is not even a C2PA member. `exif_generator` misses it (neither field holds an `AI_GENERATOR_TOKENS` token), so a dedicated detector `xai_signature(path)` matches the pair (`ImageDescription ~ ^Signature: [A-Za-z0-9+/=]{64,}` AND UUID `Artist`); wired into `has_ai_metadata`, `get_ai_metadata` (key `xai_signature`), and `identify` (signal `xai_signature`, platform "xAI (Grok / Aurora)"). Format confirmed stable across n=3 genuine generations: exactly three EXIF tags (`Artist`, `ExifOffset`, `ImageDescription`), `Signature:` prefix constant, base64 payload 300-1004 chars. Two capture facts: (a) the `Artist` UUID equals the public image id in the asset URL (`https://imagine-public.x.ai/imagine-public/images/<uuid>.jpg`), so it is NOT a private per-user secret -- only the `Signature` blob is; (b) the Grok web-UI image is a re-encoded WebP with no signature -- the EXIF survives only in the original JPEG (download button or that public tokenless URL), which is why screenshots / re-encodes are metadata-stripped. A real fixture `data/samples/grok-1.jpg` plus synthetic JPEG fixtures (fake UUID + fake `Signature:` blob) cover the detector; never add a real Grok image carrying private content (the repo is public). Stripped on removal too: `remove_ai_metadata` now calls `_scrub_ai_exif` on the JPEG EXIF, which deletes the xAI Signature+UUID-Artist pair and any `Software` / `Make` / `Artist` / `ImageDescription` tag holding an `AI_GENERATOR_TOKENS` token (so Ideogram's `Make="Ideogram AI"` is scrubbed too), while keeping genuine camera/editor EXIF. The shared `_is_xai_signature_pair` helper (module-level compiled regexes) is the single source of truth for the pattern, used by both `xai_signature` and `_scrub_ai_exif`.
(AVIF/HEIF/JXL still strip only C2PA boxes viaisobmff, not EXIF — unchanged.) - China TC260 AIGC label (caught by
AIGC_MARKERS/metadata.aigc_label, surfaced byidentifyas theaigcsignal): China-served generators embed an XMP<TC260:AIGC>{"Label":"1","ContentProducer":...}block — China's mandatory AI-content labeling (TC260 namespacetc260.org.cn/ns/AIGC). Doubao (ByteDance) uses it (verified on the real #13 sample 2026-05-25;ContentProducer001191110102MACQD9K64010000, no C2PA/SynthID/imwatermark — the XMP block is the only signal; GitHub attachment upload did NOT strip it). The same standard is mandatory for Jimeng/Kling/Qwen/Ernie etc., so the one marker covers the whole China-AIGC-labeled ecosystem.aigc_labelreads two serializations through a shared_parsehelper: the HTML-entity-encoded XMP<TC260:AIGC>block (container-agnostic raw-byte scan, any JSON object accepted) and a raw-JSON PNGAIGCtEXt chunk — Doubao also writes the label this way, with no namespaced marker at all (confirmed on the corpus 2026-05-28,ContentProducer="doubao"). The PNG-chunk path is gated on at least one TC260 field (_TC260_FIELDS) so a genericAIGCkey cannot false-positive. Inidentify,aigcfires on the parsed label or theAIGC_MARKERSbyte scan (the latter preserves the laundering-tell case where the JSON payload is truncated). - HuggingFace-hosted job (caught by
metadata.huggingface_job, surfaced byidentifyas thehf_jobsignal, MEDIUM confidence): HuggingFace Jobs / Spaces stamp generated PNGs with anhf-job-idtEXt chunk holding the job UUID (3 on the corpus 2026-05-28, no other signal). It marks the hosting job, not a model — most commonly diffusion output — so it lifts an Unknown verdict to a tentative AI viahf_only(parallel to the visible sparkle) but never overrides a hard metadata signal;_HF_JOB_CAVEATstates the limit (job, not model; not proof of AI pixels). Stripped on removal (the PNG save whitelist keeps onlySTANDARD_METADATA_KEYS, sohf-job-idand theAIGCchunk are both dropped). The exact writer is not authoritatively documented (HF Jobs are generic GPU jobs), hence medium not high. - No detectable signal on download (correctly reported
unknown): Recraft (PNG export is a re-encoded design export — strips everything), Krea hosting FLUX 2 (no imwatermark despite FLUX — the host omits the encoder, same as Stability's hosted SDXL), and Midjourney (embeds nothing). Lesson: the imwatermark detector only fires on pristine output from a pipeline that runs the encoder (diffusers default, official BFL), not from re-hosts (Krea/Stability) or re-encoded exports (Recraft/Canva). - Invisible but NOT locally detectable (proprietary, API/oracle only — same wall as SynthID): Amazon Titan Image Generator + Nova Canvas (Bedrock
DetectGeneratedContentAPI), Kakao (new SynthID image adopter, May 2026), NVIDIA Cosmos (SynthID video). No local detector possible; treat like SynthID. - C2PA 2.4 "Durable Content Credentials" (April 2026; verified against the spec) raise the bar for metadata stripping. 2.4 defines soft bindings (an invisible watermark or a content fingerprint) plus a server-side manifest repository and a new
c2pa.repository-receiptassertion. Per the spec: "if a C2PA manifest is removed from an asset, but a copy of that manifest remains in a provenance store elsewhere, the manifest and asset may be matched using available soft bindings." So our localmetadata --removedeletes the embedded manifest, but a fingerprint/watermark soft binding can still re-link the image to its manifest in a repository server-side. Stripping the file is becoming necessary-but-not-sufficient against durable provenance. (Our parsers target the stable embedded-manifest format documented in C2PA 2.1 §11; that format is unchanged in 2.4 -- the new pieces are repository/soft-binding infra, not the on-file box layout, so no parser change is implied.) Spec: https://spec.c2pa.org/specifications/specifications/2.4/specs/C2PA_Specification.html We now READ the soft-bindingalg(C2PA_SOFT_BINDINGS/soft_binding_vendors_in) to name the forensic-watermark vendor, and locally DECODE the one open scheme, Adobe TrustMark (trustmark_detector); the rest (Digimarc/Imatag/Steg.AI/...) stay name-only (proprietary decoders). - Built 2026-05-26 (this batch): soft-binding
`alg` vendor detection; IPTC Photo Metadata 2025.1 AI-disclosure fields (`AISystemUsed` etc.); video C2PA metadata detect + strip for MP4/MOV/M4V (free -- `isobmff.py` is format-agnostic, MP4 is ISOBMFF); Adobe TrustMark open decoder.
NOT done (out of cheap reach, per the feasibility review): visible video-logo removal (needs a video frame pipeline) and audio (SynthID/ElevenLabs/Resemble/Suno all oracle-only or unmarked).
Box detection window -- now handled (v0.6.8): detection no longer relies on a fixed first-MB read. `metadata.scan_head(path, size)` reads the first `size` bytes and, for ISOBMFF, appends the payloads of late provenance boxes found by `isobmff.scan_c2pa_region` (a file-seeking top-level box walker that skips past `mdat` by size without reading it), so a C2PA/AIGC/IPTC manifest placed AFTER a large `mdat` in a streaming/non-faststart MP4 is now caught. Every C2PA/marker byte scan (`has_ai_metadata`, `aigc_label`, `iptc_ai_system`, `synthid_source`, `exif_generator` XMP, `get_ai_metadata` soft-binding, and `identify`) goes through `scan_head`; it is behavior-neutral for non-ISOBMFF inputs (exactly `f.read(size)`).
Meta-box XMP removal -- now handled (v0.6.9): an AI-label XMP packet stored as a meta-box `mime` item (HEIF/AVIF; out of reach of the top-level box stripper) is blanked in place by `isobmff.blank_ai_xmp_packets` -- it locates the packet by its `<?xpacket begin … end?>` delimiters and, if it carries an AI marker (`_AI_LABEL_MARKERS`), overwrites it with spaces of the SAME length, so box sizes / `iloc` offsets stay valid and the coded image is untouched (selective: plain non-AI XMP is left alone, mirroring the top-level uuid logic). Wired into `remove_ai_metadata`'s ISOBMFF branch after `strip_c2pa_boxes`. The remaining gap is an `Exif` meta-box item (rare; the AI labels are XMP) -- still needs `iinf`/`iloc` surgery or exiftool.
- Regulatory driver (context, not a code change): AI-content labeling mandates are expanding, which pushes more generators toward exactly the C2PA + watermark signals we read.
The full per-jurisdiction table lives in README "## Legal" -- keep it there, not duplicated here. Newly added + primary-source verified 2026-05-26: EU AI Act Article 50 machine-readable marking applicable 2026-08-02 (verified against the article text); South Korea AI Framework Act Art. 31(3) in force since 22 January 2026 (verified via Kim & Chang + FPF/Korea Times; Enforcement Decree accepts an invisible-watermark label); California AB 853 (amends the CA AI Transparency Act) latent-disclosure duty operative 2026-08-02, requiring a disclosure "permanent or extraordinarily difficult to remove" (verified against the leginfo bill text -- this is the exact disclosure our tool strips); India IT Amendment Rules 2026 in force 2026-02-20 (verified via Chambers), which prominently-label + permanent-provenance-id all synthetic media AND expressly prohibit removing/suppressing the label or metadata -- the first major all-content removal ban outside China. Removal liability (README "## Legal" disclaimer): the tool is lawful general-purpose software; liability sits with the remover and is intent-gated -- downstream acts (fraud/deception/IP), plus US DMCA 17 USC 1202 (removing copyright-management info to conceal infringement), plus the removal-as-such bans in China + India. When extending the README table, verify each date/article against the statute/bill text before committing, not against search summaries.
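A minimal sketch of the same-length blanking described above for meta-box XMP packets, assuming the packet can be found with a simple regular expression over the raw bytes. The function name, the regex, and the marker tuple are illustrative assumptions; the actual `isobmff.blank_ai_xmp_packets` and the real contents of `_AI_LABEL_MARKERS` are not shown here and may differ.

```python
import re

# Hypothetical marker list: the real `_AI_LABEL_MARKERS` is not shown in this
# document, so these two byte strings are assumptions for illustration only.
AI_LABEL_MARKERS_SKETCH = (b"trainedAlgorithmicMedia", b"AISystemUsed")

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
_XPACKET_RE = re.compile(rb"<\?xpacket begin=.*?<\?xpacket end=.*?\?>", re.DOTALL)


def blank_ai_xmp_packets_sketch(data: bytes, markers=AI_LABEL_MARKERS_SKETCH) -> bytes:
    """Overwrite AI-labelled XMP packets with spaces of the same length."""

    def _blank(match: re.Match) -> bytes:
        packet = match.group(0)
        if any(marker in packet for marker in markers):
            # Same length, so every later byte offset in the file is unchanged.
            return b" " * len(packet)
        return packet  # plain, non-AI XMP is left untouched

    return _XPACKET_RE.sub(_blank, data)


# Toy input: one AI-labelled packet and one plain packet between filler bytes.
ai = b'<?xpacket begin="" id="x"?><x>trainedAlgorithmicMedia</x><?xpacket end="w"?>'
plain = b'<?xpacket begin="" id="y"?><x>holiday photo</x><?xpacket end="w"?>'
blob = b"HEAD" + ai + b"MID" + plain + b"TAIL"
out = blank_ai_xmp_packets_sketch(blob)
```

Because the replacement has exactly the length of the packet it overwrites, nothing after it moves, which is the property the `iloc` offsets depend on. A real implementation would also need to restrict the search to the relevant item payloads rather than scanning the whole file.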
Known limitations
- `invisible` pipeline processes at native resolution by default (`max_resolution=0`), matching the hosted raiw.cc backend (fal fast-sdxl, no pre-downscale). The old forced downscale-to-1024 -> upscale-back round-trip was the main quality loss (issue #10) and is gone; at strength ~0.05 SDXL img2img does not need the ~1024 downscale. `--max-resolution N` re-introduces an opt-in long-side cap purely to bound GPU/MPS memory on very large inputs (it reintroduces the lossy round-trip). For huge images that OOM at native, tile-based diffusion is still the proper long-term fix.
Concrete MPS data point (verified 2026-05-25 on a 1254x1254 gpt-image SDXL run, fp32, 20 GB MPS ceiling): native res OOMs at the UNet step (peak ~17 GiB), not only the VAE decode, and the auto-fallback in `img2img_runner` reloads on CPU and finishes (slow, ~13 min) -- the output is still weight-identical and defeats SynthID, so "looks hung/crashed" on Mac is usually this CPU fallback, not a pipeline error. Adding `enable_vae_tiling()` alone does NOT prevent it (the peak is the UNet, not the VAE). The fast Mac workarounds are fp16 on MPS (roughly halves memory) or `--max-resolution` to cap the long side; neither is wired as the default.
The native-vs-downscale decision lives in the pure helper `invisible_engine._target_size(w, h, max_resolution)` (returns `None` for native, a clamped target tuple otherwise) so it is unit-tested (`tests/test_invisible_engine.py::TestTargetSize`, the #10/#15 regression guard) without loading the model -- keep that logic in the helper, don't re-inline it.
- Pyright first run is slow (2-3 min) due to ML deps (torch/diffusers/transformers stubs); full-project
`uv run pyright` can stall for many minutes -- scope it to changed files.
- `ultralytics` monkey-patches `PIL.Image.open` and tries to autoload `pi_heif`. When `pi_heif` is missing, opening files raises `ModuleNotFoundError`, not `UnidentifiedImageError`. Code that opens user-supplied or unknown-format files should `except Exception`, not just `OSError`/`UnidentifiedImageError`.
- rich
`console.print` parses `[word]` as a style tag and silently drops unknown ones. A literal bracketed token in a print string disappears: `pip install 'remove-ai-watermarks[gpu]'` rendered as `...remove-ai-watermarks'` (the `[gpu]` extra eaten), which sent users a broken install command (surfaced via #19). Escape the literal bracket as `\[gpu]` (in a normal Python string that is `"\\[gpu]"`) in any rich string carrying user-facing brackets. Regression-guarded by `tests/test_cli.py::TestGpuHintMarkup`.
- Metadata detection for AVIF/HEIF/JPEG-XL relies on a binary scan for
`C2PA_UUID` + `IPTC_AI_MARKERS`, plus EXIF `Software` / XMP `CreatorTool` generator tags via `metadata.exif_generator` (validated with synthesized AVIF/JPEG fixtures + an XMP raw-scan fixture). C2PA removal in those containers is implemented via `noai/isobmff.py` (top-level `uuid`/`jumb` box stripper, no re-encoding), which now also drops a top-level XMP `uuid` box that carries an AI label (matched by AI-marker content, not by the XMP UUID, so byte-order-robust) and covers MP4/MOV/M4V/M4A by content sniff.
Non-ISOBMFF audio/video removal is via ffmpeg (`_FFMPEG_STRIP_EXTS` -> `_strip_with_ffmpeg`): WebM/Matroska (EBML), MP3 (ID3), WAV/FLAC/OGG (RIFF/Vorbis) are stripped losslessly with `ffmpeg -map_metadata -1 -map_chapters -1 -c copy` (codec data untouched). Requires ffmpeg on PATH; raises `RuntimeError` if absent or if ffmpeg can't parse the file. Verified end-to-end (a real ffmpeg-made WAV/MP3 with a `title=Suno AI` tag -> tag gone, audio bytes preserved).
Meta-box XMP now handled (`isobmff.blank_ai_xmp_packets`, v0.6.9): an AI-label XMP packet stored as a meta-box `mime` item (AVIF/HEIF) is blanked in place (overwritten with spaces of the same length, so `iloc` offsets and the coded image stay valid). Still NOT built: an `Exif` item inside the `meta` box (rare -- AI labels are XMP) needs full `iinf`/`iloc` surgery (offset rewrite) with corruption risk -- exiftool (R/W/C for HEIC/AVIF EXIF+XMP, verified on exiftool.org 2026-05-27) would do it but is a non-installed binary dep, so it stays a documented gap.
Audio watermark DETECTION (Resemble PerTh) was evaluated and NOT built (2026-05-26): `resemble-perth`'s `PerthImplicitWatermarker.get_watermark()` returns a raw bit-array with no presence/confidence flag (clean audio decodes to arbitrary bits too), so reliably distinguishing watermarked-from-clean needs either Resemble's fixed payload or a confidence API -- neither is public, and there's no real Resemble sample to calibrate against. Same wall-class as the SynthID pixel detector: the decode exists, reliable presence-detection does not.
(perth's top-level `PerthImplicitWatermarker` is also gated to None unless `librosa` is importable.)
- SynthID detection is metadata-only. There is no reliable local detector of the SynthID pixel watermark -- Google's decoder is proprietary, no public spec or API (only a waitlisted portal). Authoritative confirmation: Google DeepMind's own paper "SynthID-Image: Image watermarking at internet scale" (Gowal et al., arXiv:2510.09263) states the verification service is restricted to "trusted testers" and does not release detector weights or a reproducible algorithm -- so a local pixel detector is infeasible by design, not just unbuilt. https://arxiv.org/abs/2510.09263
We detect SynthID by its C2PA companion (
`synthid_source` / `SYNTHID_C2PA_ISSUERS`), which is reliable while the manifest is intact but says nothing once C2PA is stripped. Surface-dependent blind spot (verified 2026-05-24): the same Google model emits different metadata per surface -- the Gemini app wraps outputs in Google C2PA, but the API/playground (AI Studio, Nano Banana / gemini-2.5-flash-image) emits the SynthID pixel watermark (confirmed via the Gemini-app oracle) + the visible sparkle but no C2PA/IPTC at all, so `synthid_source` returns None despite SynthID being present. Only the pixel oracle or the visible-sparkle detector catches those. (Meta AI is another surface mismatch: it writes the IPTC `digitalSourceType=trainedAlgorithmicMedia` marker, not C2PA and not SynthID.) Google→SynthID is long-standing; OpenAI→SynthID is confirmed by OpenAI's Help Center (ChatGPT/Codex/API "include both C2PA metadata and SynthID watermarks", updated 2026-05-21) but time-gated (pre-rollout OpenAI images carry C2PA without SynthID), so the OpenAI verdict is hedged "likely". Oracles: Gemini app "Verify with SynthID" (Google), openai.com/verify (OpenAI).
The spectral phase-coherence approach from `github.com/aloshdenny/reverse-SynthID` was evaluated (May 2026) and does not work for real-content detection: on its own shipped codebook + validation set, watermarked and cleaned images were indistinguishable (conf within noise, cleaned often higher); it only fires on pure-black 1024x1024 reference images at exact resolution (the controlled case it was calibrated on). The README's "90% / conf=0.91" reproduces only in that lab condition. Do not build a production detector on it; if revisited, it is experimental/diagnostic only and needs a per-resolution, per-model reference corpus.
A from-scratch gpt-image pilot (2026-05-24) confirmed this independently: 5 independent solid-black gpt-image outputs share a near-identical fixed signature (pairwise residual correlation 0.92, avg-template retains 97% energy), so the watermark/carrier IS strongly present and consistent on flat content -- but the carrier frequencies extracted from it do NOT discriminate real content (carrier-to-random ratio: cleaned 1.86 > watermarked 1.53; a non-gpt-image image scored highest at 3.67). The signature drowns in content texture. Net: a perfectly consistent solid-color signature still yields no real-content pixel detector with magnitude/carrier methods.
A corpus discrimination test (2026-05-24, `scripts/synthid_pixel_probe.py`, raw zero-mean residual NCC) independently re-confirms this: at matched resolution, SynthID positives do NOT cluster apart from negatives (within-Gemini 0.07; at 1024 px pos-vs-neg >= pos-vs-pos). The only high correlations were near-duplicate content (5 ChatGPT renders of one prompt at ~0.92, while a distinct ChatGPT image scored ~0 against them) -- content, not a carrier. The probe is solid-fills-only and EXPERIMENTAL/DIAGNOSTIC; do not use it on real content.
Correction (deeper re-examination 2026-05-25): the carrier IS real on solid fills -- the earlier "no carrier" was a method artifact of using spatial / FFT-magnitude NCC, which can't see it. The carrier is a fixed phase at specific low frequencies, so the right metric is per-bin phase coherence. On 8 white `gemini-2.5-flash-image` fills (generated via the reverse-SynthID trick: identity-edit prompt "Recreate this image exactly as it is" on a synthetic pure-white PNG -- this bypasses the recitation block that rejects text prompts for pure colors), phase coherence at the white carriers `(0,±7..±12,±20..±23)` = 0.86 vs 0.31 random; single-image leave-one-out phase-match +0.83 vs real photos -0.24.
(Black `2.5-flash` fills clip to std≈0 -- SynthID can't push values below 0, so no carrier in black; the repo's dark carriers come from nano-banana-pro.) But it does not generalize: (a) carriers are model-version + resolution + color specific -- the repo's v4 codebook (built for `gemini-3.1-flash-image-preview` + `nano-banana-pro-preview`) scores ~0.527 on my 2.5-flash white fills, indistinguishable from negatives (~0.50), i.e. carriers shift across model versions and need a per-model codebook; (b) on real content (30 `2.5-flash` images) the carrier collapses -- set phase coherence at carriers 0.37 ≈ random 0.42, and the repo's v4 detector gives content 0.518 ≈ negatives 0.504 (no separation; a faint +0.24 single-image lean is likely a brightness confound). Net: the spectral/phase approach is a real controlled-fill characterizer, NOT an arbitrary-real-content detector, and is brittle to model version. Metadata proxy + visible sparkle + online oracles remain the ceiling for real content.
- External AI-vs-real classifier models are out of scope (decided 2026-05-24). Generic HuggingFace detectors (
`Organika/sdxl-detector` Swin Transformer, `umm-maybe/AI-image-detector`, and fine-tunes) exist and report ~0.98 on their own SDXL-vs-real validation sets, but they are per-generator and the model cards themselves note degraded accuracy off-distribution; they are untested on gpt-image / Gemini Nano Banana (the metadata-stripped surfaces we care about), and our own light SDXL pass would likely defeat them the same way it defeats SynthID. Detection here stays local + signal-based (metadata + visible sparkle); do not add a bundled classifier dependency.
- SynthID v2 vs default pipeline: the SDXL-based default profile (since May 2026) defeats SynthID v2. Verified end-to-end (May 2026): local SDXL run on a Gemini 3 Pro output, checked via the Gemini app's "Verify with SynthID" feature, returned "no SynthID watermark detected". Also confirmed against OpenAI's SynthID (2026-05-23): a fresh ChatGPT/gpt-image output read "SynthID detected" on openai.com/verify before the local SDXL run and "SynthID not detected" after (corpus regression chain: pos
`4ef377bd` -> cleaned `47188e88`). The same configuration is used in raiw-app production (fal-ai/fast-sdxl/image-to-image, strength 0.05, steps 50, guidance 7.5, no pre-downscale). fal's own `llms.txt` for `fast-sdxl` names the base checkpoint as `stabilityai/stable-diffusion-xl-base-1.0` (verified 2026-05-25) -- the exact checkpoint the local CLI defaults to (`DEFAULT_MODEL_ID`). So the local `invisible` default is weight-for-weight identical to prod; "fast-sdxl" is fal's optimized serving, not different weights. After the native-resolution fix the local pipeline matches prod on weights + strength + steps + guidance + resolution.
SD-1.5 dreamshaper at 768 px was previously the default and does NOT defeat v2 -- verified empirically against the same feature (strength 0.04, 0.10, and elastic warp α∈{5,8} all flagged positive). That SD-1.5 path was removed; only `default` (SDXL) and `ctrlregen` profiles remain.
Scope of the claim: defeating the SynthID verifier is NOT the same as forensic invisibility. "Removing the Watermark Is Not Enough: Forensic Stealth in Generative-AI Watermark Removal" (arXiv:2605.09203, 2026-05) shows that six removal attacks across four families (UnMarker, CtrlRegen+, WatermarkAttacker, etc.) all leave forensic traces: independent detectors flag removal-processed images vs genuinely-clean ones at >98% TPR at 1% FPR. So our SDXL pass makes the oracle read "SynthID not detected," but the output can still be classifiable as "an image that went through a removal pipeline." Do not over-claim "indistinguishable from a real photo." https://arxiv.org/abs/2605.09203
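
For reading the ">98% TPR at 1% FPR" figure quoted above: the metric fixes a detector threshold so that at most 1% of genuinely clean images are flagged, then reports the fraction of removal-processed images that score above it. A minimal sketch in Python, using made-up scores purely for illustration (the function name `tpr_at_fpr` and all numbers are assumptions, not taken from the paper or from this repo):

```python
import math


def tpr_at_fpr(pos_scores, neg_scores, max_fpr=0.01):
    """True-positive rate at a threshold chosen so that FPR stays <= max_fpr.

    pos_scores: detector scores for removal-processed images (should be flagged).
    neg_scores: detector scores for genuinely clean images (should not be).
    A score strictly above the threshold counts as "flagged".
    """
    ranked_neg = sorted(neg_scores, reverse=True)
    # How many clean images may be wrongly flagged (epsilon guards float rounding).
    allowed = math.floor(max_fpr * len(ranked_neg) + 1e-9)
    if allowed >= len(ranked_neg):
        threshold = -math.inf  # every image may be flagged
    else:
        # Flagging only scores above this value flags at most `allowed` negatives.
        threshold = ranked_neg[allowed]
    flagged = sum(score > threshold for score in pos_scores)
    return flagged / len(pos_scores)


# Illustrative, made-up scores: 100 clean images spread over 0.00..0.99.
clean = [i / 100 for i in range(100)]
processed = [0.50, 0.985, 0.99, 1.00]
rate = tpr_at_fpr(processed, clean)  # threshold lands at 0.98
```

With these toy numbers only one clean image (0.99) scores above the threshold, so the false-positive rate is 1%, and three of the four processed images are flagged, giving a rate of 0.75.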