mirror of https://github.com/wiltodelta/remove-ai-watermarks.git synced 2026-07-23 16:10:51 +02:00

Remove-AI-Watermarks

Remove visible and invisible AI watermarks from images generated by Google Gemini (Nano Banana), ChatGPT / DALL-E, Stable Diffusion, Adobe Firefly, Midjourney, and other AI models.

Strips SynthID, C2PA Content Credentials, EXIF/XMP "Made with AI" labels, and visible sparkle overlays — all in one command.

Try it online: raiw.cc

No Python, no GPU, no setup. Visible-watermark and metadata removal are free. Invisible-watermark removal (SynthID / SDXL regeneration) normally needs a local GPU and ~2 GB of models. On raiw.cc it runs on cloud GPUs in one click for a small per-image fee.

If this tool saves you time, consider sponsoring its development.

Intended for lawful use only. Publishing and running this software is lawful; responsibility for any downstream use, and for compliance with local law, rests entirely with the user. Some jurisdictions restrict removing an AI label as such (see Legal). The authors do not condone use for deception, fraud, or any unlawful activity.

Features

Visible watermark removal — a registry of known marks in their usual places: the Gemini / Nano Banana sparkle, the Doubao "豆包AI生成" text strip, and the Jimeng "★ 即梦AI" wordmark. Each is removed by reverse-alpha blending against a captured alpha map (original = (wm − α·logo)/(1−α)), recovering the true pixels rather than inpainting a guess. The Gemini sparkle recovers cleanly on its own; the Doubao and Jimeng text marks re-rasterize slightly per image, so a thin residual inpaint over the glyph footprint clears the leftover edges (the alpha maps are reproducibly rebuilt from controlled captures by scripts/visible_alpha_solve.py). Fast, offline, no GPU. visible --mark auto finds and removes the strongest detected mark. (For arbitrary logos/objects, see erase.)
Universal region eraser (erase) — remove any logo / watermark / object inside boxes you specify, regardless of position or colour. Default cv2 inpainting (CPU, instant); optional big-LaMa via onnxruntime (lama extra) for higher quality
Invisible watermark removal — SynthID, StableSignature, TreeRing via diffusion-based regeneration (needs a local GPU, or run it with no setup on raiw.cc)
AI metadata stripping — EXIF, PNG text chunks, C2PA provenance manifests (PNG / JPEG / AVIF / HEIF / JPEG-XL, MP4 / MOV / M4V / M4A at the container level, and WebM / MP3 / WAV / FLAC / OGG losslessly via ffmpeg), XMP DigitalSourceType
"Made with AI" label removal — removes the AI-disclosure metadata that platforms read to apply automatic labels (useful for clearing a false-positive label from a human-edited photograph)
Analog Humanizer — optional film grain and chromatic aberration post-processing
Smart Face Protection — automatic extraction and blending of human faces to prevent AI distortion
Batch processing — process entire directories
Detection — three-stage NCC watermark detection with confidence scoring
Provenance detection (identify) — aggregate C2PA issuer, the C2PA soft-binding forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, ...), IPTC "Made with AI" plus the IPTC 2025.1 AISystemUsed field, embedded SD/ComfyUI params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the China TC260 AIGC label (XMP, PNG chunk, or EXIF), the HuggingFace hf-job-id job marker, the SynthID metadata proxy, the visible marks (Gemini sparkle plus the Doubao "豆包AI生成" / Jimeng "即梦AI" text marks), the open SD/SDXL/FLUX invisible watermark, and (with the trustmark extra) the open Adobe TrustMark watermark into one origin-platform + watermark-inventory verdict (--json for machine output)

Examples

Before (Watermarked)	After (Cleaned)

Supported models

AI model	Visible watermark	Invisible watermark	Metadata	Our approach
Google Gemini / Nano Banana / Gemini 3 Pro	✅ Sparkle logo	✅ SynthID v1 + v2 (default SDXL pipeline, native resolution)	✅ C2PA + EXIF	Alpha reversal + diffusion + metadata strip
OpenAI DALL-E 3 / ChatGPT	—	—	✅ C2PA manifest	Metadata strip
OpenAI ChatGPT Images 2.0 (gpt-image-2)	—	✅ SynthID + content-specific pixel watermark (since May 2026; no local decoder, openai.com/verify oracle)	✅ C2PA manifest (verified)	Diffusion regeneration + metadata strip
Stable Diffusion / SDXL (AUTOMATIC1111, ComfyUI)	—	✅ DWT-DCT (imwatermark — locally detectable)	✅ PNG text chunks	Diffusion regeneration + metadata strip
Black Forest Labs FLUX	—	✅ DWT-DCT (imwatermark — locally detectable)	✅ C2PA (FLUX.2 Pro)	Diffusion regeneration + metadata strip
Adobe Firefly	—	—	✅ Content Credentials (C2PA)	Metadata strip
Stability AI (DreamStudio / Stable Image)	—	—	✅ C2PA ("Stability AI Ltd")	Metadata strip
Microsoft Designer / Bing Image Creator	—	✅ SynthID via DALL-E backend (Designer)	✅ C2PA (Bing runs MAI-Image, signed "Microsoft")	Metadata strip
xAI Grok (Aurora)	—	—	✅ EXIF signature scheme (no C2PA): `Signature:` blob + UUID `Artist`	Detected (`identify`); metadata strip
Midjourney	—	—	✅ EXIF + XMP (prompt, model, seed)	Metadata strip
Meta AI	—	—	✅ IPTC "Made with AI" (digitalSourceType)	Metadata strip (removes the label)
Doubao (ByteDance) / China AIGC generators	✅ "豆包AI生成" text strip (bottom-right)	—	✅ TC260 AIGC label (`<TC260:AIGC>` XMP, `AIGC` PNG chunk, or EXIF JSON) + C2PA signed by ByteDance Volcano Engine (`volcengine`)	Reverse-alpha (captured α map) + thin residual inpaint, NCC-aligned across resolutions, + metadata strip
Jimeng / Dreamina (即梦AI, ByteDance)	✅ "★ 即梦AI" wordmark (bottom-right)	—	✅ TC260 AIGC label + C2PA (Volcano Engine)	Reverse-alpha (captured α map) + residual inpaint over the glyph footprint, NCC-aligned across resolutions, + metadata strip
Samsung Galaxy AI (Generative Edit, Sketch to Image, ...)	—	—	✅ C2PA (signer "Samsung Galaxy") + `trainedAlgorithmicMedia` / proprietary `genAIType` marker	Detected (`identify`) + metadata strip
Black Forest Labs (FLUX API)	—	—	✅ C2PA (`Black Forest Labs API` + `c2pa.ai_generated_content` + `trainedAlgorithmicMedia`)	Metadata strip
StableSignature (Meta)	—	✅ In-model watermark	—	Diffusion regeneration
TreeRing	—	✅ Latent space watermark	—	Diffusion regeneration

Visible overlays are used by Google Gemini / Nano Banana (sparkle logo) and by ByteDance's Doubao ("豆包AI生成" corner text) and Jimeng / Dreamina ("★ 即梦AI" wordmark). All are removed on CPU by reverse-alpha against a captured alpha map (Doubao and Jimeng add a thin residual inpaint over the glyph footprint, since their text marks re-rasterize slightly per image). Other services rely on invisible watermarks and/or metadata; our diffusion-based regeneration targets invisible watermarks embedded in the pixel or frequency domain. For a visible mark from any other source (any position, any colour), use the universal erase --region command.

Detection: remove-ai-watermarks identify <image> reports the origin platform and watermark inventory for all the signals above — C2PA issuer, the C2PA soft-binding forensic-watermark vendor (TrustMark / Digimarc / Imatag / ...), IPTC "Made with AI" plus the IPTC 2025.1 AISystemUsed field, the China TC260 AIGC label (XMP, PNG chunk, or EXIF), the HuggingFace hf-job-id job marker, embedded generation params, EXIF/XMP generator tags, the xAI/Grok EXIF signature, the SynthID metadata proxy, the visible marks (Gemini sparkle plus the Doubao "豆包AI生成" / Jimeng "即梦AI" text marks), and (with the [detect] / [trustmark] extras) the open SD/SDXL/FLUX and Adobe TrustMark invisible watermarks. SynthID and the proprietary soft-binding watermarks (Digimarc etc.) have no local decoder, so they are reported by metadata proxy / vendor name only.

How it works

Removing the Gemini / Nano Banana sparkle watermark

Google Gemini (internally codenamed Nano Banana) adds a visible sparkle logo to generated images using alpha blending:

watermarked = α × logo + (1 − α) × original

We reverse this with a known alpha map (extracted from Gemini / Nano Banana output on a pure-black background):

original = (watermarked − α × logo) / (1 − α)
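The two formulas above can be checked numerically. The following is a minimal NumPy sketch, not the project's GeminiEngine code: the helper names `blend` and `reverse_alpha`, the synthetic arrays, and the `max_alpha` clamp are all assumptions made for illustration.

```python
import numpy as np


def blend(original, logo, alpha):
    """Forward model: watermarked = alpha * logo + (1 - alpha) * original."""
    return alpha * logo + (1.0 - alpha) * original


def reverse_alpha(watermarked, logo, alpha, max_alpha=0.99):
    """Invert the blend: original = (watermarked - alpha * logo) / (1 - alpha)."""
    # Clamp alpha so fully opaque pixels cannot cause a division by zero.
    a = np.clip(alpha, 0.0, max_alpha)
    restored = (watermarked - a * logo) / (1.0 - a)
    return np.clip(restored, 0.0, 255.0)


rng = np.random.default_rng(0)
original = rng.uniform(0, 255, size=(8, 8, 3))   # stand-in for image pixels
logo = np.full((8, 8, 3), 255.0)                 # pure-white logo colour
alpha = rng.uniform(0.0, 0.6, size=(8, 8, 1))    # per-pixel opacity map

watermarked = blend(original, logo, alpha)
restored = reverse_alpha(watermarked, logo, alpha)
max_error = float(np.abs(restored - original).max())
print(f"max reconstruction error: {max_error:.2e}")
```

In floating point the round trip is essentially exact. On real 8-bit images the watermarked pixels are quantized, so the recovery error grows as alpha approaches 1.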

A three-stage NCC (Normalized Cross-Correlation) detector finds the watermark position and scale dynamically, so it works even if the image was resized or cropped. After removal, residual sparkle-edge artifacts are cleaned via gradient-masked inpainting.
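To make the NCC idea concrete, here is a single-scale, brute-force sketch in NumPy. It is deliberately simpler than the three-stage detector described above (no scale search, no confidence calibration); the function name `ncc_map` and the synthetic plus-shaped mark are assumptions for illustration only.

```python
import numpy as np


def ncc_map(image, template):
    """Normalized cross-correlation score at every valid offset (range [-1, 1])."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    rows = image.shape[0] - th + 1
    cols = image.shape[1] - tw + 1
    scores = np.zeros((rows, cols))
    for y in range(rows):
        for x in range(cols):
            patch = image[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            # Flat patches have zero variance; leave their score at 0.
            if denom > 1e-12:
                scores[y, x] = (p * t).sum() / denom
    return scores


# Synthetic check: blend a small plus-shaped mark into a noisy grayscale image.
rng = np.random.default_rng(1)
template = np.zeros((5, 5))
template[2, :] = 1.0
template[:, 2] = 1.0
image = rng.uniform(0.0, 0.2, size=(32, 32))
image[20:25, 10:15] += 0.5 * template    # semi-transparent mark at y=20, x=10

scores = ncc_map(image, template)
best_y, best_x = np.unravel_index(np.argmax(scores), scores.shape)
best_score = float(scores[best_y, best_x])
print(int(best_y), int(best_x), round(best_score, 3))
```

Because NCC subtracts the mean and divides by the norm of each patch, the score depends on the shape of the mark rather than on local brightness, which is why a threshold on it can serve as a confidence value.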

Speed: ~0.05s per image. No GPU needed.

Removing the Doubao "豆包AI生成" text watermark

Doubao (ByteDance) stamps every output with a light, semi-transparent "豆包AI生成" text strip in the bottom-right corner — the visible AIGC label mandated by China's TC260 standard. It is a fixed semi-transparent white overlay, so it is removed by reverse-alpha blending: original = (watermarked - α·logo) / (1 - α), recovering the true pixels instead of hallucinating them. The α map is solved from controlled black/gray captures (rebuildable with scripts/visible_alpha_solve.py). Like the Jimeng mark, Doubao re-rasterizes its text slightly per image, so reverse-alpha is followed by a thin residual inpaint over the glyph footprint to clear the leftover edges, and the α template is NCC-aligned to the actual mark (handling per-image scale/position jitter). Detection matches the same glyph silhouette against the corner (normalized correlation), so it keys on the "豆包AI生成" shape, not on textured corners.

Speed: ~0.05s, no GPU needed.
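One way an alpha map can be solved from controlled captures is sketched below. It assumes the same mark is captured over two known flat backgrounds, so that subtracting the two blend equations cancels the logo term: alpha = 1 - (cap1 - cap0) / (bg1 - bg0). This is an illustrative derivation, not a description of what scripts/visible_alpha_solve.py actually does; the function name and test values are hypothetical.

```python
import numpy as np


def solve_alpha_and_logo(cap0, cap1, bg0, bg1, eps=1e-6):
    """Recover (alpha, logo) from two captures over known flat backgrounds.

    Model: cap_k = alpha * logo + (1 - alpha) * bg_k, for k in {0, 1}.
    """
    alpha = 1.0 - (cap1 - cap0) / (bg1 - bg0)
    alpha = np.clip(alpha, 0.0, 1.0)
    # The logo colour is only defined where the mark is actually present.
    safe = np.maximum(alpha, eps)
    logo = np.where(alpha > eps, (cap0 - (1.0 - alpha) * bg0) / safe, 0.0)
    return alpha, logo


# Synthetic ground truth: a white mark at 35% opacity on a 6x6 patch.
true_alpha = np.zeros((6, 6))
true_alpha[2:4, 1:5] = 0.35
true_logo = np.full((6, 6), 255.0)
bg0, bg1 = 0.0, 128.0                        # black and mid-gray backgrounds
cap0 = true_alpha * true_logo + (1.0 - true_alpha) * bg0
cap1 = true_alpha * true_logo + (1.0 - true_alpha) * bg1

alpha_est, logo_est = solve_alpha_and_logo(cap0, cap1, bg0, bg1)
print(float(alpha_est.max()), float(logo_est.max()))
```

A third capture (for example on white) over-determines the system and can be used to average out capture noise.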

Removing the Jimeng "★ 即梦AI" wordmark

Jimeng / Dreamina (即梦AI, also ByteDance, distinct from Doubao) stamps a "★ 即梦AI" wordmark — a four-point sparkle followed by the 即梦AI characters — in the bottom-right corner. It is a fixed semi-transparent pure-white overlay, solved from controlled black / gray / white captures the same way as Doubao. visible --mark auto detects and removes it (or force it with --mark jimeng). As with Doubao, Jimeng re-rasterizes its mark slightly differently per image, so a single alpha map does not cancel it pixel-for-pixel — reverse-alpha knocks the mark down and a residual inpaint over the glyph footprint clears the remaining outline. The two ByteDance marks do not confuse auto: detection keys on each mark's own glyph shape (the Jimeng detector scores far below its threshold on a Doubao strip, and vice versa).

remove-ai-watermarks visible jimeng.png -o clean.png            # --mark auto picks Jimeng
remove-ai-watermarks visible jimeng.png --mark jimeng -o clean.png

Universal region eraser

For any visible mark the dedicated engines do not cover — a logo anywhere, any colour — erase --region x,y,w,h inpaints the box you specify. The default cv2 backend is instant and dependency-free; the optional lama backend (big-LaMa via onnxruntime, lama extra, ~200 MB model downloaded on first use) gives much cleaner fills on textured regions at the cost of ~3-4 GB RAM per call.
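The x,y,w,h box format maps naturally onto an inpainting mask. The sketch below shows one plausible way to turn region strings into a binary mask with NumPy; the helper names are hypothetical and this is not taken from the tool's source. A mask of this shape is what OpenCV's `cv2.inpaint(src, inpaintMask, inpaintRadius, flags)` expects.

```python
import numpy as np


def parse_region(spec):
    """Parse an 'x,y,w,h' string into four integers with a positive size."""
    parts = [int(p) for p in spec.split(",")]
    if len(parts) != 4 or min(parts) < 0 or parts[2] == 0 or parts[3] == 0:
        raise ValueError(f"expected 'x,y,w,h' with positive size, got {spec!r}")
    return tuple(parts)


def regions_to_mask(shape, specs):
    """Build a uint8 mask (255 inside any box, 0 elsewhere), clipped to the image."""
    height, width = shape[:2]
    mask = np.zeros((height, width), dtype=np.uint8)
    for spec in specs:
        x, y, w, h = parse_region(spec)
        mask[y:min(y + h, height), x:min(x + w, width)] = 255
    return mask


mask = regions_to_mask((2048, 2048, 3), ["1640,1930,400,100"])
marked = int(mask.sum() // 255)
print(marked, "pixels marked for inpainting")
```

Clipping the box to the image bounds keeps a slightly oversized region from raising an error near the corner.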

Removing SynthID and other invisible watermarks

Google embeds SynthID into every image generated by Gemini / Nano Banana. Other AI services use StableSignature, TreeRing, and similar schemes. These imperceptible patterns (embedded in the pixel, frequency, or latent domain, depending on the scheme) are designed to survive cropping, resizing, and JPEG compression.

The removal pipeline (default profile, SDXL):

image → encode to latent space (VAE) at native resolution
      → add controlled noise (forward diffusion)
      → denoise (reverse diffusion, ~50 steps at strength 0.30)
      → decode back to pixels (VAE)
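How strength interacts with the step count can be sketched with a few lines of arithmetic. This assumes the convention used by common img2img implementations such as Hugging Face diffusers, where strength selects how much of the noise schedule is actually run. Whether the "~50 steps" above counts scheduled or executed steps is not stated here, so treat the numbers as illustrative; `img2img_schedule` is a hypothetical helper.

```python
def img2img_schedule(num_inference_steps, strength):
    """Return (steps_skipped, steps_run) for a strength-controlled img2img pass."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be within [0, 1]")
    # Only the last `strength` fraction of the schedule is executed: the input
    # is noised to that point and denoised from there.
    steps_run = min(int(num_inference_steps * strength), num_inference_steps)
    steps_skipped = num_inference_steps - steps_run
    return steps_skipped, steps_run


for strength in (0.10, 0.30, 1.00):
    skipped, run = img2img_schedule(50, strength)
    print(f"strength={strength:.2f}: skip {skipped}, run {run} denoising steps")
```

Under this convention a low strength both adds less noise and runs fewer denoising steps, which is why it preserves more of the input image.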

Native resolution avoids shrinking the input to 1024 px first; that down-then-up round-trip was the main quality loss (issue #10). Use --max-resolution N only to cap GPU/MPS memory on very large inputs.

Default strength is 0.30, tuned to remove the current Google SynthID. An oracle-verified study (fresh Gemini images, "Verify with SynthID") found the current SynthID survives 0.10/0.15/0.20 and clears only at 0.30. SynthID is a moving target (the threshold has climbed 0.05 → 0.10 → ~0.30 as Google hardens it), and there is no local SynthID detector, so the tool cannot self-check and auto-tune. If the oracle still reads SynthID, raise --strength further; if you care more about preserving fine text, lower it. 0.30 softens dense typography somewhat, so use the lowest value that comes back clean on the oracle.
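The "lowest value that comes back clean" rule amounts to a small search over candidate strengths. The sketch below is only bookkeeping for a manual process: the oracle check is done by hand, so `is_clean` is a hypothetical callback, and the `observed` table simply restates the study results quoted above.

```python
def lowest_clean_strength(candidates, is_clean):
    """Return the smallest strength for which is_clean(strength) is True, else None."""
    for strength in sorted(candidates):
        if is_clean(strength):
            return strength
    return None


# Stand-in for manual oracle results; replace with your own observations.
observed = {0.10: False, 0.15: False, 0.20: False, 0.30: True}
choice = lowest_clean_strength(observed, lambda s: observed[s])
print(choice)
```

If the function returns None, none of the tried values cleared the oracle and a higher --strength is needed.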

Text and face protection are OFF by default. The high-resolution text re-scrub can shield SynthID in text regions, leaving the watermark intact there even after the global pass clears it everywhere else (verified June 2026: same image, with --protect-text → SynthID detected; without → SynthID removed). Both features are opt-in with --protect-text / --protect-faces and considered experimental. If you enable them, verify the result with the oracle.

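Older OpenAI images (DALL-E 3 / ChatGPT) do not carry Google SynthID (they use C2PA metadata, stripped by the metadata step), so 0.30 is overkill there; --strength 0.10 preserves quality and the metadata strip is what matters. ChatGPT Images 2.0 (gpt-image-2) output is different: per the Supported models table it has carried a pixel watermark since May 2026, so treat it like a SynthID source and confirm the result with OpenAI's online verifier.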

--pipeline ctrlregen is experimental and not recommended. On paper CtrlRegen (ICLR 2025) regenerates from near-clean Gaussian noise to defeat robust watermarks, but in testing on real images it destroys content — smooth and background regions fill with hallucinated micro-text — and it is heavy (several GB of extra models, minutes per image). It has no usable middle setting (too low removes nothing, high enough to remove wrecks the image), so the shippable path is the default SDXL pipeline at ~0.30. CtrlRegen stays available for experimentation only.

SDXL has been the default since May 2026: it empirically defeats SynthID v2 on Gemini 3 Pro outputs, where the older SD-1.5 pipeline at 768 px did not. The SD-1.5 path was removed once it was verified not to handle v2. Note the scope: this defeats the SynthID verifier, which is not the same as being forensically indistinguishable from a real photo. Recent work (arXiv:2605.09203) shows watermark-removal pipelines leave detectable traces, so a separate "this image was processed" classifier can still flag the output.

Oracle vs identify can disagree, and that is expected. An online verifier reads the actual SynthID pixel watermark and detects only its own vendor's content — openai.com/research/verify states "OpenAI generation signals will only be detected if the image was generated with our tools". Our identify cannot decode the pixel watermark (no vendor ships a local decoder), so it infers SynthID from the C2PA metadata instead. So after the SDXL pass the oracle can read "no SynthID" (pixel watermark gone) while identify still reports SynthID from a surviving C2PA manifest. They measure different signals. Run metadata --remove (or all) to also strip the manifest; note that a quiet metadata proxy is not proof the pixel watermark itself is gone.

Technical deep-dive: see docs/synthid.md for a primary-source-cited breakdown of how SynthID works mechanically (post-hoc encoder/decoder, 136-bit payload, pixel-space embedding), what it empirically survives (JPEG, crop, resize: ~99.98% TPR at 0.1% FPR from arXiv:2510.09263), what removes it, and the forensic-stealth tradeoff (all known removal attacks are detectable at >98% TPR@1%FPR per arXiv:2605.09203).

Face Protection (experimental, opt-in --protect-faces): before diffusion, YOLO detects people in the image and extracts them; after diffusion the original faces are blended back. Off by default — enable only when face fidelity matters more than SynthID removal completeness.

Analog Humanizer: optional film grain and chromatic aberration injection that mimics a photo of a screen, raising the bar for AI-generated image classifiers. (It frustrates generic classifiers but does not guarantee forensic invisibility — see the arXiv:2605.09203 note above.)

Text Protection (experimental, opt-in --protect-text): re-scrubs detected text blocks at high resolution after the global pass to keep small glyphs crisp. Off by default because the high-resolution re-scrub can preserve SynthID in text regions even after the global pass removes it elsewhere. Enable only when text fidelity matters more than watermark removal completeness, and verify the oracle result. SDXL pipeline only.

Stripping C2PA, EXIF, and "Made with AI" metadata

AI tools embed generation metadata that social platforms use to show "Made with AI" labels:

EXIF tags — prompt, seed, model hash, sampler settings (Stable Diffusion, Midjourney)
XMP DigitalSourceType — trainedAlgorithmicMedia tag used by Instagram, Facebook, and X (Twitter) to show "Made with AI"
PNG text chunks — ComfyUI workflows, AUTOMATIC1111 parameters
C2PA Content Credentials — cryptographic provenance manifests from Google Imagen, OpenAI DALL-E, Adobe Firefly

The cleaner parses each layer, removes AI-related fields, and preserves standard metadata (Author, Copyright, Title).
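As an illustration of one of these layers, the sketch below filters PNG text chunks using only the standard library. It is not the project's cleaner: the keyword list is an assumption (AUTOMATIC1111 writes a "parameters" chunk, ComfyUI writes "prompt" and "workflow"), and the function names are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
TEXT_CHUNKS = {b"tEXt", b"zTXt", b"iTXt"}
# Illustrative keyword list (assumption, not the tool's actual list).
AI_KEYWORDS = {b"parameters", b"prompt", b"workflow"}


def make_chunk(chunk_type, data):
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def strip_ai_text_chunks(png_bytes):
    """Return the PNG with AI-related text chunks removed; other chunks are kept."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    out = [PNG_SIGNATURE]
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        chunk_type = png_bytes[pos + 4:pos + 8]
        data = png_bytes[pos + 8:pos + 8 + length]
        end = pos + 12 + length
        # Text chunks start with a keyword terminated by a NUL byte.
        keyword = data.split(b"\x00", 1)[0]
        if not (chunk_type in TEXT_CHUNKS and keyword in AI_KEYWORDS):
            out.append(png_bytes[pos:end])
        pos = end
    return b"".join(out)


# Tiny synthetic chunk stream (not a decodable image) to exercise the filter.
sample = PNG_SIGNATURE + b"".join([
    make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)),
    make_chunk(b"tEXt", b"parameters\x00a cat, Steps: 20"),
    make_chunk(b"tEXt", b"Author\x00Jane Doe"),
    make_chunk(b"IEND", b""),
])
cleaned = strip_ai_text_chunks(sample)
print(b"parameters" in cleaned, b"Author" in cleaned)
```

Kept chunks are copied byte for byte, so their CRCs stay valid and standard fields such as Author survive untouched.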

Installation

Install from repository

Prerequisites: Python 3.10+ and pip (or uv).

# 1. Clone the repository
git clone https://github.com/wiltodelta/remove-ai-watermarks.git
cd remove-ai-watermarks

# 2. Install the package in editable mode
pip install -e .

# Or, if you use uv:
uv pip install -e .

After installation, the remove-ai-watermarks command is available on your PATH (inside the active virtual environment, if you use one).

Note: The base install covers visible watermark removal and metadata stripping. For invisible watermark removal (SynthID etc.), install GPU dependencies:

pip install -e ".[gpu]"   # or: uv pip install -e ".[gpu]"

To let identify decode the open Stable Diffusion / SDXL / FLUX invisible watermarks, install the detect extra (adds the invisible-watermark decoder):

pip install -e ".[detect]"   # or: uv pip install -e ".[detect]"

To also decode the open Adobe TrustMark watermark (behind Adobe Durable Content Credentials), install the trustmark extra (pulls torch and downloads model weights on first use):

pip install -e ".[trustmark]"   # or: uv pip install -e ".[trustmark]"

Invisible watermark removal

Invisible removal runs a diffusion model; a GPU is needed for reasonable speed (CPU works but is slow).

# On first run, the model (~2 GB) will be downloaded automatically.
# Device is auto-detected: CUDA (Linux/Windows) > MPS (macOS) > CPU.
# To force a device: --device cuda / --device mps / --device cpu

# Optional: set a HuggingFace token for gated/private models
cp .env.example .env
# Edit .env and set HF_TOKEN=hf_your_token_here

Developer setup

# Install with dev dependencies (pytest, ruff, pyright)
pip install -e ".[dev]"
# Or with uv:
uv pip install -e ".[dev]"

# Run tests
pytest

# Run linters
./maintain.sh

Usage

CLI

# Remove all watermarks from a single image (visible + invisible + metadata)
remove-ai-watermarks all image.png -o clean.png

# Process an entire directory
remove-ai-watermarks batch ./images/ --mode all

Individual commands

# Identify provenance: where an image was made + its watermark inventory.
# Aggregates C2PA, IPTC "Made with AI", embedded SD/ComfyUI params, EXIF/XMP
# generator tags (incl. inside AVIF/HEIF), the SynthID proxy, the visible Gemini
# sparkle, and (with the [detect] extra) the open SD/SDXL/FLUX invisible
# watermark into one verdict. Reports "unknown" (never "clean") when no signal
# is found, since stripped metadata is not proof of a clean origin.
# Add --json for machine-readable output.
remove-ai-watermarks identify image.png

# Visible watermark only — fast, offline, CPU. --mark auto (default) finds the
# strongest known mark (Gemini sparkle / Doubao "豆包AI生成" / Jimeng "即梦AI"); force
# one with --mark gemini / doubao / jimeng. Removed by reverse-alpha (true-pixel recovery).
remove-ai-watermarks visible image.png -o clean.png

# Erase arbitrary region(s) — universal, any logo/watermark/object, any position.
# Default cv2 inpainting (CPU). --backend lama uses big-LaMa (extra 'lama').
remove-ai-watermarks erase image.png --region 1640,1930,400,100 -o clean.png

# Invisible watermark only (SynthID etc.) — GPU recommended (CPU works but is slow)
remove-ai-watermarks invisible image.png -o clean.png --humanize 4.0
# Runs at native resolution by default. On a very large image that OOMs the
# GPU/MPS, cap the long side: --max-resolution 2048
# Text and face protection are off by default (experimental); opt in with
# --protect-text / --protect-faces and verify the result with the oracle

# Check / strip AI metadata (C2PA, EXIF, "Made with AI" labels)
# --check also flags SynthID-bearing sources: a C2PA manifest signed by
# Google or OpenAI implies an invisible SynthID watermark in the pixels
# (both vendors pair the two). Adobe Firefly / Microsoft sign C2PA without
# SynthID, so they are reported as C2PA only.
remove-ai-watermarks metadata image.png --check
remove-ai-watermarks metadata image.png --remove

# Batch with a specific mode
remove-ai-watermarks batch ./images/ --mode visible
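When driving the CLI from a script, assembling the argument list in one place keeps quoting correct. The wrapper below is a hypothetical convenience sketch, not part of the package: it only uses subcommands and flags shown above, and it defaults to a dry run so nothing is executed unless you ask for it.

```python
import shlex
import subprocess
from pathlib import Path


def build_command(mode, src, dst=None, extra=()):
    """Assemble an argv list for the remove-ai-watermarks CLI."""
    argv = ["remove-ai-watermarks", mode, str(src)]
    if dst is not None:
        argv += ["-o", str(dst)]
    argv += list(extra)
    return argv


def run(argv, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


cmd = build_command("visible", Path("image.png"), Path("clean.png"), ["--mark", "jimeng"])
run(cmd)  # dry run: prints the command without invoking the tool
```

Passing a list to `subprocess.run` avoids shell interpolation, so file names with spaces need no extra escaping.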

Python API

import cv2

from remove_ai_watermarks.gemini_engine import GeminiEngine

engine = GeminiEngine()
image = cv2.imread("watermarked.png")
if image is None:  # cv2.imread returns None instead of raising on a bad path
    raise FileNotFoundError("watermarked.png")

# Detect
result = engine.detect_watermark(image)
print(f"Detected: {result.detected} (confidence: {result.confidence:.1%})")

# Remove
clean = engine.remove_watermark(image)
cv2.imwrite("clean.png", clean)

Metadata stripping

from remove_ai_watermarks.metadata import has_ai_metadata, remove_ai_metadata
from pathlib import Path

if has_ai_metadata(Path("image.png")):
    remove_ai_metadata(Path("image.png"), Path("clean.png"))

Requirements

Python ≥ 3.10
Visible removal / metadata: CPU only, no GPU required
Invisible removal: GPU recommended (CUDA or MPS), works on CPU (slow)

Troubleshooting

SSL certificate error (CERTIFICATE_VERIFY_FAILED):

# Install certifi (the tool auto-detects it)
pip install certifi

# macOS only: run the Python certificate installer
/Applications/Python\ 3.*/Install\ Certificates.command

First run is slow — this is expected. The tool downloads model weights (~2 GB) on first launch. Subsequent runs use cached models.

Credits

noai-watermark by mertizci — invisible watermark removal engine
GeminiWatermarkTool by Allen Kuo (MIT) — visible watermark removal algorithm
CtrlRegen by Liu et al. (ICLR 2025) — controllable regeneration pipeline
NeuralBleach (MIT) — analog humanizer technique

Roadmap

Tracked but not yet implemented:

SynthID-Image v2 automated regression test. The default SDXL profile defeats v2 per manual checks against the Gemini app's "Verify with SynthID" feature on a Gemini 3 Pro output (May 2026). An automated end-to-end test would need either programmatic access to the SynthID Detector portal (waitlist for media professionals and researchers) or an offline surrogate detector. The spectral phase-coherence surrogate from reverse-SynthID was evaluated and does not separate watermarked from cleaned real-content images (it only fires on controlled solid-color references at exact resolution), so it is not a usable oracle. Open.
Local SynthID pixel detector. Not feasible today: Google's decoder is proprietary, and magnitude/carrier spectral methods do not separate real content (confirmed by three independent evaluations, including a from-scratch gpt-image pilot; see CLAUDE.md). Blocked on either (a) a programmatic generation path (OpenAI / Gemini API) to build a per-(model, resolution) labeled corpus at scale, or (b) a raw watermarked-output dataset. If data arrives, the next approach to try is a learned classifier on diverse content rather than a fixed carrier codebook.
Grow the SynthID reference corpus (data/synthid_corpus/) with oracle-labeled samples per model and resolution (Gemini app for Google, openai.com/verify for OpenAI). Prerequisite for any pixel-detector attempt and for an automated removal-regression set.
Real non-PNG C2PA fixtures. SynthID-source detection for JPEG / WebP / AVIF is currently covered only by synthetic byte blobs; replace with real vendor-emitted files to ground the binary-scan path.
Maintenance debt. Strict pyright is now clean across src/ (0 errors): pure-logic files are fully typed, the cv2 / torch / diffusers boundary files carry a documented per-file relax pragma, and a local typings/piexif stub covers piexif. Remaining: full-project pyright (no path) still OOMs node on this ML-heavy repo, so it must be scoped to src/; narrowing the boundary pragmas back toward full strict (as upstream stubs improve) is the long tail. (uv-secure is already clean since idna was bumped to 3.16.)
AVIF / HEIF Exif item inside the meta box. An AI-label XMP packet in a meta-box item is now blanked in place (v0.6.9), but EXIF stored as a meta-box Exif item is still not removed — it needs full iinf/iloc surgery (offset rewrite, corruption risk) or exiftool (a non-bundled binary dependency). Low priority: the AI labels we target are XMP, not EXIF, so an EXIF-only meta-box case is rare.
More C2PA device signers. Leica, Nikon, Google Pixel, Sony, and Truepic capture cameras are mapped (each verified against a real signed file); Samsung Galaxy AI, Black Forest Labs (FLUX), and ByteDance Volcano Engine (Doubao / Jimeng) are now attributed too (verified on real signed files). Canon is still deferred until a real signed sample surfaces — no public direct-download C2PA file exists for it today (upload-to-verify / news-agency-licensed only).
Resemble PerTh audio detection — evaluated, not feasible with the public API: get_watermark() returns a raw bit array with no presence/confidence flag, so watermarked vs. clean audio can't be reliably separated without Resemble's fixed payload or a confidence service. Same wall as the SynthID pixel detector.
Video pipeline (noai-video): per-frame inpainting and tracking for Sora 2 dynamic logo, Veo 3.1 badge, Kling, Runway. Separate package, not folded into this repo.

Won't fix:

Nightshade / Glaze / PhotoGuard removal. These are defensive perturbations used by artists to protect their work from being scraped into AI training sets. Removing them attacks artists, not AI provenance. Out of scope.

Legal

Watermarking and provenance for AI-generated content is now regulated in several jurisdictions. The table below summarises the May 2026 status. None of this is legal advice.

Jurisdiction	Instrument	Status (May 2026)	Relevance
EU	AI Act, Article 50	Transparency duties apply from 2 August 2026. Legacy generative systems (placed on the market before that date) get a grandfathering period to 2 December 2026 for the Article 50(2) marking duty, under the Digital Omnibus (Commission proposal Nov 2025; co-legislator political agreement 7 May 2026). Article 50 guidelines and a marking Code of Practice are being finalised through 2026.	Removing mandated provenance markers with intent to deceive may be sanctioned under national implementations.
US (federal)	COPIED Act (S. 1396, 119th Cong.)	Reintroduced April 2025; not enacted (referred to Senate Commerce Committee).	If passed, would set NIST provenance standards and prohibit tampering with / removing provenance information. The tool itself is lawful; usage may not be.
US (state)	CA AB 2655, TX SB 751 (2019), similar	TX SB 751 (2019) in force; CA AB 2655 struck down by a federal court (E.D. Cal., Aug 2025, Kohls v. Bonta) as preempted by Section 230; the court did not reach the First Amendment (the companion law AB 2839 was separately enjoined on First Amendment grounds).	Content-specific (election deepfakes, sexual deepfakes). Not tool-specific.
US (state)	CA AB 853 (amends the California AI Transparency Act)	Core provider duties operative 2 August 2026 (delayed from 1 January 2026); large platforms 1 January 2027; capture devices 1 January 2028.	Covered providers (1M+ monthly users) must embed a latent disclosure that is "permanent or extraordinarily difficult to remove" and offer a free detection tool. Removing that disclosure is what this tool does.
South Korea	AI Framework Act (Basic Act on AI), Article 31	In force since 22 January 2026 (one-year transition after promulgation).	Art. 31(3): AI output "difficult to distinguish from reality" must be labeled so users "clearly recognize" it; the draft Enforcement Decree accepts a machine-readable (invisible-watermark) label. Artistic/creative works get a presentation exception.
China	Measures for Labeling AI-Generated Content (+ GB 45438-2025)	In force since 1 September 2025.	Mandatory explicit (visible) + implicit (metadata) labels across image / audio / video; tampering with, forging, or removing labels is prohibited.
India	IT (Intermediary Guidelines and Digital Media Ethics Code) Amendment Rules, 2026	In force since 20 February 2026 (notified 10 February 2026).	All "synthetically generated information" must be prominently labelled and carry permanent metadata / a provenance identifier; the rules expressly prohibit modifying, suppressing, or removing that label or metadata. Covers image, audio, and audio-visual content.
UK	Online Safety Act 2023 / Ofcom guidance	In force, but no statutory AI-provenance or watermarking obligation.	Ofcom encourages watermarking / provenance metadata as voluntary "attribution measures"; platform duties, not user obligations.

Threat model

This tool defends already-distributed AI imagery against automatic detection systems (social-platform "Made with AI" labels, third-party classifiers, content-policy filters). It does not retroactively anonymise generation.

In particular, SynthID (Google DeepMind) is embedded across Google's generative media stack — Imagen (images), Veo (video), Lyria (audio) — and Gemini app image outputs (Nano Banana / Gemini 3 Pro, which we verified positive via the Gemini app's SynthID oracle); Google reported over 10 billion items watermarked by December 2025. It carries a multi-bit payload — the research paper's SynthID-O variant encodes 136-bit payloads in 512x512 images (arXiv:2510.09263). The payload is believed to encode a user / session identifier. If the original watermarked file ever passed through a system controlled by the prompt originator (a saved Gemini account history, a screenshot uploaded to a Google product, a backup), Google retains the ability to link that original to the generating account. Stripping the watermark from a copy you possess does not erase Google's server-side record.

Use cases where the threat model fits:

You generated the image yourself, want to publish it as your own work, and accept the consequences if Google ever publishes their detector logs.
You are running a security / robustness evaluation.
You are preserving art or historical record against false-positive "AI-generated" labels.

Use cases where the threat model does not fit:

Generating an image, expecting that removing the watermark anonymises you to Google. It doesn't.
Distributing AI-generated content while claiming human authorship. The watermark is one of several traceability layers.

This tool is intended for legitimate purposes such as:

Privacy protection (removing metadata that leaks user account identifiers).
Art preservation and fair-use research.
Removing false-positive "Made with AI" labels from human-edited photographs.
Security research and watermark robustness study.

Who bears the liability. This is general-purpose software and is itself lawful to publish and run; legal responsibility attaches to the person who removes a marker and to how the result is then used, and the hinge is intent. Removing AI provenance to pass AI-generated content off as human-made, to commit fraud, to produce non-consensual deepfakes, or to conceal copyright infringement can expose the remover to liability. Two kinds of exposure are worth knowing:

The downstream act. Deception, fraud, defamation, IP infringement, or breaking a platform's terms — judged by intent and harm, not by the act of editing metadata itself. In the US, the DMCA (17 U.S.C. § 1202) specifically bars removing "copyright management information" with intent to conceal or enable infringement.
The removal itself. Some jurisdictions penalise tampering with the label/metadata as such, regardless of downstream use — notably China (Labeling Measures) and India (IT Amendment Rules 2026), which expressly prohibit removing or suppressing the AI label and provenance metadata. The US COPIED Act would do the same if enacted.

Legitimate uses — publishing your own work, privacy (stripping metadata that leaks an account identifier), security / robustness research, or removing a false-positive "Made with AI" label from a human-edited photograph — are generally lawful. Users are solely responsible for ensuring their use complies with all applicable laws. The authors do not condone use of this tool for deception, fraud, or any activity that violates applicable laws or regulations. None of this is legal advice.

License

MIT
