remove-ai-watermarks

mirror of https://github.com/wiltodelta/remove-ai-watermarks.git synced 2026-05-27 14:42:25 +02:00

Author	SHA1	Message	Date
test-user	93c664f7fb	docs: sync README + corpus map with v0.5.x detection coverage - README Features: add the identify / provenance-detection capability. - README Supported models: add FLUX, Stability AI, Microsoft/Bing (MAI-Image), Meta AI rows; note SD/SDXL/FLUX imwatermark is locally detectable; add a detection note pointing at identify. - corpus README per-platform map: add Stability / Ideogram / Recraft / Krea-FLUX rows + an imwatermark column; correct Bing (MAI-Image, signs 'Microsoft'); note imwatermark fires only on pristine pipeline output, not re-hosts/exports. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 09:32:03 -07:00
test-user	ad3b8ee248	feat(identify): read EXIF Software / XMP CreatorTool generator tags Closes the documented gap where EXIF/XMP fields inside AVIF/HEIF/JXL went unparsed. metadata.exif_generator extracts the EXIF Software/Artist tag (via PIL+piexif, which opens AVIF natively) and the XMP CreatorTool (via a container-agnostic raw-byte scan that also covers HEIF/JXL that PIL can't open), and matches against AI_GENERATOR_TOKENS so only generator names (Firefly, DALL-E, Midjourney, ComfyUI, ...) fire -- a plain 'Adobe Photoshop' or 'GIMP' tag is not flagged. identify() surfaces it as a high-confidence signal and uses it for platform attribution when no C2PA names a platform, so an AVIF/HEIF whose only AI signal is an EXIF/XMP generator tag is now caught. Validated with synthesized fixtures (the 'no positive fixtures' blocker was self-imposed): real AVIF and JPEG written with EXIF Software via PIL, plus an XMP CreatorTool raw-scan fixture. Zero false positives across the 109-image corpus (real iPhone photos carry no AI generator token). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 17:56:39 -07:00
test-user	27ad5b7645	feat(identify): detect open SD/SDXL/FLUX invisible watermark Research found one locally-fillable detection gap: Stable Diffusion, SDXL, and FLUX all embed an open DWT-DCT watermark via the invisible-watermark (imwatermark) library -- a PUBLIC decoder, no secret key, unlike SynthID. New invisible_watermark.py decodes the known fixed patterns (verified against upstream source: diffusers SDXL WATERMARK_MESSAGE, FLUX.2 src/flux2/watermark.py, and the 'StableDiffusionV1' default string) and identify() reports the scheme as a high-confidence signal. Verified locally end-to-end: embedding SDXL's exact 48-bit message and decoding it back recovers 48/48 bits; a clean image and our own fal-SDXL outputs decode to ~21/48 (no match). Caveat baked into the report: the watermark is fragile -- gone after JPEG q90 -- so it confirms origin only on pristine files; absence is never proof. imwatermark is an optional dep (extra 'detect'; pulls non-headless opencv), so the import is guarded and the signal is skipped when absent. CLI --no-visible now means metadata-only (skips both pixel-domain detectors). Also records the broader watermarking landscape in CLAUDE.md: which services are locally detectable (SD/SDXL/FLUX), C2PA-covered (Bing/Canva/ Getty/Shutterstock unsampled), or proprietary-only like SynthID (Amazon Titan/Nova, Kakao). Midjourney embeds neither C2PA nor an invisible mark. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 16:53:59 -07:00
test-user	fa104bcade	feat(identify): provenance command (platform + watermark inventory) New 'identify' command and identify.py module: upload an image, get one ProvenanceReport answering where it was made and what watermarks it carries. Aggregates every locally-readable signal: - C2PA Content Credentials -> generating platform (issuer + generator). - IPTC digitalSourceType 'Made with AI' (Meta and others). - Embedded SD/ComfyUI generation parameters (local pipelines). - SynthID metadata proxy (Google / OpenAI C2PA companion). - Visible Gemini sparkle (cv2 fallback for the stripped-metadata case), promoted only at confidence >= 0.5 (corpus-tuned: Gemini sparkles score >= 0.56, non-sparkle <= 0.49). is_ai_generated is True or None, never asserted False -- stripped metadata leaves no local proof of a clean origin, so absence of signals is reported as 'unknown' with an explicit caveat. The SynthID pixel watermark remains locally undecodable; the report says so. Non-PNG containers (JPEG/WebP/AVIF/HEIF/JXL) get the same issuer + generator attribution via a binary scan (the caBX parser is PNG-only). The cv2 dependency is isolated in gemini_engine.detect_sparkle_confidence so identify.py stays type-clean. CLI supports --json and --no-visible. Validated against the 109-image corpus: 14/14 positives flagged AI, 93/94 negatives clean (the one 'neg' flagged is a Meta image that genuinely carries the IPTC tag -- correct), zero true errors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 16:19:26 -07:00
test-user	c006f9b8b4	docs(roadmap): record next steps for SynthID detector work Captures the forward plan so a future session picks it up: local pixel detector is blocked pending a generation API or raw watermarked dataset (spectral methods shown insufficient); grow the oracle-labeled corpus; replace synthetic non-PNG C2PA fixtures with real ones; and the maintenance debt (idna bump, strict-pyright cleanup) needed for a green maintain.sh. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 11:34:43 -07:00
test-user	f07ce10c72	feat(metadata): SynthID-source detection, C2PA parser consolidation, corpus + tests Detect SynthID-bearing images via their C2PA companion: a manifest signed by a SynthID-using vendor (Google/OpenAI) on AI-generated content implies an invisible SynthID pixel watermark. Verified end-to-end against the vendor oracles (openai.com/verify, Gemini "Verify with SynthID"). - metadata: synthid_source() + synthid_watermark verdict in get_ai_metadata, surfaced as a `metadata --check` callout. Format-agnostic (PNG caBX parser + JPEG/WebP/AVIF/HEIF/JXL binary scan). - constants: SYNTHID_C2PA_ISSUERS {Google, OpenAI}; +opened/placed actions. - c2pa: single CBOR-aware parser (_cbor_text_after) replaces glitchy regex (fixes fGPT-4o claim_generator); removed duplicate _scan_png_c2pa_chunk from metadata; shared synthid_verdict / synthid_vendors_in helpers. - corpus: scripts/synthid_corpus.py ingest tool + data/synthid_corpus/ (manifest tracked, images gitignored) for a labeled reference set. - tests: +38 across C2PA parser internals, extract/inject round-trip, ISOBMFF container stripping, all IPTC AI markers, and invisible watermark strength tiers (SynthID/StableSignature/TreeRing/StegaStamp/RingID/RivaGAN/...). Pixel-level SynthID detection remains out of reach locally (Google's decoder is proprietary); a from-scratch spectral pilot confirmed it does not separate real content. See CLAUDE.md for the full evaluation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 11:32:46 -07:00
test-user	f2fc5e09ab	feat: SDXL default; AVIF/HEIF/JPEG-XL C2PA stripping SD-1.5 dreamshaper at 768 px did not defeat SynthID v2 on Gemini 3 Pro outputs (verified May 2026 via Gemini app's "Verify with SynthID"). Switch the default invisible engine to SDXL at 1024 px, matching the raiw-app production config (strength 0.05, steps 50). Drop the SD-1.5 pipeline. Metadata layer: add C2PA UUID and IPTC AI marker byte-scan detection across all formats, plus an ISOBMFF box walker (noai/isobmff.py) that strips top-level C2PA uuid and JUMBF jumb boxes from AVIF/HEIF/JPEG-XL containers without re-encoding. README gets a Legal table and a Threat-model section about SynthID v2's 136-bit payload. CLAUDE.md tracks the SD-1.5 regression as historical context. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 12:54:37 -07:00
test-user	87d02126e3	feat(metadata): parse C2PA JUMBF manifest fields, add Images 2.0 sample, bump to 0.3.4 - metadata --check now shows claim_generator, c2pa_spec, digital_source_type, c2pa_actions, signer instead of empty table for C2PA-only files - reuses existing extract_c2pa_chunk() from noai/c2pa.py — no more duplicate PNG chunk parsing or full-file reads - adds data/samples/openai-images-2/amur-leopard.png: real gpt-image-2 output with C2PA manifest signed by OpenAI OpCo LLC / Trufo CA (spec 2.2.0) - removes stale data/samples/nano-banana-1/2.png (no longer referenced) - updates README: new Images 2.0 row in supported models table - documents known text-degradation limitation in CLAUDE.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 17:21:51 -07:00
test-user	bedb1bca08	docs: add raiw.cc mention and move examples section higher	2026-03-27 10:15:40 -07:00
test-user	b41c8e5aba	v0.3.1: Fix opencv conflict, graceful GPU fallback, correct docs - Remove opencv-python from [gpu] extra (conflicts with headless in base deps) - Add graceful fallback in 'invisible' and 'all' commands when GPU deps missing - Cache InvisibleEngine in batch mode (avoid reloading model per image) - Fix --humanize help text (was '0.0-1.0', actual range is 0-6.0+) - Fix stale docstring referencing non-existent [invisible] extra - Add [gpu] extra install instructions to README - Fix broken NeuralBleach placeholder URL in Credits	2026-03-26 10:50:26 -07:00
test-user	1890848ec3	SEO-optimized README, add sample images from multiple AI models - Rewrite README for SEO: Nano Banana, SynthID, Made with AI, C2PA keywords - Add Supported Models table with 7 AI services - Add 'Made with AI' label removal to features - Rename sections for search discoverability - Add samples: ChatGPT/DALL-E, Midjourney, Adobe Firefly - Reorganize data/samples with flat structure and clear naming	2026-03-25 17:23:24 -07:00
test-user	c7c43a55d7	Add 'How it works' section to README	2026-03-25 13:51:51 -07:00
test-user	507757738e	v0.2.2: Unify quality defaults, improve README - Unify 'all' defaults to match 'invisible' (strength=0.02, steps=100) - Reorder CLI docs: 'all' command first, individual commands second - HuggingFace token is now documented as optional - Remove 'additional setup' label from invisible section	2026-03-25 12:28:02 -07:00
test-user	636d11a65e	Make docs platform-neutral (macOS/Linux/Windows)	2026-03-25 12:15:17 -07:00
test-user	345d455a8c	Minor README formatting	2026-03-25 12:04:15 -07:00
test-user	cace97b04e	Bump version to 0.2.0 Changes since 0.1.0: - Fix phantom model param bug in invisible/all commands - Fix macOS SSL certificate issue for YOLO downloads - Use temp file in 'all' pipeline to hide intermediate output - Add legal disclaimer and fix license attribution - Add troubleshooting and upgrade docs to README - Expand test suite to 137 tests covering all CLI modes - Clean up dependencies and pyright config	2026-03-25 12:03:44 -07:00
test-user	1a3d2a448e	Fix macOS SSL cert issue, add troubleshooting and upgrade docs - Add SSL certificate auto-fix in FaceProtector (certifi) - Add Troubleshooting section to README (SSL, first-run downloads) - Add upgrade instructions for pipx/uv tool users	2026-03-25 11:53:02 -07:00
test-user	d7614a7b45	Add legal disclaimer, fix attribution, expand credits - Add disclaimer section to README (research/education purposes) - Remove incorrect Apache-2.0 license claim from ctrlregen docstrings - Expand Credits with CtrlRegen and NeuralBleach attribution - Add license info (MIT) for GeminiWatermarkTool and NeuralBleach	2026-03-25 11:23:28 -07:00
test-user	e5d8970add	Add project files, tests, and documentation for GitHub release - CLI with visible, invisible, all, metadata, and batch commands - Gemini watermark removal via reverse alpha blending - Invisible watermark removal via diffusion regeneration (SynthID, TreeRing) - AI metadata stripping (EXIF, PNG text, C2PA) - Face protection (YOLO/Haar) and analog humanizer - 137 tests covering all CLI modes and core engines - Ruff and Pyright clean	2026-03-25 11:15:05 -07:00

19 Commits