remove-ai-watermarks

mirror of https://github.com/wiltodelta/remove-ai-watermarks.git synced 2026-06-05 18:46:34 +02:00

Author	SHA1	Message	Date
test-user	64be9598f2	fix(identify): device-token-first C2PA attribution; add verified Pixel (v0.6.2) Replaces the claim-generator-string match with a distinctive device-token scan of the manifest bytes (_device_platform / _DEVICE_C2PA_PLATFORM), which is more robust: it catches devices where the generator name lives under a non-standard CBOR key (Pixel uses `claim_generator_info`, so it has no `claim_generator`). - Adds Google Pixel, verified against a real Pixel 10 Pro C2PA file (attached to c2pa-rs issue #1609/#1554): cert CN "Pixel Camera", digitalSourceType `computationalCapture` -> capture authenticity, not AI (is_ai stays None). - Token distinctiveness is load-bearing: bare "Truepic" matched the OpenAI chatgpt-1.png fixture (Truepic is a trust-chain signing authority), so the token is the specific "Truepic_Lens"; "Pixel Camera" (cert CN) not "Pixel". - Verified Leica/Nikon/Truepic/Pixel attribute correctly and OpenAI/Adobe/MJ do not regress. Sony/Canon/Samsung/Bria stay unmapped: no public direct- download C2PA sample exists to verify their in-manifest string. - Regression tests: device token beats incidental issuer mentions (Leica, Pixel-vs-Google). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 20:43:40 -07:00
test-user	dda2ee7fbb	fix(identify): attribute C2PA by claim_generator, not incidental issuer tokens (v0.6.1) Verified on real signed files that the issuer byte-scan mis-attributes multi-entity manifests: Leica read as "Truepic" (timestamp authority in the chain), Nikon as "Adobe Firefly" (XMP-toolkit "Adobe" + the sample's "Adobe_MAX" name), Truepic as "Google". Platform attribution now prefers the claim generator (what produced the asset) and falls back to the issuer scan. - New _CLAIM_GENERATOR_PLATFORM map + _platform_from_generator; claim generator read for non-PNG via the now-public c2pa.cbor_text_after. - Device tokens listed only where verified against a real C2PA file (Leica lc_c2pa, Nikon, Truepic Lens); Pixel/Samsung/Sony/Canon/Bria deferred until a real sample confirms the in-manifest string. Camera C2PA marks capture authenticity, so these never set is_ai. - cbor_text_after made public (was _cbor_text_after); call sites + tests updated. - Regression test: claim_generator beats incidental Adobe/Google/Truepic tokens. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 20:10:07 -07:00
test-user	c196a16900	feat: detect soft-binding vendors, IPTC 2025.1, video/audio C2PA, TrustMark (v0.6.0) Broadens metadata provenance coverage at the detection and container-strip level. Detection: - C2PA soft-binding `alg` -> forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, Steg.AI, Microsoft, ...) via C2PA_SOFT_BINDINGS + soft_binding_vendors_in(); names the watermark vendor even when the watermark itself can't be decoded. - IPTC Photo Metadata 2025.1 AI-disclosure XMP fields (AISystemUsed etc.) via iptc_ai_system() + IPTC_AI_FIELD_MARKERS. - Adobe TrustMark open keyless decoder (trustmark_detector.py, optional extra `trustmark`) -- the watermark behind Adobe Durable Content Credentials. Detects provenance, not AI origin, so it does not assert is_ai. Removal / containers: - isobmff.strip_c2pa_boxes now also drops a top-level XMP uuid box that carries an AI label (matched by AI-marker content, byte-order-robust; plain XMP kept). - remove_ai_metadata routes MP4/MOV/M4V/M4A (and any ftyp-sniffed ISOBMFF) through the box stripper; raises a clear error for non-ISOBMFF audio/video (WebM/MP3/WAV) instead of crashing in the image path. Tests: soft-binding scan, IPTC element/attribute/presence, MP4 + M4A detect/ strip, ISOBMFF XMP surgical strip, content-sniff, unsupported-container guard, TrustMark absent-safety + identify integration. ruff clean; pyright clean on all new modules. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 17:56:48 -07:00
test-user	ba94de8275	feat: strip AI-provenance EXIF tags on removal (v0.5.6) remove_ai_metadata now scrubs AI tags from the JPEG EXIF instead of passing the block through wholesale. Closes the v0.5.5 follow-up: the xAI/Grok Signature + UUID-Artist pair was detected but not removed. - metadata._scrub_ai_exif(): deletes the xAI signature pair and any Software/Make/Artist/ImageDescription tag carrying an AI_GENERATOR_TOKENS token (so Ideogram's Make="Ideogram AI" is scrubbed too), keeping genuine camera/editor EXIF intact. - Shared _is_xai_signature_pair / _exif_text helpers (module-level compiled regexes) are now the single source of truth, used by both xai_signature and _scrub_ai_exif. - Tests: Grok signature stripped on JPEG output, Ideogram Make stripped, real-camera Make ("Apple") preserved. 325 passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 14:26:20 -07:00
test-user	74618b91a7	feat: detect xAI/Grok EXIF signature; refresh watermarking landscape (v0.5.5) xAI Grok (Aurora) images carry no C2PA/SynthID/IPTC -- their only provenance signal is an EXIF pair: ImageDescription "Signature: <base64>" + a UUID Artist. Verified stable across 3 genuine generations (a real download previously read as unknown / "no AI metadata"). - metadata.xai_signature(): matches the Signature blob + UUID Artist pair; wired into has_ai_metadata, get_ai_metadata, and identify (platform "xAI (Grok / Aurora)"). - data/samples/grok-1.jpg: real Grok fixture (neutral content; the Artist UUID is the public image id, not PII). - Tests: synthetic-fixture unit tests, real-sample assertion, identify integration (322 passing). Docs (research refresh, May 2026): - C2PA 2.4 Durable Content Credentials (soft-binding re-discovery after the embedded manifest is stripped). - New AI-labeling laws, primary-source verified: EU AI Act Art 50 (2026-08-02), South Korea AI Framework Act Art 31(3), California AB 853. - Hedge removal claims: defeating the SynthID verifier is not forensic invisibility (arXiv:2605.09203); cite SynthID-Image (arXiv:2510.09263). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 14:14:35 -07:00
test-user	d24d8a4b14	Extract _target_size helper + regression-test native resolution (v0.5.4) The native-vs-downscale decision in InvisibleEngine.remove_watermark (the issue #10/#15 fix: max_resolution=0 must not pre-downscale, since any downscale both loses quality and lets SynthID survive) had no test. Extract it into a pure helper invisible_engine._target_size(w, h, max_resolution) and cover it with tests/test_invisible_engine.py::TestTargetSize so a re-introduced forced downscale fails CI instead of silently regressing #15. Also: - Clamp the short side to >=1 in _target_size: extreme aspect ratios (e.g. 5000x3 with --max-resolution 1024) truncated it to 0 and crashed image.resize(). Pre-existing in the inline math; fixed now that it is a named, tested function. - Consolidate the two duplicated temp-file save blocks into one unconditional save (behavior unchanged: the EXIF-transposed image is still always persisted before WatermarkRemover reloads it by path), and drop the now-redundant `_tmp_path is not None` guard in finally. - Bump version 0.5.3 -> 0.5.4 (pyproject, __init__, uv.lock); document the helper as the regression guard in CLAUDE.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 14:09:33 -07:00
test-user	d45f0806a0	chore(release): v0.5.3 — detect China TC260 AIGC label (Doubao) - feat(identify): detect the China TC260 <TC260:AIGC> XMP label (Doubao and other China-served generators); reports platform + ContentProducer. Removal already strips it via the existing metadata cleaner. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 12:30:40 -07:00
test-user	c7f0d71f90	feat(identify): detect China TC260 AIGC label (Doubao et al.) China-served generators embed an XMP <TC260:AIGC>{"Label":"1",...} block (China's mandatory AI-content labeling, TC260 standard). Doubao (ByteDance) uses it -- verified on the real #13 sample. It's none of C2PA / SynthID / imwatermark / IPTC, so identify() previously returned unknown. - metadata: AIGC_MARKERS + aigc_label() (json-decodes the HTML-entity-encoded block); has_ai_metadata + get_ai_metadata now surface it. - identify: new 'aigc' signal -> is_ai True, platform 'China AIGC-labeled generator (TC260; e.g. Doubao)', carries the ContentProducer code. - Container-agnostic raw-byte scan, so it covers the whole China-AIGC ecosystem (Jimeng/Kling/Qwen/Ernie share the standard). - Tests: synthetic TC260 block (metadata + identify). Docs updated. Addresses #13. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 12:29:51 -07:00
test-user	37769453a9	chore(release): v0.5.2 — native-resolution invisible removal (fixes #10) - fix(invisible): process at native resolution by default; the forced downscale-to-1024 -> upscale-back round-trip was the main quality loss (#10). Matches the raiw.cc backend (fal fast-sdxl = sdxl-base-1.0). New --max-resolution opt-in cap for GPU/MPS memory. - docs: verified fal checkpoint, native-res, gpt-image-2 SynthID. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 10:00:25 -07:00
test-user	20a754ef98	chore(release): v0.5.1 — security + bug fixes - security: bump idna 3.11 -> 3.16 (GHSA-65pc-fj4g-8rjx) - fix(ctrlregen): correct module import paths (#11, @neosun100) - fix(cli): preserve alpha channel through visible/all/batch (#8, @rlorenzo) - fix(cli): safer re-exec via -m instead of repr(sys.argv) -c string (#9, @eskibars) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 09:21:53 -07:00
test-user	0762807e42	chore(release): v0.5.0 — image provenance (identify) + SD/SDXL/FLUX + EXIF/XMP detection New since v0.4.1: - identify command: aggregate C2PA, IPTC, SD/ComfyUI params, SynthID proxy, visible sparkle, open invisible watermark into one provenance verdict (--json, --no-visible). - Open SD/SDXL/FLUX invisible-watermark detection (imwatermark, extra: detect). - EXIF Software / XMP CreatorTool generator-tag reading (incl. AVIF/HEIF). - Stability AI + Microsoft/Bing C2PA issuers; SynthID metadata detection. - SynthID reference corpus + experimental pixel-carrier probe. - Fix: __version__ was stuck at 0.3.4 (banner mismatch), now synced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 08:58:45 -07:00
test-user	27ad5b7645	feat(identify): detect open SD/SDXL/FLUX invisible watermark Research found one locally-fillable detection gap: Stable Diffusion, SDXL, and FLUX all embed an open DWT-DCT watermark via the invisible-watermark (imwatermark) library -- a PUBLIC decoder, no secret key, unlike SynthID. New invisible_watermark.py decodes the known fixed patterns (verified against upstream source: diffusers SDXL WATERMARK_MESSAGE, FLUX.2 src/flux2/watermark.py, and the 'StableDiffusionV1' default string) and identify() reports the scheme as a high-confidence signal. Verified locally end-to-end: embedding SDXL's exact 48-bit message and decoding it back recovers 48/48 bits; a clean image and our own fal-SDXL outputs decode to ~21/48 (no match). Caveat baked into the report: the watermark is fragile -- gone after JPEG q90 -- so it confirms origin only on pristine files; absence is never proof. imwatermark is an optional dep (extra 'detect'; pulls non-headless opencv), so the import is guarded and the signal is skipped when absent. CLI --no-visible now means metadata-only (skips both pixel-domain detectors). Also records the broader watermarking landscape in CLAUDE.md: which services are locally detectable (SD/SDXL/FLUX), C2PA-covered (Bing/Canva/ Getty/Shutterstock unsampled), or proprietary-only like SynthID (Amazon Titan/Nova, Kakao). Midjourney embeds neither C2PA nor an invisible mark. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 16:53:59 -07:00
test-user	0e04c4d388	chore(release): v0.4.1 — security fix (diffusers, urllib3) - Bump diffusers minimum to 0.38.0 (closes GHSA-98h9-4798-4q5v). - Refresh uv.lock to pull urllib3 2.7.0 (closes GHSA-qccp-gfcp-xxvc and GHSA-mf9v-mfxr-j63j via transitive update from requests / huggingface-hub). - Allow pre-releases globally (`[tool.uv] prerelease = "allow"`) because diffusers 0.38.0 declares a dependency on safetensors>=0.8.0rc0. Drop once safetensors 0.8.0 stable is published or diffusers re-pins. uv-secure --ignore-unfixed now reports zero vulnerabilities. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 13:14:50 -07:00
test-user	1c1ebec148	chore(release): v0.4.0 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 12:57:13 -07:00
test-user	eb1f65ae45	fix(gemini): no-op remove_watermark when nothing detected Reverse alpha blending applied at the assumed default position painted a visible inverse-sparkle artifact onto clean or edited images. The function now returns an unmodified copy when detection fails, instead of falling back to the hardcoded Gemini corner. Bump to 0.3.5. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 12:17:25 -07:00
test-user	87d02126e3	feat(metadata): parse C2PA JUMBF manifest fields, add Images 2.0 sample, bump to 0.3.4 - metadata --check now shows claim_generator, c2pa_spec, digital_source_type, c2pa_actions, signer instead of empty table for C2PA-only files - reuses existing extract_c2pa_chunk() from noai/c2pa.py — no more duplicate PNG chunk parsing or full-file reads - adds data/samples/openai-images-2/amur-leopard.png: real gpt-image-2 output with C2PA manifest signed by OpenAI OpCo LLC / Trufo CA (spec 2.2.0) - removes stale data/samples/nano-banana-1/2.png (no longer referenced) - updates README: new Images 2.0 row in supported models table - documents known text-degradation limitation in CLAUDE.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 17:21:51 -07:00
test-user	4a7950e054	chore: bump version to 0.3.3	2026-04-01 12:07:47 -07:00
test-user	7eb32fedee	refactor: enforce strict linting and type checking across codebase - Expand ruff rules (B, S, SIM, RET, COM, C4, G, PT, PIE, T20, DTZ, ICN, TCH, RUF, ANN) - Switch pyright to strict mode with relaxed test environment - Replace try-except-pass with contextlib.suppress throughout - Move type-only imports into TYPE_CHECKING blocks - Replace ambiguous Unicode chars (en dash, multiplication sign, Greek alpha) with ASCII - Move color-matcher from base deps to [gpu], remove unused requests dep - Add pyright to dev deps, update dependabot to uv ecosystem - Fix hardcoded version in test_version, unused unpacked vars in tests - Update maintain.sh, CLAUDE.md, .gitignore, .claude/settings.json - Remove obsolete .agents/rules/project.md - Upgrade all dependencies (Pygments vulnerability fix) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 11:42:42 -07:00
test-user	47caacf5dc	chore: bump version to 0.3.2	2026-03-26 20:31:54 -07:00
test-user	b41c8e5aba	v0.3.1: Fix opencv conflict, graceful GPU fallback, correct docs - Remove opencv-python from [gpu] extra (conflicts with headless in base deps) - Add graceful fallback in 'invisible' and 'all' commands when GPU deps missing - Cache InvisibleEngine in batch mode (avoid reloading model per image) - Fix --humanize help text (was '0.0-1.0', actual range is 0-6.0+) - Fix stale docstring referencing non-existent [invisible] extra - Add [gpu] extra install instructions to README - Fix broken NeuralBleach placeholder URL in Credits	2026-03-26 10:50:26 -07:00
test-user	49b2b43f8d	feat: split heavy GPU deps into optional [gpu] extra Move torch, diffusers, transformers, accelerate, controlnet-aux, ultralytics, and safetensors into [project.optional-dependencies.gpu]. Core install now only includes lightweight deps (~20 MB vs ~1 GB): pillow, piexif, numpy, opencv-python-headless, click, rich. This allows web apps using fal.ai cloud GPU to skip installing 1+ GB of ML packages, reducing Docker images from 3 GB to ~300 MB and deploy times from 14 minutes to ~3-4 minutes. Usage: pip install remove-ai-watermarks # core only (visible + metadata) pip install remove-ai-watermarks[gpu] # full local GPU support pip install remove-ai-watermarks[all] # gpu + dev tools	2026-03-26 09:29:57 -07:00
test-user	2bdc4bceff	Bump version to 0.3.0	2026-03-25 17:27:39 -07:00
test-user	1890848ec3	SEO-optimized README, add sample images from multiple AI models - Rewrite README for SEO: Nano Banana, SynthID, Made with AI, C2PA keywords - Add Supported Models table with 7 AI services - Add 'Made with AI' label removal to features - Rename sections for search discoverability - Add samples: ChatGPT/DALL-E, Midjourney, Adobe Firefly - Reorganize data/samples with flat structure and clear naming	2026-03-25 17:23:24 -07:00
test-user	507757738e	v0.2.2: Unify quality defaults, improve README - Unify 'all' defaults to match 'invisible' (strength=0.02, steps=100) - Reorder CLI docs: 'all' command first, individual commands second - HuggingFace token is now documented as optional - Remove 'additional setup' label from invisible section	2026-03-25 12:28:02 -07:00
test-user	2152ebcd32	v0.2.1: Code review fixes, platform-neutral docs - Fix f-string logging → %-style (face_protector, invisible_engine) - Fix logger name: hardcoded string → __name__ - Add module docstrings to humanizer.py, face_protector.py - Break long warning string into multiple lines (PEP 8) - Make docs platform-neutral (macOS/Linux/Windows) - Rename 'optional' → 'additional setup' in README	2026-03-25 12:19:29 -07:00
test-user	cace97b04e	Bump version to 0.2.0 Changes since 0.1.0: - Fix phantom model param bug in invisible/all commands - Fix macOS SSL certificate issue for YOLO downloads - Use temp file in 'all' pipeline to hide intermediate output - Add legal disclaimer and fix license attribution - Add troubleshooting and upgrade docs to README - Expand test suite to 137 tests covering all CLI modes - Clean up dependencies and pyright config	2026-03-25 12:03:44 -07:00
test-user	e5d8970add	Add project files, tests, and documentation for GitHub release - CLI with visible, invisible, all, metadata, and batch commands - Gemini watermark removal via reverse alpha blending - Invisible watermark removal via diffusion regeneration (SynthID, TreeRing) - AI metadata stripping (EXIF, PNG text, C2PA) - Face protection (YOLO/Haar) and analog humanizer - 137 tests covering all CLI modes and core engines - Ruff and Pyright clean	2026-03-25 11:15:05 -07:00
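
Commits 74618b91a7 (v0.5.5) and ba94de8275 (v0.5.6) describe the xAI/Grok signal as an EXIF pair: an ImageDescription of the form "Signature: <base64>" plus a UUID in Artist, matched with module-level compiled regexes. The following is a minimal sketch of such a pair check, not the repository's code. The function name `looks_like_xai_pair`, both regex patterns, and the 16-character minimum for the base64 blob are assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical patterns: the log only says "Signature: <base64>" and "UUID Artist".
# The 16-character minimum is an assumed guard against trivially short strings.
_SIGNATURE_RE = re.compile(r"^Signature:\s*[A-Za-z0-9+/=_-]{16,}$")
_UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def looks_like_xai_pair(image_description: Optional[str], artist: Optional[str]) -> bool:
    """Return True only when BOTH tags match; either one alone is not a signal."""
    if not image_description or not artist:
        return False
    return bool(
        _SIGNATURE_RE.match(image_description.strip())
        and _UUID_RE.match(artist.strip())
    )
```

Requiring both tags together mirrors the "pair" wording in the commit messages: a UUID-like Artist or a description starting with "Signature:" may occur on its own in unrelated files.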

27 Commits
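
Commit d24d8a4b14 (v0.5.4) describes a pure helper, invisible_engine._target_size(w, h, max_resolution), with two properties: max_resolution=0 must not pre-downscale, and the short side is clamped to >=1 so that extreme aspect ratios (the message cites 5000x3 with --max-resolution 1024) no longer truncate to 0. The sketch below illustrates that behavior under stated assumptions and is not the repository's implementation. The name `target_size_sketch`, the integer floor-division scaling, and leaving images that already fit the cap untouched are all assumptions.

```python
from typing import Tuple


def target_size_sketch(w: int, h: int, max_resolution: int) -> Tuple[int, int]:
    """Hypothetical stand-in for the helper named in the commit message.

    Returns the size to process at. Returning (w, h) unchanged means
    "process at native resolution".
    """
    if w <= 0 or h <= 0:
        raise ValueError("width and height must be positive")

    long_side = max(w, h)

    # max_resolution == 0 means "no cap": never pre-downscale.
    # Assumption: images that already fit under the cap are left as-is.
    if max_resolution <= 0 or long_side <= max_resolution:
        return w, h

    # Integer scaling keeps the long side exactly at the cap and avoids
    # floating-point rounding surprises.
    new_w = w * max_resolution // long_side
    new_h = h * max_resolution // long_side

    # Floor division can yield 0 for the short side on extreme aspect
    # ratios, so clamp both sides to at least 1 pixel.
    return max(1, new_w), max(1, new_h)
```

With this sketch, a 5000x3 input capped at 1024 maps to 1024x1 instead of a zero-height size, and a cap of 0 returns the input dimensions unchanged.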