remove-ai-watermarks

mirror of https://github.com/wiltodelta/remove-ai-watermarks.git synced 2026-06-04 18:18:00 +02:00

Author	SHA1	Message	Date
Victor Kuznetsov	8523f48fb6	data(corpus): archive June 2026 SynthID strength-study subjects Back docs/synthid.md section 2.2 with the actual test set: the per-image oracle-verified subjects were only in a local working dir, while the doc claimed they were recorded in data/synthid_corpus/. Ingest the key pos+cleaned pairs so the claim holds. - pos: openai_1/2/3 originals (gpt-image, openai-verify) + gemini_1/2/3/4 originals (Gemini app, gemini-app); all probe as C2PA-SynthID present. - cleaned: OpenAI at strength 0.05 (openai_2 only s010 captured) + Gemini at 0.15 --max-resolution 1536; oracle: SynthID NOT detected. Metadata stripped, so no C2PA on the cleaned rows. - Excluded the third-party issue #14 image (pic3): oracle-verified but not committed to the public corpus. - docs/synthid.md 2.2: state OpenAI n=4 = 3 archived + 1 external-only. - CLAUDE.md: drop the drift-prone "~65 MB" corpus size from the sdist note. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-03 17:09:58 -07:00
Victor Kuznetsov	5ec8269949	chore: mark controlnet pipeline + GFPGAN restore-faces as experimental Both content-preservation features are now flagged EXPERIMENTAL and opt-in. --pipeline controlnet was already opt-in (default=default); --restore-faces flips from on-by-default to OFF by default, matching the repo's prior pattern for experimental preservation passes (the removed protect_text/protect_faces). - cli.py: --restore-faces/--no-restore-faces default False; EXPERIMENTAL in the --restore-faces / --controlnet-scale / --pipeline help; batch default False. - invisible_engine.py: remove_watermark restore_faces default False + docstring. - CLAUDE.md / README.md / docs/synthid.md: label both experimental/opt-in. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-03 16:59:28 -07:00
Victor Kuznetsov	411ef16ec3	feat: GFPGAN face-identity restoration post-pass Add an optional, commercial-safe face-restoration post-pass that recovers face identity the diffusion removal pass drifts (canny holds structure, not likeness) while still scrubbing the pixel watermark in the face regions. - face_restore.py: GFPGANer singleton (CPU unless CUDA), the basicsr torchvision.transforms.functional_tensor shim, and the pure feather _composite_faces helper (unit-tested without the model). GFPGAN re-synthesizes each face from a StyleGAN2 prior, so composited face pixels are GAN-generated (no watermark, no pixel-copy) -- oracle-clean at weight 0.5 with identity preserved. - InvisibleEngine.remove_watermark: restore_faces / restore_faces_weight, best-effort, auto-skips when the extra is absent or no face is detected. - CLI --restore-faces/--no-restore-faces + --restore-faces-weight on invisible/all/batch (on by default). - restore extra (gfpgan/facexlib/basicsr), numpy<2-pinned (scipy<1.18, numba<0.60) and kept out of `all`; basicsr needs Python <3.13 + setuptools<69 to build, so pin .python-version 3.12. Commercial-safe: GFPGAN Apache-2.0, RetinaFace MIT. The CodeFormer alternative is non-commercial and is not shipped. The earlier IP-Adapter FaceID layer was removed (footgun: needs high strength, corrupts faces at the low removal strength). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-03 16:59:28 -07:00
Victor Kuznetsov	d90d5d886a	feat: controlnet pipeline for text/face-structure preservation Add `--pipeline controlnet` (SDXL base + xinsir canny ControlNet via StableDiffusionXLControlNetImg2ImgPipeline): the canny edge map conditions the img2img regeneration so text and face STRUCTURE stay sharp, while the watermark is still removed by the regeneration (`strength`) -- no original pixels are copied or frozen, so SynthID does not survive. Oracle-verified clean on OpenAI with better text/structure fidelity than plain img2img at equal strength. `--controlnet-scale` tunes structure preservation; fp32 on mps/cpu (fp16-fixed VAE on cuda/xpu). Shares the img2img runner (live progress + MPS->CPU fallback) and the fp16-VAE-fix / device-move helpers with the default pipeline. Remove the superseded subsystems -- ctrlregen (SD1.5 clean-noise), text-protection (differential / region-hires) and face-protection: they either destroyed real content or shielded the watermark by re-using original pixels. controlnet replaces them by regenerating everything under edge conditioning. Canny preserves face structure but not identity; face IDENTITY is a separate face-restoration post-pass (CodeFormer/GFPGAN), researched + prototyped but not yet shipped. An IP-Adapter FaceID attempt was built and removed (footgun: needs high strength, corrupts faces at removal strength). Docs: docs/controlnet-removal-pipeline-research.md, scripts/controlnet_sweep.py. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-03 16:59:28 -07:00
Victor Kuznetsov	96038f960f	feat(invisible): vendor-adaptive default strength (OpenAI 0.10 / Google 0.15) The default img2img strength is now chosen from the detected SynthID vendor (C2PA issuer) instead of a single fixed 0.30: OpenAI gpt-image -> 0.10, Google Gemini -> 0.15, unknown source -> 0.15. Explicit --strength always wins. Basis: an oracle-verified June 2026 controlled study (clean v0.8.6, text/face protection OFF, per-image openai.com/verify or Gemini-app verdict). OpenAI's SynthID clears at 0.05 across 1024-1600 px (n=4, resolution-independent); Google's is ~3x more robust and needs 0.15 on the capped-1536 path (n=4). The dominant factor is the VENDOR, not resolution. The earlier single 0.30 default and the "resolution dependence" lore came from contaminated tests run with the protect-text bug ON (issue #14) -- re-running those same 1600x1600 images clean removes SynthID at 0.05. `vendor_for_strength(path)` reads metadata.synthid_source on the ORIGINAL input and is threaded through cli (invisible/all/batch) -> invisible_engine -> watermark_remover -> resolve_strength(strength, profile, vendor), so display and execution use the same vendor (the engine sees a temp path whose C2PA the visible pass already stripped, so detection must happen in the CLI on the pristine source). Caveat: Google's 0.15 was validated only on --max-resolution 1536; native 2816 Gemini was not locally measurable (OOM on Apple Silicon) and is pending GPU validation on raiw.cc. Docs: docs/synthid.md sections 2.2/4.4/5.2 corrected (the contaminated resolution-dependence findings replaced with the clean oracle-verified table); README and CLAUDE.md updated; CLI --strength help reflects the adaptive default. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 19:29:47 -07:00
Victor Kuznetsov	4b0b370ac0	fix(invisible): disable protect-text/protect-faces by default; add docs/synthid.md Both text and face protection were shielding SynthID from removal. The text-protection high-res re-scrub regenerates pixels at an upscaled resolution where the per-region pass may not be strong enough to re-destroy the SynthID payload, allowing it to survive in text areas. Face protection has an even more direct mechanism: it pastes back the original (pre-diffusion, watermarked) face pixels after the global pass, guaranteeing SynthID survives in face regions regardless of strength. Both --protect-text and --protect-faces are now off by default and opt-in. Rename from --no-protect-text / --no-protect-faces to --protect-text / --protect-faces. Extract shared click.option decorators to module-level constants (_protect_text_option, _protect_faces_option) to eliminate copy-paste between cmd_invisible and cmd_all. Add docs/synthid.md: primary-source-cited technical reference for SynthID-Image covering mechanism (post-hoc encoder/decoder, 136-bit payload, pixel-space, no model-weight modification), robustness numbers (arXiv:2510.09263: ~99.98% TPR at 0.1% FPR across 30 transforms), removal attacks and forensic detectability (arXiv:2605.09203: all 6 attacks detectable >98% TPR@1%FPR), detectability limits, oracle scope, adoption landscape, and practical implications including the protect-text/faces SynthID-preservation finding. Verified June 2026 on gpt-image 1600x1600 via openai.com/verify: with --protect-text SynthID detected; without, SynthID removed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 10:28:34 -07:00

6 Commits