remove-ai-watermarks

Mirror of https://github.com/wiltodelta/remove-ai-watermarks.git, synced 2026-05-26 06:07:52 +02:00

Author	SHA1	Message	Date
test-user	03fb460f77	Track the labeled SynthID corpus; complete metadata-source test coverage. Corpus images were gitignored (local-only). The negatives were reviewed and cleared for publishing, so the labeled set is now committed (regular git, 65 MB across 25 files) -- making the removal regression set reproducible and CI-able. Corpus: - Track data/synthid_corpus/images/ (pos 9, neg 15, cleaned 1); keep only the synthetic refs/ calibration fills gitignored. - Reconcile manifest.csv to the on-disk files: 117 -> 25 rows (92 dangling rows for removed images pruned; dedup left one cleaned output, f6dd47a5). - Rewrite the corpus README layout/policy (images committed; review every image for private content before adding -- public repo, permanent history). Test fixtures: - Remove data/samples/not-ai-1/2/3 (personal iPhone photos, incl. GPS EXIF). - Add the clean_photo conftest fixture serving a verified-negative image from the corpus neg/ set; repoint the three "non-AI / clean photo" tests onto it (skips if the corpus is absent). Metadata-source coverage (close the last sub-variant gaps): - c2pa digitalSourceType: algorithmicMedia (procedural, not flagged AI) and compositeWithTrainedAlgorithmicMedia (AI + SynthID proxy). - exif_generator: EXIF Artist and ImageDescription fields (Software/Make/XMP CreatorTool were already covered). All 8 metadata-source kinds are now tested at both the unit and identify() level. 313 tests pass. CLAUDE.md updated (corpus tracked, clean_photo fixture). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 14:46:47 -07:00
test-user	3ebdee57b8	Test the untested pure logic: MPS fallback, tiling, isobmff/c2pa edges. Coverage audit (pytest --cov) found real, non-model logic at 0%/low cover. Add unit tests that need no model download: - img2img_runner.py 0% -> 100%: the MPS->CPU fallback orchestration, mocked via injected load_pipeline/reload_on_cpu callables. Guards the production behavior hit this session (native-res SDXL OOMs on MPS, must retry on CPU; non-MPS errors must propagate; "mps"-worded error on a cpu device must not reload). - ctrlregen/tiling.py 0% -> 40%: the pure tile math (tile_positions, make_blend_weight, resize_center_crop) that decides how large images are split and blended. (run_tiled stays model-bound, untested.) - isobmff.py 93% -> 100%: size==0 (box-to-EOF) and truncated 64-bit largesize parsing branches for AVIF/HEIF/JXL C2PA stripping. - c2pa.py: non-PNG-signed .png reads as clean (has_c2pa_metadata / extract_c2pa_chunk) instead of mis-parsing. 309 tests pass (+23). Document in CLAUDE.md that these pure helpers are unit-tested without downloads so future sessions don't skip them as "ML". No src/ change, no release. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 14:21:32 -07:00
test-user	6cef1d59f0	fix(c2pa): drop non-printable claim_generator garbage. On some manifests (observed: Microsoft Designer) the first CBOR "name" key precedes a binary hash field, not the generator string, so _cbor_text_after returns control-char garbage. Guard with isprintable() to drop it; issuer detection (byte-search) and the SynthID verdict are unaffected. Adds TestParseChunkGuards covering kept-vs-dropped cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 15:55:07 -07:00
test-user	f07ce10c72	feat(metadata): SynthID-source detection, C2PA parser consolidation, corpus + tests. Detect SynthID-bearing images via their C2PA companion: a manifest signed by a SynthID-using vendor (Google/OpenAI) on AI-generated content implies an invisible SynthID pixel watermark. Verified end-to-end against the vendor oracles (openai.com/verify, Gemini "Verify with SynthID"). - metadata: synthid_source() + synthid_watermark verdict in get_ai_metadata, surfaced as a `metadata --check` callout. Format-agnostic (PNG caBX parser + JPEG/WebP/AVIF/HEIF/JXL binary scan). - constants: SYNTHID_C2PA_ISSUERS {Google, OpenAI}; +opened/placed actions. - c2pa: single CBOR-aware parser (_cbor_text_after) replaces glitchy regex (fixes fGPT-4o claim_generator); removed duplicate _scan_png_c2pa_chunk from metadata; shared synthid_verdict / synthid_vendors_in helpers. - corpus: scripts/synthid_corpus.py ingest tool + data/synthid_corpus/ (manifest tracked, images gitignored) for a labeled reference set. - tests: +38 across C2PA parser internals, extract/inject round-trip, ISOBMFF container stripping, all IPTC AI markers, and invisible watermark strength tiers (SynthID/StableSignature/TreeRing/StegaStamp/RingID/RivaGAN/...). Pixel-level SynthID detection remains out of reach locally (Google's decoder is proprietary); a from-scratch spectral pilot confirmed it does not separate real content. See CLAUDE.md for the full evaluation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 11:32:46 -07:00
test-user	e5d8970add	Add project files, tests, and documentation for GitHub release. - CLI with visible, invisible, all, metadata, and batch commands - Gemini watermark removal via reverse alpha blending - Invisible watermark removal via diffusion regeneration (SynthID, TreeRing) - AI metadata stripping (EXIF, PNG text, C2PA) - Face protection (YOLO/Haar) and analog humanizer - 137 tests covering all CLI modes and core engines - Ruff and Pyright clean	2026-03-25 11:15:05 -07:00
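
The MPS->CPU fallback rules listed in commit 3ebdee57b8 can be sketched roughly as follows. This is not the code in img2img_runner.py: the function name `run_with_mps_fallback`, the `run` callable, the callable signatures, and the choice to detect an MPS failure by looking for "mps" in a `RuntimeError` message are all assumptions made for illustration. Only the injected `load_pipeline` / `reload_on_cpu` names and the three behaviors (retry on CPU, propagate non-MPS errors, never reload on a cpu device) come from the commit message.

```python
from typing import Any, Callable


def run_with_mps_fallback(
    device: str,
    load_pipeline: Callable[[str], Any],
    reload_on_cpu: Callable[[], Any],
    run: Callable[[Any], Any],
) -> Any:
    """Run a pipeline once, retrying on CPU only after an MPS failure.

    Hypothetical helper: the signatures here are assumed, not taken
    from the repository.
    """
    pipeline = load_pipeline(device)
    try:
        return run(pipeline)
    except RuntimeError as exc:
        # Assumption: an MPS failure is recognized by its message text.
        mps_worded = "mps" in str(exc).lower()
        # Non-MPS errors propagate, and an "mps"-worded error raised
        # while already on another device must not trigger a reload.
        if device != "mps" or not mps_worded:
            raise
        # MPS failure on an MPS device: reload on CPU and retry once.
        return run(reload_on_cpu())
```

Because both loaders are injected, a test can pass plain functions that record their calls and raise on demand, so no model download is needed to exercise each branch.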

5 Commits
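
For readers unfamiliar with the two ISOBMFF edge cases named in commit 3ebdee57b8 (size==0 meaning "box runs to end of file", and a truncated 64-bit largesize), the following is a minimal sketch of a top-level box walker that handles both. It follows the standard ISO base media box header layout (32-bit big-endian size, 4-byte type, optional 64-bit largesize when size is 1). The function name `iter_boxes` and its return shape are hypothetical and are not taken from the repository's isobmff.py.

```python
import struct
from typing import Iterator, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (box_type, payload_start, box_end) for each top-level box.

    Hypothetical helper for illustration; stops quietly on malformed input.
    """
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # size == 1: the real size is a 64-bit largesize after the type.
            if offset + 16 > len(data):
                return  # truncated largesize: nothing more can be parsed
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # size == 0: the box extends to the end of the file.
            size = len(data) - offset
        if size < header or offset + size > len(data):
            return  # declared size is inconsistent with the data
        yield box_type, offset + header, offset + size
        offset += size
```

A stripping routine built on such a walker would copy every box except the ones carrying the manifest; which box types those are is left out here because the commit message does not name them.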