remove-ai-watermarks

mirror of https://github.com/wiltodelta/remove-ai-watermarks.git synced 2026-07-25 00:50:48 +02:00

Author	SHA1	Message	Date
Victor Kuznetsov	69559226d7	Clarify metadata command supports video/audio, drop misfiring format warning (#33) The `metadata` command handles more than images: `remove_ai_metadata` strips C2PA / AIGC provenance from MP4/MOV/M4V/M4A and from WebM/MP3/WAV/FLAC/OGG via ffmpeg. But the help said "from images" and the shared `_validate_image` call printed "Warning: .mp4 may not be supported" on exactly those supported containers. The argument's `exists=True` already enforces the file exists, so the validation call only added the wrong warning here. Update the docstring to list the real format coverage and drop the image-only validation from this command. The image commands keep it. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 13:20:04 -07:00
Victor Kuznetsov	29da3c52b6	Raise default SynthID-removal strength 0.05 → 0.10 (current Google SynthID) (#32) * Raise default SynthID-removal strength 0.05 -> 0.10 (current Google SynthID) The old default (0.04/0.05) no longer removes the CURRENT Google SynthID (Nano Banana / Gemini 3): verified 2026-05-30 via the Gemini 'Verify with SynthID' oracle on a real image -- 0.05 still detected, 0.10 not detected (OpenAI's was already cleared at 0.05). Add DEFAULT_STRENGTH=0.10 in watermark_profiles, route the engine + CLI defaults to it. At 0.10 small text deforms more, which is why text protection (_run_region_hires) runs by default. CLAUDE.md SynthID note corrected. CAVEAT: n=1 Google + n=1 OpenAI; broad corpus oracle validation pending (task tracked). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Drop unused LOW/MEDIUM/HIGH strength profiles; CLI --strength defaults to DEFAULT_STRENGTH The fixed strength presets (and get_recommended_strength) were dead -- nothing in the pipeline used them, only tests. One knob now: DEFAULT_STRENGTH (0.10), overridable per-call via the CLI --strength flag, which now defaults to that constant (single source of truth). Removed the WatermarkRemover.LOW/MEDIUM/HIGH class attrs and the get_recommended_strength tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 13:15:58 -07:00
Victor Kuznetsov	e4f558dccf	Add per-region high-resolution text protection (regenerate crisp, scrub everywhere) (#31) Replace the default text-protection path. Differential Diffusion froze text in latent space, which left SynthID intact inside text (violating remove-everywhere) and still softened sub-8px strokes (VAE latent limit). _run_region_hires instead scrubs the whole image, then re-scrubs each detected text block at high resolution and feather-composites it back: every pixel is regenerated (watermark removed everywhere) while small text stays crisp (high-res strokes span >1 latent cell). merge_text_regions + feather_paste are pure and unit-tested; each re-scrubbed patch is phase-correlated back to the original crop to null the ~1-2px round-trip offset. Synthetic 18px multilingual text: text-region SSIM 0.28 -> 0.48, visually garbled -> readable across Latin/Cyrillic/CJK. Legacy _run_differential / build_change_map remain but are no longer the default. Prod use still requires confirming via the SynthID oracle that re-scrubbed text zones read watermark-free. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 12:59:29 -07:00
Victor Kuznetsov	c928ee6e42	chore(release): v0.7.0 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> v0.7.0	2026-05-30 12:35:32 -07:00
Victor Kuznetsov	89f427852f	Fix #30 white box: stop zeroing alpha in the watermark region on save On RGBA inputs the CLI forced the watermark bbox alpha to 0 on save, so the removed-sparkle area became a transparent hole that renders as a solid white box on any non-transparent viewer. The Gemini app exports opaque RGBA, so every user hit it. Reverse-alpha already recovers the real pixels there (and `erase` inpaints them), so there is no artifact to hide -- the hole was the bug, introduced as an over-correction in `d091b9f`. `_write_bgr_with_alpha` now rejoins the input alpha plane unchanged (drops the `clear_region`/`pad` params); the `visible` / `erase` / `all` / `batch` call sites drop the cleared-region argument and the orphaned region bookkeeping. The registry `remove()` still returns the mark bbox (used for inpaint_residual positioning); the CLI just no longer clears alpha with it. Inverts the test that locked in the old behavior into a #30 regression guard (watermark-region alpha stays opaque, no pixel forced transparent). Verified end-to-end on a real Gemini RGBA export: sparkle gone, zero transparent pixels, clean over a white background. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 12:27:37 -07:00
Victor Kuznetsov	25a1acc53b	Detect TC260 AIGC label in JPEG EXIF and late/attribute PNG XMP A corpus audit surfaced China TC260 AIGC-labeled images that `identify` missed. Three detection gaps in `aigc_label`, all fixed: - raw-JSON `{"AIGC":{...}}` in JPEG EXIF (UserComment): brace-matched from the scan head with `json.raw_decode`, gated on a TC260 field like the PNG-chunk path. (Doubao-class output via that export surface.) - XMP attribute form `TC260:AIGC="{...}"` (PicWish): folded into the element regex as a second alternation. - TC260 XMP packet appended after a large `IDAT`, past the 1 MB scan window: `scan_head` now appends late PNG metadata chunks via `_png_late_metadata`, mirroring the existing ISOBMFF late-box scan. Adds `scripts/corpus_gap_scan.py`: runs `identify` over a corpus, writes the per-file report CSV, and flags `unknown` files that carry a known marker in their metadata region (the audit that found these gaps). Scanning only the metadata region — not the whole file — avoids the random short-token collisions inside compressed PNG/JPEG streams. On the local corpus this lifts 3 files from `unknown` to AI (China AIGC) and leaves zero false gap candidates. Synthetic piexif/PngInfo fixtures cover all three forms. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 11:44:53 -07:00
Victor Kuznetsov	58bdf51c59	Visible-watermark registry: reverse-alpha-only Doubao + Gemini, exact native recovery (#28) * fix(trustmark): gate detection on re-encode durability to kill false positives TrustMark's wm_present flag is a BCH validity check that spuriously validates on a content-correlated fraction of un-watermarked images (AI textures trip it more than camera photos). On a 1343-image set all 20 raw detections were false, several on Gemini/OpenAI/Doubao output that cannot carry Adobe's watermark, with random-bytes secrets. A genuine TrustMark is a durable soft binding that survives re-encoding, so detect_trustmark now re-decodes after a mild JPEG round-trip and requires the same schema both times. Every observed false positive collapsed under this gate; the second decode runs only on the rare hit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(identify): Samsung Galaxy AI, FLUX, ByteDance C2PA; fix C2PA substring FP Detection extensions verified on real signed files (2026-05-29): - Samsung Galaxy AI: signer attribution via a new _SIGNER_C2PA_PLATFORM (Samsung Galaxy / ASUS Gallery) kept separate from the capture-camera _DEVICE_C2PA_PLATFORM so a Galaxy AI edit (device cert + AI source type) does not trip the camera-vs-AI integrity clash. Plus metadata.samsung_genai: the proprietary genAIType marker in PhotoEditor_Re_Edit_Data, a medium-confidence AI-editing signal (samsung_only branch). - Black Forest Labs (FLUX) and ByteDance Volcano Engine (Doubao/Jimeng) added as C2PA issuers + issuer->platform mappings. - fix: C2PA presence required only the bare 4-byte 'c2pa' substring, which false-positives on compressed pixel data (a recompressed PNG IDAT re-flagged C2PA after its manifest was correctly stripped). New c2pa_marker_in() requires the JUMBF wrapper (jumb+c2pa) or the C2PA uuid box; applied in identify + metadata. Verified: all 535 real C2PA files carry jumb. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(doubao): gate detection on text structure to cut ~95% of false positives (#23) Coverage alone over-fired: any textured bottom-right corner cleared the threshold, so the detector false-positived on ~28% of arbitrary images. The real '豆包AI生成' mark is six glyphs in one row, so detect now also requires the text-structure signature (_glyph_structure): many connected components, no single dominant blob, concentration in a thin horizontal band. False positives dropped 343 -> 17 across the corpus while keeping real-mark recall and the doubao-1.png sample. Also accept a no-op force kwarg for remover-interface symmetry. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(samsung): add Samsung Galaxy AI visible-badge remover New samsung_engine.py removes the bottom-left sparkle + localized 'AI-generated content' badge that Galaxy AI tools stamp. Mirrors the Doubao locate->mask->inpaint pattern but bottom-left, with a dual-polarity top-hat mask (the badge is light-on-dark or dark-on-light). Detection gates on a band + left-anchor signature (the Doubao CJK-component gate does not transfer: Latin badge letters connect into few blobs). Explicit-only -- tuned on few real badges with a ~4% FP floor, so it is not used in auto. Synthetic byte-blob fixtures (real badges are user content, not shipped). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(visible): unified known-watermark registry + LaMa inpaint backend watermark_registry.py is a single catalog of known visible marks, each tying {usual location, in_auto flag, recovery strategy, detect adapter, remove adapter}: gemini (reverse-alpha, exact), doubao, samsung. cmd_visible is now registry-driven (best_auto_mark for --mark auto; mark_keys() feeds the CLI choices) -- the per-mark _run_doubao/_run_samsung helper branches are gone. 
Cross-engine confidences are not comparable, so the gemini adapter applies the corpus-validated 0.5 sparkle threshold for auto arbitration (its engine flag is loose and weakly fired ~0.36 on Doubao text, hijacking auto). --backend auto|cv2|lama chooses background reconstruction for the mask-based marks; auto = LaMa when onnxruntime is present, else cv2. For LaMa the mask is the FILLED glyph bounding box (sparse glyph masks leave anti-aliased edges behind). cv2 stays the zero-dependency fallback. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: watermark registry, Samsung/FLUX/ByteDance detection, LaMa backend, trustmark gate Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(doubao): exact reverse-alpha removal from captured alpha map The Doubao '豆包AI生成' mark is a fixed semi-transparent white overlay, so given its alpha map the original pixels are recovered exactly: original = (wm - a*logo)/(1-a) -- no inpaint hallucination. The alpha map + logo colour were solved from real black+gray Doubao captures on a controlled background: on black captured = a*logo, and the black/gray pair solves a per-pixel without assuming the logo colour (a_max~0.65, logo near-white); the white capture cross-validates (mark vanishes to a flat fill). Bundled as assets/doubao_alpha.png + geometry constants. remove_watermark_reverse_alpha applies it scaled to image width; exact at the captured width, so the registry routes doubao through it only when reverse_alpha_available (width within the calibrated band) and the mark is detected, falling back to mask inpaint (cv2/LaMa) otherwise. A light residual inpaint cleans the sub-pixel rescaling error. Add captures at more resolutions to widen exact coverage. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(visible): reverse-alpha only -- drop inpaint removal + heuristic detection Per the principle that we only remove/detect what we can do exactly, the visible-mark path is now reverse-alpha only: - Doubao detect is reverse-alpha-consistent: match the bundled alpha glyph silhouette against the corner via TM_CCOEFF_NORMED (DETECT_NCC_THRESHOLD 0.4) -- keys on the '豆包AI生成' SHAPE, not coverage/structure heuristics. FP 7/1243 (0.6%). Removes the cv2 inpaint path + the _glyph_structure gate. - Registry is reverse-alpha only: dropped the cv2/LaMa backend (_glyph_remove, _lama_box_inpaint, default_backend, --backend) and the Samsung entry. Doubao outside the alpha resolution band is skipped, never inpainted. - Removed samsung_engine.py + tests + --mark samsung (no alpha map captured; Samsung C2PA/genAIType metadata detection in identify is unaffected). - The universal erase --region (cv2/LaMa) is unchanged -- arbitrary-region inpainting stays a user-directed tool, separate from the known-mark registry. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(doubao): NCC sub-pixel alignment -> reverse-alpha at any resolution A pure width-scale of the captured alpha map is only sub-pixel-accurate at the captured width and leaves a faint ghost elsewhere. remove_watermark_reverse_alpha now registers the alpha glyph to the actual mark via a TM_CCOEFF_NORMED scale+position search (_aligned_alpha_map) before inverting the blend, so the single 2048 capture works at any resolution -- verified clean on the 1773x2364 (3:4) corpus size, the biggest coverage gap (23 files). reverse_alpha_available is now just 'asset present' (no width band); the registry still gates removal on detect so a clean corner is never touched. Drops the _ALPHA_WIDTH_TOLERANCE gate. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(doubao): keep native recovery exact -- fixed geometry at captured width Integer-pixel NCC alignment landed ~1px off at the captured width, degrading the otherwise-exact native reverse-alpha (synthetic recovery error 0.94 -> 1.39). remove_watermark_reverse_alpha now uses exact width-relative geometry within _ALPHA_NATIVE_BAND of the captured width and the NCC search only off it -- best of both: native back to 0.94, other resolutions still aligned. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(doubao): harden alignment -- try fixed+aligned, keep least residual (56/56) On a faint/busy-background mark the NCC alignment peak can wander a few px off the true mark and leave a residual (2/56 real corpus files). Off the captured width, remove_watermark_reverse_alpha now builds BOTH the fixed-geometry and the NCC-aligned alpha map, applies each, and keeps whichever leaves the least residual mark (re-detect confidence on the bare reverse-alpha) -- geometry wins on faint marks, alignment on clear ones, no magic threshold. Real-file round-trip now removes 56/56 detected Doubao clean across every corpus resolution (was 54). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * perf(doubao): skip residual inpaint at native width for exact recovery At the captured width the fixed-geometry reverse-alpha is pixel-exact, so inpainting over it only replaced exactly-recovered interior pixels with a cv2 hallucination -- measured worse on a textured background (native error vs true bg 1.6 reverse-alpha-only vs 2.6 with the old always-on full-footprint inpaint). Native now returns the bare recovery untouched; off-native, where NCC alignment is only sub-pixel-approximate, the footprint inpaint stays to clean the seam. Real round-trip still 56/56 across all corpus resolutions; negatives 0/60, Gemini unaffected. 
Add test_native_returns_exact_reverse_alpha_no_inpaint as the regression guard. Sync CLAUDE.md + README (the table cell and prose described the pre-NCC "skipped off native / cv2-LaMa" behavior, now stale). Gitignore the session scheduled_tasks.lock, and add the text-protection research note. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 19:49:09 -07:00
Victor Kuznetsov	ef6fdaeeec	Detect text at native resolution (capped), fixing small-text recall on large images (#27) The text-protection detector scaled every image to a fixed 736 px long side, so small text on large canvases (e.g. ~16 px on 2048) was downscaled below the detector and missed -> deformed by the SDXL pass (issue #14). Detect at the native long side capped at 1536, never upscaled (_detection_input_size, a pure unit-tested helper). Detection is script-agnostic (DB segments regions, not characters), so this is language-agnostic: a new benchmark (scripts/text_detection_benchmark.py) measures recall across Latin/Cyrillic/CJK/Hangul/Arabic/digits x sizes x canvas -> overall hit-rate 0.91 -> 1.00, worst cell (2048/16 px) 0.06 -> 1.00. Docs updated. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 12:28:30 -07:00
xchacha20-poly1305	0c7ff1874e	feat(device): support xpu backend (#24) * feat(device): support xpu backend * Fall back to CPU seed generator when device RNG unsupported (xpu) Some torch-xpu builds have no device-side RNG, so torch.Generator(device="xpu") raises when --seed is used. _make_seed_generator tries the device generator and falls back to a backend-agnostic CPU generator. Adds a fallback unit test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Victor Kuznetsov <kuznetsov.va@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 11:13:23 -07:00
Victor Kuznetsov	1598c499fe	Record Doubao reverse-alpha finding: needs black-background capture, not more content images (#26) Session 2026-05-29: content-image reverse-alpha distillation fails (persistent ghost) because the mark is never observed on a dark background (median darkest bg over glyph pixels 58/255), so alpha is unidentifiable; no dark halo (white-logo model is right); LaMa O is a hallucination. Gemini is clean only because its map is the watermark on pure black (alpha=capture/255). Real unlock = black Doubao captures (requested in #13). Shipped direction stays mask + cv2/lama. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 11:12:58 -07:00
Victor Kuznetsov	a46268f6eb	Add cross-platform CI test matrix + PyPI classifiers (#25) * Add cross-platform CI test matrix, PyPI classifiers CI: new test.yml runs lint (ubuntu) + a test matrix (ubuntu/macos/windows x py3.10/3.12, core+dev, GPU tests skip) on push to main and PRs, closing the gap where only the release publish.yml ran (ubuntu, no tests). Add PyPI classifiers (OS/Python/topic). README Tests badge, CLAUDE.md CI note. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Make availability tests reflect installed deps, not assume gpu extra The new core+dev CI matrix has no diffusers, so the invisible-engine availability tests (asserting is_available() is True unconditionally) and the two mocked invisible CLI tests (whose command gates on is_available before the mock) failed. Assert availability == actual importability of torch+diffusers, and patch the CLI availability gate so the mocked-engine tests run regardless of the gpu extra. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 11:04:12 -07:00
Victor Kuznetsov	96b3653b9e	chore(release): v0.6.12 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> v0.6.12	2026-05-28 18:54:44 -07:00
Victor Kuznetsov	9aaa53fe32	fix(metadata): preserve upload format and quality on strip remove_ai_metadata now writes JPEG at quality 95 with 4:4:4 (no chroma subsampling) instead of the lossy PIL defaults (q75, 4:2:0), and preserves WebP losslessly instead of silently rewriting it as PNG. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 18:46:26 -07:00
Victor Kuznetsov	41e4365cd4	fix(identify): explain the unknown verdict inline (#22) A bare "unknown" verdict reads as the tool being broken. Print a one-line note right under the verdict explaining that no locally-readable AI signal was found, that this is not the same as clean (metadata is often stripped), and that SynthID-class pixel watermarks have no local detector. The why was previously only in the caveats section below. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 14:16:14 -07:00
Victor Kuznetsov	888c8c2556	chore(types): clear strict-pyright debt across src (0 errors) Make `pyright src/` strict-clean via a hybrid: pure-logic files are fully typed (piexif gets a local typings/ stub; PIL info-dict loops guard isinstance(key, str); progress returns Callable[..., None]; availability checks use importlib.util.find_spec instead of unused imports), while the irreducibly-untyped cv2/torch/diffusers boundary files carry a documented per-file `# pyright:` relax pragma (or a ctrlregen executionEnvironment) that disables only the unknown-type rules. Public ndarray-returning signatures on the relaxed engines are annotated NDArray[Any] so strict consumers (cli.py) stay clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 14:00:15 -07:00
Victor Kuznetsov	f326bab189	chore(release): v0.6.11 Ships the China TC260 AIGC PNG-chunk and HuggingFace hf-job-id provenance detectors (`223cbcf`). Also syncs src/__init__.__version__, which had drifted to 0.6.9 (not bumped in the 0.6.10 release). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> v0.6.11	2026-05-28 12:43:52 -07:00
Victor Kuznetsov	223cbcf171	feat(metadata): detect China TC260 AIGC PNG chunk and HuggingFace hf-job-id aigc_label now reads the TC260 label from a raw-JSON `AIGC` PNG tEXt chunk (as Doubao/ByteDance write it, with no namespaced XMP marker) in addition to the `<TC260:AIGC>` XMP block, via a shared _parse helper gated on a TC260 field so a generic AIGC key cannot false-positive. New huggingface_job() reads the hf-job-id PNG chunk; identify surfaces it as a medium-confidence hf_job signal (parallel to the visible sparkle, never overriding a hard metadata verdict). Both wired into has_ai_metadata/get_ai_metadata; the PNG save whitelist already strips them on removal. Found by auditing 646 corpus originals: 28 AIGC and 3 hf-job files the library previously reported as Unknown. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 12:40:17 -07:00
Victor Kuznetsov	0eec3001bb	feat(invisible): protect text automatically by default (#21) Mirror protect_faces: protect_text defaults to True in invisible_engine and watermark_remover, so the SDXL pipeline detects text per image and switches to Differential Diffusion only when glyphs are found. Text-free inputs fall back to plain img2img with no differential-pipeline load, so the autonomy is free. The CLI now exposes a single off-switch --no-protect-text instead of the positive flag, keeping the interface minimal. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> v0.6.10	2026-05-28 12:24:09 -07:00
Victor Kuznetsov	a0bf62e601	feat(invisible): preserve text/CJK via Differential Diffusion (--protect-text) (v0.6.10) SDXL img2img regenerates every pixel, so small text and CJK glyphs deform at the strengths that defeat SynthID (issue #21). With --protect-text a CJK-native PP-OCRv3 detector (2.4 MB ONNX, cv2.dnn, no torch, cached on first use) locates text regions and the pass switches to the SDXL Differential-Diffusion community pipeline: a per-pixel change map keeps text regions largely intact while the background is regenerated to strip the watermark. Gated to the SDXL default model; falls back to plain img2img with a warning when unavailable. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 11:59:15 -07:00
Victor Kuznetsov	7db4e231e8	fix(deps): require transformers 5.x with stable tokenizers for SDXL load diffusers 0.38's auto-pipeline registry imports a transformers 5.x-only symbol, so the gpu extra needs transformers>=5. Cap tokenizers to the stable 0.22 line so the global prerelease="allow" no longer drags in the 0.23.0rc0 whose CLIP tokenizer breaks SDXL loading. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 11:58:53 -07:00
test-user	27539c0da9	docs: tidy roadmap to open items only (drop shipped v0.6.7-0.6.9) The Roadmap is the project TODO; shipped features (Integrity Clash, streaming-MP4 scan window, meta-box XMP blanking) no longer belong under "not yet implemented". Removed them and kept the still-open remainder as its own item (AVIF/HEIF Exif item inside the meta box). Net open TODO: SynthID v2 regression test, local SynthID pixel detector, grow the SynthID corpus, real non-PNG C2PA fixtures, pyright maintenance debt, meta-box Exif item, Canon/Samsung device signers, Resemble PerTh (dead end), video pipeline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 18:18:28 -07:00
test-user	5bfed00553	feat(metadata): blank AI-label XMP inside the HEIF/AVIF meta box (v0.6.9) HEIF/AVIF store XMP as a meta-box `mime` item whose bytes live in mdat/idat, out of reach of the top-level uuid/jumb box stripper. An AI-label XMP packet there (TC260 AIGC, IPTC "Made with AI", IPTC 2025.1) was therefore left in place. isobmff.blank_ai_xmp_packets locates each XMP packet by its <?xpacket begin ... end?> delimiters and, if it carries an AI marker (_AI_LABEL_MARKERS), overwrites it with spaces of the SAME length. Equal length means no box size or iloc offset shifts -- the coded image stays bit-for-bit intact, the item stays structurally valid, only the AI label content is destroyed. Plain (non-AI) XMP is left alone, mirroring the top-level XMP-uuid content match. Wired into remove_ai_metadata's ISOBMFF branch after strip_c2pa_boxes. Chosen over exiftool (a non-bundled binary dep) to stay pure-Python and droplet-compatible; over full iinf/iloc surgery to avoid offset-rewrite corruption risk. The AI labels we target are all XMP, so this closes the practical gap. An Exif item inside the meta box (rare) still needs iinf/iloc surgery or exiftool -- documented. 4 new tests (TestMetaBoxXmpBlanking): AI packet blanked (same length, marker gone, surrounding image bytes intact), plain XMP preserved, no-packet no-op, and end-to-end remove_ai_metadata on a .heic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 18:15:48 -07:00
test-user	31f0a82906	feat(metadata): detect C2PA/AIGC/IPTC manifests after a large mdat in MP4 (v0.6.8) Provenance detection no longer relies on a fixed first-MB read. In a streaming / non-faststart MP4 the C2PA manifest sits AFTER a multi-megabyte mdat, beyond the 1 MB scan window, so it was missed. - isobmff.scan_c2pa_region(path): a file-seeking top-level box walker that returns the payloads of uuid/jumb (provenance) boxes, seeking past mdat by size without reading it -- works on multi-GB files. Returns b"" for non-ISOBMFF or on read error. Mirrors the box-size encoding of the existing in-memory _iter_top_level_boxes (largesize / size==0). - metadata.scan_head(path, size): the shared input for every C2PA/AIGC/IPTC byte scan -- first `size` bytes plus, for ISOBMFF, the late provenance-box payloads. Behavior-neutral (f.read(size)) for non-ISOBMFF inputs. - Routed all six metadata scan sites (has_ai_metadata, aigc_label, iptc_ai_system, synthid_source, exif_generator XMP, get_ai_metadata soft-binding) and identify's head read through scan_head. 6 new tests: late box found by scan_c2pa_region / scan_head, the fixed window provably misses it, non-ISOBMFF -> b"", front-placed (faststart) regression. The remaining gap stays documented: EXIF/XMP stored as items inside the meta box (AVIF/HEIF stills) still needs meta-box surgery or exiftool. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 13:42:29 -07:00
test-user	18160fe269	feat(identify): integrity-clash detection for contradictory provenance (v0.6.7) Surface contradictions between independent provenance signals instead of collapsing to a single verdict -- a strong tell of spoofed, transplanted, or laundered metadata. Inspired by arXiv:2603.02378. Two rules in the new _integrity_clashes helper: - Conflicting AI-origin attributions: two or more distinct AI vendors named by independent generator stamps (e.g. a C2PA OpenAI manifest on an image whose EXIF says Make="Ideogram AI"). - Camera + AI: a camera-capture C2PA device (Pixel/Leica/Sony/Nikon/Truepic) coexisting with an AI-generation marker -- a genuine capture is not AI. High-precision by design: only hard generator stamps feed it (C2PA issuer when the source is AI, SynthID proxy, EXIF/XMP generator, IPTC AISystemUsed, xAI, AIGC). The fuzzy visible sparkle and the open invisible watermark are excluded -- the latter can be a by-product of our own SDXL removal pass. Vendor normalization (_vendor_of over _AI_VENDOR_TOKENS) keeps consistent signals from clashing (C2PA "Google (Gemini)" + SynthID-Google agree); the C2PA vendor is read from the issuer attribution, not the resolved platform, so a camera label like "Google Pixel" cannot mis-normalize to an AI vendor. Surfaced as ProvenanceReport.integrity_clashes (red in the table view, included in --json). 19 new tests; all real single-origin fixtures (chatgpt/firefly/doubao/grok/mj) verified to produce zero clashes (false-positive guard). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 13:27:25 -07:00
dependabot[bot]	6694a79514	chore(deps): bump ultralytics in the minor-and-patch group (#20) Bumps the minor-and-patch group with 1 update: [ultralytics](https://github.com/ultralytics/ultralytics). Updates `ultralytics` from 8.4.55 to 8.4.56 - [Release notes](https://github.com/ultralytics/ultralytics/releases) - [Commits](https://github.com/ultralytics/ultralytics/compare/v8.4.55...v8.4.56) --- updated-dependencies: - dependency-name: ultralytics dependency-version: 8.4.56 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: minor-and-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-27 12:04:47 -07:00
test-user	7b47fa9f6a	fix(io): Unicode-safe cv2 image IO + un-eat the [gpu] install hint (v0.6.6) Two CLI/IO robustness bugs surfaced by issues #17 and #19. #17 -- non-ASCII image paths (Chinese/Cyrillic/accented) failed on Windows: cv2.imread/imwrite use the platform ANSI code-page API, so the decode came back empty with a "can't open/read file" warning. New image_io.imread/imwrite route through np.fromfile+cv2.imdecode / cv2.imencode+tofile (Unicode-safe, byte-identical output, cv2.imread None-semantics preserved); all 8 cv2 read/write call sites now go through it. Behavior-neutral on macOS/Linux (already accept UTF-8 paths), so the fix is correct-by-construction for the Windows-only bug. #19 (incidental) -- rich parsed the "[gpu]" in the GPU-extra install hint as a style tag and dropped it, so the printed command was the un-installable "pip install 'remove-ai-watermarks'". Escaped as \[gpu] at both call sites. Tests: test_image_io.py (non-ASCII round-trip, alpha, missing/empty/garbage semantics); test_cli.py::TestGpuHintMarkup (install hint keeps the extra). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 11:52:48 -07:00
test-user	d847b39292	docs: lawful-use disclaimer + primary-source Legal-table corrections README: - surface a lawful-use / no-liability disclaimer near the top - reword two feature bullets away from detection-evasion framing ("bypass AI image classifiers" -> neutral post-processing; drop the platform-targeting language from the "Made with AI" bullet) - Legal table, each corrected against the primary text: - CA AB 2655 was struck down on Section 230 ONLY (Kohls v. Bonta, E.D. Cal., Aug 2025); the court did not reach the First Amendment (the companion AB 2839 was separately enjoined on 1A grounds) - COPIED Act: add the bill number (S. 1396, 119th Cong.) - South Korea AI Framework Act: in force 22 January 2026 (exact date) CLAUDE.md: sync the South Korea date to 22 January 2026. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 11:50:20 -07:00
test-user	fee0e139af	docs: refresh README roadmap + metadata-strip coverage for v0.6.x - Metadata-strip feature now lists the audio/video container coverage shipped in v0.6.0-v0.6.4 (MP4/MOV/M4V/M4A via ISOBMFF box walker; WebM/MP3/WAV/FLAC/OGG losslessly via ffmpeg). - Roadmap updated: the AVIF/HEIF item now reflects that top-level XMP/C2PA boxes and non-ISOBMFF audio/video are handled, with only meta-box-item EXIF/XMP left (needs exiftool). Added the open backlog: multi-signal "Integrity Clash" reporting (arXiv:2603.02378), Canon/Samsung device signers pending a real sample, the streaming-MP4 scan-window limit, and Resemble PerTh audio as evaluated-but-infeasible. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 10:42:40 -07:00
test-user	e1c99b5937	fix(identify): gate C2PA issuer->generator attribution on AI source type (v0.6.5) Prevents an unmapped C2PA device whose manifest incidentally contains a mapped issuer substring (e.g. the "Adobe XMP" toolkit string in a Canon/Sony camera capture) from being mislabeled as that AI generator ("Adobe Firefly"). _attribute_platform now names a specific AI-generator platform only when the digital-source-type is trainedAlgorithmicMedia; otherwise it degrades to the neutral "C2PA signer: X" label. Real Firefly/OpenAI/Google output carries the AI source-type and is unaffected (verified: chatgpt-1.png->OpenAI, firefly-1.png->Adobe Firefly still attribute). Closes the only real downside of leaving Canon/Samsung/Bria device signers unmapped: detection and removal were already unaffected; now the platform label degrades gracefully too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> v0.6.5	2026-05-27 10:29:12 -07:00
test-user	f9cf14c372	feat(metadata): strip container metadata from WebM/MP3/WAV/FLAC/OGG via ffmpeg (v0.6.4) remove_ai_metadata now handles non-ISOBMFF audio/video (which the box walker can't reach) by shelling out to ffmpeg with a lossless stream copy (`-map_metadata -1 -map_chapters -1 -c copy`): codec data is untouched, only container tags/chapters (ID3 / RIFF / Vorbis comments / EBML tags) are dropped. Requires ffmpeg on PATH; raises a clear RuntimeError if absent or if ffmpeg can't parse the input (instead of crashing in the image path). Verified end-to-end: a real ffmpeg-made WAV/MP3 with a "Suno AI" title tag -> tag gone, audio bytes preserved. NOT built (evaluated, deliberate): Resemble PerTh audio detection -- `get_watermark()` returns a raw bit array with no presence/confidence flag, so reliably telling watermarked from clean needs Resemble's fixed payload or a confidence API (neither public; no real sample to calibrate). Same wall as the SynthID pixel detector. AVIF/HEIF meta-box EXIF/XMP stripping also stays a gap (needs exiftool, a non-installed binary). Both documented in CLAUDE.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> v0.6.4	2026-05-26 21:39:42 -07:00
test-user	bc3228d387	feat(visible): Doubao text-mark removal + universal region eraser Add deterministic, CPU-only removal of the visible Doubao "豆包AI生成" mark and a position-agnostic region eraser for any other visible watermark/logo. - doubao_engine.py: locate (geometry, scales with width) + polarity-aware white-top-hat glyph mask + cv2 inpaint; coverage-gated detection and a dense-text safety guard. No GPU, ~30ms. - region_eraser.py + `erase` command: inpaint arbitrary --region box(es). Default cv2 backend (no deps); optional big-LaMa via onnxruntime (`lama` extra, Carve/LaMa-ONNX, model downloaded on first use, never bundled). - cli `visible --mark auto|gemini|doubao`: auto routes by detector confidence. - tests for both engines; seed previously-unseeded CLI image fixtures to stop the Doubao detector flaking on random corners. - .gitignore: doubao_capture/{seeds,captures} scratch (alpha-map calibration). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 21:31:51 -07:00
test-user	9f93d9c0c5	feat(identify): add Sony C2PA device attribution, verified (v0.6.3) Adds Sony to _DEVICE_C2PA_PLATFORM, matching Sony's own `sony.sig` / `sony.cert` C2PA assertion namespace (NOT bare "Sony", which is a common EXIF Make). Verified against a real Sony-signed file (Sony PXW-Z300, signer "Sony Corporation") found in the Security4Media/c2pa-video-player repo. The sample is video (MP4) -- our ISOBMFF C2PA path detects it; Sony Alpha stills likely share the namespace. Verified device set is now Leica, Nikon, Google Pixel, Sony, Truepic. Canon / Samsung / Bria still have no public direct-download C2PA sample to verify. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> v0.6.3	2026-05-26 21:13:49 -07:00
test-user	64be9598f2	fix(identify): device-token-first C2PA attribution; add verified Pixel (v0.6.2) Replaces the claim-generator-string match with a distinctive device-token scan of the manifest bytes (_device_platform / _DEVICE_C2PA_PLATFORM), which is more robust: it catches devices where the generator name lives under a non-standard CBOR key (Pixel uses `claim_generator_info`, so it has no `claim_generator`). - Adds Google Pixel, verified against a real Pixel 10 Pro C2PA file (attached to c2pa-rs issue #1609/#1554): cert CN "Pixel Camera", digitalSourceType `computationalCapture` -> capture authenticity, not AI (is_ai stays None). - Token distinctiveness is load-bearing: bare "Truepic" matched the OpenAI chatgpt-1.png fixture (Truepic is a trust-chain signing authority), so the token is the specific "Truepic_Lens"; "Pixel Camera" (cert CN) not "Pixel". - Verified Leica/Nikon/Truepic/Pixel attribute correctly and OpenAI/Adobe/MJ do not regress. Sony/Canon/Samsung/Bria stay unmapped: no public direct-download C2PA sample exists to verify their in-manifest string. - Regression tests: device token beats incidental issuer mentions (Leica, Pixel-vs-Google). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> v0.6.2	2026-05-26 20:43:40 -07:00
test-user	dda2ee7fbb	fix(identify): attribute C2PA by claim_generator, not incidental issuer tokens (v0.6.1) Verified on real signed files that the issuer byte-scan mis-attributes multi-entity manifests: Leica read as "Truepic" (timestamp authority in the chain), Nikon as "Adobe Firefly" (XMP-toolkit "Adobe" + the sample's "Adobe_MAX" name), Truepic as "Google". Platform attribution now prefers the claim generator (what produced the asset) and falls back to the issuer scan. - New _CLAIM_GENERATOR_PLATFORM map + _platform_from_generator; claim generator read for non-PNG via the now-public c2pa.cbor_text_after. - Device tokens listed only where verified against a real C2PA file (Leica lc_c2pa, Nikon, Truepic Lens); Pixel/Samsung/Sony/Canon/Bria deferred until a real sample confirms the in-manifest string. Camera C2PA marks capture authenticity, so these never set is_ai. - cbor_text_after made public (was _cbor_text_after); call sites + tests updated. - Regression test: claim_generator beats incidental Adobe/Google/Truepic tokens. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> v0.6.1	2026-05-26 20:10:07 -07:00
test-user	2676325184	feat(c2pa): expand soft-binding vendor map with registry-verified algs Adds Trufo, Overlai, MarkAny, Mentaport, LumaTrace, VerdaAI, ContentLens, ISCC (io.iscc content code), and Adobe ICN fingerprint to C2PA_SOFT_BINDINGS, and notes AIWatermark wraps Meta PixelSeal. All `alg` prefixes verified against the official c2pa-org/softbinding-algorithm-list registry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 18:00:16 -07:00
test-user	c196a16900	feat: detect soft-binding vendors, IPTC 2025.1, video/audio C2PA, TrustMark (v0.6.0) Broadens metadata provenance coverage at the detection and container-strip level. Detection: - C2PA soft-binding `alg` -> forensic-watermark vendor (Adobe TrustMark, Digimarc, Imatag, Steg.AI, Microsoft, ...) via C2PA_SOFT_BINDINGS + soft_binding_vendors_in(); names the watermark vendor even when the watermark itself can't be decoded. - IPTC Photo Metadata 2025.1 AI-disclosure XMP fields (AISystemUsed etc.) via iptc_ai_system() + IPTC_AI_FIELD_MARKERS. - Adobe TrustMark open keyless decoder (trustmark_detector.py, optional extra `trustmark`) -- the watermark behind Adobe Durable Content Credentials. Detects provenance, not AI origin, so it does not assert is_ai. Removal / containers: - isobmff.strip_c2pa_boxes now also drops a top-level XMP uuid box that carries an AI label (matched by AI-marker content, byte-order-robust; plain XMP kept). - remove_ai_metadata routes MP4/MOV/M4V/M4A (and any ftyp-sniffed ISOBMFF) through the box stripper; raises a clear error for non-ISOBMFF audio/video (WebM/MP3/WAV) instead of crashing in the image path. Tests: soft-binding scan, IPTC element/attribute/presence, MP4 + M4A detect/strip, ISOBMFF XMP surgical strip, content-sniff, unsupported-container guard, TrustMark absent-safety + identify integration. ruff clean; pyright clean on all new modules. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> v0.6.0	2026-05-26 17:56:48 -07:00
test-user	ba94de8275	feat: strip AI-provenance EXIF tags on removal (v0.5.6) remove_ai_metadata now scrubs AI tags from the JPEG EXIF instead of passing the block through wholesale. Closes the v0.5.5 follow-up: the xAI/Grok Signature + UUID-Artist pair was detected but not removed. - metadata._scrub_ai_exif(): deletes the xAI signature pair and any Software/Make/Artist/ImageDescription tag carrying an AI_GENERATOR_TOKENS token (so Ideogram's Make="Ideogram AI" is scrubbed too), keeping genuine camera/editor EXIF intact. - Shared _is_xai_signature_pair / _exif_text helpers (module-level compiled regexes) are now the single source of truth, used by both xai_signature and _scrub_ai_exif. - Tests: Grok signature stripped on JPEG output, Ideogram Make stripped, real-camera Make ("Apple") preserved. 325 passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> v0.5.6	2026-05-26 14:26:20 -07:00
test-user	74618b91a7	feat: detect xAI/Grok EXIF signature; refresh watermarking landscape (v0.5.5) xAI Grok (Aurora) images carry no C2PA/SynthID/IPTC -- their only provenance signal is an EXIF pair: ImageDescription "Signature: <base64>" + a UUID Artist. Verified stable across 3 genuine generations (a real download previously read as unknown / "no AI metadata"). - metadata.xai_signature(): matches the Signature blob + UUID Artist pair; wired into has_ai_metadata, get_ai_metadata, and identify (platform "xAI (Grok / Aurora)"). - data/samples/grok-1.jpg: real Grok fixture (neutral content; the Artist UUID is the public image id, not PII). - Tests: synthetic-fixture unit tests, real-sample assertion, identify integration (322 passing). Docs (research refresh, May 2026): - C2PA 2.4 Durable Content Credentials (soft-binding re-discovery after the embedded manifest is stripped). - New AI-labeling laws, primary-source verified: EU AI Act Art 50 (2026-08-02), South Korea AI Framework Act Art 31(3), California AB 853. - Hedge removal claims: defeating the SynthID verifier is not forensic invisibility (arXiv:2605.09203); cite SynthID-Image (arXiv:2510.09263). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> v0.5.5	2026-05-26 14:14:35 -07:00
test-user	5e29c69e7b	README: lead with hosted raiw.cc CTA, honest free/paid split Move the raiw.cc call-to-action above the sponsor ask and drop the misleading "free web service" framing: visible-watermark and metadata removal are free, invisible removal runs on paid cloud GPUs. Also point no-GPU users to the hosted service from the invisible-removal feature bullet, where the GPU requirement is stated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 12:47:28 -07:00
test-user	8ed4a754ff	Add GitHub Sponsors donation button (FUNDING.yml + README badge) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 16:09:52 -07:00
test-user	03fb460f77	Track the labeled SynthID corpus; complete metadata-source test coverage Corpus images were gitignored (local-only). The negatives were reviewed and cleared for publishing, so the labeled set is now committed (regular git, 65 MB across 25 files) -- making the removal regression set reproducible and CI-able. Corpus: - Track data/synthid_corpus/images/ (pos 9, neg 15, cleaned 1); keep only the synthetic refs/ calibration fills gitignored. - Reconcile manifest.csv to the on-disk files: 117 -> 25 rows (92 dangling rows for removed images pruned; dedup left one cleaned output, f6dd47a5). - Rewrite the corpus README layout/policy (images committed; review every image for private content before adding -- public repo, permanent history). Test fixtures: - Remove data/samples/not-ai-1/2/3 (personal iPhone photos, incl. GPS EXIF). - Add the clean_photo conftest fixture serving a verified-negative image from the corpus neg/ set; repoint the three "non-AI / clean photo" tests onto it (skips if the corpus is absent). Metadata-source coverage (close the last sub-variant gaps): - c2pa digitalSourceType: algorithmicMedia (procedural, not flagged AI) and compositeWithTrainedAlgorithmicMedia (AI + SynthID proxy). - exif_generator: EXIF Artist and ImageDescription fields (Software/Make/XMP CreatorTool were already covered). All 8 metadata-source kinds are now tested at both the unit and identify() level. 313 tests pass. CLAUDE.md updated (corpus tracked, clean_photo fixture). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 14:46:47 -07:00
test-user	3ebdee57b8	Test the untested pure logic: MPS fallback, tiling, isobmff/c2pa edges Coverage audit (pytest --cov) found real, non-model logic at 0%/low cover. Add unit tests that need no model download: - img2img_runner.py 0% -> 100%: the MPS->CPU fallback orchestration, mocked via injected load_pipeline/reload_on_cpu callables. Guards the production behavior hit this session (native-res SDXL OOMs on MPS, must retry on CPU; non-MPS errors must propagate; "mps"-worded error on a cpu device must not reload). - ctrlregen/tiling.py 0% -> 40%: the pure tile math (tile_positions, make_blend_weight, resize_center_crop) that decides how large images are split and blended. (run_tiled stays model-bound, untested.) - isobmff.py 93% -> 100%: size==0 (box-to-EOF) and truncated 64-bit largesize parsing branches for AVIF/HEIF/JXL C2PA stripping. - c2pa.py: non-PNG-signed .png reads as clean (has_c2pa_metadata / extract_c2pa_chunk) instead of mis-parsing. 309 tests pass (+23). Document in CLAUDE.md that these pure helpers are unit-tested without downloads so future sessions don't skip them as "ML". No src/ change, no release. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 14:21:32 -07:00
test-user	d24d8a4b14	Extract _target_size helper + regression-test native resolution (v0.5.4) The native-vs-downscale decision in InvisibleEngine.remove_watermark (the issue #10/#15 fix: max_resolution=0 must not pre-downscale, since any downscale both loses quality and lets SynthID survive) had no test. Extract it into a pure helper invisible_engine._target_size(w, h, max_resolution) and cover it with tests/test_invisible_engine.py::TestTargetSize so a re-introduced forced downscale fails CI instead of silently regressing #15. Also: - Clamp the short side to >=1 in _target_size: extreme aspect ratios (e.g. 5000x3 with --max-resolution 1024) truncated it to 0 and crashed image.resize(). Pre-existing in the inline math; fixed now that it is a named, tested function. - Consolidate the two duplicated temp-file save blocks into one unconditional save (behavior unchanged: the EXIF-transposed image is still always persisted before WatermarkRemover reloads it by path), and drop the now-redundant `_tmp_path is not None` guard in finally. - Bump version 0.5.3 -> 0.5.4 (pyproject, __init__, uv.lock); document the helper as the regression guard in CLAUDE.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> v0.5.4	2026-05-25 14:09:33 -07:00
test-user	28fe13db8f	Document native-res MPS OOM -> CPU-fallback behavior in limitations Concrete data point from the 2026-05-25 gpt-image SDXL run: native 1254x1254 fp32 OOMs at the UNet step (not just VAE) on a 20 GB MPS ceiling, and img2img_runner auto-falls back to CPU and completes (slow, weight-identical, still defeats SynthID). enable_vae_tiling() alone does not prevent it. Fast Mac workarounds: fp16 on MPS or --max-resolution; neither is the default. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 13:57:13 -07:00
test-user	59d72c5db7	Record verified gpt-image-2 SynthID-cleaned chain in corpus Add manifest row for the 4ef377bd -> f6dd47a5 chain: a gpt-image-2 sample (openai.com/verify: SynthID + C2PA detected) cleaned via v0.5.3 `all` at native 1254x1254 (prod-equivalent SDXL base, strength 0.05, 50 steps). openai.com/verify reports SynthID NOT detected after the run, re-confirming that the #10 native-resolution default defeats OpenAI SynthID and resolving the #15 root cause (older SD-1.5/768px downscale default did not). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 13:55:10 -07:00
test-user	e27f24f520	test(samples): commit real Doubao fixture + AIGC real-sample test data/samples/doubao-1.png is the real #13 sample: carries the China TC260 <TC260:AIGC> XMP label and a visible '豆包AI生成' text mark (bottom-right). Grounds the AIGC detection on a real file (alongside the synthetic tests) and serves as the fixture for visible-watermark removal work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 12:37:15 -07:00
test-user	1afc1e60ef	test(samples): add real Doubao TC260 AIGC reference sample 2048x2048 PNG carrying China's TC260 <TC260:AIGC> label; identify reports it as a China AIGC-labeled generator (TC260). Reference fixture for manual re-verification of the TC260 detection path -- the automated tests use synthetic blobs, so nothing depends on this file being present. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 12:36:28 -07:00
test-user	d45f0806a0	chore(release): v0.5.3 — detect China TC260 AIGC label (Doubao) - feat(identify): detect the China TC260 <TC260:AIGC> XMP label (Doubao and other China-served generators); reports platform + ContentProducer. Removal already strips it via the existing metadata cleaner. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> v0.5.3	2026-05-25 12:30:40 -07:00
test-user	c7f0d71f90	feat(identify): detect China TC260 AIGC label (Doubao et al.) China-served generators embed an XMP <TC260:AIGC>{"Label":"1",...} block (China's mandatory AI-content labeling, TC260 standard). Doubao (ByteDance) uses it -- verified on the real #13 sample. It's none of C2PA / SynthID / imwatermark / IPTC, so identify() previously returned unknown. - metadata: AIGC_MARKERS + aigc_label() (json-decodes the HTML-entity-encoded block); has_ai_metadata + get_ai_metadata now surface it. - identify: new 'aigc' signal -> is_ai True, platform 'China AIGC-labeled generator (TC260; e.g. Doubao)', carries the ContentProducer code. - Container-agnostic raw-byte scan, so it covers the whole China-AIGC ecosystem (Jimeng/Kling/Qwen/Ernie share the standard). - Tests: synthetic TC260 block (metadata + identify). Docs updated. Addresses #13. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 12:29:51 -07:00
test-user	768d997ef0	docs: scope SynthID provenance claims to source-verified facts Threat model: replace the unverified deployment list (Gemini 3 Pro / Nano Banana Pro / Imagen 4 / Veo) with the source-verified scope -- SynthID across Imagen / Veo / Lyria plus Gemini app outputs (>10B items by Dec 2025), and attribute the 136-bit payload to the paper's SynthID-O variant. openai-images-2 sample: note the file predates the 19 May 2026 SynthID rollout across ChatGPT / Codex / API, and that openai.com/verify is now the public oracle (still no local decoder). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 12:18:13 -07:00
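
The v0.5.4 commit (d24d8a4b14) describes a pure `_target_size(w, h, max_resolution)` helper: `max_resolution=0` must keep the native resolution, and the short side is clamped to at least 1 so extreme aspect ratios such as 5000x3 do not collapse to 0. The repository's actual implementation is not shown in this log, so the following is only a minimal sketch of that behavior. The function name `target_size`, the integer rounding, and the choice never to upscale are assumptions made for illustration.

```python
def target_size(width: int, height: int, max_resolution: int) -> tuple[int, int]:
    """Return the (width, height) to process at.

    Hypothetical sketch, not the project's code.
    max_resolution <= 0 means "keep native resolution" (no pre-downscale).
    """
    long_side = max(width, height)
    # Assumption: images already within the limit are left untouched.
    if max_resolution <= 0 or long_side <= max_resolution:
        return width, height
    # Integer arithmetic avoids float rounding surprises on the long side.
    new_w = width * max_resolution // long_side
    new_h = height * max_resolution // long_side
    # Floor division can take a very thin side down to 0; clamp to 1 pixel.
    return max(1, new_w), max(1, new_h)


if __name__ == "__main__":
    print(target_size(1254, 1254, 0))   # native resolution is preserved
    print(target_size(5000, 3, 1024))   # short side clamped instead of 0
```

Keeping the decision in a pure function like this is what lets a regression test pin the native-resolution default without loading any model.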


120 Commits
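
The v0.6.4 commit (f9cf14c372) says non-ISOBMFF audio/video is cleaned by shelling out to ffmpeg with a lossless stream copy (`-map_metadata -1 -map_chapters -1 -c copy`) and that a clear `RuntimeError` is raised when ffmpeg is missing or cannot parse the input. A rough sketch of that pattern is below. Only the three ffmpeg options come from the commit message; the function names `build_strip_command` and `strip_container_metadata`, the `-y` overwrite flag, and the error wording are assumptions, not the project's API.

```python
import shutil
import subprocess
from pathlib import Path


def build_strip_command(src: Path, dst: Path, ffmpeg: str = "ffmpeg") -> list[str]:
    """Build an ffmpeg argument list that copies streams but drops container tags."""
    return [
        ffmpeg,
        "-y",                    # assumption: overwrite dst if it already exists
        "-i", str(src),
        "-map_metadata", "-1",   # drop global/container metadata
        "-map_chapters", "-1",   # drop chapter markers
        "-c", "copy",            # stream copy: codec data is not re-encoded
        str(dst),
    ]


def strip_container_metadata(src: Path, dst: Path) -> None:
    """Hypothetical wrapper: run ffmpeg and fail with a readable error."""
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        raise RuntimeError("ffmpeg not found on PATH")
    result = subprocess.run(
        build_strip_command(src, dst, ffmpeg),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"ffmpeg could not process {src}: {result.stderr.strip()}")
```

Separating the command builder from the subprocess call keeps the argument list testable on machines where ffmpeg is not installed.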