Commit Graph

80 Commits

Author SHA1 Message Date
test-user 03fb460f77 Track the labeled SynthID corpus; complete metadata-source test coverage
Corpus images were gitignored (local-only). The negatives were reviewed and
cleared for publishing, so the labeled set is now committed (regular git, 65 MB
across 25 files) -- making the removal regression set reproducible and CI-able.

Corpus:
- Track data/synthid_corpus/images/ (pos 9, neg 15, cleaned 1); keep only the
  synthetic refs/ calibration fills gitignored.
- Reconcile manifest.csv to the on-disk files: 117 -> 25 rows (92 dangling rows
  for removed images pruned; dedup left one cleaned output, f6dd47a5).
- Rewrite the corpus README layout/policy (images committed; review every image
  for private content before adding -- public repo, permanent history).

Test fixtures:
- Remove data/samples/not-ai-1/2/3 (personal iPhone photos, incl. GPS EXIF).
- Add the clean_photo conftest fixture serving a verified-negative image from
  the corpus neg/ set; repoint the three "non-AI / clean photo" tests onto it
  (skips if the corpus is absent).

Metadata-source coverage (close the last sub-variant gaps):
- c2pa digitalSourceType: algorithmicMedia (procedural, not flagged AI) and
  compositeWithTrainedAlgorithmicMedia (AI + SynthID proxy).
- exif_generator: EXIF Artist and ImageDescription fields (Software/Make/XMP
  CreatorTool were already covered).

All 8 metadata-source kinds are now tested at both the unit and identify()
level. 313 tests pass. CLAUDE.md updated (corpus tracked, clean_photo fixture).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 14:46:47 -07:00
test-user 3ebdee57b8 Test the untested pure logic: MPS fallback, tiling, isobmff/c2pa edges
Coverage audit (pytest --cov) found real, non-model logic at 0%/low cover.
Add unit tests that need no model download:

- img2img_runner.py 0% -> 100%: the MPS->CPU fallback orchestration, mocked
  via injected load_pipeline/reload_on_cpu callables. Guards the production
  behavior hit this session (native-res SDXL OOMs on MPS, must retry on CPU;
  non-MPS errors must propagate; "mps"-worded error on a cpu device must not
  reload).
- ctrlregen/tiling.py 0% -> 40%: the pure tile math (tile_positions,
  make_blend_weight, resize_center_crop) that decides how large images are
  split and blended. (run_tiled stays model-bound, untested.)
- isobmff.py 93% -> 100%: size==0 (box-to-EOF) and truncated 64-bit largesize
  parsing branches for AVIF/HEIF/JXL C2PA stripping.
- c2pa.py: non-PNG-signed .png reads as clean (has_c2pa_metadata /
  extract_c2pa_chunk) instead of mis-parsing.

309 tests pass (+23). Document in CLAUDE.md that these pure helpers are
unit-tested without downloads so future sessions don't skip them as "ML".
No src/ change, no release.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 14:21:32 -07:00
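
The MPS->CPU fallback rules listed in the commit above (retry on CPU after an MPS failure, propagate non-MPS errors, never reload when the device is already cpu) can be sketched as follows. This is a minimal illustration, not the code in img2img_runner.py: the function name `run_with_cpu_fallback`, the `run` callable, and the substring check on the error text are assumptions; only the idea of injected `load_pipeline` / `reload_on_cpu` callables comes from the commit message.

```python
from typing import Any, Callable


def run_with_cpu_fallback(
    device: str,
    load_pipeline: Callable[[str], Any],
    reload_on_cpu: Callable[[], Any],
    run: Callable[[Any], Any],
) -> Any:
    """Run once on `device`; retry on CPU only for an MPS error on an MPS device."""
    pipe = load_pipeline(device)
    try:
        return run(pipe)
    except RuntimeError as exc:
        # Assumption: an MPS failure is recognised by "mps" in the error text.
        is_mps_error = "mps" in str(exc).lower()
        if device != "mps" or not is_mps_error:
            # Non-MPS errors propagate, and so does an "mps"-worded
            # error raised while already running on another device.
            raise
        return run(reload_on_cpu())
```

Because both loaders are passed in as callables, a test can exercise every branch with plain stubs and no model download.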
test-user d24d8a4b14 Extract _target_size helper + regression-test native resolution (v0.5.4)
The native-vs-downscale decision in InvisibleEngine.remove_watermark (the
issue #10/#15 fix: max_resolution=0 must not pre-downscale, since any
downscale both loses quality and lets SynthID survive) had no test. Extract
it into a pure helper invisible_engine._target_size(w, h, max_resolution)
and cover it with tests/test_invisible_engine.py::TestTargetSize so a
re-introduced forced downscale fails CI instead of silently regressing #15.

Also:
- Clamp the short side to >=1 in _target_size: extreme aspect ratios (e.g.
  5000x3 with --max-resolution 1024) truncated it to 0 and crashed
  image.resize(). Pre-existing in the inline math; fixed now that it is a
  named, tested function.
- Consolidate the two duplicated temp-file save blocks into one
  unconditional save (behavior unchanged: the EXIF-transposed image is
  still always persisted before WatermarkRemover reloads it by path), and
  drop the now-redundant `_tmp_path is not None` guard in finally.
- Bump version 0.5.3 -> 0.5.4 (pyproject, __init__, uv.lock); document the
  helper as the regression guard in CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.5.4
2026-05-25 14:09:33 -07:00
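
A rough sketch of the native-vs-downscale decision described in the commit above. The name `target_size` and the integer arithmetic are illustrative assumptions; the real `_target_size(w, h, max_resolution)` may differ in detail. It shows the two documented behaviours: `max_resolution=0` leaves the size untouched, and the short side is clamped to at least 1.

```python
from typing import Tuple


def target_size(w: int, h: int, max_resolution: int) -> Tuple[int, int]:
    """Return the processing size for a w x h image.

    max_resolution <= 0 means "native resolution, never downscale".
    Otherwise the long side is capped at max_resolution and the short
    side is scaled proportionally, clamped so it never reaches 0.
    """
    long_side = max(w, h)
    if max_resolution <= 0 or long_side <= max_resolution:
        return w, h
    # Integer math avoids float rounding on the long side.
    new_w = max(1, w * max_resolution // long_side)
    new_h = max(1, h * max_resolution // long_side)
    return new_w, new_h
```

With this sketch, a 5000x3 input capped at 1024 yields a short side of 1 rather than 0, which is the crash case the commit mentions.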
test-user 28fe13db8f Document native-res MPS OOM -> CPU-fallback behavior in limitations
Concrete data point from the 2026-05-25 gpt-image SDXL run: native
1254x1254 fp32 OOMs at the UNet step (not just VAE) on a 20 GB MPS
ceiling, and img2img_runner auto-falls back to CPU and completes
(slow, weight-identical, still defeats SynthID). enable_vae_tiling()
alone does not prevent it. Fast Mac workarounds: fp16 on MPS or
--max-resolution; neither is the default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 13:57:13 -07:00
test-user 59d72c5db7 Record verified gpt-image-2 SynthID-cleaned chain in corpus
Add manifest row for the 4ef377bd -> f6dd47a5 chain: a gpt-image-2 sample
(openai.com/verify: SynthID + C2PA detected) cleaned via v0.5.3 `all` at
native 1254x1254 (prod-equivalent SDXL base, strength 0.05, 50 steps).
openai.com/verify reports SynthID NOT detected after the run, re-confirming
that the #10 native-resolution default defeats OpenAI SynthID and resolving
the #15 root cause (older SD-1.5/768px downscale default did not).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 13:55:10 -07:00
test-user e27f24f520 test(samples): commit real Doubao fixture + AIGC real-sample test
data/samples/doubao-1.png is the real #13 sample: carries the China TC260
<TC260:AIGC> XMP label and a visible '豆包AI生成' text mark (bottom-right).
Grounds the AIGC detection on a real file (alongside the synthetic tests)
and serves as the fixture for visible-watermark removal work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 12:37:15 -07:00
test-user 1afc1e60ef test(samples): add real Doubao TC260 AIGC reference sample
2048x2048 PNG carrying China's TC260 <TC260:AIGC> label; identify reports
it as a China AIGC-labeled generator (TC260). Reference fixture for manual
re-verification of the TC260 detection path -- the automated tests use
synthetic blobs, so nothing depends on this file being present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 12:36:28 -07:00
test-user d45f0806a0 chore(release): v0.5.3 — detect China TC260 AIGC label (Doubao)
- feat(identify): detect the China TC260 <TC260:AIGC> XMP label (Doubao
  and other China-served generators); reports platform + ContentProducer.
  Removal already strips it via the existing metadata cleaner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.5.3
2026-05-25 12:30:40 -07:00
test-user c7f0d71f90 feat(identify): detect China TC260 AIGC label (Doubao et al.)
China-served generators embed an XMP <TC260:AIGC>{"Label":"1",...} block
(China's mandatory AI-content labeling, TC260 standard). Doubao (ByteDance)
uses it -- verified on the real #13 sample. The label is not C2PA, SynthID,
imwatermark, or IPTC, so identify() previously returned unknown.

- metadata: AIGC_MARKERS + aigc_label() (json-decodes the HTML-entity-encoded
  block); has_ai_metadata + get_ai_metadata now surface it.
- identify: new 'aigc' signal -> is_ai True, platform 'China AIGC-labeled
  generator (TC260; e.g. Doubao)', carries the ContentProducer code.
- Container-agnostic raw-byte scan, so it covers the whole China-AIGC ecosystem
  (Jimeng/Kling/Qwen/Ernie share the standard).
- Tests: synthetic TC260 block (metadata + identify). Docs updated.

Addresses #13.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 12:29:51 -07:00
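
The container-agnostic raw-byte scan from the commit above might look roughly like this. It is a sketch under assumptions: the closing `</TC260:AIGC>` tag, the function name `find_aigc_label`, and the error handling are not taken from the repository; the commit only states that the block is HTML-entity-encoded JSON inside a `<TC260:AIGC>` element.

```python
import html
import json
import re
from typing import Optional

# Assumed layout: <TC260:AIGC>...</TC260:AIGC> wrapping entity-encoded JSON.
_AIGC_RE = re.compile(rb"<TC260:AIGC>(.*?)</TC260:AIGC>", re.DOTALL)


def find_aigc_label(raw: bytes) -> Optional[dict]:
    """Scan raw file bytes for a TC260 AIGC block and decode its JSON."""
    match = _AIGC_RE.search(raw)
    if match is None:
        return None
    # The JSON is HTML-entity-encoded (e.g. &quot;), so unescape first.
    text = html.unescape(match.group(1).decode("utf-8", errors="replace"))
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return None
    return payload if isinstance(payload, dict) else None
```

Scanning bytes rather than parsing a specific container is what lets the same check run on PNG, JPEG, or any other format that carries the XMP packet uncompressed.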
test-user 768d997ef0 docs: scope SynthID provenance claims to source-verified facts
Threat model: replace the unverified deployment list (Gemini 3 Pro /
Nano Banana Pro / Imagen 4 / Veo) with the source-verified scope -- SynthID
across Imagen / Veo / Lyria plus Gemini app outputs (>10B items by Dec 2025),
and attribute the 136-bit payload to the paper's SynthID-O variant.

openai-images-2 sample: note the file predates the 19 May 2026 SynthID
rollout across ChatGPT / Codex / API, and that openai.com/verify is now the
public oracle (still no local decoder).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 12:18:13 -07:00
test-user 37769453a9 chore(release): v0.5.2 — native-resolution invisible removal (fixes #10)
- fix(invisible): process at native resolution by default; the forced
  downscale-to-1024 -> upscale-back round-trip was the main quality loss
  (#10). Matches the raiw.cc backend (fal fast-sdxl = sdxl-base-1.0).
  New --max-resolution opt-in cap for GPU/MPS memory.
- docs: verified fal checkpoint, native-res, gpt-image-2 SynthID.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.5.2
2026-05-25 10:00:25 -07:00
test-user f3ecea348a docs: gpt-image-2 carries SynthID (no local decoder; openai.com/verify oracle)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 09:59:16 -07:00
test-user fb42295b3a docs: record verified fal fast-sdxl checkpoint + native-resolution updates
- fal's llms.txt confirms fast-sdxl is stabilityai/stable-diffusion-xl-base-1.0,
  the exact checkpoint the local CLI defaults to -> local == prod weights.
  Recorded in CLAUDE.md and README.
- README How it works + sample README: replace the old downscale->upscale
  description with native-resolution processing (matches the #10 fix);
  document --max-resolution as an opt-in OOM cap.
- README roadmap: idna already bumped (uv-secure clean).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 09:57:03 -07:00
test-user 18740969ae fix(invisible): process at native resolution by default
The invisible pipeline force-downscaled inputs >1024px to 1024 before
diffusion, then upscaled the result back -- a lossy round-trip that was
the main cause of the quality loss reported in #10. The hosted raiw.cc
backend (fal fast-sdxl) does no pre-downscale, and at strength ~0.05
SDXL img2img doesn't need it.

Default is now native resolution (max_resolution=0). New --max-resolution
flag (invisible / all / batch) re-introduces an opt-in long-side cap only
to bound GPU/MPS memory on very large inputs.

Addresses #10. End-to-end quality/removal not re-verified locally (no GPU
here); matches raiw-app's proven production config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 09:50:06 -07:00
test-user 93c664f7fb docs: sync README + corpus map with v0.5.x detection coverage
- README Features: add the identify / provenance-detection capability.
- README Supported models: add FLUX, Stability AI, Microsoft/Bing
  (MAI-Image), Meta AI rows; note SD/SDXL/FLUX imwatermark is locally
  detectable; add a detection note pointing at identify.
- corpus README per-platform map: add Stability / Ideogram / Recraft /
  Krea-FLUX rows + an imwatermark column; correct Bing (MAI-Image,
  signs 'Microsoft'); note imwatermark fires only on pristine pipeline
  output, not re-hosts/exports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 09:32:03 -07:00
test-user 20a754ef98 chore(release): v0.5.1 — security + bug fixes
- security: bump idna 3.11 -> 3.16 (GHSA-65pc-fj4g-8rjx)
- fix(ctrlregen): correct module import paths (#11, @neosun100)
- fix(cli): preserve alpha channel through visible/all/batch (#8, @rlorenzo)
- fix(cli): safer re-exec via -m instead of repr(sys.argv) -c string (#9, @eskibars)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.5.1
2026-05-25 09:21:53 -07:00
test-user e60f183d29 style(ctrlregen): sort imports (follow-up to #11)
#11 left the import block un-sorted (ruff I001); reorder so diffusers
precedes the local ctrlregen import.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 09:21:53 -07:00
Rex Lorenzo d091b9f822 fix(cli): preserve alpha channel in visible-watermark pipeline
`cv2.imread(..., IMREAD_COLOR)` was silently stripping the alpha channel
on RGBA inputs, and `cv2.imwrite` then wrote opaque 3-channel PNGs — so
images with transparent backgrounds came back with an opaque-black (or
white) background and the sparkle area baked in as a solid blob.

Read the source with `IMREAD_UNCHANGED`, keep the alpha plane out of the
detection/inpaint path (those still operate on BGR), and rejoin alpha at
save time. The detected watermark bbox is also zeroed in the alpha plane
so the sparkle region becomes transparent rather than an opaque artifact.

Applies to `visible`, `all`, and `batch` modes. RGB-only inputs and JPEG
outputs are unaffected.
2026-05-25 09:18:39 -07:00
test-user e8d698814a fix(cli): re-exec via -m instead of a repr(sys.argv) -c string
Based on #9 by @eskibars. Replaces the os.execl(..., "-c", repr-string)
restart (used after the CUDA-torch auto-install) with os.execv -m, so we
no longer build an exec string from repr(sys.argv). Forwards sys.argv[1:]
only: under -m Python sets argv[0] to the module path, so passing the full
argv would re-inject the program name as a spurious Click argument.

Verified: python -m remove_ai_watermarks.cli --version works; test_cli green.

Closes #9

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 09:15:55 -07:00
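
To illustrate the argv handling described in the commit above, here is a small sketch. The helper `build_reexec_argv` is hypothetical; the module name `remove_ai_watermarks.cli` is the one quoted in the commit's verification line.

```python
import sys
from typing import List, Sequence


def build_reexec_argv(
    argv: Sequence[str], module: str = "remove_ai_watermarks.cli"
) -> List[str]:
    """Build the argument list for re-running the CLI via `python -m`.

    Only argv[1:] is forwarded: under -m, Python sets argv[0] itself,
    so passing the old argv[0] would add a spurious positional argument.
    """
    return [sys.executable, "-m", module, *argv[1:]]
```

The restart itself would then be `os.execv(sys.executable, build_reexec_argv(sys.argv))`, which replaces the current process without building any command string.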
Neo 孫 663c8c64ca fix(ctrlregen): correct module import paths (#11)
The CtrlRegen engine module references a non-existent top-level package
'ctrlregen' in four locations:

  src/remove_ai_watermarks/noai/ctrlregen/engine.py
    L39:  from ctrlregen.pipeline import CustomCtrlRegenPipeline
    L57:  from ctrlregen.color import color_match
    L242: from ctrlregen.tiling import resize_center_crop, run_tiled
    L267: from ctrlregen.tiling import resize_center_crop

These should be absolute imports of the package's own subpackage. Because
the imports fail, the top-level try/except sets _HAS_DIFFUSERS=False and
_HAS_COLOR_MATCHER=False even when the [gpu] extra is correctly
installed, and is_ctrlregen_available() always returns False.

Effect on users: invoking the ctrlregen profile crashes with

  ImportError: Failed to auto-install missing dependencies:
  controlnet-aux, color-matcher, safetensors

regardless of whether those packages are installed. The auto-install
fallback also fails in uv-managed venvs (uv does not ship pip in the
venv by default), so the error path is unrecoverable.

Reproduction (before fix):
  uv sync --all-extras
  uv run remove-ai-watermarks invisible <image> --pipeline ctrlregen
  # → ImportError as above

Fix: change the four imports to use the package-qualified path
(matching the absolute-import style used elsewhere in the codebase,
e.g. watermark_remover.py).

Verified post-fix on Linux/CUDA (NVIDIA L40S):
  - is_ctrlregen_available() returns True
  - CtrlRegen pipeline loads, downloads weights, and runs end-to-end
  - Tile-based path (image > 512px) processes 6 tiles cleanly
  - 142 existing pytest tests still pass
2026-05-25 09:11:20 -07:00
test-user b45e2a5731 chore(deps): bump idna 3.11 -> 3.16 (GHSA-65pc-fj4g-8rjx)
Fixes the uv-secure abort that stopped maintain.sh: idna 3.11 had
GHSA-65pc-fj4g-8rjx (fix in 3.15). uv lock --upgrade-package idna pulls
3.16; uv-secure now reports no vulnerabilities. Lock-only change, 266
tests still pass. Updates the stale CLAUDE.md note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 09:04:24 -07:00
test-user 0762807e42 chore(release): v0.5.0 — image provenance (identify) + SD/SDXL/FLUX + EXIF/XMP detection
New since v0.4.1:
- identify command: aggregate C2PA, IPTC, SD/ComfyUI params, SynthID
  proxy, visible sparkle, open invisible watermark into one provenance
  verdict (--json, --no-visible).
- Open SD/SDXL/FLUX invisible-watermark detection (imwatermark, extra: detect).
- EXIF Software / XMP CreatorTool generator-tag reading (incl. AVIF/HEIF).
- Stability AI + Microsoft/Bing C2PA issuers; SynthID metadata detection.
- SynthID reference corpus + experimental pixel-carrier probe.
- Fix: __version__ was stuck at 0.3.4 (banner mismatch), now synced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.5.0
2026-05-25 08:58:45 -07:00
test-user 626f43aec9 docs: correct SynthID spectral-carrier understanding
Deeper re-examination (2026-05-25) of github.com/aloshdenny/reverse-SynthID
on our own data corrects the earlier over-stated dead-end:

- The carrier IS real on solid fills -- measured via per-bin PHASE
  COHERENCE (the prior probe used spatial/FFT-magnitude NCC, which can't
  see a fixed-phase carrier). White gemini-2.5-flash fills: coherence 0.86
  at carriers (0,+/-7..12,20..23) vs 0.31 random; single-image phase-match
  +0.83 vs -0.24 for real photos.
- But it does not generalize: carriers are model-version/resolution/color
  specific (v4 codebook for 3.1-flash/nb-pro scores ~0.5 on 2.5-flash),
  and collapse on real content (coherence ~random; v4 content 0.518 vs
  neg 0.504, no separation).

Net: a controlled-fill characterizer, not a real-content detector.
Metadata proxy + visible sparkle + online oracles remain the ceiling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 08:57:35 -07:00
test-user ede35a3db5 feat(metadata): read EXIF Make tag; collect Ideogram/Recraft/Krea-FLUX
Collected live samples from three popular generators we lacked:

- Ideogram tags its downloads with EXIF Make="Ideogram AI" (no C2PA, no
  SynthID, no imwatermark) -- the Make tag is its only signal. exif_generator
  only read Software/Artist/ImageDescription, so it missed this; now reads
  Make too. Real cameras put "Apple"/"Canon" in Make (no AI token), so this
  stays low-false-positive. 4 originals ingested.
- Recraft (PNG export) and Krea hosting FLUX 2: downloads carry NO detectable
  signal -- no C2PA/EXIF/IPTC, and notably no imwatermark despite Krea running
  FLUX. identify correctly reports 'unknown'. Both ingested as neg fixtures.

Lesson recorded in CLAUDE.md: the imwatermark detector fires only on pristine
output from a pipeline that runs the encoder (diffusers default, official BFL),
not from re-hosts (Krea/Stability) or re-encoded exports (Recraft/Canva).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 18:38:56 -07:00
test-user ad3b8ee248 feat(identify): read EXIF Software / XMP CreatorTool generator tags
Closes the documented gap where EXIF/XMP fields inside AVIF/HEIF/JXL went
unparsed. metadata.exif_generator extracts the EXIF Software/Artist tag
(via PIL+piexif, which opens AVIF natively) and the XMP CreatorTool (via a
container-agnostic raw-byte scan that also covers HEIF/JXL that PIL can't
open), and matches against AI_GENERATOR_TOKENS so only generator names
(Firefly, DALL-E, Midjourney, ComfyUI, ...) fire -- a plain 'Adobe
Photoshop' or 'GIMP' tag is not flagged.

identify() surfaces it as a high-confidence signal and uses it for
platform attribution when no C2PA names a platform, so an AVIF/HEIF whose
only AI signal is an EXIF/XMP generator tag is now caught.

Validated with synthesized fixtures (the 'no positive fixtures' blocker
was self-imposed): real AVIF and JPEG written with EXIF Software via PIL,
plus an XMP CreatorTool raw-scan fixture. Zero false positives across the
109-image corpus (real iPhone photos carry no AI generator token).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 17:56:39 -07:00
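
The token-matching rule in the commit above (generator names fire, plain editor names do not) can be sketched like this. The token tuple is an illustrative subset built from the names in the commit message, not the project's actual AI_GENERATOR_TOKENS list, and `match_generator` is a hypothetical name.

```python
from typing import Optional

# Illustrative subset only; the real AI_GENERATOR_TOKENS is not shown in this log.
EXAMPLE_TOKENS = ("firefly", "dall-e", "midjourney", "comfyui")


def match_generator(tag_value: str) -> Optional[str]:
    """Return the first AI-generator token found in an EXIF/XMP tag value."""
    lowered = tag_value.lower()
    for token in EXAMPLE_TOKENS:
        if token in lowered:
            return token
    return None
```

Matching on generator tokens instead of vendor names is what keeps a tag such as 'Adobe Photoshop' from being flagged while 'Adobe Firefly' is.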
test-user 3a1c5427c8 feat(c2pa): recognize Stability AI issuer; fix Microsoft platform label
Collected live C2PA positives from Bing Image Creator and Stability Brand
Studio (DreamStudio successor) and learned two things our scan got wrong:

- Bing now runs Microsoft's own MAI-Image model, not DALL-E, and signs
  C2PA as 'Microsoft'. The scan caught it, but the platform label claimed
  'Microsoft Designer (DALL-E / OpenAI backend)'. Relabeled model-neutral:
  'Microsoft (Bing Image Creator / Designer)'.
- Stability signs C2PA as 'Stability AI' (cert 'Stability AI Ltd'), which
  was not in C2PA_ISSUERS, so it read as 'unknown signer'. Added the issuer
  and a platform mapping. Stability uses no SynthID and (on its current
  Stable Image model) no imwatermark watermark -- verified, both negative.

Both ingested as SynthID-negative corpus fixtures (they are AI but not
SynthID) for issuer-coverage. Canva skipped: its downloads are re-encoded
design exports that strip C2PA, so a Canva sample would be inconclusive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 17:12:42 -07:00
test-user 27ad5b7645 feat(identify): detect open SD/SDXL/FLUX invisible watermark
Research found one locally-fillable detection gap: Stable Diffusion, SDXL,
and FLUX all embed an open DWT-DCT watermark via the invisible-watermark
(imwatermark) library -- a PUBLIC decoder, no secret key, unlike SynthID.
New invisible_watermark.py decodes the known fixed patterns (verified
against upstream source: diffusers SDXL WATERMARK_MESSAGE, FLUX.2
src/flux2/watermark.py, and the 'StableDiffusionV1' default string) and
identify() reports the scheme as a high-confidence signal.

Verified locally end-to-end: embedding SDXL's exact 48-bit message and
decoding it back recovers 48/48 bits; a clean image and our own fal-SDXL
outputs decode to ~21/48 (no match). Caveat baked into the report: the
watermark is fragile -- gone after JPEG q90 -- so it confirms origin only
on pristine files; absence is never proof.

imwatermark is an optional dep (extra 'detect'; pulls non-headless opencv),
so the import is guarded and the signal is skipped when absent. CLI
--no-visible now means metadata-only (skips both pixel-domain detectors).

Also records the broader watermarking landscape in CLAUDE.md: which
services are locally detectable (SD/SDXL/FLUX), C2PA-covered (Bing/Canva/
Getty/Shutterstock unsampled), or proprietary-only like SynthID (Amazon
Titan/Nova, Kakao). Midjourney embeds neither C2PA nor an invisible mark.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 16:53:59 -07:00
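
The 48/48 versus ~21/48 figures in the commit above come down to counting how many decoded bits agree with a known fixed pattern. A minimal sketch of that comparison follows; the function names and the 0.9 acceptance ratio are assumptions, and the pattern used in any real check would be the upstream message, which is not reproduced here.

```python
from typing import Sequence


def matching_bits(decoded: Sequence[int], expected: Sequence[int]) -> int:
    """Count positions where the decoded bits equal the expected pattern."""
    if len(decoded) != len(expected):
        raise ValueError("bit sequences must have the same length")
    return sum(1 for d, e in zip(decoded, expected) if d == e)


def is_match(
    decoded: Sequence[int], expected: Sequence[int], min_ratio: float = 0.9
) -> bool:
    """Treat the watermark as present when enough bits agree.

    min_ratio is an assumed threshold: random data agrees on about half
    the bits, so any cut-off well above 0.5 separates the two cases.
    """
    return matching_bits(decoded, expected) >= min_ratio * len(expected)
```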
test-user 7dcc922617 feat(probe): solid-fill SynthID carrier probe; corpus reconfirms no pixel detector
scripts/synthid_pixel_probe.py is an experimental/diagnostic tool for the
one pixel-domain question that isn't a dead-end: on solid-color fills the
zero-mean residual IS essentially the watermark carrier. Two modes:
'consistency' (mean pairwise NCC of carriers across fills vs random
baseline) and 'removal' (does the pipeline drop the carrier toward
baseline?). Logic validated synthetically (injected carrier correlates,
random noise doesn't, simulated removal collapses it) -- no real fills or
GPU needed.

Running its metric on the corpus independently re-confirms the documented
dead-end for real content: at matched resolution SynthID positives do not
cluster apart from negatives (within-Gemini 0.07; at 1024 px pos-vs-neg
>= pos-vs-pos). An apparent 0.62 among 1254px ChatGPT positives turned out
to be near-duplicate content (5 renders of one prompt at ~0.92; a distinct
ChatGPT image scored ~0 against them), not a shared carrier. The probe is
solid-fills-only; do not use on real content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 16:35:39 -07:00
test-user 144b98cf0b docs: record external AI-detector models as out of scope
Generic HuggingFace AI-vs-real classifiers are per-generator, degrade
off-distribution, are untested on the metadata-stripped surfaces we
care about (gpt-image, Gemini Nano Banana), and our own SDXL pass would
likely defeat them as it does SynthID. Detection stays local +
signal-based. Decision 2026-05-24.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 16:27:00 -07:00
test-user 1a9f3e4fe5 test(identify): cover provenance branches, CLI, sparkle helper
Adds 20 tests around the new provenance path:

- identify(): local SD/ComfyUI params -> local-pipeline attribution;
  visible-sparkle gating at the 0.5 threshold (mocked detector: above,
  below, unavailable, opt-out); metadata verdict not downgraded by a
  sparkle hit; OpenAI/SynthID caveats + dedup; ProvenanceReport is
  JSON-serializable (the CLI --json path); and the honest edge where a
  C2PA manifest without an AI source marker stays 'unknown'.
- CLI 'identify': help, clean PNG, AI PNG platform, valid --json,
  missing file.
- gemini_engine.detect_sparkle_confidence: float in range for a real
  image, None for an unreadable file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 16:27:00 -07:00
test-user fa104bcade feat(identify): provenance command (platform + watermark inventory)
New 'identify' command and identify.py module: upload an image, get one
ProvenanceReport answering where it was made and what watermarks it
carries. Aggregates every locally-readable signal:

- C2PA Content Credentials -> generating platform (issuer + generator).
- IPTC digitalSourceType 'Made with AI' (Meta and others).
- Embedded SD/ComfyUI generation parameters (local pipelines).
- SynthID metadata proxy (Google / OpenAI C2PA companion).
- Visible Gemini sparkle (cv2 fallback for the stripped-metadata case),
  promoted only at confidence >= 0.5 (corpus-tuned: Gemini sparkles
  score >= 0.56, non-sparkle <= 0.49).

is_ai_generated is True or None, never asserted False -- stripped
metadata leaves no local proof of a clean origin, so absence of signals
is reported as 'unknown' with an explicit caveat. The SynthID *pixel*
watermark remains locally undecodable; the report says so.

Non-PNG containers (JPEG/WebP/AVIF/HEIF/JXL) get the same issuer +
generator attribution via a binary scan (the caBX parser is PNG-only).
The cv2 dependency is isolated in gemini_engine.detect_sparkle_confidence
so identify.py stays type-clean. CLI supports --json and --no-visible.

Validated against the 109-image corpus: 14/14 positives flagged AI,
93/94 negatives clean (the one 'neg' flagged is a Meta image that
genuinely carries the IPTC tag -- correct), zero true errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 16:19:26 -07:00
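
The "True or None, never False" rule and the 0.5 sparkle gate from the commit above can be expressed in a few lines. This is a simplified sketch: the function signature and the idea of passing metadata signals as a list of names are assumptions; the threshold value is the one stated in the commit.

```python
from typing import List, Optional

SPARKLE_THRESHOLD = 0.5  # corpus-tuned value quoted in the commit message


def is_ai_generated(
    metadata_signals: List[str], sparkle_confidence: Optional[float]
) -> Optional[bool]:
    """Return True when any signal fires, otherwise None (unknown).

    False is never returned: stripped metadata leaves no local proof
    that an image has a clean origin.
    """
    if metadata_signals:
        return True
    if sparkle_confidence is not None and sparkle_confidence >= SPARKLE_THRESHOLD:
        return True
    return None
```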
test-user f36320ff39 fix(metadata): guard get_ai_metadata PIL open against non-OSError
get_ai_metadata opened the file with PIL unguarded, so a HEIC (or any
format PIL can't open without optional plugins) raised
UnidentifiedImageError instead of falling through to the binary scan --
unlike has_ai_metadata, which already guards. Wrap the open in
except Exception and continue to the C2PA/IPTC path. Regression test
feeds an unopenable .heic shell and asserts no raise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 16:19:15 -07:00
test-user af787fd8d6 docs(corpus): per-platform watermark map + surface-dependent blind spot
Grow the SynthID corpus to 109 originals (91 iPhone-photo negatives,
2 positives) and document what was learned studying 8 platforms:

- README: per-platform watermark map (C2PA issuer / SynthID pixel / IPTC
  / visible sparkle per platform) and an "originals, not previews" note
  (re-encoded previews strip metadata, so a clean preview is not proof).
- CLAUDE.md: surface-dependent blind spot -- the same Google model wraps
  C2PA in the Gemini app but emits the SynthID pixel watermark + sparkle
  with no C2PA/IPTC via the API/playground (AI Studio, Nano Banana), so
  synthid_source returns None despite SynthID being present; only the
  pixel oracle or the visible-sparkle detector catches those.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:55:17 -07:00
test-user be853011f3 feat(corpus): read HEIC dimensions via macOS sips
PIL cannot open iPhone HEIC without pillow-heif, so width/height stayed
0 for those negatives. Fall back to sips -g pixelWidth/pixelHeight on
macOS when PIL fails; returns (0,0) elsewhere.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:55:11 -07:00
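
A possible shape for the sips fallback in the commit above, assuming the usual `sips -g` output of one `key: value` line per property. The function names are hypothetical, and the real corpus script may parse the output differently.

```python
import re
import subprocess
import sys
from typing import Tuple


def parse_sips_dimensions(output: str) -> Tuple[int, int]:
    """Extract (width, height) from `sips -g pixelWidth -g pixelHeight` output."""
    width = re.search(r"pixelWidth:\s*(\d+)", output)
    height = re.search(r"pixelHeight:\s*(\d+)", output)
    if not width or not height:
        return (0, 0)
    return (int(width.group(1)), int(height.group(1)))


def heic_dimensions(path: str) -> Tuple[int, int]:
    """Query sips on macOS; return (0, 0) on other platforms or on failure."""
    if sys.platform != "darwin":
        return (0, 0)
    try:
        result = subprocess.run(
            ["sips", "-g", "pixelWidth", "-g", "pixelHeight", path],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return (0, 0)
    return parse_sips_dimensions(result.stdout)
```

Keeping the parsing separate from the subprocess call means the parser can be tested on any platform.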
test-user 6cef1d59f0 fix(c2pa): drop non-printable claim_generator garbage
On some manifests (observed: Microsoft Designer) the first CBOR "name"
key precedes a binary hash field, not the generator string, so
_cbor_text_after returns control-char garbage. Guard with isprintable()
to drop it; issuer detection (byte-search) and the SynthID verdict are
unaffected. Adds TestParseChunkGuards covering kept-vs-dropped cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:55:07 -07:00
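
The isprintable() guard from the commit above could be as small as the following. The wrapper name `clean_claim_generator` is an assumption; the commit only says that the value returned by `_cbor_text_after` is dropped when it is not printable.

```python
from typing import Optional


def clean_claim_generator(value: Optional[str]) -> Optional[str]:
    """Keep a parsed claim_generator only if it is non-empty printable text."""
    if not value:
        return None
    value = value.strip()
    # str.isprintable() is False when any control character is present,
    # which is what a mis-aligned read into a binary hash field produces.
    if not value or not value.isprintable():
        return None
    return value
```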
test-user da0edcbddc chore(corpus): grow SynthID reference set + document autonomous Chrome collection
Adds content positives (OpenAI gpt-image: forest, fisherman, tokyo; Google
gemini: fisherman, mug) and SDXL/non-SynthID negatives to the local corpus
manifest. Now spans 4 resolutions across 2 vendors (was solid-black only).

README: documents driving generation via Chrome MCP -- Gemini single-click
download; ChatGPT via in-page fetch+blob (preserves original C2PA bytes,
unlike the flaky UI download / a canvas re-encode).

Images stay gitignored; only the manifest (sha256 + labels + extracted
metadata) and protocol are tracked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 12:46:46 -07:00
test-user c006f9b8b4 docs(roadmap): record next steps for SynthID detector work
Captures the forward plan so a future session picks it up: local pixel
detector is blocked pending a generation API or raw watermarked dataset
(spectral methods shown insufficient); grow the oracle-labeled corpus;
replace synthetic non-PNG C2PA fixtures with real ones; and the maintenance
debt (idna bump, strict-pyright cleanup) needed for a green maintain.sh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 11:34:43 -07:00
test-user f07ce10c72 feat(metadata): SynthID-source detection, C2PA parser consolidation, corpus + tests
Detect SynthID-bearing images via their C2PA companion: a manifest signed by a
SynthID-using vendor (Google/OpenAI) on AI-generated content implies an
invisible SynthID pixel watermark. Verified end-to-end against the vendor
oracles (openai.com/verify, Gemini "Verify with SynthID").

- metadata: synthid_source() + synthid_watermark verdict in get_ai_metadata,
  surfaced as a `metadata --check` callout. Format-agnostic (PNG caBX parser +
  JPEG/WebP/AVIF/HEIF/JXL binary scan).
- constants: SYNTHID_C2PA_ISSUERS {Google, OpenAI}; +opened/placed actions.
- c2pa: single CBOR-aware parser (_cbor_text_after) replaces glitchy regex
  (fixes fGPT-4o claim_generator); removed duplicate _scan_png_c2pa_chunk from
  metadata; shared synthid_verdict / synthid_vendors_in helpers.
- corpus: scripts/synthid_corpus.py ingest tool + data/synthid_corpus/
  (manifest tracked, images gitignored) for a labeled reference set.
- tests: +38 across C2PA parser internals, extract/inject round-trip, ISOBMFF
  container stripping, all IPTC AI markers, and invisible watermark strength
  tiers (SynthID/StableSignature/TreeRing/StegaStamp/RingID/RivaGAN/...).

Pixel-level SynthID detection remains out of reach locally (Google's decoder is
proprietary); a from-scratch spectral pilot confirmed it does not separate real
content. See CLAUDE.md for the full evaluation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 11:32:46 -07:00
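
The inference described in the commit above (a manifest signed by a SynthID-using vendor on AI-generated content implies a SynthID pixel watermark) reduces to a set-membership check. The sketch below uses the issuer set named in the commit; the function signature is a simplifying assumption and does not reflect how the real `synthid_source()` reads its inputs.

```python
from typing import Iterable, Optional

# Issuer set as listed in the commit message.
SYNTHID_C2PA_ISSUERS = frozenset({"Google", "OpenAI"})


def synthid_source(issuers: Iterable[str], is_ai_generated: bool) -> Optional[str]:
    """Return the SynthID-using vendor implied by the C2PA manifest, if any.

    A vendor signature alone is not enough: the manifest must also mark
    the content as AI-generated.
    """
    if not is_ai_generated:
        return None
    for issuer in issuers:
        if issuer in SYNTHID_C2PA_ISSUERS:
            return issuer
    return None
```

This is a metadata proxy only; it says nothing about files whose C2PA manifest has been stripped.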
test-user c1ff4e1cd9 CLAUDE.md: document maintain.sh in Test and lint section
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 15:02:06 -07:00
test-user 95606ddd5d docs: SynthID v2 defeat by SDXL pipeline now verified end-to-end locally
Local SDXL run on a Gemini 3 Pro output (snowboard scene, 2816x1536), seed 42,
strength 0.05, steps 50, ~10 min on MPS. Gemini app's "Verify with SynthID"
returned "no SynthID watermark detected" on the cleaned file. This closes the
verification gap noted in v0.4.0 release notes and confirms architectural
equivalence to the raiw-app production fal-ai/fast-sdxl path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 18:12:56 -07:00
test-user 578e229713 style(cli): fix closing paren indentation in cmd_batch
Whitespace-only ruff format alignment, no functional change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 13:58:35 -07:00
test-user 0e04c4d388 chore(release): v0.4.1 — security fix (diffusers, urllib3)
- Bump diffusers minimum to 0.38.0 (closes GHSA-98h9-4798-4q5v).
- Refresh uv.lock to pull urllib3 2.7.0 (closes GHSA-qccp-gfcp-xxvc and
  GHSA-mf9v-mfxr-j63j via transitive update from requests / huggingface-hub).
- Allow pre-releases globally (`[tool.uv] prerelease = "allow"`) because
  diffusers 0.38.0 declares a dependency on safetensors>=0.8.0rc0. Drop
  once safetensors 0.8.0 stable is published or diffusers re-pins.

uv-secure --ignore-unfixed now reports zero vulnerabilities.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.4.1
2026-05-17 13:14:50 -07:00
test-user 1c1ebec148 chore(release): v0.4.0
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.4.0
2026-05-17 12:57:13 -07:00
test-user f2fc5e09ab feat: SDXL default; AVIF/HEIF/JPEG-XL C2PA stripping
SD-1.5 dreamshaper at 768 px did not defeat SynthID v2 on Gemini 3 Pro
outputs (verified May 2026 via Gemini app's "Verify with SynthID"). Switch
the default invisible engine to SDXL at 1024 px, matching the raiw-app
production config (strength 0.05, steps 50). Drop the SD-1.5 pipeline.

Metadata layer: add C2PA UUID and IPTC AI marker byte-scan detection
across all formats, plus an ISOBMFF box walker (noai/isobmff.py) that
strips top-level C2PA uuid and JUMBF jumb boxes from AVIF/HEIF/JPEG-XL
containers without re-encoding.
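
The top-level box walk that such a tool relies on can be sketched as follows.
This is a minimal illustration of the ISOBMFF size/type header layout, not the
contents of noai/isobmff.py: the function names are hypothetical, and the sketch
only locates candidate boxes. A `uuid` box is not necessarily C2PA, so a real
implementation would also compare the 16-byte UUID that follows the header.

```python
import struct


def iter_top_level_boxes(data: bytes):
    """Yield (box_type, start, end) for each top-level ISOBMFF box in data."""
    offset = 0
    while offset + 8 <= len(data):
        # Every box starts with a 32-bit big-endian size and a 4-byte type.
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # size == 1 means a 64-bit "largesize" follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # size == 0 means the box extends to the end of the file.
            size = len(data) - offset
        if size < header or offset + size > len(data):
            break  # malformed or truncated box: stop rather than guess
        yield box_type, offset, offset + size
        offset += size


def find_provenance_boxes(data: bytes):
    """Return (type, start, end) for top-level `uuid` and `jumb` boxes."""
    return [
        (box_type, start, end)
        for box_type, start, end in iter_top_level_boxes(data)
        if box_type in (b"uuid", b"jumb")
    ]
```

Because the walk only reads headers and byte offsets, it never decodes image
data, which is why this kind of container-level handling needs no re-encoding.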

README gets a Legal table and a Threat-model section about SynthID v2's
136-bit payload. CLAUDE.md tracks the SD-1.5 regression as historical
context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 12:54:37 -07:00
dependabot[bot] 89b7633e5c chore(deps): bump the minor-and-patch group with 4 updates (#4)
Bumps the minor-and-patch group with 4 updates: [transformers](https://github.com/huggingface/transformers), [ultralytics](https://github.com/ultralytics/ultralytics), [ruff](https://github.com/astral-sh/ruff) and [pyright](https://github.com/RobertCraigie/pyright-python).


Updates `transformers` from 5.6.0 to 5.7.0
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v5.6.0...v5.7.0)

Updates `ultralytics` from 8.4.41 to 8.4.43
- [Release notes](https://github.com/ultralytics/ultralytics/releases)
- [Commits](https://github.com/ultralytics/ultralytics/compare/v8.4.41...v8.4.43)

Updates `ruff` from 0.15.11 to 0.15.12
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/0.15.11...0.15.12)

Updates `pyright` from 1.1.408 to 1.1.409
- [Release notes](https://github.com/RobertCraigie/pyright-python/releases)
- [Commits](https://github.com/RobertCraigie/pyright-python/compare/v1.1.408...v1.1.409)

---
updated-dependencies:
- dependency-name: transformers
  dependency-version: 5.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: minor-and-patch
- dependency-name: ultralytics
  dependency-version: 8.4.43
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: ruff
  dependency-version: 0.15.12
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: pyright
  dependency-version: 1.1.409
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-10 10:25:39 -07:00
dependabot[bot] bd2852af0e chore(deps): bump rich from 14.3.3 to 15.0.0 (#5)
Bumps [rich](https://github.com/Textualize/rich) from 14.3.3 to 15.0.0.
- [Release notes](https://github.com/Textualize/rich/releases)
- [Changelog](https://github.com/Textualize/rich/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Textualize/rich/compare/v14.3.3...v15.0.0)

---
updated-dependencies:
- dependency-name: rich
  dependency-version: 15.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-10 10:25:36 -07:00
test-user eb1f65ae45 fix(gemini): no-op remove_watermark when nothing detected
Reverse alpha blending applied at the assumed default position painted
a visible inverse-sparkle artifact onto clean or edited images. The
function now returns an unmodified copy when detection fails, instead
of falling back to the hardcoded Gemini corner. Bump to 0.3.5.
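
The guard described above can be sketched like this. It is an illustration of
the control flow only, under stated assumptions: the `detect` and `restore`
callables, the `Region` tuple, and the list-of-rows image representation are all
hypothetical stand-ins, not the project's actual signatures.

```python
from typing import Callable, List, Optional, Tuple

Region = Tuple[int, int, int, int]  # hypothetical (left, top, right, bottom)
Image = List[List[int]]             # stand-in for real pixel data


def remove_watermark(
    pixels: Image,
    detect: Callable[[Image], Optional[Region]],
    restore: Callable[[Image, Region], Image],
) -> Image:
    """Run `restore` only where `detect` found a mark; otherwise do nothing."""
    region = detect(pixels)
    if region is None:
        # Nothing detected: return an unmodified copy instead of guessing a
        # default position, so clean or edited images are never altered.
        return [row[:] for row in pixels]
    return restore(pixels, region)
```

Returning a copy rather than the input object keeps the no-op path consistent
with the detected path, where callers also receive a new image.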

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.3.5
2026-04-26 12:17:25 -07:00
Victor Kuznetsov 51105db3b1 Merge pull request #1 from wiltodelta/dependabot/github_actions/actions-51f4226e04
chore(deps): bump the actions group with 2 updates
2026-04-23 09:58:00 -07:00
Victor Kuznetsov f8dd69c601 Merge pull request #2 from wiltodelta/dependabot/uv/minor-and-patch-20ed929fe3
chore(deps): bump the minor-and-patch group with 5 updates
2026-04-23 09:57:47 -07:00
dependabot[bot] a2601544d9 chore(deps): bump the minor-and-patch group with 5 updates
Bumps the minor-and-patch group with 5 updates:

| Package | From | To |
| --- | --- | --- |
| [click](https://github.com/pallets/click) | `8.3.1` | `8.3.2` |
| [transformers](https://github.com/huggingface/transformers) | `5.4.0` | `5.5.0` |
| [ultralytics](https://github.com/ultralytics/ultralytics) | `8.4.33` | `8.4.35` |
| [pytest](https://github.com/pytest-dev/pytest) | `9.0.2` | `9.0.3` |
| [ruff](https://github.com/astral-sh/ruff) | `0.15.8` | `0.15.9` |


Updates `click` from 8.3.1 to 8.3.2
- [Release notes](https://github.com/pallets/click/releases)
- [Changelog](https://github.com/pallets/click/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/click/compare/8.3.1...8.3.2)

Updates `transformers` from 5.4.0 to 5.5.0
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v5.4.0...v5.5.0)

Updates `ultralytics` from 8.4.33 to 8.4.35
- [Release notes](https://github.com/ultralytics/ultralytics/releases)
- [Commits](https://github.com/ultralytics/ultralytics/compare/v8.4.33...v8.4.35)

Updates `pytest` from 9.0.2 to 9.0.3
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/9.0.2...9.0.3)

Updates `ruff` from 0.15.8 to 0.15.9
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/0.15.8...0.15.9)

---
updated-dependencies:
- dependency-name: click
  dependency-version: 8.3.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: transformers
  dependency-version: 5.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: minor-and-patch
- dependency-name: ultralytics
  dependency-version: 8.4.35
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: pytest
  dependency-version: 9.0.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: ruff
  dependency-version: 0.15.9
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-23 00:25:44 +00:00