Commit Graph

14 Commits

Author SHA1 Message Date
test-user 03fb460f77 Track the labeled SynthID corpus; complete metadata-source test coverage
Corpus images were gitignored (local-only). The negatives were reviewed and
cleared for publishing, so the labeled set is now committed (regular git, 65 MB
across 25 files) -- making the removal regression set reproducible and CI-able.

Corpus:
- Track data/synthid_corpus/images/ (pos 9, neg 15, cleaned 1); keep only the
  synthetic refs/ calibration fills gitignored.
- Reconcile manifest.csv to the on-disk files: 117 -> 25 rows (92 dangling rows
  for removed images pruned; dedup left one cleaned output, f6dd47a5).
- Rewrite the corpus README layout/policy (images committed; review every image
  for private content before adding -- public repo, permanent history).

Test fixtures:
- Remove data/samples/not-ai-1/2/3 (personal iPhone photos, incl. GPS EXIF).
- Add the clean_photo conftest fixture serving a verified-negative image from
  the corpus neg/ set; repoint the three "non-AI / clean photo" tests onto it
  (skips if the corpus is absent).

Metadata-source coverage (close the last sub-variant gaps):
- c2pa digitalSourceType: algorithmicMedia (procedural, not flagged AI) and
  compositeWithTrainedAlgorithmicMedia (AI + SynthID proxy).
- exif_generator: EXIF Artist and ImageDescription fields (Software/Make/XMP
  CreatorTool were already covered).

All 8 metadata-source kinds are now tested at both the unit and identify()
level. 313 tests pass. CLAUDE.md updated (corpus tracked, clean_photo fixture).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 14:46:47 -07:00
test-user 59d72c5db7 Record verified gpt-image-2 SynthID-cleaned chain in corpus
Add manifest row for the 4ef377bd -> f6dd47a5 chain: a gpt-image-2 sample
(openai.com/verify: SynthID + C2PA detected) cleaned via v0.5.3 `all` at
native 1254x1254 (prod-equivalent SDXL base, strength 0.05, 50 steps).
openai.com/verify reports SynthID NOT detected after the run, re-confirming
that the #10 native-resolution default defeats OpenAI SynthID and resolving
the #15 root cause (older SD-1.5/768px downscale default did not).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 13:55:10 -07:00
test-user 1afc1e60ef test(samples): add real Doubao TC260 AIGC reference sample
2048x2048 PNG carrying China's TC260 <TC260:AIGC> label; identify reports
it as a China AIGC-labeled generator (TC260). Reference fixture for manual
re-verification of the TC260 detection path -- the automated tests use
synthetic blobs, so nothing depends on this file being present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 12:36:28 -07:00
test-user 768d997ef0 docs: scope SynthID provenance claims to source-verified facts
Threat model: replace the unverified deployment list (Gemini 3 Pro /
Nano Banana Pro / Imagen 4 / Veo) with the source-verified scope -- SynthID
across Imagen / Veo / Lyria plus Gemini app outputs (>10B items by Dec 2025),
and attribute the 136-bit payload to the paper's SynthID-O variant.

openai-images-2 sample: note the file predates the 19 May 2026 SynthID
rollout across ChatGPT / Codex / API, and that openai.com/verify is now the
public oracle (still no local decoder).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 12:18:13 -07:00
test-user fb42295b3a docs: record verified fal fast-sdxl checkpoint + native-resolution updates
- fal's llms.txt confirms fast-sdxl is stabilityai/stable-diffusion-xl-base-1.0,
  the exact checkpoint the local CLI defaults to -> local == prod weights.
  Recorded in CLAUDE.md and README.
- README How it works + sample README: replace the old downscale->upscale
  description with native-resolution processing (matches the #10 fix);
  document --max-resolution as an opt-in OOM cap.
- README roadmap: idna already bumped (uv-secure clean).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 09:57:03 -07:00
test-user 93c664f7fb docs: sync README + corpus map with v0.5.x detection coverage
- README Features: add the identify / provenance-detection capability.
- README Supported models: add FLUX, Stability AI, Microsoft/Bing
  (MAI-Image), Meta AI rows; note SD/SDXL/FLUX imwatermark is locally
  detectable; add a detection note pointing at identify.
- corpus README per-platform map: add Stability / Ideogram / Recraft /
  Krea-FLUX rows + an imwatermark column; correct Bing (MAI-Image,
  signs 'Microsoft'); note imwatermark fires only on pristine pipeline
  output, not re-hosts/exports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 09:32:03 -07:00
test-user ede35a3db5 feat(metadata): read EXIF Make tag; collect Ideogram/Recraft/Krea-FLUX
Collected live samples from three popular generators we lacked:

- Ideogram tags its downloads with EXIF Make="Ideogram AI" (no C2PA, no
  SynthID, no imwatermark) -- the Make tag is its only signal. exif_generator
  only read Software/Artist/ImageDescription, so it missed this; now reads
  Make too. Real cameras put "Apple"/"Canon" in Make (no AI token), so this
  stays low-false-positive. 4 originals ingested.
- Recraft (PNG export) and Krea hosting FLUX 2: downloads carry NO detectable
  signal -- no C2PA/EXIF/IPTC, and notably no imwatermark despite Krea running
  FLUX. identify correctly reports 'unknown'. Both ingested as neg fixtures.

Lesson recorded in CLAUDE.md: the imwatermark detector fires only on pristine
output from a pipeline that runs the encoder (diffusers default, official BFL),
not from re-hosts (Krea/Stability) or re-encoded exports (Recraft/Canva).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 18:38:56 -07:00
test-user 3a1c5427c8 feat(c2pa): recognize Stability AI issuer; fix Microsoft platform label
Collected live C2PA positives from Bing Image Creator and Stability Brand
Studio (DreamStudio successor) and learned two things our scan got wrong:

- Bing now runs Microsoft's own MAI-Image model, not DALL-E, and signs
  C2PA as 'Microsoft'. The scan caught it, but the platform label claimed
  'Microsoft Designer (DALL-E / OpenAI backend)'. Relabeled model-neutral:
  'Microsoft (Bing Image Creator / Designer)'.
- Stability signs C2PA as 'Stability AI' (cert 'Stability AI Ltd'), which
  was not in C2PA_ISSUERS, so it read as 'unknown signer'. Added the issuer
  and a platform mapping. Stability uses no SynthID and (on its current
  Stable Image model) no imwatermark watermark -- verified, both negative.

Both ingested as SynthID-negative corpus fixtures (they are AI but not
SynthID) for issuer-coverage. Canva skipped: its downloads are re-encoded
design exports that strip C2PA, so a Canva sample would be inconclusive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 17:12:42 -07:00
test-user af787fd8d6 docs(corpus): per-platform watermark map + surface-dependent blind spot
Grow the SynthID corpus to 109 originals (91 iPhone-photo negatives,
2 positives) and document what was learned studying 8 platforms:

- README: per-platform watermark map (C2PA issuer / SynthID pixel / IPTC
  / visible sparkle per platform) and an "originals, not previews" note
  (re-encoded previews strip metadata, so a clean preview is not proof).
- CLAUDE.md: surface-dependent blind spot -- the same Google model wraps
  C2PA in the Gemini app but emits the SynthID pixel watermark + sparkle
  with no C2PA/IPTC via the API/playground (AI Studio, Nano Banana), so
  synthid_source returns None despite SynthID being present; only the
  pixel oracle or the visible-sparkle detector catches those.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:55:17 -07:00
test-user da0edcbddc chore(corpus): grow SynthID reference set + document autonomous Chrome collection
Adds content positives (OpenAI gpt-image: forest, fisherman, tokyo; Google
gemini: fisherman, mug) and SDXL/non-SynthID negatives to the local corpus
manifest. Now spans 4 resolutions across 2 vendors (was solid-black only).

README: documents driving generation via Chrome MCP -- Gemini single-click
download; ChatGPT via in-page fetch+blob (preserves original C2PA bytes,
unlike the flaky UI download / a canvas re-encode).

Images stay gitignored; only the manifest (sha256 + labels + extracted
metadata) and protocol are tracked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 12:46:46 -07:00
test-user f07ce10c72 feat(metadata): SynthID-source detection, C2PA parser consolidation, corpus + tests
Detect SynthID-bearing images via their C2PA companion: a manifest signed by a
SynthID-using vendor (Google/OpenAI) on AI-generated content implies an
invisible SynthID pixel watermark. Verified end-to-end against the vendor
oracles (openai.com/verify, Gemini "Verify with SynthID").

- metadata: synthid_source() + synthid_watermark verdict in get_ai_metadata,
  surfaced as a `metadata --check` callout. Format-agnostic (PNG caBX parser +
  JPEG/WebP/AVIF/HEIF/JXL binary scan).
- constants: SYNTHID_C2PA_ISSUERS {Google, OpenAI}; +opened/placed actions.
- c2pa: single CBOR-aware parser (_cbor_text_after) replaces glitchy regex
  (fixes fGPT-4o claim_generator); removed duplicate _scan_png_c2pa_chunk from
  metadata; shared synthid_verdict / synthid_vendors_in helpers.
- corpus: scripts/synthid_corpus.py ingest tool + data/synthid_corpus/
  (manifest tracked, images gitignored) for a labeled reference set.
- tests: +38 across C2PA parser internals, extract/inject round-trip, ISOBMFF
  container stripping, all IPTC AI markers, and invisible watermark strength
  tiers (SynthID/StableSignature/TreeRing/StegaStamp/RingID/RivaGAN/...).

Pixel-level SynthID detection remains out of reach locally (Google's decoder is
proprietary); a from-scratch spectral pilot confirmed it does not separate real
content. See CLAUDE.md for the full evaluation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 11:32:46 -07:00
test-user 87d02126e3 feat(metadata): parse C2PA JUMBF manifest fields, add Images 2.0 sample, bump to 0.3.4
- metadata --check now shows claim_generator, c2pa_spec, digital_source_type,
  c2pa_actions, signer instead of empty table for C2PA-only files
- reuses existing extract_c2pa_chunk() from noai/c2pa.py — no more duplicate
  PNG chunk parsing or full-file reads
- adds data/samples/openai-images-2/amur-leopard.png: real gpt-image-2 output
  with C2PA manifest signed by OpenAI OpCo LLC / Trufo CA (spec 2.2.0)
- removes stale data/samples/nano-banana-1/2.png (no longer referenced)
- updates README: new Images 2.0 row in supported models table
- documents known text-degradation limitation in CLAUDE.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 17:21:51 -07:00
test-user 1890848ec3 SEO-optimized README, add sample images from multiple AI models
- Rewrite README for SEO: Nano Banana, SynthID, Made with AI, C2PA keywords
- Add Supported Models table with 7 AI services
- Add 'Made with AI' label removal to features
- Rename sections for search discoverability
- Add samples: ChatGPT/DALL-E, Midjourney, Adobe Firefly
- Reorganize data/samples with flat structure and clear naming
2026-03-25 17:23:24 -07:00
test-user e5d8970add Add project files, tests, and documentation for GitHub release
- CLI with visible, invisible, all, metadata, and batch commands
- Gemini watermark removal via reverse alpha blending
- Invisible watermark removal via diffusion regeneration (SynthID, TreeRing)
- AI metadata stripping (EXIF, PNG text, C2PA)
- Face protection (YOLO/Haar) and analog humanizer
- 137 tests covering all CLI modes and core engines
- Ruff and Pyright clean
2026-03-25 11:15:05 -07:00