Commit Graph

37 Commits

Author SHA1 Message Date
Victor Kuznetsov 223cbcf171 feat(metadata): detect China TC260 AIGC PNG chunk and HuggingFace hf-job-id
aigc_label now reads the TC260 label from a raw-JSON `AIGC` PNG tEXt chunk
(as Doubao/ByteDance write it, with no namespaced XMP marker) in addition to
the `<TC260:AIGC>` XMP block, via a shared _parse helper gated on a TC260 field
so a generic AIGC key cannot false-positive. New huggingface_job() reads the
hf-job-id PNG chunk; identify surfaces it as a medium-confidence hf_job signal
(parallel to the visible sparkle, never overriding a hard metadata verdict).
Both wired into has_ai_metadata/get_ai_metadata; the PNG save whitelist already
strips them on removal. Found by auditing 646 corpus originals: 28 AIGC and 3
hf-job files the library previously reported as Unknown.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 12:40:17 -07:00
Victor Kuznetsov 0eec3001bb feat(invisible): protect text automatically by default (#21)
Mirror protect_faces: protect_text defaults to True in invisible_engine and
watermark_remover, so the SDXL pipeline detects text per image and switches to
Differential Diffusion only when glyphs are found. Text-free inputs fall back to
plain img2img with no differential-pipeline load, so the autonomy is free. The
CLI now exposes a single off-switch --no-protect-text instead of the positive
flag, keeping the interface minimal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 12:24:09 -07:00
Victor Kuznetsov a0bf62e601 feat(invisible): preserve text/CJK via Differential Diffusion (--protect-text) (v0.6.10)
SDXL img2img regenerates every pixel, so small text and CJK glyphs deform
at the strengths that defeat SynthID (issue #21). With --protect-text a
CJK-native PP-OCRv3 detector (2.4 MB ONNX, cv2.dnn, no torch, cached on
first use) locates text regions and the pass switches to the SDXL
Differential-Diffusion community pipeline: a per-pixel change map keeps
text regions largely intact while the background is regenerated to strip
the watermark. Gated to the SDXL default model; falls back to plain
img2img with a warning when unavailable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 11:59:15 -07:00
test-user 27539c0da9 docs: tidy roadmap to open items only (drop shipped v0.6.7-0.6.9)
The Roadmap is the project TODO; shipped features (Integrity Clash, streaming-MP4
scan window, meta-box XMP blanking) no longer belong under "not yet implemented".
Removed them and kept the still-open remainder as its own item (AVIF/HEIF Exif
*item* inside the meta box). Net open TODO: SynthID v2 regression test, local
SynthID pixel detector, grow the SynthID corpus, real non-PNG C2PA fixtures,
pyright maintenance debt, meta-box Exif item, Canon/Samsung device signers,
Resemble PerTh (dead end), video pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 18:18:28 -07:00
test-user 5bfed00553 feat(metadata): blank AI-label XMP inside the HEIF/AVIF meta box (v0.6.9)
HEIF/AVIF store XMP as a meta-box `mime` item whose bytes live in mdat/idat, out
of reach of the top-level uuid/jumb box stripper. An AI-label XMP packet there
(TC260 AIGC, IPTC "Made with AI", IPTC 2025.1) was therefore left in place.

isobmff.blank_ai_xmp_packets locates each XMP packet by its <?xpacket begin ...
end?> delimiters and, if it carries an AI marker (_AI_LABEL_MARKERS), overwrites
it with spaces of the SAME length. Equal length means no box size or iloc offset
shifts -- the coded image stays bit-for-bit intact, the item stays structurally
valid, only the AI label content is destroyed. Plain (non-AI) XMP is left alone,
mirroring the top-level XMP-uuid content match. Wired into remove_ai_metadata's
ISOBMFF branch after strip_c2pa_boxes.

Chosen over exiftool (a non-bundled binary dep) to stay pure-Python and
droplet-compatible; over full iinf/iloc surgery to avoid offset-rewrite
corruption risk. The AI labels we target are all XMP, so this closes the
practical gap. An Exif *item* inside the meta box (rare) still needs iinf/iloc
surgery or exiftool -- documented.

4 new tests (TestMetaBoxXmpBlanking): AI packet blanked (same length, marker
gone, surrounding image bytes intact), plain XMP preserved, no-packet no-op, and
end-to-end remove_ai_metadata on a .heic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 18:15:48 -07:00
test-user 31f0a82906 feat(metadata): detect C2PA/AIGC/IPTC manifests after a large mdat in MP4 (v0.6.8)
Provenance detection no longer relies on a fixed first-MB read. In a streaming /
non-faststart MP4 the C2PA manifest sits AFTER a multi-megabyte mdat, beyond the
1 MB scan window, so it was missed.

- isobmff.scan_c2pa_region(path): a file-seeking top-level box walker that
  returns the payloads of uuid/jumb (provenance) boxes, seeking past mdat by
  size without reading it -- works on multi-GB files. Returns b"" for
  non-ISOBMFF or on read error. Mirrors the box-size encoding of the existing
  in-memory _iter_top_level_boxes (largesize / size==0).
- metadata.scan_head(path, size): the shared input for every C2PA/AIGC/IPTC
  byte scan -- first __TEXT	__DATA	__OBJC	others	dec	hex bytes plus, for ISOBMFF, the late provenance-box
  payloads. Behavior-neutral (f.read(size)) for non-ISOBMFF inputs.
- Routed all six metadata scan sites (has_ai_metadata, aigc_label,
  iptc_ai_system, synthid_source, exif_generator XMP, get_ai_metadata
  soft-binding) and identify's head read through scan_head.

6 new tests: late box found by scan_c2pa_region / scan_head, the fixed window
provably misses it, non-ISOBMFF -> b"", front-placed (faststart) regression.

The remaining gap stays documented: EXIF/XMP stored as items inside the meta
box (AVIF/HEIF stills) still needs meta-box surgery or exiftool.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 13:42:29 -07:00
test-user 18160fe269 feat(identify): integrity-clash detection for contradictory provenance (v0.6.7)
Surface contradictions between independent provenance signals instead of
collapsing to a single verdict -- a strong tell of spoofed, transplanted, or
laundered metadata. Inspired by arXiv:2603.02378.

Two rules in the new _integrity_clashes helper:
- Conflicting AI-origin attributions: two or more distinct AI vendors named by
  independent generator stamps (e.g. a C2PA OpenAI manifest on an image whose
  EXIF says Make="Ideogram AI").
- Camera + AI: a camera-capture C2PA device (Pixel/Leica/Sony/Nikon/Truepic)
  coexisting with an AI-generation marker -- a genuine capture is not AI.

High-precision by design: only hard generator stamps feed it (C2PA issuer when
the source is AI, SynthID proxy, EXIF/XMP generator, IPTC AISystemUsed, xAI,
AIGC). The fuzzy visible sparkle and the open invisible watermark are excluded
-- the latter can be a by-product of our own SDXL removal pass. Vendor
normalization (_vendor_of over _AI_VENDOR_TOKENS) keeps consistent signals from
clashing (C2PA "Google (Gemini)" + SynthID-Google agree); the C2PA vendor is
read from the issuer attribution, not the resolved platform, so a camera label
like "Google Pixel" cannot mis-normalize to an AI vendor.

Surfaced as ProvenanceReport.integrity_clashes (red in the table view, included
in --json). 19 new tests; all real single-origin fixtures (chatgpt/firefly/
doubao/grok/mj) verified to produce zero clashes (false-positive guard).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 13:27:25 -07:00
test-user d847b39292 docs: lawful-use disclaimer + primary-source Legal-table corrections
README:
- surface a lawful-use / no-liability disclaimer near the top
- reword two feature bullets away from detection-evasion framing
  ("bypass AI image classifiers" -> neutral post-processing; drop the
  platform-targeting language from the "Made with AI" bullet)
- Legal table, each corrected against the primary text:
  - CA AB 2655 was struck down on Section 230 ONLY (Kohls v. Bonta,
    E.D. Cal., Aug 2025); the court did not reach the First Amendment
    (the companion AB 2839 was separately enjoined on 1A grounds)
  - COPIED Act: add the bill number (S. 1396, 119th Cong.)
  - South Korea AI Framework Act: in force 22 January 2026 (exact date)

CLAUDE.md: sync the South Korea date to 22 January 2026.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 11:50:20 -07:00
test-user fee0e139af docs: refresh README roadmap + metadata-strip coverage for v0.6.x
- Metadata-strip feature now lists the audio/video container coverage shipped in
  v0.6.0-v0.6.4 (MP4/MOV/M4V/M4A via ISOBMFF box walker; WebM/MP3/WAV/FLAC/OGG
  losslessly via ffmpeg).
- Roadmap updated: the AVIF/HEIF item now reflects that top-level XMP/C2PA boxes
  and non-ISOBMFF audio/video are handled, with only meta-box-item EXIF/XMP left
  (needs exiftool). Added the open backlog: multi-signal "Integrity Clash"
  reporting (arXiv:2603.02378), Canon/Samsung device signers pending a real
  sample, the streaming-MP4 scan-window limit, and Resemble PerTh audio as
  evaluated-but-infeasible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 10:42:40 -07:00
test-user bc3228d387 feat(visible): Doubao text-mark removal + universal region eraser
Add deterministic, CPU-only removal of the visible Doubao "豆包AI生成" mark and
a position-agnostic region eraser for any other visible watermark/logo.

- doubao_engine.py: locate (geometry, scales with width) + polarity-aware
  white-top-hat glyph mask + cv2 inpaint; coverage-gated detection and a
  dense-text safety guard. No GPU, ~30ms.
- region_eraser.py + `erase` command: inpaint arbitrary --region box(es).
  Default cv2 backend (no deps); optional big-LaMa via onnxruntime (`lama`
  extra, Carve/LaMa-ONNX, model downloaded on first use, never bundled).
- cli `visible --mark auto|gemini|doubao`: auto routes by detector confidence.
- tests for both engines; seed previously-unseeded CLI image fixtures to stop
  the Doubao detector flaking on random corners.
- .gitignore: doubao_capture/{seeds,captures} scratch (alpha-map calibration).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 21:31:51 -07:00
test-user c196a16900 feat: detect soft-binding vendors, IPTC 2025.1, video/audio C2PA, TrustMark (v0.6.0)
Broadens metadata provenance coverage at the detection and container-strip level.

Detection:
- C2PA soft-binding `alg` -> forensic-watermark vendor (Adobe TrustMark,
  Digimarc, Imatag, Steg.AI, Microsoft, ...) via C2PA_SOFT_BINDINGS +
  soft_binding_vendors_in(); names the watermark vendor even when the watermark
  itself can't be decoded.
- IPTC Photo Metadata 2025.1 AI-disclosure XMP fields (AISystemUsed etc.) via
  iptc_ai_system() + IPTC_AI_FIELD_MARKERS.
- Adobe TrustMark open keyless decoder (trustmark_detector.py, optional extra
  `trustmark`) -- the watermark behind Adobe Durable Content Credentials.
  Detects provenance, not AI origin, so it does not assert is_ai.

Removal / containers:
- isobmff.strip_c2pa_boxes now also drops a top-level XMP uuid box that carries
  an AI label (matched by AI-marker content, byte-order-robust; plain XMP kept).
- remove_ai_metadata routes MP4/MOV/M4V/M4A (and any ftyp-sniffed ISOBMFF)
  through the box stripper; raises a clear error for non-ISOBMFF audio/video
  (WebM/MP3/WAV) instead of crashing in the image path.

Tests: soft-binding scan, IPTC element/attribute/presence, MP4 + M4A detect/
strip, ISOBMFF XMP surgical strip, content-sniff, unsupported-container guard,
TrustMark absent-safety + identify integration. ruff clean; pyright clean on
all new modules.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 17:56:48 -07:00
test-user 74618b91a7 feat: detect xAI/Grok EXIF signature; refresh watermarking landscape (v0.5.5)
xAI Grok (Aurora) images carry no C2PA/SynthID/IPTC -- their only provenance
signal is an EXIF pair: ImageDescription "Signature: <base64>" + a UUID Artist.
Verified stable across 3 genuine generations (a real download previously read
as unknown / "no AI metadata").

- metadata.xai_signature(): matches the Signature blob + UUID Artist pair;
  wired into has_ai_metadata, get_ai_metadata, and identify (platform
  "xAI (Grok / Aurora)").
- data/samples/grok-1.jpg: real Grok fixture (neutral content; the Artist UUID
  is the public image id, not PII).
- Tests: synthetic-fixture unit tests, real-sample assertion, identify
  integration (322 passing).

Docs (research refresh, May 2026):
- C2PA 2.4 Durable Content Credentials (soft-binding re-discovery after the
  embedded manifest is stripped).
- New AI-labeling laws, primary-source verified: EU AI Act Art 50 (2026-08-02),
  South Korea AI Framework Act Art 31(3), California AB 853.
- Hedge removal claims: defeating the SynthID verifier is not forensic
  invisibility (arXiv:2605.09203); cite SynthID-Image (arXiv:2510.09263).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 14:14:35 -07:00
test-user 5e29c69e7b README: lead with hosted raiw.cc CTA, honest free/paid split
Move the raiw.cc call-to-action above the sponsor ask and drop the
misleading "free web service" framing: visible-watermark and metadata
removal are free, invisible removal runs on paid cloud GPUs. Also point
no-GPU users to the hosted service from the invisible-removal feature
bullet, where the GPU requirement is stated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 12:47:28 -07:00
test-user 8ed4a754ff Add GitHub Sponsors donation button (FUNDING.yml + README badge)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 16:09:52 -07:00
test-user c7f0d71f90 feat(identify): detect China TC260 AIGC label (Doubao et al.)
China-served generators embed an XMP <TC260:AIGC>{"Label":"1",...} block
(China's mandatory AI-content labeling, TC260 standard). Doubao (ByteDance)
uses it -- verified on the real #13 sample. It's none of C2PA / SynthID /
imwatermark / IPTC, so identify() previously returned unknown.

- metadata: AIGC_MARKERS + aigc_label() (json-decodes the HTML-entity-encoded
  block); has_ai_metadata + get_ai_metadata now surface it.
- identify: new 'aigc' signal -> is_ai True, platform 'China AIGC-labeled
  generator (TC260; e.g. Doubao)', carries the ContentProducer code.
- Container-agnostic raw-byte scan, so it covers the whole China-AIGC ecosystem
  (Jimeng/Kling/Qwen/Ernie share the standard).
- Tests: synthetic TC260 block (metadata + identify). Docs updated.

Addresses #13.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 12:29:51 -07:00
test-user 768d997ef0 docs: scope SynthID provenance claims to source-verified facts
Threat model: replace the unverified deployment list (Gemini 3 Pro /
Nano Banana Pro / Imagen 4 / Veo) with the source-verified scope -- SynthID
across Imagen / Veo / Lyria plus Gemini app outputs (>10B items by Dec 2025),
and attribute the 136-bit payload to the paper's SynthID-O variant.

openai-images-2 sample: note the file predates the 19 May 2026 SynthID
rollout across ChatGPT / Codex / API, and that openai.com/verify is now the
public oracle (still no local decoder).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 12:18:13 -07:00
test-user f3ecea348a docs: gpt-image-2 carries SynthID (no local decoder; openai.com/verify oracle)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 09:59:16 -07:00
test-user fb42295b3a docs: record verified fal fast-sdxl checkpoint + native-resolution updates
- fal's llms.txt confirms fast-sdxl is stabilityai/stable-diffusion-xl-base-1.0,
  the exact checkpoint the local CLI defaults to -> local == prod weights.
  Recorded in CLAUDE.md and README.
- README How it works + sample README: replace the old downscale->upscale
  description with native-resolution processing (matches the #10 fix);
  document --max-resolution as an opt-in OOM cap.
- README roadmap: idna already bumped (uv-secure clean).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 09:57:03 -07:00
test-user 93c664f7fb docs: sync README + corpus map with v0.5.x detection coverage
- README Features: add the identify / provenance-detection capability.
- README Supported models: add FLUX, Stability AI, Microsoft/Bing
  (MAI-Image), Meta AI rows; note SD/SDXL/FLUX imwatermark is locally
  detectable; add a detection note pointing at identify.
- corpus README per-platform map: add Stability / Ideogram / Recraft /
  Krea-FLUX rows + an imwatermark column; correct Bing (MAI-Image,
  signs 'Microsoft'); note imwatermark fires only on pristine pipeline
  output, not re-hosts/exports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 09:32:03 -07:00
test-user ad3b8ee248 feat(identify): read EXIF Software / XMP CreatorTool generator tags
Closes the documented gap where EXIF/XMP fields inside AVIF/HEIF/JXL went
unparsed. metadata.exif_generator extracts the EXIF Software/Artist tag
(via PIL+piexif, which opens AVIF natively) and the XMP CreatorTool (via a
container-agnostic raw-byte scan that also covers HEIF/JXL that PIL can't
open), and matches against AI_GENERATOR_TOKENS so only generator names
(Firefly, DALL-E, Midjourney, ComfyUI, ...) fire -- a plain 'Adobe
Photoshop' or 'GIMP' tag is not flagged.

identify() surfaces it as a high-confidence signal and uses it for
platform attribution when no C2PA names a platform, so an AVIF/HEIF whose
only AI signal is an EXIF/XMP generator tag is now caught.

Validated with synthesized fixtures (the 'no positive fixtures' blocker
was self-imposed): real AVIF and JPEG written with EXIF Software via PIL,
plus an XMP CreatorTool raw-scan fixture. Zero false positives across the
109-image corpus (real iPhone photos carry no AI generator token).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 17:56:39 -07:00
test-user 27ad5b7645 feat(identify): detect open SD/SDXL/FLUX invisible watermark
Research found one locally-fillable detection gap: Stable Diffusion, SDXL,
and FLUX all embed an open DWT-DCT watermark via the invisible-watermark
(imwatermark) library -- a PUBLIC decoder, no secret key, unlike SynthID.
New invisible_watermark.py decodes the known fixed patterns (verified
against upstream source: diffusers SDXL WATERMARK_MESSAGE, FLUX.2
src/flux2/watermark.py, and the 'StableDiffusionV1' default string) and
identify() reports the scheme as a high-confidence signal.

Verified locally end-to-end: embedding SDXL's exact 48-bit message and
decoding it back recovers 48/48 bits; a clean image and our own fal-SDXL
outputs decode to ~21/48 (no match). Caveat baked into the report: the
watermark is fragile -- gone after JPEG q90 -- so it confirms origin only
on pristine files; absence is never proof.

imwatermark is an optional dep (extra 'detect'; pulls non-headless opencv),
so the import is guarded and the signal is skipped when absent. CLI
--no-visible now means metadata-only (skips both pixel-domain detectors).

Also records the broader watermarking landscape in CLAUDE.md: which
services are locally detectable (SD/SDXL/FLUX), C2PA-covered (Bing/Canva/
Getty/Shutterstock unsampled), or proprietary-only like SynthID (Amazon
Titan/Nova, Kakao). Midjourney embeds neither C2PA nor an invisible mark.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 16:53:59 -07:00
test-user fa104bcade feat(identify): provenance command (platform + watermark inventory)
New 'identify' command and identify.py module: upload an image, get one
ProvenanceReport answering where it was made and what watermarks it
carries. Aggregates every locally-readable signal:

- C2PA Content Credentials -> generating platform (issuer + generator).
- IPTC digitalSourceType 'Made with AI' (Meta and others).
- Embedded SD/ComfyUI generation parameters (local pipelines).
- SynthID metadata proxy (Google / OpenAI C2PA companion).
- Visible Gemini sparkle (cv2 fallback for the stripped-metadata case),
  promoted only at confidence >= 0.5 (corpus-tuned: Gemini sparkles
  score >= 0.56, non-sparkle <= 0.49).

is_ai_generated is True or None, never asserted False -- stripped
metadata leaves no local proof of a clean origin, so absence of signals
is reported as 'unknown' with an explicit caveat. The SynthID *pixel*
watermark remains locally undecodable; the report says so.

Non-PNG containers (JPEG/WebP/AVIF/HEIF/JXL) get the same issuer +
generator attribution via a binary scan (the caBX parser is PNG-only).
The cv2 dependency is isolated in gemini_engine.detect_sparkle_confidence
so identify.py stays type-clean. CLI supports --json and --no-visible.

Validated against the 109-image corpus: 14/14 positives flagged AI,
93/94 negatives clean (the one 'neg' flagged is a Meta image that
genuinely carries the IPTC tag -- correct), zero true errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 16:19:26 -07:00
test-user c006f9b8b4 docs(roadmap): record next steps for SynthID detector work
Captures the forward plan so a future session picks it up: local pixel
detector is blocked pending a generation API or raw watermarked dataset
(spectral methods shown insufficient); grow the oracle-labeled corpus;
replace synthetic non-PNG C2PA fixtures with real ones; and the maintenance
debt (idna bump, strict-pyright cleanup) needed for a green maintain.sh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 11:34:43 -07:00
test-user f07ce10c72 feat(metadata): SynthID-source detection, C2PA parser consolidation, corpus + tests
Detect SynthID-bearing images via their C2PA companion: a manifest signed by a
SynthID-using vendor (Google/OpenAI) on AI-generated content implies an
invisible SynthID pixel watermark. Verified end-to-end against the vendor
oracles (openai.com/verify, Gemini "Verify with SynthID").

- metadata: synthid_source() + synthid_watermark verdict in get_ai_metadata,
  surfaced as a `metadata --check` callout. Format-agnostic (PNG caBX parser +
  JPEG/WebP/AVIF/HEIF/JXL binary scan).
- constants: SYNTHID_C2PA_ISSUERS {Google, OpenAI}; +opened/placed actions.
- c2pa: single CBOR-aware parser (_cbor_text_after) replaces glitchy regex
  (fixes fGPT-4o claim_generator); removed duplicate _scan_png_c2pa_chunk from
  metadata; shared synthid_verdict / synthid_vendors_in helpers.
- corpus: scripts/synthid_corpus.py ingest tool + data/synthid_corpus/
  (manifest tracked, images gitignored) for a labeled reference set.
- tests: +38 across C2PA parser internals, extract/inject round-trip, ISOBMFF
  container stripping, all IPTC AI markers, and invisible watermark strength
  tiers (SynthID/StableSignature/TreeRing/StegaStamp/RingID/RivaGAN/...).

Pixel-level SynthID detection remains out of reach locally (Google's decoder is
proprietary); a from-scratch spectral pilot confirmed it does not separate real
content. See CLAUDE.md for the full evaluation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 11:32:46 -07:00
test-user f2fc5e09ab feat: SDXL default; AVIF/HEIF/JPEG-XL C2PA stripping
SD-1.5 dreamshaper at 768 px did not defeat SynthID v2 on Gemini 3 Pro
outputs (verified May 2026 via Gemini app's "Verify with SynthID"). Switch
the default invisible engine to SDXL at 1024 px, matching the raiw-app
production config (strength 0.05, steps 50). Drop the SD-1.5 pipeline.

Metadata layer: add C2PA UUID and IPTC AI marker byte-scan detection
across all formats, plus an ISOBMFF box walker (noai/isobmff.py) that
strips top-level C2PA uuid and JUMBF jumb boxes from AVIF/HEIF/JPEG-XL
containers without re-encoding.

README gets a Legal table and a Threat-model section about SynthID v2's
136-bit payload. CLAUDE.md tracks the SD-1.5 regression as historical
context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 12:54:37 -07:00
test-user 87d02126e3 feat(metadata): parse C2PA JUMBF manifest fields, add Images 2.0 sample, bump to 0.3.4
- metadata --check now shows claim_generator, c2pa_spec, digital_source_type,
  c2pa_actions, signer instead of empty table for C2PA-only files
- reuses existing extract_c2pa_chunk() from noai/c2pa.py — no more duplicate
  PNG chunk parsing or full-file reads
- adds data/samples/openai-images-2/amur-leopard.png: real gpt-image-2 output
  with C2PA manifest signed by OpenAI OpCo LLC / Trufo CA (spec 2.2.0)
- removes stale data/samples/nano-banana-1/2.png (no longer referenced)
- updates README: new Images 2.0 row in supported models table
- documents known text-degradation limitation in CLAUDE.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 17:21:51 -07:00
test-user bedb1bca08 docs: add raiw.cc mention and move examples section higher 2026-03-27 10:15:40 -07:00
test-user b41c8e5aba v0.3.1: Fix opencv conflict, graceful GPU fallback, correct docs
- Remove opencv-python from [gpu] extra (conflicts with headless in base deps)
- Add graceful fallback in 'invisible' and 'all' commands when GPU deps missing
- Cache InvisibleEngine in batch mode (avoid reloading model per image)
- Fix --humanize help text (was '0.0-1.0', actual range is 0-6.0+)
- Fix stale docstring referencing non-existent [invisible] extra
- Add [gpu] extra install instructions to README
- Fix broken NeuralBleach placeholder URL in Credits
2026-03-26 10:50:26 -07:00
test-user 1890848ec3 SEO-optimized README, add sample images from multiple AI models
- Rewrite README for SEO: Nano Banana, SynthID, Made with AI, C2PA keywords
- Add Supported Models table with 7 AI services
- Add 'Made with AI' label removal to features
- Rename sections for search discoverability
- Add samples: ChatGPT/DALL-E, Midjourney, Adobe Firefly
- Reorganize data/samples with flat structure and clear naming
2026-03-25 17:23:24 -07:00
test-user c7c43a55d7 Add 'How it works' section to README 2026-03-25 13:51:51 -07:00
test-user 507757738e v0.2.2: Unify quality defaults, improve README
- Unify 'all' defaults to match 'invisible' (strength=0.02, steps=100)
- Reorder CLI docs: 'all' command first, individual commands second
- HuggingFace token is now documented as optional
- Remove 'additional setup' label from invisible section
2026-03-25 12:28:02 -07:00
test-user 636d11a65e Make docs platform-neutral (macOS/Linux/Windows) 2026-03-25 12:15:17 -07:00
test-user 345d455a8c Minor README formatting 2026-03-25 12:04:15 -07:00
test-user cace97b04e Bump version to 0.2.0
Changes since 0.1.0:
- Fix phantom model param bug in invisible/all commands
- Fix macOS SSL certificate issue for YOLO downloads
- Use temp file in 'all' pipeline to hide intermediate output
- Add legal disclaimer and fix license attribution
- Add troubleshooting and upgrade docs to README
- Expand test suite to 137 tests covering all CLI modes
- Clean up dependencies and pyright config
2026-03-25 12:03:44 -07:00
test-user 1a3d2a448e Fix macOS SSL cert issue, add troubleshooting and upgrade docs
- Add SSL certificate auto-fix in FaceProtector (certifi)
- Add Troubleshooting section to README (SSL, first-run downloads)
- Add upgrade instructions for pipx/uv tool users
2026-03-25 11:53:02 -07:00
test-user d7614a7b45 Add legal disclaimer, fix attribution, expand credits
- Add disclaimer section to README (research/education purposes)
- Remove incorrect Apache-2.0 license claim from ctrlregen docstrings
- Expand Credits with CtrlRegen and NeuralBleach attribution
- Add license info (MIT) for GeminiWatermarkTool and NeuralBleach
2026-03-25 11:23:28 -07:00
test-user e5d8970add Add project files, tests, and documentation for GitHub release
- CLI with visible, invisible, all, metadata, and batch commands
- Gemini watermark removal via reverse alpha blending
- Invisible watermark removal via diffusion regeneration (SynthID, TreeRing)
- AI metadata stripping (EXIF, PNG text, C2PA)
- Face protection (YOLO/Haar) and analog humanizer
- 137 tests covering all CLI modes and core engines
- Ruff and Pyright clean
2026-03-25 11:15:05 -07:00