135 Commits

Author SHA1 Message Date
Victor Kuznetsov 729f5f2ecd chore(release): v0.8.2
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
v0.8.2
2026-05-31 15:46:47 -07:00
Victor Kuznetsov b0aad476fb fix(scripts): drop rich import from analysis scripts (red CI after rich removal)
The cli refactor dropped rich from dependencies, but four scripts still did
`from rich.console import Console` / `rich.table import Table`. Their test
modules import the scripts, so a clean `uv sync --frozen` (CI: core+dev, no
rich) failed at collection with ModuleNotFoundError on macOS/Windows/Linux.

Add a shared plain-text shim `scripts/_plain_console.py` (Console/Table via
click.echo, markup stripped) and switch all four scripts to it. Verified: all
four import with rich blocked, and tests/test_synthid_corpus.py +
tests/test_synthid_pixel_probe.py pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 15:41:50 -07:00
Victor Kuznetsov f16216cabc feat(cli): add --no-protect-faces to invisible/all (skip the YOLO face detector)
Mirrors --no-protect-text: when the image has no people, skip loading and
running the YOLO face detector entirely. The heavy extract+blend already only
ran when a face was found, but the detector itself always loaded+inferred to
decide; this flag lets callers skip that fixed cost.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 15:27:14 -07:00
Victor Kuznetsov e42b7e9d6a refactor(cli): plain-text console output; drop rich; quiet transformers
cli.py now emits plain ASCII through a small click.echo shim
(_Console / _Table / _Progress) instead of rich: no colors, markup tags,
panels, progress bar, or Unicode glyphs (Warning: / -> / ... and dropped
checkmark/cross marks). identify and metadata tables render as indented
plain lines.

- drop rich from dependencies (pyproject.toml + uv.lock)
- __init__: set TRANSFORMERS_VERBOSITY=error (setdefault) plus a warnings
  filter so the transformers Siglip2ImageProcessorFast deprecation no
  longer prints at CLI startup (it fires from the eager noai import)
- TestGpuHintMarkup: the [gpu] hint is now printed verbatim; docstring updated
- CLAUDE.md: replace the obsolete rich-markup lesson, note the verbosity fix

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 15:21:29 -07:00
Victor Kuznetsov 2d49c3cb58 fix(invisible): ctrlregen defaults to clean-noise strength, not the SDXL 0.10
The ctrlregen profile inherited the SDXL img2img --strength default (0.10), a
near-identity pass that loaded ControlNet + DINOv2-giant and barely changed the
image -- a no-op for removal. resolve_strength() now resolves an unset strength
per profile: 0.10 for the SDXL default, CTRLREGEN_DEFAULT_STRENGTH (1.0,
clean-noise) for ctrlregen. It checks `is None` rather than falsiness, so an
explicit 0.0 is respected (the old `strength or DEFAULT` swallowed it).

Research basis: CtrlRegen (ICLR 2025, arXiv:2410.05470) removes robust
watermarks by regenerating from clean Gaussian noise; partial-noise img2img
retains watermark info that diffuses back, so a high (clean-noise) strength is
the lever, not a knob on the light SDXL pass. CLI wiring (--strength default
None) lands with the cli refactor.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 15:07:19 -07:00
Victor Kuznetsov 33bd401e2a fix(visible): guard remove_watermark_reverse_alpha on tiny images too
The previous commit guarded extract_mask, but the 2048x1 crash was
actually in _fixed_alpha_map's cv2.resize to a ~1-px-tall target (Windows:
"Unknown C++ exception" / access violation). Return image.copy() up front
when h < 32 or w < 64 (no real watermarked image is that small), before any
cv2 call. Same guard in both Doubao and Jimeng.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 14:00:52 -07:00
Victor Kuznetsov 7167d2bae7 fix(visible): guard extract_mask against degenerate ROIs (Windows CI crash)
The always-align removal scores each placement with a residual detect(),
which on an extremely wide/short image (2048x1, test_wide_short_does_not_raise)
fed cv2 GaussianBlur a ~1-px-tall ROI and faulted natively on Windows py3.12
(access violation, non-deterministic -- one CI cell went red, a re-run passed).
The old at-native path never ran detect() on degenerate sizes. Skip the cv2
pipeline and return an empty mask when bh < 16 or bw < 16; real images always
clear the guard (the WM_* box floors are max(16,..) / max(40,..)). Same fix in
both Doubao and Jimeng. Also sync the stale Doubao module docstring.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 13:25:57 -07:00
Victor Kuznetsov 196e8ea35f docs(claude): record the release flow + sdist-must-exclude-data lesson
PyPI rejected the 0.8.0 sdist (data/ test corpora over the file-size
limit); document the release steps and the [tool.hatch.build.targets.sdist]
exclude so it does not recur.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 13:01:08 -07:00
Victor Kuznetsov e7c57e3892 chore(release): v0.8.1 — exclude data/ from sdist
The 0.8.0 PyPI publish uploaded the wheel but the sdist was rejected
(400 File too large): hatchling's default sdist bundled the committed
data/ test corpora (synthid_corpus images + the new visible-mark
captures), pushing it past PyPI's per-project file-size limit. Add a
sdist target that excludes /data, dropping it ~85 MB -> 9.8 MB. The
wheel already ships only src/ and is unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
v0.8.1
2026-05-31 12:57:46 -07:00
Victor Kuznetsov 315320056b chore(release): v0.8.0
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
v0.8.0
2026-05-31 12:25:02 -07:00
Victor Kuznetsov e572767555 feat(visible): add Jimeng remover, fix Doubao outline defect, reproducible mask build
Visible-watermark work across all three corner-mark engines plus a committed,
reproducible alpha-build pipeline (scripts/visible_alpha_solve.py) fed by committed
solid black/gray/white captures.

- jimeng: new "即梦AI" wordmark remover (reverse-alpha + thin residual inpaint,
  always NCC-aligned -- the mark re-rasterizes/jitters per image). Detect via glyph
  silhouette NCC (0.45 threshold; does not cross-fire with Doubao). Registered in the
  visible-mark catalog; `visible --mark jimeng` / `--mark auto`.
- doubao: fix a real production defect -- the shipped remover left a READABLE
  "豆包AI生成" outline on real samples while detect() returned conf 0.0 (fooled by a
  thin outline), so the test passed and the "56/56 clean" claim was detector-measured,
  not visual. Root cause: under-estimated alpha + fixed-geometry-no-inpaint + tight
  locate box. Rebuilt alpha (careful gray-self solve), always-align, thin inpaint,
  widened locate box -> readable outline becomes faint texture-level traces.
- gemini: rebuild gemini_bg_{96,48} from our own controlled captures (validated NCC
  0.9998 vs the prior third-party asset); removal re-verified clean, no behaviour change.
- tests: add textured-shift regression to both engines (guards the align-on-shift path
  the Doubao defect exposed; lesson: a detector-only removal test is insufficient,
  assert visual residual).
- docs: CLAUDE.md, README, capture READMEs and docstrings synced; stale
  "exact/pixel-exact/56-clean" claims removed.

Also includes a SynthID label-wording clarification in identify.py/cli.py
("SynthID pixel watermark" -> "SynthID watermark, inferred from C2PA metadata").

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 12:20:19 -07:00
Victor Kuznetsov 5d0e6c3a65 fix: harden metadata parsers and engines; sync docs (full-repo review)
Apply fixes from a full-repo review (code, tests, docs).

Security / correctness:
- Clamp attacker-controlled PNG/caBX chunk lengths to the remaining file
  size in metadata.py and noai/c2pa.py (a malformed length no longer drives
  a multi-GB read); skipped chunks seek instead of read.
- noai/isobmff.strip_c2pa_boxes is now fail-safe on a malformed box: return
  the original bytes with a warning instead of silently truncating the tail,
  so metadata --remove can no longer emit a corrupt file.
- doubao_engine._fixed_alpha_map clamps the glyph box to the image (no crash
  on degenerate width-vs-height).
- watermark_remover._run_region_hires gates the phaseCorrelate offset on
  response and magnitude (a spurious shift no longer garbles text) and drops
  the generator after a CPU fallback (no MPS/CPU device mismatch).

Robustness:
- gemini_engine, doubao_engine, region_eraser normalize grayscale and RGBA
  inputs to BGR at the engine entry points.
- image_io.imwrite returns False on an unwritable path (matches cv2).
- invisible_engine guards a None imread result before use.
- trustmark_detector._decoder uses a double-checked threading lock.
- ctrlregen.tiling.tile_positions raises on overlap >= tile.
- humanizer chromatic shift no longer wraps opposite-edge pixels.
- identify OpenAI caveat keyed on the normalized vendor, not a substring.
- Remove the dead "visible --detect-threshold" CLI option.
- publish.yml verifies the release tag matches the package version.

Docs:
- README strength 0.05 to 0.10; .env.example HF_TOKEN marked optional;
  doubao_capture README updated to reverse-alpha-only; CLAUDE.md synced with
  the new behaviors and the batch command.

Tests: new test_security_clamp.py for the read clamp and isobmff fail-safe;
erase CLI coverage; integrity-clash rule 2 end-to-end; multi-tag EXIF
survival and cross-format strip guards; channel/size, tiling, humanizer, and
imwrite regressions. Full suite 493 passed, 2 skipped; ruff and pyright src/
clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 18:00:39 -07:00
Victor Kuznetsov 5298dcc6a3 chore(release): v0.7.2
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
v0.7.2
2026-05-30 14:35:04 -07:00
Victor Kuznetsov d88b87ca4e Fix #29 black output: use fp16-fixed SDXL VAE on fp16 GPUs
The stock SDXL VAE overflows to NaN in fp16, so the plain img2img path decodes
to an all-black image on a CUDA/XPU fp16 backend. This is the raiw.cc black
result HitaoLin reported (a 1086x1448 input came back uniformly black). cpu/mps
run fp32 and never hit it, and the differential / region-hires pipeline already
upcasts the VAE itself, so only the plain path on a fp16 GPU was exposed.

`_load_pipeline` now loads `madebyollin/sdxl-vae-fp16-fix` for the default SDXL
checkpoint when running fp16, gated by the pure helper `_needs_fp16_vae_fix`. A
custom non-SDXL model keeps its own VAE.

The decision logic is unit-tested without a download (TestFp16VaeFix). The
black->clean recovery itself needs a CUDA GPU and was not verifiable on this MPS
machine; it must be confirmed on the backend.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 14:31:51 -07:00
Victor Kuznetsov 9be66752c5 chore(release): v0.7.1
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
v0.7.1
2026-05-30 14:21:02 -07:00
Victor Kuznetsov 69559226d7 Clarify metadata command supports video/audio, drop misfiring format warning (#33)
The `metadata` command handles more than images: `remove_ai_metadata` strips
C2PA / AIGC provenance from MP4/MOV/M4V/M4A and from WebM/MP3/WAV/FLAC/OGG via
ffmpeg. But the help said "from images" and the shared `_validate_image` call
printed "Warning: .mp4 may not be supported" on exactly those supported
containers. The argument's `exists=True` already enforces the file exists, so
the validation call only added the wrong warning here.

Update the docstring to list the real format coverage and drop the
image-only validation from this command. The image commands keep it.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 13:20:04 -07:00
Victor Kuznetsov 29da3c52b6 Raise default SynthID-removal strength 0.05 → 0.10 (current Google SynthID) (#32)
* Raise default SynthID-removal strength 0.05 -> 0.10 (current Google SynthID)

The old default (0.04/0.05) no longer removes the CURRENT Google SynthID (Nano
Banana / Gemini 3): verified 2026-05-30 via the Gemini 'Verify with SynthID'
oracle on a real image -- 0.05 still detected, 0.10 not detected (OpenAI's was
already cleared at 0.05). Add DEFAULT_STRENGTH=0.10 in watermark_profiles, route
the engine + CLI defaults to it. At 0.10 small text deforms more, which is why
text protection (_run_region_hires) runs by default. CLAUDE.md SynthID note
corrected. CAVEAT: n=1 Google + n=1 OpenAI; broad corpus oracle validation
pending (task tracked).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Drop unused LOW/MEDIUM/HIGH strength profiles; CLI --strength defaults to DEFAULT_STRENGTH

The fixed strength presets (and get_recommended_strength) were dead -- nothing in
the pipeline used them, only tests. One knob now: DEFAULT_STRENGTH (0.10),
overridable per-call via the CLI --strength flag, which now defaults to that
constant (single source of truth). Removed the WatermarkRemover.LOW/MEDIUM/HIGH
class attrs and the get_recommended_strength tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 13:15:58 -07:00
Victor Kuznetsov e4f558dccf Add per-region high-resolution text protection (regenerate crisp, scrub everywhere) (#31)
Replace the default text-protection path. Differential Diffusion froze text in
latent space, which left SynthID intact inside text (violating remove-everywhere)
and still softened sub-8px strokes (VAE latent limit). _run_region_hires instead
scrubs the whole image, then re-scrubs each detected text block at high resolution
and feather-composites it back: every pixel is regenerated (watermark removed
everywhere) while small text stays crisp (high-res strokes span >1 latent cell).

merge_text_regions + feather_paste are pure and unit-tested; each re-scrubbed
patch is phase-correlated back to the original crop to null the ~1-2px round-trip
offset. Synthetic 18px multilingual text: text-region SSIM 0.28 -> 0.48, visually
garbled -> readable across Latin/Cyrillic/CJK. Legacy _run_differential /
build_change_map remain but are no longer the default. Prod use still requires
confirming via the SynthID oracle that re-scrubbed text zones read watermark-free.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 12:59:29 -07:00
Victor Kuznetsov c928ee6e42 chore(release): v0.7.0
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
v0.7.0
2026-05-30 12:35:32 -07:00
Victor Kuznetsov 89f427852f Fix #30 white box: stop zeroing alpha in the watermark region on save
On RGBA inputs the CLI forced the watermark bbox alpha to 0 on save, so the
removed-sparkle area became a transparent hole that renders as a solid white
box on any non-transparent viewer. The Gemini app exports opaque RGBA, so
every user hit it. Reverse-alpha already recovers the real pixels there (and
`erase` inpaints them), so there is no artifact to hide -- the hole was the
bug, introduced as an over-correction in d091b9f.

`_write_bgr_with_alpha` now rejoins the input alpha plane unchanged (drops the
`clear_region`/`pad` params); the `visible` / `erase` / `all` / `batch` call
sites drop the cleared-region argument and the orphaned region bookkeeping.
The registry `remove()` still returns the mark bbox (used for inpaint_residual
positioning); the CLI just no longer clears alpha with it.

Inverts the test that locked in the old behavior into a #30 regression guard
(watermark-region alpha stays opaque, no pixel forced transparent). Verified
end-to-end on a real Gemini RGBA export: sparkle gone, zero transparent
pixels, clean over a white background.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 12:27:37 -07:00
Victor Kuznetsov 25a1acc53b Detect TC260 AIGC label in JPEG EXIF and late/attribute PNG XMP
A corpus audit surfaced China TC260 AIGC-labeled images that `identify`
missed. Three detection gaps in `aigc_label`, all fixed:

- raw-JSON `{"AIGC":{...}}` in JPEG EXIF (UserComment): brace-matched from
  the scan head with `json.raw_decode`, gated on a TC260 field like the
  PNG-chunk path. (Doubao-class output via that export surface.)
- XMP attribute form `TC260:AIGC="{...}"` (PicWish): folded into the
  element regex as a second alternation.
- TC260 XMP packet appended after a large `IDAT`, past the 1 MB scan
  window: `scan_head` now appends late PNG metadata chunks via
  `_png_late_metadata`, mirroring the existing ISOBMFF late-box scan.

Adds `scripts/corpus_gap_scan.py`: runs `identify` over a corpus, writes
the per-file report CSV, and flags `unknown` files that carry a known
marker in their metadata region (the audit that found these gaps).
Scanning only the metadata region — not the whole file — avoids the
random short-token collisions inside compressed PNG/JPEG streams.

On the local corpus this lifts 3 files from `unknown` to AI (China AIGC)
and leaves zero false gap candidates. Synthetic piexif/PngInfo fixtures
cover all three forms.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 11:44:53 -07:00
Victor Kuznetsov 58bdf51c59 Visible-watermark registry: reverse-alpha-only Doubao + Gemini, exact native recovery (#28)
* fix(trustmark): gate detection on re-encode durability to kill false positives

TrustMark's wm_present flag is a BCH validity check that spuriously
validates on a content-correlated fraction of un-watermarked images
(AI textures trip it more than camera photos). On a 1343-image set all
20 raw detections were false, several on Gemini/OpenAI/Doubao output that
cannot carry Adobe's watermark, with random-bytes secrets.

A genuine TrustMark is a durable soft binding that survives re-encoding,
so detect_trustmark now re-decodes after a mild JPEG round-trip and
requires the same schema both times. Every observed false positive
collapsed under this gate; the second decode runs only on the rare hit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(identify): Samsung Galaxy AI, FLUX, ByteDance C2PA; fix C2PA substring FP

Detection extensions verified on real signed files (2026-05-29):

- Samsung Galaxy AI: signer attribution via a new _SIGNER_C2PA_PLATFORM
  (Samsung Galaxy / ASUS Gallery) kept separate from the capture-camera
  _DEVICE_C2PA_PLATFORM so a Galaxy AI edit (device cert + AI source type)
  does not trip the camera-vs-AI integrity clash. Plus metadata.samsung_genai:
  the proprietary genAIType marker in PhotoEditor_Re_Edit_Data, a medium-
  confidence AI-editing signal (samsung_only branch).
- Black Forest Labs (FLUX) and ByteDance Volcano Engine (Doubao/Jimeng)
  added as C2PA issuers + issuer->platform mappings.
- fix: C2PA presence required only the bare 4-byte 'c2pa' substring, which
  false-positives on compressed pixel data (a recompressed PNG IDAT re-flagged
  C2PA after its manifest was correctly stripped). New c2pa_marker_in() requires
  the JUMBF wrapper (jumb+c2pa) or the C2PA uuid box; applied in identify +
  metadata. Verified: all 535 real C2PA files carry jumb.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(doubao): gate detection on text structure to cut ~95% of false positives (#23)

Coverage alone over-fired: any textured bottom-right corner cleared the
threshold, so the detector false-positived on ~28% of arbitrary images.
The real '豆包AI生成' mark is six glyphs in one row, so detect now also
requires the text-structure signature (_glyph_structure): many connected
components, no single dominant blob, concentration in a thin horizontal
band. False positives dropped 343 -> 17 across the corpus while keeping
real-mark recall and the doubao-1.png sample. Also accept a no-op force
kwarg for remover-interface symmetry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(samsung): add Samsung Galaxy AI visible-badge remover

New samsung_engine.py removes the bottom-left sparkle + localized
'AI-generated content' badge that Galaxy AI tools stamp. Mirrors the
Doubao locate->mask->inpaint pattern but bottom-left, with a dual-polarity
top-hat mask (the badge is light-on-dark or dark-on-light). Detection gates
on a band + left-anchor signature (the Doubao CJK-component gate does not
transfer: Latin badge letters connect into few blobs). Explicit-only --
tuned on few real badges with a ~4% FP floor, so it is not used in auto.
Synthetic byte-blob fixtures (real badges are user content, not shipped).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(visible): unified known-watermark registry + LaMa inpaint backend

watermark_registry.py is a single catalog of known visible marks, each
tying {usual location, in_auto flag, recovery strategy, detect adapter,
remove adapter}: gemini (reverse-alpha, exact), doubao, samsung. cmd_visible
is now registry-driven (best_auto_mark for --mark auto; mark_keys() feeds the
CLI choices) -- the per-mark _run_doubao/_run_samsung helper branches are gone.

Cross-engine confidences are not comparable, so the gemini adapter applies the
corpus-validated 0.5 sparkle threshold for auto arbitration (its engine flag is
loose and weakly fired ~0.36 on Doubao text, hijacking auto).

--backend auto|cv2|lama chooses background reconstruction for the mask-based
marks; auto = LaMa when onnxruntime is present, else cv2. For LaMa the mask is
the FILLED glyph bounding box (sparse glyph masks leave anti-aliased edges
behind). cv2 stays the zero-dependency fallback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: watermark registry, Samsung/FLUX/ByteDance detection, LaMa backend, trustmark gate

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(doubao): exact reverse-alpha removal from captured alpha map

The Doubao '豆包AI生成' mark is a fixed semi-transparent white overlay, so
given its alpha map the original pixels are recovered exactly:
original = (wm - a*logo)/(1-a) -- no inpaint hallucination.

The alpha map + logo colour were solved from real black+gray Doubao captures
on a controlled background: on black captured = a*logo, and the black/gray pair
solves a per-pixel without assuming the logo colour (a_max~0.65, logo near-white);
the white capture cross-validates (mark vanishes to a flat fill). Bundled as
assets/doubao_alpha.png + geometry constants.

remove_watermark_reverse_alpha applies it scaled to image width; exact at the
captured width, so the registry routes doubao through it only when
reverse_alpha_available (width within the calibrated band) and the mark is
detected, falling back to mask inpaint (cv2/LaMa) otherwise. A light residual
inpaint cleans the sub-pixel rescaling error. Add captures at more resolutions
to widen exact coverage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(visible): reverse-alpha only -- drop inpaint removal + heuristic detection

Per the principle that we only remove/detect what we can do exactly, the
visible-mark path is now reverse-alpha only:

- Doubao detect is reverse-alpha-consistent: match the bundled alpha glyph
  silhouette against the corner via TM_CCOEFF_NORMED (DETECT_NCC_THRESHOLD 0.4)
  -- keys on the '豆包AI生成' SHAPE, not coverage/structure heuristics. FP
  7/1243 (0.6%). Removes the cv2 inpaint path + the _glyph_structure gate.
- Registry is reverse-alpha only: dropped the cv2/LaMa backend (_glyph_remove,
  _lama_box_inpaint, default_backend, --backend) and the Samsung entry. Doubao
  outside the alpha resolution band is skipped, never inpainted.
- Removed samsung_engine.py + tests + --mark samsung (no alpha map captured;
  Samsung C2PA/genAIType metadata detection in identify is unaffected).
- The universal erase --region (cv2/LaMa) is unchanged -- arbitrary-region
  inpainting stays a user-directed tool, separate from the known-mark registry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(doubao): NCC sub-pixel alignment -> reverse-alpha at any resolution

A pure width-scale of the captured alpha map is only sub-pixel-accurate at the
captured width and leaves a faint ghost elsewhere. remove_watermark_reverse_alpha
now registers the alpha glyph to the actual mark via a TM_CCOEFF_NORMED
scale+position search (_aligned_alpha_map) before inverting the blend, so the
single 2048 capture works at any resolution -- verified clean on the 1773x2364
(3:4) corpus size, the biggest coverage gap (23 files).

reverse_alpha_available is now just 'asset present' (no width band); the registry
still gates removal on detect so a clean corner is never touched. Drops the
_ALPHA_WIDTH_TOLERANCE gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(doubao): keep native recovery exact -- fixed geometry at captured width

Integer-pixel NCC alignment landed ~1px off at the captured width, degrading the
otherwise-exact native reverse-alpha (synthetic recovery error 0.94 -> 1.39).
remove_watermark_reverse_alpha now uses exact width-relative geometry within
_ALPHA_NATIVE_BAND of the captured width and the NCC search only off it -- best
of both: native back to 0.94, other resolutions still aligned.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(doubao): harden alignment -- try fixed+aligned, keep least residual (56/56)

On a faint/busy-background mark the NCC alignment peak can wander a few px off
the true mark and leave a residual (2/56 real corpus files). Off the captured
width, remove_watermark_reverse_alpha now builds BOTH the fixed-geometry and the
NCC-aligned alpha map, applies each, and keeps whichever leaves the least
residual mark (re-detect confidence on the bare reverse-alpha) -- geometry wins
on faint marks, alignment on clear ones, no magic threshold. Real-file round-trip
now removes 56/56 detected Doubao clean across every corpus resolution (was 54).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* perf(doubao): skip residual inpaint at native width for exact recovery

At the captured width the fixed-geometry reverse-alpha is pixel-exact, so
inpainting over it only replaced exactly-recovered interior pixels with a
cv2 hallucination -- measured worse on a textured background (native error
vs true bg 1.6 reverse-alpha-only vs 2.6 with the old always-on
full-footprint inpaint). Native now returns the bare recovery untouched;
off-native, where NCC alignment is only sub-pixel-approximate, the footprint
inpaint stays to clean the seam. Real round-trip still 56/56 across all
corpus resolutions; negatives 0/60, Gemini unaffected.

Add test_native_returns_exact_reverse_alpha_no_inpaint as the regression
guard. Sync CLAUDE.md + README (the table cell and prose described the
pre-NCC "skipped off native / cv2-LaMa" behavior, now stale). Gitignore the
session scheduled_tasks.lock, and add the text-protection research note.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 19:49:09 -07:00
Victor Kuznetsov ef6fdaeeec Detect text at native resolution (capped), fixing small-text recall on large images (#27)
The text-protection detector scaled every image to a fixed 736 px long side, so
small text on large canvases (e.g. ~16 px on 2048) was downscaled below the
detector and missed -> deformed by the SDXL pass (issue #14). Detect at the
native long side capped at 1536, never upscaled (_detection_input_size, a pure
unit-tested helper). Detection is script-agnostic (DB segments regions, not
characters), so this is language-agnostic: a new benchmark
(scripts/text_detection_benchmark.py) measures recall across Latin/Cyrillic/CJK/
Hangul/Arabic/digits x sizes x canvas -> overall hit-rate 0.91 -> 1.00, worst
cell (2048/16 px) 0.06 -> 1.00. Docs updated.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 12:28:30 -07:00
xchacha20-poly1305 0c7ff1874e feat(device): support xpu backend (#24)
* feat(device): support xpu backend

* Fall back to CPU seed generator when device RNG unsupported (xpu)

Some torch-xpu builds have no device-side RNG, so torch.Generator(device="xpu")
raises when --seed is used. _make_seed_generator tries the device generator and
falls back to a backend-agnostic CPU generator. Adds a fallback unit test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Victor Kuznetsov <kuznetsov.va@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 11:13:23 -07:00
Victor Kuznetsov 1598c499fe Record Doubao reverse-alpha finding: needs black-background capture, not more content images (#26)
Session 2026-05-29: content-image reverse-alpha distillation fails (persistent
ghost) because the mark is never observed on a dark background (median darkest
bg over glyph pixels 58/255), so alpha is unidentifiable; no dark halo (white-
logo model is right); LaMa O is a hallucination. Gemini is clean only because
its map is the watermark on pure black (alpha=capture/255). Real unlock = black
Doubao captures (requested in #13). Shipped direction stays mask + cv2/lama.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 11:12:58 -07:00
Victor Kuznetsov a46268f6eb Add cross-platform CI test matrix + PyPI classifiers (#25)
* Add cross-platform CI test matrix, PyPI classifiers

CI: new test.yml runs lint (ubuntu) + a test matrix (ubuntu/macos/windows
x py3.10/3.12, core+dev, GPU tests skip) on push to main and PRs, closing the
gap where only the release publish.yml ran (ubuntu, no tests). Add PyPI
classifiers (OS/Python/topic). README Tests badge, CLAUDE.md CI note.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Make availability tests reflect installed deps, not assume gpu extra

The new core+dev CI matrix has no diffusers, so the invisible-engine
availability tests (asserting is_available() is True unconditionally) and the
two mocked invisible CLI tests (whose command gates on is_available before the
mock) failed. Assert availability == actual importability of torch+diffusers,
and patch the CLI availability gate so the mocked-engine tests run regardless of
the gpu extra.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 11:04:12 -07:00
Victor Kuznetsov 96b3653b9e chore(release): v0.6.12
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
v0.6.12
2026-05-28 18:54:44 -07:00
Victor Kuznetsov 9aaa53fe32 fix(metadata): preserve upload format and quality on strip
remove_ai_metadata now writes JPEG at quality 95 with 4:4:4 (no chroma
subsampling) instead of the lossy PIL defaults (q75, 4:2:0), and preserves
WebP losslessly instead of silently rewriting it as PNG.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 18:46:26 -07:00
Victor Kuznetsov 41e4365cd4 fix(identify): explain the unknown verdict inline (#22)
A bare "unknown" verdict reads as the tool being broken. Print a one-line note
right under the verdict explaining that no locally-readable AI signal was found,
that this is not the same as clean (metadata is often stripped), and that
SynthID-class pixel watermarks have no local detector. The why was previously
only in the caveats section below.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 14:16:14 -07:00
Victor Kuznetsov 888c8c2556 chore(types): clear strict-pyright debt across src (0 errors)
Make `pyright src/` strict-clean via a hybrid: pure-logic files are fully typed
(piexif gets a local typings/ stub; PIL info-dict loops guard isinstance(key, str);
progress returns Callable[..., None]; availability checks use importlib.util.find_spec
instead of unused imports), while the irreducibly-untyped cv2/torch/diffusers boundary
files carry a documented per-file `# pyright:` relax pragma (or a ctrlregen
executionEnvironment) that disables only the unknown-type rules. Public ndarray-returning
signatures on the relaxed engines are annotated NDArray[Any] so strict consumers (cli.py)
stay clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 14:00:15 -07:00
Victor Kuznetsov f326bab189 chore(release): v0.6.11
Ships the China TC260 AIGC PNG-chunk and HuggingFace hf-job-id provenance
detectors (223cbcf). Also syncs src/__init__.__version__, which had drifted to
0.6.9 (not bumped in the 0.6.10 release).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
v0.6.11
2026-05-28 12:43:52 -07:00
Victor Kuznetsov 223cbcf171 feat(metadata): detect China TC260 AIGC PNG chunk and HuggingFace hf-job-id
aigc_label now reads the TC260 label from a raw-JSON `AIGC` PNG tEXt chunk
(as Doubao/ByteDance write it, with no namespaced XMP marker) in addition to
the `<TC260:AIGC>` XMP block, via a shared _parse helper gated on a TC260 field
so a generic AIGC key cannot false-positive. New huggingface_job() reads the
hf-job-id PNG chunk; identify surfaces it as a medium-confidence hf_job signal
(parallel to the visible sparkle, never overriding a hard metadata verdict).
Both wired into has_ai_metadata/get_ai_metadata; the PNG save whitelist already
strips them on removal. Found by auditing 646 corpus originals: 28 AIGC and 3
hf-job files the library previously reported as Unknown.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 12:40:17 -07:00
Victor Kuznetsov 0eec3001bb feat(invisible): protect text automatically by default (#21)
Mirror protect_faces: protect_text defaults to True in invisible_engine and
watermark_remover, so the SDXL pipeline detects text per image and switches to
Differential Diffusion only when glyphs are found. Text-free inputs fall back to
plain img2img with no differential-pipeline load, so the autonomy is free. The
CLI now exposes a single off-switch --no-protect-text instead of the positive
flag, keeping the interface minimal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
v0.6.10
2026-05-28 12:24:09 -07:00
Victor Kuznetsov a0bf62e601 feat(invisible): preserve text/CJK via Differential Diffusion (--protect-text) (v0.6.10)
SDXL img2img regenerates every pixel, so small text and CJK glyphs deform
at the strengths that defeat SynthID (issue #21). With --protect-text a
CJK-native PP-OCRv3 detector (2.4 MB ONNX, cv2.dnn, no torch, cached on
first use) locates text regions and the pass switches to the SDXL
Differential-Diffusion community pipeline: a per-pixel change map keeps
text regions largely intact while the background is regenerated to strip
the watermark. Gated to the SDXL default model; falls back to plain
img2img with a warning when unavailable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 11:59:15 -07:00
Victor Kuznetsov 7db4e231e8 fix(deps): require transformers 5.x with stable tokenizers for SDXL load
diffusers 0.38's auto-pipeline registry imports a transformers 5.x-only
symbol, so the gpu extra needs transformers>=5. Cap tokenizers to the
stable 0.22 line so the global prerelease="allow" no longer drags in the
0.23.0rc0 whose CLIP tokenizer breaks SDXL loading.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 11:58:53 -07:00
test-user 27539c0da9 docs: tidy roadmap to open items only (drop shipped v0.6.7-0.6.9)
The Roadmap is the project TODO; shipped features (Integrity Clash, streaming-MP4
scan window, meta-box XMP blanking) no longer belong under "not yet implemented".
Removed them and kept the still-open remainder as its own item (AVIF/HEIF Exif
*item* inside the meta box). Net open TODO: SynthID v2 regression test, local
SynthID pixel detector, grow the SynthID corpus, real non-PNG C2PA fixtures,
pyright maintenance debt, meta-box Exif item, Canon/Samsung device signers,
Resemble PerTh (dead end), video pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 18:18:28 -07:00
test-user 5bfed00553 feat(metadata): blank AI-label XMP inside the HEIF/AVIF meta box (v0.6.9)
HEIF/AVIF store XMP as a meta-box `mime` item whose bytes live in mdat/idat, out
of reach of the top-level uuid/jumb box stripper. An AI-label XMP packet there
(TC260 AIGC, IPTC "Made with AI", IPTC 2025.1) was therefore left in place.

isobmff.blank_ai_xmp_packets locates each XMP packet by its <?xpacket begin ...
end?> delimiters and, if it carries an AI marker (_AI_LABEL_MARKERS), overwrites
it with spaces of the SAME length. Equal length means no box size or iloc offset
shifts -- the coded image stays bit-for-bit intact, the item stays structurally
valid, only the AI label content is destroyed. Plain (non-AI) XMP is left alone,
mirroring the top-level XMP-uuid content match. Wired into remove_ai_metadata's
ISOBMFF branch after strip_c2pa_boxes.

Chosen over exiftool (a non-bundled binary dep) to stay pure-Python and
droplet-compatible; over full iinf/iloc surgery to avoid offset-rewrite
corruption risk. The AI labels we target are all XMP, so this closes the
practical gap. An Exif *item* inside the meta box (rare) still needs iinf/iloc
surgery or exiftool -- documented.

4 new tests (TestMetaBoxXmpBlanking): AI packet blanked (same length, marker
gone, surrounding image bytes intact), plain XMP preserved, no-packet no-op, and
end-to-end remove_ai_metadata on a .heic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 18:15:48 -07:00
test-user 31f0a82906 feat(metadata): detect C2PA/AIGC/IPTC manifests after a large mdat in MP4 (v0.6.8)
Provenance detection no longer relies on a fixed first-MB read. In a streaming /
non-faststart MP4 the C2PA manifest sits AFTER a multi-megabyte mdat, beyond the
1 MB scan window, so it was missed.

- isobmff.scan_c2pa_region(path): a file-seeking top-level box walker that
  returns the payloads of uuid/jumb (provenance) boxes, seeking past mdat by
  size without reading it -- works on multi-GB files. Returns b"" for
  non-ISOBMFF or on read error. Mirrors the box-size encoding of the existing
  in-memory _iter_top_level_boxes (largesize / size==0).
- metadata.scan_head(path, size): the shared input for every C2PA/AIGC/IPTC
  byte scan -- first __TEXT	__DATA	__OBJC	others	dec	hex bytes plus, for ISOBMFF, the late provenance-box
  payloads. Behavior-neutral (f.read(size)) for non-ISOBMFF inputs.
- Routed all six metadata scan sites (has_ai_metadata, aigc_label,
  iptc_ai_system, synthid_source, exif_generator XMP, get_ai_metadata
  soft-binding) and identify's head read through scan_head.

6 new tests: late box found by scan_c2pa_region / scan_head, the fixed window
provably misses it, non-ISOBMFF -> b"", front-placed (faststart) regression.

The remaining gap stays documented: EXIF/XMP stored as items inside the meta
box (AVIF/HEIF stills) still needs meta-box surgery or exiftool.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 13:42:29 -07:00
test-user 18160fe269 feat(identify): integrity-clash detection for contradictory provenance (v0.6.7)
Surface contradictions between independent provenance signals instead of
collapsing to a single verdict -- a strong tell of spoofed, transplanted, or
laundered metadata. Inspired by arXiv:2603.02378.

Two rules in the new _integrity_clashes helper:
- Conflicting AI-origin attributions: two or more distinct AI vendors named by
  independent generator stamps (e.g. a C2PA OpenAI manifest on an image whose
  EXIF says Make="Ideogram AI").
- Camera + AI: a camera-capture C2PA device (Pixel/Leica/Sony/Nikon/Truepic)
  coexisting with an AI-generation marker -- a genuine capture is not AI.

High-precision by design: only hard generator stamps feed it (C2PA issuer when
the source is AI, SynthID proxy, EXIF/XMP generator, IPTC AISystemUsed, xAI,
AIGC). The fuzzy visible sparkle and the open invisible watermark are excluded
-- the latter can be a by-product of our own SDXL removal pass. Vendor
normalization (_vendor_of over _AI_VENDOR_TOKENS) keeps consistent signals from
clashing (C2PA "Google (Gemini)" + SynthID-Google agree); the C2PA vendor is
read from the issuer attribution, not the resolved platform, so a camera label
like "Google Pixel" cannot mis-normalize to an AI vendor.

Surfaced as ProvenanceReport.integrity_clashes (red in the table view, included
in --json). 19 new tests; all real single-origin fixtures (chatgpt/firefly/
doubao/grok/mj) verified to produce zero clashes (false-positive guard).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 13:27:25 -07:00
dependabot[bot] 6694a79514 chore(deps): bump ultralytics in the minor-and-patch group (#20)
Bumps the minor-and-patch group with 1 update: [ultralytics](https://github.com/ultralytics/ultralytics).


Updates `ultralytics` from 8.4.55 to 8.4.56
- [Release notes](https://github.com/ultralytics/ultralytics/releases)
- [Commits](https://github.com/ultralytics/ultralytics/compare/v8.4.55...v8.4.56)

---
updated-dependencies:
- dependency-name: ultralytics
  dependency-version: 8.4.56
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-27 12:04:47 -07:00
test-user 7b47fa9f6a fix(io): Unicode-safe cv2 image IO + un-eat the [gpu] install hint (v0.6.6)
Two CLI/IO robustness bugs surfaced by issues #17 and #19.

#17 -- non-ASCII image paths (Chinese/Cyrillic/accented) failed on Windows:
cv2.imread/imwrite use the platform ANSI code-page API, so the decode came back
empty with a "can't open/read file" warning. New image_io.imread/imwrite route
through np.fromfile+cv2.imdecode / cv2.imencode+tofile (Unicode-safe, byte-
identical output, cv2.imread None-semantics preserved); all 8 cv2 read/write
call sites now go through it. Behavior-neutral on macOS/Linux (already accept
UTF-8 paths), so the fix is correct-by-construction for the Windows-only bug.

#19 (incidental) -- rich parsed the "[gpu]" in the GPU-extra install hint as a
style tag and dropped it, so the printed command was the un-installable
"pip install 'remove-ai-watermarks'". Escaped as \[gpu] at both call sites.

Tests: test_image_io.py (non-ASCII round-trip, alpha, missing/empty/garbage
semantics); test_cli.py::TestGpuHintMarkup (install hint keeps the extra).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 11:52:48 -07:00
test-user d847b39292 docs: lawful-use disclaimer + primary-source Legal-table corrections
README:
- surface a lawful-use / no-liability disclaimer near the top
- reword two feature bullets away from detection-evasion framing
  ("bypass AI image classifiers" -> neutral post-processing; drop the
  platform-targeting language from the "Made with AI" bullet)
- Legal table, each corrected against the primary text:
  - CA AB 2655 was struck down on Section 230 ONLY (Kohls v. Bonta,
    E.D. Cal., Aug 2025); the court did not reach the First Amendment
    (the companion AB 2839 was separately enjoined on 1A grounds)
  - COPIED Act: add the bill number (S. 1396, 119th Cong.)
  - South Korea AI Framework Act: in force 22 January 2026 (exact date)

CLAUDE.md: sync the South Korea date to 22 January 2026.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 11:50:20 -07:00
test-user fee0e139af docs: refresh README roadmap + metadata-strip coverage for v0.6.x
- Metadata-strip feature now lists the audio/video container coverage shipped in
  v0.6.0-v0.6.4 (MP4/MOV/M4V/M4A via ISOBMFF box walker; WebM/MP3/WAV/FLAC/OGG
  losslessly via ffmpeg).
- Roadmap updated: the AVIF/HEIF item now reflects that top-level XMP/C2PA boxes
  and non-ISOBMFF audio/video are handled, with only meta-box-item EXIF/XMP left
  (needs exiftool). Added the open backlog: multi-signal "Integrity Clash"
  reporting (arXiv:2603.02378), Canon/Samsung device signers pending a real
  sample, the streaming-MP4 scan-window limit, and Resemble PerTh audio as
  evaluated-but-infeasible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 10:42:40 -07:00
test-user e1c99b5937 fix(identify): gate C2PA issuer->generator attribution on AI source type (v0.6.5)
Prevents an unmapped C2PA device whose manifest incidentally contains a mapped
issuer substring (e.g. the "Adobe XMP" toolkit string in a Canon/Sony camera
capture) from being mislabeled as that AI generator ("Adobe Firefly").
_attribute_platform now names a specific AI-generator platform only when the
digital-source-type is trainedAlgorithmicMedia; otherwise it degrades to the
neutral "C2PA signer: X" label. Real Firefly/OpenAI/Google output carries the
AI source-type and is unaffected (verified: chatgpt-1.png->OpenAI,
firefly-1.png->Adobe Firefly still attribute). Closes the only real downside of
leaving Canon/Samsung/Bria device signers unmapped: detection and removal were
already unaffected; now the platform label degrades gracefully too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.6.5
2026-05-27 10:29:12 -07:00
test-user f9cf14c372 feat(metadata): strip container metadata from WebM/MP3/WAV/FLAC/OGG via ffmpeg (v0.6.4)
remove_ai_metadata now handles non-ISOBMFF audio/video (which the box walker
can't reach) by shelling out to ffmpeg with a lossless stream copy
(`-map_metadata -1 -map_chapters -1 -c copy`): codec data is untouched, only
container tags/chapters (ID3 / RIFF / Vorbis comments / EBML tags) are dropped.
Requires ffmpeg on PATH; raises a clear RuntimeError if absent or if ffmpeg
can't parse the input (instead of crashing in the image path).

Verified end-to-end: a real ffmpeg-made WAV/MP3 with a "Suno AI" title tag ->
tag gone, audio bytes preserved.

NOT built (evaluated, deliberate): Resemble PerTh audio *detection* --
`get_watermark()` returns a raw bit array with no presence/confidence flag, so
reliably telling watermarked from clean needs Resemble's fixed payload or a
confidence API (neither public; no real sample to calibrate). Same wall as the
SynthID pixel detector. AVIF/HEIF meta-box EXIF/XMP stripping also stays a gap
(needs exiftool, a non-installed binary). Both documented in CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.6.4
2026-05-26 21:39:42 -07:00
test-user bc3228d387 feat(visible): Doubao text-mark removal + universal region eraser
Add deterministic, CPU-only removal of the visible Doubao "豆包AI生成" mark and
a position-agnostic region eraser for any other visible watermark/logo.

- doubao_engine.py: locate (geometry, scales with width) + polarity-aware
  white-top-hat glyph mask + cv2 inpaint; coverage-gated detection and a
  dense-text safety guard. No GPU, ~30ms.
- region_eraser.py + `erase` command: inpaint arbitrary --region box(es).
  Default cv2 backend (no deps); optional big-LaMa via onnxruntime (`lama`
  extra, Carve/LaMa-ONNX, model downloaded on first use, never bundled).
- cli `visible --mark auto|gemini|doubao`: auto routes by detector confidence.
- tests for both engines; seed previously-unseeded CLI image fixtures to stop
  the Doubao detector flaking on random corners.
- .gitignore: doubao_capture/{seeds,captures} scratch (alpha-map calibration).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 21:31:51 -07:00
test-user 9f93d9c0c5 feat(identify): add Sony C2PA device attribution, verified (v0.6.3)
Adds Sony to _DEVICE_C2PA_PLATFORM, matching Sony's own `sony.sig` / `sony.cert`
C2PA assertion namespace (NOT bare "Sony", which is a common EXIF Make). Verified
against a real Sony-signed file (Sony PXW-Z300, signer "Sony Corporation") found
in the Security4Media/c2pa-video-player repo. The sample is video (MP4) -- our
ISOBMFF C2PA path detects it; Sony Alpha stills likely share the namespace.

Verified device set is now Leica, Nikon, Google Pixel, Sony, Truepic. Canon /
Samsung / Bria still have no public direct-download C2PA sample to verify.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.6.3
2026-05-26 21:13:49 -07:00
test-user 64be9598f2 fix(identify): device-token-first C2PA attribution; add verified Pixel (v0.6.2)
Replaces the claim-generator-string match with a distinctive device-token scan
of the manifest bytes (_device_platform / _DEVICE_C2PA_PLATFORM), which is more
robust: it catches devices where the generator name lives under a non-standard
CBOR key (Pixel uses `claim_generator_info`, so it has no `claim_generator`).

- Adds Google Pixel, verified against a real Pixel 10 Pro C2PA file (attached to
  c2pa-rs issue #1609/#1554): cert CN "Pixel Camera", digitalSourceType
  `computationalCapture` -> capture authenticity, not AI (is_ai stays None).
- Token distinctiveness is load-bearing: bare "Truepic" matched the OpenAI
  chatgpt-1.png fixture (Truepic is a trust-chain signing authority), so the
  token is the specific "Truepic_Lens"; "Pixel Camera" (cert CN) not "Pixel".
- Verified Leica/Nikon/Truepic/Pixel attribute correctly and OpenAI/Adobe/MJ
  do not regress. Sony/Canon/Samsung/Bria stay unmapped: no public direct-
  download C2PA sample exists to verify their in-manifest string.
- Regression tests: device token beats incidental issuer mentions (Leica,
  Pixel-vs-Google).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.6.2
2026-05-26 20:43:40 -07:00
test-user dda2ee7fbb fix(identify): attribute C2PA by claim_generator, not incidental issuer tokens (v0.6.1)
Verified on real signed files that the issuer byte-scan mis-attributes
multi-entity manifests: Leica read as "Truepic" (timestamp authority in the
chain), Nikon as "Adobe Firefly" (XMP-toolkit "Adobe" + the sample's
"Adobe_MAX" name), Truepic as "Google". Platform attribution now prefers the
claim generator (what produced the asset) and falls back to the issuer scan.

- New _CLAIM_GENERATOR_PLATFORM map + _platform_from_generator; claim generator
  read for non-PNG via the now-public c2pa.cbor_text_after.
- Device tokens listed only where verified against a real C2PA file (Leica
  lc_c2pa, Nikon, Truepic Lens); Pixel/Samsung/Sony/Canon/Bria deferred until a
  real sample confirms the in-manifest string. Camera C2PA marks capture
  authenticity, so these never set is_ai.
- cbor_text_after made public (was _cbor_text_after); call sites + tests updated.
- Regression test: claim_generator beats incidental Adobe/Google/Truepic tokens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.6.1
2026-05-26 20:10:07 -07:00
test-user 2676325184 feat(c2pa): expand soft-binding vendor map with registry-verified algs
Adds Trufo, Overlai, MarkAny, Mentaport, LumaTrace, VerdaAI, ContentLens, ISCC
(io.iscc content code), and Adobe ICN fingerprint to C2PA_SOFT_BINDINGS, and
notes AIWatermark wraps Meta PixelSeal. All `alg` prefixes verified against the
official c2pa-org/softbinding-algorithm-list registry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 18:00:16 -07:00