The cli refactor dropped rich from dependencies, but four scripts still did
`from rich.console import Console` / `rich.table import Table`. Their test
modules import the scripts, so a clean `uv sync --frozen` (CI: core+dev, no
rich) failed at collection with ModuleNotFoundError on macOS/Windows/Linux.
Add a shared plain-text shim `scripts/_plain_console.py` (Console/Table via
click.echo, markup stripped) and switch all four scripts to it. Verified: all
four import with rich blocked, and tests/test_synthid_corpus.py +
tests/test_synthid_pixel_probe.py pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Visible-watermark work across all three corner-mark engines plus a committed,
reproducible alpha-build pipeline (scripts/visible_alpha_solve.py) fed by committed
solid black/gray/white captures.
- jimeng: new "即梦AI" wordmark remover (reverse-alpha + thin residual inpaint,
always NCC-aligned -- the mark re-rasterizes/jitters per image). Detect via glyph
silhouette NCC (0.45 threshold; does not cross-fire with Doubao). Registered in the
visible-mark catalog; `visible --mark jimeng` / `--mark auto`.
- doubao: fix a real production defect -- the shipped remover left a READABLE
"豆包AI生成" outline on real samples while detect() returned conf 0.0 (fooled by a
thin outline), so the test passed and the "56/56 clean" claim was detector-measured,
not visual. Root cause: under-estimated alpha + fixed-geometry-no-inpaint + tight
locate box. Rebuilt alpha (careful gray-self solve), always-align, thin inpaint,
widened locate box -> readable outline becomes faint texture-level traces.
- gemini: rebuild gemini_bg_{96,48} from our own controlled captures (validated NCC
0.9998 vs the prior third-party asset); removal re-verified clean, no behaviour change.
- tests: add textured-shift regression to both engines (guards the align-on-shift path
the Doubao defect exposed; lesson: a detector-only removal test is insufficient,
assert visual residual).
- docs: CLAUDE.md, README, capture READMEs and docstrings synced; stale
"exact/pixel-exact/56-clean" claims removed.
Also includes a SynthID label-wording clarification in identify.py/cli.py
("SynthID pixel watermark" -> "SynthID watermark, inferred from C2PA metadata").
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A corpus audit surfaced China TC260 AIGC-labeled images that `identify`
missed. Three detection gaps in `aigc_label`, all fixed:
- raw-JSON `{"AIGC":{...}}` in JPEG EXIF (UserComment): brace-matched from
the scan head with `json.raw_decode`, gated on a TC260 field like the
PNG-chunk path. (Doubao-class output via that export surface.)
- XMP attribute form `TC260:AIGC="{...}"` (PicWish): folded into the
element regex as a second alternation.
- TC260 XMP packet appended after a large `IDAT`, past the 1 MB scan
window: `scan_head` now appends late PNG metadata chunks via
`_png_late_metadata`, mirroring the existing ISOBMFF late-box scan.
Adds `scripts/corpus_gap_scan.py`: runs `identify` over a corpus, writes
the per-file report CSV, and flags `unknown` files that carry a known
marker in their metadata region (the audit that found these gaps).
Scanning only the metadata region — not the whole file — avoids the
random short-token collisions inside compressed PNG/JPEG streams.
On the local corpus this lifts 3 files from `unknown` to AI (China AIGC)
and leaves zero false gap candidates. Synthetic piexif/PngInfo fixtures
cover all three forms.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The text-protection detector scaled every image to a fixed 736 px long side, so
small text on large canvases (e.g. ~16 px on 2048) was downscaled below the
detector and missed -> deformed by the SDXL pass (issue #14). Detect at the
native long side capped at 1536, never upscaled (_detection_input_size, a pure
unit-tested helper). Detection is script-agnostic (DB segments regions, not
characters), so this is language-agnostic: a new benchmark
(scripts/text_detection_benchmark.py) measures recall across Latin/Cyrillic/CJK/
Hangul/Arabic/digits x sizes x canvas -> overall hit-rate 0.91 -> 1.00, worst
cell (2048/16 px) 0.06 -> 1.00. Docs updated.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
scripts/synthid_pixel_probe.py is an experimental/diagnostic tool for the
one pixel-domain question that isn't a dead-end: on solid-color fills the
zero-mean residual IS essentially the watermark carrier. Two modes:
'consistency' (mean pairwise NCC of carriers across fills vs random
baseline) and 'removal' (does the pipeline drop the carrier toward
baseline?). Logic validated synthetically (injected carrier correlates,
random noise doesn't, simulated removal collapses it) -- no real fills or
GPU needed.
Running its metric on the corpus independently re-confirms the documented
dead-end for real content: at matched resolution SynthID positives do not
cluster apart from negatives (within-Gemini 0.07; at 1024 px pos-vs-neg
>= pos-vs-pos). An apparent 0.62 among 1254px ChatGPT positives turned out
to be near-duplicate content (5 renders of one prompt at ~0.92; a distinct
ChatGPT image scored ~0 against them), not a shared carrier. The probe is
solid-fills-only; do not use on real content.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PIL cannot open iPhone HEIC without pillow-heif, so width/height stayed
0 for those negatives. Fall back to sips -g pixelWidth/pixelHeight on
macOS when PIL fails; returns (0,0) elsewhere.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Detect SynthID-bearing images via their C2PA companion: a manifest signed by a
SynthID-using vendor (Google/OpenAI) on AI-generated content implies an
invisible SynthID pixel watermark. Verified end-to-end against the vendor
oracles (openai.com/verify, Gemini "Verify with SynthID").
- metadata: synthid_source() + synthid_watermark verdict in get_ai_metadata,
surfaced as a `metadata --check` callout. Format-agnostic (PNG caBX parser +
JPEG/WebP/AVIF/HEIF/JXL binary scan).
- constants: SYNTHID_C2PA_ISSUERS {Google, OpenAI}; +opened/placed actions.
- c2pa: single CBOR-aware parser (_cbor_text_after) replaces glitchy regex
(fixes fGPT-4o claim_generator); removed duplicate _scan_png_c2pa_chunk from
metadata; shared synthid_verdict / synthid_vendors_in helpers.
- corpus: scripts/synthid_corpus.py ingest tool + data/synthid_corpus/
(manifest tracked, images gitignored) for a labeled reference set.
- tests: +38 across C2PA parser internals, extract/inject round-trip, ISOBMFF
container stripping, all IPTC AI markers, and invisible watermark strength
tiers (SynthID/StableSignature/TreeRing/StegaStamp/RingID/RivaGAN/...).
Pixel-level SynthID detection remains out of reach locally (Google's decoder is
proprietary); a from-scratch spectral pilot confirmed it does not separate real
content. See CLAUDE.md for the full evaluation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>