Bright-background photos/renders and a tiny app icon were flagged as
AI-generated by the visible detectors. Two failure modes:
- Gemini sparkle on a bright background (snow+sky photo, white product
render) scored ~0.51. The FP gate only demoted on a low core-ring
brightness margin, which a bright background makes high. Add a gradient
floor (_SPARKLE_FP_GRAD 0.55): a real sparkle is a crisp star (grad
~0.97-1.0), a smooth luminance blob that NCC-matches the diamond is not
(the two FPs measured grad 0.105 / 0.463). Demoting on low margin OR low
gradient is a strict superset of the old margin-only demotion, so it cannot
regress dark/mid (kept by margin) or white-bg (kept by confidence) real sparkles.
- A 48x48 geometric icon matched the Doubao/Jimeng CJK silhouette at
0.41/0.47 NCC. Purely a small-size artifact (the same icon at >=256px
collapses to ~0.06-0.10). Guard text-mark detection below a 200px short
side (_MIN_DETECT_SHORT_SIDE); real marks ship on full-resolution
renders (smallest captured sample 1086px).
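A minimal sketch of the two guards described above, not the shipped code. Only the 0.55 gradient floor and the 200px short-side floor come from this message; `MARGIN_FLOOR` and the function names are hypothetical, and the confidence-based keep for white-background sparkles is left out.

```python
# Values taken from the message: gradient floor 0.55, short-side floor 200px.
SPARKLE_FP_GRAD = 0.55
MIN_DETECT_SHORT_SIDE = 200
# Assumption: the real margin threshold is not stated; 10.0 is a placeholder.
MARGIN_FLOOR = 10.0


def demote_sparkle(margin: float, grad: float) -> bool:
    """Return True when a sparkle match should be demoted as a false positive."""
    low_margin = margin < MARGIN_FLOOR      # the old, margin-only signal
    low_gradient = grad < SPARKLE_FP_GRAD   # the new gradient floor
    # OR-ing the two keeps every old demotion and adds the low-gradient ones.
    return low_margin or low_gradient


def text_mark_detection_allowed(width: int, height: int) -> bool:
    """Skip text-mark detection on images whose short side is under the floor."""
    return min(width, height) >= MIN_DETECT_SHORT_SIDE
```

With a high margin (bright background), the two measured false positives (grad 0.105 and 0.463) are demoted while a crisp sparkle at grad 0.97 is kept; a 48x48 icon never reaches the text-mark detectors.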
Corpus re-sweep flips only OpenAI content and already-cleaned outputs,
all sub-0.5, so no provenance verdict changes. Add synthetic regression
fixtures for both modes; docs/module-internals.md updated.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Mined from the retained corpus 2026-06-22 (open-world EXIF/PNG-text/XMP scan,
minus the registry): three AI image generators that stamp a plain generator
name and no C2PA, so identify read them as no-signal, and the P0#5
no-signal skip would then have skipped the scrub.
- NovelAI (anime SD): PNG tEXt Software/Source/Title. exif_generator now reads
PNG text chunks (via img.info), not only EXIF/XMP.
- Reve (reve.com): EXIF Software / XMP CreatorTool. Token is the full
"reve.com", not bare "reve" (would false-fire on "forever"/"reverie").
- Aphrodite AI: EXIF Make / Software.
Detection/removal parity: NovelAI stamps an AI-shaped VALUE under a non-AI KEY
(Title/Source), which _is_ai_key alone keeps. New _is_ai_value drops a text
chunk by value-token match on removal, mirroring exif_generator; otherwise the
cleaned file still reads as NovelAI (verified on a real corpus file).
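A rough sketch of the value-token idea, under assumptions: the token list, the key set, and the sample chunk values below are illustrative, and the function names only mirror `_is_ai_key` / `_is_ai_value` loosely. It shows why the token is the full "reve.com" and why a chunk must also be dropped by value.

```python
# Illustrative tokens only; the real registry is not shown in this message.
AI_VALUE_TOKENS = ("novelai", "reve.com")
# Hypothetical set of keys that are AI-specific by name.
AI_KEYS = frozenset({"parameters"})


def is_ai_key(key: str) -> bool:
    return key.lower() in AI_KEYS


def is_ai_value(value: str, tokens=AI_VALUE_TOKENS) -> bool:
    """Case-insensitive substring match of a generator token in a value."""
    lowered = value.lower()
    return any(token in lowered for token in tokens)


def scrub_text_chunks(chunks: dict) -> dict:
    """Drop a chunk when either its key or its value looks AI-stamped."""
    return {
        key: value
        for key, value in chunks.items()
        if not (is_ai_key(key) or is_ai_value(value))
    }


# Hypothetical PNG text chunks: AI-shaped values under non-AI keys.
sample = {"Title": "NovelAI image", "Software": "NovelAI", "Author": "forever young"}
cleaned = scrub_text_chunks(sample)
```

A bare "reve" token would match "forever" and "reverie" as substrings; the full "reve.com" does not.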
Tests: TestExifGenerator gains NovelAI PNG-text, Reve, Reve-not-overmatched,
Aphrodite, and a NovelAI detect/remove parity regression. Docs synced
(module-internals, watermarking-landscape, CLAUDE.md).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Regenerating pixels removes SynthID / open watermarks but degrades a real
photo, so running it on a clean image is the dominant cause of paid score-0
results on no-watermark uploads. Gate invisible/all/batch on identify.has_invisible_target:
when no invisible AI signal is locally detectable and --force is unset, skip the
regeneration. Per-command semantics:
- invisible: write no output, exit EXIT_NO_INVISIBLE_SIGNAL (2)
- all: skip step 2 but keep visible-removed pixels + strip metadata, exit 0
- batch: skip the scrub; copy the input through in invisible mode
A skip never claims the image is clean (a pixel SynthID is undetectable once its
metadata proxy is gone); the message says so and routes to --force. The gate
fails safe (a detector error runs the removal).
has_invisible_target wraps identify(check_visible=False, check_invisible=True)
and returns the new ProvenanceReport.ai_from_metadata field (the confidence==high
union), so the raiw.cc worker can reuse the same gate. Gate placed before engine
construction so the skip path is cheap; shared via cli._should_skip_invisible_scrub.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- watermark_remover: _build_qwen_kwargs now passes explicit height/width (via
_qwen_target_size, floored to /16). Without it QwenImageImg2ImgPipeline defaults to
1024x1024 and silently squishes non-square inputs, distorting the scene and garbling text.
- watermark_profiles: resolve_strength gains a `pipeline` arg + a Qwen strength ladder
(_QWEN_VENDOR_STRENGTH, Gemini 0.25), so `--pipeline qwen` gets its certified floor
automatically; retires the manual "pass --strength 0.25 for Gemini on qwen" workaround.
- fidelity_metrics: replace per-face nearest matching (collided on multi-face images when a
variant dropped a face, corrupting the identity metric) with a collision-free one-to-one
assignment (assign_faces_one_to_one). lapvar/LPIPS were always bbox-anchored and immune.
Regression-guarded by tests/test_fidelity_matching.py.
- docs: record the measured outcomes of the qwen-improvement arc. The Qwen ControlNet
face-fix is CLOSED (no permissive Qwen detail/tile ControlNet exists; canny carries edges,
not skin grain). The `--pipeline auto` router + faces+text mixed dual-pass were prototyped
and DROPPED (controlnet wins faces AND display text: abba CER 0.114 vs qwen 0.379).
Z-Image-Turbo was tried and dropped (same regeneration limits). qwen stays a manual opt-in;
controlnet is the default for everything.
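To illustrate the face-matching fix above: a small sketch contrasting per-face nearest matching with a collision-free assignment. The greedy strategy and the similarity numbers are assumptions for illustration; the actual `assign_faces_one_to_one` may use a different method (for example an optimal assignment).

```python
def nearest_match(sim):
    """Old behaviour: each reference face picks its best output face."""
    return {i: max(range(len(row)), key=row.__getitem__) for i, row in enumerate(sim)}


def assign_one_to_one(sim, min_sim=0.0):
    """Greedy one-to-one matching: best pairs first, no face used twice."""
    pairs = sorted(
        ((s, i, j) for i, row in enumerate(sim) for j, s in enumerate(row)),
        reverse=True,
    )
    used_ref, used_out, result = set(), set(), {}
    for s, i, j in pairs:
        if s < min_sim:
            break  # remaining pairs are all weaker than the floor
        if i in used_ref or j in used_out:
            continue
        result[i] = j
        used_ref.add(i)
        used_out.add(j)
    return result


# Hypothetical similarities: 3 reference faces, but the variant kept only 2.
sim = [
    [0.9, 0.2],
    [0.6, 0.3],
    [0.1, 0.8],
]
collided = nearest_match(sim)       # references 0 and 1 both claim output 0
assigned = assign_one_to_one(sim)   # reference 1 is left unmatched
```

Leaving the dropped face unmatched, rather than scoring it against another person's face, is what keeps the identity metric from being corrupted.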
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Update CLAUDE.md and docs/module-internals.md for: ProvenanceReport.ai_source_kind
(generated vs enhanced) and the shared GEMINI_SPARKLE_TRUST_CONF; the text-mark
over-subtraction guard; noai/tiling.feather_region_composite + the region-targeted
WatermarkRemover.remove_watermark(region=) path; the new C2PA vendor rows (Volcano
Engine Chinese legal name, ElevenLabs) and the documented TikTok/PixelBin
exclusion. Record the rejected gemini-gate-lowering experiment.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Oracle seed-repeat + floor refinement (2026-06-20, data/qwen_in):
- OpenAI floor 0.10 is SEED-ROBUST: 0.05 and 0.075 still detected; 0.10 clean on
seeds 0-4 (5/5) -> a random seed is safe.
- Gemini floor lowered 0.30 -> 0.25 (0.20 still detected, 0.25 clean on both
images). Single-seed only (seed 0): the Gemini oracle rate-limits high-volume
seed-repeat runs, so pin a seed in prod rather than relying on seed-robustness there.
Re-measured fidelity at the certified floors (controlnet 0.15 vs Qwen 0.25 for
Gemini): faces still favor controlnet (ArcFace 0.546 vs 0.382, lapvar 0.62 vs
0.40); the short-CJK text case is now a TIE (gemini_1 0.037 vs 0.037 -- the earlier
Qwen 0.000 was at 0.30, not the floor). Qwen's text win holds on substantial
Latin/mixed text (OpenAI 0.385 vs 0.241 / 0.341 vs 0.290). Update watermark_profiles
comment, CLAUDE.md, module-internals, known-limitations.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The face fidelity numbers cited an equal-strength compare (both 0.15), but Qwen at
0.15 does NOT clear Gemini SynthID -- so that output is un-scrubbed and the compare
is invalid. Per the methodology rule (compare fidelity only between outputs where
SynthID is removed in BOTH), restate faces at each pipeline's scrub floor
(controlnet 0.15 / Qwen 0.30): ArcFace identity 0.546 vs 0.331, lapvar 0.62 vs 0.40,
face LPIPS 0.09 vs 0.19 -- controlnet still wins faces, conclusion unchanged. Drop
the "equal strength" framing in CLAUDE.md / module-internals / known-limitations.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add scripts/fidelity_metrics.py: an objective eval harness comparing
watermark-removal outputs against the original (reference) across four groups
-- OCR character error rate (EasyOCR), ArcFace identity cosine (insightface),
face texture (LPIPS + Laplacian-variance ratio), and whole-image
LPIPS/SSIM/PSNR. It uses PEP 723 inline deps so it stays out of the package / uv.lock; metrics
self-gate (faces only where faces, text only where text).
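For reference, the OCR character error rate is conventionally the edit distance between the two strings divided by the reference length. The sketch below is a plain-Python version of that definition; how scripts/fidelity_metrics.py normalizes or tokenizes text is not stated here, so treat the details as assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of OCR output against the reference text."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)
```

Lower is better: a CER of 0.25 means one character in four of the original text had to be edited to match.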
The metrics overturned an eyeball conclusion: at EQUAL strength Qwen beats
controlnet on TEXT (OpenAI typography 0.10: OCR CER 0.25 vs 0.37) but controlnet
beats Qwen on FACES (gemini_3, 18 faces, 0.15 each: Laplacian-variance retention
0.62 vs 0.41, face LPIPS 0.09 vs 0.13 -- Qwen smooths faces MORE; ArcFace
identity ~tied). So Qwen is the better TEXT-preserving remover, not a universal
fidelity win. Correct the earlier "qwen keeps faces faithful where controlnet
plasticizes" claim in CLAUDE.md, module-internals.md, known-limitations.md, README.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A third diffusion pipeline alongside sdxl/controlnet: Qwen-Image (20B MMDiT,
Apache-2.0 code AND weights) img2img. The scrub still comes from the img2img
strength; Qwen preserves text (incl. CJK) and structure markedly better than
SDXL at the scrub floor, so it over-regenerates real photos far less (directly
targets the controlnet over-regeneration that degrades real uploads).
- watermark_profiles: QWEN_MODEL_ID, normalize_profile accepts "qwen".
- WatermarkRemover: _load_qwen_pipeline (bf16, loads Qwen base unless --model
overridden, clear ImportError if diffusers lacks the class), _run_qwen (no
MPS fallback -- 20B is CUDA/cloud-class), dispatch in _generate_one/preload,
pure _build_qwen_kwargs (true_cfg_scale, not guidance_scale).
- Shared _base_load_kwargs() across all three loaders (dtype + token).
- CLI --pipeline gains "qwen"; invisible_engine threads it through.
- scripts/qwen_scrub_prototype.py: standalone PEP 723 GPU experiment.
Prototype oracle floors (Modal A100-80GB, single seed, controls SynthID-positive,
PENDING seed-repeat cert): OpenAI clears at strength ~0.10, Gemini at ~0.30 (0.20
still detected), with CJK text + faces faithful where controlnet plasticizes. The
Gemini floor is higher than the shared default ladder, so pass an explicit
--strength for Gemini on this pipeline until a Qwen-specific ladder is certified.
The model-running path is CUDA-only (untestable locally); unit tests cover the
pure call-shape (_build_qwen_kwargs) and profile normalization without torch.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a lossless alternative to the --max-resolution downscale for large
images that OOM on MPS/GPU: regenerate in overlapping, feather-blended
tiles at native resolution.
- noai/tiling.py: pure plan_tiles (uniform tiles, last flush to edge) +
feather_weights (strictly-positive separable taper -> partition-of-unity
blend) + run_tiled (per-tile generate callable, decoupled from the
pipeline). Unit-tested without the model.
- WatermarkRemover.remove_watermark: refactor _generate into _generate_one
+ a tiled branch that engages only when --tile is set and the long side
exceeds tile_size (ControlNet canny is rebuilt per tile).
- Thread tile/tile_size/tile_overlap through InvisibleEngine and the
invisible/all/batch CLI commands via a shared _tile_options decorator.
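A one-dimensional sketch of the tiling idea, assuming a fixed stride with the last tile pushed flush to the edge and a linear taper. The function names echo noai/tiling.py but the bodies are illustrative, not the module's code; the real planner works in 2D and may space tiles differently.

```python
def plan_starts(length: int, tile: int, overlap: int) -> list:
    """Tile start offsets along one axis; the last tile ends at the edge."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # flush to the edge
    return starts


def feather_weights(n: int, ramp: int) -> list:
    """Strictly positive taper: ramps up at both ends, flat in the middle."""
    return [min(i + 1, n - i, ramp) / ramp for i in range(n)]


def blend_1d(length, tile, overlap, generate):
    """Weighted average of per-tile outputs; weights are normalized per pixel."""
    n = min(tile, length)
    weights = feather_weights(n, max(overlap, 1))
    acc = [0.0] * length
    wsum = [0.0] * length
    for start in plan_starts(length, tile, overlap):
        out = generate(start, start + n)  # caller-supplied per-tile callable
        for k, (w, v) in enumerate(zip(weights, out)):
            acc[start + k] += w * v
            wsum[start + k] += w
    return [a / w for a, w in zip(acc, wsum)]
```

Because every weight is positive and the sum is normalized per pixel, a tile generator that returns a constant yields the same constant after blending, which is the partition-of-unity property the message refers to.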
Verified end-to-end on the real SDXL pipeline (forced 2x2 tiling on a
1024px sample, MPS): non-degenerate output, no gross seam at tile borders.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The reverse-alpha text-mark engine (Doubao/Jimeng/Samsung) allocated
full-frame arrays where only the glyph footprint is ever read:
- _fixed_alpha_map / _aligned_alpha_map each built a full (h, w) float32
alpha map non-zero only inside the glyph box, and two were held at once
during removal (~96 MB of mostly-zeros on a 12 MP frame);
- extract_mask built a full (h, w) uint8 mask that every caller cropped to
the located box (~12 MB, rebuilt per text-mark detector on the
memory-tight identify path).
Both now return footprint-sized arrays: the alpha helpers return the
glyph-sized block plus its placement (ax, ay, gw, gh), and extract_mask
returns the box-sized mask. _apply_reverse_alpha consumes the block
directly; the residual inpaint embeds it into one full-frame uint8 mask only
at cv2.inpaint time (which needs a full-frame mask).
remove_watermark_reverse_alpha tracks the winning region alongside best_amap to place it.
Peak allocation drops from O(image*4)x2 + O(image) to O(footprint)x2 +
one gated O(image*1) uint8 mask -- a win every consumer gets, motivated by
the 512 MB raiw.cc worker that OOMs on large decodes. GPU path untouched.
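The allocation figures above follow from simple arithmetic. The sketch assumes a 3000x4000 (12 MP) frame and a hypothetical 120x600 glyph footprint; only the float32/uint8 element sizes and the 12 MP frame size come from the message.

```python
def array_bytes(height: int, width: int, itemsize: int) -> int:
    """Bytes held by a dense (height, width) array of the given element size."""
    return height * width * itemsize


H, W = 3000, 4000    # assumed dimensions for a 12 MP frame
GH, GW = 120, 600    # hypothetical glyph footprint

full_alpha = array_bytes(H, W, 4)         # one full-frame float32 alpha map
old_peak_alpha = 2 * full_alpha           # two held at once during removal
full_mask = array_bytes(H, W, 1)          # full-frame uint8 mask
new_peak_alpha = 2 * array_bytes(GH, GW, 4)  # two footprint-sized blocks
```

Here the two full-frame maps come to 96,000,000 bytes (~96 MB) and the mask to 12,000,000 bytes (~12 MB), against well under 1 MB for two footprint-sized blocks of the assumed size.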
Byte-identical to the old full-frame path (verified: 17 output hashes
across the three engines, inpaint/no-inpaint, detect, and the real
doubao-1.png fixture, unchanged before/after). tests/test_text_mark_memory.py
guards it by reconstructing the old full-frame path inline and asserting
equality, so the proof survives a cv2/asset bump, and pins the O(footprint)
shape so a regression to full-frame fails loudly.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The free `visible` path over-subtracted a faint Gemini sparkle on a
mid-tone background into a darker-than-background brown diamond instead
of removing it (2026-06-18 prod NPS report, "the watermark was not
removed, just its color changed"). The existing over-subtraction guard
only tripped when reverse-alpha drove a footprint pixel fully negative
(the issue #30 dark-background black-pit case); on a mid-tone background
the over-subtraction darkens the core well below the background without
any pixel crossing zero, so the gate missed it and shipped the dark mark.
Add a second over-subtraction signal to `_reverse_alpha_oversubtracts`:
predict the reverse-alpha output at the bright core, (core - a*logo)/(1-a),
and route to the footprint inpaint when it lands more than
`_OVERSUB_DARK_MARGIN` (25) gray levels below the local background ring.
Calibrated wide: clean removals predict within ~12 of background
(demo_banana ~-1), the prod regression ~-40, the issue #30 dark case ~-82.
Corpus-validated on the 479 detected Gemini images: 10 switch reverse-alpha
to inpaint, all of them dark-diamond cases that improve or match; the
other 469 stay byte-identical. demo_banana stays on the reverse-alpha
path (byte-identical).
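A minimal sketch of the new dark-margin signal, using the formula quoted above. The 25-level margin is from the message; the white logo value of 255, the function names, and the sample inputs are assumptions for illustration.

```python
OVERSUB_DARK_MARGIN = 25  # gray levels, from the message


def predicted_core(core: float, alpha: float, logo: float = 255.0) -> float:
    """Reverse-alpha prediction at the bright core: (core - a*logo) / (1 - a)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    return (core - alpha * logo) / (1.0 - alpha)


def oversubtracts_dark(core: float, alpha: float, background: float,
                       logo: float = 255.0) -> bool:
    """True when the predicted core lands too far below the background ring."""
    return predicted_core(core, alpha, logo) < background - OVERSUB_DARK_MARGIN


# Hypothetical inputs: background ring at gray level 120, alpha 0.4.
dark_case = oversubtracts_dark(core=150.0, alpha=0.4, background=120.0)
clean_case = oversubtracts_dark(core=174.0, alpha=0.4, background=120.0)
```

In the first case the prediction is about 80, roughly 40 levels below the background, so the removal would route to the footprint inpaint; in the second the prediction matches the background and reverse-alpha is kept.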
Also crop both reverse-alpha helpers to the region they actually touch,
a pure O(image) -> O(mark) win that is byte-identical to the full-frame
math (a uint8<->float32 round-trip is exact):
- `GeminiEngine._core_and_bg` converts only the footprint+ring crop to
gray, not the whole frame (~70 ms -> 0.1 ms on a 12 MP image; it runs
for both the alpha-gain estimate and the new gate). Verified identical
across 479 images; detector confidence unchanged.
- `TextMarkEngine._apply_reverse_alpha` computes the blend on the glyph
crop only (`amap` is zero outside it, so the math is a no-op there):
~275 ms -> ~2 ms per placement on a 12 MP frame, up to 2 placements per
removal. Verified identical across 142 Doubao/Jimeng placements.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The 256->512 detection-search widening (v0.8) let a large, low-gradient
shape match outrank a genuine mid-size corner sparkle whose raw NCC sits
below the 0.85 corner-promote gate, so `identify` read `unknown` on Gemini
images that v0.7.2 caught (reporter osachub: scale-48 sparkle on light
bedding -- true sparkle spatial 0.775 / grad 0.960 / fusion 0.676, but the
size-weighted argmax locked onto a decoy at spatial 0.628 / grad 0.036).
detect_watermark now keeps the top-K (_SELECT_TOPK=3) size-weighted
candidates (NMS-deduped) plus the corner-promote candidate, scores each by
full fusion (spatial+gradient+variance) via the extracted _grad_var_scores
helper, and selects the highest -- the gradient term lifts the true sparkle
over the decoy. Ranking by the SIZE-WEIGHTED score (not a raw-NCC argmax)
preserves tiny-patch suppression: a raw-NCC argmax re-admitted 16-18px
content false positives (14/65 doubao + 4/11 jimeng visible images). Top-K
adds zero flips on the doubao/jimeng corpora and leaves the 495-image Gemini
set unchanged (479 detected) while recovering the reporter's image at 0.676.
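The selection step can be outlined as follows. This is a sketch under stated assumptions: the fusion weights, the size-weighted scores, and the variance values are hypothetical (the message gives only the spatial and gradient figures), and NMS dedup is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    size_weighted: float  # ranking score used for the top-K cut
    spatial: float
    grad: float
    var: float


def fused(c: Candidate, weights=(0.5, 0.3, 0.2)) -> float:
    """Hypothetical fusion of spatial, gradient, and variance scores."""
    ws, wg, wv = weights
    return ws * c.spatial + wg * c.grad + wv * c.var


def select(candidates, corner_promote=None, k=3):
    """Keep the top-k by size-weighted score, then pick the best fused one."""
    pool = sorted(candidates, key=lambda c: c.size_weighted, reverse=True)[:k]
    if corner_promote is not None and corner_promote not in pool:
        pool.append(corner_promote)
    return max(pool, key=fused) if pool else None


# Spatial/grad values from the report; the other fields are made up.
decoy = Candidate("decoy", size_weighted=0.9, spatial=0.628, grad=0.036, var=0.5)
sparkle = Candidate("sparkle", size_weighted=0.6, spatial=0.775, grad=0.960, var=0.5)
argmax_only = max([decoy, sparkle], key=lambda c: c.size_weighted)
chosen = select([decoy, sparkle])
```

Ranking the pool by the size-weighted score keeps tiny patches out, while the final pick by fused score lets the high-gradient sparkle beat the smooth decoy.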
- _grad_var_scores: gradient/variance scoring factored out of detect_watermark
- confidence = best_fused (drop the duplicated fusion recompute)
- tests: rename test_promotion_is_what_rescues_it ->
test_size_weighted_search_alone_traps_on_the_decoy (corner-promote is no
longer the sole rescue path); add a deterministic regression test mirroring
the real spatial/grad signature
- docs: module-internals.md detector section + CLAUDE.md mechanism map
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>