detect_watermark's shape-only NCC (spatial/gradient/var fusion) fires on ornate
or flat content (text strips, banners, hatching) that coincidentally matches the
diamond shape. The NCC is contrast-invariant, so it cannot see the defining
property of a real Gemini sparkle: a bright WHITE overlay whose core sits above
the local background.
The fusion now demotes (caps confidence to 0.30) a match that is BOTH
low-confidence (< _SPARKLE_FP_CONF 0.65) AND has a low core-ring brightness
margin (_core_ring_margin < _SPARKLE_FP_MARGIN 5). Real sparkles escape via
EITHER high confidence (white-bg sparkles score >=0.79 despite a low margin) OR
high margin (dark/mid backgrounds, incl. the #36 faint-corner case), so a match
is demoted only when both checks fail. The gate is monotonic -- it only removes
detections, never adds them -- so it cannot regress the verified-negative corpus
(already 0 FPs).
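
A minimal sketch of the gate described above, not the actual fusion code. The
thresholds and the 0.30 cap come from this message; the function name
gate_confidence and the constant names without the leading underscore are
hypothetical, and the margin is assumed to be precomputed.

```python
# Thresholds quoted in the commit message; names here are illustrative.
SPARKLE_FP_CONF = 0.65    # below this a match counts as low-confidence
SPARKLE_FP_MARGIN = 5.0   # below this the core barely rises above its ring
DEMOTED_CONF = 0.30       # cap applied to a demoted match


def gate_confidence(confidence: float, core_ring_margin: float) -> float:
    """Cap the confidence only when BOTH escape routes fail."""
    low_conf = confidence < SPARKLE_FP_CONF
    low_margin = core_ring_margin < SPARKLE_FP_MARGIN
    if low_conf and low_margin:
        # min() keeps the gate monotonic: it can lower a score, never raise it.
        return min(confidence, DEMOTED_CONF)
    return confidence
```

Because the result is never larger than the input, an image that scored below
threshold before the gate still scores below threshold after it.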
On the spaces corpus it demoted 16 of 495 flagged sparkles: 13 had no AI
metadata (content FPs), and the 3 with AI metadata were either visual FPs or a
near-invisible white-on-white sparkle whose AI verdict is still held by
metadata. Removal-audit failures dropped 20 -> 15.
- _core_and_bg shared helper (core 75th-pct brightness vs background-ring median);
_estimate_alpha_gain refactored onto it, new _core_ring_margin wrapper.
- TestSparkleFalsePositiveGate: margin high/low, strong-sparkle kept (incl. on
white via high conf), blurred no-core blob demoted.
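
For illustration, the core-vs-ring margin can be sketched as below. This is an
assumption-laden stand-in for _core_and_bg / _core_ring_margin: plain lists of
luma values replace image regions, the percentile uses linear interpolation,
and the names percentile and core_ring_margin are hypothetical.

```python
from statistics import median


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def core_ring_margin(core_pixels, ring_pixels):
    """Core 75th-percentile brightness minus the background-ring median."""
    return percentile(core_pixels, 75) - median(ring_pixels)
```

A white sparkle core over a mid-grey ring gives a large positive margin, while
flat content that only matches the diamond shape gives a margin near zero.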

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The captured sparkle alpha peaks ~0.51, but some real Gemini sparkles are
rendered more opaque. The fixed-alpha reverse blend then UNDER-subtracts and
leaves a bright residual the detector still fires on. A visible-removal audit
through the registry path on the spaces corpus showed that this affects a
meaningful fraction of marks. All failures were under-removals rather than a
background-brightness class: failures and successes had the same input
confidence and background luma, and the discriminator was the removal delta
itself.
remove_watermark now estimates a per-image alpha gain (_estimate_alpha_gain:
effective sparkle opacity at the bright core vs the local background ring,
a_eff/a_cap, clamped [1.0, 1.94]) and scales the alpha to match before the
over-sub/blend branch. A 1.05 deadband keeps a sparkle that already matches the
capture byte-identical to the pre-fix output, so the fix is purely additive
(0 regressions on the audit set; failures dropped substantially). The over-sub
guard still runs on the scaled alpha as the safety net for an over-shoot.
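
A rough sketch of the gain estimate, under stated assumptions: the overlay is
pure white (255), so core = a*255 + (1-a)*bg gives a_eff = (core - bg) /
(255 - bg); a_cap is taken as the ~0.51 captured peak. The clamp and deadband
values come from this message; the function and constant names are
illustrative, not the real _estimate_alpha_gain.

```python
ALPHA_CAP_PEAK = 0.51        # captured alpha peak (from the commit text)
ALPHA_GAIN_MAX = 1.94        # upper clamp; 0.51 * 1.94 is roughly 0.99
ALPHA_GAIN_DEADBAND = 1.05   # below this the gain is treated as exactly 1.0
LOGO_VALUE = 255.0           # assumption: the overlay is pure white


def estimate_alpha_gain(core: float, bg: float) -> float:
    """Return a_eff / a_cap, clamped to [1.0, ALPHA_GAIN_MAX]."""
    headroom = LOGO_VALUE - bg
    if headroom <= 0:
        return 1.0  # background already white: opacity cannot be estimated
    a_eff = (core - bg) / headroom
    gain = a_eff / ALPHA_CAP_PEAK
    if gain < ALPHA_GAIN_DEADBAND:
        return 1.0  # deadband: keep the pre-fix behaviour unchanged
    return min(gain, ALPHA_GAIN_MAX)
```

Returning exactly 1.0 inside the deadband is what lets a sparkle that already
matches the capture go through the unscaled path.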
- _estimate_alpha_gain + _ALPHA_GAIN_MAX/_DEADBAND/_CORE_FRAC in gemini_engine.
- TestUnderSubtractionGain asserts on footprint pixels, NOT the detector (its
NCC is degenerate on a flat synthetic bg; on the real corpus, removal drops the
detector confidence ~0.80 -> ~0.27).
- scripts/visible_removal_audit.py: the detect -> remove -> re-detect audit tool
that found and validated this (operates on gitignored data/spaces only).
- CLAUDE.md + README: document the under-subtraction gain.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

detect_watermark's size-weighted global NCC search lets a larger, mediocre
match (e.g. a bright collar in a portrait) outrank a small, near-perfect
sparkle in the bottom-right corner, so a faint sparkle on a busy background
scored below threshold and the image read as clean -- the regression from
widening the search window 256px->512px between v0.7.2 and v0.8.8.
Add _corner_promote: a bottom-right-corner raw-NCC pass that overrides the
global pick when the corner holds a match with raw NCC >= 0.85 that beats it.
It only ever replaces a lower-fidelity pick (cannot weaken an existing
detection) and keeps the wider window for variant margins. The corner side is
relative-clamped (0.20 of the short side, [96, 384]) so it stays a true corner
at every scale: a fixed 256px covers ~70% of a small portrait, where a real
photo raw-matches the star at ~0.81; relative tightening drops that to ~0.69.
The 0.85 gate sits between the worst real-photo corner match (~0.78) and a
genuine faint sparkle (~0.93): zero false positives across native + downscaled
negatives, headshot rescued from below-threshold to 0.71.
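
The corner sizing and the promote rule can be sketched as follows. The 0.20
fraction, the [96, 384] clamp and the 0.85 gate are from this message; the
helper names corner_side and should_promote are hypothetical, and truncating
to an int is an assumption about rounding.

```python
CORNER_FRAC = 0.20       # fraction of the short side
CORNER_MIN = 96          # px, lower clamp
CORNER_MAX = 384         # px, upper clamp
CORNER_NCC_GATE = 0.85   # minimum raw NCC for a corner override


def corner_side(width: int, height: int) -> int:
    """Side of the bottom-right search square, relative to the short side."""
    side = CORNER_FRAC * min(width, height)
    return int(min(max(side, CORNER_MIN), CORNER_MAX))


def should_promote(corner_ncc: float, global_ncc: float) -> bool:
    """Override the global pick only with a higher-fidelity corner match."""
    return corner_ncc >= CORNER_NCC_GATE and corner_ncc > global_ncc
```

With these numbers a ~0.93 faint sparkle clears the gate, while a ~0.78
real-photo corner match does not.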
Factor the shared multi-scale matchTemplate loop into _scan_scales.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

On a dark/textured background (e.g. grass) the captured alpha map over-estimates
the real Gemini sparkle's effective opacity (~0.51 captured vs ~0.31 effective),
so the fixed-alpha reverse blend over-subtracts (watermarked - alpha*logo goes
negative) and drives the footprint to black -- the white sparkle turns into a
black diamond (issue #30, reported by @CoolZimo1).
remove_watermark now detects this via _reverse_alpha_oversubtracts (fraction of
footprint pixels with a negative numerator > 5%) and inpaints the small sparkle
footprint from the surrounding pixels (cv2 NS, cropped to a padded box) instead.
Behavior-neutral on the working case: a bright background over-subtracts at ~0%,
so reverse-alpha is used and the output is byte-identical to before (verified:
demo_banana 0.0 frac vs the issue-#30 grass image 0.61 frac; issue-#30 footprint
recovers to background grass with no pit, residual sparkle conf 0.25 < 0.35).
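
A simplified sketch of the over-subtraction check, assuming a white logo (255)
and flat lists of footprint pixel values in place of image arrays. The 5%
threshold is from this message; oversubtract_fraction and choose_removal are
illustrative names, not the real _reverse_alpha_oversubtracts.

```python
OVERSUB_FRAC_LIMIT = 0.05  # more than 5% negative numerators -> inpaint


def oversubtract_fraction(watermarked, alpha, logo=255.0):
    """Fraction of footprint pixels where watermarked - alpha*logo < 0."""
    if not watermarked:
        return 0.0
    negatives = sum(1 for w, a in zip(watermarked, alpha) if w - a * logo < 0)
    return negatives / len(watermarked)


def choose_removal(watermarked, alpha):
    """Pick the removal path: reverse alpha blend or inpainting fallback."""
    frac = oversubtract_fraction(watermarked, alpha)
    return "inpaint" if frac > OVERSUB_FRAC_LIMIT else "reverse_alpha"
```

On a bright background the numerator stays positive everywhere, so the
fraction is 0 and the existing reverse-alpha path is chosen.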
Guard is scoped to GeminiEngine: doubao/jimeng already NCC-align their alpha to
the actual mark per image, which sidesteps the fixed-alpha mismatch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adds 20 tests around the new provenance path:
- identify(): local SD/ComfyUI params -> local-pipeline attribution;
visible-sparkle gating at the 0.5 threshold (mocked detector: above,
below, unavailable, opt-out); metadata verdict not downgraded by a
sparkle hit; OpenAI/SynthID caveats + dedup; ProvenanceReport is
JSON-serializable (the CLI --json path); and the honest edge where a
C2PA manifest without an AI source marker stays 'unknown'.
- CLI 'identify': help, clean PNG, AI PNG platform, valid --json,
missing file.
- gemini_engine.detect_sparkle_confidence: float in range for a real
image, None for an unreadable file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- CLI with visible, invisible, all, metadata, and batch commands
- Gemini watermark removal via reverse alpha blending
- Invisible watermark removal via diffusion regeneration (SynthID, TreeRing)
- AI metadata stripping (EXIF, PNG text, C2PA)
- Face protection (YOLO/Haar) and analog humanizer
- 137 tests covering all CLI modes and core engines
- Ruff and Pyright clean