Files
remove-ai-watermarks/tests/test_fidelity_matching.py
T
Victor Kuznetsov d5dd24140c fix(qwen): native-geometry img2img + pipeline-aware strength; record dropped auto/mixed/Z-Image leads
- watermark_remover: _build_qwen_kwargs now passes explicit height/width (via
  _qwen_target_size, floored to /16). Without it QwenImageImg2ImgPipeline defaults to
  1024x1024 and silently squishes non-square inputs, distorting the scene and garbling text.
- watermark_profiles: resolve_strength gains a `pipeline` arg + a Qwen strength ladder
  (_QWEN_VENDOR_STRENGTH, Gemini 0.25), so `--pipeline qwen` gets its certified floor
  automatically; retires the manual "pass --strength 0.25 for Gemini on qwen" workaround.
- fidelity_metrics: replace per-face nearest matching (collided on multi-face images when a
  variant dropped a face, corrupting the identity metric) with a collision-free one-to-one
  assignment (assign_faces_one_to_one). lapvar/LPIPS were always bbox-anchored and immune.
  Regression-guarded by tests/test_fidelity_matching.py.
- docs: record the measured outcomes of the qwen-improvement arc. The Qwen ControlNet
  face-fix is CLOSED (no permissive Qwen detail/tile ControlNet exists; canny carries edges,
  not skin grain). The `--pipeline auto` router + faces+text mixed dual-pass were prototyped
  and DROPPED (controlnet wins faces AND display text: abba CER 0.114 vs qwen 0.379).
  Z-Image-Turbo was tried and dropped (same regeneration limits). qwen stays a manual opt-in;
  controlnet is the default for everything.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 21:52:56 -07:00

77 lines
3.0 KiB
Python

"""Regression test for the one-to-one face matcher in ``scripts/fidelity_metrics.py``.
The shipped per-face nearest matcher collided on multi-face images (two original faces
both picking the same variant face when regeneration dropped a face), which inflated/
corrupted the identity metric. ``assign_faces_one_to_one`` is the collision-free
replacement. The function is pure (centers + diagonals in, index map out), so it is
tested here without insightface / the heavy PEP723 env. Caught on the gemini_3 Qwen
ControlNet experiment, where the original had 18 faces but the regenerated variants had
17, producing two collisions under the old matcher.
"""
from __future__ import annotations
import importlib.util
import sys
from pathlib import Path
import pytest
_SCRIPTS = Path(__file__).resolve().parent.parent / "scripts"
def _load_assign():
# fidelity_metrics is a standalone PEP723 script, not an installed module; load it by
# path with scripts/ on sys.path so its `_plain_console` shim import resolves.
sys.path.insert(0, str(_SCRIPTS))
try:
spec = importlib.util.spec_from_file_location("fidelity_metrics", _SCRIPTS / "fidelity_metrics.py")
assert spec is not None
assert spec.loader is not None
mod = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = mod # @dataclass introspection needs the module registered
spec.loader.exec_module(mod)
except ImportError as exc: # cv2/click absent in a bare env -> skip, not fail
pytest.skip(f"fidelity_metrics import deps missing: {exc}")
finally:
sys.path.remove(str(_SCRIPTS))
return mod.assign_faces_one_to_one
def test_distinct_faces_match_nearest() -> None:
assign = _load_assign()
ref = [(0.0, 0.0), (100.0, 100.0)]
var = [(2.0, 1.0), (98.0, 102.0)]
diags = [50.0, 50.0]
assert assign(ref, var, diags) == {0: 0, 1: 1}
def test_no_collision_when_variant_drops_a_face() -> None:
# Two original faces near the SAME single variant face: the old nearest matcher mapped
# BOTH to index 0; one-to-one must give the nearer ref the match and drop the other.
assign = _load_assign()
ref = [(10.0, 10.0), (14.0, 10.0)] # both close to the lone variant
var = [(12.0, 10.0)]
diags = [50.0, 50.0]
matched = assign(ref, var, diags)
assert sorted(matched.values()) == [0] # variant 0 used at most once
assert len(matched) == 1
def test_gate_drops_implausibly_far_match() -> None:
assign = _load_assign()
ref = [(0.0, 0.0)]
var = [(1000.0, 1000.0)] # far beyond 0.6 * diag
diags = [50.0]
assert assign(ref, var, diags) == {}
def test_assignment_is_one_to_one_over_many_faces() -> None:
assign = _load_assign()
ref = [(float(i * 100), 0.0) for i in range(18)]
var = [(float(i * 100) + 3.0, 0.0) for i in range(17)] # one fewer, as in the experiment
diags = [50.0] * 18
matched = assign(ref, var, diags)
assert len(matched) == 17
assert len(set(matched.values())) == 17 # every variant used at most once