- watermark_remover: _build_qwen_kwargs now passes explicit height/width (via
_qwen_target_size, floored to a multiple of 16). Without them QwenImageImg2ImgPipeline
defaults to 1024x1024 and silently squishes non-square inputs, distorting the scene and
garbling text.
- watermark_profiles: resolve_strength gains a `pipeline` arg + a Qwen strength ladder
(_QWEN_VENDOR_STRENGTH, Gemini 0.25), so `--pipeline qwen` gets its certified floor
automatically; retires the manual "pass --strength 0.25 for Gemini on qwen" workaround.
- fidelity_metrics: replace per-face nearest matching (collided on multi-face images when a
variant dropped a face, corrupting the identity metric) with a collision-free one-to-one
assignment (assign_faces_one_to_one). lapvar/LPIPS were always bbox-anchored and immune.
Regression-guarded by tests/test_fidelity_matching.py.
- docs: record the measured outcomes of the qwen-improvement arc. The Qwen ControlNet
face-fix is CLOSED (no permissive Qwen detail/tile ControlNet exists; canny carries edges,
not skin grain). The `--pipeline auto` router + faces+text mixed dual-pass were prototyped
and DROPPED (controlnet wins faces AND display text: abba CER 0.114 vs qwen 0.379).
Z-Image-Turbo was tried and dropped (same regeneration limits). qwen stays a manual opt-in;
controlnet is the default for everything.
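
The collision described in the fidelity_metrics item can be illustrated with a minimal sketch. The implementation of assign_faces_one_to_one is not shown in this message, so everything below is an assumption: the function name `assign_one_to_one_sketch`, the use of bounding-box IoU as the matching score, the `min_iou` cutoff, and the greedy strategy (an optimal solver such as the Hungarian algorithm would be an equally plausible choice).

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_one_to_one_sketch(
    ref_boxes: List[Box], var_boxes: List[Box], min_iou: float = 0.3
) -> Dict[int, int]:
    """Hypothetical greedy one-to-one matching: best IoU pairs first.

    Each reference face and each variant face is used at most once, so two
    reference faces can never collide on the same variant face.
    """
    candidates = sorted(
        (
            (iou(r, v), i, j)
            for i, r in enumerate(ref_boxes)
            for j, v in enumerate(var_boxes)
        ),
        reverse=True,
    )
    used_ref, used_var, pairs = set(), set(), {}
    for score, i, j in candidates:
        if score < min_iou:
            break  # remaining candidates overlap too little to be the same face
        if i in used_ref or j in used_var:
            continue  # one side is already matched: skip instead of colliding
        pairs[i] = j
        used_ref.add(i)
        used_var.add(j)
    return pairs


# Reference has three faces; the variant dropped the middle one.
ref = [(0, 0, 10, 10), (6, 0, 16, 10), (24, 0, 34, 10)]
var = [(0, 0, 10, 10), (24, 0, 34, 10)]
pairs = assign_one_to_one_sketch(ref, var, min_iou=0.1)
```

In this example reference face 1 partially overlaps variant face 0, so a per-face nearest match would pair both reference faces 0 and 1 with the same variant face. The one-to-one version leaves reference face 1 unmatched, which is the outcome an identity metric needs when a face was dropped.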
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- data/qwen_in/: a stable, committed set of 4 AI-generated images (OpenAI +
Google, carrying SynthID/C2PA -- same class as data/samples fixtures) used to
compare the controlnet/sdxl/qwen pipelines for fidelity. Two multi-script text
images (incl. RU/CJK), one EN poster, one face grid. README documents the set + the
ground-truth workflow. data/ is sdist-excluded so the wheel is unaffected.
- scripts/fidelity_metrics.py: switch text OCR from EasyOCR to PaddleOCR
(PP-OCRv6, higher accuracy esp. CJK, single multilingual stack); split into
`ocr` (seed a {basename: text} ground truth) and `compare` (--ground-truth for
a clean CER vs the hand-verified reference instead of noisy OCR-vs-OCR). Spatial
IoU-NMS keeps the best-scoring read per line so wrong-script models don't inject
garbage over Cyrillic/CJK.
- Oracle methodology: validate the OpenAI arm FIRST (openai.com/verify is more
accessible and the strongest Playwright/Chrome-MCP automation candidate; the
Gemini app is more manual). Recorded in CLAUDE.md + docs/synthid.md.
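
The spatial IoU-NMS step mentioned in the fidelity_metrics item can be sketched as follows. This is an illustration under assumptions, not the script's code: the `Read` record, the `nms_best_read` name, and the 0.5 IoU threshold are all hypothetical.

```python
from typing import List, NamedTuple, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


class Read(NamedTuple):
    """One OCR read of a text line (hypothetical record layout)."""
    box: Box
    text: str
    score: float


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_best_read(reads: List[Read], iou_threshold: float = 0.5) -> List[Read]:
    """Keep the highest-scoring read per spatial location.

    Reads are visited from most to least confident; a read is dropped when it
    overlaps an already-kept read by at least `iou_threshold`.
    """
    kept: List[Read] = []
    for read in sorted(reads, key=lambda r: r.score, reverse=True):
        if all(box_iou(read.box, k.box) < iou_threshold for k in kept):
            kept.append(read)
    return kept


reads = [
    Read((0, 0, 100, 20), "Привет", 0.95),   # correct-script read
    Read((2, 1, 101, 21), "npuBeT", 0.40),   # wrong-script read of the same line
    Read((0, 40, 100, 60), "hello", 0.90),   # a different line
]
kept_texts = [r.text for r in nms_best_read(reads)]
```

Here the low-confidence Latin misread of the Cyrillic line overlaps the confident read almost entirely, so it is suppressed, while the separate line further down survives.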
Ground-truth JSON (data/qwen_in/ground_truth.json) lands in a follow-up once
hand-verified.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add scripts/fidelity_metrics.py: an objective eval harness comparing
watermark-removal outputs against the original (reference) across four groups
-- OCR character error rate (EasyOCR), ArcFace identity cosine (insightface),
face texture (LPIPS + Laplacian-variance ratio), and whole-image LPIPS/SSIM/
PSNR. PEP 723 inline deps so it stays out of the package / uv.lock; metrics
self-gate (faces only where faces, text only where text).
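
For reference, character error rate is conventionally the Levenshtein edit distance between the OCR output and the reference text, divided by the reference length. The sketch below shows that standard definition in dependency-free Python; the function names are illustrative, and any text normalization the script applies before scoring (case, whitespace, punctuation) is not described here and is therefore left out.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(
                min(
                    prev[j] + 1,               # delete ca
                    cur[j - 1] + 1,            # insert cb
                    prev[j - 1] + (ca != cb),  # substitute, or match at no cost
                )
            )
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of `hypothesis` against `reference`."""
    if not reference:
        # Assumption: an empty reference scores 0 only if nothing was read.
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


example = cer("abcd", "abxd")  # one substitution over four characters
```

Because the denominator is the reference length, CER can exceed 1.0 when the OCR output contains many spurious characters.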
The metrics overturned an eyeball conclusion: at EQUAL strength Qwen beats
controlnet on TEXT (OpenAI typography, strength 0.10: OCR CER 0.25 for Qwen vs 0.37
for controlnet) but controlnet beats Qwen on FACES (gemini_3, 18 faces, strength 0.15
for both: Laplacian-variance retention 0.62 vs 0.41, face LPIPS 0.09 vs 0.13, both
controlnet vs Qwen -- Qwen smooths faces MORE; ArcFace
identity ~tied). So Qwen is the better TEXT-preserving remover, not a universal
fidelity win. Correct the earlier "qwen keeps faces faithful where controlnet
plasticizes" claim in CLAUDE.md, module-internals.md, known-limitations.md, README.
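
As a rough illustration of the Laplacian-variance retention figure quoted above: the metric compares high-frequency detail in a face crop before and after processing, and a value below 1.0 means the output is smoother than the original. The sketch below is a dependency-free stand-in operating on nested lists of grayscale values; the function names and the 4-neighbour kernel are assumptions, and the actual harness may well compute this with an image library instead.

```python
from statistics import pvariance
from typing import List, Sequence

Gray = Sequence[Sequence[float]]  # rows of grayscale pixel values


def laplacian_variance(gray: Gray) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Requires a crop of at least 3x3 so that one interior pixel exists.
    """
    h, w = len(gray), len(gray[0])
    responses: List[float] = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def lapvar_retention(reference_crop: Gray, variant_crop: Gray) -> float:
    """Ratio of variant to reference Laplacian variance (1.0 = detail kept)."""
    ref_var = laplacian_variance(reference_crop)
    if ref_var == 0:
        return float("nan")  # flat reference: the ratio is undefined
    return laplacian_variance(variant_crop) / ref_var


# A 4x4 checkerboard, and a copy with its contrast halved around the mean.
sharp = [[10 if (x + y) % 2 else 0 for x in range(4)] for y in range(4)]
soft = [[5 + (p - 5) * 0.5 for p in row] for row in sharp]
retention = lapvar_retention(sharp, soft)
```

Halving the contrast halves every Laplacian response, so the variance falls to a quarter and the retention in this toy case is 0.25.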
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>