2 Commits

Author SHA1 Message Date
Victor Kuznetsov e29c156279 test(eval): fix the qwen_in pipeline-fidelity eval set + PaddleOCR ground-truth flow
- data/qwen_in/: a stable, committed set of 4 AI-generated images (OpenAI +
  Google, carrying SynthID/C2PA -- same class as the data/samples fixtures) used
  to compare the controlnet/sdxl/qwen pipelines for fidelity. The set holds two
  multi-script text images (incl. RU/CJK), one EN poster, and one face grid. The
  README documents the set and the ground-truth workflow. data/ is sdist-excluded,
  so the wheel is unaffected.
- scripts/fidelity_metrics.py: switch text OCR from EasyOCR to PaddleOCR
  (PP-OCRv6; higher accuracy, especially on CJK; single multilingual stack).
  Split the script into two subcommands: `ocr` seeds a {basename: text} ground
  truth, and `compare` takes --ground-truth to compute a clean CER against the
  hand-verified reference instead of a noisy OCR-vs-OCR comparison. Spatial
  IoU-NMS keeps the best-scoring read per line so wrong-script models don't
  inject garbage over Cyrillic/CJK.
- Oracle methodology: validate the OpenAI arm FIRST (openai.com/verify is more
  accessible and the strongest Playwright/Chrome-MCP automation candidate; the
  Gemini app is more manual). Recorded in CLAUDE.md + docs/synthid.md.
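
The "spatial IoU-NMS" step above can be sketched roughly as follows. This is a
minimal illustration, not the code in scripts/fidelity_metrics.py: the `Read`
type, the `nms_best_read` name, and the 0.5 IoU threshold are all assumptions
made for the example. The idea is that when several reads cover the same text
line, only the highest-scoring one survives.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """One OCR read (hypothetical shape): axis-aligned box, text, confidence."""
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2)
    text: str
    score: float


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_best_read(reads, iou_threshold=0.5):
    """Greedy NMS: visit reads by descending score, drop any read that
    overlaps an already-kept read by at least iou_threshold."""
    kept = []
    for read in sorted(reads, key=lambda r: r.score, reverse=True):
        if all(iou(read.box, k.box) < iou_threshold for k in kept):
            kept.append(read)
    return kept


# Two reads of the same Cyrillic line plus one unrelated line (made-up data).
reads = [
    Read((0, 0, 100, 20), "Привет", 0.95),
    Read((2, 1, 98, 21), "Npnbet", 0.40),   # wrong-script read of the same line
    Read((0, 40, 100, 60), "Hello", 0.90),
]
kept = nms_best_read(reads)
```

With the made-up data above, the low-confidence wrong-script read overlaps the
Cyrillic read almost entirely and is suppressed, while the non-overlapping line
is kept.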

Ground-truth JSON (data/qwen_in/ground_truth.json) lands in a follow-up once
hand-verified.
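
For reference, a character error rate against a {basename: text} ground truth
could be computed along these lines. This is a sketch under assumptions: the
function names, the example basename, and the choice of plain Levenshtein
distance divided by reference length are illustrative and may differ from what
the script actually does (for instance, it may normalize whitespace or case
first).

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical ground truth and OCR output, keyed by image basename.
ground_truth = {"poster.png": "HELLO WORLD"}
ocr_output = {"poster.png": "HELL0 WORLD"}

scores = {name: cer(ref, ocr_output.get(name, ""))
          for name, ref in ground_truth.items()}
```

Scoring against a hand-verified reference means OCR mistakes on the original
image no longer leak into the metric, which is the motivation stated above for
--ground-truth.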

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 14:17:04 -07:00
Victor Kuznetsov a2c33af284 feat(scripts): fidelity_metrics.py + correct the qwen-vs-controlnet claim
Add scripts/fidelity_metrics.py: an objective eval harness comparing
watermark-removal outputs against the original (reference) across four groups
-- OCR character error rate (EasyOCR), ArcFace identity cosine (insightface),
face texture (LPIPS + Laplacian-variance ratio), and whole-image LPIPS/SSIM/
PSNR. Dependencies are declared inline via PEP 723, so the script stays out of
the package and uv.lock. Metrics self-gate: face metrics run only where faces
are present, text metrics only where text is present.
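
As an illustration of the Laplacian-variance ratio mentioned above, here is a
dependency-free sketch. It assumes grayscale images given as nested lists, a
4-neighbour Laplacian kernel, and a ratio defined as output variance over
reference variance; the actual script presumably works on real image arrays and
face crops, and its kernel and naming may differ.

```python
from statistics import pvariance


def laplacian_variance(img):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    img is a 2D list of grayscale values and must be at least 3x3.
    """
    h, w = len(img), len(img[0])
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
        - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def sharpness_retention(reference, output):
    """Laplacian variance of the output relative to the reference.

    Values below 1.0 suggest the output lost high-frequency texture.
    """
    ref_var = laplacian_variance(reference)
    return laplacian_variance(output) / ref_var if ref_var else float("nan")


# Synthetic example: a checkerboard and a half-contrast copy of it.
reference = [[255.0 if (x + y) % 2 else 0.0 for x in range(6)] for y in range(6)]
smoothed = [[0.5 * v for v in row] for row in reference]
ratio = sharpness_retention(reference, smoothed)
```

Halving the contrast halves every Laplacian response, so the variance drops to
a quarter; a retention figure such as 0.62 vs 0.41 is read the same way, with
the lower value indicating more smoothing.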

The metrics overturned an eyeball conclusion. At EQUAL strength, Qwen beats
controlnet on TEXT (OpenAI typography, strength 0.10: OCR CER 0.25 vs 0.37), but
controlnet beats Qwen on FACES (gemini_3, 18 faces, strength 0.15 each:
Laplacian-variance retention 0.62 vs 0.41, face LPIPS 0.09 vs 0.13 -- Qwen
smooths faces MORE; ArcFace identity ~tied). So Qwen is the better
TEXT-preserving remover, not a universal fidelity win. Correct the earlier
"qwen keeps faces faithful where controlnet plasticizes" claim in CLAUDE.md,
module-internals.md, known-limitations.md, and README.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 09:58:22 -07:00