remove-ai-watermarks

CalvinBackup/remove-ai-watermarks

mirror of https://github.com/wiltodelta/remove-ai-watermarks.git synced 2026-07-05 07:57:50 +02:00

Commit Graph

Author	SHA1	Message	Date

Victor Kuznetsov

e29c156279

test(eval): fix the qwen_in pipeline-fidelity eval set + PaddleOCR ground-truth flow

- data/qwen_in/: a stable, committed set of 4 AI-generated images (OpenAI +
  Google, carrying SynthID/C2PA -- same class as data/samples fixtures) used to
  compare the controlnet/sdxl/qwen pipelines for fidelity. Two text-multi-script
  (incl. RU/CJK), one EN poster, one face grid. README documents the set + the
  ground-truth workflow. data/ is sdist-excluded so the wheel is unaffected.
- scripts/fidelity_metrics.py: switch text OCR from EasyOCR to PaddleOCR
  (PP-OCRv6, higher accuracy esp. CJK, single multilingual stack); split into
  `ocr` (seed a {basename: text} ground truth) and `compare` (--ground-truth for
  a clean CER vs the hand-verified reference instead of noisy OCR-vs-OCR). Spatial
  IoU-NMS keeps the best-scoring read per line so wrong-script models don't inject
  garbage over Cyrillic/CJK.
- Oracle methodology: validate the OpenAI arm FIRST (openai.com/verify is more
  accessible and the strongest Playwright/Chrome-MCP automation candidate; the
  Gemini app is more manual). Recorded in CLAUDE.md + docs/synthid.md.
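The spatial IoU-NMS step described above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/fidelity_metrics.py: the `Read` type, the `nms_reads` name, and the 0.5 IoU threshold are assumptions made for the example. The idea is that when several OCR models read the same line, only the highest-scoring read for that region is kept.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """One OCR detection (hypothetical structure for this sketch)."""
    text: str
    score: float
    box: tuple[float, float, float, float]  # x1, y1, x2, y2


def iou(a: tuple[float, float, float, float],
        b: tuple[float, float, float, float]) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def nms_reads(reads: list[Read], iou_threshold: float = 0.5) -> list[Read]:
    """Keep the best-scoring read per overlapping region.

    The threshold value is an assumption, not taken from the commit.
    """
    kept: list[Read] = []
    # Visit reads from most to least confident; drop any read that
    # overlaps a read we already decided to keep.
    for read in sorted(reads, key=lambda r: r.score, reverse=True):
        if all(iou(read.box, k.box) < iou_threshold for k in kept):
            kept.append(read)
    # Return in reading order: top to bottom, then left to right.
    return sorted(kept, key=lambda r: (r.box[1], r.box[0]))


reads = [
    Read("Привет", 0.95, (0, 0, 100, 20)),   # correct-script read
    Read("npuBeT", 0.40, (2, 1, 98, 21)),    # wrong-script read, same line
    Read("Hello", 0.90, (0, 40, 100, 60)),   # separate line
]
merged = nms_reads(reads)
```

Here the low-confidence Latin misread of the Cyrillic line overlaps the confident read almost entirely, so it is suppressed, while the separate line is kept.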

Ground-truth JSON (data/qwen_in/ground_truth.json) lands in a follow-up once
hand-verified.
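For illustration, a character error rate against a {basename: text} ground truth could be computed along these lines. This is a sketch under assumptions: the function names, the sample file name, and the sample strings are hypothetical, and the actual script may normalise text differently before scoring.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical ground truth and OCR output, keyed by image basename.
ground_truth = {"poster.png": "HELLO"}
ocr_output = {"poster.png": "HELL0"}

scores = {
    name: cer(text, ocr_output.get(name, ""))
    for name, text in ground_truth.items()
}
```

Scoring against a hand-verified reference means OCR mistakes on the original image no longer leak into the metric, which is the motivation the commit gives for replacing OCR-vs-OCR comparison.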

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-20 14:17:04 -07:00

Victor Kuznetsov

a2c33af284

feat(scripts): fidelity_metrics.py + correct the qwen-vs-controlnet claim

Add scripts/fidelity_metrics.py: an objective eval harness comparing
watermark-removal outputs against the original (reference) across four groups
-- OCR character error rate (EasyOCR), ArcFace identity cosine (insightface),
face texture (LPIPS + Laplacian-variance ratio), and whole-image LPIPS/SSIM/
PSNR. PEP 723 inline deps so it stays out of the package / uv.lock; metrics
self-gate (faces only where faces, text only where text).
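As a rough illustration of the Laplacian-variance ratio used for face texture, the sketch below works on plain nested lists of grayscale values so that it needs no third-party packages. The function names and the 4-neighbour kernel are assumptions for the example; the script itself may use a different kernel or library.

```python
from statistics import pvariance

Image = list[list[float]]  # grayscale image as rows of pixel values


def laplacian_variance(img: Image) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Higher values mean more fine detail. The image must be at least 3x3.
    """
    height, width = len(img), len(img[0])
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
        - 4 * img[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


def texture_retention(reference: Image, output: Image) -> float:
    """Ratio of output sharpness to reference sharpness (1.0 = unchanged)."""
    ref_var = laplacian_variance(reference)
    return laplacian_variance(output) / ref_var if ref_var else float("nan")


# A sharp checkerboard, and a copy with its contrast halved around mid-grey.
sharp = [[float((x + y) % 2 * 255) for x in range(6)] for y in range(6)]
smoothed = [[128 + (v - 128) * 0.5 for v in row] for row in sharp]

ratio = texture_retention(sharp, smoothed)
```

A ratio below 1.0 indicates the output lost high-frequency detail relative to the reference, which is how the retention figures quoted in this commit message (for example 0.62 vs 0.41) should be read.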

The metrics overturned an eyeball conclusion: at EQUAL strength Qwen beats
controlnet on TEXT (OpenAI typography 0.10: OCR CER 0.25 vs 0.37) but controlnet
beats Qwen on FACES (gemini_3, 18 faces, 0.15 each: Laplacian-variance retention
0.62 vs 0.41, face LPIPS 0.09 vs 0.13 -- Qwen smooths faces MORE; ArcFace
identity ~tied). So Qwen is the better TEXT-preserving remover, not a universal
fidelity win. Correct the earlier "qwen keeps faces faithful where controlnet
plasticizes" claim in CLAUDE.md, module-internals.md, known-limitations.md, README.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-20 09:58:22 -07:00

2 Commits