mirror of
https://github.com/wiltodelta/remove-ai-watermarks.git
synced 2026-07-04 23:47:49 +02:00
test(eval): fix the qwen_in pipeline-fidelity eval set + PaddleOCR ground-truth flow
- data/qwen_in/: a stable, committed set of 4 AI-generated images (OpenAI +
Google, carrying SynthID/C2PA -- same class as the data/samples fixtures) used to
compare the controlnet/sdxl/qwen pipelines for fidelity: two multi-script text
images (incl. RU/CJK), one EN poster, one face grid. The README documents the set
and the ground-truth workflow. data/ is sdist-excluded, so the wheel is unaffected.
- scripts/fidelity_metrics.py: switch text OCR from EasyOCR to PaddleOCR
(PP-OCRv6, higher accuracy esp. CJK, single multilingual stack); split into
`ocr` (seed a {basename: text} ground truth) and `compare` (--ground-truth for
a clean CER vs the hand-verified reference instead of noisy OCR-vs-OCR). Spatial
IoU-NMS keeps the best-scoring read per line so wrong-script models don't inject
garbage over Cyrillic/CJK.
- Oracle methodology: validate the OpenAI arm FIRST (openai.com/verify is more
accessible and the strongest Playwright/Chrome-MCP automation candidate; the
Gemini app is more manual). Recorded in CLAUDE.md + docs/synthid.md.
Ground-truth JSON (data/qwen_in/ground_truth.json) lands in a follow-up once
hand-verified.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -84,7 +84,7 @@ Compact list. Full measurements, incident history, and oracle-validation runs li
 - rich was dropped: the CLI + analysis scripts print plain text (`click.echo` / the `scripts/_plain_console.py` shim). `rich` is NOT a dependency — importing it breaks the core+dev CI sync; new scripts must use the shim. No Unicode glyphs / colors / progress bars in CLI output by design.
 - AVIF/HEIF/JPEG-XL metadata detection is a binary scan; C2PA removal in those containers (and MP4/MOV/M4V) is `noai/isobmff.py`; non-ISOBMFF audio/video (WebM/MP3/WAV/FLAC/OGG) strips losslessly via ffmpeg on PATH. An AI-generator token in an `Exif` meta-box *item* (bytes in `mdat`/`idat`) is now blanked **in place** by `isobmff.blank_ai_exif_tokens` (same-length space overwrite, piexif-validated so a coincidental II/MM run in pixels is ignored — no `iinf`/`iloc` surgery, mirrors `blank_ai_xmp_packets`); it scrubs the AI-token value only, leaving camera/editor EXIF intact. Still NOT built: Resemble PerTh audio detection (no presence/confidence flag exists).
 - **SynthID technical reference: `docs/synthid.md`** — primary-source-cited doc covering mechanism (post-hoc encoder/decoder pair, 136-bit payload at 512x512, pixel-space, model weights NOT modified), robustness numbers (arXiv:2510.09263: ~99.98% TPR@0.1%FPR across 30 transforms including JPEG/crop/resize/color/noise), removal attacks and forensic detectability (arXiv:2605.09203: all 6 attacks detectable at >98% TPR@1%FPR), detectability limits (no public decoder, metadata-proxy only), oracle scope, and adoption landscape. Read that doc first before adding notes here.
-- **SynthID detection is metadata-only.** No local pixel detector is possible by design (Google's decoder is proprietary, trusted-testers only); we read the C2PA companion proxy, which goes quiet once metadata is stripped — a quiet proxy is not proof the pixel watermark is gone. The Gemini app "Verify with SynthID" is the ONLY valid SynthID oracle; `openai.com/verify` is scoped to OpenAI provenance and each vendor's oracle detects only its own content. SynthID survives JPEG re-encode, so GitHub issue attachments remain valid pixel-watermark test subjects. Every spectral/phase detection approach evaluated (reverse-SynthID, our own probes) works only on controlled solid fills, never on real content.
+- **SynthID detection is metadata-only.** No local pixel detector is possible by design (Google's decoder is proprietary, trusted-testers only); we read the C2PA companion proxy, which goes quiet once metadata is stripped — a quiet proxy is not proof the pixel watermark is gone. Each vendor has its OWN oracle and it detects only that vendor's content: the Gemini app "Verify with SynthID" for Google, `openai.com/verify` for OpenAI. **Validate the OpenAI arm FIRST** — `openai.com/verify` is more accessible (fewer per-check restrictions) and the strongest automation candidate (Playwright / Chrome MCP); the Gemini flow is more manual. Ordering/throughput choice, not a substitution (see `docs/synthid.md`). SynthID survives JPEG re-encode, so GitHub issue attachments remain valid pixel-watermark test subjects. Every spectral/phase detection approach evaluated (reverse-SynthID, our own probes) works only on controlled solid fills, never on real content.
 - **External AI-vs-real classifier models are out of scope** (decided 2026-05-24): per-generator, degrade off-distribution, and our own light SDXL pass would likely defeat them. Detection stays local + signal-based.
 - **Default strength is VENDOR-ADAPTIVE, one ladder for BOTH pipelines** (since 2026-06-09): `resolve_strength(strength, vendor)` picks OpenAI **0.20** / Gemini **0.30** / unknown **0.30** when `--strength` is unset; explicit `--strength` always wins. Removal at low strength is content x pipeline dependent, and near-threshold removal is SEED-NON-DETERMINISTIC — pick a strength with margin and oracle-revalidate per content type. Certified controlnet floors (Modal cert 2026-06-04): OpenAI 0.20 (resolution-independent), Gemini 0.30 (only <= 1536px; native large Gemini needs ~0.35+ or a cap).
 - **`controlnet` is the default pipeline**; `--pipeline sdxl` is the lighter opt-down. Neither pipeline clears all content at low strength (photoreal survives controlnet, flat graphics survive sdxl — the lever is higher strength). A removal-priority caller MUST oracle-validate strength across content types; prod recipe: controlnet + per-vendor floor + FIXED seed. Forensic-stealth caveat (arXiv:2605.09203): defeating the SynthID verifier is NOT forensic invisibility — removal-processed images are flaggable at >98% TPR@1%FPR.
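The vendor-adaptive default described in the CLAUDE.md note comes down to a lookup with an explicit-override escape hatch. Below is a minimal sketch, not the project's code. It assumes vendors arrive as case-insensitive strings such as `"openai"` / `"gemini"`; the real `resolve_strength` may use different vendor identifiers or types.

```python
from __future__ import annotations

# Hypothetical ladder mirroring the note: OpenAI 0.20, Gemini 0.30, unknown 0.30.
_DEFAULT_STRENGTH_BY_VENDOR = {"openai": 0.20, "gemini": 0.30}
_UNKNOWN_VENDOR_STRENGTH = 0.30


def resolve_strength(strength: float | None, vendor: str | None) -> float:
    """Pick the diffusion strength for one image (illustrative sketch).

    An explicit --strength always wins. The check uses `is not None`, so an
    explicit 0.0 is honoured. Otherwise use the vendor's default, falling
    back to the unknown-vendor value for anything unrecognised.
    """
    if strength is not None:
        return strength
    if vendor is None:
        return _UNKNOWN_VENDOR_STRENGTH
    return _DEFAULT_STRENGTH_BY_VENDOR.get(vendor.lower(), _UNKNOWN_VENDOR_STRENGTH)
```

Checking `is not None` rather than truthiness is the design point here: `strength or default` would silently replace a deliberate `0.0` with the vendor default.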
@@ -0,0 +1,35 @@
+# qwen_in — pipeline-fidelity eval set
+
+A small, **stable** set of AI-generated images used to compare the diffusion
+removal pipelines (`controlnet` / `sdxl` / `qwen`) for fidelity with
+`scripts/fidelity_metrics.py`. Fixing the set in the repo keeps comparisons
+reproducible across runs and pipelines.
+
+All four are AI-generated test content (they carry SynthID + C2PA from their
+generator — verify with `remove-ai-watermarks identify`), same class as the
+`data/samples/` fixtures. No real-person photos.
+
+| file | vendor (SynthID) | content | exercises |
+|---|---|---|---|
+| `openai_1_original.png` | OpenAI | typography sheet (EN + RU + ZH) | text (multi-script) |
+| `openai_2_original.png` | OpenAI | Raiw.cc poster | text (EN, small) |
+| `gemini_1_original.png` | Google | landscape + Chinese sign | text (CJK) |
+| `gemini_3_original.png` | Google | 3x3 portrait grid | faces (identity / skin texture) |
+
+## Text ground truth
+
+`ground_truth.json` (`{basename: text}`) is the **hand-verified** OCR of the
+text-bearing originals, seeded by `fidelity_metrics.py ocr` and corrected by
+hand (PaddleOCR mis-reads stylized Cyrillic in particular). It is the reference
+for the text CER metric — much cleaner than OCR-vs-OCR. Regenerate the seed with:
+
+    uv run scripts/fidelity_metrics.py ocr data/qwen_in/openai_1_original.png \
+      data/qwen_in/openai_2_original.png data/qwen_in/gemini_1_original.png \
+      --langs en,ru,ch --out data/qwen_in/ground_truth.json
+    # then re-verify by hand before trusting it.
+
+## Compare
+
+    uv run scripts/fidelity_metrics.py compare \
+      --original data/qwen_in/gemini_3_original.png \
+      --variant controlnet=<out>.png --variant qwen=<out>.png --ocr-langs ""
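The ground-truth contract in the README is a flat JSON object keyed by file basename. Below is a minimal sketch of reading it that assumes nothing beyond that `{basename: text}` shape. `parse_ground_truth`, `load_ground_truth` and `reference_text` are hypothetical helpers. The script itself does the lookup inline with `gt.get(Path(original).name)`.

```python
from __future__ import annotations

import json
from pathlib import Path


def parse_ground_truth(raw: str) -> dict[str, str]:
    """Parse a {basename: text} JSON document, rejecting any other shape."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in data.items()
    ):
        raise ValueError("expected a JSON object mapping image basenames to text")
    return data


def load_ground_truth(path: str | Path) -> dict[str, str]:
    """Read and validate a ground-truth file (UTF-8, as the ocr subcommand writes it)."""
    return parse_ground_truth(Path(path).read_text(encoding="utf-8"))


def reference_text(gt: dict[str, str], image_path: str | Path) -> str | None:
    """Look an image up by basename only, so the file works from any directory."""
    return gt.get(Path(image_path).name)
```

Keying by basename keeps the file portable across checkouts. The trade-off is that two originals with the same filename in different directories would collide, which is fine for a single flat set like `data/qwen_in/`.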
Binary file not shown (added image, 8.2 MiB).
Binary file not shown (added image, 9.3 MiB).
Binary file not shown (added image, 2.3 MiB).
Binary file not shown (added image, 1.4 MiB).
@@ -181,6 +181,15 @@ pos original plus its minimum-clearing cleaned output (manifest `verified_via` =
 was oracle-verified but is not committed (third-party content stays out of the
 public corpus).

+**Oracle validation order: start with OpenAI.** When validating removal across
+vendors, run the OpenAI arm first. `openai.com/verify` is more accessible than the
+Gemini app -- fewer per-check restrictions, so it gives the fastest signal and is
+the strongest candidate for automation (Playwright / Chrome MCP driving
+`openai.com/verify`); the Gemini "Verify with SynthID" flow is more manual. This is
+an ordering/throughput choice, not a substitution: each oracle only reads its own
+vendor's SynthID (`openai.com/verify` is OpenAI-scoped), so Google content still
+needs the Gemini app.
+
 | Vendor | Images | Resolution(s) | Pipeline | Removed at |
 |--------|--------|---------------|----------|------------|
 | OpenAI (gpt-image) | n=4 (3 archived + 1 external-only) | 1024x1536 .. 1600x1600 | native | **0.05** |
+149 -72
@@ -9,7 +9,8 @@
 #   "rapidfuzz",
 #   "torch",
 #   "lpips",
-#   "easyocr",
+#   "paddleocr",
+#   "paddlepaddle",
 #   "insightface",
 #   "onnxruntime",
 # ]
@@ -22,29 +23,38 @@ preserved -- so "closer to the original" is the right axis here (between two
 equally-scrubbed outputs, the one that deviates less from the original wins).

 It is a standalone eval tool, NOT part of the package: PEP 723 inline deps let
-``uv run`` build a throwaway env so the heavy models (EasyOCR, insightface,
+``uv run`` build a throwaway env so the heavy models (PaddleOCR, insightface,
 LPIPS) never touch uv.lock or the shipped library. Metrics self-gate: face
 metrics run only where faces are detected, text metrics only where text is.

-Four metric groups (all reference = original):
-1. Text -- EasyOCR character error rate (CER) of each variant vs the original's
-   OCR string. Lower = text better preserved. OCR is noisy, so treat it
-   as a RELATIVE comparison (every variant scored against the same ref).
-2. Face identity -- insightface (buffalo_l) ArcFace cosine similarity, original
-   face vs the geometrically-matched variant face. Higher = identity kept.
-3. Face texture -- LPIPS + Laplacian-variance ratio (variant/original) on face
-   crops. Catches "plastication" (lost high-frequency skin detail):
-   lapvar ratio < 1 = smoother than the original.
-4. Whole image -- LPIPS / SSIM / PSNR vs the original (context: background too).
+Two subcommands:
+
+  ocr      -- OCR images (PaddleOCR PP-OCRv6) into a JSON {basename: text} file.
+              Run this on the ORIGINALS, hand-verify/correct the file, and it
+              becomes the ground truth for ``compare --ground-truth`` -- the clean
+              way to score text, since OCR-vs-OCR is doubly noisy (errors on both
+              images + reading-order differences inflate CER even on identical text).
+
+  compare  -- Score each VARIANT against the ORIGINAL across four groups:
+              1. Text -- character error rate (CER) of the variant's OCR vs the
+                 verified ground truth (or the original's OCR if no --ground-truth).
+              2. Face identity -- insightface (buffalo_l) ArcFace cosine similarity.
+              3. Face texture -- LPIPS + Laplacian-variance ratio on face crops
+                 (catches "plastication": ratio < 1 = smoother than the original).
+              4. Whole image -- LPIPS / SSIM / PSNR vs the original.

 Usage:
-    uv run scripts/fidelity_metrics.py --original O.png \
-        --variant controlnet=C.png --variant qwen=Q.png --ocr-langs en,ru,ch_sim
+    uv run scripts/fidelity_metrics.py ocr O1.png O2.png --langs en,ru,ch --out gt.json
+    # (edit gt.json by hand to fix any OCR slips, then:)
+    uv run scripts/fidelity_metrics.py compare --original O1.png \
+        --variant controlnet=C.png --variant qwen=Q.png \
+        --ocr-langs en,ru,ch --ground-truth gt.json
 """

 from __future__ import annotations

 import logging
+import json
+import unicodedata
 from dataclasses import dataclass, field
 from pathlib import Path
 from typing import Any
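The "plastication" metric in the docstring compares how much high-frequency detail survives on face crops. Below is a minimal pure-Python sketch of a Laplacian-variance ratio. It assumes a grayscale crop given as a list of rows and the common 4-neighbour Laplacian kernel. The script's own implementation is not shown in this diff and probably uses OpenCV. `laplacian_variance` and `lapvar_ratio` are hypothetical names.

```python
from __future__ import annotations


def laplacian_variance(gray: list[list[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over the crop's interior pixels.

    Kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]]. Border pixels are skipped
    rather than padded, so the crop must be at least 3x3.
    """
    if len(gray) < 3 or len(gray[0]) < 3:
        raise ValueError("need at least a 3x3 crop")
    h, w = len(gray), len(gray[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def lapvar_ratio(variant: list[list[float]], original: list[list[float]]) -> float:
    """variant / original Laplacian variance; < 1 means the variant is smoother."""
    ref = laplacian_variance(original)
    if ref == 0:
        return float("nan")  # flat original: the ratio is undefined
    return laplacian_variance(variant) / ref
```

Halving the contrast of a high-frequency pattern halves every Laplacian response, so the variance, and therefore the ratio, drops to a quarter. Flattening the pattern entirely drives the ratio to zero.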
@@ -54,8 +64,6 @@ import cv2
 import numpy as np
 from _plain_console import Console, Table

 logging.basicConfig(level=logging.WARNING, format="%(message)s")
 log = logging.getLogger(__name__)
 console = Console()
@@ -76,45 +84,90 @@ def _match_size(variant: np.ndarray, ref: np.ndarray) -> np.ndarray:
     return variant


-# ── text: OCR CER ────────────────────────────────────────────────────
-
-# EasyOCR rejects some language combos in one Reader, so group into compatible
-# readers and union the detections. Cyrillic and Chinese cannot share a reader.
-_OCR_GROUPS = {
-    "en": ["en"],
-    "ru": ["ru", "en"],
-    "ch_sim": ["ch_sim", "en"],
-}
+def _norm(text: str) -> str:
+    """Normalize for CER: NFC + drop ALL whitespace (segmentation-order agnostic)."""
+    return "".join(unicodedata.normalize("NFC", text).split())


-def _ocr_string(readers: list, bgr: np.ndarray) -> str:
-    """Union all readers' detections into one position-sorted, whitespace-free string."""
-    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
-    dets: list[tuple[float, float, str]] = []
-    for reader in readers:
-        for box, text, conf in reader.readtext(rgb):
-            if conf < 0.3 or not text.strip():
-                continue
-            ys = [p[1] for p in box]
-            xs = [p[0] for p in box]
-            dets.append((min(ys), min(xs), text.strip()))
-    # Sort top-to-bottom, then left-to-right (coarse reading order).
-    dets.sort(key=lambda d: (round(d[0] / 20.0), d[1]))
-    return "".join(t for _, _, t in dets).replace(" ", "")
+# ── text: PaddleOCR (PP-OCRv6) ───────────────────────────────────────
+
+# Our lang codes -> PaddleOCR lang. The 'ch' model also reads Latin; 'ru' reads
+# Cyrillic + Latin. Multiple langs in one image -> run each model, union detections.
+_PADDLE_LANG = {"en": "en", "ru": "ru", "ch": "ch", "ch_sim": "ch", "latin": "latin"}
+_paddle_cache: dict[str, Any] = {}


-def _build_ocr_readers(langs: list[str]) -> list:
-    import easyocr
+def _paddle(lang: str) -> Any:
+    if lang not in _paddle_cache:
+        from paddleocr import PaddleOCR

-    seen: set[tuple[str, ...]] = set()
-    readers = []
+        _paddle_cache[lang] = PaddleOCR(
+            lang=lang,
+            use_doc_orientation_classify=False,
+            use_doc_unwarping=False,
+            use_textline_orientation=False,
+        )
+    return _paddle_cache[lang]
+
+
+def _box_xyxy(box: Any) -> tuple[float, float, float, float]:
+    """Axis-aligned (x1, y1, x2, y2) of a PaddleOCR rec box ([x1,y1,x2,y2]) or poly (4x2)."""
+    arr = np.asarray(box, dtype=np.float32).reshape(-1)
+    if arr.size == 4:
+        return float(arr[0]), float(arr[1]), float(arr[2]), float(arr[3])
+    pts = arr.reshape(-1, 2)
+    return float(pts[:, 0].min()), float(pts[:, 1].min()), float(pts[:, 0].max()), float(pts[:, 1].max())
+
+
+def _iou(a: tuple[float, float, float, float], b: tuple[float, float, float, float]) -> float:
+    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
+    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
+    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
+    inter = iw * ih
+    if inter <= 0:
+        return 0.0
+    area_a = (a[2] - a[0]) * (a[3] - a[1])
+    area_b = (b[2] - b[0]) * (b[3] - b[1])
+    return inter / (area_a + area_b - inter + 1e-9)
+
+
+def _ocr_lines(bgr: np.ndarray, langs: list[str], min_score: float = 0.5) -> list[str]:
+    """Detected text lines in reading order, unioned across lang models with spatial NMS.
+
+    Several language models over one image re-detect the same lines -- and crucially the
+    WRONG-script models read e.g. Cyrillic as confident Latin gibberish. So instead of a
+    naive union, keep the HIGHEST-score detection per physical location (greedy IoU NMS):
+    the model that actually fits a line wins it (the 'ru' model takes the Cyrillic, 'ch'
+    the CJK, 'en' the Latin), and the cross-script garbage is dropped.
+    """
+    raw: list[tuple[float, tuple[float, float, float, float], str]] = []
     for lang in langs:
-        group = tuple(_OCR_GROUPS.get(lang, [lang]))
-        if group in seen:
+        plang = _PADDLE_LANG.get(lang, lang)
+        for page in _paddle(plang).predict(bgr):
+            texts = page.get("rec_texts", [])
+            scores = page.get("rec_scores", [])
+            boxes = page.get("rec_boxes", None)
+            if boxes is None or len(boxes) == 0:
+                boxes = page.get("rec_polys", [])
+            for text, score, box in zip(texts, scores, boxes, strict=False):
+                if score < min_score or not text.strip():
+                    continue
+                raw.append((float(score), _box_xyxy(box), text.strip()))
+
+    raw.sort(key=lambda d: d[0], reverse=True)
+    kept: list[tuple[tuple[float, float, float, float], str]] = []
+    for _score, box, text in raw:
+        if any(_iou(box, kbox) > 0.3 for kbox, _ in kept):
             continue
-        seen.add(group)
-        readers.append(easyocr.Reader(list(group), gpu=False, verbose=False))
-    return readers
+        kept.append((box, text))
+    kept.sort(key=lambda d: (round(d[0][1] / 20.0), d[0][0]))  # reading order: y then x
+    return [t for _, t in kept]
+
+
+def _cer(ref: str, hyp: str) -> float:
+    from rapidfuzz.distance import Levenshtein
+
+    return Levenshtein.normalized_distance(_norm(ref), _norm(hyp))


 # ── face: detection + ArcFace + texture ──────────────────────────────
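The greedy IoU-NMS in `_ocr_lines` can be exercised without PaddleOCR. Below is a minimal standalone sketch. It assumes detections have already been pooled across the per-language models as `(score, (x1, y1, x2, y2), text)` tuples. `iou` and `nms_lines` are hypothetical names. The thresholds are copied from the diff: drop a detection when its IoU with an already-kept one exceeds 0.3, and use 20 px row buckets for reading order.

```python
from __future__ import annotations

Box = tuple[float, float, float, float]  # axis-aligned (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter <= 0:
        return 0.0  # early exit, so the union below is always > 0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms_lines(dets: list[tuple[float, Box, str]], iou_thr: float = 0.3, row_px: float = 20.0) -> list[str]:
    """Keep the best-scoring read per location, then order top-to-bottom, left-to-right."""
    kept: list[tuple[Box, str]] = []
    for _score, box, text in sorted(dets, key=lambda d: d[0], reverse=True):
        if any(iou(box, kbox) > iou_thr for kbox, _ in kept):
            continue  # a higher-scoring model already claimed this line
        kept.append((box, text))
    kept.sort(key=lambda k: (round(k[0][1] / row_px), k[0][0]))  # coarse row, then x
    return [text for _, text in kept]
```

Consider a Cyrillic line that the `ru` model reads at 0.95 while the `en` model returns Latin gibberish for the same box at 0.60. The gibberish overlaps the kept box far above 0.3 IoU and is dropped, so only the Cyrillic survives. Row bucketing is deliberately coarse. Lines whose top edges fall in the same 20 px bucket are ordered left-to-right, and two lines straddling a bucket boundary can swap order.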
@@ -183,12 +236,10 @@ def _ssim_psnr(a_bgr: np.ndarray, b_bgr: np.ndarray) -> tuple[float, float]:

     a = cv2.cvtColor(a_bgr, cv2.COLOR_BGR2GRAY)
     b = cv2.cvtColor(b_bgr, cv2.COLOR_BGR2GRAY)
-    ssim = float(structural_similarity(a, b))
-    psnr = float(peak_signal_noise_ratio(a, b))
-    return ssim, psnr
+    return float(structural_similarity(a, b)), float(peak_signal_noise_ratio(a, b))


-# ── main ─────────────────────────────────────────────────────────────
+# ── reporting ────────────────────────────────────────────────────────


 def _mean(xs: list[float]) -> float | None:
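`_ssim_psnr` delegates PSNR to scikit-image on 8-bit grayscale, where the data range inferred from the `uint8` dtype is 255. For reference, the quantity is 10·log10(255² / MSE). Below is a minimal pure-Python sketch over two equal-length pixel sequences. `psnr_uint8` is a hypothetical name and not part of the script.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def psnr_uint8(a: Sequence[int], b: Sequence[int]) -> float:
    """PSNR in dB for two equal-length 8-bit pixel sequences (data range 255)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Cast to int so numpy uint8 inputs cannot wrap around on subtraction.
    mse = sum((int(x) - int(y)) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(255.0 ** 2 / mse)
```

A uniform 10-level shift gives MSE 100 and therefore about 28.13 dB. A full black-vs-white mismatch gives exactly 0 dB.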
@@ -199,18 +250,42 @@ def _fmt(v: float | None, nd: int = 3) -> str:
     return "-" if v is None else f"{v:.{nd}f}"


-@click.command()
+# ── CLI ──────────────────────────────────────────────────────────────
+
+
+@click.group()
+def cli() -> None:
+    """Objective fidelity metrics for watermark-removal outputs."""
+
+
+@cli.command("ocr")
+@click.argument("images", nargs=-1, required=True, type=click.Path(exists=True))
+@click.option("--langs", default="en", help="Comma list of OCR langs (en,ru,ch).")
+@click.option("--out", type=click.Path(), default=None, help="Write {basename: text} JSON here (for ground truth).")
+def ocr_cmd(images: tuple[str, ...], langs: str, out: str | None) -> None:
+    """OCR images into a ground-truth seed -- hand-verify the result before using it."""
+    lang_list = [x.strip() for x in langs.split(",") if x.strip()]
+    result: dict[str, str] = {}
+    for path in images:
+        lines = _ocr_lines(_load_bgr(path), lang_list)
+        text = "\n".join(lines)
+        result[Path(path).name] = text
+        console.print(f"\n=== {Path(path).name} ===")
+        console.print(text or "(no text detected)")
+    if out:
+        Path(out).write_text(json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8")
+        console.print(f"\n Wrote {out} -- verify/correct it by hand, then pass it to `compare --ground-truth`.")
+
+
+@cli.command("compare")
 @click.option("--original", required=True, type=click.Path(exists=True), help="Reference (unprocessed) image.")
 @click.option(
-    "--variant",
-    "variants",
-    multiple=True,
-    required=True,
-    help="LABEL=PATH of a cleaned output (repeatable).",
+    "--variant", "variants", multiple=True, required=True, help="LABEL=PATH of a cleaned output (repeatable)."
 )
-@click.option("--ocr-langs", default="en", help="Comma list of EasyOCR langs (en,ru,ch_sim). Empty = skip text.")
+@click.option("--ocr-langs", default="en", help="Comma list of OCR langs (en,ru,ch). Empty = skip text.")
+@click.option("--ground-truth", type=click.Path(exists=True), default=None, help="Verified {basename: text} JSON.")
 @click.option("--no-faces", is_flag=True, help="Skip face metrics.")
-def main(original: str, variants: tuple[str, ...], ocr_langs: str, no_faces: bool) -> None:
+def compare(original: str, variants: tuple[str, ...], ocr_langs: str, ground_truth: str | None, no_faces: bool) -> None:
     """Score each VARIANT against ORIGINAL across the four fidelity groups."""
     ref = _load_bgr(original)
     parsed: list[tuple[str, np.ndarray]] = []
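The `--variant LABEL=PATH` values are parsed in a part of `compare` that this hunk does not show. Below is a minimal sketch of one reasonable approach using hypothetical `parse_variant` / `parse_variants` helpers, not necessarily what the script does. It splits on the first `=` so that paths containing `=` survive. It also rejects duplicate labels, because the per-variant results (`ocr_cer`, `face_stats`) are dicts keyed by label.

```python
from __future__ import annotations


def parse_variant(spec: str) -> tuple[str, str]:
    """Split one LABEL=PATH spec on the FIRST '=' (the path may itself contain '=')."""
    label, sep, path = spec.partition("=")
    label, path = label.strip(), path.strip()
    if not sep or not label or not path:
        raise ValueError(f"--variant expects LABEL=PATH, got {spec!r}")
    return label, path


def parse_variants(specs: tuple[str, ...]) -> list[tuple[str, str]]:
    """Parse all specs, rejecting duplicate labels so report rows stay unambiguous."""
    seen: set[str] = set()
    out: list[tuple[str, str]] = []
    for spec in specs:
        label, path = parse_variant(spec)
        if label in seen:
            raise ValueError(f"duplicate --variant label {label!r}")
        seen.add(label)
        out.append((label, path))
    return out
```

Inside a click command, the natural place to call this is at the top of the command body, raising `click.BadParameter` instead of `ValueError`.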
@@ -226,17 +301,19 @@ def main(original: str, variants: tuple[str, ...], ocr_langs: str, no_faces: boo
     # ── text ──
     ocr_cer: dict[str, float | None] = {label: None for label, _ in parsed}
     if langs:
-        console.print(f" OCR ({','.join(langs)})...")
-        from rapidfuzz.distance import Levenshtein
-
-        readers = _build_ocr_readers(langs)
-        ref_text = _ocr_string(readers, ref)
-        if ref_text:
-            for label, img in parsed:
-                hyp = _ocr_string(readers, img)
-                ocr_cer[label] = Levenshtein.normalized_distance(ref_text, hyp)
+        ref_text: str | None = None
+        if ground_truth:
+            gt = json.loads(Path(ground_truth).read_text(encoding="utf-8"))
+            ref_text = gt.get(Path(original).name)
+            if ref_text is None:
+                console.print(f" (no ground-truth entry for {Path(original).name}; skipping text)")
         else:
-            console.print(" (no text detected in the original; skipping text metric)")
+            console.print(f" OCR original ({','.join(langs)})...")
+            ref_text = "\n".join(_ocr_lines(ref, langs))
+        if ref_text:
+            console.print(f" OCR variants ({','.join(langs)})...")
+            for label, img in parsed:
+                ocr_cer[label] = _cer(ref_text, "\n".join(_ocr_lines(img, langs)))

     # ── faces ──
     face_stats: dict[str, FaceStats] = {label: FaceStats() for label, _ in parsed}
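The new text score is rapidfuzz's normalized Levenshtein distance over `_norm`-ed strings. Below is a dependency-free sketch of the same metric. It assumes rapidfuzz's default normalization: edit distance divided by the longer string's length, with 0.0 when both strings are empty. `norm`, `levenshtein` and `cer` are hypothetical stand-ins for `_norm` / `_cer`.

```python
import unicodedata


def norm(text: str) -> str:
    """NFC-normalize, then drop ALL whitespace (spaces, tabs, newlines)."""
    return "".join(unicodedata.normalize("NFC", text).split())


def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming edit distance (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution / match
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate in [0, 1] after normalization."""
    r, h = norm(ref), norm(hyp)
    longest = max(len(r), len(h))
    return levenshtein(r, h) / longest if longest else 0.0
```

Stripping whitespace makes the split between words and lines free: `cer("Привет мир", "Привет\nмир")` is 0.0, and NFC makes a precomposed `é` equal to `e` plus a combining accent. Line order is not forgiven, however. `cer("AB\nCD", "CD\nAB")` is 1.0, which is why OCR-vs-OCR scoring inflates CER whenever the two reads disagree on reading order, and why a verified ground truth is the cleaner reference.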
@@ -300,4 +377,4 @@ def main(original: str, variants: tuple[str, ...], ocr_langs: str, no_faces: boo


 if __name__ == "__main__":
-    main()
+    cli()