mirror of
https://github.com/wiltodelta/remove-ai-watermarks.git
synced 2026-07-04 23:47:49 +02:00
docs: neutralize local-pull path reference in doubao research note
Replace the `data/spaces/originals/` path with a generic "local corpus of pristine originals" so the committed public doc carries no reference to the local working-data pull (the data itself is gitignored). The analysis scripts' default paths are left untouched (operational tooling, no content/provenance). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
@@ -10,7 +10,7 @@
|
||||
|
||||
**Conclusion (historical): pure reverse-alpha distilled from content images does NOT work, and the blocker is the WRONG kind of data, not too little of it.**
|
||||
|
||||
The earlier framing ("need ~5-8 PRISTINE same-resolution originals") is obsolete -- `data/spaces/originals/` holds plenty. Curate them with `DoubaoEngine.detect` + an NCC filter against a clean glyph template, keeping only marks at offset ≈ (0,0): that yields e.g. **15 pixel-aligned 2048² marks** (sub-pixel drift, not the ±50 px the old lossy/mixed-res scrapes had), plus 1086x1448 / 1792x2400 clusters. With those, LaMa-clean `O` + weighted-LS (and per-pixel I-on-O regression) for `α` (+ logo colour) was tried end-to-end and **still leaves a persistent ghost outline.**
|
||||
The earlier framing ("need ~5-8 PRISTINE same-resolution originals") is obsolete -- a local corpus of pristine originals holds plenty. Curate them with `DoubaoEngine.detect` + an NCC filter against a clean glyph template, keeping only marks at offset ≈ (0,0): that yields e.g. **15 pixel-aligned 2048² marks** (sub-pixel drift, not the ±50 px the old lossy/mixed-res scrapes had), plus 1086x1448 / 1792x2400 clusters. With those, LaMa-clean `O` + weighted-LS (and per-pixel I-on-O regression) for `α` (+ logo colour) was tried end-to-end and **still leaves a persistent ghost outline.**
|
||||
|
||||
Diagnosed why, empirically (cached stacks, `/tmp/doubao_distill`): (1) the mark is a clean white overlay with **no dark halo** -- over glyph pixels ~54% are brighter than the clean bg, only ~4% darker -- so the white-logo model `I=(1-α)O+α·255` is correct; (2) but content backgrounds are almost never dark *under* the mark (median darkest available bg over glyph pixels = **58/255**; only ~13% of mark pixels are ever observed on a bg < 40), so on bright backgrounds the equation is ill-conditioned and `α` is unidentifiable; (3) LaMa's `O` is a plausible **hallucination**, not the true pre-mark background, which compounds the error, and per-pixel regression on ~15 obs overfits into colour noise.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user