From 78e2ae65ad8f263d6dac7cb5d75c5afc558ab37f Mon Sep 17 00:00:00 2001
From: Victor Kuznetsov <kuznetsov.va@gmail.com>
Date: Mon, 22 Jun 2026 09:41:23 -0700
Subject: [PATCH] docs: neutralize local-pull path reference in doubao research
 note

Replace the `data/spaces/originals/` path with the generic phrase "a local
corpus of pristine originals" so the committed public doc no longer refers to
the local working-data pull (the data itself is gitignored). The analysis
scripts' default paths are left untouched: they are operational tooling and
carry no content or provenance information.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 docs/research-doubao-distillation.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/research-doubao-distillation.md b/docs/research-doubao-distillation.md
index 925072b..1932a63 100644
--- a/docs/research-doubao-distillation.md
+++ b/docs/research-doubao-distillation.md
@@ -10,7 +10,7 @@
 
 **Conclusion (historical): pure reverse-alpha distilled from content images does NOT work, and the blocker is the WRONG kind of data, not too little of it.**
 
-The earlier framing ("need ~5-8 PRISTINE same-resolution originals") is obsolete -- `data/spaces/originals/` holds plenty. Curate them with `DoubaoEngine.detect` + an NCC filter against a clean glyph template, keeping only marks at offset ≈ (0,0): that yields e.g. **15 pixel-aligned 2048² marks** (sub-pixel drift, not the ±50 px the old lossy/mixed-res scrapes had), plus 1086x1448 / 1792x2400 clusters. With those, LaMa-clean `O` + weighted-LS (and per-pixel I-on-O regression) for `α` (+ logo colour) was tried end-to-end and **still leaves a persistent ghost outline.**
+The earlier framing ("need ~5-8 PRISTINE same-resolution originals") is obsolete -- a local corpus of pristine originals holds plenty. Curate them with `DoubaoEngine.detect` + an NCC filter against a clean glyph template, keeping only marks at offset ≈ (0,0): that yields e.g. **15 pixel-aligned 2048² marks** (sub-pixel drift, not the ±50 px the old lossy/mixed-res scrapes had), plus 1086x1448 / 1792x2400 clusters. With those, LaMa-clean `O` + weighted-LS (and per-pixel I-on-O regression) for `α` (+ logo colour) was tried end-to-end and **still leaves a persistent ghost outline.**
 
 Diagnosed why, empirically (cached stacks, `/tmp/doubao_distill`): (1) the mark is a clean white overlay with **no dark halo** -- over glyph pixels ~54% are brighter than the clean bg, only ~4% darker -- so the white-logo model `I=(1-α)O+α·255` is correct; (2) but content backgrounds are almost never dark *under* the mark (median darkest available bg over glyph pixels = **58/255**; only ~13% of mark pixels are ever observed on a bg < 40), so on bright backgrounds the equation is ill-conditioned and `α` is unidentifiable; (3) LaMa's `O` is a plausible **hallucination**, not the true pre-mark background, which compounds the error, and per-pixel regression on ~15 obs overfits into colour noise.