fix(visible): guard extract_mask against degenerate ROIs (Windows CI crash)

The always-align removal scores each placement with a residual detect(),
which on an extremely wide/short image (2048x1, test_wide_short_does_not_raise)
fed cv2 GaussianBlur a ~1-px-tall ROI and faulted natively on Windows py3.12
(access violation, non-deterministic -- one CI cell went red, a re-run passed).
The old at-native path never ran detect() on degenerate sizes. Skip the cv2
pipeline and return an empty mask when bh < 16 or bw < 16; real images always
clear the guard (the WM_* box floors are max(16,..) / max(40,..)). Same fix in
both Doubao and Jimeng. Also sync the stale Doubao module docstring.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
Victor Kuznetsov
2026-05-31 13:25:57 -07:00
parent 196e8ea35f
commit 7167d2bae7
3 changed files with 32 additions and 13 deletions
+23 -12
View File
@@ -4,21 +4,25 @@ Doubao (ByteDance) stamps every generated image with a visible "豆包AI生成"
(Doubao AI generated) text strip in the bottom-right corner -- the explicit AIGC
label mandated by China's TC260 standard, a near-white semi-transparent overlay.
Like the Gemini sparkle, it is a fixed overlay, so it is removed by **exact
reverse-alpha blending** against a captured alpha map (``remove_watermark_reverse_alpha``):
``original = (wm - a*logo)/(1-a)`` -- recovering the true pixels, not an inpaint
guess. The alpha map + logo colour were solved from black+gray Doubao captures
(see data/doubao_capture/ and the reverse-alpha section below) and bundled as
``assets/doubao_alpha.png``.
Like the Gemini sparkle and the Jimeng wordmark, it is a fixed overlay, so removal
starts from **reverse-alpha blending** against a captured alpha map
(``remove_watermark_reverse_alpha``): ``original = (wm - a*logo)/(1-a)``. The alpha
map is rebuilt by ``scripts/visible_alpha_solve.py`` from black/gray Doubao captures
(the careful gray-self solve; logo is pure white) and bundled as
``assets/doubao_alpha.png``. The mark re-rasterizes a few px off per image, so
removal ALWAYS NCC-aligns the template to the actual mark and then clears the
residual edges with a deliberately THIN inpaint over the glyph footprint (an
earlier under-estimated alpha + fixed-no-inpaint left a readable outline that the
detector did not flag -- see the reverse-alpha section below).
Detection (``detect``) is reverse-alpha-consistent: it matches that same alpha
glyph silhouette against the corner via normalized correlation, so it keys on
the actual "豆包AI生成" shape rather than coverage/structure heuristics.
Detection (``detect``) is shape-consistent: it matches that same alpha glyph
silhouette against the corner via normalized correlation, so it keys on the actual
"豆包AI生成" shape rather than coverage/structure heuristics.
``locate`` (geometry box, scales with image WIDTH) and ``extract_mask`` (the
candidate glyph mask the detector correlates) remain; there is no inpaint-based
removal here -- arbitrary-region inpainting lives in ``region_eraser`` / the
``erase`` command. Fast, offline, no GPU.
candidate glyph mask the detector correlates) mirror the Jimeng engine.
Arbitrary-region inpainting still lives in ``region_eraser`` / the ``erase``
command. Fast, offline, no GPU.
"""
# cv2/numpy boundary: third-party libs ship no usable element types; relax the
@@ -227,6 +231,13 @@ class DoubaoEngine:
"""
h, w = image.shape[:2]
x, y, bw, bh = loc.bbox
# A degenerate ROI (a sliver from an extremely wide/short image) cannot hold
# the mark and would feed cv2's GaussianBlur/morphology a ~1-px-tall array,
# which can fault the native code on some platforms (observed: a Windows
# access violation via the always-align removal's residual `detect`). Skip
# the cv2 pipeline and return an empty mask there.
if bh < 16 or bw < 16:
return np.zeros((h, w), np.uint8)
# Normalize the ROI to 3-channel BGR: a 2D grayscale or 4-channel BGRA
# input would otherwise break the axis=2 channel reductions below.
roi = image[y : y + bh, x : x + bw]
@@ -238,6 +238,13 @@ class JimengEngine:
"""
h, w = image.shape[:2]
x, y, bw, bh = loc.bbox
# A degenerate ROI (a sliver from an extremely wide/short image) cannot hold
# the mark and would feed cv2's GaussianBlur/morphology a ~1-px-tall array,
# which can fault the native code on some platforms (observed: a Windows
# access violation via the always-align removal's residual `detect`). Skip
# the cv2 pipeline and return an empty mask there.
if bh < 16 or bw < 16:
return np.zeros((h, w), np.uint8)
# Normalize the ROI to 3-channel BGR: a 2D grayscale or 4-channel BGRA
# input would otherwise break the axis=2 channel reductions below.
roi = image[y : y + bh, x : x + bw]