fix(visible): guard extract_mask against degenerate ROIs (Windows CI crash)

The always-align removal scores each placement with a residual detect(), which on an extremely wide/short image (2048x1, test_wide_short_does_not_raise) fed cv2 GaussianBlur a ~1-px-tall ROI and faulted natively on Windows py3.12 (access violation, non-deterministic -- one CI cell went red, a re-run passed). The old at-native path never ran detect() on degenerate sizes. Skip the cv2 pipeline and return an empty mask when bh < 16 or bw < 16; real images always clear the guard (the WM_* box floors are max(16,..) / max(40,..)). Same fix in both Doubao and Jimeng. Also sync the stale Doubao module docstring. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-23 08:00:51 +02:00 · 2026-05-31 13:25:57 -07:00
parent 196e8ea35f
commit 7167d2bae7
3 changed files with 32 additions and 13 deletions
@@ -4,21 +4,25 @@ Doubao (ByteDance) stamps every generated image with a visible "豆包AI生成"
 (Doubao AI generated) text strip in the bottom-right corner -- the explicit AIGC
 label mandated by China's TC260 standard, a near-white semi-transparent overlay.

-Like the Gemini sparkle, it is a fixed overlay, so it is removed by **exact
-reverse-alpha blending** against a captured alpha map (``remove_watermark_reverse_alpha``):
-``original = (wm - a*logo)/(1-a)`` -- recovering the true pixels, not an inpaint
-guess. The alpha map + logo colour were solved from black+gray Doubao captures
-(see data/doubao_capture/ and the reverse-alpha section below) and bundled as
-``assets/doubao_alpha.png``.
+Like the Gemini sparkle and the Jimeng wordmark, it is a fixed overlay, so removal
+starts from **reverse-alpha blending** against a captured alpha map
+(``remove_watermark_reverse_alpha``): ``original = (wm - a*logo)/(1-a)``. The alpha
+map is rebuilt by ``scripts/visible_alpha_solve.py`` from black/gray Doubao captures
+(the careful gray-self solve; logo is pure white) and bundled as
+``assets/doubao_alpha.png``. The mark re-rasterizes a few px off per image, so
+removal ALWAYS NCC-aligns the template to the actual mark and then clears the
+residual edges with a deliberately THIN inpaint over the glyph footprint (an
+earlier under-estimated alpha + fixed-no-inpaint left a readable outline that the
+detector did not flag -- see the reverse-alpha section below).

-Detection (``detect``) is reverse-alpha-consistent: it matches that same alpha
-glyph silhouette against the corner via normalized correlation, so it keys on
-the actual "豆包AI生成" shape rather than coverage/structure heuristics.
+Detection (``detect``) is shape-consistent: it matches that same alpha glyph
+silhouette against the corner via normalized correlation, so it keys on the actual
+"豆包AI生成" shape rather than coverage/structure heuristics.

 ``locate`` (geometry box, scales with image WIDTH) and ``extract_mask`` (the
-candidate glyph mask the detector correlates) remain; there is no inpaint-based
-removal here -- arbitrary-region inpainting lives in ``region_eraser`` / the
-``erase`` command. Fast, offline, no GPU.
+candidate glyph mask the detector correlates) mirror the Jimeng engine.
+Arbitrary-region inpainting still lives in ``region_eraser`` / the ``erase``
+command. Fast, offline, no GPU.
 """

 # cv2/numpy boundary: third-party libs ship no usable element types; relax the
@@ -227,6 +231,13 @@ class DoubaoEngine:
        """
        h, w = image.shape[:2]
        x, y, bw, bh = loc.bbox
+        # A degenerate ROI (a sliver from an extremely wide/short image) cannot hold
+        # the mark and would feed cv2's GaussianBlur/morphology a ~1-px-tall array,
+        # which can fault the native code on some platforms (observed: a Windows
+        # access violation via the always-align removal's residual `detect`). Skip
+        # the cv2 pipeline and return an empty mask there.
+        if bh < 16 or bw < 16:
+            return np.zeros((h, w), np.uint8)
        # Normalize the ROI to 3-channel BGR: a 2D grayscale or 4-channel BGRA
        # input would otherwise break the axis=2 channel reductions below.
        roi = image[y : y + bh, x : x + bw]
@@ -238,6 +238,13 @@ class JimengEngine:
        """
        h, w = image.shape[:2]
        x, y, bw, bh = loc.bbox
+        # A degenerate ROI (a sliver from an extremely wide/short image) cannot hold
+        # the mark and would feed cv2's GaussianBlur/morphology a ~1-px-tall array,
+        # which can fault the native code on some platforms (observed: a Windows
+        # access violation via the always-align removal's residual `detect`). Skip
+        # the cv2 pipeline and return an empty mask there.
+        if bh < 16 or bw < 16:
+            return np.zeros((h, w), np.uint8)
        # Normalize the ROI to 3-channel BGR: a 2D grayscale or 4-channel BGRA
        # input would otherwise break the axis=2 channel reductions below.
        roi = image[y : y + bh, x : x + bw]