Fix #29 black output: use fp16-fixed SDXL VAE on fp16 GPUs

The stock SDXL VAE overflows to NaN in fp16, so the plain img2img path decodes to an all-black image on a CUDA/XPU fp16 backend. This is the black raiw.cc result HitaoLin reported (a 1086x1448 input came back uniformly black).

cpu/mps run in fp32 and never hit it. The differential / region-hires pipeline already upcasts the VAE itself. Only the plain path on an fp16 GPU was exposed.

`_load_pipeline` now loads `madebyollin/sdxl-vae-fp16-fix` for the default SDXL checkpoint when running in fp16, gated by the pure helper `_needs_fp16_vae_fix`. Any custom model, SDXL or not, keeps its own VAE.

The decision logic is unit-tested without a download (TestFp16VaeFix). The black-to-clean recovery itself needs a CUDA GPU and could not be verified on this MPS machine. It must be confirmed on the backend.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
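The following is a minimal NumPy sketch of the failure mode described above. It is not code from this repository. `fp16_layer`, `normalize`, and `has_nonfinite` are hypothetical toy stand-ins, not SDXL VAE internals. The sketch shows how an activation that fits comfortably in fp32 overflows past the fp16 maximum (65504) to inf, and how a later mean-centering step turns that inf into NaN.

```python
import numpy as np

# Largest finite float16 value; larger results overflow to inf.
FP16_MAX = float(np.finfo(np.float16).max)  # 65504.0


def fp16_layer(x: np.ndarray, gain: float) -> np.ndarray:
    """Hypothetical toy stand-in for one decoder layer computed in fp16.

    Casts the input to float16 and scales it. Any product beyond FP16_MAX
    becomes inf instead of raising an error.
    """
    with np.errstate(over="ignore"):
        return x.astype(np.float16) * np.float16(gain)


def normalize(h: np.ndarray) -> np.ndarray:
    """Hypothetical mean-centering step, similar to a normalization layer.

    If h contains inf, its mean is inf too, and inf - inf yields NaN.
    """
    with np.errstate(over="ignore", invalid="ignore"):
        return h - h.mean()


def has_nonfinite(a: np.ndarray) -> bool:
    """Return True if any element is inf or NaN."""
    return not bool(np.isfinite(a).all())


activations = np.array([0.5, 1200.0, 30000.0], dtype=np.float32)

# fp32 path (what cpu/mps use here): 30000 * 4 = 120000 stays finite.
print(has_nonfinite(normalize(activations * np.float32(4.0))))  # False

# fp16 path: 120000 > FP16_MAX, so it becomes inf. Centering then turns it into NaN.
out16 = normalize(fp16_layer(activations, 4.0))
print(has_nonfinite(out16), bool(np.isnan(out16).any()))  # True True
```

The sketch stops at the NaN. How NaN values end up as a uniformly black image depends on the pipeline's post-processing. It also shows why both mitigations in the message work. Upcasting keeps the activation in fp32 range. The fp16-fix VAE is rescaled so that its activations stay within fp16 range, as the comment in the diff says.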
2026-07-28 10:18:49 +02:00 · 2026-05-30 14:31:51 -07:00
parent 9be66752c5
commit d88b87ca4e
3 changed files with 52 additions and 0 deletions
@@ -61,6 +61,24 @@ def is_watermark_removal_available() -> bool:
    return _HAS_TORCH and _HAS_DIFFUSERS


+# Drop-in fp16-safe replacement for the SDXL VAE. The stock SDXL VAE overflows
+# to NaN in fp16 and decodes to an all-black image (issue #29: the raiw.cc black
+# result on a CUDA fp16 backend). This community VAE is numerically rescaled to
+# stay in fp16 range. SDXL-architecture only.
+_SDXL_FP16_VAE_ID = "madebyollin/sdxl-vae-fp16-fix"
+
+
+def _needs_fp16_vae_fix(model_id: str, default_model_id: str, is_fp16: bool) -> bool:
+    """Whether the plain img2img pipeline must swap in the fp16-fixed SDXL VAE.
+
+    Gated to the default SDXL checkpoint running in fp16: cpu/mps run fp32 (the
+    stock VAE is fine there) and the differential pipeline upcasts the VAE on its
+    own, so only this path on an fp16 GPU (CUDA/XPU) hits the NaN/black decode.
+    Any custom ``model_id`` keeps its own VAE, since the fix is SDXL-specific.
+    """
+    return is_fp16 and model_id == default_model_id
+
+
 _CUDA_FIX_ENV_KEY = "NOAI_CUDA_FIXED"


@@ -370,6 +388,14 @@ class WatermarkRemover:
            if self.hf_token:
                load_kwargs["token"] = self.hf_token

+            # Avoid the SDXL fp16 NaN/all-black decode (issue #29) by loading the
+            # fp16-fixed VAE for the default SDXL checkpoint on an fp16 GPU.
+            if _needs_fp16_vae_fix(self.model_id, self.DEFAULT_MODEL_ID, self.torch_dtype == torch.float16):
+                from diffusers import AutoencoderKL
+
+                self._set_progress("Loading fp16-fixed SDXL VAE (avoids black output)...")
+                load_kwargs["vae"] = AutoencoderKL.from_pretrained(_SDXL_FP16_VAE_ID, torch_dtype=torch.float16)
+
            self._pipeline = AutoImg2ImgPipeline.from_pretrained(  # type: ignore
                self.model_id,
                **load_kwargs,