mirror of
https://github.com/wiltodelta/remove-ai-watermarks.git
synced 2026-07-04 23:47:49 +02:00
675590e8b2
InvisibleEngine loads SDXL/ControlNet in fp16 on CUDA/XPU, but it called from_pretrained without variant="fp16", so it read the full fp32 weight files (~7 GB) and downcast them in memory.

_load_from_pretrained now passes variant="fp16" when torch_dtype is float16, so it reads the half-precision files (~3.5 GB) instead. This roughly halves the cold-start weight read and host->device transfer; in a phase-timed Modal run, weight loading accounted for about half of the ~25 s cold start.

When a checkpoint ships no fp16 variant (e.g. a custom --model), loading falls back to the default weights, so the worst case matches the prior behavior. fp32 (cpu/mps) and bf16 (qwen) never request the variant.

Tests: TestFp16WeightVariant (variant requested on fp16, fallback when the variant is missing, never requested on fp32).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
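The load-with-fallback logic described above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual _load_from_pretrained: the helper name load_from_pretrained, the RecordingLoader stand-in, and the assumption that a missing fp16 variant surfaces as OSError or ValueError are all hypothetical. The only real diffusers usage assumed is passing torch_dtype= and variant="fp16" to from_pretrained.

```python
"""Hypothetical sketch of an fp16-variant-aware loader (not the repo's code)."""
import logging

try:
    import torch

    FLOAT16 = torch.float16
except ImportError:  # keeps the sketch runnable without torch installed
    FLOAT16 = "float16"  # hypothetical stand-in for torch.float16

log = logging.getLogger(__name__)

# Assumption: a checkpoint without *.fp16.* weight files raises one of these.
MISSING_VARIANT_ERRORS = (OSError, ValueError)


def load_from_pretrained(loader, model_id, torch_dtype, **kwargs):
    """Load model_id via loader.from_pretrained, preferring fp16 weight files.

    Only a float16 dtype requests variant="fp16"; float32 and bfloat16 load
    the default weight files, as before.
    """
    if torch_dtype == FLOAT16:
        try:
            return loader.from_pretrained(
                model_id, torch_dtype=torch_dtype, variant="fp16", **kwargs
            )
        except MISSING_VARIANT_ERRORS as exc:
            # e.g. a custom --model with only default weights: fall back to the
            # previous behavior (read full-precision files, downcast in memory).
            log.info("no fp16 variant for %s (%s); using default weights", model_id, exc)
    return loader.from_pretrained(model_id, torch_dtype=torch_dtype, **kwargs)


class RecordingLoader:
    """Hypothetical stand-in for a diffusers class; records requested variants."""

    def __init__(self, has_fp16_variant):
        self.has_fp16_variant = has_fp16_variant
        self.variants_requested = []

    def from_pretrained(self, model_id, torch_dtype=None, variant=None, **kwargs):
        self.variants_requested.append(variant)
        if variant == "fp16" and not self.has_fp16_variant:
            raise OSError(f"{model_id} has no fp16 weight files")
        return {"model_id": model_id, "dtype": torch_dtype, "variant": variant}
```

Gating on the dtype before requesting the variant leaves the fp32 and bf16 paths untouched, and for checkpoints without fp16 files the fallback costs one extra failed lookup. With RecordingLoader, the three cases listed under Tests can be exercised: a float16 load records ["fp16"], a float16 load of a checkpoint lacking the variant records ["fp16", None], and a float32 load records [None].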