remove-ai-watermarks/tests
Victor Kuznetsov 675590e8b2 perf(invisible): read the fp16 weight variant to halve the cold-start weight load
InvisibleEngine loads SDXL/ControlNet in fp16 on CUDA/XPU but called from_pretrained
without variant="fp16", so it read the full fp32 weight files (~7 GB) and downcast in
memory. _load_from_pretrained now passes variant="fp16" when torch_dtype is float16,
reading the half-precision files (~3.5 GB) instead - roughly halving the cold-start
weight read + host->device transfer (a phase-timed Modal run measured weight load as
~half of the ~25s cold start). Falls back to the default weights when a checkpoint ships
no fp16 variant (for example, a custom --model), so the worst case is the prior behavior. fp32
(cpu/mps) and bf16 (qwen) never request the variant.
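
The request-then-fall-back behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this commit: `load_with_fp16_variant`, `is_fp16`, and `MISSING_VARIANT_ERRORS` are hypothetical names, the loader is passed in as a callable so the sketch runs without diffusers or torch installed, and the exception types that signal a missing variant are assumed.

```python
from typing import Any, Callable

# Assumption: a checkpoint without fp16 files surfaces as one of these
# exception types. The real loader may raise something different.
MISSING_VARIANT_ERRORS = (OSError, ValueError)


def is_fp16(torch_dtype: Any) -> bool:
    """Return True only for float16 (not bfloat16, not float32).

    Compares the dtype's name so the sketch does not need torch imported;
    str(torch.float16) is "torch.float16".
    """
    return str(torch_dtype).rsplit(".", 1)[-1] == "float16"


def load_with_fp16_variant(
    from_pretrained: Callable[..., Any],  # e.g. a pipeline class's from_pretrained
    model_id: str,
    torch_dtype: Any,
    **kwargs: Any,
) -> Any:
    """Hypothetical helper: request variant="fp16" only when loading in fp16."""
    if not is_fp16(torch_dtype):
        # fp32 and bf16 loads never request the variant.
        return from_pretrained(model_id, torch_dtype=torch_dtype, **kwargs)
    try:
        # Read the half-precision weight files directly.
        return from_pretrained(
            model_id, torch_dtype=torch_dtype, variant="fp16", **kwargs
        )
    except MISSING_VARIANT_ERRORS:
        # No fp16 files shipped: fall back to the default weights, which
        # matches the behavior before this change.
        return from_pretrained(model_id, torch_dtype=torch_dtype, **kwargs)
```

Passing the loader in as an argument keeps the fallback logic testable with a stub, which is one plausible way to cover the three cases listed under Tests without downloading any weights.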

Tests: TestFp16WeightVariant (variant requested on fp16, fallback on missing, never on
fp32).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 14:42:47 -07:00