hacksider-Deep-Live-Cam/modules/processors
Max Buckley cbf0859347 Paste-back blend: uint8 cv2 SIMD, no float32 round-trip
Both face_swapper._fast_paste_back and face_enhancer._paste_back were
doing a numpy float32 round-trip per frame: convert the target crop and
the warped face to float32, blend, clip, cast back to uint8. That's four
crop-sized allocations plus unfused elementwise math (one full pass over
the crop per operation).

Replace with a fused uint8 blend using cv2.merge + cv2.multiply + cv2.add,
which cv2 dispatches to SIMD (NEON on Apple Silicon / AVX on x86). Stored
alpha templates switched from float32 [0, 1] to uint8 [0, 255] so no
conversion is needed per frame. CUDA paths are also simplified: they
upload uint8 alpha (less bandwidth) and scale on device.

Micro-bench on 1000x1000 RGB crop:
  current (float32 numpy): 9.43 ms
  cv2 uint8 fused:         1.16 ms  (8.1× faster, max diff 2/255)

Visual diff is imperceptible (quantization noise in the last step).
2026-04-22 12:05:39 +02:00