hacksider-Deep-Live-Cam/modules
Max Buckley cbf0859347 Paste-back blend: uint8 cv2 SIMD, no float32 round-trip
Both face_swapper._fast_paste_back and face_enhancer._paste_back were
doing a numpy float32 round-trip per frame: convert the target crop and
the warped face to float32, blend, clip, cast back to uint8. That's four
crop-sized allocations plus unvectorized elementwise math.
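For reference, a minimal sketch of the kind of float32 round-trip described above. The function name paste_back_float32, the HxW alpha shape, and the truncating cast at the end are assumptions for illustration, not the repo's actual code:

```python
import numpy as np


def paste_back_float32(target_crop: np.ndarray, warped_face: np.ndarray,
                       alpha: np.ndarray) -> np.ndarray:
    """Hypothetical reconstruction of the old per-frame blend.

    target_crop, warped_face: HxWx3 uint8; alpha: HxW float32 in [0, 1],
    where 1.0 means "take the swapped face".
    """
    t = target_crop.astype(np.float32)   # crop-sized allocation: target as float32
    f = warped_face.astype(np.float32)   # crop-sized allocation: face as float32
    a = alpha[..., None]                 # HxWx1 view, broadcasts over channels
    blended = f * a + t * (1.0 - a)      # several crop-sized temporaries
    # Clip, then cast back to uint8 (astype truncates toward zero).
    return np.clip(blended, 0, 255).astype(np.uint8)
```

Every call converts both crops to float32 and materializes intermediates before casting back, which is the per-frame cost the commit removes.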

Replace with a fused uint8 blend using cv2.merge + cv2.multiply + cv2.add,
which cv2 dispatches to SIMD (NEON on Apple Silicon / AVX on x86). Stored
alpha templates switched from float32 [0, 1] to uint8 [0, 255] so no
conversion is needed per frame. CUDA paths are also simplified: they upload
the uint8 alpha (less bandwidth) and scale it on the device.

Micro-bench on 1000x1000 RGB crop:
  current (float32 numpy): 9.43 ms
  cv2 uint8 fused:         1.16 ms  (8.1× faster, max diff 2/255)

Visual diff is imperceptible (quantization noise from the final rounding step).
2026-04-22 12:05:39 +02:00