hacksider-Deep-Live-Cam

mirror of https://github.com/hacksider/Deep-Live-Cam.git synced 2026-05-31 01:39:29 +02:00

Files

T

Max Buckley cbf0859347 Paste-back blend: uint8 cv2 SIMD, no float32 round-trip

Both face_swapper._fast_paste_back and face_enhancer._paste_back were
doing a numpy float32 round-trip per frame: convert the target crop and
the warped face to float32, blend, clip, cast back to uint8. That's four
crop-sized allocations plus unvectorized elementwise math.

Replace with a fused uint8 blend using cv2.merge + cv2.multiply + cv2.add,
which cv2 dispatches to SIMD (NEON on Apple Silicon / AVX on x86). Stored
alpha templates switched from float32 [0, 1] to uint8 [0, 255] so no
conversion is needed per frame. CUDA paths also simplified — upload uint8
alpha (less bandwidth) and scale on device.

Micro-bench on 1000x1000 RGB crop:
  current (float32 numpy): 9.43 ms
  cv2 uint8 fused:         1.16 ms  (8.1× faster, max diff 2/255)

Visual diff is imperceptible (quantization noise in the last step).

2026-04-22 12:05:39 +02:00

frame

Paste-back blend: uint8 cv2 SIMD, no float32 round-trip

2026-04-22 12:05:39 +02:00

__init__.py

initial commit

2023-09-24 21:36:57 +08:00