hacksider-Deep-Live-Cam

mirror of https://github.com/hacksider/Deep-Live-Cam.git synced 2026-05-14 02:42:09 +02:00

Author	SHA1	Message	Date
zuyua9	d1376b07d1	fix(face): avoid hiding invalid face inputs	2026-05-08 01:50:25 +08:00
zuyua9	5deadaf428	fix(face): reuse pre-detected face list	2026-05-08 01:35:55 +08:00
Ihor Kuzmychov	297acded3b	fix: use BORDER_REPLICATE for face warp to eliminate black border	2026-04-23 19:42:32 +02:00
Max Buckley	890a6d41b6	onnx_optimize: widen scalar Gather indices for CoreML EP. ORT's CoreML EP GatherOpBuilder::IsOpSupportedImpl explicitly rejects rank-0 (scalar) index tensors. StyleGAN-derived models (GFPGAN's 1024 variant has 16 such Gathers, one per style-code slice) hit this in the generator, and the resulting CPU fallbacks split the CoreML subgraph into multiple partitions with boundary crossings on every inference. Add a load-time ONNX rewrite that promotes each scalar index to [1] and squeezes the added axis on the Gather output; the result is semantically identical but CoreML-compatible. GFPGAN now runs as a single CoreML partition with zero CPU-fallback nodes; inference drops from ~87 ms to ~81 ms on an M-series Mac. The fix has been filed upstream as microsoft/onnxruntime#28180; the existing code comment in gather_op_builder.cc already describes this exact workaround, but it is not applied. Once the upstream fix ships and the ORT floor is raised, this pass can be deleted.	2026-04-22 14:08:18 +02:00
Max Buckley	f95a0bb7fb	Make square aligned-face assumption explicit in _fast_paste_back. Addresses Sourcery feedback on PR #1776: _get_soft_alpha caches a single NxN template keyed by N, which is correct for the inswapper model (128x128 aligned-face space) but would silently mis-warp if a caller ever passed a non-square aligned face. Assert the shape instead of silently assuming it.	2026-04-22 13:40:18 +02:00
Max Buckley	e957a7f4dd	Move BGR→RGB after resize in preview display path. The processing thread was running cvtColor on the full-resolution 1920×1080 frame before queueing it for display. Since the display thread immediately resizes the frame to the preview window (~5× smaller pixel count), doing the colour conversion on the resized buffer is cheaper overall. The processing thread now queues BGR; the display thread resizes, then calls cvtColor.	2026-04-22 13:31:11 +02:00
Max Buckley	cbf0859347	Paste-back blend: uint8 cv2 SIMD, no float32 round-trip. Both face_swapper._fast_paste_back and face_enhancer._paste_back were doing a numpy float32 round-trip per frame: convert the target crop and the warped face to float32, blend, clip, cast back to uint8. That's four crop-sized allocations plus unvectorized elementwise math. Replace with a fused uint8 blend using cv2.merge + cv2.multiply + cv2.add, which cv2 dispatches to SIMD (NEON on Apple Silicon / AVX on x86). Stored alpha templates switched from float32 [0, 1] to uint8 [0, 255] so no conversion is needed per frame. CUDA paths are also simplified: upload the uint8 alpha (less bandwidth) and scale on device. Micro-bench on a 1000x1000 RGB crop: current (float32 numpy) 9.43 ms; cv2 uint8 fused 1.16 ms (8.1× faster, max diff 2/255). The visual diff is imperceptible (quantization noise in the last step).	2026-04-22 12:05:39 +02:00
Max Buckley	a6c99607fc	Cut paste-back from quartic to linear in face size. _fast_paste_back used to erode and Gaussian-blur the warped alpha mask in output coordinates with kernel sizes proportional to the on-screen face bbox. That made the per-frame cost ~O(area * k^2): a face filling half the frame took ~8x the compositing work of one filling a quarter, which is why FPS fell off when leaning into the camera. Instead, build a feathered alpha template once at aligned-face resolution (128x128 for inswapper) and warp the soft mask per-frame. The affine transform preserves the relative feather width, so the visual output is equivalent; the per-frame cost is now O(crop_area) with no size-scaled erode/blur and no size-scaled padding. This also collapses the CPU fallback onto the same shape; it previously did a full-frame warpAffine twice per call, which scaled with the whole frame instead of the face crop.	2026-04-22 11:58:02 +02:00
Max Buckley	0a87d63560	Address PR #1775 review: pipelined-detection race and CUDA-graph monkey-patch - core._run_pipe_pipeline: hand the background detector its own copy of the frame. The frame processors mutate in place via paste-back, which was racing with concurrent face detection on the same buffer. - face_swapper._init_cuda_graph_session: replace the `swapper.session.run` monkey-patch with a `_CudaGraphSessionAdapter` that proxies every attribute to the underlying session and only overrides `.run()`. Guarded so repeat init does not double-wrap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 11:45:59 +02:00
Max Buckley	4d04e830bc	Fix CUDA-graph replay race + many_faces enhancer regression. Two issues surfaced in post-squash review of `f65aeae`: 1. CUDA-graph replay buffers were shared across threads with no lock. `_cuda_graph_swap_inference` mutates module-level ort_input/ort_latent and runs run_with_iobinding, so concurrent swap calls on Windows/CUDA could overwrite each other's bound input buffers before replay, producing wrong-face output. Added `_cuda_graph_lock` around the full update/run/read sequence. 2. Face enhancer loop unconditionally broke after the first face, so `many_faces=True` silently enhanced only one face. Also, the single-slot temporal cache would paste the same enhancement onto every target if reused in many-faces mode. Gated the break on `not many_faces_mode` and disabled the cache path in that mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 11:08:23 +02:00
Max Buckley	f65aeae5db	Apple Silicon + Windows CUDA perf: 60 FPS pipeline, cross-platform routing. Bundles CoreML graph rewrites, GPU-accelerated pipeline work, Windows CUDA fixes, and Mac/Windows runtime routing into a single drop. CoreML (Apple Silicon): - Decompose Pad(reflect) → Slice+Concat in inswapper_128 so the model runs in one CoreML partition instead of 14 (TEMPORARY: fixed upstream in microsoft/onnxruntime#28073, drop when ORT >= 1.26.0). - Fold Shape/Gather chains to constants in det_10g (21ms → 4ms). - Decompose Split(axis=1) → Slice pairs in GFPGAN (155ms → 89ms). - Route detection model to GPU so the ANE is free for the swap model. - Centralize provider/config selection in create_onnx_session. Pipeline (all platforms): - Parallelize face landmark + recognition post-detection; skip landmark_2d_106 when only face_swapper is active. - Pipeline face detection with swap for ANE overlap. - GPU-accelerated paste_back, MJPEG capture, zero-copy display path. - Standalone pipeline benchmark script. Windows / CUDA: - CUDA graphs + FP16 model + all-GPU pipeline for 1080p 60 FPS. - Auto-detect GPU provider and fix DLL discovery for Windows CUDA execution. Cross-platform: - platform_info helper for Mac/Windows runtime routing. - GFPGAN 30 fps + MSMF camera 60 fps with adaptive pipeline tuning. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 10:44:59 +02:00
Kenneth Estanislao	fceafcb234	Merge pull request #1751 from Gujiassh/fix/face-mask-none-frame-guard fix(face-mask): guard create_face_mask against None frame	2026-04-15 14:13:18 +08:00
gujishh	fbcea9e135	fix(face-mask): guard create_face_mask against None frame	2026-04-12 14:19:48 +09:00
Max Buckley	646b0f816f	Move hot-path imports to module scope. Address Sourcery review feedback: move face_align and get_one_face imports from inside per-frame functions to module-level to avoid repeated attribute lookup overhead in the processing loop. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 14:34:53 +02:00
Max Buckley	bcdd0ce2dd	Apple Silicon performance: 1.5 → 10+ FPS (zero quality loss). Fix CoreML execution provider falling back to CPU silently, eliminate redundant per-frame face detection, and optimize the paste-back blend to operate on the face bounding box instead of the full frame. All changes are quality-neutral (pixel-identical output verified) and benefit non-Mac platforms via the shared detection and paste-back improvements. Changes: - Remove unsupported CoreML options (RequireStaticShapes, MaximumCacheSize) that caused ORT 1.24 to silently fall back to CPUExecutionProvider - Add _fast_paste_back(): bbox-restricted erode/blur/blend, skip dead fake_diff code in insightface's inswapper (computed but never used) - process_frame() accepts optional pre-detected target_face to avoid redundant get_one_face() call (~30-40ms saved per frame, all platforms) - In-memory pipeline detects face once and shares across processors - Fix get_face_swapper() to fall back to FP16 model when FP32 absent - Fix pre_start() to accept either model variant (was FP16-only check) - Make tensorflow import conditional (fixes crash on macOS) - Add missing tqdm dep, make tensorflow/pygrabber platform-conditional Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 14:28:07 +02:00
Kenneth Estanislao	8703d394d6	ONNX CUDA exhaustive convolution search + IO binding	2026-04-09 16:34:27 +08:00
Kenneth Estanislao	69e3fc5611	Rendering optimization. The PNG encode/decode alone was consuming significant CPU time per frame; it is now eliminated entirely.	2026-04-09 16:25:22 +08:00
Kenneth Estanislao	2b26d5539e	suppress error message. Some people just want the OpenCV error gone. I keep telling them that OpenCV is only used for blurs and color conversion; it is ONNX Runtime that runs the swap.	2026-04-09 16:04:00 +08:00
Kenneth Estanislao	fea5a4c2d2	Merge pull request #1707 from rohanrathi99/main Switch to FP32 model by default, add run script	2026-04-05 23:19:17 +08:00
Kenneth Estanislao	51fb7a6ad6	Merge pull request #1722 from mvanhorn/osc/1654-face-enhancer-v2 fix(face-enhancer): add missing process_frame_v2 method	2026-04-05 23:16:52 +08:00
yetval	11fb5bfbc6	Fix CUDA VRAM exhaustion during video processing (#1721)	2026-04-02 22:59:41 -04:00
Kenneth Estanislao	1edc4bc298	DML lock fixed for CUDA and CPU	2026-04-01 23:56:01 +08:00
ozp3	ab834d5640	feat: AMD DML optimization - GPU face detection, detection throttle, pre-load fix	2026-04-01 23:56:01 +08:00
Kenneth Estanislao	bb4ef4a133	Apply suggestion from @sourcery-ai[bot] Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>	2026-04-01 23:13:59 +08:00
Karl	a3fd56a312	Fix missing video output reporting and encoding flow	2026-04-01 15:22:09 +08:00
Matt Van Horn	9525d45291	fix(face-enhancer): add missing process_frame_v2 method. The live webcam preview in ui.py calls process_frame_v2() on all frame processors, but face_enhancer.py was missing this method. This caused an AttributeError crash when the GFPGAN face enhancer was enabled during live mode. Fixes https://github.com/hacksider/Deep-Live-Cam/issues/1654 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 23:49:12 -07:00
Kenneth Estanislao	97321a740d	Update face_analyser.py. 320 was over-optimized; put it back to 640.	2026-03-27 21:24:19 +08:00
RohanW11p	9207386e07	Switch to FP32 model by default, add run script. Change the default face swapper model to FP32 for better GPU compatibility and to avoid NaN issues on certain GPUs. Revamped `run.py` to adjust PATH variables for dependency setup, and re-added it with expanded configuration.	2026-03-27 17:29:01 +05:30
Kenneth Estanislao	ee9699ee70	Happy 80k! 2.1 Released! - Face randomizer added!	2026-03-13 22:09:18 +08:00
Kenneth Estanislao	3c8b259a3f	Some edits on the UI: - Grouped the face enhancers - Made the mouth mask just a slider - Removed the redundant switches	2026-03-13 22:03:28 +08:00
Kenneth Estanislao	0d8f3b1f82	Fix for vulnerability report https://github.com/hacksider/Deep-Live-Cam/issues/1695	2026-03-06 23:26:48 +08:00
Lauri Gates	e340b0da8a	feat(ui): add hover tooltips to all controls. Add ToolTip class (modules/ui_tooltip.py) and wire descriptive hover tooltips onto every button, switch, slider, and dropdown in the main window. Tooltips appear after a 500ms hover delay and are clamped to screen bounds. This requires no new dependencies: ToolTip uses only customtkinter. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 21:41:24 +02:00
Kenneth Estanislao	d0f81ed755	Merge pull request #1671 from laurigates/pr/fix-macos-camera-enum fix(macos): replace cv2_enumerate_cameras with safe bounded loop	2026-02-24 14:29:00 +08:00
Kenneth Estanislao	de01b28802	Merge pull request #1678 from laurigates/pr/perf-opacity-handling perf(face-swapper): optimize opacity handling and frame copies	2026-02-24 14:28:17 +08:00
Lauri Gates	b645d5e60b	fix(macos): replace cv2_enumerate_cameras with safe bounded loop. cv2_enumerate_cameras(CAP_AVFOUNDATION) probes indices 0-99 through OpenCV's AVFoundation backend, which intermittently segfaults (exit code 139) when invalid device indices are probed. Replace with a bounded cv2.VideoCapture loop (range(10)) that safely skips unavailable indices. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 17:22:35 +02:00
Kenneth Estanislao	31b3a97003	Merge pull request #1680 from laurigates/pr/perf-float32-buffer-reuse perf(processing): optimize post-processing with float32 and buffer reuse	2026-02-23 15:13:03 +08:00
Lauri Gates	e93fb95903	perf(processing): optimize post-processing with float32 and buffer reuse - Replace float64 with float32 in apply_mouth_area() blending masks; float32 provides sufficient precision for 8-bit image blending and halves memory bandwidth - Use float32 in apply_mask_area() mask computations - Vectorize the hull padding loop in create_face_mask() (face_masking.py), replacing the per-point Python loop with NumPy array operations - Fix apply_color_transfer() to use proper [0,1] LAB conversion: cv2.cvtColor with float32 input expects the [0,1] range, not [0,255] - Pre-compute inverse masks to avoid repeated (1.0 - mask) subtraction - Use np.broadcast_to instead of np.repeat for face mask expansion Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 21:27:31 +02:00
Lauri Gates	aabf41050a	perf(face-swapper): optimize opacity handling and frame copies. Move opacity calculation before frame copy to skip the copy when opacity is 1.0 (common case). Add early return path for full opacity. Clear PREVIOUS_FRAME_RESULT instead of caching when interpolation is disabled. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 21:12:02 +02:00
Lauri Gates	e57116de68	feat: add GPEN-BFR 256 and 512 ONNX face enhancers. Add two new face enhancement processors using GPEN-BFR ONNX models at 256x256 and 512x512 resolutions. Models auto-download on first use from GitHub releases. They integrate into the existing frame processor pipeline alongside the GFPGAN enhancer, with UI toggle switches. - modules/paths.py: Shared path constants module - modules/processors/frame/_onnx_enhancer.py: ONNX enhancement utilities - modules/processors/frame/face_enhancer_gpen256.py: GPEN-BFR 256 processor - modules/processors/frame/face_enhancer_gpen512.py: GPEN-BFR 512 processor - modules/core.py: Add GPEN choices to --frame-processor CLI arg - modules/globals.py: Add GPEN entries to fp_ui toggle dict - modules/ui.py: Add GPEN toggle switches and processing integration Closes #1663 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 19:39:12 +02:00
Lauri Gates	ca6cba9311	perf(ui): decouple face detection from swap in live webcam pipeline. Add a dedicated detection thread that runs face detection continuously on the latest captured frame and publishes results to a shared dict. The processing/swap thread reads cached detection results instead of running detection inline, so it never blocks on the 15-30ms detection cost. Architecture change: 2 threads → 3 threads. Before: capture → [detect + swap] → display. After: capture → swap (uses cached detections) → display, with detect running asynchronously on the captured frames and writing to the shared cache that swap reads. Also replaces the blocking while/ROOT.update() display loop with ROOT.after()-based scheduling, which avoids Tk event loop re-entrancy issues and UI freezes. Closes #1664	2026-02-22 18:41:47 +02:00
Kenneth Estanislao	d89385457e	Merge pull request #1659 from laurigates/pr/fix-tk9-compat fix(ui): patch CTkOptionMenu for Tk 9.0 compatibility	2026-02-23 00:13:47 +08:00
Kenneth Estanislao	e56a79222e	Merge branch 'main' of https://github.com/hacksider/Deep-Live-Cam	2026-02-23 00:01:36 +08:00
Kenneth Estanislao	5b0bf735b5	use onnx on face enhancer	2026-02-23 00:01:22 +08:00
Kenneth Estanislao	36bb1a29b0	Merge pull request #1189 from davidstrouk/main Fix model download path and URL	2026-02-22 23:55:13 +08:00
Lauri Gates	a1722c7b2e	fix(ui): patch CTkOptionMenu for Tk 9.0 compatibility. In Tk 9.0, Menu.index("end") returns "" instead of raising TclError on empty menus. CustomTkinter's DropdownMenu._add_menu_commands doesn't handle this case, causing a crash when creating CTkOptionMenu widgets (e.g., the camera selector dropdown). Add a monkey-patch that guards against the empty-string return value.	2026-02-22 11:59:51 +02:00
Kenneth Estanislao	f0ec0744f7	GPU Accelerated OpenCV	2026-02-12 19:44:04 +08:00
Kenneth Estanislao	36b6ea0019	Update ui.py. DETECT_EVERY_N = 2 reuses cached face positions on alternate frames.	2026-02-12 18:54:18 +08:00
Kenneth Estanislao	523ee53c34	Update ui.py. Separate capture and processing threads with queue.Queue, dropping frames when queues are full.	2026-02-12 18:50:40 +08:00
Kenneth Estanislao	e544889805	Lower the face analyzer setting, making it a bit faster	2026-02-12 18:47:42 +08:00
Kenneth Estanislao	a4c617af3e	Update metadata.py	2026-02-10 12:23:28 +08:00


194 Commits
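Commit 890a6d41b6 states that promoting a scalar Gather index to shape [1] and squeezing the added axis is semantically identical to the original Gather. The sketch below checks that claim numerically, assuming NumPy's `np.take` as a stand-in for ONNX Gather; it is not the project's onnx_optimize graph rewrite, and the helper names and the (1, 16, 512) style-tensor shape are hypothetical.

```python
import numpy as np


def gather_scalar(data: np.ndarray, index: int, axis: int) -> np.ndarray:
    # Rank-0 index: like ONNX Gather, the gathered axis is removed from the output.
    return np.take(data, np.int64(index), axis=axis)


def gather_widened(data: np.ndarray, index: int, axis: int) -> np.ndarray:
    # Index promoted to shape [1]: the gathered axis survives with length 1 ...
    widened = np.take(data, np.array([index], dtype=np.int64), axis=axis)
    # ... and a Squeeze on that same axis restores the original output shape.
    return np.squeeze(widened, axis=axis)


# Hypothetical style-code tensor: batch 1, 16 style slices, 512 channels each.
styles = np.random.default_rng(0).standard_normal((1, 16, 512)).astype(np.float32)
for i in range(styles.shape[1]):
    a = gather_scalar(styles, i, axis=1)
    b = gather_widened(styles, i, axis=1)
    assert a.shape == b.shape == (1, 512)
    assert np.array_equal(a, b)
```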
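Commit e957a7f4dd relies on BGR→RGB being a pure per-pixel channel permutation, which commutes with any resize that filters each channel independently, so converting after the downscale yields the same preview while touching far fewer pixels. A minimal NumPy sketch of that argument, assuming a 2x2 block average as a hypothetical stand-in for the real preview resize:

```python
import numpy as np


def downsample_2x(img: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for the preview resize: average each 2x2 block per
    # channel. Expects an (H, W, C) image with even H and W.
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def bgr_to_rgb(img: np.ndarray) -> np.ndarray:
    # Reverse the channel axis, the same permutation as cv2.COLOR_BGR2RGB.
    return img[..., ::-1]


frame = np.random.default_rng(2).random((8, 12, 3))
convert_then_resize = downsample_2x(bgr_to_rgb(frame))  # conversion on 96 pixels
resize_then_convert = bgr_to_rgb(downsample_2x(frame))  # conversion on 24 pixels
assert np.allclose(convert_then_resize, resize_then_convert)
```

In this 2x2 example the conversion runs on 4x fewer pixels with an identical result; the same reasoning applies to the ~5× pixel reduction the commit cites.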
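The fused uint8 blend in commit cbf0859347 keeps alpha in [0, 255] and divides by 255 with integer rounding instead of round-tripping through float32. The NumPy sketch below reproduces that arithmetic only; the commit itself uses cv2.merge, cv2.multiply and cv2.add, whose internal rounding and SIMD dispatch are not modelled here, and both function names are hypothetical.

```python
import numpy as np


def blend_float32(fg: np.ndarray, bg: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    # Reference path: promote everything to float32, blend, round, cast back.
    a = alpha.astype(np.float32)[..., None] / 255.0
    out = fg.astype(np.float32) * a + bg.astype(np.float32) * (1.0 - a)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def blend_uint8(fg: np.ndarray, bg: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    # Integer path: alpha stays in [0, 255]; uint16 holds 255 * 255 + 127 without overflow.
    a = alpha.astype(np.uint16)[..., None]
    acc = fg.astype(np.uint16) * a + bg.astype(np.uint16) * (255 - a)
    # Divide by 255 with rounding (add half the divisor before integer division).
    return ((acc + 127) // 255).astype(np.uint8)
```

Widening to uint16 is the key design choice: the largest intermediate value, 255 * 255 + 127, still fits, so no float buffers are allocated.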
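Commits a6c99607fc and f95a0bb7fb build one feathered alpha template at aligned-face resolution, cache it keyed by the side length, and assert that the aligned face is square. The commits do not say how the feather is produced, so this sketch swaps in a linear distance-to-edge ramp (not the erode plus Gaussian blur the old path used); `soft_alpha_template`, `alpha_for_aligned_face` and the default feather of 8 px are hypothetical. The per-frame step, warping the template with the face's affine transform (for example with cv2.warpAffine), is omitted.

```python
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=8)
def soft_alpha_template(size: int, feather: int) -> np.ndarray:
    """Feathered uint8 alpha for a size x size aligned face, built once per (size, feather)."""
    idx = np.arange(size)
    edge_1d = np.minimum(idx, size - 1 - idx)              # distance to nearest edge, per axis
    edge = np.minimum(edge_1d[:, None], edge_1d[None, :])  # distance to nearest edge, 2-D
    alpha = np.clip(edge / feather, 0.0, 1.0) * 255.0      # linear ramp, 0 at the border
    template = np.rint(alpha).astype(np.uint8)
    template.flags.writeable = False  # cached and shared, so callers must not mutate it
    return template


def alpha_for_aligned_face(aligned_face: np.ndarray, feather: int = 8) -> np.ndarray:
    h, w = aligned_face.shape[:2]
    # The cache is keyed by a single side length, so a non-square face would be mis-sized.
    assert h == w, f"expected a square aligned face, got {h}x{w}"
    return soft_alpha_template(h, feather)
```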
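Commit 0a87d63560 replaces a monkey-patched `session.run` with an adapter that overrides only `.run()` and proxies every other attribute to the wrapped session, guarded against double wrapping. A minimal sketch of that pattern follows; `CudaGraphSessionAdapter`, `install_adapter` and the `run_impl` callback (standing in for the CUDA-graph replay path) are hypothetical, not the project's code.

```python
class CudaGraphSessionAdapter:
    """Proxy that overrides only .run() and forwards every other attribute."""

    def __init__(self, session, run_impl):
        self._session = session
        self._run_impl = run_impl

    def run(self, output_names, input_feed, run_options=None):
        # Route inference through the replacement (e.g. a CUDA-graph replay path).
        return self._run_impl(output_names, input_feed, run_options)

    def __getattr__(self, name):
        # Only reached when normal lookup fails. Refuse the private fields so a
        # half-initialised adapter raises AttributeError instead of recursing.
        if name in ("_session", "_run_impl"):
            raise AttributeError(name)
        return getattr(self._session, name)


def install_adapter(owner, run_impl):
    """Wrap owner.session once; a repeated call leaves the existing adapter in place."""
    if not isinstance(owner.session, CudaGraphSessionAdapter):
        owner.session = CudaGraphSessionAdapter(owner.session, run_impl)
    return owner.session
```

Because `__getattr__` only fires when normal lookup fails, the `.run()` override and the proxy never conflict, and the `isinstance` check makes a second install a no-op.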
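Commit 4d04e830bc fixes the replay race by holding `_cuda_graph_lock` across the whole update/run/read sequence on shared, pre-bound buffers. The sketch below shows the same locking shape with NumPy arrays in place of IO-bound ONNX Runtime tensors; the buffers and `_replay_graph` are hypothetical placeholders, not a real CUDA graph.

```python
import threading

import numpy as np

# Hypothetical stand-ins for the module-level buffers a CUDA graph is bound to.
_bound_input = np.zeros(8, dtype=np.float32)
_bound_output = np.zeros(8, dtype=np.float32)
_graph_lock = threading.Lock()


def _replay_graph() -> None:
    # Placeholder for graph replay: reads the bound input, writes the bound output.
    np.multiply(_bound_input, 2.0, out=_bound_output)


def swap_inference(x: np.ndarray) -> np.ndarray:
    # Hold the lock across update -> replay -> read. Locking only the replay
    # would still let another thread overwrite the input before it runs.
    with _graph_lock:
        _bound_input[:] = x
        _replay_graph()
        return _bound_output.copy()  # copy before another caller can reuse the buffer
```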
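Commit f65aeae5db decomposes Pad(reflect) into Slice+Concat so inswapper_128 stays in one CoreML partition. One way to express reflect padding with only slices, reversals and a concatenation is sketched below in NumPy and checked against `np.pad(mode="reflect")`; the helper names are hypothetical, and how the actual graph rewrite encodes the reversal (for example as an ONNX Slice with step -1) is an assumption.

```python
import numpy as np


def _slice_axis(x: np.ndarray, start: int, stop: int, axis: int) -> np.ndarray:
    # One Slice on a single axis; all other axes are taken whole.
    index = [slice(None)] * x.ndim
    index[axis] = slice(start, stop)
    return x[tuple(index)]


def reflect_pad_slice_concat(x: np.ndarray, pad: int, axis: int) -> np.ndarray:
    """Reflect-pad both ends of `axis` using only slices, reversals and one concat."""
    n = x.shape[axis]
    if not 0 < pad < n:
        raise ValueError("reflect padding requires 0 < pad < size of the axis")
    # Reflect mode does not repeat the edge element: left border is x[pad], ..., x[1].
    left = np.flip(_slice_axis(x, 1, pad + 1, axis), axis=axis)
    # Right border is x[n - 2], ..., x[n - 1 - pad].
    right = np.flip(_slice_axis(x, n - 1 - pad, n - 1, axis), axis=axis)
    return np.concatenate([left, x, right], axis=axis)
```

Applying it once on the H axis and once on the W axis gives the usual 2-D reflect pad for NCHW tensors.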
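Commit b645d5e60b replaces `cv2_enumerate_cameras` with a bounded `cv2.VideoCapture` loop over `range(10)`. A minimal sketch of such a loop, assuming a hypothetical `enumerate_cameras` helper whose `open_capture` parameter lets the OpenCV probe be replaced by a fake in tests:

```python
def enumerate_cameras(open_capture=None, max_index=10):
    """Return the indices in range(max_index) that open successfully."""
    if open_capture is None:
        import cv2  # imported lazily so the probe can be swapped out in tests

        open_capture = cv2.VideoCapture
    available = []
    for index in range(max_index):
        capture = open_capture(index)
        try:
            if capture.isOpened():
                available.append(index)
        finally:
            capture.release()  # always free the handle, whether or not it opened
    return available
```

Releasing every handle in a `finally` block means no probed device is left open, even if `isOpened()` raises.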
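Two items in commit e93fb95903, float32 masks and `np.broadcast_to` instead of `np.repeat`, combine naturally: the mask is expanded to three channels as a zero-stride, read-only view rather than a copy, and the inverse mask is computed once on the 2-D mask. The helper names below are hypothetical; this is a sketch, not the project's apply_mask_area() code.

```python
import numpy as np


def expand_mask(mask: np.ndarray, channels: int = 3) -> np.ndarray:
    # Read-only view with a zero-stride channel axis: no per-channel copy is made.
    return np.broadcast_to(mask[..., None], mask.shape + (channels,))


def blend_with_mask(face: np.ndarray, frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    mask = mask.astype(np.float32, copy=False)
    inv = 1.0 - mask  # inverse computed once on the 2-D mask, not per channel
    return (face.astype(np.float32, copy=False) * expand_mask(mask)
            + frame.astype(np.float32, copy=False) * expand_mask(inv))
```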
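Commit ca6cba9311 moves detection onto its own thread that works on the latest captured frame and publishes results to a shared cache, so the swap thread only reads. A minimal threading sketch of that structure, assuming a lock-protected single-slot container (`LatestValue`, hypothetical) in place of the project's shared dict and a caller-supplied `detect` function:

```python
import threading


class LatestValue:
    """Lock-protected single slot: writers overwrite, readers get the newest value."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = None

    def put(self, value):
        with self._lock:
            self._value = value

    def get(self):
        with self._lock:
            return self._value


def detection_loop(frames, detections, detect, stop):
    """Detection thread: run `detect` on each new (frame_id, frame) and publish the result."""
    last_id = None
    while not stop.is_set():
        item = frames.get()
        if item is None or item[0] == last_id:
            stop.wait(0.001)  # nothing new yet; short sleep that ends early on stop
            continue
        frame_id, frame = item
        detections.put((frame_id, detect(frame)))
        last_id = frame_id


def faces_for_swap(detections):
    """Swap-thread side: read the cached faces without ever waiting on detection."""
    cached = detections.get()
    return [] if cached is None else cached[1]
```

The swap side may reuse detections from a slightly older frame; that staleness is the price of never blocking on the detection cost.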
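Commit a1722c7b2e guards against Tk 9.0 returning "" from `Menu.index("end")` on an empty menu. The sketch below shows such a guard as a plain helper rather than the project's CustomTkinter monkey-patch; `last_menu_index` and `existing_item_count` are hypothetical, and treating `None` and `"none"` as "empty" for older Tk versions is an assumption.

```python
def last_menu_index(menu):
    """Return the last item index of a Tk-style menu, or None if it is empty.

    Works with any object exposing a Tk-style .index("end") method.
    """
    end = menu.index("end")
    # Assumed older-Tk results for an empty menu: None or "none". Per the commit,
    # Tk 9.0 returns "" instead, which int() would reject with a ValueError.
    if end is None or end in ("", "none"):
        return None
    return int(end)


def existing_item_count(menu):
    """Number of items currently in the menu (0 for an empty menu)."""
    last = last_menu_index(menu)
    return 0 if last is None else last + 1
```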
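Commit 36b6ea0019 sets DETECT_EVERY_N = 2 so cached face positions are reused on alternate frames. A minimal sketch of that throttle, with a hypothetical `ThrottledDetector` wrapper around any detection callable:

```python
DETECT_EVERY_N = 2


class ThrottledDetector:
    """Run `detect` on every Nth frame and reuse the cached result in between."""

    def __init__(self, detect, every_n=DETECT_EVERY_N):
        self._detect = detect
        self._every_n = every_n
        self._frame_index = 0
        self._cached = None

    def __call__(self, frame):
        # Always detect on the first frame; afterwards only on every Nth frame.
        if self._cached is None or self._frame_index % self._every_n == 0:
            self._cached = self._detect(frame)
        self._frame_index += 1
        return self._cached
```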
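Commit 523ee53c34 separates capture and processing with `queue.Queue` and drops frames when a queue is full. The commit does not say which frame is dropped; this sketch, with a hypothetical `offer_frame` helper, assumes the incoming frame is discarded so the capture thread never blocks:

```python
import queue


def offer_frame(frames: queue.Queue, frame) -> bool:
    """Enqueue without blocking; return False if the frame had to be dropped."""
    try:
        frames.put_nowait(frame)
        return True
    except queue.Full:
        # Consumer is behind: discard this frame instead of stalling capture.
        return False
```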