hacksider-Deep-Live-Cam

mirror of https://github.com/hacksider/Deep-Live-Cam.git synced 2026-07-18 16:07:23 +02:00

Author	SHA1	Message	Date
Kenneth Estanislao	07e2e960c8	Update Quick Start version from v2.7 RC1 to v2.7 RC2	2026-05-24 18:55:35 +08:00
Dopan	08b2dd2526	Merge pull request #1844 from hklcf/fix/bugfix-batch lgtm	2026-05-23 16:54:41 +08:00
hklcf	886e64b320	Fix: resolve 5 confirmed bugs (imwrite_unicode, macOS memory, face_analyser None crash, silent sys.exit, core memory calc)	2026-05-23 10:37:20 +08:00
Kenneth Estanislao	aa6f2cbade	Update version from v2.7 beta to v2.7 RC1 in README	2026-05-21 05:11:41 +08:00
Kenneth Estanislao	a21ccf488c	Update version number in README.md to 2.1.6 2.7-RC1	2026-05-18 04:15:01 +08:00
Kenneth Estanislao	ca8e39e3bb	Fix mouth mask	2026-05-18 02:11:04 +08:00
Kenneth Estanislao	0e97e474e4	better swapping	2026-05-18 01:40:01 +08:00
Kenneth Estanislao	9c67a7aacc	fixed poisson blend	2026-05-18 01:36:24 +08:00
Dopan	4a674d33ef	Merge pull request #1826 from obook/pr/preload-nvidia-libs-linux Pre-load NVIDIA shared libraries on Linux	2026-05-17 00:04:06 +08:00
Olivier Booklage	682450755f	Avoid duplicating LD_LIBRARY_PATH entries Skip prepending a directory that is already on LD_LIBRARY_PATH, so a repeated import of run.py does not bloat the variable. Addresses review feedback on #1826.	2026-05-16 14:57:02 +02:00
Olivier Booklage	12a3f6a007	Pre-load NVIDIA shared libraries on Linux Mirrors the Windows preload block from #1775. When onnxruntime-gpu is installed via pip with nvidia-cudnn-cu12, the .so files sit under venv/lib/pythonX.Y/site-packages/nvidia/<pkg>/lib/ and the dynamic linker never sees them. LD_LIBRARY_PATH cannot be set after Python starts. Pre-loads every lib.so via ctypes.CDLL with RTLD_GLOBAL before onnxruntime opens its CUDA provider. Also extends LD_LIBRARY_PATH so child processes (ffmpeg) inherit the path. Fixes "libcudnn.so.9: cannot open shared object file" on pip-only Linux installs.	2026-05-16 14:45:54 +02:00
Kenneth Estanislao	cede099ccb	Update version number in README.md to 2.1.5	2026-05-15 16:33:57 +08:00
Kenneth Estanislao	81a1986ef8	Changed to pyqtUI Standardizing the UI from quickstart to github version	2026-05-15 16:33:27 +08:00
Kenneth Estanislao	ed758eb693	Speed optimization	2026-05-15 15:53:55 +08:00
Kenneth Estanislao	9c5f01c7f1	some fix for face enhancers	2026-05-15 15:13:57 +08:00
Kenneth Estanislao	8bdc348779	Update .gitignore	2026-05-15 14:52:56 +08:00
Makaru	e34d204c2e	Merge pull request #1803 from zuyua9/fix/get-one-face-detected-faces-zuyua9 fix(face): reuse pre-detected face list comment: tested, all good	2026-05-08 10:20:56 +08:00
zuyua9	d1376b07d1	fix(face): avoid hiding invalid face inputs	2026-05-08 01:50:25 +08:00
zuyua9	5deadaf428	fix(face): reuse pre-detected face list	2026-05-08 01:35:55 +08:00
Kenneth Estanislao	2fba52e11b	Merge pull request #1782 from iikuzmychov/fix/black-border-paste-back	2026-04-29 22:31:09 +08:00
Ihor Kuzmychov	0926b65aaf	Merge branch 'hacksider:main' into fix/black-border-paste-back	2026-04-23 19:58:12 +02:00
Ihor Kuzmychov	297acded3b	fix: use BORDER_REPLICATE for face warp to eliminate black border	2026-04-23 19:42:32 +02:00
KRSHH	014bce0704	Delete PERFORMANCE.md Removing Claude session summary	2026-04-23 22:12:55 +05:30
KRSHH	c962399669	Delete REVIEW_TODOS.md	2026-04-23 22:11:53 +05:30
Kenneth Estanislao	2dd42dfc75	Merge pull request #1777 from maxwbuckley/coreml-scalar-gather-fix Keep GFPGAN on ANE: widen scalar Gather indices for CoreML EP	2026-04-22 22:17:34 +08:00
Kenneth Estanislao	c38d669f7c	Merge pull request #1776 from maxwbuckley/paste-back-optimization Paste-back: O(crop_area) compositing + uint8 cv2 SIMD blend	2026-04-22 22:14:45 +08:00
Max Buckley	890a6d41b6	onnx_optimize: widen scalar Gather indices for CoreML EP ORT's CoreML EP GatherOpBuilder::IsOpSupportedImpl explicitly rejects rank-0 (scalar) index tensors. StyleGAN-derived models (GFPGAN's 1024 variant has 16 of them, one per style-code slice) hit this in the generator, and the resulting CPU fallbacks split the CoreML subgraph into multiple partitions with boundary crossings on every inference. Add a load-time ONNX rewrite that promotes each scalar index to [1] and squeezes the added axis on the Gather output — semantically identical but CoreML-compatible. GFPGAN now runs as a single CoreML partition with zero CPU-fallback nodes; inference drops from ~87 ms to ~81 ms on an M-series Mac. The fix has been filed upstream as microsoft/onnxruntime#28180 — the existing code comment in gather_op_builder.cc already describes this exact workaround, it just isn't applied. Once the upstream fix ships and the ORT floor is raised, this pass can be deleted.	2026-04-22 14:08:18 +02:00
Max Buckley	f95a0bb7fb	Make square aligned-face assumption explicit in _fast_paste_back Addresses Sourcery feedback on PR #1776: _get_soft_alpha caches a single NxN template keyed by N, which is correct for the inswapper model (128x128 aligned-face space) but would silently mis-warp if a caller ever passed a non-square aligned face. Assert the shape instead of silently assuming it.	2026-04-22 13:40:18 +02:00
Max Buckley	e957a7f4dd	Move BGR→RGB after resize in preview display path The processing thread was running cvtColor on the full-resolution 1920×1080 frame before queueing it for display. Since the display thread immediately resizes the frame to the preview window (~5× smaller pixel count), doing the colour conversion on the resized buffer is cheaper overall. Processing thread now queues BGR; display thread resizes then cvtColor.	2026-04-22 13:31:11 +02:00
Kenneth Estanislao	19416cb3cb	Merge pull request #1775 from maxwbuckley/unify-mac-windows Apple Silicon + Windows CUDA perf: 4-5x FPS, wider capture, platform routing	2026-04-22 18:38:32 +08:00
Max Buckley	cbf0859347	Paste-back blend: uint8 cv2 SIMD, no float32 round-trip Both face_swapper._fast_paste_back and face_enhancer._paste_back were doing a numpy float32 round-trip per frame: convert the target crop and the warped face to float32, blend, clip, cast back to uint8. That's four crop-sized allocations plus unvectorized elementwise math. Replace with a fused uint8 blend using cv2.merge + cv2.multiply + cv2.add, which cv2 dispatches to SIMD (NEON on Apple Silicon / AVX on x86). Stored alpha templates switched from float32 [0, 1] to uint8 [0, 255] so no conversion is needed per frame. CUDA paths also simplified — upload uint8 alpha (less bandwidth) and scale on device. Micro-bench on 1000x1000 RGB crop: current (float32 numpy): 9.43 ms cv2 uint8 fused: 1.16 ms (8.1× faster, max diff 2/255) Visual diff is imperceptible (quantization noise in the last step).	2026-04-22 12:05:39 +02:00
Max Buckley	a6c99607fc	Cut paste-back from quartic to linear in face size _fast_paste_back used to erode and Gaussian-blur the warped alpha mask in output coordinates with kernel sizes proportional to the on-screen face bbox. That made the per-frame cost ~O(area * k^2) — a face filling half the frame took ~8x the compositing work of one filling a quarter, which is why FPS fell off when leaning into the camera. Instead, build a feathered alpha template once at aligned-face resolution (128x128 for inswapper) and warp the soft mask per-frame. The affine transform preserves the relative feather width, so the visual output is equivalent; the per-frame cost is now O(crop_area) with no size-scaled erode/blur and no size-scaled padding. Also collapses the CPU fallback onto the same shape — it previously did a full-frame warpAffine twice per call, which scaled with the whole frame instead of the face crop.	2026-04-22 11:58:02 +02:00
Max Buckley	0a87d63560	Address PR #1775 review: pipelined-detection race and CUDA-graph monkey-patch - core._run_pipe_pipeline: hand the background detector its own copy of the frame. The frame processors mutate in place via paste-back, which was racing with concurrent face detection on the same buffer. - face_swapper._init_cuda_graph_session: replace the `swapper.session.run` monkey-patch with a `_CudaGraphSessionAdapter` that proxies every attribute to the underlying session and only overrides `.run()`. Guarded so repeat init does not double-wrap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 11:45:59 +02:00
Max Buckley	ea19030c74	Add PERFORMANCE.md and REVIEW_TODOS.md PERFORMANCE.md documents measured gains on MacBook Pro M3 Max vs hacksider/Deep-Live-Cam main@64d3f06: - Face swap only: <5 FPS -> >20 FPS - Face swap + GFPGAN: <2 FPS -> >10 FPS - Camera: 640x480 -> 960x540 MJPEG @ 60fps Breaks down the contributors (camera negotiation, CoreML graph rewrites with before/after op latencies, pipeline overlap, GFPGAN temporal cache, paste-back optimization, platform routing, Windows CUDA path) and how to reproduce. REVIEW_TODOS.md captures 12 findings from two independent reviews (Claude in-tree + Codex second opinion) grouped as Blockers / Should-fix / Consider, each with file:line and suggested fix. The two Blocker/Should-fix items are addressed in the preceding commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 11:08:33 +02:00
Max Buckley	4d04e830bc	Fix CUDA-graph replay race + many_faces enhancer regression Two issues surfaced in post-squash review of `f65aeae`: 1. CUDA-graph replay buffers were shared across threads with no lock. `_cuda_graph_swap_inference` mutates module-level ort_input/ort_latent and runs run_with_iobinding — concurrent swap calls on Windows/CUDA could overwrite each other's bound input buffers before replay, producing wrong-face output. Added `_cuda_graph_lock` around the full update/run/read sequence. 2. Face enhancer loop unconditionally broke after the first face, so `many_faces=True` silently enhanced only one face. Also, the single-slot temporal cache would paste the same enhancement onto every target if reused in many-faces mode. Gated the break on `not many_faces_mode` and disabled the cache path in that mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 11:08:23 +02:00
Max Buckley	f65aeae5db	Apple Silicon + Windows CUDA perf: 60 FPS pipeline, cross-platform routing Bundles CoreML graph rewrites, GPU-accelerated pipeline work, Windows CUDA fixes, and Mac/Windows runtime routing into a single drop. CoreML (Apple Silicon): - Decompose Pad(reflect) → Slice+Concat in inswapper_128 so the model runs in one CoreML partition instead of 14 (TEMPORARY: fixed upstream in microsoft/onnxruntime#28073, drop when ORT >= 1.26.0). - Fold Shape/Gather chains to constants in det_10g (21ms → 4ms). - Decompose Split(axis=1) → Slice pairs in GFPGAN (155ms → 89ms). - Route detection model to GPU so the ANE is free for the swap model. - Centralize provider/config selection in create_onnx_session. Pipeline (all platforms): - Parallelize face landmark + recognition post-detection; skip landmark_2d_106 when only face_swapper is active. - Pipeline face detection with swap for ANE overlap. - GPU-accelerated paste_back, MJPEG capture, zero-copy display path. - Standalone pipeline benchmark script. Windows / CUDA: - CUDA graphs + FP16 model + all-GPU pipeline for 1080p 60 FPS. - Auto-detect GPU provider and fix DLL discovery for Windows CUDA execution. Cross-platform: - platform_info helper for Mac/Windows runtime routing. - GFPGAN 30 fps + MSMF camera 60 fps with adaptive pipeline tuning. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 10:44:59 +02:00
KRSHH	64d3f06089	Delete tests directory	2026-04-19 17:36:33 +05:30
Kenneth Estanislao	fceafcb234	Merge pull request #1751 from Gujiassh/fix/face-mask-none-frame-guard fix(face-mask): guard create_face_mask against None frame	2026-04-15 14:13:18 +08:00
Kenneth Estanislao	033475b89c	Update version in README from 2.1.2 to 2.1.3	2026-04-15 01:29:59 +08:00
Kenneth Estanislao	07711af712	Update contributors section in README.md	2026-04-15 01:29:44 +08:00
Kenneth Estanislao	44664d8a7f	Merge pull request #1746 from maxwbuckley/apple-silicon-perf-optimizations Apple Silicon performance: 1.5 → 10+ FPS (zero quality loss)	2026-04-15 01:25:51 +08:00
gujishh	15a3f537a4	test: cover additional invalid frame guards	2026-04-13 21:09:27 +09:00
gujishh	fbcea9e135	fix(face-mask): guard create_face_mask against None frame	2026-04-12 14:19:48 +09:00
Max Buckley	646b0f816f	Move hot-path imports to module scope Address Sourcery review feedback: move face_align and get_one_face imports from inside per-frame functions to module-level to avoid repeated attribute lookup overhead in the processing loop. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 14:34:53 +02:00
Max Buckley	bcdd0ce2dd	Apple Silicon performance: 1.5 → 10+ FPS (zero quality loss) Fix CoreML execution provider falling back to CPU silently, eliminate redundant per-frame face detection, and optimize the paste-back blend to operate on the face bounding box instead of the full frame. All changes are quality-neutral (pixel-identical output verified) and benefit non-Mac platforms via the shared detection and paste-back improvements. Changes: - Remove unsupported CoreML options (RequireStaticShapes, MaximumCacheSize) that caused ORT 1.24 to silently fall back to CPUExecutionProvider - Add _fast_paste_back(): bbox-restricted erode/blur/blend, skip dead fake_diff code in insightface's inswapper (computed but never used) - process_frame() accepts optional pre-detected target_face to avoid redundant get_one_face() call (~30-40ms saved per frame, all platforms) - In-memory pipeline detects face once and shares across processors - Fix get_face_swapper() to fall back to FP16 model when FP32 absent - Fix pre_start() to accept either model variant (was FP16-only check) - Make tensorflow import conditional (fixes crash on macOS) - Add missing tqdm dep, make tensorflow/pygrabber platform-conditional Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 14:28:07 +02:00
Kenneth Estanislao	8703d394d6	ONNX CUDA exhaustive convolution search + IO binding	2026-04-09 16:34:27 +08:00
Kenneth Estanislao	69e3fc5611	Rendering optimization The PNG encode/decode alone was consuming significant CPU time per frame. This is eliminated entirely.	2026-04-09 16:25:22 +08:00
Kenneth Estanislao	2b26d5539e	supress error message Some people just want the opencv error gone. I keep on telling them that it is only for blurs and color conversion. It is the onnx runtime who is running the swap.	2026-04-09 16:04:00 +08:00
Kenneth Estanislao	fea5a4c2d2	Merge pull request #1707 from rohanrathi99/main Switch to FP32 model by default, add run script	2026-04-05 23:19:17 +08:00
Kenneth Estanislao	51fb7a6ad6	Merge pull request #1722 from mvanhorn/osc/1654-face-enhancer-v2 fix(face-enhancer): add missing process_frame_v2 method	2026-04-05 23:16:52 +08:00


612 Commits
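
Note on commit 682450755f (avoid duplicating LD_LIBRARY_PATH entries): the message describes skipping the prepend when the directory is already on the variable, so that a repeated import does not bloat it. The following is a minimal sketch of that idea only, not the code from run.py. The function name `prepend_library_path` and its parameters are hypothetical and chosen for illustration.

```python
import os
from typing import MutableMapping, Optional


def prepend_library_path(
    directory: str,
    env: Optional[MutableMapping[str, str]] = None,
    var: str = "LD_LIBRARY_PATH",
) -> bool:
    """Prepend `directory` to a path-list variable unless it is already listed.

    Hypothetical helper: returns True when the variable was changed and
    False when the directory was already present.
    """
    env = os.environ if env is None else env
    # Split the current value and drop empty segments (e.g. from "a::b").
    entries = [p for p in env.get(var, "").split(os.pathsep) if p]
    if directory in entries:
        return False  # already present: leave the variable untouched
    env[var] = os.pathsep.join([directory, *entries])
    return True


# Usage with a plain dict standing in for os.environ:
demo_env = {}
prepend_library_path("/opt/nvidia/lib", demo_env)
prepend_library_path("/opt/nvidia/lib", demo_env)  # second call is a no-op
```

Passing a plain dict keeps the sketch testable without touching the real environment. As the message for 12a3f6a007 notes, extending the variable only affects child processes, since the dynamic linker of the already running Python process does not re-read it.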