Skip prepending a directory that is already on LD_LIBRARY_PATH, so a
repeated import of run.py does not bloat the variable.
Addresses review feedback on #1826.
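The duplicate check can be sketched roughly as follows. This is a minimal illustration, not the code in run.py: the helper name `prepend_to_ld_library_path` and the optional `env` parameter are assumptions made for the example.

```python
import os


def prepend_to_ld_library_path(directory, env=None):
    """Prepend `directory` to LD_LIBRARY_PATH unless it is already listed.

    Hypothetical helper for illustration; returns True if the variable changed.
    """
    env = os.environ if env is None else env
    current = env.get("LD_LIBRARY_PATH", "")
    # Drop empty entries so a leading or trailing separator is not preserved.
    entries = [p for p in current.split(os.pathsep) if p]
    if directory in entries:
        return False  # already present: leave the variable untouched
    env["LD_LIBRARY_PATH"] = os.pathsep.join([directory, *entries])
    return True
```

Calling the helper twice with the same directory leaves the variable unchanged on the second call, which is the behaviour a repeated import needs.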
Mirrors the Windows preload block from #1775. When onnxruntime-gpu is
installed via pip with nvidia-cudnn-cu12, the .so files sit under
venv/lib/pythonX.Y/site-packages/nvidia/<pkg>/lib/ and the dynamic
linker never sees them. Setting LD_LIBRARY_PATH after Python starts
does not help the running process, because the linker reads it only at
process startup.
Pre-loads every lib*.so* via ctypes.CDLL with RTLD_GLOBAL before
onnxruntime opens its CUDA provider. Also extends LD_LIBRARY_PATH so
child processes (ffmpeg) inherit the path.
Fixes "libcudnn.so.9: cannot open shared object file" on pip-only
Linux installs.
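A rough sketch of the preload step, assuming the directory layout described above. The function name `preload_nvidia_libs` and its `site_packages` argument are hypothetical, and the real block in run.py may order or filter the libraries differently.

```python
import ctypes
import glob
import os
import sys


def preload_nvidia_libs(site_packages):
    """Load every lib*.so* under <site_packages>/nvidia/*/lib with RTLD_GLOBAL.

    Hypothetical helper for illustration; returns the library directories found.
    """
    if not sys.platform.startswith("linux"):
        return []
    lib_dirs = sorted(glob.glob(os.path.join(site_packages, "nvidia", "*", "lib")))
    for lib_dir in lib_dirs:
        for so_path in sorted(glob.glob(os.path.join(lib_dir, "lib*.so*"))):
            try:
                # RTLD_GLOBAL exposes the symbols to libraries loaded later,
                # such as the onnxruntime CUDA provider.
                ctypes.CDLL(so_path, mode=ctypes.RTLD_GLOBAL)
            except OSError:
                # A library may depend on one that is not loaded yet; skip it.
                pass
    return lib_dirs
```

The returned directories are the ones that would also be added to LD_LIBRARY_PATH for child processes. The call has to happen before onnxruntime creates its first CUDA session.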
Bundles CoreML graph rewrites, GPU-accelerated pipeline work, Windows CUDA
fixes, and Mac/Windows runtime routing into a single drop.
CoreML (Apple Silicon):
- Decompose Pad(reflect) → Slice+Concat in inswapper_128 so the model
runs in one CoreML partition instead of 14 (TEMPORARY: fixed upstream
in microsoft/onnxruntime#28073, drop when ORT >= 1.26.0).
- Fold Shape/Gather chains to constants in det_10g (21ms → 4ms).
- Decompose Split(axis=1) → Slice pairs in GFPGAN (155ms → 89ms).
- Route detection model to GPU so the ANE is free for the swap model.
- Centralize provider/config selection in create_onnx_session.
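The Pad(reflect) decomposition rests on the fact that reflect padding can be written with reversed slices and a concatenation. The sketch below shows that identity on a plain 1-D Python list. It is only an illustration: the actual rewrite operates on ONNX graph nodes and multi-dimensional tensors, and `reflect_pad_1d` is a hypothetical name.

```python
def reflect_pad_1d(x, pad):
    """Reflect-pad a list by `pad` on both sides using only slices and concat."""
    if not 0 < pad < len(x):
        raise ValueError("pad must satisfy 0 < pad < len(x)")
    left = x[pad:0:-1]          # elements pad..1, in reversed order
    right = x[-2:-pad - 2:-1]   # elements n-2..n-pad-1, in reversed order
    return left + x + right


print(reflect_pad_1d([0, 1, 2, 3], 2))  # [2, 1, 0, 1, 2, 3, 2, 1]
```

The edge element itself is not repeated, which matches the "reflect" padding mode rather than "edge" or "symmetric" padding.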
Pipeline (all platforms):
- Parallelize face landmark + recognition post-detection; skip landmark_2d_106
when only face_swapper is active.
- Pipeline face detection with swap for ANE overlap.
- GPU-accelerated paste_back, MJPEG capture, zero-copy display path.
- Standalone pipeline benchmark script.
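The first Pipeline item can be pictured as two independent post-detection stages submitted to a thread pool, with the landmark stage made optional. This is a sketch under assumptions: `analyse_face`, `landmark_fn`, `recognition_fn`, and `need_landmarks` are hypothetical names, not the project's API.

```python
from concurrent.futures import ThreadPoolExecutor


def analyse_face(face, landmark_fn, recognition_fn, need_landmarks=True):
    """Run recognition and (optionally) landmarks concurrently for one face."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        rec_future = pool.submit(recognition_fn, face)
        # Skip the landmark stage entirely when no consumer needs it.
        lm_future = pool.submit(landmark_fn, face) if need_landmarks else None
        embedding = rec_future.result()
        landmarks = lm_future.result() if lm_future is not None else None
    return embedding, landmarks
```

A real pipeline would likely keep one long-lived pool instead of creating one per face; the per-call pool here only keeps the example self-contained.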
Windows / CUDA:
- CUDA graphs + FP16 model + all-GPU pipeline for 1080p 60 FPS.
- Auto-detect GPU provider and fix DLL discovery for Windows CUDA execution.
Cross-platform:
- platform_info helper for Mac/Windows runtime routing.
- GFPGAN at 30 fps and MSMF camera capture at 60 fps with adaptive
pipeline tuning.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Change the default face swapper model to FP32 for better GPU
compatibility and to avoid NaN issues on certain GPUs.
Revamp `run.py` to adjust PATH variables for dependency setup; the file
is re-added with expanded configuration.