Files
facefusion/facefusion/vision.py
T
Henry Ruhs 7609df6747 Next (#436)
* Rename landmark 5 variables

* Mark as NEXT

* Render tabs for multiple ui layout usage

* Allow many face detectors at once, Add face detector tweaks

* Remove face detector tweaks for now (kinda placebo)

* Fix lint issues

* Allow rendering the landmark-5 and landmark-5/68 via debugger

* Fix naming

* Convert face landmark based on confidence score

* Convert face landmark based on confidence score

* Add scrfd face detector model (#397)

* Add scrfd face detector model

* Switch to scrfd_2.5g.onnx model

* Just some renaming

* Downgrade OpenCV, Add SYSTEM_VERSION_COMPAT=0 for MacOS

* Improve naming

* prepare detect frame outside of semaphore

* Feat/process manager (#399)

* Minor naming

* Introduce process manager to start and stop

* Introduce process manager to start and stop

* Introduce process manager to start and stop

* Introduce process manager to start and stop

* Introduce process manager to start and stop

* Remove useless test for now

* Avoid useless variables

* Show stop once is_processing is True

* Allow to stop ffmpeg processing too

* Implement output image resolution (#403)

* Implement output image resolution

* Reorder code

* Simplify output logic and therefore fix bug

* Frame-enhancer-onnx (#404)

* changes

* changes

* changes

* changes

* add models

* update workflow

* Some cleanup

* Some cleanup

* Feat/frame enhancer polishing (#410)

* Some cleanup

* Polish the frame enhancer

* Frame Enhancer: Add more models, optimize processing

* Minor changes

* Improve readability of create_tile_frames and merge_tile_frames

* We don't have enough models yet

* Feat/face landmarker score (#413)

* Introduce face landmarker score

* Fix testing

* Fix testing

* Use release for score related sliders

* Reduce face landmark fallbacks

* Scores and landmarks in Face dict, Change color-theme in face debugger

* Scores and landmarks in Face dict, Change color-theme in face debugger

* Fix some naming

* Add 8K support (for whatever reasons)

* Fix testing

* Using get() for face.landmarks

* Introduce statistics

* More statistics

* Limit the histogram equalization

* Enable queue() for default layout

* Improve copy_image()

* Fix error when switching detector model

* Always set UI values with globals if possible

* Use different logic for output image and output video resolutions

* Enforce re-download if file size is off

* Remove unused method

* Remove unused method

* Remove unused warning filter

* Improved output path normalization (#419)

* Handle some exceptions

* Handle some exceptions

* Cleanup

* Prevent countless thread locks

* Listen to user feedback

* Fix webp edge case

* Feat/cuda device detection (#424)

* Introduce cuda device detection

* Introduce cuda device detection

* it's gtx

* Move logic to run_nvidia_smi()

* Finalize execution device naming

* Finalize execution device naming

* Merge execution_helper.py to execution.py

* Undo lowercase of values

* Undo lowercase of values

* Finalize naming

* Add missing entry to ini

* fix lip_syncer preview (#426)

* fix lip_syncer preview

* change

* Refresh preview on trim changes

* Cleanup frame enhancers and remove useless scale in merge_video() (#428)

* Keep lips over the whole video once lip syncer is enabled (#430)

* Keep lips over the whole video once lip syncer is enabled

* changes

* changes

* Fix spacing

* Use empty audio frame on silence

* Use empty audio frame on silence

* Fix ConfigParser encoding (#431)

facefusion.ini is UTF8 encoded but config.py doesn't specify encoding which results in corrupted entries when non english characters are used. 

Affected entries:
source_paths
target_path
output_path

* Adjust spacing

* Improve the GTX 16 series detection

* Use general exception to catch ParseError

* Use general exception to catch ParseError

* Host frame enhancer models4

* Use latest onnxruntime

* Minor changes in benchmark UI

* Different approach to cancel ffmpeg process

* Add support for amd amf encoders (#433)

* Add amd_amf encoders

* remove -rc cqp from amf encoder parameters

* Improve terminal output, move success messages to debug mode

* Improve terminal output, move success messages to debug mode

* Minor update

* Minor update

* onnxruntime 1.17.1 matches cuda 12.2

* Feat/improved scaling (#435)

* Prevent useless temp upscaling, Show resolution and fps in terminal output

* Remove temp frame quality

* Remove temp frame quality

* Tiny cleanup

* Default back to png for temp frames, Remove pix_fmt from frame extraction due mjpeg error

* Fix inswapper fallback by onnxruntime

* Fix inswapper fallback by major onnxruntime

* Fix inswapper fallback by major onnxruntime

* Add testing for vision restrict methods

* Fix left / right face mask regions, add left-ear and right-ear

* Flip right and left again

* Undo ears - does not work with box mask

* Prepare next release

* Fix spacing

* 100% quality when using jpg for temp frames

* Use span_kendata_x4 as default as of speed

* benchmark optimal tile and pad

* Undo commented out code

* Add real_esrgan_x4_fp16 model

* Be strict when using many face detectors

---------

Co-authored-by: Harisreedhar <46858047+harisreedhar@users.noreply.github.com>
Co-authored-by: aldemoth <159712934+aldemoth@users.noreply.github.com>
2024-03-14 19:56:54 +01:00

219 lines
7.4 KiB
Python

from typing import Optional, List, Tuple
from functools import lru_cache
import cv2
import numpy
from cv2.typing import Size
from facefusion.typing import VisionFrame, Resolution, Fps
from facefusion.choices import image_template_sizes, video_template_sizes
from facefusion.filesystem import is_image, is_video
@lru_cache(maxsize = 128)
def read_static_image(image_path : str) -> Optional[VisionFrame]:
return read_image(image_path)
def read_static_images(image_paths : List[str]) -> Optional[List[VisionFrame]]:
frames = []
if image_paths:
for image_path in image_paths:
frames.append(read_static_image(image_path))
return frames
def read_image(image_path : str) -> Optional[VisionFrame]:
if is_image(image_path):
return cv2.imread(image_path)
return None
def write_image(image_path : str, vision_frame : VisionFrame) -> bool:
if image_path:
return cv2.imwrite(image_path, vision_frame)
return False
def detect_image_resolution(image_path : str) -> Optional[Resolution]:
if is_image(image_path):
image = read_image(image_path)
height, width = image.shape[:2]
return width, height
return None
def restrict_image_resolution(image_path : str, resolution : Resolution) -> Resolution:
if is_image(image_path):
image_resolution = detect_image_resolution(image_path)
if image_resolution < resolution:
return image_resolution
return resolution
def get_video_frame(video_path : str, frame_number : int = 0) -> Optional[VisionFrame]:
if is_video(video_path):
video_capture = cv2.VideoCapture(video_path)
if video_capture.isOpened():
frame_total = video_capture.get(cv2.CAP_PROP_FRAME_COUNT)
video_capture.set(cv2.CAP_PROP_POS_FRAMES, min(frame_total, frame_number - 1))
has_vision_frame, vision_frame = video_capture.read()
video_capture.release()
if has_vision_frame:
return vision_frame
return None
def create_image_resolutions(resolution : Resolution) -> List[str]:
resolutions = []
temp_resolutions = []
if resolution:
width, height = resolution
temp_resolutions.append(normalize_resolution(resolution))
for template_size in image_template_sizes:
temp_resolutions.append(normalize_resolution((width * template_size, height * template_size)))
temp_resolutions = sorted(set(temp_resolutions))
for temp_resolution in temp_resolutions:
resolutions.append(pack_resolution(temp_resolution))
return resolutions
def count_video_frame_total(video_path : str) -> int:
if is_video(video_path):
video_capture = cv2.VideoCapture(video_path)
if video_capture.isOpened():
video_frame_total = int(video_capture.get(cv2.CAP_PROP_FRAME_COUNT))
video_capture.release()
return video_frame_total
return 0
def detect_video_fps(video_path : str) -> Optional[float]:
if is_video(video_path):
video_capture = cv2.VideoCapture(video_path)
if video_capture.isOpened():
video_fps = video_capture.get(cv2.CAP_PROP_FPS)
video_capture.release()
return video_fps
return None
def restrict_video_fps(video_path : str, fps : Fps) -> Fps:
if is_video(video_path):
video_fps = detect_video_fps(video_path)
if video_fps < fps:
return video_fps
return fps
def detect_video_resolution(video_path : str) -> Optional[Resolution]:
if is_video(video_path):
video_capture = cv2.VideoCapture(video_path)
if video_capture.isOpened():
width = video_capture.get(cv2.CAP_PROP_FRAME_WIDTH)
height = video_capture.get(cv2.CAP_PROP_FRAME_HEIGHT)
video_capture.release()
return int(width), int(height)
return None
def restrict_video_resolution(video_path : str, resolution : Resolution) -> Resolution:
if is_video(video_path):
video_resolution = detect_video_resolution(video_path)
if video_resolution < resolution:
return video_resolution
return resolution
def create_video_resolutions(resolution : Resolution) -> List[str]:
resolutions = []
temp_resolutions = []
if resolution:
width, height = resolution
temp_resolutions.append(normalize_resolution(resolution))
for template_size in video_template_sizes:
if width > height:
temp_resolutions.append(normalize_resolution((template_size * width / height, template_size)))
else:
temp_resolutions.append(normalize_resolution((template_size, template_size * height / width)))
temp_resolutions = sorted(set(temp_resolutions))
for temp_resolution in temp_resolutions:
resolutions.append(pack_resolution(temp_resolution))
return resolutions
def normalize_resolution(resolution : Tuple[float, float]) -> Resolution:
width, height = resolution
if width and height:
normalize_width = round(width / 2) * 2
normalize_height = round(height / 2) * 2
return normalize_width, normalize_height
return 0, 0
def pack_resolution(resolution : Resolution) -> str:
width, height = normalize_resolution(resolution)
return str(width) + 'x' + str(height)
def unpack_resolution(resolution : str) -> Resolution:
width, height = map(int, resolution.split('x'))
return width, height
def resize_frame_resolution(vision_frame : VisionFrame, max_resolution : Resolution) -> VisionFrame:
height, width = vision_frame.shape[:2]
max_width, max_height = max_resolution
if height > max_height or width > max_width:
scale = min(max_height / height, max_width / width)
new_width = int(width * scale)
new_height = int(height * scale)
return cv2.resize(vision_frame, (new_width, new_height))
return vision_frame
def normalize_frame_color(vision_frame : VisionFrame) -> VisionFrame:
return cv2.cvtColor(vision_frame, cv2.COLOR_BGR2RGB)
def create_tile_frames(vision_frame : VisionFrame, size : Size) -> Tuple[List[VisionFrame], int, int]:
vision_frame = numpy.pad(vision_frame, ((size[1], size[1]), (size[1], size[1]), (0, 0)))
tile_width = size[0] - 2 * size[2]
pad_size_bottom = size[2] + tile_width - vision_frame.shape[0] % tile_width
pad_size_right = size[2] + tile_width - vision_frame.shape[1] % tile_width
pad_vision_frame = numpy.pad(vision_frame, ((size[2], pad_size_bottom), (size[2], pad_size_right), (0, 0)))
pad_height, pad_width = pad_vision_frame.shape[:2]
row_range = range(size[2], pad_height - size[2], tile_width)
col_range = range(size[2], pad_width - size[2], tile_width)
tile_vision_frames = []
for row_vision_frame in row_range:
top = row_vision_frame - size[2]
bottom = row_vision_frame + size[2] + tile_width
for column_vision_frame in col_range:
left = column_vision_frame - size[2]
right = column_vision_frame + size[2] + tile_width
tile_vision_frames.append(pad_vision_frame[top:bottom, left:right, :])
return tile_vision_frames, pad_width, pad_height
def merge_tile_frames(tile_vision_frames : List[VisionFrame], temp_width : int, temp_height : int, pad_width : int, pad_height : int, size : Size) -> VisionFrame:
merge_vision_frame = numpy.zeros((pad_height, pad_width, 3)).astype(numpy.uint8)
tile_width = tile_vision_frames[0].shape[1] - 2 * size[2]
tiles_per_row = min(pad_width // tile_width, len(tile_vision_frames))
for index, tile_vision_frame in enumerate(tile_vision_frames):
tile_vision_frame = tile_vision_frame[size[2]:-size[2], size[2]:-size[2]]
row_index = index // tiles_per_row
col_index = index % tiles_per_row
top = row_index * tile_vision_frame.shape[0]
bottom = top + tile_vision_frame.shape[0]
left = col_index * tile_vision_frame.shape[1]
right = left + tile_vision_frame.shape[1]
merge_vision_frame[top:bottom, left:right, :] = tile_vision_frame
merge_vision_frame = merge_vision_frame[size[1] : size[1] + temp_height, size[1]: size[1] + temp_width, :]
return merge_vision_frame