Research and suggestions for code compatibility with the new training model #292

Closed
opened 2022-07-01 15:43:15 +02:00 by netrunner-exe · 6 comments
netrunner-exe commented 2022-07-01 15:43:15 +02:00 (Migrated from github.com)

Hi all. I did a little research to make the test code compatible with the new training model. I really hope that @neuralchen or @NNNNAI will use it to make the necessary adaptations to the code in the repository so that everything works perfectly!
Also many thanks to @boreas-l for the idea and hints on how to implement it. There are some points I was not able to get working; please improve them so everything works properly!

1. Create a new option for compatibility with old checkpoints. I will not describe every detail; I will just give brief explanations and post the finished code with the changes (a short check of the flag parsing follows the file).

`SimSwap/options/test_options.py`

```
'''
Author: Naiyuan liu
Github: https://github.com/NNNNAI
Date: 2021-11-23 17:03:58
LastEditors: Naiyuan liu
LastEditTime: 2021-11-23 17:08:08
Description: 
'''
import argparse

from .base_options import BaseOptions


def str2bool(v):
    if isinstance(v, bool):
        return v
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    raise argparse.ArgumentTypeError('Boolean value expected.')

    
class TestOptions(BaseOptions):
    def initialize(self):
        BaseOptions.initialize(self)
        self.parser.add_argument('--ntest', type=int, default=float("inf"), help='# of test examples.')
        self.parser.add_argument('--results_dir', type=str, default='./results/', help='saves results here.')
        self.parser.add_argument('--aspect_ratio', type=float, default=1.0, help='aspect ratio of result images')
        self.parser.add_argument('--phase', type=str, default='test', help='train, val, test, etc')
        self.parser.add_argument('--which_epoch', type=str, default='latest', help='which epoch to load? set to latest to use latest cached model')
        self.parser.add_argument('--how_many', type=int, default=50, help='how many test images to run')       
        self.parser.add_argument('--cluster_path', type=str, default='features_clustered_010.npy', help='the path for clustered results of encoded features')
        self.parser.add_argument('--use_encoded_image', action='store_true', help='if specified, encode the real image to get the feature map')
        self.parser.add_argument("--export_onnx", type=str, help="export ONNX model to a given file")
        self.parser.add_argument("--engine", type=str, help="run serialized TRT engine")
        self.parser.add_argument("--onnx", type=str, help="run ONNX model via TRT")        
        self.parser.add_argument("--Arc_path", type=str, default='models/BEST_checkpoint.tar', help="run ONNX model via TRT")
        self.parser.add_argument("--pic_a_path", type=str, default='./crop_224/gdg.jpg', help="Person who provides identity information")
        self.parser.add_argument("--pic_b_path", type=str, default='./crop_224/zrf.jpg', help="Person who provides information other than their identity")
        self.parser.add_argument("--pic_specific_path", type=str, default='./crop_224/zrf.jpg', help="The specific person to be swapped")
        self.parser.add_argument("--multisepcific_dir", type=str, default='./demo_file/multispecific', help="Dir for multi specific")
        self.parser.add_argument("--video_path", type=str, default='./demo_file/multi_people_1080p.mp4', help="path for the video to swap")
        self.parser.add_argument("--temp_path", type=str, default='./temp_results', help="path to save temporarily images")
        self.parser.add_argument("--output_path", type=str, default='./output/', help="results path")
        self.parser.add_argument('--id_thres', type=float, default=0.03, help='identity similarity threshold for the specific-person swap')
        self.parser.add_argument('--no_simswaplogo', action='store_true', help='Remove the watermark')
        self.parser.add_argument('--use_mask', action='store_true', help='Use mask for better result')
        self.parser.add_argument('--crop_size', type=int, default=224, help='crop size of the input image')
        self.parser.add_argument('--new_model', type=str2bool, default=False, const=True, nargs='?', help='Use the new pretrained model (bare --new_model also enables it)')
        self.parser.add_argument('--Gdeep', type=str2bool, default=False)
        self.isTrain = False
```
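For reference, here is a quick standalone check of how `str2bool` behaves together with `nargs='?'` (a hypothetical snippet for illustration, not part of the repo):

```
import argparse

def str2bool(v):
    if isinstance(v, bool):
        return v
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    raise argparse.ArgumentTypeError('Boolean value expected.')

parser = argparse.ArgumentParser()
parser.add_argument('--new_model', type=str2bool, default=False, const=True, nargs='?')

print(parser.parse_args(['--new_model', 'True']).new_model)  # True
print(parser.parse_args(['--new_model']).new_model)          # True (bare flag falls back to const)
print(parser.parse_args([]).new_model)                       # False (default)
```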
2. Create a new file with the necessary functions: `SimSwap/util/swap_new_model.py` (a usage sketch follows the file).

```
# -*- coding: utf-8 -*-
# @Author: netrunner-exe
# @Date:   2022-07-01 13:45:41
# @Last Modified by:   netrunner-exe
# @Last Modified time: 2022-07-01 13:47:06
import cv2
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

def img2tensor(imgs, bgr2rgb=True, float32=True):
    """Numpy array to tensor.
    Args:
        imgs (list[ndarray] | ndarray): Input images.
        bgr2rgb (bool): Whether to change bgr to rgb.
        float32 (bool): Whether to change to float32.
    Returns:
        list[tensor] | tensor: Tensor images. If returned results only have
            one element, just return tensor.
    """

    def _totensor(img, bgr2rgb, float32):
        if img.shape[2] == 3 and bgr2rgb:
            if img.dtype == 'float64':
                img = img.astype('float32')
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = torch.from_numpy(img.transpose(2, 0, 1))
        if float32:
            img = img.float()
        return img

    if isinstance(imgs, list):
        return [_totensor(img, bgr2rgb, float32) for img in imgs]
    else:
        return _totensor(imgs, bgr2rgb, float32)


def swap_result_new_model(face_align_crop, model, latend_id):
    img_align_crop = Image.fromarray(cv2.cvtColor(face_align_crop, cv2.COLOR_BGR2RGB))

    img_tensor = transforms.ToTensor()(img_align_crop)
    # PIL .size is (width, height), so the target shape is (1, 3, H, W)
    img_tensor = img_tensor.view(-1, 3, img_align_crop.size[1], img_align_crop.size[0])

    # normalize with ImageNet statistics, as the new generator expects
    mean = torch.tensor([0.485, 0.456, 0.406]).cuda().view(1, 3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).cuda().view(1, 3, 1, 1)

    img_tensor = img_tensor.cuda(non_blocking=True)
    img_tensor = img_tensor.sub_(mean).div_(std)

    imagenet_std = torch.Tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
    imagenet_mean = torch.Tensor([0.485, 0.456, 0.406]).view(3, 1, 1)

    # run the generator and undo the ImageNet normalization
    swap_res = model.netG(img_tensor, latend_id).cpu()
    swap_res = (swap_res * imagenet_std + imagenet_mean).numpy()
    swap_res = swap_res.squeeze(0).transpose((1, 2, 0))

    # clamp to [0, 255] and return a CHW float tensor in [0, 1]
    swap_result = np.clip(255 * swap_res, 0, 255)
    swap_result = img2tensor(swap_result / 255., bgr2rgb=False, float32=True)
    return swap_result
```
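A minimal usage sketch (assuming a loaded `fsModel` whose `netG` is on the GPU, a `latend_id` computed as in the test scripts below, and an aligned BGR crop from the detector; the path is a placeholder):

```
import cv2
import torch

face_crop = cv2.imread('./crop_512/face.jpg')  # aligned BGR face crop (placeholder path)
with torch.no_grad():
    swap_result = swap_result_new_model(face_crop, model, latend_id)
# swap_result is a CHW float tensor in [0, 1], ready for reverse2wholeimage
```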
3. Unfortunately, I could not get multispecific and swapspecific to work. I will take `test_wholeimage_swapsingle.py` as an example, making small changes to work with the new model while staying compatible with the old ones. One note: if you are using the beta 512 model, you will need to add `--name 512` instead of only `--crop_size 512` to make the beta 512 model work in the future.

```
'''
Author: Naiyuan liu
Github: https://github.com/NNNNAI
Date: 2021-11-23 17:03:58
LastEditors: Naiyuan liu
LastEditTime: 2021-11-24 19:19:43
Description: 
'''
import cv2
import torch
import math
import numpy as np
from PIL import Image
import torch.nn.functional as F
from torchvision import transforms
from models.models import create_model
from models.projected_model import fsModel
from options.test_options import TestOptions
from insightface_func.face_detect_crop_single import Face_detect_crop
from util.reverse2original import reverse2wholeimage
from util.swap_new_model import swap_result_new_model
import os
from util.add_watermark import watermark_image
from util.norm import SpecificNorm
from parsing_model.model import BiSeNet


def lcm(a, b): return abs(a * b) // math.gcd(a, b) if a and b else 0  # fractions.gcd was removed in Python 3.9

transformer_Arcface = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])


def _totensor(array):
    tensor = torch.from_numpy(array)
    img = tensor.transpose(0, 1).transpose(0, 2).contiguous()
    return img.float().div(255)


if __name__ == '__main__':
    opt = TestOptions().parse()
    start_epoch, epoch_iter = 1, 0
    crop_size = opt.crop_size

    torch.nn.Module.dump_patches = True
    if crop_size == 512:
        if opt.name == '512':
            opt.which_epoch = 550000
        else:
            opt.Gdeep = True
            opt.new_model = True
        mode = 'ffhq'
    else:
        mode = 'None'

    logoclass = watermark_image('./simswaplogo/simswaplogo.png')

    if opt.new_model:
        model = fsModel()
        model.initialize(opt)
        model.netG.eval()
    else:
        model = create_model(opt)
        model.eval()
       
    spNorm = SpecificNorm()
    app = Face_detect_crop(name='antelope', root='./insightface_func/models')
    app.prepare(ctx_id= 0, det_thresh=0.6, det_size=(640,640), mode=mode)

    with torch.no_grad():
        pic_a = opt.pic_a_path
        img_a_whole = cv2.imread(pic_a)
        img_a_align_crop, _ = app.get(img_a_whole,crop_size)
        img_a_align_crop_pil = Image.fromarray(cv2.cvtColor(img_a_align_crop[0],cv2.COLOR_BGR2RGB))
        
        img_a = transformer_Arcface(img_a_align_crop_pil)
        img_id = img_a.view(-1, img_a.shape[0], img_a.shape[1], img_a.shape[2])

        # convert numpy to tensor
        img_id = img_id.cuda()

        #create latent id
        img_id_downsample = F.interpolate(img_id, size=(112,112))
        latend_id = model.netArc(img_id_downsample)
        latend_id = F.normalize(latend_id, p=2, dim=1)


        ############## Forward Pass ######################

        pic_b = opt.pic_b_path
        img_b_whole = cv2.imread(pic_b)

        img_b_align_crop_list, b_mat_list = app.get(img_b_whole, crop_size)
        # detect_results = None

        swap_result_list = []
        b_align_crop_tenor_list = []

        # each element of img_b_align_crop_list is an aligned BGR crop
        for b_align_crop in img_b_align_crop_list:
            b_align_crop_tenor = _totensor(cv2.cvtColor(b_align_crop, cv2.COLOR_BGR2RGB))[None, ...].cuda()

            if opt.new_model:
                swap_result = swap_result_new_model(b_align_crop, model, latend_id)
            else:
                swap_result = model(None, b_align_crop_tenor, latend_id, None, True)[0]

            swap_result_list.append(swap_result)
            b_align_crop_tenor_list.append(b_align_crop_tenor)

        if opt.use_mask:
            n_classes = 19
            net = BiSeNet(n_classes=n_classes)
            net.cuda()
            save_pth = os.path.join('./parsing_model/checkpoint', '79999_iter.pth')
            net.load_state_dict(torch.load(save_pth))
            net.eval()
        else:
            net = None

        reverse2wholeimage(b_align_crop_tenor_list, swap_result_list, b_mat_list, crop_size, img_b_whole, logoclass, \
            os.path.join(opt.output_path, 'result_whole_swapsingle.jpg'), opt.no_simswaplogo, pasring_model=net, use_mask=opt.use_mask, norm=spNorm)

        print(' ')
        print('************ Done ! ************')
```

4. To work with video: `test_video_swapsingle.py`

```
'''
Author: Naiyuan liu
Github: https://github.com/NNNNAI
Date: 2021-11-23 17:03:58
LastEditors: Naiyuan liu
LastEditTime: 2021-11-24 19:00:38
Description: 
'''
import cv2
import torch
import math
import numpy as np
from PIL import Image
import torch.nn.functional as F
from torchvision import transforms
from models.models import create_model
from models.projected_model import fsModel
from options.test_options import TestOptions
from insightface_func.face_detect_crop_single import Face_detect_crop
from util.videoswap import video_swap
import os

def lcm(a, b): return abs(a * b) // math.gcd(a, b) if a and b else 0  # fractions.gcd was removed in Python 3.9

transformer = transforms.Compose([
        transforms.ToTensor(),
        #transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

transformer_Arcface = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])


# detransformer = transforms.Compose([
#         transforms.Normalize([0, 0, 0], [1/0.229, 1/0.224, 1/0.225]),
#         transforms.Normalize([-0.485, -0.456, -0.406], [1, 1, 1])
#     ])


if __name__ == '__main__':
    opt = TestOptions().parse()
    start_epoch, epoch_iter = 1, 0
    crop_size = opt.crop_size

    torch.nn.Module.dump_patches = True
    if crop_size == 512:
        if opt.name == '512':
            opt.which_epoch = 550000
        else:
            opt.Gdeep = True
            opt.new_model = True
        mode = 'ffhq'
    else:
        mode = 'None'

    if opt.new_model:
        model = fsModel()
        model.initialize(opt)
        model.netG.eval()
    else:
        model = create_model(opt)
        model.eval()

    app = Face_detect_crop(name='antelope', root='./insightface_func/models')
    app.prepare(ctx_id= 0, det_thresh=0.6, det_size=(640,640),mode=mode)

    with torch.no_grad():
        pic_a = opt.pic_a_path
        # img_a = Image.open(pic_a).convert('RGB')
        img_a_whole = cv2.imread(pic_a)
        img_a_align_crop, _ = app.get(img_a_whole,crop_size)
        img_a_align_crop_pil = Image.fromarray(cv2.cvtColor(img_a_align_crop[0],cv2.COLOR_BGR2RGB)) 
        img_a = transformer_Arcface(img_a_align_crop_pil)
        img_id = img_a.view(-1, img_a.shape[0], img_a.shape[1], img_a.shape[2])

        # pic_b = opt.pic_b_path
        # img_b_whole = cv2.imread(pic_b)
        # img_b_align_crop, b_mat = app.get(img_b_whole,crop_size)
        # img_b_align_crop_pil = Image.fromarray(cv2.cvtColor(img_b_align_crop,cv2.COLOR_BGR2RGB)) 
        # img_b = transformer(img_b_align_crop_pil)
        # img_att = img_b.view(-1, img_b.shape[0], img_b.shape[1], img_b.shape[2])

        # convert numpy to tensor
        img_id = img_id.cuda()
        # img_att = img_att.cuda()

        #create latent id
        img_id_downsample = F.interpolate(img_id, size=(112,112))
        latend_id = model.netArc(img_id_downsample)
        latend_id = F.normalize(latend_id, p=2, dim=1)

        video_swap(opt.video_path, latend_id, model, app, opt.output_path, temp_results_dir=opt.temp_path,\
            no_simswaplogo=opt.no_simswaplogo, use_mask=opt.use_mask, crop_size=crop_size, new_model=opt.new_model)

```

**and `videoswap.py`**

```
'''
Author: Naiyuan liu
Github: https://github.com/NNNNAI
Date: 2021-11-23 17:03:58
LastEditors: Naiyuan liu
LastEditTime: 2021-11-24 19:19:52
Description: 
'''
import os 
import cv2
import glob
import torch
import shutil
import numpy as np
from tqdm import tqdm
from util.reverse2original import reverse2wholeimage
from moviepy.editor import AudioFileClip, VideoFileClip
from moviepy.video.io.ImageSequenceClip import ImageSequenceClip
from util.add_watermark import watermark_image
from util.norm import SpecificNorm
from util.swap_new_model import swap_result_new_model
from parsing_model.model import BiSeNet


def _totensor(array):
    tensor = torch.from_numpy(array)
    img = tensor.transpose(0, 1).transpose(0, 2).contiguous()
    return img.float().div(255)


def video_swap(video_path, id_vetor, swap_model, detect_model, save_path, temp_results_dir='./temp_results', crop_size=224, no_simswaplogo=False, use_mask=False, new_model=False):
    video_forcheck = VideoFileClip(video_path)
    if video_forcheck.audio is None:
        no_audio = True
    else:
        no_audio = False

    del video_forcheck

    if not no_audio:
        video_audio_clip = AudioFileClip(video_path)

    video = cv2.VideoCapture(video_path)
    logoclass = watermark_image('./simswaplogo/simswaplogo.png')
    ret = True
    frame_index = 0

    frame_count = int(video.get(cv2.CAP_PROP_FRAME_COUNT))

    # video_WIDTH = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))

    # video_HEIGHT = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
    
    fps = video.get(cv2.CAP_PROP_FPS)
    if os.path.exists(temp_results_dir):
        shutil.rmtree(temp_results_dir)

    spNorm = SpecificNorm()
    if use_mask:
        n_classes = 19
        net = BiSeNet(n_classes=n_classes)
        net.cuda()
        save_pth = os.path.join('./parsing_model/checkpoint', '79999_iter.pth')
        net.load_state_dict(torch.load(save_pth))
        net.eval()
    else:
        net = None

    for frame_index in tqdm(range(frame_count)):
        ret, frame = video.read()
        if ret:
            detect_results = detect_model.get(frame, crop_size)

            if detect_results is not None:
                # print(frame_index)
                if not os.path.exists(temp_results_dir):
                    os.mkdir(temp_results_dir)
                frame_align_crop_list = detect_results[0]
                frame_mat_list = detect_results[1]
                swap_result_list = []
                frame_align_crop_tenor_list = []
                for frame_align_crop in frame_align_crop_list:

                    # BGR TO RGB
                    # frame_align_crop_RGB = frame_align_crop[...,::-1]

                    frame_align_crop_tenor = _totensor(cv2.cvtColor(frame_align_crop,cv2.COLOR_BGR2RGB))[None,...].cuda()

                    if new_model:
                        swap_result = swap_result_new_model(frame_align_crop, swap_model, id_vetor)
                    else:
                        swap_result = swap_model(None, frame_align_crop_tenor, id_vetor, None, True)[0]

                    cv2.imwrite(os.path.join(temp_results_dir, 'frame_{:0>7d}.jpg'.format(frame_index)), frame)
                    swap_result_list.append(swap_result)
                    frame_align_crop_tenor_list.append(frame_align_crop_tenor)

                    

                # note: 'pasring_model' is the (misspelled) keyword name in util/reverse2original.py
                reverse2wholeimage(frame_align_crop_tenor_list, swap_result_list, frame_mat_list, crop_size, frame, logoclass,
                    os.path.join(temp_results_dir, 'frame_{:0>7d}.jpg'.format(frame_index)), no_simswaplogo, pasring_model=net, use_mask=use_mask, norm=spNorm)

            else:
                if not os.path.exists(temp_results_dir):
                    os.mkdir(temp_results_dir)
                frame = frame.astype(np.uint8)
                if not no_simswaplogo:
                    frame = logoclass.apply_frames(frame)
                cv2.imwrite(os.path.join(temp_results_dir, 'frame_{:0>7d}.jpg'.format(frame_index)), frame)
        else:
            break

    video.release()

    # image_filename_list = []
    path = os.path.join(temp_results_dir,'*.jpg')
    image_filenames = sorted(glob.glob(path))

    clips = ImageSequenceClip(image_filenames,fps = fps)

    if not no_audio:
        clips = clips.set_audio(video_audio_clip)


    clips.write_videofile(save_path, audio_codec='aac')
```
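For completeness, a direct-call sketch of the modified `video_swap` (values mirror the command example below; the test script wires them in from `opt`):

```
video_swap('./demo_file/multi_people_1080p.mp4', latend_id, model, app,
           './output/multi_test_swapsingle.mp4', temp_results_dir='./temp_results',
           crop_size=512, no_simswaplogo=True, use_mask=True, new_model=True)
```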

Next, as a reference, I took the 512 checkpoint posted by @mittalgovind ([link](https://github.com/neuralchen/SimSwap/issues/255#issuecomment-1118983049)); it has 390000 iterations. Then, in the **checkpoints** folder, I created a folder **simswap_512_test** and copied the necessary files to the root of this folder (the expected layout is sketched below).
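I am assuming the usual `<iteration>_net_G.pth` naming that `fsModel` looks for when loading; the exact file names depend on how the checkpoint was saved, so treat this layout as a sketch:

```
checkpoints/
└── simswap_512_test/
    └── 390000_net_G.pth   # generator weights matching --which_epoch 390000 (assumed name)
```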

Full example of the command:

For video:

```
python test_video_swapsingle.py --which_epoch 390000 --new_model True --checkpoints_dir './checkpoints/simswap_512_test' --isTrain false --crop_size 512 --Arc_path arcface_model/arcface_checkpoint.tar --pic_a_path ./demo_file/Iron_man.jpg --video_path ./demo_file/multi_people_1080p.mp4 --output_path ./output/multi_test_swapsingle.mp4 --temp_path ./temp_results --no_simswaplogo --use_mask
```

For image:

```
python test_wholeimage_swapsingle.py --which_epoch 390000 --new_model True --checkpoints_dir './checkpoints/simswap_512_test' --Arc_path arcface_model/arcface_checkpoint.tar --pic_a_path ./demo_file/Iron_man.jpg --pic_b_path ./demo_file/multi_people.jpg --output_path ./output --isTrain false --crop_size 512 --use_mask --no_simswaplogo
```

**All these explanations are for people who have at least a little experience modifying SimSwap files. Please check this code and the examples carefully; I may have made a typo somewhere.**

**Results:**

![res](https://user-images.githubusercontent.com/81887288/176904152-0e33fd03-5151-46ac-abb7-2314c388c1db.jpg)
Also, if you change `mode = 'ffhq'` to `mode = 'None'` in test_wholeimage_swapsingle and test_video_swapsingle, the result looks more natural (the one-line change is shown below):
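```
# in test_wholeimage_swapsingle.py and test_video_swapsingle.py
if crop_size == 512:
    ...
    mode = 'None'  # was: mode = 'ffhq'
```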
![res(1)](https://user-images.githubusercontent.com/81887288/176904391-e8a4aac5-d272-4e12-8e5c-a5f3b14a2263.jpg)
![result_whole_swapsingle(4)](https://user-images.githubusercontent.com/81887288/176904613-f856b0a8-bdb3-4157-a9a5-f3494d90b169.jpg)
![frame_0000000(1)](https://user-images.githubusercontent.com/81887288/176904666-0d80e847-db9e-433b-911d-6dfd35c652bd.jpg)
![frame_0000000](https://user-images.githubusercontent.com/81887288/176904710-c7f94791-5791-449b-9230-f855fbea9ff1.jpg)

https://user-images.githubusercontent.com/81887288/176904847-ba3b71f1-b1d3-4208-94b3-98361c0eaac2.mp4

https://user-images.githubusercontent.com/81887288/176905239-096e33a5-6d80-457a-903b-9e3fb92e302f.mp4

zwang970201 commented 2022-07-02 23:58:13 +02:00 (Migrated from github.com)

My output looks like this; did you run into similar problems?

![result_whole_swapsingle](https://user-images.githubusercontent.com/105756635/177017219-adf5480b-b5af-4596-b2c4-9eb91f41fc94.jpg)

MARCOCHEUNG0124 commented 2022-08-03 04:21:19 +02:00 (Migrated from github.com)

THANK YOU SO MUCH FOR PROVIDING THESE CODES !!

BbChip0103 commented 2022-09-15 07:58:41 +02:00 (Migrated from github.com)

Really, thank you for your great work!
It works well.

renmengyuan commented 2023-04-06 08:22:28 +02:00 (Migrated from github.com)

Hi, as you said, **_if you change mode = 'ffhq' to mode = 'None' in test_wholeimage_swapsingle and test_video_swapsingle, it looks more natural_**.

I am confused: as you said, ffhq_face_aligned was used when you trained the model, so will arc_face_align be better than ffhq_face_align when you test the model?

netrunner-exe commented 2023-04-06 09:28:42 +02:00 (Migrated from github.com)

> Hi, as you said, **_if you change mode = 'ffhq' to mode = 'None' in test_wholeimage_swapsingle and test_video_swapsingle, it looks more natural_**.
>
> I am confused: as you said, ffhq_face_aligned was used when you trained the model, so will arc_face_align be better than ffhq_face_align when you test the model?

I don't think I said anywhere that I used ffhq_face_aligned to train the model (as you say). Moreover, I didn't train the model at all; I used a model that another user posted for the test. In this case, 'None' or 'ffhq' means the mode: exactly how to crop and align the face before sending it on for further processing. Which mode to use depends on how the dataset the model was trained on was cropped and aligned.

renmengyuan commented 2023-04-06 09:52:50 +02:00 (Migrated from github.com)

> > Hi, as you said, **_if you change mode = 'ffhq' to mode = 'None' in test_wholeimage_swapsingle and test_video_swapsingle, it looks more natural_**.
> >
> > I am confused: as you said, ffhq_face_aligned was used when you trained the model, so will arc_face_align be better than ffhq_face_align when you test the model?
>
> I don't think I said anywhere that I used ffhq_face_aligned to train the model (as you say). Moreover, I didn't train the model at all; I used a model that another user posted for the test. In this case, 'None' or 'ffhq' means the mode: exactly how to crop and align the face before sending it on for further processing. Which mode to use depends on how the dataset the model was trained on was cropped and aligned.

I see; maybe the model that the other user posted was trained using arc_face_align.
