This commit is contained in:
chenxuanhong
2022-03-19 00:43:03 +08:00
parent f65c0dfa09
commit 489a6c5f68
49 changed files with 10736 additions and 5 deletions
@@ -0,0 +1,5 @@
**__pycache__/
.vscode
bak*/
work_dirs/
models/
@@ -0,0 +1,136 @@
# Distributed Arcface Training in Pytorch
This is a deep learning library for efficient and effective face recognition training. It can train on tens of millions
of identities on a single server.
## Requirements
- Install [PyTorch](http://pytorch.org) (torch>=1.6.0); see [install.md](docs/install.md).
- (Optional) Install [DALI](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/); see [install_dali.md](docs/install_dali.md).
- `pip install -r requirement.txt`.
## How to Train
To train a model, run `train.py` with the path to the configs.
The example commands below show how to run
distributed training.
### 1. To run on a machine with 8 GPUs:
```shell
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port=12581 train.py configs/ms1mv3_r50_lr02
```
### 2. To run on 2 machines with 8 GPUs each:
Node 0:
```shell
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=2 --node_rank=0 --master_addr="ip1" --master_port=12581 train.py configs/webface42m_r100_lr01_pfc02_bs4k_16gpus
```
Node 1:
```shell
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=2 --node_rank=1 --master_addr="ip1" --master_port=12581 train.py configs/webface42m_r100_lr01_pfc02_bs4k_16gpus
```
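The launcher flags above fix each process's position in the job. As a rough sketch in plain Python (no PyTorch required), the world size and global rank follow from `--nnodes`, `--node_rank`, and `--nproc_per_node`. The helper names `world_size` and `global_rank` are illustrative and are not part of `torch.distributed.launch`.

```python
def world_size(nnodes: int, nproc_per_node: int) -> int:
    """Total number of training processes across all machines."""
    return nnodes * nproc_per_node


def global_rank(node_rank: int, nproc_per_node: int, local_rank: int) -> int:
    """Global rank of the process driving GPU `local_rank` on node `node_rank`."""
    return node_rank * nproc_per_node + local_rank


if __name__ == "__main__":
    # Two machines with 8 GPUs each, as in example 2 above.
    print("world size:", world_size(nnodes=2, nproc_per_node=8))
    for node in range(2):
        ranks = [global_rank(node, 8, gpu) for gpu in range(8)]
        print("node", node, "global ranks:", ranks)
```

With this layout, node 0 hosts global ranks 0 to 7 and node 1 hosts ranks 8 to 15; both nodes must pass the same `--master_addr` and `--master_port`.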
## Download Datasets or Prepare Datasets
- [MS1MV3](https://github.com/deepinsight/insightface/tree/master/recognition/_datasets_#ms1m-retinaface) (93k IDs, 5.2M images)
- [Glint360K](https://github.com/deepinsight/insightface/tree/master/recognition/partial_fc#4-download) (360k IDs, 17.1M images)
- [WebFace42M](docs/prepare_webface42m.md) (2M IDs, 42.5M images)
## Model Zoo
- The models are available for non-commercial research purposes only.
- All models can be found here:
- [Baidu Yun Pan](https://pan.baidu.com/s/1CL-l4zWqsI1oDuEEYVhj-g): e8pw
- [OneDrive](https://1drv.ms/u/s!AswpsDO2toNKq0lWY69vN58GR6mw?e=p9Ov5d)
### Performance on IJB-C and [**ICCV2021-MFR**](https://github.com/deepinsight/insightface/blob/master/challenges/mfr/README.md)
The ICCV2021-MFR test set consists of non-celebrities, so it has very little overlap with publicly available face
recognition training sets such as MS1M and CASIA, which were mostly collected from online celebrities.
As a result, it allows a fair comparison of different algorithms.
For the **ICCV2021-MFR-ALL** set, TAR is measured on the all-to-all 1:1 protocol, with FAR less than 0.000001 (1e-6). The
globalised multi-racial test set contains 242,143 identities and 1,624,305 images.
| Datasets | Backbone | **MFR-ALL** | IJB-C(1E-4) | IJB-C(1E-5) | Training Throughput | log |
|:-------------------------|:-----------|:------------|:------------|:------------|:--------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| MS1MV3 | mobileface | 65.76 | 94.44 | 91.85 | ~13000 | [log](https://raw.githubusercontent.com/anxiangsir/insightface_arcface_log/master/ms1mv3_mobileface_lr02/training.log)\|[config](configs/ms1mv3_mobileface_lr02.py) |
| Glint360K | mobileface | 69.83 | 95.17 | 92.58 | ~11000 | [log](https://raw.githubusercontent.com/anxiangsir/insightface_arcface_log/master/glint360k_mobileface_lr02_bs4k/training.log)\|[config](configs/glint360k_mobileface_lr02_bs4k.py) |
| WebFace42M-PartialFC-0.2 | mobileface | 73.80 | 95.40 | 92.64 | (16GPUs)~18583 | [log](https://raw.githubusercontent.com/anxiangsir/insightface_arcface_log/master/webface42m_mobilefacenet_pfc02_bs8k_16gpus/training.log)\|[config](configs/webface42m_mobilefacenet_pfc02_bs8k_16gpus.py) |
| MS1MV3 | r100 | 83.23 | 96.88 | 95.31 | ~3400 | [log](https://raw.githubusercontent.com/anxiangsir/insightface_arcface_log/master/ms1mv3_r100_lr02/training.log)\|[config](configs/ms1mv3_r100_lr02.py) |
| Glint360K | r100 | 90.86 | 97.53 | 96.43 | ~5000 | [log](https://raw.githubusercontent.com/anxiangsir/insightface_arcface_log/master/glint360k_r100_lr02_bs4k_16gpus/training.log)\|[config](configs/glint360k_r100_lr02_bs4k_16gpus.py) |
| WebFace42M-PartialFC-0.2 | r50(bs4k) | 93.83 | 97.53 | 96.16 | (8 GPUs)~5900 | [log](https://raw.githubusercontent.com/anxiangsir/insightface_arcface_log/master/webface42m_r50_bs4k_pfc02/training.log)\|[config](configs/webface42m_r50_lr01_pfc02_bs4k_8gpus.py) |
| WebFace42M-PartialFC-0.2 | r50(bs8k) | 93.96 | 97.46 | 96.12 | (16GPUs)~11000 | [log](https://raw.githubusercontent.com/anxiangsir/insightface_arcface_log/master/webface42m_r50_lr01_pfc02_bs8k_16gpus/training.log)\|[config](configs/webface42m_r50_lr01_pfc02_bs8k_16gpus.py) |
| WebFace42M-PartialFC-0.2 | r50(bs4k) | 94.04 | 97.48 | 95.94 | (32GPUs)~17000 | log\|[config](configs/webface42m_r50_lr01_pfc02_bs4k_32gpus.py) |
| WebFace42M-PartialFC-0.2 | r100(bs4k) | 96.69 | 97.85 | 96.63 | (16GPUs)~5200 | [log](https://raw.githubusercontent.com/anxiangsir/insightface_arcface_log/master/webface42m_r100_bs4k_pfc02/training.log)\|[config](configs/webface42m_r100_lr01_pfc02_bs4k_16gpus.py) |
| WebFace42M-PartialFC-0.2 | r200 | - | - | - | - | log\|config |
`PartialFC-0.2` means that the sample rate of negative class centers is 0.2.
## Speed Benchmark
`arcface_torch` can train on large-scale face recognition training sets efficiently. When the number of
classes in the training set exceeds 1 million, the Partial FC sampling strategy reaches the same
accuracy while training several times faster and using less GPU memory.
Partial FC is a sparse variant of the model-parallel architecture for large-scale face recognition. Partial FC uses a
sparse softmax, where each batch dynamically samples a subset of class centers for training. In each iteration, only a
sparse part of the parameters is updated, which greatly reduces GPU memory use and computation. With Partial FC,
we can scale the training set to 29 million identities, the largest to date. Partial FC also supports multi-machine distributed
training and mixed-precision training.
![Image text](https://github.com/anxiangsir/insightface_arcface_log/blob/master/partial_fc_v2.png)
For more details, see
[speed_benchmark.md](docs/speed_benchmark.md) in docs.
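The sampling step described above can be sketched in a few lines. This is a simplified, single-process illustration in plain Python, not the repository's implementation (which shards class centers across GPUs with PyTorch). The function name `sample_class_centers` and its arguments are hypothetical. It keeps every class present in the batch (the positives), fills the rest of the budget with randomly chosen negative classes, and remaps the labels to indices into the sampled subset.

```python
import random


def sample_class_centers(labels, num_classes, sample_rate, seed=None):
    """Pick the class centers used for one batch.

    Returns (sampled_class_ids, remapped_labels), where each remapped
    label indexes into sampled_class_ids.
    """
    rng = random.Random(seed)
    positives = sorted(set(labels))
    # The budget is a fraction of all classes, but positives are always kept.
    num_sample = max(int(num_classes * sample_rate), len(positives))
    positive_set = set(positives)
    negative_pool = [c for c in range(num_classes) if c not in positive_set]
    negatives = rng.sample(negative_pool, num_sample - len(positives))
    sampled = sorted(positives + negatives)
    index_of = {class_id: i for i, class_id in enumerate(sampled)}
    return sampled, [index_of[y] for y in labels]


if __name__ == "__main__":
    sampled, remapped = sample_class_centers([3, 7, 3], num_classes=100,
                                             sample_rate=0.2, seed=0)
    print(len(sampled), "of 100 class centers take part in this step")
    print("remapped labels:", remapped)
```

Only the sampled rows of the classification layer take part in the softmax and receive gradients in that step, which is where the memory and compute savings come from.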
### 1. Training speed of different parallel methods (samples / second), Tesla V100 32GB * 8. (Larger is better)
`-` means training failed because of GPU memory limitations.
| Number of Identities in Dataset | Data Parallel | Model Parallel | Partial FC 0.1 |
|:--------------------------------|:--------------|:---------------|:---------------|
| 125000 | 4681 | 4824 | 5004 |
| 1400000 | **1672** | 3043 | 4738 |
| 5500000 | **-** | **1389** | 3975 |
| 8000000 | **-** | **-** | 3565 |
| 16000000 | **-** | **-** | 2679 |
| 29000000 | **-** | **-** | **1855** |
### 2. GPU memory cost of different parallel methods (MB per GPU), Tesla V100 32GB * 8. (Smaller is better)
| Number of Identities in Dataset | Data Parallel | Model Parallel | Partial FC 0.1 |
|:--------------------------------|:--------------|:---------------|:---------------|
| 125000 | 7358 | 5306 | 4868 |
| 1400000 | 32252 | 11178 | 6056 |
| 5500000 | **-** | 32188 | 9854 |
| 8000000 | **-** | **-** | 12310 |
| 16000000 | **-** | **-** | 19950 |
| 29000000 | **-** | **-** | 32324 |
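To make the two tables easier to compare, the sketch below computes how much faster and how much lighter Partial FC 0.1 is than model parallel for the rows where both ran. The numbers are copied from the tables above; the dictionary layout and the helper names are illustrative only.

```python
# Samples per second, from the speed table (None = out of memory).
SPEED = {
    125_000: {"data": 4681, "model": 4824, "pfc": 5004},
    1_400_000: {"data": 1672, "model": 3043, "pfc": 4738},
    5_500_000: {"data": None, "model": 1389, "pfc": 3975},
}

# MB per GPU, from the memory table (None = out of memory).
MEMORY = {
    125_000: {"data": 7358, "model": 5306, "pfc": 4868},
    1_400_000: {"data": 32252, "model": 11178, "pfc": 6056},
    5_500_000: {"data": None, "model": 32188, "pfc": 9854},
}


def ratio(numerator, denominator):
    """Rounded ratio, or None when either method ran out of memory."""
    if numerator is None or denominator is None:
        return None
    return round(numerator / denominator, 2)


def speedup_over_model_parallel(num_ids):
    return ratio(SPEED[num_ids]["pfc"], SPEED[num_ids]["model"])


def memory_saving_over_model_parallel(num_ids):
    return ratio(MEMORY[num_ids]["model"], MEMORY[num_ids]["pfc"])


if __name__ == "__main__":
    for num_ids in SPEED:
        print(num_ids,
              "speedup:", speedup_over_model_parallel(num_ids),
              "memory saving:", memory_saving_over_model_parallel(num_ids))
```

The gap widens as the number of identities grows: at 125k identities the two methods are nearly equal, while at 5.5M identities Partial FC 0.1 is close to three times faster and uses roughly a third of the memory.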
## Citations
```
@inproceedings{deng2019arcface,
title={Arcface: Additive angular margin loss for deep face recognition},
author={Deng, Jiankang and Guo, Jia and Xue, Niannan and Zafeiriou, Stefanos},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={4690--4699},
year={2019}
}
@inproceedings{an2020partical_fc,
title={Partial FC: Training 10 Million Identities on a Single Machine},
author={An, Xiang and Zhu, Xuhan and Xiao, Yang and Wu, Lan and Zhang, Ming and Gao, Yuan and Qin, Bin and
Zhang, Debing and Fu, Ying},
booktitle={Arxiv 2010.05222},
year={2020}
}
```
@@ -0,0 +1,25 @@
from .iresnet import iresnet18, iresnet34, iresnet50, iresnet100, iresnet200
from .mobilefacenet import get_mbf


def get_model(name, **kwargs):
    """Build a backbone by short name; extra kwargs go to the constructor."""
    # resnet
    if name == "r18":
        return iresnet18(False, **kwargs)
    elif name == "r34":
        return iresnet34(False, **kwargs)
    elif name == "r50":
        return iresnet50(False, **kwargs)
    elif name == "r100":
        return iresnet100(False, **kwargs)
    elif name == "r200":
        return iresnet200(False, **kwargs)
    elif name == "r2060":
        # imported lazily so the other backbones do not depend on this module
        from .iresnet2060 import iresnet2060
        return iresnet2060(False, **kwargs)
    elif name == "mbf":
        fp16 = kwargs.get("fp16", False)
        num_features = kwargs.get("num_features", 512)
        return get_mbf(fp16=fp16, num_features=num_features)
    else:
        raise ValueError("unknown backbone name: {!r}".format(name))
@@ -0,0 +1,186 @@
import torch
from torch import nn

__all__ = ['iresnet18', 'iresnet34', 'iresnet50', 'iresnet100', 'iresnet200']


def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes,
                     out_planes,
                     kernel_size=3,
                     stride=stride,
                     padding=dilation,
                     groups=groups,
                     bias=False,
                     dilation=dilation)


def conv1x1(in_planes, out_planes, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_planes,
                     out_planes,
                     kernel_size=1,
                     stride=stride,
                     bias=False)
class IBasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None,
                 groups=1, base_width=64, dilation=1):
        super(IBasicBlock, self).__init__()
        if groups != 1 or base_width != 64:
            raise ValueError('BasicBlock only supports groups=1 and base_width=64')
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
        self.bn1 = nn.BatchNorm2d(inplanes, eps=1e-05,)
        self.conv1 = conv3x3(inplanes, planes)
        self.bn2 = nn.BatchNorm2d(planes, eps=1e-05,)
        self.prelu = nn.PReLU(planes)
        self.conv2 = conv3x3(planes, planes, stride)
        self.bn3 = nn.BatchNorm2d(planes, eps=1e-05,)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x
        out = self.bn1(x)
        out = self.conv1(out)
        out = self.bn2(out)
        out = self.prelu(out)
        out = self.conv2(out)
        out = self.bn3(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        return out
class IResNet(nn.Module):
    fc_scale = 7 * 7

    def __init__(self,
                 block, layers, dropout=0, num_features=512, zero_init_residual=False,
                 groups=1, width_per_group=64, replace_stride_with_dilation=None, fp16=False):
        super(IResNet, self).__init__()
        self.fp16 = fp16
        self.inplanes = 64
        self.dilation = 1
        if replace_stride_with_dilation is None:
            replace_stride_with_dilation = [False, False, False]
        if len(replace_stride_with_dilation) != 3:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 3-element tuple, got {}".format(replace_stride_with_dilation))
        self.groups = groups
        self.base_width = width_per_group
        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(self.inplanes, eps=1e-05)
        self.prelu = nn.PReLU(self.inplanes)
        self.layer1 = self._make_layer(block, 64, layers[0], stride=2)
        self.layer2 = self._make_layer(block,
                                       128,
                                       layers[1],
                                       stride=2,
                                       dilate=replace_stride_with_dilation[0])
        self.layer3 = self._make_layer(block,
                                       256,
                                       layers[2],
                                       stride=2,
                                       dilate=replace_stride_with_dilation[1])
        self.layer4 = self._make_layer(block,
                                       512,
                                       layers[3],
                                       stride=2,
                                       dilate=replace_stride_with_dilation[2])
        self.bn2 = nn.BatchNorm2d(512 * block.expansion, eps=1e-05,)
        self.dropout = nn.Dropout(p=dropout, inplace=True)
        self.fc = nn.Linear(512 * block.expansion * self.fc_scale, num_features)
        self.features = nn.BatchNorm1d(num_features, eps=1e-05)
        nn.init.constant_(self.features.weight, 1.0)
        self.features.weight.requires_grad = False

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight, 0, 0.1)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, IBasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                nn.BatchNorm2d(planes * block.expansion, eps=1e-05, ),
            )
        layers = []
        layers.append(
            block(self.inplanes, planes, stride, downsample, self.groups,
                  self.base_width, previous_dilation))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(
                block(self.inplanes,
                      planes,
                      groups=self.groups,
                      base_width=self.base_width,
                      dilation=self.dilation))

        return nn.Sequential(*layers)

    def forward(self, x):
        with torch.cuda.amp.autocast(self.fp16):
            x = self.conv1(x)
            x = self.bn1(x)
            x = self.prelu(x)
            x = self.layer1(x)
            x = self.layer2(x)
            x = self.layer3(x)
            x = self.layer4(x)
            x = self.bn2(x)
            x = torch.flatten(x, 1)
            x = self.dropout(x)
        # the final projection runs in fp32 even when the trunk uses fp16
        x = self.fc(x.float() if self.fp16 else x)
        x = self.features(x)
        return x
def _iresnet(arch, block, layers, pretrained, progress, **kwargs):
    model = IResNet(block, layers, **kwargs)
    if pretrained:
        raise ValueError("pretrained weights are not supported for {}".format(arch))
    return model


def iresnet18(pretrained=False, progress=True, **kwargs):
    return _iresnet('iresnet18', IBasicBlock, [2, 2, 2, 2], pretrained,
                    progress, **kwargs)


def iresnet34(pretrained=False, progress=True, **kwargs):
    return _iresnet('iresnet34', IBasicBlock, [3, 4, 6, 3], pretrained,
                    progress, **kwargs)


def iresnet50(pretrained=False, progress=True, **kwargs):
    return _iresnet('iresnet50', IBasicBlock, [3, 4, 14, 3], pretrained,
                    progress, **kwargs)


def iresnet100(pretrained=False, progress=True, **kwargs):
    return _iresnet('iresnet100', IBasicBlock, [3, 13, 30, 3], pretrained,
                    progress, **kwargs)


def iresnet200(pretrained=False, progress=True, **kwargs):
    return _iresnet('iresnet200', IBasicBlock, [6, 26, 60, 6], pretrained,
                    progress, **kwargs)
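The constant `fc_scale = 7 * 7` in `IResNet` is the spatial size of the last feature map. The sketch below traces that size through the stem and the four stages using the standard convolution output formula. It assumes a 112x112 input crop, which is not stated in this file, and the default setting with no dilation; the helper names are illustrative.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_iresnet(input_size=112):
    """Spatial size after the stem conv and after layer1..layer4."""
    sizes = [conv_out(input_size, stride=1)]  # stem: 3x3, stride 1, padding 1
    for _ in range(4):
        # the first block of every stage downsamples with a stride-2 3x3 conv
        sizes.append(conv_out(sizes[-1], stride=2))
    return sizes


if __name__ == "__main__":
    sizes = trace_iresnet(112)
    print("spatial sizes:", sizes)
    # fc input width: 512 channels * expansion (1) * final H * final W
    print("fc in_features:", 512 * sizes[-1] * sizes[-1])
```

Under that assumption the last stage yields a 7x7 map, so the fully connected layer sees 512 * 7 * 7 = 25088 inputs. A different input resolution would change the final map size and no longer match `fc_scale`.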
@@ -0,0 +1,176 @@
import re

import torch
from torch import nn

# Compare numerically: a plain string comparison would rank "1.10.0" below "1.8.1".
_torch_version = tuple(int(p) for p in re.match(r"(\d+)\.(\d+)\.(\d+)", torch.__version__).groups())
assert _torch_version >= (1, 8, 1), "iresnet2060 requires torch>=1.8.1"
from torch.utils.checkpoint import checkpoint_sequential

__all__ = ['iresnet2060']


def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes,
                     out_planes,
                     kernel_size=3,
                     stride=stride,
                     padding=dilation,
                     groups=groups,
                     bias=False,
                     dilation=dilation)


def conv1x1(in_planes, out_planes, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_planes,
                     out_planes,
                     kernel_size=1,
                     stride=stride,
                     bias=False)
class IBasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None,
                 groups=1, base_width=64, dilation=1):
        super(IBasicBlock, self).__init__()
        if groups != 1 or base_width != 64:
            raise ValueError('BasicBlock only supports groups=1 and base_width=64')
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
        self.bn1 = nn.BatchNorm2d(inplanes, eps=1e-05, )
        self.conv1 = conv3x3(inplanes, planes)
        self.bn2 = nn.BatchNorm2d(planes, eps=1e-05, )
        self.prelu = nn.PReLU(planes)
        self.conv2 = conv3x3(planes, planes, stride)
        self.bn3 = nn.BatchNorm2d(planes, eps=1e-05, )
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x
        out = self.bn1(x)
        out = self.conv1(out)
        out = self.bn2(out)
        out = self.prelu(out)
        out = self.conv2(out)
        out = self.bn3(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        return out
class IResNet(nn.Module):
    fc_scale = 7 * 7

    def __init__(self,
                 block, layers, dropout=0, num_features=512, zero_init_residual=False,
                 groups=1, width_per_group=64, replace_stride_with_dilation=None, fp16=False):
        super(IResNet, self).__init__()
        self.fp16 = fp16
        self.inplanes = 64
        self.dilation = 1
        if replace_stride_with_dilation is None:
            replace_stride_with_dilation = [False, False, False]
        if len(replace_stride_with_dilation) != 3:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 3-element tuple, got {}".format(replace_stride_with_dilation))
        self.groups = groups
        self.base_width = width_per_group
        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(self.inplanes, eps=1e-05)
        self.prelu = nn.PReLU(self.inplanes)
        self.layer1 = self._make_layer(block, 64, layers[0], stride=2)
        self.layer2 = self._make_layer(block,
                                       128,
                                       layers[1],
                                       stride=2,
                                       dilate=replace_stride_with_dilation[0])
        self.layer3 = self._make_layer(block,
                                       256,
                                       layers[2],
                                       stride=2,
                                       dilate=replace_stride_with_dilation[1])
        self.layer4 = self._make_layer(block,
                                       512,
                                       layers[3],
                                       stride=2,
                                       dilate=replace_stride_with_dilation[2])
        self.bn2 = nn.BatchNorm2d(512 * block.expansion, eps=1e-05, )
        self.dropout = nn.Dropout(p=dropout, inplace=True)
        self.fc = nn.Linear(512 * block.expansion * self.fc_scale, num_features)
        self.features = nn.BatchNorm1d(num_features, eps=1e-05)
        nn.init.constant_(self.features.weight, 1.0)
        self.features.weight.requires_grad = False

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight, 0, 0.1)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, IBasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                nn.BatchNorm2d(planes * block.expansion, eps=1e-05, ),
            )
        layers = []
        layers.append(
            block(self.inplanes, planes, stride, downsample, self.groups,
                  self.base_width, previous_dilation))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(
                block(self.inplanes,
                      planes,
                      groups=self.groups,
                      base_width=self.base_width,
                      dilation=self.dilation))

        return nn.Sequential(*layers)

    def checkpoint(self, func, num_seg, x):
        # activation checkpointing only pays off during training
        if self.training:
            return checkpoint_sequential(func, num_seg, x)
        else:
            return func(x)

    def forward(self, x):
        with torch.cuda.amp.autocast(self.fp16):
            x = self.conv1(x)
            x = self.bn1(x)
            x = self.prelu(x)
            x = self.layer1(x)
            x = self.checkpoint(self.layer2, 20, x)
            x = self.checkpoint(self.layer3, 100, x)
            x = self.layer4(x)
            x = self.bn2(x)
            x = torch.flatten(x, 1)
            x = self.dropout(x)
        # the final projection runs in fp32 even when the trunk uses fp16
        x = self.fc(x.float() if self.fp16 else x)
        x = self.features(x)
        return x
def _iresnet(arch, block, layers, pretrained, progress, **kwargs):
    model = IResNet(block, layers, **kwargs)
    if pretrained:
        raise ValueError("pretrained weights are not supported for {}".format(arch))
    return model


def iresnet2060(pretrained=False, progress=True, **kwargs):
    return _iresnet('iresnet2060', IBasicBlock, [3, 128, 1024 - 128, 3], pretrained, progress, **kwargs)
@@ -0,0 +1,130 @@
'''
Adapted from https://github.com/cavalleria/cavaface.pytorch/blob/master/backbone/mobilefacenet.py
Original author cavalleria
'''

import torch.nn as nn
from torch.nn import Linear, Conv2d, BatchNorm1d, BatchNorm2d, PReLU, Sequential, Module
import torch


class Flatten(Module):
    def forward(self, x):
        return x.view(x.size(0), -1)


class ConvBlock(Module):
    def __init__(self, in_c, out_c, kernel=(1, 1), stride=(1, 1), padding=(0, 0), groups=1):
        super(ConvBlock, self).__init__()
        self.layers = nn.Sequential(
            Conv2d(in_c, out_c, kernel, groups=groups, stride=stride, padding=padding, bias=False),
            BatchNorm2d(num_features=out_c),
            PReLU(num_parameters=out_c)
        )

    def forward(self, x):
        return self.layers(x)


class LinearBlock(Module):
    def __init__(self, in_c, out_c, kernel=(1, 1), stride=(1, 1), padding=(0, 0), groups=1):
        super(LinearBlock, self).__init__()
        self.layers = nn.Sequential(
            Conv2d(in_c, out_c, kernel, stride, padding, groups=groups, bias=False),
            BatchNorm2d(num_features=out_c)
        )

    def forward(self, x):
        return self.layers(x)
class DepthWise(Module):
    def __init__(self, in_c, out_c, residual=False, kernel=(3, 3), stride=(2, 2), padding=(1, 1), groups=1):
        super(DepthWise, self).__init__()
        self.residual = residual
        self.layers = nn.Sequential(
            ConvBlock(in_c, out_c=groups, kernel=(1, 1), padding=(0, 0), stride=(1, 1)),
            ConvBlock(groups, groups, groups=groups, kernel=kernel, padding=padding, stride=stride),
            LinearBlock(groups, out_c, kernel=(1, 1), padding=(0, 0), stride=(1, 1))
        )

    def forward(self, x):
        short_cut = None
        if self.residual:
            short_cut = x
        x = self.layers(x)
        if self.residual:
            output = short_cut + x
        else:
            output = x
        return output


class Residual(Module):
    def __init__(self, c, num_block, groups, kernel=(3, 3), stride=(1, 1), padding=(1, 1)):
        super(Residual, self).__init__()
        modules = []
        for _ in range(num_block):
            modules.append(DepthWise(c, c, True, kernel, stride, padding, groups))
        self.layers = Sequential(*modules)

    def forward(self, x):
        return self.layers(x)


class GDC(Module):
    def __init__(self, embedding_size):
        super(GDC, self).__init__()
        self.layers = nn.Sequential(
            LinearBlock(512, 512, groups=512, kernel=(7, 7), stride=(1, 1), padding=(0, 0)),
            Flatten(),
            Linear(512, embedding_size, bias=False),
            BatchNorm1d(embedding_size))

    def forward(self, x):
        return self.layers(x)
class MobileFaceNet(Module):
    def __init__(self, fp16=False, num_features=512):
        super(MobileFaceNet, self).__init__()
        scale = 2
        self.fp16 = fp16
        self.layers = nn.Sequential(
            ConvBlock(3, 64 * scale, kernel=(3, 3), stride=(2, 2), padding=(1, 1)),
            ConvBlock(64 * scale, 64 * scale, kernel=(3, 3), stride=(1, 1), padding=(1, 1), groups=64),
            DepthWise(64 * scale, 64 * scale, kernel=(3, 3), stride=(2, 2), padding=(1, 1), groups=128),
            Residual(64 * scale, num_block=4, groups=128, kernel=(3, 3), stride=(1, 1), padding=(1, 1)),
            DepthWise(64 * scale, 128 * scale, kernel=(3, 3), stride=(2, 2), padding=(1, 1), groups=256),
            Residual(128 * scale, num_block=6, groups=256, kernel=(3, 3), stride=(1, 1), padding=(1, 1)),
            DepthWise(128 * scale, 128 * scale, kernel=(3, 3), stride=(2, 2), padding=(1, 1), groups=512),
            Residual(128 * scale, num_block=2, groups=256, kernel=(3, 3), stride=(1, 1), padding=(1, 1)),
        )
        self.conv_sep = ConvBlock(128 * scale, 512, kernel=(1, 1), stride=(1, 1), padding=(0, 0))
        self.features = GDC(num_features)
        self._initialize_weights()

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    m.bias.data.zero_()

    def forward(self, x):
        with torch.cuda.amp.autocast(self.fp16):
            x = self.layers(x)
        # the head runs in fp32 even when the trunk uses fp16
        x = self.conv_sep(x.float() if self.fp16 else x)
        x = self.features(x)
        return x


def get_mbf(fp16, num_features):
    return MobileFaceNet(fp16, num_features)
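To illustrate why the `DepthWise` block is cheap, the sketch below counts its convolution weights and compares them with a single standard 3x3 convolution of the same width. It follows the three-layer structure shown above (1x1 expand to `groups` channels, per-channel 3x3, 1x1 project) and ignores BatchNorm and PReLU parameters. The function names are illustrative.

```python
def depthwise_block_weights(in_c, out_c, groups, kernel=3):
    """Convolution weights in one DepthWise block (no BN or PReLU)."""
    expand = in_c * groups                # 1x1 ConvBlock: in_c -> groups channels
    depthwise = groups * kernel * kernel  # one kxk filter per channel
    project = groups * out_c              # 1x1 LinearBlock: groups -> out_c channels
    return expand + depthwise + project


def standard_conv_weights(in_c, out_c, kernel=3):
    """Weights of a dense kxk convolution without bias."""
    return in_c * out_c * kernel * kernel


if __name__ == "__main__":
    # One block of Residual(64 * scale, groups=128) with scale = 2.
    block = depthwise_block_weights(in_c=128, out_c=128, groups=128)
    dense = standard_conv_weights(in_c=128, out_c=128)
    print("DepthWise block weights:", block)
    print("dense 3x3 conv weights:", dense)
    print("ratio:", round(dense / block, 2))
```

For the 128-channel residual blocks, the separable block needs roughly a quarter of the weights of one dense 3x3 convolution.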
@@ -0,0 +1,22 @@
from easydict import EasyDict as edict
# config for speed testing
config = edict()
config.loss = "cosface"
config.network = "r50"
config.resume = False
config.output = None
config.embedding_size = 512
config.sample_rate = 0.99
config.fp16 = True
config.momentum = 0.9
config.weight_decay = 5e-4
config.batch_size = 64 # total_batch_size = batch_size * num_gpus
config.lr = 0.1 # batch size is 512
config.rec = "synthetic"
config.num_classes = 300 * 10000
config.num_epoch = 30
config.warmup_epoch = -1
config.val_targets = []
@@ -0,0 +1,47 @@
from easydict import EasyDict as edict
# make training faster
# our RAM is 256G
# mount -t tmpfs -o size=140G tmpfs /train_tmp
config = edict()
config.loss = "arcface"
config.network = "r50"
config.resume = False
config.output = "ms1mv3_arcface_r50"
config.embedding_size = 512
config.sample_rate = 1
config.fp16 = False
config.momentum = 0.9
config.weight_decay = 5e-4
config.batch_size = 128
config.lr = 0.1 # batch size is 512
config.dali = False
config.verbose = 2000
config.frequent = 10
config.score = None
# if config.dataset == "emore":
#     config.rec = "/train_tmp/faces_emore"
#     config.num_classes = 85742
#     config.num_image = 5822653
#     config.num_epoch = 16
#     config.warmup_epoch = -1
#     config.val_targets = ["lfw", ]
# elif config.dataset == "ms1m-retinaface-t1":
#     config.rec = "/train_tmp/ms1m-retinaface-t1"
#     config.num_classes = 93431
#     config.num_image = 5179510
#     config.num_epoch = 25
#     config.warmup_epoch = -1
#     config.val_targets = ["lfw", "cfp_fp", "agedb_30"]
# elif config.dataset == "glint360k":
#     config.rec = "/train_tmp/glint360k"
#     config.num_classes = 360232
#     config.num_image = 17091657
#     config.num_epoch = 20
#     config.warmup_epoch = -1
#     config.val_targets = ["lfw", "cfp_fp", "agedb_30"]
@@ -0,0 +1,27 @@
from easydict import EasyDict as edict
# make training faster
# our RAM is 256G
# mount -t tmpfs -o size=140G tmpfs /train_tmp
config = edict()
config.loss = "cosface"
config.network = "mbf"
config.resume = False
config.output = None
config.embedding_size = 512
config.sample_rate = 1.0
config.fp16 = True
config.momentum = 0.9
config.weight_decay = 1e-4
config.batch_size = 512
config.lr = 0.4
config.verbose = 5000
config.dali = False
config.rec = "/train_tmp/glint360k"
config.num_classes = 360232
config.num_image = 17091657
config.num_epoch = 20
config.warmup_epoch = 2
config.val_targets = ['lfw', 'cfp_fp', "agedb_30"]
@@ -0,0 +1,27 @@
from easydict import EasyDict as edict
# make training faster
# our RAM is 256G
# mount -t tmpfs -o size=140G tmpfs /train_tmp
config = edict()
config.loss = "cosface"
config.network = "r100"
config.resume = False
config.output = None
config.embedding_size = 512
config.sample_rate = 1.0
config.fp16 = True
config.momentum = 0.9
config.weight_decay = 5e-4
config.batch_size = 256
config.lr = 0.4
config.verbose = 5000
config.dali = False
config.rec = "/train_tmp/glint360k"
config.num_classes = 360232
config.num_image = 17091657
config.num_epoch = 20
config.warmup_epoch = 2
config.val_targets = ['lfw', 'cfp_fp', "agedb_30"]
@@ -0,0 +1,27 @@
from easydict import EasyDict as edict
# make training faster
# our RAM is 256G
# mount -t tmpfs -o size=140G tmpfs /train_tmp
config = edict()
config.loss = "arcface"
config.network = "mbf"
config.resume = False
config.output = None
config.embedding_size = 512
config.sample_rate = 1.0
config.fp16 = True
config.momentum = 0.9
config.weight_decay = 1e-4
config.batch_size = 256
config.lr = 0.2
config.verbose = 5000
config.dali = False
config.rec = "/train_tmp/ms1m-retinaface-t1"
config.num_classes = 93431
config.num_image = 5179510
config.num_epoch = 40
config.warmup_epoch = 2
config.val_targets = ['lfw', 'cfp_fp', "agedb_30"]
@@ -0,0 +1,27 @@
from easydict import EasyDict as edict
# make training faster
# our RAM is 256G
# mount -t tmpfs -o size=140G tmpfs /train_tmp
config = edict()
config.loss = "arcface"
config.network = "r100"
config.resume = False
config.output = None
config.embedding_size = 512
config.sample_rate = 1.0
config.fp16 = True
config.momentum = 0.9
config.weight_decay = 5e-4
config.batch_size = 128
config.lr = 0.2
config.verbose = 2000
config.dali = False
config.rec = "/train_tmp/ms1m-retinaface-t1"
config.num_classes = 93431
config.num_image = 5179510
config.num_epoch = 25
config.warmup_epoch = 0
config.val_targets = ['lfw', 'cfp_fp', "agedb_30"]
@@ -0,0 +1,27 @@
from easydict import EasyDict as edict
# make training faster
# our RAM is 256G
# mount -t tmpfs -o size=140G tmpfs /train_tmp
config = edict()
config.loss = "arcface"
config.network = "r50"
config.resume = False
config.output = None
config.embedding_size = 512
config.sample_rate = 1.0
config.fp16 = True
config.momentum = 0.9
config.weight_decay = 5e-4
config.batch_size = 128
config.lr = 0.2
config.verbose = 2000
config.dali = False
config.rec = "/train_tmp/ms1m-retinaface-t1"
config.num_classes = 93431
config.num_image = 5179510
config.num_epoch = 25
config.warmup_epoch = 2
config.val_targets = ['lfw', 'cfp_fp', "agedb_30"]
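As a worked example of how these fields combine, the sketch below derives the total batch size and step counts from the config values above. It assumes 8 GPUs (matching the single-machine launch command in the README) and assumes that steps are counted as `num_image // total_batch_size` per epoch; the training script is not shown here, so treat the formula and the helper name `schedule` as illustrative.

```python
def schedule(num_image, batch_size, num_gpus, num_epoch, warmup_epoch):
    """Derive batch and step counts from per-GPU config values."""
    total_batch_size = batch_size * num_gpus  # batch_size in the config is per GPU
    steps_per_epoch = num_image // total_batch_size
    return {
        "total_batch_size": total_batch_size,
        "steps_per_epoch": steps_per_epoch,
        "warmup_steps": steps_per_epoch * warmup_epoch,
        "total_steps": steps_per_epoch * num_epoch,
    }


if __name__ == "__main__":
    # Values from the MS1MV3 r50 config above; num_gpus=8 is an assumption.
    plan = schedule(num_image=5179510, batch_size=128, num_gpus=8,
                    num_epoch=25, warmup_epoch=2)
    for key, value in plan.items():
        print(key, "=", value)
```

Changing the number of GPUs changes the total batch size, so the learning rate in the config may need to be adjusted accordingly.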
@@ -0,0 +1,27 @@
from easydict import EasyDict as edict
# make training faster
# our RAM is 256G
# mount -t tmpfs -o size=140G tmpfs /train_tmp
config = edict()
config.loss = "cosface"
config.network = "mbf"
config.resume = False
config.output = None
config.embedding_size = 512
config.sample_rate = 0.2
config.fp16 = True
config.momentum = 0.9
config.weight_decay = 1e-4
config.batch_size = 512
config.lr = 0.4
config.verbose = 10000
config.dali = False
config.rec = "/train_tmp/WebFace42M"
config.num_classes = 2059906
config.num_image = 42474557
config.num_epoch = 20
config.warmup_epoch = 2
config.val_targets = []
@@ -0,0 +1,27 @@
from easydict import EasyDict as edict
# make training faster
# our RAM is 256G
# mount -t tmpfs -o size=140G tmpfs /train_tmp
config = edict()
config.loss = "cosface"
config.network = "r100"
config.resume = False
config.output = None
config.embedding_size = 512
config.sample_rate = 0.2
config.fp16 = True
config.momentum = 0.9
config.weight_decay = 5e-4
config.batch_size = 256
config.lr = 0.3
config.verbose = 2000
config.dali = False
config.rec = "/train_tmp/WebFace42M"
config.num_classes = 2059906
config.num_image = 42474557
config.num_epoch = 20
config.warmup_epoch = 1
config.val_targets = ["lfw", "cfp_fp", "agedb_30"]
@@ -0,0 +1,27 @@
from easydict import EasyDict as edict
# make training faster
# our RAM is 256G
# mount -t tmpfs -o size=140G tmpfs /train_tmp
config = edict()
config.loss = "cosface"
config.network = "r50"
config.resume = False
config.output = None
config.embedding_size = 512
config.sample_rate = 0.2
config.fp16 = True
config.momentum = 0.9
config.weight_decay = 5e-4
config.batch_size = 128
config.lr = 0.4
config.verbose = 10000
config.dali = False
config.rec = "/train_tmp/WebFace42M"
config.num_classes = 2059906
config.num_image = 42474557
config.num_epoch = 20
config.warmup_epoch = 2
config.val_targets = ["lfw", "cfp_fp", "agedb_30"]
@@ -0,0 +1,27 @@
from easydict import EasyDict as edict
# make training faster
# our RAM is 256G
# mount -t tmpfs -o size=140G tmpfs /train_tmp
config = edict()
config.loss = "cosface"
config.network = "r50"
config.resume = False
config.output = None
config.embedding_size = 512
config.sample_rate = 0.2
config.fp16 = True
config.momentum = 0.9
config.weight_decay = 5e-4
config.batch_size = 512
config.lr = 0.4
config.verbose = 10000
config.dali = False
config.rec = "/train_tmp/WebFace42M"
config.num_classes = 2059906
config.num_image = 42474557
config.num_epoch = 20
config.warmup_epoch = 2
config.val_targets = ["lfw", "cfp_fp", "agedb_30"]
@@ -0,0 +1,27 @@
from easydict import EasyDict as edict
# make training faster
# our RAM is 256G
# mount -t tmpfs -o size=140G tmpfs /train_tmp
config = edict()
config.loss = "cosface"
config.network = "r50"
config.resume = False
config.output = None
config.embedding_size = 512
config.sample_rate = 0.2
config.fp16 = True
config.momentum = 0.9
config.weight_decay = 5e-4
config.batch_size = 512
config.lr = 0.6
config.verbose = 10000
config.dali = False
config.rec = "/train_tmp/WebFace42M"
config.num_classes = 2059906
config.num_image = 42474557
config.num_epoch = 20
config.warmup_epoch = 4
config.val_targets = ["lfw", "cfp_fp", "agedb_30"]
@@ -0,0 +1,209 @@
import numbers
import os
import queue as Queue
import threading
from typing import Iterable

import mxnet as mx
import numpy as np
import torch
from torch import distributed
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms


def get_dataloader(
    root_dir: str,
    local_rank: int,
    batch_size: int,
    dali=False) -> Iterable:
    if dali and root_dir != "synthetic":
        rec = os.path.join(root_dir, 'train.rec')
        idx = os.path.join(root_dir, 'train.idx')
        return dali_data_iter(
            batch_size=batch_size, rec_file=rec,
            idx_file=idx, num_threads=2, local_rank=local_rank)
    else:
        if root_dir == "synthetic":
            train_set = SyntheticDataset()
        else:
            train_set = MXFaceDataset(root_dir=root_dir, local_rank=local_rank)

        train_sampler = torch.utils.data.distributed.DistributedSampler(train_set, shuffle=True)
        train_loader = DataLoaderX(
            local_rank=local_rank,
            dataset=train_set,
            batch_size=batch_size,
            sampler=train_sampler,
            num_workers=2,
            pin_memory=True,
            drop_last=True,
        )
        return train_loader
class BackgroundGenerator(threading.Thread):
    def __init__(self, generator, local_rank, max_prefetch=6):
        super(BackgroundGenerator, self).__init__()
        self.queue = Queue.Queue(max_prefetch)
        self.generator = generator
        self.local_rank = local_rank
        self.daemon = True
        self.start()

    def run(self):
        torch.cuda.set_device(self.local_rank)
        for item in self.generator:
            self.queue.put(item)
        self.queue.put(None)

    def next(self):
        next_item = self.queue.get()
        if next_item is None:
            raise StopIteration
        return next_item

    def __next__(self):
        return self.next()

    def __iter__(self):
        return self
class DataLoaderX(DataLoader):
    def __init__(self, local_rank, **kwargs):
        super(DataLoaderX, self).__init__(**kwargs)
        self.stream = torch.cuda.Stream(local_rank)
        self.local_rank = local_rank

    def __iter__(self):
        self.iter = super(DataLoaderX, self).__iter__()
        self.iter = BackgroundGenerator(self.iter, self.local_rank)
        self.preload()
        return self

    def preload(self):
        self.batch = next(self.iter, None)
        if self.batch is None:
            return None
        with torch.cuda.stream(self.stream):
            for k in range(len(self.batch)):
                self.batch[k] = self.batch[k].to(device=self.local_rank, non_blocking=True)

    def __next__(self):
        torch.cuda.current_stream().wait_stream(self.stream)
        batch = self.batch
        if batch is None:
            raise StopIteration
        self.preload()
        return batch
class MXFaceDataset(Dataset):
    def __init__(self, root_dir, local_rank):
        super(MXFaceDataset, self).__init__()
        self.transform = transforms.Compose(
            [transforms.ToPILImage(),
             transforms.RandomHorizontalFlip(),
             transforms.ToTensor(),
             transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
             ])
        self.root_dir = root_dir
        self.local_rank = local_rank
        path_imgrec = os.path.join(root_dir, 'train.rec')
        path_imgidx = os.path.join(root_dir, 'train.idx')
        self.imgrec = mx.recordio.MXIndexedRecordIO(path_imgidx, path_imgrec, 'r')
        s = self.imgrec.read_idx(0)
        header, _ = mx.recordio.unpack(s)
        if header.flag > 0:
            self.header0 = (int(header.label[0]), int(header.label[1]))
            self.imgidx = np.array(range(1, int(header.label[0])))
        else:
            self.imgidx = np.array(list(self.imgrec.keys))

    def __getitem__(self, index):
        idx = self.imgidx[index]
        s = self.imgrec.read_idx(idx)
        header, img = mx.recordio.unpack(s)
        label = header.label
        if not isinstance(label, numbers.Number):
            label = label[0]
        label = torch.tensor(label, dtype=torch.long)
        sample = mx.image.imdecode(img).asnumpy()
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, label

    def __len__(self):
        return len(self.imgidx)
class SyntheticDataset(Dataset):
    def __init__(self):
        super(SyntheticDataset, self).__init__()
        img = np.random.randint(0, 255, size=(112, 112, 3), dtype=np.int32)
        img = np.transpose(img, (2, 0, 1))
        img = torch.from_numpy(img).squeeze(0).float()
        img = ((img / 255) - 0.5) / 0.5
        self.img = img
        self.label = 1

    def __getitem__(self, index):
        return self.img, self.label

    def __len__(self):
        return 1000000
def dali_data_iter(
        batch_size: int, rec_file: str, idx_file: str, num_threads: int,
        initial_fill=32768, random_shuffle=True,
        prefetch_queue_depth=1, local_rank=0, name="reader",
        mean=(127.5, 127.5, 127.5),
        std=(127.5, 127.5, 127.5)):
    """
    Parameters:
    ----------
    initial_fill: int
        Size of the buffer that is used for shuffling. If random_shuffle is False, this parameter is ignored.
    """
    rank: int = distributed.get_rank()
    world_size: int = distributed.get_world_size()
    import nvidia.dali.fn as fn
    import nvidia.dali.types as types
    from nvidia.dali.pipeline import Pipeline
    from nvidia.dali.plugin.pytorch import DALIClassificationIterator

    pipe = Pipeline(
        batch_size=batch_size, num_threads=num_threads,
        device_id=local_rank, prefetch_queue_depth=prefetch_queue_depth, )
    condition_flip = fn.random.coin_flip(probability=0.5)
    with pipe:
        jpegs, labels = fn.readers.mxnet(
            path=rec_file, index_path=idx_file, initial_fill=initial_fill,
            num_shards=world_size, shard_id=rank,
            random_shuffle=random_shuffle, pad_last_batch=False, name=name)
        images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
        images = fn.crop_mirror_normalize(
            images, dtype=types.FLOAT, mean=mean, std=std, mirror=condition_flip)
        pipe.set_outputs(images, labels)
    pipe.build()
    return DALIWarper(DALIClassificationIterator(pipelines=[pipe], reader_name=name, ))
@torch.no_grad()
class DALIWarper(object):
    def __init__(self, dali_iter):
        self.iter = dali_iter

    def __next__(self):
        data_dict = self.iter.__next__()[0]
        tensor_data = data_dict['data'].cuda()
        tensor_label: torch.Tensor = data_dict['label'].cuda().long()
        tensor_label.squeeze_()
        return tensor_data, tensor_label

    def __iter__(self):
        return self

    def reset(self):
        self.iter.reset()
+31
View File
@@ -0,0 +1,31 @@
## Eval on ICCV2021-MFR
coming soon.
## Eval IJBC
You can evaluate IJB-C with either PyTorch or ONNX.
1. Eval IJBC with ONNX
```shell
CUDA_VISIBLE_DEVICES=0 python onnx_ijbc.py --model-root ms1mv3_arcface_r50 --image-path IJB_release/IJBC --result-dir ms1mv3_arcface_r50
```
2. Eval IJBC with PyTorch
```shell
CUDA_VISIBLE_DEVICES=0,1 python eval_ijbc.py \
--model-prefix ms1mv3_arcface_r50/backbone.pth \
--image-path IJB_release/IJBC \
--result-dir ms1mv3_arcface_r50 \
--batch-size 128 \
--job ms1mv3_arcface_r50 \
--target IJBC \
--network iresnet50
```
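IJB-C verification results are usually summarized as the true positive rate (TPR) at a fixed false positive rate (FPR). The sketch below illustrates that metric on a few made-up similarity scores using plain Python. `tpr_at_fpr` is a hypothetical helper written for illustration only; it is not part of `eval_ijbc.py`, which imports `roc_curve` from `sklearn.metrics` for its ROC computation.

```python
def tpr_at_fpr(scores, labels, target_fpr):
    """Return the highest TPR achievable while keeping FPR <= target_fpr.

    scores: similarity per pair (higher means more likely the same identity).
    labels: 1 for a genuine pair, 0 for an impostor pair.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one genuine and one impostor pair")
    best_tpr = 0.0
    # Try every observed score as an accept threshold (accept if score >= threshold).
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if fp / n_neg <= target_fpr:
            best_tpr = max(best_tpr, tp / n_pos)
    return best_tpr


# Toy data, invented for illustration only.
scores = [0.91, 0.85, 0.40, 0.78, 0.30, 0.55, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(tpr_at_fpr(scores, labels, target_fpr=0.0))
print(tpr_at_fpr(scores, labels, target_fpr=0.5))
```

This brute-force loop is quadratic in the number of pairs, which is fine for a toy example but far too slow for the millions of pairs in IJB-C; a sorted, cumulative implementation such as `roc_curve` is the practical choice there.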
## Inference
```shell
python inference.py --weight ms1mv3_arcface_r50/backbone.pth --network r50
```
+51
View File
@@ -0,0 +1,51 @@
## v1.8.0
### Linux and Windows
```shell
# CUDA 11.1
pip --default-timeout=100 install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
# CUDA 10.2
pip --default-timeout=100 install torch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0
# CPU only
pip --default-timeout=100 install torch==1.8.0+cpu torchvision==0.9.0+cpu torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
```
## v1.7.1
### Linux and Windows
```shell
# CUDA 11.0
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
# CUDA 10.2
pip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2
# CUDA 10.1
pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
# CUDA 9.2
pip install torch==1.7.1+cu92 torchvision==0.8.2+cu92 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
# CPU only
pip install torch==1.7.1+cpu torchvision==0.8.2+cpu torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
```
## v1.6.0
### Linux and Windows
```shell
# CUDA 10.2
pip install torch==1.6.0 torchvision==0.7.0
# CUDA 10.1
pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
# CUDA 9.2
pip install torch==1.6.0+cu92 torchvision==0.7.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html
# CPU only
pip install torch==1.6.0+cpu torchvision==0.7.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
```
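The wheel names above encode the release and the CUDA build in one string (for example `1.8.0+cu111`). As a rough sketch, the snippet below splits such a string and checks it against the oldest release listed here, 1.6.0. `parse_torch_version` and `meets_minimum` are hypothetical helpers, and the parsing assumes plain `major.minor.patch[+build]` strings; in practice you would pass in `torch.__version__`.

```python
def parse_torch_version(version):
    """Split a version string such as '1.8.0+cu111' into ((1, 8, 0), 'cu111')."""
    release, _, local = version.partition("+")
    numbers = tuple(int(part) for part in release.split(".")[:3])
    return numbers, local or None


def meets_minimum(version, minimum=(1, 6, 0)):
    """True if the numeric part of `version` is at least `minimum`."""
    numbers, _ = parse_torch_version(version)
    return numbers >= minimum


for candidate in ["1.8.0+cu111", "1.7.1+cu110", "1.6.0", "1.5.1+cpu"]:
    numbers, build = parse_torch_version(candidate)
    print(candidate, "->", build or "default build", "| ok:", meets_minimum(candidate))
```

Pre-release strings (for example ones containing letters in the numeric part) are outside what this sketch handles.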
+1
View File
@@ -0,0 +1 @@
TODO
+22
View File
@@ -0,0 +1,22 @@
## 1. Download Datasets and Unzip
Download WebFace42M from [https://www.face-benchmark.org/download.html](https://www.face-benchmark.org/download.html).
## 2. Create **Pre-shuffle** Rec File for DALI
Note: a pre-shuffled rec file is essential for DALI; a rec file that has not been pre-shuffled can degrade performance. The original InsightFace-style rec file
does not support NVIDIA DALI, so you must generate a pre-shuffled rec file with [mxnet.tools.im2rec](https://github.com/apache/incubator-mxnet/blob/master/tools/im2rec.py).
```shell
# 1) Create train.lst with the following command
python -m mxnet.tools.im2rec --list --recursive train "Your WebFace42M Root"
# 2) Create train.rec and train.idx from train.lst with the following command
python -m mxnet.tools.im2rec --num-thread 16 --quality 100 train "Your WebFace42M Root"
```
Finally, you will get three files: `train.lst`, `train.rec`, and `train.idx`. Only `train.idx` and `train.rec` are used for training.
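Before building the rec file, it can be worth checking that `train.lst` really is shuffled. The sketch below is one possible sanity check, not part of `im2rec`: it measures how often the label changes between adjacent lines, which is close to zero when images are grouped by identity and high when they are shuffled. It assumes each line is tab-separated as `index<TAB>label<TAB>relative_path`; `label_change_rate` and the synthetic listing are made up for illustration.

```python
import random


def label_change_rate(lst_lines):
    """Fraction of adjacent entries whose label differs.

    Assumes each line is tab-separated as index, label, relative path.
    """
    labels = [line.rstrip("\n").split("\t")[1] for line in lst_lines if line.strip()]
    if len(labels) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return changes / (len(labels) - 1)


# Synthetic listing: 4 identities with 50 images each, grouped by identity.
grouped = [f"{i}\t{float(i // 50):f}\tid_{i // 50}/{i % 50}.jpg" for i in range(200)]
shuffled = grouped[:]
random.Random(0).shuffle(shuffled)

print(f"grouped:  {label_change_rate(grouped):.3f}")
print(f"shuffled: {label_change_rate(shuffled):.3f}")
```

On a real listing you would pass the lines of `train.lst` instead of the synthetic list; a rate near zero suggests the file is still ordered by identity.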
+93
View File
@@ -0,0 +1,93 @@
## Test Training Speed
- Test Commands
Use the following two commands to measure Partial FC training performance.
The setup uses synthetic data with **3 million** identities, mixed-precision training, a ResNet-50 backbone,
and a batch size of 1024.
```shell
# Model Parallel
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port=1234 train.py configs/3millions
# Partial FC 0.1
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port=1234 train.py configs/3millions_pfc
```
- GPU Memory
```
# (Model Parallel) gpustat -i
[0] Tesla V100-SXM2-32GB | 64'C, 94 % | 30338 / 32510 MB
[1] Tesla V100-SXM2-32GB | 60'C, 99 % | 28876 / 32510 MB
[2] Tesla V100-SXM2-32GB | 60'C, 99 % | 28872 / 32510 MB
[3] Tesla V100-SXM2-32GB | 69'C, 99 % | 28872 / 32510 MB
[4] Tesla V100-SXM2-32GB | 66'C, 99 % | 28888 / 32510 MB
[5] Tesla V100-SXM2-32GB | 60'C, 99 % | 28932 / 32510 MB
[6] Tesla V100-SXM2-32GB | 68'C, 100 % | 28916 / 32510 MB
[7] Tesla V100-SXM2-32GB | 65'C, 99 % | 28860 / 32510 MB
# (Partial FC 0.1) gpustat -i
[0] Tesla V100-SXM2-32GB | 60'C, 95 % | 10488 / 32510 MB
[1] Tesla V100-SXM2-32GB | 60'C, 97 % | 10344 / 32510 MB
[2] Tesla V100-SXM2-32GB | 61'C, 95 % | 10340 / 32510 MB
[3] Tesla V100-SXM2-32GB | 66'C, 95 % | 10340 / 32510 MB
[4] Tesla V100-SXM2-32GB | 65'C, 94 % | 10356 / 32510 MB
[5] Tesla V100-SXM2-32GB | 61'C, 95 % | 10400 / 32510 MB
[6] Tesla V100-SXM2-32GB | 68'C, 96 % | 10384 / 32510 MB
[7] Tesla V100-SXM2-32GB | 64'C, 95 % | 10328 / 32510 MB
```
- Training Speed
```python
# (Model Parallel) training.log
Training: Speed 2271.33 samples/sec Loss 1.1624 LearningRate 0.2000 Epoch: 0 Global Step: 100
Training: Speed 2269.94 samples/sec Loss 0.0000 LearningRate 0.2000 Epoch: 0 Global Step: 150
Training: Speed 2272.67 samples/sec Loss 0.0000 LearningRate 0.2000 Epoch: 0 Global Step: 200
Training: Speed 2266.55 samples/sec Loss 0.0000 LearningRate 0.2000 Epoch: 0 Global Step: 250
Training: Speed 2272.54 samples/sec Loss 0.0000 LearningRate 0.2000 Epoch: 0 Global Step: 300
# (Partial FC 0.1) training.log
Training: Speed 5299.56 samples/sec Loss 1.0965 LearningRate 0.2000 Epoch: 0 Global Step: 100
Training: Speed 5296.37 samples/sec Loss 0.0000 LearningRate 0.2000 Epoch: 0 Global Step: 150
Training: Speed 5304.37 samples/sec Loss 0.0000 LearningRate 0.2000 Epoch: 0 Global Step: 200
Training: Speed 5274.43 samples/sec Loss 0.0000 LearningRate 0.2000 Epoch: 0 Global Step: 250
Training: Speed 5300.10 samples/sec Loss 0.0000 LearningRate 0.2000 Epoch: 0 Global Step: 300
```
In this test case, Partial FC 0.1 uses only about one third of the GPU memory of model parallel,
and its training throughput is about 2.3 times that of model parallel.
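As a quick check, the two ratios can be re-derived from the numbers logged above. The snippet only averages the values copied from the `gpustat` output and the training logs; nothing else is assumed.

```python
from statistics import mean

# Throughput (samples/sec) copied from the training logs above.
model_parallel = [2271.33, 2269.94, 2272.67, 2266.55, 2272.54]
partial_fc = [5299.56, 5296.37, 5304.37, 5274.43, 5300.10]

# Memory in use (MB) per GPU, copied from the gpustat output above.
model_parallel_mem = [30338, 28876, 28872, 28872, 28888, 28932, 28916, 28860]
partial_fc_mem = [10488, 10344, 10340, 10340, 10356, 10400, 10384, 10328]

speedup = mean(partial_fc) / mean(model_parallel)
memory_ratio = mean(partial_fc_mem) / mean(model_parallel_mem)

print(f"speedup:      {speedup:.2f}x")
print(f"memory ratio: {memory_ratio:.2f}")
```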
## Speed Benchmark
1. Training speed of different parallel methods (samples/second), Tesla V100 32GB * 8. (Larger is better)
| Number of Identities in Dataset | Data Parallel | Model Parallel | Partial FC 0.1 |
| :--- | :--- | :--- | :--- |
|125000 | 4681 | 4824 | 5004 |
|250000 | 4047 | 4521 | 4976 |
|500000 | 3087 | 4013 | 4900 |
|1000000 | 2090 | 3449 | 4803 |
|1400000 | 1672 | 3043 | 4738 |
|2000000 | - | 2593 | 4626 |
|4000000 | - | 1748 | 4208 |
|5500000 | - | 1389 | 3975 |
|8000000 | - | - | 3565 |
|16000000 | - | - | 2679 |
|29000000 | - | - | 1855 |
2. GPU memory cost of different parallel methods (MB per GPU), Tesla V100 32GB * 8. (Smaller is better)
| Number of Identities in Dataset | Data Parallel | Model Parallel | Partial FC 0.1 |
| :--- | :--- | :--- | :--- |
|125000 | 7358 | 5306 | 4868 |
|250000 | 9940 | 5826 | 5004 |
|500000 | 14220 | 7114 | 5202 |
|1000000 | 23708 | 9966 | 5620 |
|1400000 | 32252 | 11178 | 6056 |
|2000000 | - | 13978 | 6472 |
|4000000 | - | 23238 | 8284 |
|5500000 | - | 32188 | 9854 |
|8000000 | - | - | 12310 |
|16000000 | - | - | 19950 |
|29000000 | - | - | 32324 |
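"Partial FC 0.1" in the tables above means that each training step uses only about 10% of the class centers. The sketch below shows the sampling idea as we understand it, in plain Python on a single process: classes that appear in the batch are always kept, and the remaining slots are filled with random negative classes. `sample_class_centers` is a hypothetical helper for illustration; the real implementation works on sharded GPU tensors and is not reproduced here.

```python
import random


def sample_class_centers(batch_labels, num_classes, sample_rate, rng):
    """Pick the class indices used for one training step.

    Classes present in the batch (positives) are always kept; the remaining
    slots are filled with randomly chosen negative classes.
    """
    positives = set(batch_labels)
    num_sample = max(round(sample_rate * num_classes), len(positives))
    negatives = [c for c in range(num_classes) if c not in positives]
    chosen = sorted(positives) + rng.sample(negatives, num_sample - len(positives))
    return sorted(chosen)


rng = random.Random(0)
labels = [3, 17, 17, 42, 99]  # identities in the current batch
sampled = sample_class_centers(labels, num_classes=1000, sample_rate=0.1, rng=rng)

# Labels must be remapped to positions inside the sampled subset.
remap = {cls: pos for pos, cls in enumerate(sampled)}
print(len(sampled), [remap[y] for y in labels])
```

Because only the sampled rows take part in the logits computation, the per-step cost of the classification layer shrinks roughly in proportion to the sampling rate, which is consistent with the lower memory and higher throughput reported in the tables.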
+409
View File
@@ -0,0 +1,409 @@
"""Helper for evaluation on the Labeled Faces in the Wild dataset
"""
# MIT License
#
# Copyright (c) 2016 David Sandberg
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import datetime
import os
import pickle
import mxnet as mx
import numpy as np
import sklearn
import torch
from mxnet import ndarray as nd
from scipy import interpolate
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold
class LFold:
    def __init__(self, n_splits=2, shuffle=False):
        self.n_splits = n_splits
        if self.n_splits > 1:
            self.k_fold = KFold(n_splits=n_splits, shuffle=shuffle)

    def split(self, indices):
        if self.n_splits > 1:
            return self.k_fold.split(indices)
        else:
            return [(indices, indices)]
def calculate_roc(thresholds,
                  embeddings1,
                  embeddings2,
                  actual_issame,
                  nrof_folds=10,
                  pca=0):
    assert (embeddings1.shape[0] == embeddings2.shape[0])
    assert (embeddings1.shape[1] == embeddings2.shape[1])
    nrof_pairs = min(len(actual_issame), embeddings1.shape[0])
    nrof_thresholds = len(thresholds)
    k_fold = LFold(n_splits=nrof_folds, shuffle=False)
    tprs = np.zeros((nrof_folds, nrof_thresholds))
    fprs = np.zeros((nrof_folds, nrof_thresholds))
    accuracy = np.zeros((nrof_folds))
    indices = np.arange(nrof_pairs)
    if pca == 0:
        diff = np.subtract(embeddings1, embeddings2)
        dist = np.sum(np.square(diff), 1)
    for fold_idx, (train_set, test_set) in enumerate(k_fold.split(indices)):
        if pca > 0:
            print('doing pca on', fold_idx)
            embed1_train = embeddings1[train_set]
            embed2_train = embeddings2[train_set]
            _embed_train = np.concatenate((embed1_train, embed2_train), axis=0)
            pca_model = PCA(n_components=pca)
            pca_model.fit(_embed_train)
            embed1 = pca_model.transform(embeddings1)
            embed2 = pca_model.transform(embeddings2)
            embed1 = sklearn.preprocessing.normalize(embed1)
            embed2 = sklearn.preprocessing.normalize(embed2)
            diff = np.subtract(embed1, embed2)
            dist = np.sum(np.square(diff), 1)
        # Find the best threshold for the fold
        acc_train = np.zeros((nrof_thresholds))
        for threshold_idx, threshold in enumerate(thresholds):
            _, _, acc_train[threshold_idx] = calculate_accuracy(
                threshold, dist[train_set], actual_issame[train_set])
        best_threshold_index = np.argmax(acc_train)
        for threshold_idx, threshold in enumerate(thresholds):
            tprs[fold_idx, threshold_idx], fprs[fold_idx, threshold_idx], _ = calculate_accuracy(
                threshold, dist[test_set],
                actual_issame[test_set])
        _, _, accuracy[fold_idx] = calculate_accuracy(
            thresholds[best_threshold_index], dist[test_set],
            actual_issame[test_set])
    tpr = np.mean(tprs, 0)
    fpr = np.mean(fprs, 0)
    return tpr, fpr, accuracy
def calculate_accuracy(threshold, dist, actual_issame):
    predict_issame = np.less(dist, threshold)
    tp = np.sum(np.logical_and(predict_issame, actual_issame))
    fp = np.sum(np.logical_and(predict_issame, np.logical_not(actual_issame)))
    tn = np.sum(
        np.logical_and(np.logical_not(predict_issame),
                       np.logical_not(actual_issame)))
    fn = np.sum(np.logical_and(np.logical_not(predict_issame), actual_issame))
    tpr = 0 if (tp + fn == 0) else float(tp) / float(tp + fn)
    fpr = 0 if (fp + tn == 0) else float(fp) / float(fp + tn)
    acc = float(tp + tn) / dist.size
    return tpr, fpr, acc
def calculate_val(thresholds,
                  embeddings1,
                  embeddings2,
                  actual_issame,
                  far_target,
                  nrof_folds=10):
    assert (embeddings1.shape[0] == embeddings2.shape[0])
    assert (embeddings1.shape[1] == embeddings2.shape[1])
    nrof_pairs = min(len(actual_issame), embeddings1.shape[0])
    nrof_thresholds = len(thresholds)
    k_fold = LFold(n_splits=nrof_folds, shuffle=False)
    val = np.zeros(nrof_folds)
    far = np.zeros(nrof_folds)
    diff = np.subtract(embeddings1, embeddings2)
    dist = np.sum(np.square(diff), 1)
    indices = np.arange(nrof_pairs)
    for fold_idx, (train_set, test_set) in enumerate(k_fold.split(indices)):
        # Find the threshold that gives FAR = far_target
        far_train = np.zeros(nrof_thresholds)
        for threshold_idx, threshold in enumerate(thresholds):
            _, far_train[threshold_idx] = calculate_val_far(
                threshold, dist[train_set], actual_issame[train_set])
        if np.max(far_train) >= far_target:
            f = interpolate.interp1d(far_train, thresholds, kind='slinear')
            threshold = f(far_target)
        else:
            threshold = 0.0
        val[fold_idx], far[fold_idx] = calculate_val_far(
            threshold, dist[test_set], actual_issame[test_set])
    val_mean = np.mean(val)
    far_mean = np.mean(far)
    val_std = np.std(val)
    return val_mean, val_std, far_mean
def calculate_val_far(threshold, dist, actual_issame):
    predict_issame = np.less(dist, threshold)
    true_accept = np.sum(np.logical_and(predict_issame, actual_issame))
    false_accept = np.sum(
        np.logical_and(predict_issame, np.logical_not(actual_issame)))
    n_same = np.sum(actual_issame)
    n_diff = np.sum(np.logical_not(actual_issame))
    # print(true_accept, false_accept)
    # print(n_same, n_diff)
    val = float(true_accept) / float(n_same)
    far = float(false_accept) / float(n_diff)
    return val, far
def evaluate(embeddings, actual_issame, nrof_folds=10, pca=0):
    # Calculate evaluation metrics
    thresholds = np.arange(0, 4, 0.01)
    embeddings1 = embeddings[0::2]
    embeddings2 = embeddings[1::2]
    tpr, fpr, accuracy = calculate_roc(thresholds,
                                       embeddings1,
                                       embeddings2,
                                       np.asarray(actual_issame),
                                       nrof_folds=nrof_folds,
                                       pca=pca)
    thresholds = np.arange(0, 4, 0.001)
    val, val_std, far = calculate_val(thresholds,
                                      embeddings1,
                                      embeddings2,
                                      np.asarray(actual_issame),
                                      1e-3,
                                      nrof_folds=nrof_folds)
    return tpr, fpr, accuracy, val, val_std, far
@torch.no_grad()
def load_bin(path, image_size):
    try:
        with open(path, 'rb') as f:
            bins, issame_list = pickle.load(f)  # py2
    except UnicodeDecodeError as e:
        with open(path, 'rb') as f:
            bins, issame_list = pickle.load(f, encoding='bytes')  # py3
    data_list = []
    for flip in [0, 1]:
        data = torch.empty((len(issame_list) * 2, 3, image_size[0], image_size[1]))
        data_list.append(data)
    for idx in range(len(issame_list) * 2):
        _bin = bins[idx]
        img = mx.image.imdecode(_bin)
        if img.shape[1] != image_size[0]:
            img = mx.image.resize_short(img, image_size[0])
        img = nd.transpose(img, axes=(2, 0, 1))
        for flip in [0, 1]:
            if flip == 1:
                img = mx.ndarray.flip(data=img, axis=2)
            data_list[flip][idx][:] = torch.from_numpy(img.asnumpy())
        if idx % 1000 == 0:
            print('loading bin', idx)
    print(data_list[0].shape)
    return data_list, issame_list
@torch.no_grad()
def test(data_set, backbone, batch_size, nfolds=10):
    print('testing verification..')
    data_list = data_set[0]
    issame_list = data_set[1]
    embeddings_list = []
    time_consumed = 0.0
    for i in range(len(data_list)):
        data = data_list[i]
        embeddings = None
        ba = 0
        while ba < data.shape[0]:
            bb = min(ba + batch_size, data.shape[0])
            count = bb - ba
            _data = data[bb - batch_size: bb]
            time0 = datetime.datetime.now()
            img = ((_data / 255) - 0.5) / 0.5
            net_out: torch.Tensor = backbone(img)
            _embeddings = net_out.detach().cpu().numpy()
            time_now = datetime.datetime.now()
            diff = time_now - time0
            time_consumed += diff.total_seconds()
            if embeddings is None:
                embeddings = np.zeros((data.shape[0], _embeddings.shape[1]))
            embeddings[ba:bb, :] = _embeddings[(batch_size - count):, :]
            ba = bb
        embeddings_list.append(embeddings)

    _xnorm = 0.0
    _xnorm_cnt = 0
    for embed in embeddings_list:
        for i in range(embed.shape[0]):
            _em = embed[i]
            _norm = np.linalg.norm(_em)
            _xnorm += _norm
            _xnorm_cnt += 1
    _xnorm /= _xnorm_cnt

    embeddings = embeddings_list[0].copy()
    embeddings = sklearn.preprocessing.normalize(embeddings)
    acc1 = 0.0
    std1 = 0.0
    embeddings = embeddings_list[0] + embeddings_list[1]
    embeddings = sklearn.preprocessing.normalize(embeddings)
    print(embeddings.shape)
    print('infer time', time_consumed)
    _, _, accuracy, val, val_std, far = evaluate(embeddings, issame_list, nrof_folds=nfolds)
    acc2, std2 = np.mean(accuracy), np.std(accuracy)
    return acc1, std1, acc2, std2, _xnorm, embeddings_list
# NOTE: legacy MXNet code path. `_label`, `_data_extra`, and `model` are not
# defined in this module, so this function fails if called as written.
def dumpR(data_set,
          backbone,
          batch_size,
          name='',
          data_extra=None,
          label_shape=None):
    print('dump verification embedding..')
    data_list = data_set[0]
    issame_list = data_set[1]
    embeddings_list = []
    time_consumed = 0.0
    for i in range(len(data_list)):
        data = data_list[i]
        embeddings = None
        ba = 0
        while ba < data.shape[0]:
            bb = min(ba + batch_size, data.shape[0])
            count = bb - ba
            _data = nd.slice_axis(data, axis=0, begin=bb - batch_size, end=bb)
            time0 = datetime.datetime.now()
            if data_extra is None:
                db = mx.io.DataBatch(data=(_data,), label=(_label,))
            else:
                db = mx.io.DataBatch(data=(_data, _data_extra),
                                     label=(_label,))
            model.forward(db, is_train=False)
            net_out = model.get_outputs()
            _embeddings = net_out[0].asnumpy()
            time_now = datetime.datetime.now()
            diff = time_now - time0
            time_consumed += diff.total_seconds()
            if embeddings is None:
                embeddings = np.zeros((data.shape[0], _embeddings.shape[1]))
            embeddings[ba:bb, :] = _embeddings[(batch_size - count):, :]
            ba = bb
        embeddings_list.append(embeddings)
    embeddings = embeddings_list[0] + embeddings_list[1]
    embeddings = sklearn.preprocessing.normalize(embeddings)
    actual_issame = np.asarray(issame_list)
    outname = os.path.join('temp.bin')
    with open(outname, 'wb') as f:
        pickle.dump((embeddings, issame_list),
                    f,
                    protocol=pickle.HIGHEST_PROTOCOL)
# if __name__ == '__main__':
#
# parser = argparse.ArgumentParser(description='do verification')
# # general
# parser.add_argument('--data-dir', default='', help='')
# parser.add_argument('--model',
# default='../model/softmax,50',
# help='path to load model.')
# parser.add_argument('--target',
# default='lfw,cfp_ff,cfp_fp,agedb_30',
# help='test targets.')
# parser.add_argument('--gpu', default=0, type=int, help='gpu id')
# parser.add_argument('--batch-size', default=32, type=int, help='')
# parser.add_argument('--max', default='', type=str, help='')
# parser.add_argument('--mode', default=0, type=int, help='')
# parser.add_argument('--nfolds', default=10, type=int, help='')
# args = parser.parse_args()
# image_size = [112, 112]
# print('image_size', image_size)
# ctx = mx.gpu(args.gpu)
# nets = []
# vec = args.model.split(',')
# prefix = args.model.split(',')[0]
# epochs = []
# if len(vec) == 1:
# pdir = os.path.dirname(prefix)
# for fname in os.listdir(pdir):
# if not fname.endswith('.params'):
# continue
# _file = os.path.join(pdir, fname)
# if _file.startswith(prefix):
# epoch = int(fname.split('.')[0].split('-')[1])
# epochs.append(epoch)
# epochs = sorted(epochs, reverse=True)
# if len(args.max) > 0:
# _max = [int(x) for x in args.max.split(',')]
# assert len(_max) == 2
# if len(epochs) > _max[1]:
# epochs = epochs[_max[0]:_max[1]]
#
# else:
# epochs = [int(x) for x in vec[1].split('|')]
# print('model number', len(epochs))
# time0 = datetime.datetime.now()
# for epoch in epochs:
# print('loading', prefix, epoch)
# sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)
# # arg_params, aux_params = ch_dev(arg_params, aux_params, ctx)
# all_layers = sym.get_internals()
# sym = all_layers['fc1_output']
# model = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
# # model.bind(data_shapes=[('data', (args.batch_size, 3, image_size[0], image_size[1]))], label_shapes=[('softmax_label', (args.batch_size,))])
# model.bind(data_shapes=[('data', (args.batch_size, 3, image_size[0],
# image_size[1]))])
# model.set_params(arg_params, aux_params)
# nets.append(model)
# time_now = datetime.datetime.now()
# diff = time_now - time0
# print('model loading time', diff.total_seconds())
#
# ver_list = []
# ver_name_list = []
# for name in args.target.split(','):
# path = os.path.join(args.data_dir, name + ".bin")
# if os.path.exists(path):
# print('loading.. ', name)
# data_set = load_bin(path, image_size)
# ver_list.append(data_set)
# ver_name_list.append(name)
#
# if args.mode == 0:
# for i in range(len(ver_list)):
# results = []
# for model in nets:
# acc1, std1, acc2, std2, xnorm, embeddings_list = test(
# ver_list[i], model, args.batch_size, args.nfolds)
# print('[%s]XNorm: %f' % (ver_name_list[i], xnorm))
# print('[%s]Accuracy: %1.5f+-%1.5f' % (ver_name_list[i], acc1, std1))
# print('[%s]Accuracy-Flip: %1.5f+-%1.5f' % (ver_name_list[i], acc2, std2))
# results.append(acc2)
# print('Max of [%s] is %1.5f' % (ver_name_list[i], np.max(results)))
# elif args.mode == 1:
# raise ValueError
# else:
# model = nets[0]
# dumpR(ver_list[0], model, args.batch_size, args.target)
+483
View File
@@ -0,0 +1,483 @@
# coding: utf-8
import os
import pickle
import matplotlib
import pandas as pd
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import timeit
import sklearn
import argparse
import cv2
import numpy as np
import torch
from skimage import transform as trans
from backbones import get_model
from sklearn.metrics import roc_curve, auc
from menpo.visualize.viewmatplotlib import sample_colours_from_colourmap
from prettytable import PrettyTable
from pathlib import Path
import sys
import warnings
sys.path.insert(0, "../")
warnings.filterwarnings("ignore")
parser = argparse.ArgumentParser(description='do ijb test')
# general
parser.add_argument('--model-prefix', default='', help='path to load model.')
parser.add_argument('--image-path', default='', type=str, help='')
parser.add_argument('--result-dir', default='.', type=str, help='')
parser.add_argument('--batch-size', default=128, type=int, help='')
parser.add_argument('--network', default='iresnet50', type=str, help='')
parser.add_argument('--job', default='insightface', type=str, help='job name')
parser.add_argument('--target', default='IJBC', type=str, help='target, set to IJBC or IJBB')
args = parser.parse_args()
target = args.target
model_path = args.model_prefix
image_path = args.image_path
result_dir = args.result_dir
gpu_id = None
use_norm_score = True  # if True, TestMode(N1)
use_detector_score = True  # if True, TestMode(D1)
use_flip_test = True  # if True, TestMode(F1)
job = args.job
batch_size = args.batch_size
class Embedding(object):
    def __init__(self, prefix, data_shape, batch_size=1):
        image_size = (112, 112)
        self.image_size = image_size
        weight = torch.load(prefix)
        resnet = get_model(args.network, dropout=0, fp16=False).cuda()
        resnet.load_state_dict(weight)
        model = torch.nn.DataParallel(resnet)
        self.model = model
        self.model.eval()
        src = np.array([
            [30.2946, 51.6963],
            [65.5318, 51.5014],
            [48.0252, 71.7366],
            [33.5493, 92.3655],
            [62.7299, 92.2041]], dtype=np.float32)
        src[:, 0] += 8.0
        self.src = src
        self.batch_size = batch_size
        self.data_shape = data_shape

    def get(self, rimg, landmark):
        assert landmark.shape[0] == 68 or landmark.shape[0] == 5
        assert landmark.shape[1] == 2
        if landmark.shape[0] == 68:
            landmark5 = np.zeros((5, 2), dtype=np.float32)
            landmark5[0] = (landmark[36] + landmark[39]) / 2
            landmark5[1] = (landmark[42] + landmark[45]) / 2
            landmark5[2] = landmark[30]
            landmark5[3] = landmark[48]
            landmark5[4] = landmark[54]
        else:
            landmark5 = landmark
        tform = trans.SimilarityTransform()
        tform.estimate(landmark5, self.src)
        M = tform.params[0:2, :]
        img = cv2.warpAffine(rimg,
                             M, (self.image_size[1], self.image_size[0]),
                             borderValue=0.0)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img_flip = np.fliplr(img)
        img = np.transpose(img, (2, 0, 1))  # 3*112*112, RGB
        img_flip = np.transpose(img_flip, (2, 0, 1))
        input_blob = np.zeros((2, 3, self.image_size[1], self.image_size[0]), dtype=np.uint8)
        input_blob[0] = img
        input_blob[1] = img_flip
        return input_blob

    @torch.no_grad()
    def forward_db(self, batch_data):
        imgs = torch.Tensor(batch_data).cuda()
        imgs.div_(255).sub_(0.5).div_(0.5)
        feat = self.model(imgs)
        feat = feat.reshape([self.batch_size, 2 * feat.shape[1]])
        return feat.cpu().numpy()
# Split a list into n parts as evenly as possible. The result always has n
# sub-lists; if n exceeds the number of elements, the extra sub-lists are empty.
def divideIntoNstrand(listTemp, n):
    twoList = [[] for i in range(n)]
    for i, e in enumerate(listTemp):
        twoList[i % n].append(e)
    return twoList
def read_template_media_list(path):
    # ijb_meta = np.loadtxt(path, dtype=str)
    ijb_meta = pd.read_csv(path, sep=' ', header=None).values
    # The `np.int` alias was removed in NumPy 1.24; the built-in `int` is equivalent.
    templates = ijb_meta[:, 1].astype(int)
    medias = ijb_meta[:, 2].astype(int)
    return templates, medias
# In[ ]:
def read_template_pair_list(path):
    # pairs = np.loadtxt(path, dtype=str)
    pairs = pd.read_csv(path, sep=' ', header=None).values
    # print(pairs.shape)
    # print(pairs[:, 0].astype(int))
    # The `np.int` alias was removed in NumPy 1.24; the built-in `int` is equivalent.
    t1 = pairs[:, 0].astype(int)
    t2 = pairs[:, 1].astype(int)
    label = pairs[:, 2].astype(int)
    return t1, t2, label
# In[ ]:
def read_image_feature(path):
    with open(path, 'rb') as fid:
        img_feats = pickle.load(fid)
    return img_feats
# In[ ]:
def get_image_feature(img_path, files_list, model_path, epoch, gpu_id):
    batch_size = args.batch_size
    data_shape = (3, 112, 112)
    files = files_list
    print('files:', len(files))
    rare_size = len(files) % batch_size
    faceness_scores = []
    batch = 0
    img_feats = np.empty((len(files), 1024), dtype=np.float32)
    batch_data = np.empty((2 * batch_size, 3, 112, 112))
    embedding = Embedding(model_path, data_shape, batch_size)
    for img_index, each_line in enumerate(files[:len(files) - rare_size]):
        name_lmk_score = each_line.strip().split(' ')
        img_name = os.path.join(img_path, name_lmk_score[0])
        img = cv2.imread(img_name)
        lmk = np.array([float(x) for x in name_lmk_score[1:-1]],
                       dtype=np.float32)
        lmk = lmk.reshape((5, 2))
        input_blob = embedding.get(img, lmk)
        batch_data[2 * (img_index - batch * batch_size)][:] = input_blob[0]
        batch_data[2 * (img_index - batch * batch_size) + 1][:] = input_blob[1]
        if (img_index + 1) % batch_size == 0:
            print('batch', batch)
            img_feats[batch * batch_size:batch * batch_size +
                      batch_size][:] = embedding.forward_db(batch_data)
            batch += 1
        faceness_scores.append(name_lmk_score[-1])
    batch_data = np.empty((2 * rare_size, 3, 112, 112))
    embedding = Embedding(model_path, data_shape, rare_size)
    for img_index, each_line in enumerate(files[len(files) - rare_size:]):
        name_lmk_score = each_line.strip().split(' ')
        img_name = os.path.join(img_path, name_lmk_score[0])
        img = cv2.imread(img_name)
        lmk = np.array([float(x) for x in name_lmk_score[1:-1]],
                       dtype=np.float32)
        lmk = lmk.reshape((5, 2))
        input_blob = embedding.get(img, lmk)
        batch_data[2 * img_index][:] = input_blob[0]
        batch_data[2 * img_index + 1][:] = input_blob[1]
        if (img_index + 1) % rare_size == 0:
            print('batch', batch)
            img_feats[len(files) -
                      rare_size:][:] = embedding.forward_db(batch_data)
            batch += 1
        faceness_scores.append(name_lmk_score[-1])
    faceness_scores = np.array(faceness_scores).astype(np.float32)
    # img_feats = np.ones( (len(files), 1024), dtype=np.float32) * 0.01
    # faceness_scores = np.ones( (len(files), ), dtype=np.float32 )
    return img_feats, faceness_scores
# In[ ]:
def image2template_feature(img_feats=None, templates=None, medias=None):
    # ==========================================================
    # 1. face image feature l2 normalization. img_feats:[number_image x feats_dim]
    # 2. compute media feature.
    # 3. compute template feature.
    # ==========================================================
    unique_templates = np.unique(templates)
    template_feats = np.zeros((len(unique_templates), img_feats.shape[1]))
    for count_template, uqt in enumerate(unique_templates):
        (ind_t,) = np.where(templates == uqt)
        face_norm_feats = img_feats[ind_t]
        face_medias = medias[ind_t]
        unique_medias, unique_media_counts = np.unique(face_medias,
                                                       return_counts=True)
        media_norm_feats = []
        for u, ct in zip(unique_medias, unique_media_counts):
            (ind_m,) = np.where(face_medias == u)
            if ct == 1:
                media_norm_feats += [face_norm_feats[ind_m]]
            else:  # image features from the same video will be aggregated into one feature
                media_norm_feats += [
                    np.mean(face_norm_feats[ind_m], axis=0, keepdims=True)
                ]
        media_norm_feats = np.array(media_norm_feats)
        # media_norm_feats = media_norm_feats / np.sqrt(np.sum(media_norm_feats ** 2, -1, keepdims=True))
        template_feats[count_template] = np.sum(media_norm_feats, axis=0)
        if count_template % 2000 == 0:
            print('Finish Calculating {} template features.'.format(
                count_template))
    # template_norm_feats = template_feats / np.sqrt(np.sum(template_feats ** 2, -1, keepdims=True))
    template_norm_feats = sklearn.preprocessing.normalize(template_feats)
    # print(template_norm_feats.shape)
    return template_norm_feats, unique_templates
# In[ ]:
def verification(template_norm_feats=None,
unique_templates=None,
p1=None,
p2=None):
# ==========================================================
# Compute set-to-set Similarity Score.
# ==========================================================
template2id = np.zeros((max(unique_templates) + 1, 1), dtype=int)
for count_template, uqt in enumerate(unique_templates):
template2id[uqt] = count_template
score = np.zeros((len(p1),)) # save cosine similarity between pairs
total_pairs = np.array(range(len(p1)))
batchsize = 100000 # process pairs in small batches rather than all at once, due to the memory limitation
sublists = [
total_pairs[i:i + batchsize] for i in range(0, len(p1), batchsize)
]
total_sublists = len(sublists)
for c, s in enumerate(sublists):
feat1 = template_norm_feats[template2id[p1[s]]]
feat2 = template_norm_feats[template2id[p2[s]]]
similarity_score = np.sum(feat1 * feat2, -1)
score[s] = similarity_score.flatten()
if c % 10 == 0:
print('Finish {}/{} pairs.'.format(c, total_sublists))
return score
# In[ ]:
def verification2(template_norm_feats=None,
unique_templates=None,
p1=None,
p2=None):
template2id = np.zeros((max(unique_templates) + 1, 1), dtype=int)
for count_template, uqt in enumerate(unique_templates):
template2id[uqt] = count_template
score = np.zeros((len(p1),)) # save cosine similarity between pairs
total_pairs = np.array(range(len(p1)))
batchsize = 100000 # process pairs in small batches rather than all at once, due to the memory limitation
sublists = [
total_pairs[i:i + batchsize] for i in range(0, len(p1), batchsize)
]
total_sublists = len(sublists)
for c, s in enumerate(sublists):
feat1 = template_norm_feats[template2id[p1[s]]]
feat2 = template_norm_feats[template2id[p2[s]]]
similarity_score = np.sum(feat1 * feat2, -1)
score[s] = similarity_score.flatten()
if c % 10 == 0:
print('Finish {}/{} pairs.'.format(c, total_sublists))
return score
def read_score(path):
with open(path, 'rb') as fid:
img_feats = pickle.load(fid)
return img_feats
# # Step1: Load Meta Data
# In[ ]:
assert target == 'IJBC' or target == 'IJBB'
# =============================================================
# load image and template relationships for template feature embedding
# tid --> template id, mid --> media id
# format:
# image_name tid mid
# =============================================================
start = timeit.default_timer()
templates, medias = read_template_media_list(
os.path.join('%s/meta' % image_path,
'%s_face_tid_mid.txt' % target.lower()))
stop = timeit.default_timer()
print('Time: %.2f s. ' % (stop - start))
# In[ ]:
# =============================================================
# load template pairs for template-to-template verification
# tid : template id, label : 1/0
# format:
# tid_1 tid_2 label
# =============================================================
start = timeit.default_timer()
p1, p2, label = read_template_pair_list(
os.path.join('%s/meta' % image_path,
'%s_template_pair_label.txt' % target.lower()))
stop = timeit.default_timer()
print('Time: %.2f s. ' % (stop - start))
# # Step 2: Get Image Features
# In[ ]:
# =============================================================
# load image features
# format:
# img_feats: [image_num x feats_dim] (227630, 512)
# =============================================================
start = timeit.default_timer()
img_path = '%s/loose_crop' % image_path
img_list_path = '%s/meta/%s_name_5pts_score.txt' % (image_path, target.lower())
with open(img_list_path) as img_list:
    files = img_list.readlines()
# files_list = divideIntoNstrand(files, rank_size)
files_list = files
# img_feats
# for i in range(rank_size):
img_feats, faceness_scores = get_image_feature(img_path, files_list,
model_path, 0, gpu_id)
stop = timeit.default_timer()
print('Time: %.2f s. ' % (stop - start))
print('Feature Shape: ({} , {}) .'.format(img_feats.shape[0],
img_feats.shape[1]))
# # Step3: Get Template Features
# In[ ]:
# =============================================================
# compute template features from image features.
# =============================================================
start = timeit.default_timer()
# ==========================================================
# Norm feature before aggregation into template feature?
# Feature norm from embedding network and faceness score are able to decrease weights for noise samples (not face).
# ==========================================================
# 1. FaceScore Feature Norm
# 2. FaceScore Detector
if use_flip_test:
# concat --- F1
# img_input_feats = img_feats
# add --- F2
img_input_feats = img_feats[:, 0:img_feats.shape[1] //
2] + img_feats[:, img_feats.shape[1] // 2:]
else:
img_input_feats = img_feats[:, 0:img_feats.shape[1] // 2]
if use_norm_score:
img_input_feats = img_input_feats
else:
# normalise features to remove norm information
img_input_feats = img_input_feats / np.sqrt(
np.sum(img_input_feats ** 2, -1, keepdims=True))
if use_detector_score:
print(img_input_feats.shape, faceness_scores.shape)
img_input_feats = img_input_feats * faceness_scores[:, np.newaxis]
else:
img_input_feats = img_input_feats
template_norm_feats, unique_templates = image2template_feature(
img_input_feats, templates, medias)
stop = timeit.default_timer()
print('Time: %.2f s. ' % (stop - start))
# # Step 4: Get Template Similarity Scores
# In[ ]:
# =============================================================
# compute verification scores between template pairs.
# =============================================================
start = timeit.default_timer()
score = verification(template_norm_feats, unique_templates, p1, p2)
stop = timeit.default_timer()
print('Time: %.2f s. ' % (stop - start))
# In[ ]:
save_path = os.path.join(result_dir, args.job)
# save_path = result_dir + '/%s_result' % target
if not os.path.exists(save_path):
os.makedirs(save_path)
score_save_file = os.path.join(save_path, "%s.npy" % target.lower())
np.save(score_save_file, score)
# # Step 5: Get ROC Curves and TPR@FPR Table
# In[ ]:
files = [score_save_file]
methods = []
scores = []
for file in files:
methods.append(Path(file).stem)
scores.append(np.load(file))
methods = np.array(methods)
scores = dict(zip(methods, scores))
colours = dict(
zip(methods, sample_colours_from_colourmap(methods.shape[0], 'Set2')))
x_labels = [10 ** -6, 10 ** -5, 10 ** -4, 10 ** -3, 10 ** -2, 10 ** -1]
tpr_fpr_table = PrettyTable(['Methods'] + [str(x) for x in x_labels])
fig = plt.figure()
for method in methods:
fpr, tpr, _ = roc_curve(label, scores[method])
roc_auc = auc(fpr, tpr)
fpr = np.flipud(fpr)
tpr = np.flipud(tpr) # select largest tpr at same fpr
plt.plot(fpr,
tpr,
color=colours[method],
lw=1,
label=('[%s (AUC = %0.4f %%)]' %
(method.split('-')[-1], roc_auc * 100)))
tpr_fpr_row = []
tpr_fpr_row.append("%s-%s" % (method, target))
for fpr_iter in np.arange(len(x_labels)):
_, min_index = min(
list(zip(abs(fpr - x_labels[fpr_iter]), range(len(fpr)))))
tpr_fpr_row.append('%.2f' % (tpr[min_index] * 100))
tpr_fpr_table.add_row(tpr_fpr_row)
plt.xlim([10 ** -6, 0.1])
plt.ylim([0.3, 1.0])
plt.grid(linestyle='--', linewidth=1)
plt.xticks(x_labels)
plt.yticks(np.linspace(0.3, 1.0, 8, endpoint=True))
plt.xscale('log')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC on IJB')
plt.legend(loc="lower right")
fig.savefig(os.path.join(save_path, '%s.pdf' % target.lower()))
print(tpr_fpr_table)
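The TPR@FPR table above picks, for each target FPR, the ROC point whose FPR is closest to the target, preferring the largest TPR when several points tie. The following is a minimal pure-Python sketch of that lookup, not the script's own code: `tpr_at_fpr` and the toy `fpr`/`tpr` values are hypothetical, and the inputs are assumed to be in ascending order as `roc_curve` returns them.

```python
def tpr_at_fpr(fpr, tpr, targets):
    """Look up the TPR at the ROC point nearest to each target FPR."""
    # Reverse both curves so that, among equal FPR values, the largest TPR comes first.
    fpr_desc = list(reversed(fpr))
    tpr_desc = list(reversed(tpr))
    result = []
    for target in targets:
        # min over (distance, index) tuples: ties on distance resolve to the
        # smallest index, which is the largest TPR after the reversal.
        _, idx = min((abs(f - target), i) for i, f in enumerate(fpr_desc))
        result.append(tpr_desc[idx])
    return result


# Toy ROC curve (made-up values for illustration only).
fpr = [0.0, 0.0, 0.001, 0.001, 0.01, 1.0]
tpr = [0.0, 0.50, 0.50, 0.80, 0.90, 1.0]
print(tpr_at_fpr(fpr, tpr, [0.001, 0.01]))
```

Reversing before the search is what makes the tie-break favour the higher TPR, which matches the `np.flipud` calls in the loop above.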
@@ -0,0 +1,35 @@
import argparse

import cv2
import numpy as np
import torch

from backbones import get_model


@torch.no_grad()
def inference(weight, name, img):
    if img is None:
        # No image path given: fall back to random pixels as a smoke test.
        img = np.random.randint(0, 255, size=(112, 112, 3), dtype=np.uint8)
    else:
        img = cv2.imread(img)
        img = cv2.resize(img, (112, 112))

    # BGR -> RGB, HWC -> CHW, then scale pixel values to [-1, 1].
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = np.transpose(img, (2, 0, 1))
    img = torch.from_numpy(img).unsqueeze(0).float()
    img.div_(255).sub_(0.5).div_(0.5)
    net = get_model(name, fp16=False)
    # Inference runs on the CPU here, so map the checkpoint to the CPU as well.
    net.load_state_dict(torch.load(weight, map_location="cpu"))
    net.eval()
    feat = net(img).numpy()
    print(feat)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='PyTorch ArcFace Inference')
    parser.add_argument('--network', type=str, default='r50', help='backbone network')
    parser.add_argument('--weight', type=str, default='')
    parser.add_argument('--img', type=str, default=None)
    args = parser.parse_args()
    inference(args.weight, args.network, args.img)
@@ -0,0 +1,47 @@
import torch
import math


class ArcFace(torch.nn.Module):
    """ ArcFace (https://arxiv.org/pdf/1801.07698v1.pdf):
    """
    def __init__(self, s=64.0, margin=0.5):
        super(ArcFace, self).__init__()
        self.scale = s
        self.cos_m = math.cos(margin)
        self.sin_m = math.sin(margin)
        self.theta = math.cos(math.pi - margin)
        self.sinmm = math.sin(math.pi - margin) * margin
        self.easy_margin = False

    def forward(self, logits: torch.Tensor, labels: torch.Tensor):
        index = torch.where(labels != -1)[0]
        target_logit = logits[index, labels[index].view(-1)]

        sin_theta = torch.sqrt(1.0 - torch.pow(target_logit, 2))
        cos_theta_m = target_logit * self.cos_m - sin_theta * self.sin_m  # cos(target+margin)
        if self.easy_margin:
            final_target_logit = torch.where(
                target_logit > 0, cos_theta_m, target_logit)
        else:
            final_target_logit = torch.where(
                target_logit > self.theta, cos_theta_m, target_logit - self.sinmm)

        logits[index, labels[index].view(-1)] = final_target_logit
        logits = logits * self.scale
        return logits


class CosFace(torch.nn.Module):
    def __init__(self, s=64.0, m=0.40):
        super(CosFace, self).__init__()
        self.s = s
        self.m = m

    def forward(self, logits: torch.Tensor, labels: torch.Tensor):
        index = torch.where(labels != -1)[0]
        target_logit = logits[index, labels[index].view(-1)]
        final_target_logit = target_logit - self.m
        logits[index, labels[index].view(-1)] = final_target_logit
        logits = logits * self.s
        return logits
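To make the margin arithmetic in the two classes above concrete, here is a scalar sketch using only the standard library. It is an illustration under assumptions rather than the training code: `arcface_target_logit` and `cosface_target_logit` are hypothetical helper names, the scaling by `s` is left out, and the `max(0.0, ...)` clamp is an addition of this sketch that the classes above do not perform.

```python
import math


def arcface_target_logit(cos_theta, margin=0.5, easy_margin=False):
    """Return the margin-adjusted target logit (before scaling by s)."""
    cos_m, sin_m = math.cos(margin), math.sin(margin)
    threshold = math.cos(math.pi - margin)          # stored as `theta` in the class
    fallback = math.sin(math.pi - margin) * margin  # stored as `sinmm` in the class
    # Clamp added in this sketch to avoid sqrt of a tiny negative number.
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta ** 2))
    # Angle-addition identity: cos(theta + m) = cos(theta)cos(m) - sin(theta)sin(m).
    cos_theta_m = cos_theta * cos_m - sin_theta * sin_m
    if easy_margin:
        return cos_theta_m if cos_theta > 0 else cos_theta
    # Past theta = pi - m, cos(theta + m) stops decreasing, so a linear penalty is used.
    return cos_theta_m if cos_theta > threshold else cos_theta - fallback


def cosface_target_logit(cos_theta, m=0.40):
    """CosFace subtracts the margin directly from the cosine."""
    return cos_theta - m


theta = 0.3
print(arcface_target_logit(math.cos(theta)), math.cos(theta + 0.5))
print(cosface_target_logit(0.9))
```

The first printed pair should agree up to floating-point error, which is a quick way to check that the identity is applied with the right signs.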
@@ -0,0 +1,29 @@
from torch.optim.lr_scheduler import _LRScheduler


class PolyScheduler(_LRScheduler):
    def __init__(self, optimizer, base_lr, max_steps, warmup_steps, last_epoch=-1):
        self.base_lr = base_lr
        self.warmup_lr_init = 0.0001
        self.max_steps: int = max_steps
        self.warmup_steps: int = warmup_steps
        self.power = 2
        super(PolyScheduler, self).__init__(optimizer, last_epoch, False)

    def get_warmup_lr(self):
        alpha = float(self.last_epoch) / float(self.warmup_steps)
        return [self.base_lr * alpha for _ in self.optimizer.param_groups]

    def get_lr(self):
        if self.last_epoch == -1:
            return [self.warmup_lr_init for _ in self.optimizer.param_groups]
        if self.last_epoch < self.warmup_steps:
            return self.get_warmup_lr()
        else:
            alpha = pow(
                1
                - float(self.last_epoch - self.warmup_steps)
                / float(self.max_steps - self.warmup_steps),
                self.power,
            )
            return [self.base_lr * alpha for _ in self.optimizer.param_groups]
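The schedule above is a linear warmup followed by a polynomial decay with power 2. As a rough sketch of the resulting curve, the function below evaluates the same two formulas for a single step without any optimizer. `poly_lr` is a hypothetical name, and the special case for `last_epoch == -1` is deliberately left out.

```python
def poly_lr(step, base_lr, max_steps, warmup_steps, power=2):
    """Learning rate at `step`: linear warmup, then polynomial decay to zero."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: fraction of the post-warmup schedule that has elapsed.
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return base_lr * (1 - progress) ** power


for step in (0, 50, 100, 550, 1000):
    print(step, poly_lr(step, base_lr=0.1, max_steps=1000, warmup_steps=100))
```

With these example settings the rate peaks at `base_lr` when warmup ends and has dropped to a quarter of it halfway through the decay phase.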
@@ -0,0 +1,250 @@
from __future__ import division
import datetime
import os
import os.path as osp
import glob
import numpy as np
import cv2
import sys
import onnxruntime
import onnx
import argparse
from onnx import numpy_helper
from insightface.data import get_image
class ArcFaceORT:
def __init__(self, model_path, cpu=False):
self.model_path = model_path
# providers = None will use available provider, for onnxruntime-gpu it will be "CUDAExecutionProvider"
self.providers = ['CPUExecutionProvider'] if cpu else None
# input_size is (w, h); returns an error message, or None on success
def check(self, track='cfat', test_img = None):
#default is cfat
max_model_size_mb=1024
max_feat_dim=512
max_time_cost=15
if track.startswith('ms1m'):
max_model_size_mb=1024
max_feat_dim=512
max_time_cost=10
elif track.startswith('glint'):
max_model_size_mb=1024
max_feat_dim=1024
max_time_cost=20
elif track.startswith('cfat'):
max_model_size_mb = 1024
max_feat_dim = 512
max_time_cost = 15
elif track.startswith('unconstrained'):
max_model_size_mb=1024
max_feat_dim=1024
max_time_cost=30
else:
return "track not found"
if not os.path.exists(self.model_path):
return "model_path not exists"
if not os.path.isdir(self.model_path):
return "model_path should be directory"
onnx_files = []
for _file in os.listdir(self.model_path):
if _file.endswith('.onnx'):
onnx_files.append(osp.join(self.model_path, _file))
if len(onnx_files)==0:
return "do not have onnx files"
self.model_file = sorted(onnx_files)[-1]
print('use onnx-model:', self.model_file)
try:
session = onnxruntime.InferenceSession(self.model_file, providers=self.providers)
except Exception:
return "load onnx failed"
input_cfg = session.get_inputs()[0]
input_shape = input_cfg.shape
print('input-shape:', input_shape)
if len(input_shape)!=4:
return "length of input_shape should be 4"
if not isinstance(input_shape[0], str):
#return "input_shape[0] should be str to support batch-inference"
print('reset input-shape[0] to None')
model = onnx.load(self.model_file)
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = 'None'
new_model_file = osp.join(self.model_path, 'zzzzrefined.onnx')
onnx.save(model, new_model_file)
self.model_file = new_model_file
print('use new onnx-model:', self.model_file)
try:
session = onnxruntime.InferenceSession(self.model_file, providers=self.providers)
except Exception:
return "load onnx failed"
input_cfg = session.get_inputs()[0]
input_shape = input_cfg.shape
print('new-input-shape:', input_shape)
self.image_size = tuple(input_shape[2:4][::-1])
#print('image_size:', self.image_size)
input_name = input_cfg.name
outputs = session.get_outputs()
output_names = []
for o in outputs:
output_names.append(o.name)
#print(o.name, o.shape)
if len(output_names)!=1:
return "number of output nodes should be 1"
self.session = session
self.input_name = input_name
self.output_names = output_names
#print(self.output_names)
model = onnx.load(self.model_file)
graph = model.graph
if len(graph.node)<8:
return "too small onnx graph"
input_size = (112,112)
self.crop = None
if track=='cfat':
crop_file = osp.join(self.model_path, 'crop.txt')
if osp.exists(crop_file):
lines = open(crop_file,'r').readlines()
if len(lines)!=6:
return "crop.txt should contain 6 lines"
lines = [int(x) for x in lines]
self.crop = lines[:4]
input_size = tuple(lines[4:6])
if input_size!=self.image_size:
return "input-size is inconsistent with onnx model input, %s vs %s"%(input_size, self.image_size)
self.model_size_mb = os.path.getsize(self.model_file) / float(1024*1024)
if self.model_size_mb > max_model_size_mb:
return "max model size exceed, given %.3f-MB"%self.model_size_mb
input_mean = None
input_std = None
if track=='cfat':
pn_file = osp.join(self.model_path, 'pixel_norm.txt')
if osp.exists(pn_file):
lines = open(pn_file,'r').readlines()
if len(lines)!=2:
return "pixel_norm.txt should contain 2 lines"
input_mean = float(lines[0])
input_std = float(lines[1])
if input_mean is not None or input_std is not None:
if input_mean is None or input_std is None:
return "please set input_mean and input_std simultaneously"
else:
find_sub = False
find_mul = False
for nid, node in enumerate(graph.node[:8]):
print(nid, node.name)
if node.name.startswith('Sub') or node.name.startswith('_minus'):
find_sub = True
if node.name.startswith('Mul') or node.name.startswith('_mul') or node.name.startswith('Div'):
find_mul = True
if find_sub and find_mul:
print("find sub and mul")
#mxnet arcface model
input_mean = 0.0
input_std = 1.0
else:
input_mean = 127.5
input_std = 127.5
self.input_mean = input_mean
self.input_std = input_std
for initn in graph.initializer:
weight_array = numpy_helper.to_array(initn)
dt = weight_array.dtype
if dt.itemsize<4:
return 'invalid weight type - (%s:%s)' % (initn.name, dt.name)
if test_img is None:
test_img = get_image('Tom_Hanks_54745')
test_img = cv2.resize(test_img, self.image_size)
else:
test_img = cv2.resize(test_img, self.image_size)
feat, cost = self.benchmark(test_img)
batch_result = self.check_batch(test_img)
batch_result_sum = float(np.sum(batch_result))
if batch_result_sum in [float('inf'), -float('inf')] or batch_result_sum != batch_result_sum:
print(batch_result)
print(batch_result_sum)
return "batch result output contains NaN!"
if len(feat.shape) < 2:
return "the shape of the feature must be two, but get {}".format(str(feat.shape))
if feat.shape[1] > max_feat_dim:
return "max feat dim exceed, given %d"%feat.shape[1]
self.feat_dim = feat.shape[1]
cost_ms = cost*1000
if cost_ms>max_time_cost:
return "max time cost exceed, given %.4f"%cost_ms
self.cost_ms = cost_ms
print('check stat:, model-size-mb: %.4f, feat-dim: %d, time-cost-ms: %.4f, input-mean: %.3f, input-std: %.3f'%(self.model_size_mb, self.feat_dim, self.cost_ms, self.input_mean, self.input_std))
return None
def check_batch(self, img):
if not isinstance(img, list):
imgs = [img, ] * 32
if self.crop is not None:
nimgs = []
for img in imgs:
nimg = img[self.crop[1]:self.crop[3], self.crop[0]:self.crop[2], :]
if nimg.shape[0] != self.image_size[1] or nimg.shape[1] != self.image_size[0]:
nimg = cv2.resize(nimg, self.image_size)
nimgs.append(nimg)
imgs = nimgs
blob = cv2.dnn.blobFromImages(
images=imgs, scalefactor=1.0 / self.input_std, size=self.image_size,
mean=(self.input_mean, self.input_mean, self.input_mean), swapRB=True)
net_out = self.session.run(self.output_names, {self.input_name: blob})[0]
return net_out
def meta_info(self):
return {'model-size-mb':self.model_size_mb, 'feature-dim':self.feat_dim, 'infer': self.cost_ms}
def forward(self, imgs):
if not isinstance(imgs, list):
imgs = [imgs]
input_size = self.image_size
if self.crop is not None:
nimgs = []
for img in imgs:
nimg = img[self.crop[1]:self.crop[3],self.crop[0]:self.crop[2],:]
if nimg.shape[0]!=input_size[1] or nimg.shape[1]!=input_size[0]:
nimg = cv2.resize(nimg, input_size)
nimgs.append(nimg)
imgs = nimgs
blob = cv2.dnn.blobFromImages(imgs, 1.0/self.input_std, input_size, (self.input_mean, self.input_mean, self.input_mean), swapRB=True)
net_out = self.session.run(self.output_names, {self.input_name : blob})[0]
return net_out
def benchmark(self, img):
input_size = self.image_size
if self.crop is not None:
nimg = img[self.crop[1]:self.crop[3],self.crop[0]:self.crop[2],:]
if nimg.shape[0]!=input_size[1] or nimg.shape[1]!=input_size[0]:
nimg = cv2.resize(nimg, input_size)
img = nimg
blob = cv2.dnn.blobFromImage(img, 1.0/self.input_std, input_size, (self.input_mean, self.input_mean, self.input_mean), swapRB=True)
costs = []
for _ in range(50):
ta = datetime.datetime.now()
net_out = self.session.run(self.output_names, {self.input_name : blob})[0]
tb = datetime.datetime.now()
cost = (tb-ta).total_seconds()
costs.append(cost)
costs = sorted(costs)
cost = costs[5]
return net_out, cost
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='')
# general
parser.add_argument('workdir', help='submitted work dir', type=str)
parser.add_argument('--track', help='track name, for different challenge', type=str, default='cfat')
args = parser.parse_args()
handler = ArcFaceORT(args.workdir)
err = handler.check(args.track)
print('err:', err)
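The `benchmark` method above runs the session 50 times, sorts the timings, and reports the sixth-smallest value, which discards both lucky outliers and slow warm-up runs. A standalone sketch of that selection follows. The names `pick_cost` and the free function `benchmark` are hypothetical, and `time.perf_counter` is swapped in here for the `datetime.datetime.now()` calls used above.

```python
import time


def pick_cost(costs, rank=5):
    """Return the (rank+1)-th smallest timing, like `sorted(costs)[5]` above."""
    return sorted(costs)[rank]


def benchmark(fn, runs=50, rank=5):
    """Time `fn` repeatedly and return a low-percentile cost in seconds."""
    costs = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        costs.append(time.perf_counter() - start)
    return pick_cost(costs, rank)


# Stand-in workload; in the script this would be one ONNX Runtime session run.
cost_ms = benchmark(lambda: sum(range(1000))) * 1000
print('time-cost-ms: %.4f' % cost_ms)
```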
@@ -0,0 +1,269 @@
import argparse
import os
import pickle
import timeit
import cv2
import mxnet as mx
import numpy as np
import pandas as pd
import prettytable
import skimage.transform
import torch
from sklearn.metrics import roc_curve
from sklearn.preprocessing import normalize
from torch.utils.data import DataLoader
from onnx_helper import ArcFaceORT
SRC = np.array(
[
[30.2946, 51.6963],
[65.5318, 51.5014],
[48.0252, 71.7366],
[33.5493, 92.3655],
[62.7299, 92.2041]]
, dtype=np.float32)
SRC[:, 0] += 8.0
class AlignedDataSet(mx.gluon.data.Dataset):
    def __init__(self, root, lines, align=True):
        self.lines = lines
        self.root = root
        self.align = align

    def __len__(self):
        return len(self.lines)

    def __getitem__(self, idx):
        # Each line holds: image name, five landmark (x, y) pairs, faceness score.
        each_line = self.lines[idx]
        name_lmk_score = each_line.strip().split(' ')
        name = os.path.join(self.root, name_lmk_score[0])
        img = cv2.cvtColor(cv2.imread(name), cv2.COLOR_BGR2RGB)
        landmark5 = np.array([float(x) for x in name_lmk_score[1:-1]], dtype=np.float32).reshape((5, 2))
        # Align the face to the reference landmarks with a similarity transform.
        st = skimage.transform.SimilarityTransform()
        st.estimate(landmark5, SRC)
        img = cv2.warpAffine(img, st.params[0:2, :], (112, 112), borderValue=0.0)
        # Stack the aligned crop and its horizontal flip: shape (2, 3, 112, 112).
        img_1 = np.expand_dims(img, 0)
        img_2 = np.expand_dims(np.fliplr(img), 0)
        output = np.concatenate((img_1, img_2), axis=0).astype(np.float32)
        output = np.transpose(output, (0, 3, 1, 2))
        return torch.from_numpy(output)
@torch.no_grad()
def extract(model_root, dataset):
model = ArcFaceORT(model_path=model_root)
model.check()
feat_mat = np.zeros(shape=(len(dataset), 2 * model.feat_dim))
def collate_fn(data):
return torch.cat(data, dim=0)
data_loader = DataLoader(
dataset, batch_size=128, drop_last=False, num_workers=4, collate_fn=collate_fn, )
num_iter = 0
for batch in data_loader:
batch = batch.numpy()
batch = (batch - model.input_mean) / model.input_std
feat = model.session.run(model.output_names, {model.input_name: batch})[0]
feat = np.reshape(feat, (-1, model.feat_dim * 2))
feat_mat[128 * num_iter: 128 * num_iter + feat.shape[0], :] = feat
num_iter += 1
if num_iter % 50 == 0:
print(num_iter)
return feat_mat
def read_template_media_list(path):
    # Columns: image_name, template id, media id.
    ijb_meta = pd.read_csv(path, sep=' ', header=None).values
    templates = ijb_meta[:, 1].astype(int)
    medias = ijb_meta[:, 2].astype(int)
    return templates, medias


def read_template_pair_list(path):
    # Columns: template id 1, template id 2, label (1 = same identity, 0 = different).
    pairs = pd.read_csv(path, sep=' ', header=None).values
    t1 = pairs[:, 0].astype(int)
    t2 = pairs[:, 1].astype(int)
    label = pairs[:, 2].astype(int)
    return t1, t2, label


def read_image_feature(path):
    with open(path, 'rb') as fid:
        img_feats = pickle.load(fid)
    return img_feats
def image2template_feature(img_feats=None,
templates=None,
medias=None):
unique_templates = np.unique(templates)
template_feats = np.zeros((len(unique_templates), img_feats.shape[1]))
for count_template, uqt in enumerate(unique_templates):
(ind_t,) = np.where(templates == uqt)
face_norm_feats = img_feats[ind_t]
face_medias = medias[ind_t]
unique_medias, unique_media_counts = np.unique(face_medias, return_counts=True)
media_norm_feats = []
for u, ct in zip(unique_medias, unique_media_counts):
(ind_m,) = np.where(face_medias == u)
if ct == 1:
media_norm_feats += [face_norm_feats[ind_m]]
else: # image features from the same video will be aggregated into one feature
media_norm_feats += [np.mean(face_norm_feats[ind_m], axis=0, keepdims=True), ]
media_norm_feats = np.array(media_norm_feats)
template_feats[count_template] = np.sum(media_norm_feats, axis=0)
if count_template % 2000 == 0:
print('Finish Calculating {} template features.'.format(
count_template))
template_norm_feats = normalize(template_feats)
return template_norm_feats, unique_templates
def verification(template_norm_feats=None,
unique_templates=None,
p1=None,
p2=None):
template2id = np.zeros((max(unique_templates) + 1, 1), dtype=int)
for count_template, uqt in enumerate(unique_templates):
template2id[uqt] = count_template
score = np.zeros((len(p1),))
total_pairs = np.array(range(len(p1)))
batchsize = 100000
sublists = [total_pairs[i: i + batchsize] for i in range(0, len(p1), batchsize)]
total_sublists = len(sublists)
for c, s in enumerate(sublists):
feat1 = template_norm_feats[template2id[p1[s]]]
feat2 = template_norm_feats[template2id[p2[s]]]
similarity_score = np.sum(feat1 * feat2, -1)
score[s] = similarity_score.flatten()
if c % 10 == 0:
print('Finish {}/{} pairs.'.format(c, total_sublists))
return score
def verification2(template_norm_feats=None,
unique_templates=None,
p1=None,
p2=None):
template2id = np.zeros((max(unique_templates) + 1, 1), dtype=int)
for count_template, uqt in enumerate(unique_templates):
template2id[uqt] = count_template
score = np.zeros((len(p1),)) # save cosine similarity between pairs
total_pairs = np.array(range(len(p1)))
batchsize = 100000 # process pairs in small batches rather than all at once, due to the memory limitation
sublists = [total_pairs[i:i + batchsize] for i in range(0, len(p1), batchsize)]
total_sublists = len(sublists)
for c, s in enumerate(sublists):
feat1 = template_norm_feats[template2id[p1[s]]]
feat2 = template_norm_feats[template2id[p2[s]]]
similarity_score = np.sum(feat1 * feat2, -1)
score[s] = similarity_score.flatten()
if c % 10 == 0:
print('Finish {}/{} pairs.'.format(c, total_sublists))
return score
def main(args):
use_norm_score = True # if True, TestMode(N1)
use_detector_score = True # if True, TestMode(D1)
use_flip_test = True # if True, TestMode(F1)
assert args.target == 'IJBC' or args.target == 'IJBB'
start = timeit.default_timer()
templates, medias = read_template_media_list(
os.path.join('%s/meta' % args.image_path, '%s_face_tid_mid.txt' % args.target.lower()))
stop = timeit.default_timer()
print('Time: %.2f s. ' % (stop - start))
start = timeit.default_timer()
p1, p2, label = read_template_pair_list(
os.path.join('%s/meta' % args.image_path,
'%s_template_pair_label.txt' % args.target.lower()))
stop = timeit.default_timer()
print('Time: %.2f s. ' % (stop - start))
start = timeit.default_timer()
img_path = '%s/loose_crop' % args.image_path
img_list_path = '%s/meta/%s_name_5pts_score.txt' % (args.image_path, args.target.lower())
img_list = open(img_list_path)
files = img_list.readlines()
dataset = AlignedDataSet(root=img_path, lines=files, align=True)
img_feats = extract(args.model_root, dataset)
faceness_scores = []
for each_line in files:
name_lmk_score = each_line.split()
faceness_scores.append(name_lmk_score[-1])
faceness_scores = np.array(faceness_scores).astype(np.float32)
stop = timeit.default_timer()
print('Time: %.2f s. ' % (stop - start))
print('Feature Shape: ({} , {}) .'.format(img_feats.shape[0], img_feats.shape[1]))
start = timeit.default_timer()
if use_flip_test:
img_input_feats = img_feats[:, 0:img_feats.shape[1] // 2] + img_feats[:, img_feats.shape[1] // 2:]
else:
img_input_feats = img_feats[:, 0:img_feats.shape[1] // 2]
if use_norm_score:
img_input_feats = img_input_feats
else:
img_input_feats = img_input_feats / np.sqrt(np.sum(img_input_feats ** 2, -1, keepdims=True))
if use_detector_score:
print(img_input_feats.shape, faceness_scores.shape)
img_input_feats = img_input_feats * faceness_scores[:, np.newaxis]
else:
img_input_feats = img_input_feats
template_norm_feats, unique_templates = image2template_feature(
img_input_feats, templates, medias)
stop = timeit.default_timer()
print('Time: %.2f s. ' % (stop - start))
start = timeit.default_timer()
score = verification(template_norm_feats, unique_templates, p1, p2)
stop = timeit.default_timer()
print('Time: %.2f s. ' % (stop - start))
result_dir = args.model_root
save_path = os.path.join(result_dir, "{}_result".format(args.target))
if not os.path.exists(save_path):
os.makedirs(save_path)
score_save_file = os.path.join(save_path, "{}.npy".format(args.target))
np.save(score_save_file, score)
files = [score_save_file]
methods = []
scores = []
for file in files:
methods.append(os.path.basename(file))
scores.append(np.load(file))
methods = np.array(methods)
scores = dict(zip(methods, scores))
x_labels = [10 ** -6, 10 ** -5, 10 ** -4, 10 ** -3, 10 ** -2, 10 ** -1]
tpr_fpr_table = prettytable.PrettyTable(['Methods'] + [str(x) for x in x_labels])
for method in methods:
fpr, tpr, _ = roc_curve(label, scores[method])
fpr = np.flipud(fpr)
tpr = np.flipud(tpr)
tpr_fpr_row = []
tpr_fpr_row.append("%s-%s" % (method, args.target))
for fpr_iter in np.arange(len(x_labels)):
_, min_index = min(
list(zip(abs(fpr - x_labels[fpr_iter]), range(len(fpr)))))
tpr_fpr_row.append('%.2f' % (tpr[min_index] * 100))
tpr_fpr_table.add_row(tpr_fpr_row)
print(tpr_fpr_table)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='do ijb test')
# general
parser.add_argument('--model-root', default='', help='path to load model.')
parser.add_argument('--image-path', default='/train_tmp/IJB_release/IJBC', type=str, help='')
parser.add_argument('--target', default='IJBC', type=str, help='target, set to IJBC or IJBB')
main(parser.parse_args())
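The template aggregation in `image2template_feature` and the scoring in `verification` can be summarised as: average the features that share a media id, sum the per-media features within a template, L2-normalise, then take dot products between template pairs. Below is a small pure-Python sketch of that flow, written without NumPy so it runs anywhere. `template_features`, `cosine_score`, and the toy data are hypothetical, and the zero-norm guard is an assumption of this sketch.

```python
import math
from collections import defaultdict


def template_features(feats, templates, medias):
    """Pool image features into one L2-normalised feature per template."""
    grouped = defaultdict(lambda: defaultdict(list))
    for feat, tid, mid in zip(feats, templates, medias):
        grouped[tid][mid].append(feat)
    dim = len(feats[0])
    result = {}
    for tid, by_media in grouped.items():
        total = [0.0] * dim
        for rows in by_media.values():
            # Frames from the same media (e.g. one video) are averaged first,
            # so a long video does not dominate the template.
            for d in range(dim):
                total[d] += sum(row[d] for row in rows) / len(rows)
        norm = math.sqrt(sum(v * v for v in total))
        result[tid] = [v / norm for v in total] if norm > 0 else total
    return result


def cosine_score(a, b):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


# Toy data: template 7 has two frames from media 1 and one image from media 2.
feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 5.0]]
templates = [7, 7, 7, 9]
medias = [1, 1, 2, 3]
tf = template_features(feats, templates, medias)
print(tf[7], tf[9], cosine_score(tf[7], tf[9]))
```

Because the template features are unit length, the dot product needs no further division to be a cosine similarity, which is why `verification` only sums the elementwise product.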
@@ -0,0 +1,330 @@
import collections
from typing import Callable
import torch
from torch import distributed
from torch.nn.functional import linear, normalize
class PartialFC(torch.nn.Module):
"""
https://arxiv.org/abs/2010.05222
A distributed, sparsely updating variant of the FC layer, named Partial FC (PFC).
When the sample rate is less than 1, each iteration selects the positive class centers and a
random subset of the negative class centers to compute the margin-based softmax loss. All class
centers are maintained throughout training, but only the selected subset is updated in each
iteration.
.. note::
When the sample rate equals 1, Partial FC is equivalent to model parallelism (the default sample rate is 1).
Example:
--------
>>> margin_loss = ArcFace()  # any callable margin loss, such as ArcFace or CosFace
>>> module_pfc = PartialFC(margin_loss, embedding_size=512, num_classes=8000000, sample_rate=0.2)
>>> for img, labels in data_loader:
>>>     embeddings = net(img)
>>>     loss = module_pfc(embeddings, labels, optimizer)
>>>     loss.backward()
>>>     optimizer.step()
"""
_version = 1
def __init__(
self,
margin_loss: Callable,
embedding_size: int,
num_classes: int,
sample_rate: float = 1.0,
fp16: bool = False,
):
"""
Parameters:
-----------
margin_loss: Callable
Callable that applies the margin to the logits, required
embedding_size: int
The dimension of embedding, required
num_classes: int
Total number of classes, required
sample_rate: float
The rate of negative centers participating in the calculation, default is 1.0.
"""
super(PartialFC, self).__init__()
assert (
distributed.is_initialized()
), "must initialize distributed before create this"
self.rank = distributed.get_rank()
self.world_size = distributed.get_world_size()
self.dist_cross_entropy = DistCrossEntropy()
self.embedding_size = embedding_size
self.sample_rate: float = sample_rate
self.fp16 = fp16
self.num_local: int = num_classes // self.world_size + int(
self.rank < num_classes % self.world_size
)
self.class_start: int = num_classes // self.world_size * self.rank + min(
self.rank, num_classes % self.world_size
)
self.num_sample: int = int(self.sample_rate * self.num_local)
self.last_batch_size: int = 0
self.weight: torch.Tensor
self.weight_mom: torch.Tensor
self.weight_activated: torch.nn.Parameter
self.weight_activated_mom: torch.Tensor
self.is_updated: bool = True
self.init_weight_update: bool = True
if self.sample_rate < 1:
self.register_buffer("weight",
tensor=torch.normal(0, 0.01, (self.num_local, embedding_size)))
self.register_buffer("weight_mom",
tensor=torch.zeros_like(self.weight))
self.register_parameter("weight_activated",
param=torch.nn.Parameter(torch.empty(0, 0)))
self.register_buffer("weight_activated_mom",
tensor=torch.empty(0, 0))
self.register_buffer("weight_index",
tensor=torch.empty(0, 0))
else:
self.weight_activated = torch.nn.Parameter(torch.normal(0, 0.01, (self.num_local, embedding_size)))
# margin_loss
if isinstance(margin_loss, Callable):
self.margin_softmax = margin_loss
else:
raise TypeError("margin_loss must be callable")
@torch.no_grad()
def sample(self,
labels: torch.Tensor,
index_positive: torch.Tensor,
optimizer: torch.optim.Optimizer):
"""
This function modifies the values of labels in place.
Parameters:
-----------
labels: torch.Tensor
pass
index_positive: torch.Tensor
pass
optimizer: torch.optim.Optimizer
pass
"""
positive = torch.unique(labels[index_positive], sorted=True).cuda()
if self.num_sample - positive.size(0) >= 0:
perm = torch.rand(size=[self.num_local]).cuda()
perm[positive] = 2.0
index = torch.topk(perm, k=self.num_sample)[1].cuda()
index = index.sort()[0].cuda()
else:
index = positive
self.weight_index = index
labels[index_positive] = torch.searchsorted(index, labels[index_positive])
self.weight_activated = torch.nn.Parameter(self.weight[self.weight_index])
self.weight_activated_mom = self.weight_mom[self.weight_index]
if isinstance(optimizer, torch.optim.SGD):
# TODO the params of partial fc must be last in the params list
optimizer.state.pop(optimizer.param_groups[-1]["params"][0], None)
optimizer.param_groups[-1]["params"][0] = self.weight_activated
optimizer.state[self.weight_activated][
"momentum_buffer"
] = self.weight_activated_mom
else:
raise TypeError("PartialFC sampling only supports torch.optim.SGD")
@torch.no_grad()
def update(self):
""" Copy the activated (sampled) weights and their momentum back into the full local buffers.
"""
if self.init_weight_update:
self.init_weight_update = False
return
if self.sample_rate < 1:
self.weight[self.weight_index] = self.weight_activated
self.weight_mom[self.weight_index] = self.weight_activated_mom
def forward(
self,
local_embeddings: torch.Tensor,
local_labels: torch.Tensor,
optimizer: torch.optim.Optimizer,
):
"""
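Parameters:
----------
local_embeddings: torch.Tensor
feature embeddings on each GPU (rank).
local_labels: torch.Tensor
labels on each GPU (rank).
optimizer: torch.optim.Optimizer
optimizer that holds the PartialFC weight, used when sample_rate < 1.
Returns:
-------
loss: torch.Tensor
scalar margin-softmax cross-entropy loss computed across all ranks.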
"""
local_labels.squeeze_()
local_labels = local_labels.long()
self.update()
batch_size = local_embeddings.size(0)
if self.last_batch_size == 0:
self.last_batch_size = batch_size
assert self.last_batch_size == batch_size, (
"last batch size does not equal current batch size: {} vs {}".format(
self.last_batch_size, batch_size))
_gather_embeddings = [
torch.zeros((batch_size, self.embedding_size)).cuda()
for _ in range(self.world_size)
]
_gather_labels = [
torch.zeros(batch_size).long().cuda() for _ in range(self.world_size)
]
_list_embeddings = AllGather(local_embeddings, *_gather_embeddings)
distributed.all_gather(_gather_labels, local_labels)
embeddings = torch.cat(_list_embeddings)
labels = torch.cat(_gather_labels)
labels = labels.view(-1, 1)
index_positive = (self.class_start <= labels) & (
labels < self.class_start + self.num_local
)
labels[~index_positive] = -1
labels[index_positive] -= self.class_start
if self.sample_rate < 1:
self.sample(labels, index_positive, optimizer)
with torch.cuda.amp.autocast(self.fp16):
norm_embeddings = normalize(embeddings)
norm_weight_activated = normalize(self.weight_activated)
logits = linear(norm_embeddings, norm_weight_activated)
if self.fp16:
logits = logits.float()
logits = logits.clamp(-1, 1)
logits = self.margin_softmax(logits, labels)
loss = self.dist_cross_entropy(logits, labels)
return loss
def state_dict(self, destination=None, prefix="", keep_vars=False):
if destination is None:
destination = collections.OrderedDict()
destination._metadata = collections.OrderedDict()
for name, module in self._modules.items():
if module is not None:
module.state_dict(destination, prefix + name + ".", keep_vars=keep_vars)
if self.sample_rate < 1:
destination["weight"] = self.weight.detach()
else:
destination["weight"] = self.weight_activated.data.detach()
return destination
def load_state_dict(self, state_dict, strict: bool = True):
if self.sample_rate < 1:
self.weight = state_dict["weight"].to(self.weight.device)
self.weight_mom.zero_()
self.weight_activated.data.zero_()
self.weight_activated_mom.zero_()
self.weight_index.zero_()
else:
self.weight_activated.data = state_dict["weight"].to(self.weight_activated.data.device)
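The class partitioning used by `PartialFC.__init__` and the label localization in `forward` can be sketched in plain Python. This is a minimal illustration, not the module itself: the helper names `shard_classes` and `localize_label` are hypothetical, and the numbers in the usage lines are made up.

```python
from typing import Tuple


def shard_classes(num_classes: int, world_size: int, rank: int) -> Tuple[int, int]:
    """Return (class_start, num_local) for one rank, mirroring PartialFC.__init__."""
    base, remainder = divmod(num_classes, world_size)
    # The first `remainder` ranks each take one extra class.
    num_local = base + int(rank < remainder)
    class_start = base * rank + min(rank, remainder)
    return class_start, num_local


def localize_label(label: int, class_start: int, num_local: int) -> int:
    """Map a global label to a shard-local index, or -1 if another rank owns it."""
    if class_start <= label < class_start + num_local:
        return label - class_start
    return -1


# Hypothetical setup: 10 classes split over 4 ranks.
for rank in range(4):
    print(rank, shard_classes(10, 4, rank))
```

Each rank owns a contiguous range of classes, so a label outside that range is marked `-1` and ignored by that rank's share of the loss.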
class DistCrossEntropyFunc(torch.autograd.Function):
"""
Cross-entropy loss computed in parallel: the softmax denominator is all-reduced across GPUs before the per-sample probabilities are calculated.
Implementation used for ArcFace (https://arxiv.org/pdf/1801.07698v1.pdf).
"""
@staticmethod
def forward(ctx, logits: torch.Tensor, label: torch.Tensor):
""" """
batch_size = logits.size(0)
# for numerical stability
max_logits, _ = torch.max(logits, dim=1, keepdim=True)
# local to global
distributed.all_reduce(max_logits, distributed.ReduceOp.MAX)
logits.sub_(max_logits)
logits.exp_()
sum_logits_exp = torch.sum(logits, dim=1, keepdim=True)
# local to global
distributed.all_reduce(sum_logits_exp, distributed.ReduceOp.SUM)
logits.div_(sum_logits_exp)
index = torch.where(label != -1)[0]
# loss
loss = torch.zeros(batch_size, 1, device=logits.device)
loss[index] = logits[index].gather(1, label[index])
distributed.all_reduce(loss, distributed.ReduceOp.SUM)
ctx.save_for_backward(index, logits, label)
return loss.clamp_min_(1e-30).log_().mean() * (-1)
@staticmethod
def backward(ctx, loss_gradient):
"""
Args:
loss_gradient (torch.Tensor): gradient backward by last layer
Returns:
gradients for each input in forward function
`None` gradients for one-hot label
"""
(
index,
logits,
label,
) = ctx.saved_tensors
batch_size = logits.size(0)
one_hot = torch.zeros(
size=[index.size(0), logits.size(1)], device=logits.device
)
one_hot.scatter_(1, label[index], 1)
logits[index] -= one_hot
logits.div_(batch_size)
return logits * loss_gradient.item(), None
class DistCrossEntropy(torch.nn.Module):
def __init__(self):
super(DistCrossEntropy, self).__init__()
def forward(self, logit_part, label_part):
return DistCrossEntropyFunc.apply(logit_part, label_part)
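The reduction pattern in `DistCrossEntropyFunc.forward` (global max, then global sum of exponentials, then the target probability from the owning shard) can be checked on a single process. The sketch below is an assumption-laden stand-in: ordinary Python `max` and `sum` replace `distributed.all_reduce`, each inner list plays the role of one rank's logits, and the function name `sharded_cross_entropy` is hypothetical.

```python
import math
from typing import List


def sharded_cross_entropy(logit_shards: List[List[float]], target: int) -> float:
    """Cross entropy of one sample whose logits are split across shards."""
    # Step 1: global max (all_reduce with MAX in the source) for numerical stability.
    global_max = max(max(shard) for shard in logit_shards)
    # Step 2: each shard sums its own exponentials; the partial sums are then
    # combined (all_reduce with SUM in the source) into the softmax denominator.
    local_sums = [sum(math.exp(x - global_max) for x in shard) for shard in logit_shards]
    denominator = sum(local_sums)
    # Step 3: only the shard that owns the target class contributes a probability.
    prob = 0.0
    offset = 0
    for shard in logit_shards:
        if offset <= target < offset + len(shard):
            prob += math.exp(shard[target - offset] - global_max) / denominator
        offset += len(shard)
    # Same lower clamp as the source, to avoid log(0).
    return -math.log(max(prob, 1e-30))


# Hypothetical logits for 4 classes split over 2 shards.
print(sharded_cross_entropy([[1.0, 2.0], [3.0, 0.5]], target=2))
```

The result should agree with a softmax cross entropy computed over the concatenated logits, which is the property the distributed version relies on.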
class AllGatherFunc(torch.autograd.Function):
"""AllGather op with gradient backward"""
@staticmethod
def forward(ctx, tensor, *gather_list):
gather_list = list(gather_list)
distributed.all_gather(gather_list, tensor)
return tuple(gather_list)
@staticmethod
def backward(ctx, *grads):
grad_list = list(grads)
rank = distributed.get_rank()
grad_out = grad_list[rank]
dist_ops = [
distributed.reduce(grad_out, rank, distributed.ReduceOp.SUM, async_op=True)
if i == rank
else distributed.reduce(
grad_list[i], i, distributed.ReduceOp.SUM, async_op=True
)
for i in range(distributed.get_world_size())
]
for _op in dist_ops:
_op.wait()
grad_out *= len(grad_list) # cooperate with distributed loss function
return (grad_out, *[None for _ in range(len(grad_list))])
AllGather = AllGatherFunc.apply
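The center sampling in `PartialFC.sample` earlier in this file can be sketched without tensors. This is an illustration under assumptions: `sample_centers` and `remap` are hypothetical names, `random.Random` stands in for `torch.rand`, a sort stands in for `torch.topk`, and `bisect.bisect_left` stands in for `torch.searchsorted`.

```python
import bisect
import random
from typing import List, Sequence


def sample_centers(positive: Sequence[int], num_local: int, num_sample: int,
                   rng: random.Random) -> List[int]:
    """Pick `num_sample` local class indices, always keeping every positive."""
    positive = sorted(set(positive))
    if num_sample < len(positive):
        # More positives than the budget: keep all positives, as the source does.
        return positive
    scores = [rng.random() for _ in range(num_local)]  # values in [0, 1)
    for idx in positive:
        scores[idx] = 2.0  # guaranteed to outrank any random score
    top = sorted(range(num_local), key=lambda i: scores[i], reverse=True)[:num_sample]
    return sorted(top)


def remap(labels: Sequence[int], index: Sequence[int]) -> List[int]:
    """Translate local labels into positions within the sorted sampled index."""
    return [bisect.bisect_left(index, label) for label in labels]


rng = random.Random(0)
index = sample_centers([7, 2, 7], num_local=10, num_sample=4, rng=rng)
print(index, remap([2, 7], index))
```

Because positives receive a score above every random draw, they always survive the top-k cut, and the remapped labels point at the rows of the activated weight that correspond to them.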
+5
View File
@@ -0,0 +1,5 @@
tensorboard
easydict
mxnet
onnx
sklearn
+9
View File
@@ -0,0 +1,9 @@
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch \
--nproc_per_node=8 \
--nnodes=1 \
--node_rank=0 \
--master_addr="127.0.0.1" \
--master_port=12345 train.py "$@"
ps -ef | grep "train" | grep -v grep | awk '{print "kill -9 "$2}' | sh
+53
View File
@@ -0,0 +1,53 @@
import numpy as np
import onnx
import torch
def convert_onnx(net, path_module, output, opset=11, simplify=False):
assert isinstance(net, torch.nn.Module)
img = np.random.randint(0, 255, size=(112, 112, 3), dtype=np.int32)
img = img.astype(np.float32)  # np.float alias was removed in NumPy 1.24
img = (img / 255. - 0.5) / 0.5 # torch style norm
img = img.transpose((2, 0, 1))
img = torch.from_numpy(img).unsqueeze(0).float()
weight = torch.load(path_module)
net.load_state_dict(weight, strict=True)
net.eval()
torch.onnx.export(net, img, output, keep_initializers_as_inputs=False, verbose=False, opset_version=opset)
model = onnx.load(output)
graph = model.graph
graph.input[0].type.tensor_type.shape.dim[0].dim_param = 'None'
if simplify:
from onnxsim import simplify
model, check = simplify(model)
assert check, "Simplified ONNX model could not be validated"
onnx.save(model, output)
if __name__ == '__main__':
import os
import argparse
from backbones import get_model
parser = argparse.ArgumentParser(description='ArcFace PyTorch to onnx')
parser.add_argument('input', type=str, help='input backbone.pth file or path')
parser.add_argument('--output', type=str, default=None, help='output onnx path')
parser.add_argument('--network', type=str, default=None, help='backbone network')
parser.add_argument('--simplify', action='store_true', help='onnx simplify')
args = parser.parse_args()
input_file = args.input
if os.path.isdir(input_file):
input_file = os.path.join(input_file, "model.pt")
assert os.path.exists(input_file)
# model_name = os.path.basename(os.path.dirname(input_file)).lower()
# params = model_name.split("_")
# if len(params) >= 3 and params[1] in ('arcface', 'cosface'):
# if args.network is None:
# args.network = params[2]
assert args.network is not None
print(args)
backbone_onnx = get_model(args.network, dropout=0)
if args.output is None:
args.output = os.path.join(os.path.dirname(input_file), "model.onnx")
convert_onnx(backbone_onnx, input_file, args.output, simplify=args.simplify)
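A note on on/off options such as `--simplify`: with `argparse`, `type=bool` applies `bool()` to the raw string, so any non-empty value (including the text `False`) parses as true. The sketch below contrasts that with a presence-only flag; the parser names `risky` and `safe` are just for illustration.

```python
import argparse

# Pitfall: type=bool calls bool() on the raw string, so "False" is truthy.
risky = argparse.ArgumentParser()
risky.add_argument("--simplify", type=bool, default=False)
print(risky.parse_args(["--simplify", "False"]).simplify)  # True

# A presence-only flag avoids the ambiguity.
safe = argparse.ArgumentParser()
safe.add_argument("--simplify", action="store_true")
print(safe.parse_args([]).simplify)  # False
print(safe.parse_args(["--simplify"]).simplify)  # True
```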
+161
View File
@@ -0,0 +1,161 @@
import argparse
import logging
import os
import torch
from torch import distributed
from torch.utils.tensorboard import SummaryWriter
from backbones import get_model
from dataset import get_dataloader
from torch.utils.data import DataLoader
from lr_scheduler import PolyScheduler
from losses import CosFace, ArcFace
from partial_fc import PartialFC
from utils.utils_callbacks import CallBackLogging, CallBackVerification
from utils.utils_config import get_config
from utils.utils_logging import AverageMeter, init_logging
try:
world_size = int(os.environ["WORLD_SIZE"])
rank = int(os.environ["RANK"])
distributed.init_process_group("nccl")
except KeyError:
world_size = 1
rank = 0
distributed.init_process_group(
backend="nccl",
init_method="tcp://127.0.0.1:12584",
rank=rank,
world_size=world_size,
)
def main(args):
torch.cuda.set_device(args.local_rank)
cfg = get_config(args.config)
os.makedirs(cfg.output, exist_ok=True)
init_logging(rank, cfg.output)
summary_writer = (
SummaryWriter(log_dir=os.path.join(cfg.output, "tensorboard"))
if rank == 0
else None
)
train_loader = get_dataloader(
cfg.rec, local_rank=args.local_rank, batch_size=cfg.batch_size, dali=cfg.dali)
backbone = get_model(
cfg.network, dropout=0.0, fp16=cfg.fp16, num_features=cfg.embedding_size
).cuda()
backbone = torch.nn.parallel.DistributedDataParallel(
module=backbone, broadcast_buffers=False, device_ids=[args.local_rank])
backbone.train()
if cfg.loss == "arcface":
margin_loss = ArcFace()
elif cfg.loss == "cosface":
margin_loss = CosFace()
else:
raise ValueError("unsupported loss: {}".format(cfg.loss))
module_partial_fc = PartialFC(
margin_loss,
cfg.embedding_size,
cfg.num_classes,
cfg.sample_rate,
cfg.fp16
)
module_partial_fc.train().cuda()
# TODO the params of partial fc must be last in the params list
opt = torch.optim.SGD(
params=[
{"params": backbone.parameters(), },
{"params": module_partial_fc.parameters(), },
],
lr=cfg.lr,
momentum=0.9,
weight_decay=cfg.weight_decay
)
total_batch_size = cfg.batch_size * world_size
cfg.warmup_step = cfg.num_image // total_batch_size * cfg.warmup_epoch
cfg.total_step = cfg.num_image // total_batch_size * cfg.num_epoch
lr_scheduler = PolyScheduler(
optimizer=opt,
base_lr=cfg.lr,
max_steps=cfg.total_step,
warmup_steps=cfg.warmup_step
)
for key, value in cfg.items():
num_space = 25 - len(key)
logging.info(": " + key + " " * num_space + str(value))
callback_verification = CallBackVerification(
val_targets=cfg.val_targets, rec_prefix=cfg.rec, summary_writer=summary_writer
)
callback_logging = CallBackLogging(
frequent=cfg.frequent,
total_step=cfg.total_step,
batch_size=cfg.batch_size,
writer=summary_writer
)
loss_am = AverageMeter()
start_epoch = 0
global_step = 0
amp = torch.cuda.amp.grad_scaler.GradScaler(growth_interval=100)
for epoch in range(start_epoch, cfg.num_epoch):
if isinstance(train_loader, DataLoader):
train_loader.sampler.set_epoch(epoch)
for _, (img, local_labels) in enumerate(train_loader):
global_step += 1
local_embeddings = backbone(img)
loss: torch.Tensor = module_partial_fc(local_embeddings, local_labels, opt)
if cfg.fp16:
amp.scale(loss).backward()
amp.unscale_(opt)
torch.nn.utils.clip_grad_norm_(backbone.parameters(), 5)
amp.step(opt)
amp.update()
else:
loss.backward()
torch.nn.utils.clip_grad_norm_(backbone.parameters(), 5)
opt.step()
opt.zero_grad()
lr_scheduler.step()
with torch.no_grad():
loss_am.update(loss.item(), 1)
callback_logging(global_step, loss_am, epoch, cfg.fp16, lr_scheduler.get_last_lr()[0], amp)
if global_step % cfg.verbose == 0 and global_step > 200:
callback_verification(global_step, backbone)
path_pfc = os.path.join(cfg.output, "softmax_fc_gpu_{}.pt".format(rank))
torch.save(module_partial_fc.state_dict(), path_pfc)
if rank == 0:
path_module = os.path.join(cfg.output, "model.pt")
torch.save(backbone.module.state_dict(), path_module)
if cfg.dali:
train_loader.reset()
if rank == 0:
path_module = os.path.join(cfg.output, "model.pt")
torch.save(backbone.module.state_dict(), path_module)
distributed.destroy_process_group()
if __name__ == "__main__":
torch.backends.cudnn.benchmark = True
parser = argparse.ArgumentParser(description="Distributed Arcface Training in Pytorch")
parser.add_argument("config", type=str, help="py config file")
parser.add_argument("--local_rank", type=int, default=0, help="local_rank")
main(parser.parse_args())
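The warmup and total step counts that `main` derives from the config can be reproduced with integer arithmetic. The helper name `schedule_steps` and the numbers below are hypothetical; only the formula follows the script above.

```python
from typing import Tuple


def schedule_steps(num_image: int, batch_size: int, world_size: int,
                   warmup_epoch: int, num_epoch: int) -> Tuple[int, int]:
    """Return (warmup_step, total_step) the way train.py computes them."""
    total_batch_size = batch_size * world_size
    # Floor division: a trailing partial batch is not counted as a step.
    steps_per_epoch = num_image // total_batch_size
    return steps_per_epoch * warmup_epoch, steps_per_epoch * num_epoch


# Hypothetical run: 1,000,000 images, 128 per GPU, 8 GPUs, 2 warmup epochs, 20 epochs.
print(schedule_steps(1_000_000, 128, 8, 2, 20))  # (1952, 19520)
```

Since `batch_size` is per process, doubling the number of GPUs halves the step count for the same number of epochs.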
View File
+71
View File
@@ -0,0 +1,71 @@
import os
import sys
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from menpo.visualize.viewmatplotlib import sample_colours_from_colourmap
from prettytable import PrettyTable
from sklearn.metrics import roc_curve, auc
with open(sys.argv[1], "r") as f:
files = f.readlines()
files = [x.strip() for x in files]
image_path = "/train_tmp/IJB_release/IJBC"
def read_template_pair_list(path):
pairs = pd.read_csv(path, sep=' ', header=None).values
t1 = pairs[:, 0].astype(int)  # np.int alias was removed in NumPy 1.24
t2 = pairs[:, 1].astype(int)
label = pairs[:, 2].astype(int)
return t1, t2, label
p1, p2, label = read_template_pair_list(
os.path.join('%s/meta' % image_path,
'%s_template_pair_label.txt' % 'ijbc'))
methods = []
scores = []
for file in files:
methods.append(file)
scores.append(np.load(file))
methods = np.array(methods)
scores = dict(zip(methods, scores))
colours = dict(
zip(methods, sample_colours_from_colourmap(methods.shape[0], 'Set2')))
x_labels = [10 ** -6, 10 ** -5, 10 ** -4, 10 ** -3, 10 ** -2, 10 ** -1]
tpr_fpr_table = PrettyTable(['Methods'] + [str(x) for x in x_labels])
fig = plt.figure()
for method in methods:
fpr, tpr, _ = roc_curve(label, scores[method])
roc_auc = auc(fpr, tpr)
fpr = np.flipud(fpr)
tpr = np.flipud(tpr) # select largest tpr at same fpr
plt.plot(fpr,
tpr,
color=colours[method],
lw=1,
label=('[%s (AUC = %0.4f %%)]' %
(method.split('-')[-1], roc_auc * 100)))
tpr_fpr_row = []
tpr_fpr_row.append(method)
for fpr_iter in np.arange(len(x_labels)):
_, min_index = min(
list(zip(abs(fpr - x_labels[fpr_iter]), range(len(fpr)))))
tpr_fpr_row.append('%.2f' % (tpr[min_index] * 100))
tpr_fpr_table.add_row(tpr_fpr_row)
plt.xlim([10 ** -6, 0.1])
plt.ylim([0.3, 1.0])
plt.grid(linestyle='--', linewidth=1)
plt.xticks(x_labels)
plt.yticks(np.linspace(0.3, 1.0, 8, endpoint=True))
plt.xscale('log')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC on IJB')
plt.legend(loc="lower right")
print(tpr_fpr_table)
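The table rows above report, for each target false positive rate, the TPR at the ROC point whose FPR is closest to that target. A dependency-free sketch of that lookup follows; `tpr_at_fpr` is a hypothetical helper name and the ROC points are made-up values.

```python
from typing import List, Sequence


def tpr_at_fpr(fpr: Sequence[float], tpr: Sequence[float],
               targets: Sequence[float]) -> List[float]:
    """For each target FPR, return the TPR at the nearest available FPR point."""
    results = []
    for target in targets:
        # min() over (distance, index) pairs: ties resolve to the lowest index,
        # which is why the script flips the arrays before the lookup.
        _, nearest = min((abs(f - target), i) for i, f in enumerate(fpr))
        results.append(tpr[nearest])
    return results


# Made-up ROC points, ordered from high FPR to low FPR as after np.flipud.
fpr = [0.1, 0.01, 0.001, 0.0]
tpr = [0.99, 0.95, 0.90, 0.0]
print(tpr_at_fpr(fpr, tpr, [1e-3, 1e-2]))  # [0.9, 0.95]
```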
+110
View File
@@ -0,0 +1,110 @@
import logging
import os
import time
from typing import List
import torch
from eval import verification
from utils.utils_logging import AverageMeter
from torch.utils.tensorboard import SummaryWriter
from torch import distributed
class CallBackVerification(object):
def __init__(self, val_targets, rec_prefix, summary_writer=None, image_size=(112, 112)):
self.rank: int = distributed.get_rank()
self.highest_acc: float = 0.0
self.highest_acc_list: List[float] = [0.0] * len(val_targets)
self.ver_list: List[object] = []
self.ver_name_list: List[str] = []
if self.rank == 0:
self.init_dataset(val_targets=val_targets, data_dir=rec_prefix, image_size=image_size)
self.summary_writer = summary_writer
def ver_test(self, backbone: torch.nn.Module, global_step: int):
results = []
for i in range(len(self.ver_list)):
acc1, std1, acc2, std2, xnorm, embeddings_list = verification.test(
self.ver_list[i], backbone, 10, 10)
logging.info('[%s][%d]XNorm: %f' % (self.ver_name_list[i], global_step, xnorm))
logging.info('[%s][%d]Accuracy-Flip: %1.5f+-%1.5f' % (self.ver_name_list[i], global_step, acc2, std2))
self.summary_writer: SummaryWriter
self.summary_writer.add_scalar(tag=self.ver_name_list[i], scalar_value=acc2, global_step=global_step, )
if acc2 > self.highest_acc_list[i]:
self.highest_acc_list[i] = acc2
logging.info(
'[%s][%d]Accuracy-Highest: %1.5f' % (self.ver_name_list[i], global_step, self.highest_acc_list[i]))
results.append(acc2)
def init_dataset(self, val_targets, data_dir, image_size):
for name in val_targets:
path = os.path.join(data_dir, name + ".bin")
if os.path.exists(path):
data_set = verification.load_bin(path, image_size)
self.ver_list.append(data_set)
self.ver_name_list.append(name)
def __call__(self, num_update, backbone: torch.nn.Module):
if self.rank == 0 and num_update > 0:
backbone.eval()
self.ver_test(backbone, num_update)
backbone.train()
class CallBackLogging(object):
def __init__(self, frequent, total_step, batch_size, writer=None):
self.frequent: int = frequent
self.rank: int = distributed.get_rank()
self.world_size: int = distributed.get_world_size()
self.time_start = time.time()
self.total_step: int = total_step
self.batch_size: int = batch_size
self.writer = writer
self.init = False
self.tic = 0
def __call__(self,
global_step: int,
loss: AverageMeter,
epoch: int,
fp16: bool,
learning_rate: float,
grad_scaler: torch.cuda.amp.GradScaler):
if self.rank == 0 and global_step > 0 and global_step % self.frequent == 0:
if self.init:
try:
speed: float = self.frequent * self.batch_size / (time.time() - self.tic)
speed_total = speed * self.world_size
except ZeroDivisionError:
speed_total = float('inf')
time_now = (time.time() - self.time_start) / 3600
time_total = time_now / ((global_step + 1) / self.total_step)
time_for_end = time_total - time_now
if self.writer is not None:
self.writer.add_scalar('time_for_end', time_for_end, global_step)
self.writer.add_scalar('learning_rate', learning_rate, global_step)
self.writer.add_scalar('loss', loss.avg, global_step)
if fp16:
msg = "Speed %.2f samples/sec Loss %.4f LearningRate %.4f Epoch: %d Global Step: %d " \
"Fp16 Grad Scale: %2.f Required: %1.f hours" % (
speed_total, loss.avg, learning_rate, epoch, global_step,
grad_scaler.get_scale(), time_for_end
)
else:
msg = "Speed %.2f samples/sec Loss %.4f LearningRate %.4f Epoch: %d Global Step: %d " \
"Required: %1.f hours" % (
speed_total, loss.avg, learning_rate, epoch, global_step, time_for_end
)
logging.info(msg)
loss.reset()
self.tic = time.time()
else:
self.init = True
self.tic = time.time()
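The throughput and remaining-time figures logged by `CallBackLogging` come down to two small formulas. The sketch below restates them as standalone functions; the names `throughput` and `hours_remaining` are hypothetical, and the sample values are invented.

```python
def throughput(frequent: int, batch_size: int, world_size: int,
               elapsed_seconds: float) -> float:
    """Samples per second across all ranks over the last `frequent` steps."""
    if elapsed_seconds <= 0:
        return float("inf")  # mirrors the ZeroDivisionError fallback in the callback
    return frequent * batch_size / elapsed_seconds * world_size


def hours_remaining(global_step: int, total_step: int, hours_elapsed: float) -> float:
    """Linear extrapolation: projected total time minus time already spent."""
    fraction_done = (global_step + 1) / total_step
    return hours_elapsed / fraction_done - hours_elapsed


# Invented values: 50 steps of 128 samples on 8 GPUs in 10 seconds.
print(throughput(50, 128, 8, 10.0))     # 5120.0
print(hours_remaining(499, 1000, 2.0))  # 2.0
```

The estimate assumes a constant step time, so it tends to be rough early in training, for example while verification runs or data caches are still warming up.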
+16
View File
@@ -0,0 +1,16 @@
import importlib
import os.path as osp
def get_config(config_file):
assert config_file.startswith('configs/'), 'config file setting must start with configs/'
temp_config_name = osp.basename(config_file)
temp_module_name = osp.splitext(temp_config_name)[0]
config = importlib.import_module("configs.base")
cfg = config.config
config = importlib.import_module("configs.%s" % temp_module_name)
job_cfg = config.config
cfg.update(job_cfg)
if cfg.output is None:
cfg.output = osp.join('work_dirs', temp_module_name)
return cfg
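The override behaviour of `get_config` (base settings first, job settings on top, default output directory when none is given) can be illustrated with plain dictionaries. This is a sketch under assumptions: `module_name_from_path` and `merge_config` are hypothetical helpers, the keys are examples, and unlike the function above the sketch copies the base config instead of updating it in place.

```python
import os
from typing import Any, Dict


def module_name_from_path(config_file: str) -> str:
    """Strip the directory and extension, e.g. configs/foo.py -> foo."""
    return os.path.splitext(os.path.basename(config_file))[0]


def merge_config(base: Dict[str, Any], job: Dict[str, Any], module_name: str) -> Dict[str, Any]:
    """Overlay job settings on the base settings and fill in a default output dir."""
    cfg = dict(base)   # copy, so the base dictionary stays untouched
    cfg.update(job)    # job values win on conflicting keys
    if cfg.get("output") is None:
        cfg["output"] = os.path.join("work_dirs", module_name)
    return cfg


name = module_name_from_path("configs/ms1mv3_r50.py")
print(merge_config({"lr": 0.1, "output": None}, {"lr": 0.02}, name))
```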
+41
View File
@@ -0,0 +1,41 @@
import logging
import os
import sys
class AverageMeter(object):
"""Computes and stores the average and current value
"""
def __init__(self):
self.val = None
self.avg = None
self.sum = None
self.count = None
self.reset()
def reset(self):
self.val = 0
self.avg = 0
self.sum = 0
self.count = 0
def update(self, val, n=1):
self.val = val
self.sum += val * n
self.count += n
self.avg = self.sum / self.count
def init_logging(rank, models_root):
if rank == 0:
log_root = logging.getLogger()
log_root.setLevel(logging.INFO)
formatter = logging.Formatter("Training: %(asctime)s-%(message)s")
handler_file = logging.FileHandler(os.path.join(models_root, "training.log"))
handler_stream = logging.StreamHandler(sys.stdout)
handler_file.setFormatter(formatter)
handler_stream.setFormatter(formatter)
log_root.addHandler(handler_file)
log_root.addHandler(handler_stream)
log_root.info('rank_id: %d' % rank)
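A short usage sketch for the running average kept by `AverageMeter`. To stay self-contained, the block defines a minimal stand-in class named `RunningAverage` (a hypothetical name) with the same update rule; the loss values are invented.

```python
class RunningAverage:
    """Minimal stand-in with the same update rule as AverageMeter."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.val = 0.0
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val: float, n: int = 1) -> None:
        # `n` weights the value, e.g. by the number of samples it summarizes.
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


meter = RunningAverage()
meter.update(2.0)        # one observation of 2.0
meter.update(4.0, n=3)   # three observations of 4.0
print(meter.avg)         # 3.5
meter.reset()            # start a fresh window, as the logging callback does
```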
+2 -2
View File
@@ -1,6 +1,6 @@
{
"breakpoint": [
1877,
29
31,
110
]
}
+258
View File
@@ -6726,3 +6726,261 @@ n002000\0058_02.jpg
n002000\0130_01.jpg
n002000\0135_01.jpg
n002000\0160_02.jpg
n000002\0054_01.jpg
n000002\0055_01.jpg
n000002\0138_01.jpg
n000002\0150_02.jpg
n000002\0208_01.jpg
n000002\0252_01.jpg
n000002\0273_01.jpg
n000002\0276_01.jpg
n000003\0024_01.jpg
n000003\0098_01.jpg
n000003\0219_01.jpg
n000004\0026_01.jpg
n000004\0084_01.jpg
n000004\0103_02.jpg
n000004\0118_01.jpg
n000004\0144_02.jpg
n000004\0155_01.jpg
n000004\0180_01.jpg
n000004\0231_01.jpg
n000004\0237_01.jpg
n000004\0239_01.jpg
n000004\0258_01.jpg
n000005\0138_01.jpg
n000005\0144_01.jpg
n000005\0287_01.jpg
n000006\0007_01.jpg
n000006\0014_01.jpg
n000006\0036_02.jpg
n000006\0091_01.jpg
n000006\0103_01.jpg
n000006\0281_01.jpg
n000006\0300_01.jpg
n000006\0351_01.jpg
n000006\0430_01.jpg
n000006\0519_01.jpg
n000007\0021_01.jpg
n000007\0042_01.jpg
n000007\0045_01.jpg
n000007\0050_02.jpg
n000007\0080_01.jpg
n000007\0086_01.jpg
n000007\0106_02.jpg
n000007\0115_01.jpg
n000007\0116_03.jpg
n000007\0119_01.jpg
n000007\0137_01.jpg
n000007\0140_02.jpg
n000007\0148_02.jpg
n000007\0174_01.jpg
n000007\0181_01.jpg
n000007\0182_02.jpg
n000007\0213_02.jpg
n000007\0226_02.jpg
n000007\0229_01.jpg
n000007\0432_01.jpg
n000008\0072_01.jpg
n000008\0297_01.jpg
n000010\0068_01.jpg
n000010\0069_01.jpg
n000010\0096_01.jpg
n000010\0150_02.jpg
n000010\0155_02.jpg
n000010\0223_01.jpg
n000011\0112_01.jpg
n000011\0142_02.jpg
n000011\0200_01.jpg
n000011\0217_01.jpg
n000011\0229_02.jpg
n000011\0291_02.jpg
n000012\0173_01.jpg
n000012\0180_01.jpg
n000012\0198_01.jpg
n000012\0282_01.jpg
n000012\0294_01.jpg
n000012\0307_01.jpg
n000012\0338_01.jpg
n000013\0029_06.jpg
n000013\0128_01.jpg
n000013\0132_01.jpg
n000013\0148_01.jpg
n000013\0190_02.jpg
n000013\0225_01.jpg
n000013\0277_01.jpg
n000013\0335_01.jpg
n000013\0337_01.jpg
n000013\0341_02.jpg
n000014\0163_01.jpg
n000015\0029_02.jpg
n000015\0059_01.jpg
n000015\0133_01.jpg
n000015\0243_02.jpg
n000015\0392_02.jpg
n000015\0393_01.jpg
n000015\0402_01.jpg
n000016\0189_01.jpg
n000016\0237_01.jpg
n000016\0266_01.jpg
n000016\0385_04.jpg
n000016\0391_01.jpg
n000016\0405_01.jpg
n000016\0477_02.jpg
n000016\0500_01.jpg
n000016\0503_01.jpg
n000016\0503_01.jpg
n000017\0123_02.jpg
n000017\0124_01.jpg
n000017\0163_01.jpg
n000017\0262_01.jpg
n000019\0038_01.jpg
n000019\0055_01.jpg
n000019\0061_01.jpg
n000019\0114_01.jpg
n000019\0130_02.jpg
n000019\0149_02.jpg
n000019\0170_01.jpg
n000019\0182_01.jpg
n000019\0219_01.jpg
n000019\0221_02.jpg
n000019\0234_02.jpg
n000019\0249_01.jpg
n000019\0259_01.jpg
n000019\0273_01.jpg
n000019\0306_01.jpg
n000019\0313_01.jpg
n000019\0333_01.jpg
n000019\0350_02.jpg
n000020\0006_01.jpg
n000020\0071_01.jpg
n000020\0074_02.jpg
n000020\0099_02.jpg
n000020\0379_01.jpg
n000020\0400_01.jpg
n000021\0120_02.jpg
n000021\0221_01.jpg
n000022\0051_01.jpg
n000022\0071_01.jpg
n000022\0146_02.jpg
n000022\0146_02.jpg
n000022\0236_01.jpg
n000023\0008_01.jpg
n000023\0078_01.jpg
n000023\0093_01.jpg
n000023\0133_01.jpg
n000023\0162_01.jpg
n000023\0198_01.jpg
n000023\0207_03.jpg
n000023\0269_02.jpg
n000023\0265_01.jpg
n000023\0280_01.jpg
n000023\0366_01.jpg
n000023\0389_01.jpg
n000024\0062_01.jpg
n000024\0073_01.jpg
n000024\0354_04.jpg
n000024\0409_01.jpg
n000025\0100_02.jpg
n000025\0274_02.jpg
n000026\0038_01.jpg
n000026\0041_01.jpg
n000026\0059_01.jpg
n000026\0062_01.jpg
n000026\0065_01.jpg
n000026\0082_02.jpg
n000026\0103_01.jpg
n000026\0137_01.jpg
n000026\0060_01.jpg
n000026\0179_03.jpg
n000026\0196_01.jpg
n000026\0248_01.jpg
n000026\0255_01.jpg
n000026\0273_01.jpg
n000026\0280_01.jpg
n000027\0023_02.jpg
n000027\0023_05.jpg
n000027\0115_01.jpg
n000027\0157_02.jpg
n000027\0171_01.jpg
n000027\0182_02.jpg
n000027\0211_02.jpg
n000027\0255_01.jpg
n000027\0274_04.jpg
n000027\0318_04.jpg
n000027\0326_01.jpg
n000027\0401_01.jpg
n000027\0402_01.jpg
n000027\0438_01.jpg
n000027\0442_01.jpg
n000027\0493_01.jpg
n000028\0040_04.jpg
n000028\0056_01.jpg
n000028\0134_01.jpg
n000028\0136_03.jpg
n000028\0138_01.jpg
n000028\0144_02.jpg
n000028\0156_01.jpg
n000028\0162_01.jpg
n000028\0168_01.jpg
n000028\0205_01.jpg
n000028\0220_01.jpg
n000028\0249_01.jpg
n000028\0300_01.jpg
n000028\0324_02.jpg
n000028\0343_01.jpg
n000028\0352_01.jpg
n000028\0384_01.jpg
n000028\0392_01.jpg
n000028\0408_02.jpg
n000028\0412_02.jpg
n000030\0112_01.jpg
n000030\0119_01.jpg
n000030\0156_01.jpg
n000030\0192_01.jpg
n000030\0195_01.jpg
n000030\0203_01.jpg
n000030\0218_02.jpg
n000030\0305_01.jpg
n000031\0025_01.jpg
n000031\0080_02.jpg
n000031\0141_01.jpg
n000031\0196_01.jpg
n000031\0215_01.jpg
n000031\0286_02.jpg
n000032\0085_01.jpg
n000032\0100_01.jpg
n000032\0100_02.jpg
n000032\0233_01.jpg
n000032\0261_01.jpg
n000032\0350_01.jpg
n000032\0374_01.jpg
n000032\0393_02.jpg
n000032\0428_01.jpg
n000032\0443_01.jpg
n000032\0459_01.jpg
n000032\0465_02.jpg
n000033\0031_01.jpg
n000033\0032_02.jpg
n000033\0034_01.jpg
n000033\0034_02.jpg
n000033\0080_01.jpg
n000033\0100_01.jpg
n000033\0100_02.jpg
n000033\0122_01.jpg
n000033\0164_02.jpg
n000033\0166_01.jpg
n000033\0250_02.jpg
n000033\0327_01.jpg
n000033\0337_01.jpg
n000034\0327_01.jpg
n000035\0072_02.jpg
n000035\0099_01.jpg
n000035\0132_03.jpg
n000035\0134_01.jpg
n000035\0150_01.jpg
n000035\0158_01.jpg
n000035\0159_02.jpg
n000035\0167_01.jpg
n000035\0170_01.jpg
n000035\0200_01.jpg
+17
View File
@@ -0,0 +1,17 @@
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
#############################################################
# File: test_arcface.py
# Created Date: Thursday March 17th 2022
# Author: Chen Xuanhong
# Email: chenxuanhongzju@outlook.com
# Last Modified: Thursday, 17th March 2022 12:34:57 am
# Modified By: Chen Xuanhong
# Copyright (c) 2022 Shanghai Jiao Tong University
#############################################################
import torch
if __name__ == "__main__":
arcface1 = torch.load("./arcface_ckpt/arcface_checkpoint.tar", map_location=torch.device("cpu"))
print(arcface1)
arcface = arcface1['model'].module
+9 -3
View File
@@ -5,7 +5,7 @@
# Created Date: Sunday January 9th 2022
# Author: Chen Xuanhong
# Email: chenxuanhongzju@outlook.com
# Last Modified: Tuesday, 15th February 2022 12:00:24 am
# Last Modified: Thursday, 17th March 2022 1:01:52 am
# Modified By: Chen Xuanhong
# Copyright (c) 2022 Shanghai Jiao Tong University
#############################################################
@@ -26,6 +26,8 @@ from torch_utils import training_stats
from torch_utils.ops import conv2d_gradfix
from torch_utils.ops import grid_sample_gradfix
from arcface_torch.backbones.iresnet import iresnet100
from utilities.plot import plot_batch
from losses.cos import cosin_metric
from train_scripts.trainer_multigpu_base import TrainerBase
@@ -95,8 +97,12 @@ def init_framework(config, reporter, device, rank):
reporter.writeInfo("Discriminator structure:")
reporter.writeModel(dis.__str__())
arcface1 = torch.load(config["arcface_ckpt"], map_location=torch.device("cpu"))
arcface = arcface1['model'].module
# arcface1 = torch.load(config["arcface_ckpt"], map_location=torch.device("cpu"))
# arcface = arcface1['model'].module
arcface = iresnet100(pretrained=False, fp16=False)
arcface.load_state_dict(torch.load(config["arcface_ckpt"], map_location='cpu'))
arcface.eval()
# train in GPU
File diff suppressed because it is too large Load Diff