For every batch_size*16 samples, the model collects the samples with the highest error and trains on them again, so hard samples are trained on more often.
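
The selection step described above could be sketched roughly as follows. This is only an illustration under assumptions, not the actual implementation: the note does not say how many hard samples are collected per window (here assumed to be `batch_size`), and the names `hardest_indices`, `replay_schedule`, and `window_factor` are hypothetical.

```python
import heapq


def hardest_indices(errors, k):
    """Return the indices of the k largest errors, hardest first."""
    return heapq.nlargest(k, range(len(errors)), key=lambda i: errors[i])


def replay_schedule(errors, batch_size, window_factor=16):
    """Yield (window_start, hard_indices) for each full window of samples.

    errors: per-sample errors, in the order the samples were seen.
    A window spans batch_size * window_factor samples (16 per the note).
    Assumption: the batch_size hardest samples of each window are replayed.
    hard_indices are absolute positions in `errors`.
    """
    window = batch_size * window_factor
    # Only full windows are considered; a trailing partial window is skipped.
    for start in range(0, len(errors) - window + 1, window):
        chunk = errors[start:start + window]
        hard = [start + i for i in hardest_indices(chunk, batch_size)]
        yield start, hard


if __name__ == "__main__":
    # Toy errors: within each window of 32, later samples have larger error.
    toy_errors = [float(i % 32) for i in range(64)]
    for start, hard in replay_schedule(toy_errors, batch_size=2):
        print(start, hard)
```

In a real training loop the indices returned for each window would be used to build one extra batch that is fed to the model again before moving on to the next window.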