
VGGFace2: A dataset for recognising faces across pose and age

Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi, Andrew Zisserman

TL;DR

This work introduces VGGFace2, a large-scale face dataset designed to capture substantial pose and age variation with low label noise. It details a six-stage collection and filtering pipeline to assemble 9,131 identities and 3.31 million images, plus template-based pose and age annotations for evaluation. Experiments demonstrate that CNNs trained on VGGFace2 achieve state-of-the-art results on the IJB-A/B/C benchmarks, with additional gains when pretraining on MS-Celeb-1M is used. The dataset and trained models are publicly available, enabling broader assessment of pose- and age-robust face recognition.

Abstract

In this paper, we introduce a new large-scale face dataset named VGGFace2. The dataset contains 3.31 million images of 9131 subjects, with an average of 362.6 images for each subject. Images are downloaded from Google Image Search and have large variations in pose, age, illumination, ethnicity and profession (e.g. actors, athletes, politicians). The dataset was collected with three goals in mind: (i) to have both a large number of identities and also a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimize the label noise. We describe how the dataset was collected, in particular the automated and manual filtering stages that ensure high labelling accuracy for the images of each identity. To assess face recognition performance using the new dataset, we train ResNet-50 Convolutional Neural Networks (with and without Squeeze-and-Excitation blocks) on VGGFace2, on MS-Celeb-1M, and on their union, and show that training on VGGFace2 leads to improved recognition performance across pose and age. Finally, using the models trained on these datasets, we demonstrate state-of-the-art performance on all the IARPA Janus face recognition benchmarks, i.e. IJB-A, IJB-B and IJB-C, exceeding the previous state-of-the-art by a large margin. Datasets and models are publicly available.
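As a quick consistency check on the headline counts, the snippet below recomputes the mean number of images per subject from the stated totals, and the total from the stated mean. The small gap between 362.5 and the quoted 362.6 is what one would expect if "3.31 million" is a rounded figure; no unrounded image count is assumed here.

```python
# Headline counts quoted in the abstract.
num_images = 3.31e6   # "3.31 million images" (rounded to three significant figures)
num_subjects = 9131   # "9131 subjects"

# Mean images per identity implied by the rounded totals.
avg_per_subject = num_images / num_subjects

# Total image count implied by the stated mean of 362.6 images per subject.
implied_total = 362.6 * num_subjects

print(f"mean from totals: {avg_per_subject:.1f}")              # 362.5
print(f"total from mean:  {implied_total / 1e6:.2f} million")  # 3.31
```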

Paper Structure

This paper contains 20 sections, 11 figures, 8 tables.

Figures (11)

  • Figure 1: (a-b) VGGFace2 poses and ages statistics. (c-j) example images for eight subjects with different ethnicities.
  • Figure 2: VGGFace2 template examples. Left: pose templates from three different viewpoints (arranged by row) -- frontal, three-quarter, profile. Right: age templates for two subjects for young and mature ages (arranged by row).
  • Figure 3: Histograms of similarity scores for front-to-profile matching for the models trained on different datasets.
  • Figure 4: Two example templates of front-to-profile matching. Left: the similarity scores produced by VGGFace, MS1M, VGGFace2 are $0.41$, $0.35$ and $0.59$, respectively; Right: the scores are $0.41$, $0.31$ and $0.57$, respectively.
  • Figure 5: Histograms of similarity scores for young-to-mature matching for the models trained on different datasets.
  • ...and 6 more figures
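Figures 3 to 5 report similarity scores between templates, i.e. small sets of face images of one subject (frontal vs. profile, young vs. mature). The captions do not say how a template-level score is formed, so the sketch below assumes a common choice: L2-normalise each face descriptor, average the descriptors within a template, re-normalise, and take the cosine similarity. A simple fixed-bin histogram, of the kind plotted in Figures 3 and 5, is included. The helper names `template_descriptor`, `template_similarity` and `histogram` are hypothetical, and the toy 3-d vectors merely stand in for CNN face features.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def template_descriptor(face_features: Sequence[Sequence[float]]) -> Vector:
    """Assumed aggregation: mean of the unit-length per-face features,
    re-normalised so the template descriptor is also unit length."""
    normed = [l2_normalize(f) for f in face_features]
    dim = len(normed[0])
    mean = [sum(f[i] for f in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)


def template_similarity(a: Sequence[Sequence[float]],
                        b: Sequence[Sequence[float]]) -> float:
    """Cosine similarity of two templates (dot product of unit vectors)."""
    ta, tb = template_descriptor(a), template_descriptor(b)
    return sum(x * y for x, y in zip(ta, tb))


def histogram(scores: Sequence[float], bins: int = 10,
              lo: float = -1.0, hi: float = 1.0) -> List[int]:
    """Count scores in equal-width bins over [lo, hi]; out-of-range values
    are clamped into the first or last bin."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for s in scores:
        idx = int((s - lo) / width)
        counts[max(0, min(idx, bins - 1))] += 1
    return counts


# Toy example: one "frontal" and one "profile" template of two faces each.
frontal = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
profile = [[0.6, 0.8, 0.0], [0.5, 0.7, 0.1]]
score = template_similarity(frontal, profile)
print(f"front-to-profile similarity: {score:.3f}")
```

Averaging unit-length features before comparison keeps each face's contribution equal regardless of its raw feature magnitude; other aggregation schemes (e.g. quality-weighted pooling) would plug into `template_descriptor` without changing the comparison step.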