
Learning Face Representation from Scratch

Dong Yi, Zhen Lei, Shengcai Liao, Stan Z. Li

TL;DR

The authors address the scarcity of public large-scale training data for face recognition by introducing CASIA-WebFace, a semi-automatically collected IMDb-based dataset with 10,575 subjects and 494,414 faces. They train an 11-layer CNN using a joint identification and verification loss to learn a compact, discriminative 320-dim face representation. Evaluations on LFW and YouTube Faces show strong performance: a single network achieves results competitive with, or superior to, some ensemble methods, and the BLUFR protocol highlights robustness at low false-alarm rates. This work provides a public benchmark to standardize evaluation and accelerate progress in face recognition in the wild.
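The TL;DR mentions a joint identification and verification loss. Below is a minimal sketch of one common way to combine the two terms, assuming a softmax cross-entropy identification term on each face and a contrastive verification term on the pair of feature vectors (in the style of DeepID2-type training). The exact formulation used by the authors is not given in this summary, so `weight`, `margin`, and all function names are hypothetical.

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Identification term: negative log-probability of the true identity."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def contrastive_loss(f1: Sequence[float], f2: Sequence[float],
                     same_identity: bool, margin: float = 1.0) -> float:
    """Verification term on two feature vectors (e.g. 320-dim face features).

    Same-identity pairs are penalized by their squared distance; different-identity
    pairs are penalized only when they are closer than `margin`.
    """
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))
    if same_identity:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def joint_loss(logits1: Sequence[float], label1: int, feat1: Sequence[float],
               logits2: Sequence[float], label2: int, feat2: Sequence[float],
               weight: float = 0.5, margin: float = 1.0) -> float:
    """Both identification terms plus a weighted verification term.

    `weight` and `margin` are hypothetical hyperparameters, not values from the paper.
    """
    identification = (softmax_cross_entropy(logits1, label1)
                      + softmax_cross_entropy(logits2, label2))
    verification = contrastive_loss(feat1, feat2, label1 == label2, margin)
    return identification + weight * verification
```

In real training these inputs would be network outputs and the objective would be minimized by backpropagation; this scalar version only illustrates how the identification signal (which identity?) and the verification signal (same or different?) are added into one objective.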

Abstract

Driven by big data and deep convolutional neural networks (CNNs), the performance of face recognition is becoming comparable to that of humans. Using private large-scale training datasets, several groups have achieved very high performance on LFW, i.e., 97% to 99%. While there are many open-source implementations of CNNs, no large-scale face dataset is publicly available. The current situation in the field of face recognition is that data matters more than algorithms. To address this problem, this paper proposes a semi-automatic way to collect face images from the Internet and builds a large-scale dataset containing about 10,000 subjects and 500,000 images, called CASIA-WebFace. Based on this database, we use an 11-layer CNN to learn a discriminative representation and obtain state-of-the-art accuracy on LFW and YTF. The publication of CASIA-WebFace will attract more research groups to this field and accelerate the development of face recognition in the wild.
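The abstract reports LFW accuracy, while the TL;DR notes that the BLUFR protocol emphasizes performance at low false-alarm rates. Accuracy on balanced pairs can hide how a system behaves when very few false accepts are tolerated; a low-FAR metric instead fixes the fraction of impostor pairs allowed through and reports the true accept rate (TAR) at that operating point. A minimal sketch of TAR at a given FAR from genuine and impostor similarity scores follows; it does not reproduce BLUFR's specific splits or FAR points, and the function name and scores are illustrative.

```python
from typing import Sequence, Tuple


def tar_at_far(genuine: Sequence[float], impostor: Sequence[float],
               far: float) -> Tuple[float, float]:
    """Return (threshold, TAR), accepting a pair when its score > threshold.

    The threshold is chosen so that at most a fraction `far` of impostor pairs
    is accepted; higher scores mean more similar faces.
    """
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")
    ranked = sorted(impostor, reverse=True)
    k = int(far * len(ranked))  # max number of impostor pairs allowed through
    # Accepting only scores strictly above ranked[k] admits at most k impostors.
    threshold = ranked[k] if k < len(ranked) else float("-inf")
    tar = sum(1 for s in genuine if s > threshold) / len(genuine)
    return threshold, tar
```

With n impostor pairs, a FAR below 1/n cannot be resolved (k rounds down to 0), which is one reason low-FAR protocols rely on large numbers of impostor pairs.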

Paper Structure

This paper contains 17 sections, 4 figures, 5 tables.

Figures (4)

  • Figure 1: A sample page of David Fincher on IMDb. The "main photo" is used as the initial seed, and the 58 photos in the "photo gallery" need to be annotated.
  • Figure 2: Two sample photos of Ben Affleck containing multiple faces. The name tags associated with each photo are shown at its bottom left. The left photo contains 3 faces and is tagged with 3 names, but 2 of the faces are not detected (white rectangles). The right photo contains 3 faces but is tagged with only 2 names; the woman in it is not annotated (yellow rectangle).
  • Figure 3: The proposed baseline convolutional network with many recent tricks.
  • Figure 4: Face image alignment and augmentation. The red circles on the face are the two selected landmarks used for the similarity transformation.
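Figure 4 states that two landmarks drive a similarity transformation. With two point correspondences, the four parameters of a 2D similarity transform (rotation, uniform scale, and two translations) are determined exactly. A minimal sketch that solves for the transform with complex arithmetic follows; which two landmarks the authors use and their target coordinates are not given in the caption, so all coordinates and function names here are hypothetical. The 2x3 matrix is in the forward-mapping layout accepted by OpenCV's `cv2.warpAffine`, which could then warp the image itself.

```python
import cmath
import math
from typing import List, Tuple

Point = Tuple[float, float]


def similarity_from_two_points(src1: Point, src2: Point,
                               dst1: Point, dst2: Point) -> Tuple[complex, complex]:
    """Solve z -> a*z + b so that src1 -> dst1 and src2 -> dst2.

    Points are treated as complex numbers: `a` encodes rotation and uniform
    scale, `b` encodes translation.
    """
    s1, s2 = complex(*src1), complex(*src2)
    d1, d2 = complex(*dst1), complex(*dst2)
    if s1 == s2:
        raise ValueError("source landmarks must be distinct")
    a = (d2 - d1) / (s2 - s1)
    b = d1 - a * s1
    return a, b


def to_affine_matrix(a: complex, b: complex) -> List[List[float]]:
    """2x3 matrix [[p, -q, tx], [q, p, ty]] for a = p + qi and b = tx + ty*i."""
    return [[a.real, -a.imag, b.real],
            [a.imag, a.real, b.imag]]


def apply_similarity(a: complex, b: complex, point: Point) -> Point:
    """Map a single (x, y) point through the transform."""
    z = a * complex(*point) + b
    return (z.real, z.imag)
```

For example, mapping hypothetical landmarks at (30, 40) and (70, 40) to (30, 50) and (70, 50) yields a pure translation by (0, 10); `abs(a)` gives the scale and `math.degrees(cmath.phase(a))` the rotation angle.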