RetinaFace: Single-stage Dense Face Localisation in the Wild

Jiankang Deng; Jia Guo; Yuxiang Zhou; Jinke Yu; Irene Kotsia; Stefanos Zafeiriou

TL;DR

RetinaFace introduces a single-stage, pixel-wise dense face localisation framework that combines face detection, five-point landmark regression, and a self-supervised dense 3D face branch under a single multi-task loss. By leveraging extra landmark annotations and a mesh-decoder-based dense regression branch with graph convolutions and differentiable rendering, it achieves state-of-the-art results on the challenging WIDER FACE Hard subset (AP 91.4%) and, when used with ArcFace, improves face verification performance (IJB-C TAR 89.59% at FAR 1e-6). The approach remains efficient enough for real-time CPU inference with lightweight backbones. The authors also provide extensive ablations showing the benefits of landmark and dense regression, and publicly release data and code. These contributions advance robust, scalable face localisation in the wild and its downstream recognition tasks.
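
As a rough sketch of what a multi-task loss of this kind looks like, the per-anchor objective can be written as a weighted sum of a classification term and three regression terms that are only active for positive anchors. The symbols here are assumptions introduced for illustration and do not appear in this summary: p_i is the predicted face probability for anchor i, p_i^* is its 0/1 ground-truth label, t_i and l_i are the predicted box and five-landmark coordinates (starred versions are targets), L_pixel is the self-supervised dense regression term, and lambda_1 to lambda_3 are balancing weights.

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{cls}}(p_i, p_i^{*})
  + \lambda_1\, p_i^{*}\, \mathcal{L}_{\mathrm{box}}(t_i, t_i^{*})
  + \lambda_2\, p_i^{*}\, \mathcal{L}_{\mathrm{pts}}(l_i, l_i^{*})
  + \lambda_3\, p_i^{*}\, \mathcal{L}_{\mathrm{pixel}}
```

Multiplying the regression terms by p_i^* means that box, landmark, and dense supervision contribute only for anchors matched to a face, while the classification term is evaluated for every anchor.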

Abstract

Though tremendous strides have been made in uncontrolled face detection, accurate and efficient face localisation in the wild remains an open challenge. This paper presents a robust single-stage face detector, named RetinaFace, which performs pixel-wise face localisation on various scales of faces by taking advantage of joint extra-supervised and self-supervised multi-task learning. Specifically, we make contributions in the following five aspects: (1) We manually annotate five facial landmarks on the WIDER FACE dataset and observe significant improvement in hard face detection with the assistance of this extra supervision signal. (2) We further add a self-supervised mesh decoder branch for predicting pixel-wise 3D face shape information in parallel with the existing supervised branches. (3) On the WIDER FACE hard test set, RetinaFace outperforms the state-of-the-art average precision (AP) by 1.1% (achieving an AP of 91.4%). (4) On the IJB-C test set, RetinaFace enables state-of-the-art methods (ArcFace) to improve their results in face verification (TAR=89.59% at FAR=1e-6). (5) By employing light-weight backbone networks, RetinaFace can run in real time on a single CPU core for a VGA-resolution image. Extra annotations and code have been made available at: https://github.com/deepinsight/insightface/tree/master/RetinaFace.
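
The verification figure in contribution (4) is a true accept rate (TAR) at a fixed false accept rate (FAR). The following is a minimal sketch of how such a number can be computed from similarity scores; it is not the IJB-C evaluation protocol, and the function name `tar_at_far` and its arguments are hypothetical names chosen for illustration.

```python
import numpy as np


def tar_at_far(genuine_scores, impostor_scores, far):
    """Return (TAR, threshold) such that the empirical FAR is at most `far`.

    genuine_scores:  similarity scores of same-identity pairs
    impostor_scores: similarity scores of different-identity pairs
    A pair is accepted when its score is strictly above the threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    # Sort impostor scores from highest to lowest.
    impostor = np.sort(np.asarray(impostor_scores, dtype=float))[::-1]

    # Number of impostor pairs that may be accepted at this FAR.
    k = int(np.floor(far * impostor.size))

    # Using the (k+1)-th highest impostor score as a strict threshold
    # lets at most k impostor pairs through.
    threshold = impostor[k] if k < impostor.size else -np.inf

    tar = float(np.mean(genuine > threshold))
    return tar, threshold
```

With N impostor pairs the smallest non-zero FAR that can be measured is 1/N, so reporting TAR at FAR=1e-6 only makes sense when the benchmark supplies at least a million impostor comparisons. With fewer pairs, this sketch falls back to the highest impostor score as the threshold.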
