FDLite: A Single Stage Lightweight Face Detector Network

Yogesh Aggarwal, Prithwijit Guha

TL;DR

The novelty of this work lies in designing a lightweight face detector that is trained with only commonly used loss functions and learning strategies; the proposed detector largely follows the established RetinaFace architecture.

Abstract

Face detection is frequently performed using heavy pre-trained backbone networks such as ResNet-50/101/152 and VGG16/19. A few recent works have also proposed lightweight detectors with customized backbones, novel loss functions, and efficient training strategies. The novelty of this work lies in the design of a lightweight detector trained with only commonly used loss functions and learning strategies. The proposed face detector largely follows the established RetinaFace architecture. The first contribution of this work is a customized lightweight backbone network (BLite) with 0.167M parameters and 0.52 GFLOPs. The second contribution is the use of two independent multi-task losses. The proposed lightweight face detector (FDLite) has 0.26M parameters and 0.94 GFLOPs. The network is trained on the WIDER FACE dataset. FDLite achieves 92.3%, 89.8%, and 82.2% Average Precision (AP) on the easy, medium, and hard subsets of the WIDER FACE validation set, respectively.
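The abstract says FDLite is trained with two independent multi-task losses built from commonly used loss functions, but the exact terms are not given in this summary. As a hedged illustration of what a RetinaFace-style multi-task loss looks like, the sketch below combines a two-way softmax cross-entropy (face vs. background) on every anchor with smooth L1 terms for bounding-box regression (4 values) and landmark regression (10 values, i.e. five (x, y) points) on positive anchors only. The function names, the weights `lam_box=0.25` and `lam_pts=0.1`, and the plain per-anchor averaging are assumptions for illustration, not FDLite's published configuration; plain Python is used so the structure of the loss stays visible without a deep-learning framework.

```python
import math
from typing import List, Sequence


def smooth_l1(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    """Smooth L1 loss summed over coordinates: quadratic below beta, linear above."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def softmax_ce(logits: Sequence[float], label: int) -> float:
    """Cross-entropy of a softmax over the logits (here: background=0, face=1)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def multitask_loss(
    cls_logits: List[Sequence[float]],
    labels: List[int],
    box_pred: List[Sequence[float]],
    box_tgt: List[Sequence[float]],
    pts_pred: List[Sequence[float]],
    pts_tgt: List[Sequence[float]],
    lam_box: float = 0.25,  # hypothetical weight for box regression
    lam_pts: float = 0.1,   # hypothetical weight for landmark regression
) -> float:
    """RetinaFace-style loss averaged over anchors (illustrative sketch only).

    Classification is applied to every anchor; the regression terms are
    added only for positive (face) anchors, whose label is 1.
    """
    total = 0.0
    for i, label in enumerate(labels):
        total += softmax_ce(cls_logits[i], label)
        if label == 1:
            total += lam_box * smooth_l1(box_pred[i], box_tgt[i])
            total += lam_pts * smooth_l1(pts_pred[i], pts_tgt[i])
    return total / len(labels)


if __name__ == "__main__":
    # Two anchors: one background, one face with a small box error.
    loss = multitask_loss(
        cls_logits=[[2.0, -1.0], [-0.5, 1.5]],
        labels=[0, 1],
        box_pred=[[0.0] * 4, [0.2, -0.1, 0.0, 0.3]],
        box_tgt=[[0.0] * 4, [0.0] * 4],
        pts_pred=[[0.0] * 10, [0.1] * 10],
        pts_tgt=[[0.0] * 10, [0.0] * 10],
    )
    print(f"multi-task loss: {loss:.4f}")
```

A training implementation would vectorize these terms in a framework and commonly normalizes the regression terms by the number of positive anchors rather than by all anchors; the loop above only shows how the three tasks are combined.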

Paper Structure

This paper contains 9 sections, 2 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Face detection performance (Average Precision) of state-of-the-art models on the Hard subset of the WIDER FACE validation set, plotted against (a) floating-point operations (GFLOPs) and (b) model parameters in millions (M). Note the performance of the proposed face detector: $82.3\%$ AP with $0.24$M parameters and $0.94$ GFLOPs.
  • Figure 2: Illustrating the key components of the FDLite face detector's network architecture. Here, $a^i = 4\times2^i$, where $i \in \{1,2,3\}$.
  • Figure 3: Illustrating the architecture of the customized backbone BLite along with its component units ($CBL$, $CDw$, $FRU$, $CL$ and $MP$).
  • Figure 4: The architecture of the detector head, a sub-network featuring a convolution layer $C(1\times 1\times 32@3\times x;1,0,1)$, where $x$ (2, 4, or 10) sets the number of convolution filters for classification, bounding box regression, and landmark regression, respectively.
  • Figure 5: Qualitative results of the proposed face detector's performance under various challenging conditions on WIDER FACE dataset images.
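The arithmetic implied by the captions of Figures 2 and 4 can be checked with a few lines of Python. This sketch assumes that the head notation $C(1\times 1\times 32@3\times x;1,0,1)$ denotes a 1×1 convolution over 32 input channels with $3\times x$ filters and one bias per filter; the trailing (1,0,1) is not interpreted here, and the helper names `anchor_sizes`, `head_filters`, and `conv1x1_params` are hypothetical. Under these assumptions, $a^i = 4\times2^i$ gives anchor sizes 8, 16, and 32, and the classification, box, and landmark branches have 198, 396, and 990 parameters.

```python
# Hypothetical helpers checking the arithmetic in the Figure 2 and Figure 4 captions.
# Assumption: C(1x1x32 @ 3*x; 1,0,1) is a 1x1 conv with 32 input channels,
# 3*x output filters, and one bias per filter.


def anchor_sizes(levels=(1, 2, 3)):
    """Base anchor size a^i = 4 * 2**i for each level i (Figure 2)."""
    return [4 * 2 ** i for i in levels]


# Per-branch value of x taken from the Figure 4 caption.
HEAD_X = {"classification": 2, "bbox_regression": 4, "landmark_regression": 10}


def head_filters(x, factor=3):
    """Number of filters in a head branch: 3 * x."""
    return factor * x


def conv1x1_params(in_ch, out_ch, bias=True):
    """Weights of a 1x1 conv (in_ch * out_ch) plus optional per-filter biases."""
    return in_ch * out_ch + (out_ch if bias else 0)


if __name__ == "__main__":
    print("anchor sizes:", anchor_sizes())
    for name, x in HEAD_X.items():
        n = head_filters(x)
        print(f"{name}: {n} filters, {conv1x1_params(32, n)} parameters")
```

Even if such a head were repeated at each of the three anchor levels (not stated in this summary), it would account for only a small fraction of FDLite's 0.26M parameters, which is consistent with most of the budget sitting in the BLite backbone (0.167M).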