Sample and Computation Redistribution for Efficient Face Detection

Jia Guo, Jiankang Deng, Alexandros Lattas, Stefanos Zafeiriou

TL;DR

The paper tackles efficient, high-accuracy face detection at VGA resolution by identifying training-data sampling and computation distribution as the key levers. It introduces Sample Redistribution (SR), which augments training samples for small faces, and Computation Redistribution (CR), which reallocates FLOPs across the backbone, neck, and head via a two-step search; together they yield the SCRFD family. On WIDER FACE, SCRFD-34GF outperforms TinaFace by 3.86% AP on the hard set while running more than 3 times faster on GPUs with VGA-resolution images, and the benefits of SR and CR persist across low- and high-compute regimes. The work also provides implementation details and releases the code to facilitate further research in efficient, scale-aware face detection.

Abstract

Although tremendous strides have been made in uncontrolled face detection, efficient face detection with a low computation cost as well as high precision remains an open challenge. In this paper, we point out that training data sampling and computation distribution strategies are the keys to efficient and accurate face detection. Motivated by these observations, we introduce two simple but effective methods: (1) Sample Redistribution (SR), which augments training samples for the most needed stages, based on the statistics of benchmark datasets; and (2) Computation Redistribution (CR), which reallocates the computation between the backbone, neck and head of the model, based on a meticulously defined search methodology. Extensive experiments conducted on WIDER FACE demonstrate the state-of-the-art efficiency-accuracy trade-off for the proposed SCRFD family across a wide range of compute regimes. In particular, SCRFD-34GF outperforms the best competitor, TinaFace, by $3.86\%$ (AP at hard set) while being more than 3$\times$ faster on GPUs with VGA-resolution images. We also release our code to facilitate future research.

Paper Structure

This paper contains 11 sections, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Performance-computation trade-off on the WIDER FACE validation hard set for different face detectors. FLOPs and APs are reported at VGA resolution ($640\times480$) during testing. The proposed SCRFD outperforms a range of state-of-the-art open-sourced methods while using far fewer FLOPs.
  • Figure 2: (a) Precision-recall curves of TinaFace-ResNet50 on the WIDER FACE hard validation subset, under different testing scales. (b) Computation distribution of TinaFace on backbone, neck and head with $640\times480$ as the testing scale.
  • Figure 3: Cumulative face scale distribution on the WIDER FACE validation dataset (Easy $\subset$ Medium $\subset$ Hard). When the long edge is fixed at $640$ pixels, most of the easy faces are larger than $32 \times 32$, and most of the medium faces are larger than $16 \times 16$. For the hard track, 78.93% of faces are smaller than $32 \times 32$, 51.85% are smaller than $16 \times 16$, and 13.36% are smaller than $8 \times 8$.
  • Figure 4: Ground-truth and positive anchor distribution within one epoch. The baseline method employs a random size from the set $[0.3, 1.0]$, while our method uses a random size from the set $[0.3, 2.0]$. The number of small faces ($<32\times32$) significantly increases after the large cropping strategy is used.
  • Figure 5: Computation redistribution on the backbone (stem, C2, C3, C4 and C5) with fixed neck and head under the constraint of 2.5 GFLOPs. For each component within the backbone, the range of computation ratios in which the best models may fall is estimated by the empirical bootstrap.
  • ...and 5 more figures
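
The effect described in the Figure 4 caption, where widening the crop-size range from $[0.3, 1.0]$ to $[0.3, 2.0]$ produces more small faces, can be illustrated with a rough sketch. Several details here are assumptions rather than facts taken from the captions: the crop side is taken as a scale factor times the image's short edge, the scale is drawn uniformly from the range, every crop is resized to a 640-pixel square, and all faces are counted regardless of whether they fall inside the crop. The function names are hypothetical.

```python
import random

TARGET_SIDE = 640  # assumed side length (pixels) of the resized training crop


def sample_crop_side(short_edge, scale_range, rng):
    """Draw a square crop side as a random fraction of the image's short edge.

    Assumption: the scale is drawn uniformly from [lo, hi].
    """
    lo, hi = scale_range
    return rng.uniform(lo, hi) * short_edge


def face_side_after_resize(face_side, crop_side, target=TARGET_SIDE):
    """Face side in pixels once the crop is resized to target x target."""
    return face_side * target / crop_side


def small_face_fraction(face_sides, short_edge, scale_range,
                        trials=2000, threshold=32, seed=0):
    """Fraction of (crop, face) pairs whose resized face is below threshold.

    Simplification: every face is counted for every crop, i.e. we ignore
    whether the face actually lies inside the sampled crop window.
    """
    rng = random.Random(seed)
    small = 0
    total = 0
    for _ in range(trials):
        crop_side = sample_crop_side(short_edge, scale_range, rng)
        for face_side in face_sides:
            total += 1
            if face_side_after_resize(face_side, crop_side) < threshold:
                small += 1
    return small / total


if __name__ == "__main__":
    faces = [40, 80, 160]   # hypothetical face sides in the original image
    short_edge = 768        # hypothetical short edge of the original image
    baseline = small_face_fraction(faces, short_edge, (0.3, 1.0))
    enlarged = small_face_fraction(faces, short_edge, (0.3, 2.0))
    print(f"baseline [0.3, 1.0]: {baseline:.3f}")
    print(f"enlarged [0.3, 2.0]: {enlarged:.3f}")
```

Crops larger than the image shrink every face once the crop is resized to the fixed training size, which is why the wider range shifts more ground-truth boxes below $32\times32$ in this toy setting.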