SqueezeFacePoseNet: Lightweight Face Verification Across Different Poses for Mobile Platforms

Fernando Alonso-Fernandez, Javier Barrachina, Kevin Hernandez-Diaz, Josef Bigun

TL;DR

This work tackles cross-pose face verification on mobile devices by designing SqueezeFacePoseNet, a compact 4.41 MB network based on SqueezeNet that operates on 113×113 face images. Trained with a two-stage protocol (MS1M pretraining followed by VGGFace2 fine-tuning) and evaluated on the VGGFace2-Pose dataset, it demonstrates strong verification performance across pose variations, particularly when templates comprise five images. The study also explores further size reduction via depth-wise separable convolutions (down to 2.5 MB) and the impact of training data, showing that multi-image templates and training on both MS1M and VGGFace2 yield the best results for cross-pose recognition on mobile hardware. Overall, the results indicate that lightweight CNNs can achieve practical cross-pose face verification on resource-constrained devices, and they point to residual connections and region-focused features as directions for further gains.

Abstract

Virtual applications on mobile platforms are one of the most critical and fastest-growing fields in AI, where ubiquitous, real-time person authentication has become essential following the breakthrough of services provided via mobile devices. In this context, face verification technologies can provide reliable and robust user authentication, given the availability of cameras in these devices, as well as their widespread use in everyday applications. The rapid development of deep Convolutional Neural Networks has resulted in many accurate face verification architectures. However, their typical size (hundreds of megabytes) makes them infeasible to incorporate into downloadable mobile applications, where the entire file typically may not exceed 100 Mb. Accordingly, we address the challenge of developing a lightweight face recognition network of just a few megabytes that can operate with sufficient accuracy in comparison to much larger models. The network should also be able to operate under different poses, given the variability naturally observed in the uncontrolled environments where mobile devices are typically used. In this paper, we adapt the lightweight SqueezeNet model, of just 4.4 MB, to effectively provide cross-pose face recognition. After being trained on the MS-Celeb-1M and VGGFace2 databases, our model achieves an EER of 1.23% on the difficult frontal vs. profile comparison, and 0.54% on profile vs. profile images. Under less extreme variations involving a frontal image in either the enrolment or the query image, the EER is pushed down to <0.3%, and the FRR at FAR=0.1% to less than 1%. This makes our light model suitable for face recognition where at least the acquisition of the enrolment image can be controlled. At the cost of a slight degradation in performance, we also test an even lighter model (of just 2.5 MB) where regular convolutions are replaced with depth-wise separable convolutions.
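The abstract reports results as EER and as FRR at a fixed FAR. As a reading aid, the sketch below shows one common way to compute these quantities from lists of genuine and impostor comparison scores. It is a minimal illustration, not the authors' evaluation code: the function names, the assumption that scores are similarities (higher means more alike), and the threshold scan over observed scores are all choices made here for the example.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) at one threshold.

    Assumption: scores are similarities, and a comparison is
    accepted when score >= threshold.
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by scanning every observed score as a threshold
    and keeping the one where FAR and FRR are closest.
    Returns (eer, threshold)."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def frr_at_far(genuine, impostor, target_far):
    """Return (FRR, threshold) at the lowest threshold whose FAR does not
    exceed target_far. Thresholds are scanned in ascending order, so FAR
    only decreases and FRR only increases along the scan."""
    candidates = sorted(set(genuine) | set(impostor)) + [float("inf")]
    for t in candidates:
        far, frr = far_frr(genuine, impostor, t)
        if far <= target_far:
            return frr, t


# Toy scores, invented purely for illustration.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.5]
eer, eer_threshold = equal_error_rate(genuine_scores, impostor_scores)
frr, frr_threshold = frr_at_far(genuine_scores, impostor_scores, 0.0)
print(f"EER = {eer:.2%} at threshold {eer_threshold}")
print(f"FRR at FAR=0% is {frr:.2%} (threshold {frr_threshold})")
```

With real data, the two score lists would typically hold thousands of comparisons, and a target such as FAR=0.1% only becomes meaningful when there are enough impostor scores to resolve it.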

Paper Structure

This paper contains 9 sections, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Internal architecture of a fire module. In this example, the squeeze layer has three 1$\times$1 filters, and the expand layer has four 1$\times$1 and four 3$\times$3 filters. Adapted from [Iandola16SqueezeNet].
  • Figure 2: Example images of the databases employed.
  • Figure 3: Evaluation protocols: same-pose (left) and cross-pose comparisons (right).
  • Figure 4: SqueezeFacePoseNet: Face verification results (same-pose comparisons). Better in colour.
  • Figure 5: ResNet50ft and SENet50ft (same-pose comparisons). Better in colour.
  • ...and 2 more figures
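To make the fire module of Figure 1 and the depth-wise separable variant mentioned in the abstract more concrete, the following sketch counts the learnable parameters of a single fire module. It is a hedged illustration only: the helper names are hypothetical, the input channel count of 8 is an arbitrary assumption (the caption does not state it), and swapping only the 3×3 expand filters for a depth-wise separable pair is one plausible reading, not a statement of how the paper's 2.5 MB model is built.

```python
def conv_params(in_ch, out_ch, k, bias=True):
    """Parameters of a standard k x k convolution."""
    return in_ch * out_ch * k * k + (out_ch if bias else 0)


def depthwise_separable_params(in_ch, out_ch, k, bias=True):
    """Parameters of a depth-wise k x k convolution (one filter per input
    channel) followed by a 1 x 1 point-wise convolution."""
    depthwise = in_ch * k * k + (in_ch if bias else 0)
    pointwise = in_ch * out_ch + (out_ch if bias else 0)
    return depthwise + pointwise


def fire_params(in_ch, squeeze, expand1x1, expand3x3, separable=False):
    """Return (parameter count, output channels) of one fire module.

    The squeeze layer uses 1x1 filters; its output feeds two parallel
    expand branches (1x1 and 3x3) whose outputs are concatenated.
    With separable=True, only the 3x3 branch is replaced (an assumption).
    """
    total = conv_params(in_ch, squeeze, 1)
    total += conv_params(squeeze, expand1x1, 1)
    if separable:
        total += depthwise_separable_params(squeeze, expand3x3, 3)
    else:
        total += conv_params(squeeze, expand3x3, 3)
    return total, expand1x1 + expand3x3


# Filter counts from the Figure 1 caption: three 1x1 squeeze filters,
# four 1x1 and four 3x3 expand filters. in_ch=8 is a made-up value.
regular, out_ch = fire_params(in_ch=8, squeeze=3, expand1x1=4, expand3x3=4)
lighter, _ = fire_params(in_ch=8, squeeze=3, expand1x1=4, expand3x3=4,
                         separable=True)
print(f"regular fire module: {regular} parameters, {out_ch} output channels")
print(f"separable 3x3 branch: {lighter} parameters")
```

The saving comes almost entirely from the 3×3 branch, which dominates the parameter count of the module even in this tiny example.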