Exploring 3D Face Reconstruction and Fusion Methods for Face Verification: A Case-Study in Video Surveillance

Simone Maurizio La Cava, Sara Concas, Ruben Tolosana, Roberto Casula, Giulia Orrù, Martin Drahansky, Julian Fierrez, Gian Luca Marcialis

TL;DR

This work tackles the challenge of face verification in surveillance by leveraging multiple 3D face reconstruction (3DFR) algorithms to generate diverse templates from 2D data. It evaluates three state-of-the-art 3DFR methods (EOS, 3DDFA v2, and NextFace), each paired with two Siamese networks (VGG19 and Xception), and combines their outputs via score-level fusion, including the a posteriori probability $P(\text{match}|X,Y) = 1/(d+1)$. The experiments on the SCFace dataset show that while individual 3DFR-driven systems improve verification, averaging their scores yields the strongest performance in intra-settings and remains advantageous in cross-settings, where results generally degrade due to domain shifts. The findings support the viability of multi-3DFR fusion for robust face verification in challenging surveillance scenarios and motivate further exploration of additional fusion strategies and more 3DFR methods to enhance generalization.
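The score mapping and average fusion mentioned above can be illustrated with a minimal sketch. It assumes that $d$ is a non-negative distance between the two embeddings produced by a Siamese network, so that $1/(d+1)$ lies in $(0, 1]$. The distance values, the decision threshold, and the function names are hypothetical and are not taken from the paper.

```python
from statistics import mean
from typing import Mapping


def distance_to_score(d: float) -> float:
    """Map a non-negative distance d to a match score 1 / (d + 1)."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (d + 1.0)


def fuse_by_average(distances: Mapping[str, float]) -> float:
    """Score-level fusion: average the match scores of the individual systems."""
    return mean(distance_to_score(d) for d in distances.values())


# Hypothetical distances, one per 3DFR-driven verification system.
distances = {"EOS": 0.25, "3DDFA v2": 1.0, "NextFace": 3.0}

fused = fuse_by_average(distances)  # mean of 0.8, 0.5, and 0.25

THRESHOLD = 0.5  # hypothetical operating point, not a value from the paper
is_match = fused >= THRESHOLD
print(f"fused score = {fused:.3f}, match = {is_match}")
```

Averaging is only one possible fusion rule. Any other rule would replace `fuse_by_average` while keeping the same per-system scores as input.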

Abstract

3D face reconstruction (3DFR) algorithms are based on specific assumptions tailored to distinct application scenarios. These assumptions limit their use when acquisition conditions, such as the subject's distance from the camera or the camera's characteristics, differ from those expected, as typically happens in video surveillance. Additionally, 3DFR algorithms follow various strategies to reconstruct a 3D shape from 2D data, such as statistical model fitting, photometric stereo, or deep learning. In the present study, we explore the application of three 3DFR algorithms representative of the state of the art (SOTA), employing each one as the template set generator for a face verification system. The scores provided by each system are combined by score-level fusion. We show that the complementarity induced by different 3DFR algorithms improves performance when tests are conducted at previously unseen distances from the camera and with previously unseen camera characteristics (cross-distance and cross-camera settings), thus encouraging further investigation of multiple 3DFR-based approaches.

Paper Structure (15 sections, 4 equations, 8 figures, 2 tables, 1 algorithm)

Figures (8)

  • Figure 1: Example of 3D face reconstruction from a single 2D image using EOS [eos].
  • Figure 2: Proposed method. The synthetic view generation produces a 2D image from the 3D template (i.e., with various view angles obtained from gallery enlargement during the system's training, and only in frontal view during inference). The Siamese Neural Networks (same architecture) provide complementary information as they are enhanced through different 3DFR algorithms (EOS, 3DDFA v2, or NextFace). The example images are from the SCface database [SCFace].
  • Figure 3: Examples of personalized 3D templates generated from a mugshot in the SCface database [SCFace] (a), through EOS [eos] (b), 3DDFA v2 [guo2020towards] (c), and NextFace [dib2021practical] (d).
  • Figure 4: Example of gallery enlargement from a personalized 3D template obtained from a mugshot in the SCface database [SCFace] using the EOS [eos] 3DFR algorithm.
  • Figure 5: Example of the proposed face verification system using a Siamese architecture, taking as input the probe image and the facial representation in a non-frontal view obtained from a mugshot image through the EOS method [eos]. The images are related to a subject in the SCface database [SCFace]. EMB refers to the feature embeddings obtained from the backbone (i.e., VGG19 or XceptionNet). Only frontal views are used in the final inference stage.
  • ...and 3 more figures