Exploring 3D Face Reconstruction and Fusion Methods for Face Verification: A Case-Study in Video Surveillance
Simone Maurizio La Cava, Sara Concas, Ruben Tolosana, Roberto Casula, Giulia Orrù, Martin Drahansky, Julian Fierrez, Gian Luca Marcialis
TL;DR
This work tackles the challenge of face verification in surveillance by leveraging multiple 3D face reconstruction (3DFR) algorithms to generate diverse templates from 2D data. It evaluates three state-of-the-art 3DFR methods (EOS, 3DDFA v2, and NextFace) paired with two Siamese networks (VGG19 and Xception) and combines their outputs via score-level fusion, including the a posteriori probability $P(\text{match} \mid X, Y) = 1/(d+1)$. The experiments on the SCFace dataset show that while individual 3DFR-driven systems improve verification, averaging their scores yields the strongest performance in intra-settings and remains advantageous in cross-settings, where results generally degrade due to domain shifts. The findings support the viability of multi-3DFR fusion for robust face verification in challenging surveillance scenarios and motivate further exploration of additional fusion strategies and more 3DFR methods to enhance generalization.
Abstract
3D face reconstruction (3DFR) algorithms are based on specific assumptions tailored to distinct application scenarios. These assumptions limit their use when acquisition conditions, such as the subject's distance from the camera or the camera's characteristics, differ from those expected, as typically happens in video surveillance. Additionally, 3DFR algorithms follow various strategies to reconstruct a 3D shape from 2D data, such as statistical model fitting, photometric stereo, or deep learning. In the present study, we explore the application of three 3DFR algorithms representative of the state of the art (SOTA), employing each one as the template set generator for a face verification system. The scores provided by each system are combined by score-level fusion. We show that the complementarity induced by different 3DFR algorithms improves performance when tests are conducted at previously unseen distances from the camera and with previously unseen camera characteristics (cross-distance and cross-camera settings), thus encouraging further investigation of multiple 3DFR-based approaches.
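
As a rough illustration of the score-level fusion mentioned above, the sketch below converts a distance $d$ into a match score with $1/(d+1)$ and averages the scores of three systems, one per 3DFR algorithm. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes $d$ is a non-negative distance between the two embeddings compared by a Siamese network, the function names `distance_to_score` and `fuse_by_average` are hypothetical, and the distance values are invented purely for the example.

```python
from statistics import fmean
from typing import Iterable


def distance_to_score(d: float) -> float:
    """Map a non-negative distance d to a match score in (0, 1] via 1 / (d + 1).

    Assumption: d is the distance between the probe and template embeddings.
    """
    if d < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (d + 1.0)


def fuse_by_average(distances: Iterable[float]) -> float:
    """Score-level fusion: average the match scores of the individual systems."""
    scores = [distance_to_score(d) for d in distances]
    if not scores:
        raise ValueError("at least one distance is required")
    return fmean(scores)


# Hypothetical distances for one probe/template pair, one per 3DFR-based system.
# These numbers are made up for illustration and are not results from the paper.
example_distances = {"EOS": 0.25, "3DDFA v2": 1.0, "NextFace": 3.0}

individual_scores = {name: distance_to_score(d) for name, d in example_distances.items()}
fused_score = fuse_by_average(example_distances.values())
```

A smaller distance gives a score closer to 1, so the fused value can be compared against a decision threshold in the same way as a single system's score. Averaging is only one possible fusion rule; others could be substituted in `fuse_by_average` without changing the score conversion.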
