Confidence-Aware RGB-D Face Recognition via Virtual Depth Synthesis

Zijian Chen, Mei Wang, Weihong Deng, Hongzhi Shi, Dongchao Wen, Yingjie Zhang, Xingchen Cui, Jian Zhao

TL;DR

The paper tackles RGB-D face recognition under real-world constraints where paired RGB-D data is scarce. It introduces a domain-independent pre-training framework with separate RGB and depth branches: the depth branch is trained on a large virtual depth dataset generated from a 3D Morphable Model, and the two branches are combined by an Adaptive Confidence Weighting (ACW) mechanism that performs score-level fusion. Key contributions include the virtual-depth pre-training strategy, a lightweight plug-in ACW module for modality fusion, and state-of-the-art performance on Lock3DFace together with competitive results on IIIT-D and Bosphorus, including robustness in challenging scenarios. The approach reduces the dependence on paired data, avoids cross-modal fine-tuning, and makes RGB-D recognition more practical in noisy, occluded, or low-quality conditions.

Abstract

2D face recognition encounters challenges in unconstrained environments due to varying illumination, occlusion, and pose. Recent studies focus on RGB-D face recognition to improve robustness by incorporating depth information. However, collecting sufficient paired RGB-D training data is expensive and time-consuming, hindering wide deployment. In this work, we first construct a diverse depth dataset generated by 3D Morphable Models for depth model pre-training. Then, we propose a domain-independent pre-training framework that utilizes readily available pre-trained RGB and depth models to separately perform face recognition without needing additional paired data for retraining. To seamlessly integrate the two distinct networks and harness the complementary benefits of RGB and depth information for improved accuracy, we propose an innovative Adaptive Confidence Weighting (ACW). This mechanism is designed to learn confidence estimates for each modality to achieve modality fusion at the score level. Our method is simple and lightweight, only requiring ACW training beyond the backbone models. Experiments on multiple public RGB-D face recognition benchmarks demonstrate state-of-the-art performance surpassing previous methods based on depth estimation and feature fusion, validating the efficacy of our approach.
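
To make the idea of score-level fusion concrete, here is a minimal sketch of confidence-weighted fusion of two per-modality similarity scores. The abstract does not give the exact fusion rule, so the normalised weighted average below is an assumption, not the authors' formula. The function names `cosine_similarity` and `fuse_scores` and the `eps` parameter are hypothetical. The symbols `c_r`, `c_d` (confidences) and `S` (final score) follow the notation of the Figure 2 caption. How per-image confidences are combined for an image pair is not specified here, so the sketch takes one confidence per modality as given.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def fuse_scores(s_r, s_d, c_r, c_d, eps=1e-8):
    """Fuse RGB and depth similarity scores into a single score S.

    ASSUMPTION: a confidence-normalised weighted average. The source
    only says the scores are "weighted and fused using the confidence".
    eps guards against division by zero when both confidences are 0.
    """
    return (c_r * s_r + c_d * s_d) / (c_r + c_d + eps)


# Toy features for a probe/gallery pair in each modality (made-up numbers).
s_r = cosine_similarity([1.0, 0.0], [1.0, 0.0])  # RGB branch agrees
s_d = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # depth branch disagrees

# A low depth confidence (e.g. a noisy depth map) pulls S towards the RGB score.
S = fuse_scores(s_r, s_d, c_r=0.9, c_d=0.1)
```

With this rule, a modality whose confidence is near zero contributes almost nothing to `S`, which matches the intended behaviour of down-weighting a low-quality input.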

Paper Structure

This paper contains 13 sections, 6 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: A comparison of our network and other networks.
  • Figure 2: Overview of our proposed method. During ACW training, the main backbones of both branches are frozen and the two branches are trained completely independently. During inference, features $f_r$ and $f_d$ are extracted from the RGB and depth images, respectively, and fed into an MLP to obtain confidence levels $c_r$ and $c_d$. The cosine similarities of the two modalities are then weighted by these confidence levels and fused to obtain the final score $S$.
  • Figure 3: Generated depth images showing variations in identity, expression, and pose. Each row represents samples with different identities, expressions, and poses, respectively.
  • Figure 4: Simulating challenging scenarios. (a) Gamma correction algorithm simulates low-light scenarios. (b) Scaling down the images simulates far-distance scenarios.
  • Figure 5: The confidence of the two modalities obtained by our method on the Lock3DFace dataset. It demonstrates a strong correlation between confidence and image quality.
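
The two degradations named in the Figure 4 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline. Images are plain nested lists of 8-bit grey values. The power-law form of gamma correction is the standard one. The nearest-neighbour subsampling in `downscale`, both function names, and all parameter values are hypothetical choices, since the caption does not say which gamma or interpolation was used.

```python
def gamma_correct(image, gamma):
    """Apply power-law gamma correction to an 8-bit greyscale image.

    With this convention, gamma > 1 darkens the image, which is how a
    low-light capture can be simulated. The gamma value is an assumption.
    """
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in image]


def downscale(image, factor):
    """Keep every `factor`-th pixel in both directions.

    ASSUMPTION: nearest-neighbour subsampling stands in for whatever
    resizing the authors used to simulate a far-away face.
    """
    return [row[::factor] for row in image[::factor]]


# A tiny 4x4 test image (made-up values).
image = [
    [0, 64, 128, 255],
    [0, 64, 128, 255],
    [255, 128, 64, 0],
    [255, 128, 64, 0],
]

dark = gamma_correct(image, gamma=2.0)   # simulated low-light version
small = downscale(image, factor=2)       # simulated far-distance version
```

Black and white pixels are fixed points of gamma correction, so only the mid-tones get darker.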