Masked Face Recognition with Generative-to-Discriminative Representations

Shiming Ge, Weijia Guo, Chenyu Li, Junzheng Zhang, Yong Li, Dan Zeng

TL;DR

Masked face recognition is challenged by occlusions that degrade information needed for identity prediction. The authors propose a generative-to-discriminative framework (G2D) that cascades a generative encoder for mask-robust representations with a discriminative reformer to recover identity cues, followed by a lightweight classifier head. The backbone is learned via greedy, module-wise pretraining on synthetic masked faces using self-supervised losses and relational distillation from a pretrained teacher, yielding occlusion-robust and identity-discriminative representations. Across synthetic LFW and realistic RMFD/MLFW benchmarks, G2D achieves state-of-the-art performance while maintaining competitive normal-face recognition and practical inference speed, highlighting its potential for real-world safety and governance applications.

Abstract

Masked face recognition is important for social good but challenged by diverse occlusions that cause insufficient or inaccurate representations. In this work, we propose a unified deep network to learn generative-to-discriminative representations for facilitating masked face recognition. To this end, we split the network into three modules and learn them on synthetic masked faces in a greedy module-wise pretraining manner. First, we leverage a generative encoder pretrained for face inpainting and finetune it to encode masked faces as category-aware descriptors. Owing to the generative encoder's ability to recover context information, the resulting descriptors provide occlusion-robust representations for masked faces, mitigating the effect of diverse masks. Then, we incorporate a multi-layer convolutional network as a discriminative reformer and learn it to convert the category-aware descriptors into identity-aware vectors, where the learning is supervised by distilling relation knowledge from an off-the-shelf face recognition model. In this way, the discriminative reformer together with the generative encoder serves as the pretrained backbone, providing general and discriminative representations for masked faces. Finally, we cascade one fully-connected layer followed by one softmax layer into a feature classifier and finetune it to identify the reformed identity-aware vectors. Extensive experiments on synthetic and realistic datasets demonstrate the effectiveness of our approach in recognizing masked faces.

Paper Structure

This paper contains 15 sections, 9 equations, 9 figures, and 4 tables.

Figures (9)

  • Figure 1: Our approach learns generative-to-discriminative representations for masked face recognition. It combines the advantages of generative and discriminative representations, providing a general and robust solution that recovers missing clues and captures identity-related characteristics.
  • Figure 2: The framework of the proposed approach. It cascades three modules into a unified network and learns generative-to-discriminative representations on synthetic masked faces in a progressive manner. The approach first initializes a generative encoder from a pretrained face inpainting model and finetunes it via self-supervised pixel reconstruction to represent a masked face as category-aware descriptors. Then, it learns a CNN-based discriminative reformer to convert the category-aware descriptors into an identity-aware vector by distilling a general pretrained face recognizer via self-supervised relation-based feature approximation. Finally, it learns a feature classifier on identity-aware vectors by optimizing a supervised classification task.
  • Figure 3: The t-SNE visualization of representations. We randomly sample five identities, use all sample images with these identities to synthesize masked faces with five random mask types, and extract generative and discriminative representations of the masked faces. Generative representations are robust to diverse mask occlusions but weak in inter- and intra-identity discriminability, while discriminative representations show good identity discriminability. Bottom: some synthetic masked faces.
  • Figure 4: Evaluation on synthetic masked LFW. We report the accuracy of the proposed method (G2D) and compare it with combinations of general face recognizers (CenterLoss (CL) [wen2016discriminative], VGGFace (VGG) [parkhi2015deep], ArcFace (AF) [deng2019arcface], and VGGFace2 (VGG2) [cao2018vggface2]) and state-of-the-art generative face inpainting approaches (GFC [li2017generative], DeepFill [yu2018generative], IDGAN [ge2020tcsvt], and ICT [wan2021high]).
  • Figure 5: Verification accuracy (%) on MLFW [wang2022ccbr]. AF: ArcFace [deng2019arcface], CF: CosFace [wang2018cvpr], CuF: CurricularFace [huang2020cvpr], SF: SFace [zhong2021tip]. P: Private-Asia, W: WebFace, V: VGGFace2, M: MS1MV2. 50 means ResNet50 and 100 means ResNet100.
  • ...and 4 more figures
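
The Figure 2 caption describes training the discriminative reformer by "relation-based feature approximation" against a pretrained face recognizer. The source shown here does not give the exact loss, so the following is only a minimal sketch of one common form of relational distillation: match the pairwise cosine-similarity matrix of a batch of student features to that of the teacher features. The function names, the choice of cosine similarity, and the mean-squared penalty are all assumptions made for illustration, not the authors' formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Guard against division by zero for all-zero vectors.
    return dot / max(norm_u * norm_v, 1e-12)


def relation_matrix(features):
    """Pairwise similarity matrix for one batch of feature vectors."""
    return [[cosine(u, v) for v in features] for u in features]


def relation_distillation_loss(student_feats, teacher_feats):
    """Mean squared difference between student and teacher relation matrices.

    Assumption: relations are pairwise cosine similarities within a batch.
    Only the relations are compared, so student and teacher feature
    dimensions do not have to match.
    """
    rel_s = relation_matrix(student_feats)
    rel_t = relation_matrix(teacher_feats)
    n = len(rel_s)
    total = sum(
        (rel_s[i][j] - rel_t[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return total / (n * n)


# Hypothetical toy batch: the teacher separates the two samples,
# while the student collapses them onto the same direction.
teacher = [[1.0, 0.0], [0.0, 1.0]]
student = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
loss = relation_distillation_loss(student, teacher)
```

Because the loss depends only on similarities between samples, a student of this kind can be pushed to reproduce the teacher's identity structure without copying its feature values directly, which is the general appeal of relation-based distillation.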