Look Through Masks: Towards Masked Face Recognition with De-Occlusion Distillation
Chenyu Li, Shiming Ge, Daichi Zhang, Jia Li
TL;DR
This work tackles masked face recognition by integrating amodal perception-inspired de-occlusion with relational knowledge distillation. A GAN-based de-occlusion module performs explicit face completion to recover appearance under masks, while a teacher-student distillation transfers structural relations from a pre-trained unmasked-face recognizer to a student operating on completed faces. The distillation uses instance-, pair-, and triplet-wise relational losses, with an identity-centered, softened instance-wise variant to improve stability when domain shift occurs. Evaluations on synthetic (CelebA, LFW) and realistic (AR) masked-face datasets show consistent accuracy gains over baselines, demonstrating the practical value of combining amodal completion with structured knowledge transfer for robust masked-face recognition.
Abstract
Many real-world applications, such as video surveillance and urban governance, need to recognize masked faces. Diverse masks replace facial content, which yields incomplete appearance and ambiguous representation and leads to a sharp drop in accuracy. Inspired by recent progress on amodal perception, we propose to migrate the mechanism of amodal completion to the task of masked face recognition with an end-to-end de-occlusion distillation framework, which consists of two modules. The \textit{de-occlusion} module applies a generative adversarial network to perform face completion, which recovers the content under the mask and eliminates appearance ambiguity. The \textit{distillation} module takes a pre-trained general face recognition model as the teacher and transfers its knowledge to train a student for completed faces using massive online synthesized face pairs. In particular, the teacher knowledge is represented with structural relations among instances in multiple orders, which serve as a posterior regularization to enable the adaptation. In this way, the knowledge can be fully distilled and transferred to identify masked faces. Experiments on synthetic and realistic datasets show the efficacy of the proposed approach.
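To make the idea of "structural relations among instances in multiple orders" more concrete, the following is a minimal NumPy sketch of first-, second-, and third-order relational terms between a batch of teacher embeddings and a batch of student embeddings. It is an illustration under stated assumptions, not the loss used in this work: the function names, the plain squared-error penalties, the mean-distance normalization, and the equal default weights are all hypothetical choices, and the identity-centered, softened instance-wise variant mentioned in the TL;DR is not reproduced here.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length (zero vectors stay zero)."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def instance_loss(teacher, student):
    """First order (assumed form): squared L2 gap between matching embeddings."""
    return float(np.mean(np.sum((teacher - student) ** 2, axis=1)))


def pairwise_distances(x, eps=1e-12):
    """Pairwise Euclidean distances, divided by their mean off-diagonal value."""
    diff = x[:, None, :] - x[None, :, :]          # diff[i, j] = x_i - x_j
    dist = np.linalg.norm(diff, axis=-1)
    off_diag = ~np.eye(x.shape[0], dtype=bool)
    return dist / max(dist[off_diag].mean(), eps)


def pair_loss(teacher, student):
    """Second order (assumed form): match normalized pairwise distance matrices."""
    return float(np.mean((pairwise_distances(teacher) - pairwise_distances(student)) ** 2))


def triplet_angles(x):
    """Cosine of the angle at vertex j formed by points i and k, for all (j, i, k)."""
    unit = l2_normalize(x[:, None, :] - x[None, :, :])  # unit[i, j] points from x_j to x_i
    return np.einsum("ijd,kjd->jik", unit, unit)


def triplet_loss(teacher, student):
    """Third order (assumed form): match the angles formed by every triplet."""
    return float(np.mean((triplet_angles(teacher) - triplet_angles(student)) ** 2))


def relational_distillation_loss(teacher, student, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; the weights are illustrative placeholders."""
    w1, w2, w3 = weights
    return (w1 * instance_loss(teacher, student)
            + w2 * pair_loss(teacher, student)
            + w3 * triplet_loss(teacher, student))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher_emb = rng.normal(size=(4, 16))                      # stand-in for teacher features
    student_emb = teacher_emb + 0.1 * rng.normal(size=(4, 16))  # stand-in for student features
    print(relational_distillation_loss(teacher_emb, student_emb))
```

In this sketch the pair and triplet terms depend only on the geometry of the batch, so a student whose embeddings are a uniformly scaled copy of the teacher's incurs only the instance-wise penalty. That property is one reason relational terms are often considered when teacher and student see inputs from different domains, as with unmasked versus completed faces.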
