Distilling Generative-Discriminative Representations for Very Low-Resolution Face Recognition
Junzheng Zhang, Weijia Guo, Bochao Liu, Ruixin Shi, Yong Li, Shiming Ge
TL;DR
This work tackles very low-resolution face recognition by introducing a generative-discriminative representation distillation (GDRD) framework. It first distills general generative knowledge from a diffusion-model encoder via $L_{\text{gen}}$ to teach the backbone, then transfers discriminative knowledge from a high-resolution recognizer through cross-resolution relational distillation with $L_{\text{rcd}}$, $L_{\text{kd}}$, and $L_{\text{cls}}$ to supervise the head, all within a progressive, module-wise training scheme. The authors validate GDRD on four benchmarks, reporting state-of-the-art performance on 16×16 inputs for verification, identification, and retrieval, with robust results under occlusion and illumination changes. The approach offers a practical path to robust low-resolution face recognition by bridging generative detail recovery and discriminative alignment in a two-stage distillation process.
Abstract
Very low-resolution face recognition is challenging because resolution degradation causes a severe loss of informative facial details. In this paper, we propose a generative-discriminative representation distillation approach that combines generative representation with cross-resolution aligned knowledge distillation. This approach facilitates very low-resolution face recognition by jointly distilling generative and discriminative models via two distillation modules. First, the generative representation distillation takes the encoder of a diffusion model pretrained for face super-resolution as the generative teacher to supervise the learning of the student backbone via feature regression, and then freezes the student backbone. Next, the discriminative representation distillation takes a pretrained face recognizer as the discriminative teacher to supervise the learning of the student head via cross-resolution relational contrastive distillation. In this way, the general backbone representation is transformed into a discriminative head representation, leading to a robust and discriminative student model for very low-resolution face recognition. Our approach improves the recovery of missing details in very low-resolution faces and achieves better knowledge transfer. Extensive experiments on face datasets demonstrate that our approach improves recognition accuracy on very low-resolution faces, showing its effectiveness and adaptability.
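The two distillation signals described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss definitions, so everything here is an assumption: the function names are hypothetical, the feature regression is written as a plain mean squared error, and the cross-resolution relational contrastive distillation is replaced by a simple InfoNCE-style stand-in that pulls each low-resolution student embedding toward the high-resolution teacher embedding of the same face. In the progressive scheme, the first loss would train only the backbone, and the second would train only the head once the backbone is frozen.

```python
import math


def feature_regression_loss(student_feats, teacher_feats):
    """Stage 1 (assumed form): mean squared error between student-backbone
    features and generative-teacher features, given as lists of vectors."""
    total, count = 0.0, 0
    for s_vec, t_vec in zip(student_feats, teacher_feats):
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1
    return total / count


def _normalize(vec):
    """Scale a vector to unit L2 norm (all-zero vectors are left as-is)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def contrastive_distillation_loss(student_embs, teacher_embs, temperature=0.1):
    """Stage 2 (InfoNCE-style stand-in, not the paper's exact loss):
    student embedding i should be most similar to teacher embedding i
    among all teacher embeddings in the batch."""
    s_norm = [_normalize(v) for v in student_embs]
    t_norm = [_normalize(v) for v in teacher_embs]
    loss = 0.0
    for i, s_vec in enumerate(s_norm):
        # Cosine similarity to every teacher embedding, scaled by temperature.
        logits = [sum(a * b for a, b in zip(s_vec, t_vec)) / temperature
                  for t_vec in t_norm]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(s_norm)


# Toy data: two identities with 2-D embeddings (hypothetical values).
teacher = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # student agrees with the teacher
swapped = [[0.1, 0.9], [0.9, 0.1]]   # student confuses the two identities

loss_aligned = contrastive_distillation_loss(aligned, teacher)
loss_swapped = contrastive_distillation_loss(swapped, teacher)
```

On this toy data the aligned student receives a much smaller contrastive loss than the swapped one, which is the behavior a cross-resolution alignment objective is meant to reward.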
