Effective Adapter for Face Recognition in the Wild

Yunhao Liu, Yu-Ju Tsai, Kelvin C. K. Chan, Xiangtai Li, Lu Qi, Ming-Hsuan Yang

TL;DR

The paper tackles face recognition in the wild, where real-world degradations create a domain gap between low-quality (LQ) probes and high-quality (HQ) galleries. It introduces an adapter that processes both the LQ image and its restored HQ counterpart with two similar branches (one fixed, one trainable) and fuses their representations with a nested Cross-Attention and Self-Attention Fusion Structure, anchored by a residual connection that preserves the baseline LQ information. Trained with Angular Margin Softmax on a frozen HQ backbone, the method achieves zero-shot improvements of roughly 3%, 4%, and 7% on LFW, CFP-FP, and AgeDB across synthetic and real-world degradations. The approach preserves prior knowledge while bridging the LQ–HQ gap, offering practical benefits for real-world recognition under atmospheric turbulence and similar degradations. The authors state that code will be publicly released.

Abstract

In this paper, we tackle the challenge of face recognition in the wild, where images often suffer from low quality and real-world distortions. Traditional heuristic approaches, which train models either directly on these degraded images or on their enhanced counterparts produced by face restoration techniques, have proven ineffective, primarily due to the degradation of facial features and the discrepancy in image domains. To overcome these issues, we propose an effective adapter for augmenting existing face recognition models trained on high-quality facial datasets. The key to our adapter is to process both the unrefined and the enhanced images using two similar structures, one fixed and the other trainable. This design confers two benefits. First, the dual-input system minimizes the domain gap while providing varied perspectives for the face recognition model, since the enhanced image can be regarded as a complex non-linear transformation of the original one by the restoration model. Second, both structures can be initialized from the pre-trained models without discarding prior knowledge. Extensive experiments in zero-shot settings show the effectiveness of our method, which surpasses baselines by about 3%, 4%, and 7% on three datasets. Our code will be publicly available.
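The dual-input design can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the toy `backbone`, the token shapes, the single-head attention without learned projections, and the choice of which branch is fixed and which feature acts as the query are all assumptions made for illustration. Only the overall shape of the idea is taken from the text: two branches initialized from the same pre-trained weights, cross-attention followed by self-attention, and a residual connection back to the LQ features.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, key, value):
    """Single-head scaled dot-product attention on (tokens, dim) matrices."""
    scores = query @ key.T / np.sqrt(query.shape[-1])
    return softmax(scores, axis=-1) @ value

def backbone(image_tokens, weights):
    """Hypothetical stand-in for a face recognition model: linear map plus ReLU."""
    return np.maximum(image_tokens @ weights, 0.0)

def fuse(lq_feat, hq_feat):
    """Cross-attention first, then self-attention, with a residual to the LQ features."""
    cross = attention(lq_feat, hq_feat, hq_feat)  # assumed roles: LQ as query, restored as key/value
    mixed = attention(cross, cross, cross)        # self-attention over the cross-attended tokens
    return lq_feat + mixed                        # residual keeps the baseline LQ information

rng = np.random.default_rng(0)
pretrained = 0.1 * rng.normal(size=(8, 16))  # toy "pre-trained" weights
frozen_w = pretrained.copy()                 # fixed branch: never updated
trainable_w = pretrained.copy()              # trainable branch: same start, updated in training

lq_tokens = rng.normal(size=(4, 8))                           # toy LQ image as 4 tokens
restored_tokens = lq_tokens + 0.05 * rng.normal(size=(4, 8))  # toy stand-in for a restoration model

lq_feat = backbone(lq_tokens, frozen_w)            # assumption: the fixed branch sees the LQ image
hq_feat = backbone(restored_tokens, trainable_w)   # assumption: the trainable branch sees the restored image
fused = fuse(lq_feat, hq_feat)
print(fused.shape)
```

Because both weight matrices start as copies of the same pre-trained values, the fused feature initially stays close to what the original model would produce, which is one way to read the claim that no prior knowledge is dropped.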

Paper Structure (19 sections, 5 equations, 11 figures, 10 tables)

Figures (11)

  • Figure 1: In real-world applications, face recognition systems frequently encounter low-quality (LQ) probe images, which present a significant domain gap relative to the high-quality (HQ) embedding gallery. Our method (c) addresses this challenge by integrating the features of the LQ images with those of the enhanced HQ images in the fusion structure. Compared with conventional methods (a) and (b), our method effectively bridges the domain gap, ensuring more accurate and reliable face recognition in real-world conditions. The figure contrasts the conventional recognition setting with the in-the-wild setting, illustrating the difficulties that motivate our method.
  • Figure 2: Joint Face Recognition Framework with Dual-Input Processing. This architecture processes both low-quality (LQ) and restored high-quality (HQ) images, extracting features from each with two identical face recognition models. The Fusion Structure integrates the two feature sets before passing them to an Angular Margin Softmax function for loss computation, optimizing the network for enhanced recognition accuracy.
  • Figure 3: Verification performance (%) using different face restoration methods on LFW, CFP-FP, and AgeDB at different degradation intensities.
  • Figure 4: Comparison between the two different attention orders in Fusion Structure. Our method uses (b) Cross-Attention First.
  • Figure 5: Comparison of different input orders in Cross-Attention: (a) and (b) use the feature as Query or as Key and Value in a single Cross-Attention; (c) and (d) are the corresponding variants in nested Cross-Attention.
  • ...and 6 more figures
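
The Figure 2 caption says the fused features are passed to an Angular Margin Softmax for loss computation. The sketch below shows one common form of such a loss, an ArcFace-style additive angular margin, in NumPy. The excerpt does not state the exact margin form, so this choice is an assumption. The function name and the `scale` and `margin` defaults are illustrative, not values from the paper. The sketch also omits the safeguard usually applied when the angle plus the margin exceeds π.

```python
import numpy as np

def angular_margin_softmax_loss(embeddings, class_weights, labels, scale=64.0, margin=0.5):
    """ArcFace-style loss: add `margin` (radians) to the true-class angle, then cross-entropy.

    embeddings:    (batch, dim) features, e.g. the fused features
    class_weights: (classes, dim) one learnable centre per identity
    labels:        (batch,) integer identity labels
    """
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = np.clip(emb @ w.T, -1.0, 1.0)  # cosine between each feature and each class centre
    theta = np.arccos(cos)
    rows = np.arange(len(labels))
    logits = scale * cos
    # Penalise only the true class: its angle is widened by the margin.
    logits[rows, labels] = scale * np.cos(theta[rows, labels] + margin)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[rows, labels].mean()

# Toy usage: 5 random features of width 16, 3 identities.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 16))
centres = rng.normal(size=(3, 16))
identities = np.array([0, 1, 2, 0, 1])
print(angular_margin_softmax_loss(features, centres, identities))
```

With `margin=0` the function reduces to a scaled cosine softmax. A positive margin makes the true class harder to satisfy, which pushes features of the same identity closer together in angle.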