
ARoFace: Alignment Robustness to Improve Low-Quality Face Recognition

Mohammad Saeed Ebrahimi Saadabadi, Sahar Rahimi Malakshan, Ali Dabouei, Nasser M. Nasrabadi

TL;DR

This work tackles the drop in face recognition performance on low-quality images caused in part by Face Alignment Errors (FAE). It introduces ARoFace, a plug-and-play training framework that pairs a differentiable spatial transformer with adversarial data augmentation to generate FAE-like samples during training, without requiring target datasets or GANs. By formalizing FAE as an FR-specific degradation and constraining perturbations via a per-landmark flow-based budget, ARoFace achieves state-of-the-art results on the IJB-B, IJB-C, TinyFace, and IJB-S benchmarks while maintaining HQ performance and incurring minimal parameter overhead. The approach offers a practical, scalable method to enhance low-quality FR in real-world deployments, with broad compatibility across angular-margin losses and minimal computational burden beyond standard adversarial training.

Abstract

Aiming to enhance Face Recognition (FR) on Low-Quality (LQ) inputs, recent studies suggest incorporating synthetic LQ samples into training. Although promising, the quality factors that are considered in these works are general rather than FR-specific, e.g., atmospheric turbulence, resolution, etc. Motivated by the observation of the vulnerability of current FR models to even small Face Alignment Errors (FAE) in LQ images, we present a simple yet effective method that considers FAE as another quality factor that is tailored to FR. We seek to improve LQ FR by enhancing FR models' robustness to FAE. To this aim, we formalize the problem as a combination of differentiable spatial transformations and adversarial data augmentation in FR. We perturb the alignment of the training samples using a controllable spatial transformation and enrich the training with samples expressing FAE. We demonstrate the benefits of the proposed method by conducting evaluations on IJB-B, IJB-C, IJB-S (+4.3% Rank1), and TinyFace (+2.63%). Code: https://github.com/msed-Ebrahimi/ARoFace

Paper Structure

This paper contains 26 sections, 11 equations, 5 figures, 7 tables, 1 algorithm.

Figures (5)

  • Figure 1: Visual comparison of aligned (a) and alignment-perturbed (b) samples from the IJB-B dataset. (c, d, e) The performance difference between aligned inputs and those with slight FAE. Models exhibit robustness to FAE in HQ samples but suffer significant performance drops on LQ faces, with over 50% reduction in TAR@FAR=$1e-5$. Results are from two distinct ResNet-100 models trained on MS1MV3 using the ArcFace and AdaFace objectives.
  • Figure 2: Overview of the proposed method. Each training iteration is composed of two steps. The adversarial spatial transformation finds $\bm{\theta}^{*} = (\varphi^{*}, \Delta {u}^{*}, \Delta {v}^{*}, \lambda^{*})$ for each instance in the batch based on the feedback from the FR network to produce hard but faithful samples, i.e., maximization of $L$. Then the FR network is trained using a batch of adversarial and original samples, i.e., minimization of $L$.
  • Figure 3: (a, b) Samples from the IJB-S and IJB-B datasets, respectively. IJB-B consists of both HQ and LQ instances, while IJB-S consists only of LQ probe instances. (c) Benign samples ($\text{top}:\mathbf{x}$) and their corresponding adversarial examples ($\text{bottom}:T_{\theta}(\mathbf{x})$) produced by ARoFace. (d) Orthogonality of ARoFace to different FR objective functions. In all scenarios, integrating ARoFace into training significantly improved performance on TinyFace.
  • Figure 4: (a, b) Training speed and GPU memory consumption comparison between CFSM and ARoFace: ARoFace significantly enhances training efficiency and reduces GPU memory consumption compared to CFSM. (c, d, e) Comparing the evaluation performance between employing adversarial vs. random spatial transformation during training: Adversarial improves performance, while random fails on IJB-B and IJB-C.
  • Figure 5: Experiments on different perturbation budgets. TinyFace performance is reported as Rank1 identification accuracy. IJB-B and IJB-C performance is reported as TAR@FAR=$1e-5$.
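
The Figure 2 caption parameterizes the spatial transformation by $\bm{\theta} = (\varphi, \Delta u, \Delta v, \lambda)$. The sketch below illustrates one plausible reading of that parameterization: a rotation angle, a translation, and a scale composed into a 2x3 affine matrix and applied to a few landmark coordinates. The composition order, the normalized coordinate system, the function names, and the maximum-displacement check are all assumptions made for illustration. They are not taken from the paper or its repository, and the adversarial search over $\bm{\theta}$ is not shown.

```python
import math


def affine_from_theta(phi, du, dv, lam):
    """Build a 2x3 affine matrix from theta = (phi, du, dv, lam).

    Assumed convention (not confirmed by the source): scale by lam,
    rotate by phi radians about the origin, then translate by (du, dv).
    """
    c, s = math.cos(phi), math.sin(phi)
    return [
        [lam * c, -lam * s, du],
        [lam * s, lam * c, dv],
    ]


def apply_affine(matrix, points):
    """Apply a 2x3 affine matrix to a list of (u, v) points."""
    moved = []
    for u, v in points:
        new_u = matrix[0][0] * u + matrix[0][1] * v + matrix[0][2]
        new_v = matrix[1][0] * u + matrix[1][1] * v + matrix[1][2]
        moved.append((new_u, new_v))
    return moved


def max_displacement(points, moved):
    """Largest Euclidean shift of any point; a hypothetical stand-in for
    checking a perturbation against a per-landmark budget."""
    return max(
        math.hypot(a - u, b - v) for (u, v), (a, b) in zip(points, moved)
    )


# Hypothetical landmark positions in normalized image coordinates.
landmarks = [(-0.3, -0.2), (0.3, -0.2), (0.0, 0.1)]

# theta = (0, 0, 0, 1) is the identity: the alignment is left untouched.
identity = affine_from_theta(0.0, 0.0, 0.0, 1.0)

# A small misalignment: 2 degrees of rotation, a slight shift, 2% zoom.
theta = (math.radians(2.0), 0.01, -0.01, 1.02)
perturbed = apply_affine(affine_from_theta(*theta), landmarks)
shift = max_displacement(landmarks, perturbed)
```

In a training loop matching the caption, the four parameters would be chosen per instance to maximize the loss $L$ rather than fixed by hand as they are here, and the transformed image would be produced by a differentiable sampler so that gradients can reach $\bm{\theta}$.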