OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations

Caixin Kang, Yubo Chen, Shouwei Ruan, Shiji Zhao, Ruochen Zhang, Jiayi Wang, Shan Fu, Xingxing Wei

TL;DR

OODFace introduces a robustness benchmark for face recognition (FR) under real-world distribution shifts. It defines 30 OOD scenarios (20 common corruptions and 10 appearance variations), each at five severity levels, and builds three benchmarks: LFW-C/V, CFP-FP-C/V, and YTF-C/V. The authors evaluate 19 open-source FR models and 3 commercial APIs, supplemented by physical face-mask tests, and explore defense strategies and Vision-Language Models (VLMs) as potential solutions. Robustness is quantified with Acc_clean, Acc_cor, and Acc_var, along with Relative Corruption Error ($\mathrm{RCE}$) and Relative Variations Error ($\mathrm{RVE}$). The results show that corruption robustness is not aligned with clean performance and that corruptions in the Data & Processing category are particularly damaging. Defense strategies offer limited improvements, while some VLMs show strong potential for robust FR, which highlights both promise and practical challenges for deployment and privacy. Overall, OODFace provides a unified toolkit and insights to guide future robustness improvements in FR systems, emphasizing the need for adaptable defenses and principled integration of multimodal models.

Abstract

With the rise of deep learning, facial recognition technology has seen extensive research and rapid development. Although facial recognition is considered a mature technology, we find that existing open-source models and commercial algorithms lack robustness in certain complex Out-of-Distribution (OOD) scenarios, raising concerns about the reliability of these systems. In this paper, we introduce OODFace, which explores the OOD challenges faced by facial recognition models from two perspectives: common corruptions and appearance variations. We systematically design 30 OOD scenarios across 9 major categories tailored for facial recognition. By simulating these challenges on public datasets, we establish three robustness benchmarks: LFW-C/V, CFP-FP-C/V, and YTF-C/V. We then conduct extensive experiments on 19 facial recognition models and 3 commercial APIs, along with extended physical experiments on face masks to assess their robustness. Next, we explore potential solutions from two perspectives: defense strategies and Vision-Language Models (VLMs). Based on the results, we draw several key insights, highlighting the vulnerability of facial recognition systems to OOD data and suggesting possible solutions. Additionally, we offer a unified toolkit that includes all corruption and variation types, easily extendable to other datasets. We hope that our benchmarks and findings can provide guidance for future improvements in facial recognition model robustness.
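
To make the idea of a corruption with graded severity concrete, the following is a minimal sketch of Gaussian noise applied at five severity levels. It is not taken from the OODFace toolkit: the function name `gaussian_noise`, the `SIGMAS` table, and the specific noise strengths are all illustrative assumptions.

```python
import numpy as np

# Hypothetical noise standard deviations for severity levels 1..5,
# on a [0, 1] pixel scale. Illustrative values only, NOT those used by OODFace.
SIGMAS = (0.04, 0.08, 0.12, 0.18, 0.26)


def gaussian_noise(image, severity, rng=None):
    """Return a noisy copy of a uint8 image at an integer severity from 1 to 5."""
    if severity not in range(1, 6):
        raise ValueError("severity must be an integer from 1 to 5")
    rng = np.random.default_rng() if rng is None else rng
    # Work in floating point on a [0, 1] scale so the noise strength is
    # independent of the integer pixel range.
    x = image.astype(np.float64) / 255.0
    noisy = x + rng.normal(0.0, SIGMAS[severity - 1], size=x.shape)
    # Clip back to the valid range and convert to uint8.
    return (np.clip(noisy, 0.0, 1.0) * 255.0).round().astype(np.uint8)


if __name__ == "__main__":
    face = np.full((112, 112, 3), 128, dtype=np.uint8)  # stand-in for a face crop
    for level in range(1, 6):
        out = gaussian_noise(face, level, np.random.default_rng(0))
        err = np.abs(out.astype(int) - face.astype(int)).mean()
        print(f"severity {level}: mean absolute pixel change = {err:.1f}")
```

Under these assumptions, a "-C" style benchmark would be built by applying each corruption at each severity level to every image of a clean dataset and re-running the verification protocol on the result.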

Paper Structure

This paper contains 36 sections, 4 equations, 31 figures, 54 tables.

Figures (31)

  • Figure 1: Challenges in FR systems. Simple Gaussian noise poses a threat to the performance of state-of-the-art open-source FR models and commercial FR APIs. (Accuracy tested on LFW.)
  • Figure 2: Overview of OODFace’s 30 OOD scenarios. OODs are divided into two major categories, common corruptions and appearance variations, further subdivided into 20 and 10 subcategories, each with 5 severity levels.
  • Figure 3: Visualization of 30 subcategories of common corruptions and appearance variations. More results are available in Appendix H.
  • Figure 4: Visualization of severity levels. Top: Motion Blur from level 1 to 5; Bottom: Age- from level 1 to 5. Full visual results are available in Appendix H.
  • Figure 5: RCE results on LFW-C.
  • ...and 26 more figures