Towards A Comprehensive Visual Saliency Explanation Framework for AI-based Face Recognition Systems

Yuhang Lu, Zewei Xu, Touradj Ebrahimi

TL;DR

This work addresses the explainability gap in AI-based face recognition by introducing a comprehensive visual saliency framework that covers both verification and identification. It proposes CorrRISE, a model-agnostic, perturbation- and correlation-based method that produces similarity and dissimilarity saliency maps for face pairs, scalable to one-to-many identification through top-K gallery comparisons. An objective evaluation methodology using Deletion and Insertion metrics is developed to quantify saliency-map quality across verification and identification, with extensive experiments on ArcFace, AdaFace, and MagFace across multiple datasets. The results show CorrRISE yields superior similarity maps and competitive dissimilarity maps, demonstrates robust generalization across models, and provides insights into failure modes and potential improvements. Overall, the paper delivers a practical, extensible framework for interpretable face recognition and establishes a rigorous standard for evaluating saliency explanations in this domain.

Abstract

In recent years, deep convolutional neural networks have significantly advanced face recognition for both verification and identification. Despite their impressive accuracy, these networks are often criticized for lacking explainability, and there is a growing demand for understanding the decision-making process of AI-based face recognition systems. Some studies have investigated visual saliency maps as explanations, but they have predominantly focused on face verification. Discussion of more general face recognition scenarios, and of a corresponding evaluation methodology for these explanations, has long been absent from the literature. This manuscript therefore proposes a comprehensive explanation framework for face recognition tasks. First, it provides an exhaustive definition of visual saliency map-based explanations for AI-based face recognition systems that treats the two most common recognition scenarios, face verification and identification, individually. Second, it proposes a new model-agnostic explanation method, CorrRISE, which produces saliency maps that reveal both the similar and dissimilar regions between any given face images. Third, it introduces a new evaluation methodology that quantitatively measures and compares the performance of general visual saliency explanation methods in face recognition. Finally, extensive experiments are carried out on multiple verification and identification scenarios. The results show that CorrRISE generates insightful saliency maps and outperforms state-of-the-art explanation approaches, particularly on similarity maps.

Paper Structure

This paper contains 29 sections, 3 equations, 12 figures, 7 tables, and 1 algorithm.

Figures (12)

  • Figure 1: The proposed definition of visual saliency-based explanations in two typical face recognition scenarios, i.e., face verification and identification.
  • Figure 2: Workflow of the proposed CorrRISE explanation method. The similarity and dissimilarity maps are calculated respectively given an arbitrary input face pair. The block in the middle repeats $N$ iterations using different random masks. The output similarity scores and the mask set are fed to the correlation module to calculate similarity and dissimilarity saliency maps in a pixel-wise manner.
  • Figure 3: The deletion and insertion processes to calculate corresponding evaluation metrics. The most important pixels indicated by the saliency maps are gradually removed/added.
  • Figure 4: Sanity check for the CorrRISE explanation method. The first row shows the explanation heatmap for a deep model with randomized parameters, while the second row shows it for a normally trained face recognition model. Importance increases from blue to red.
  • Figure 5: Visual explanation results from CorrRISE for both matching and non-matching face pairs in the standard face verification scenario. The produced saliency maps explain why the verification model makes correct predictions on all face pairs. The saliency value increases from blue to red.
  • ...and 7 more figures
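
The workflow in the Figure 2 caption (random masks, one similarity score per mask, pixel-wise correlation) and the deletion process in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the correlation module uses the Pearson coefficient, that positive and negative coefficients are read as similarity and dissimilarity respectively, that masks are coarse binary grids upsampled by nearest neighbour, and that the deletion curve tracks the similarity score of a single pair. The names `random_masks`, `corr_saliency`, `deletion_curve`, and `toy_embed` are hypothetical, and `toy_embed` is a stand-in for a real face recognition model.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def random_masks(n, size, cells=7, keep_prob=0.5, rng=None):
    """Assumed mask scheme: coarse random binary grids, nearest-neighbour upsampled."""
    rng = np.random.default_rng() if rng is None else rng
    grid = (rng.random((n, cells, cells)) < keep_prob).astype(np.float64)
    scale = -(-size // cells)  # ceiling division
    up = np.repeat(np.repeat(grid, scale, axis=1), scale, axis=2)
    return up[:, :size, :size]

def corr_saliency(image, reference_emb, embed, n_masks=500, rng=None):
    """Correlate each pixel's mask value with the similarity score over all masks."""
    size = image.shape[0]  # square image assumed
    masks = random_masks(n_masks, size, rng=rng)
    scores = np.empty(n_masks)
    for i, m in enumerate(masks):
        masked = image * (m if image.ndim == 2 else m[..., None])
        scores[i] = cosine(embed(masked), reference_emb)
    # Pearson correlation per pixel: covariance / (std of mask * std of score).
    m_c = masks - masks.mean(axis=0)
    s_c = scores - scores.mean()
    cov = np.tensordot(s_c, m_c, axes=(0, 0)) / n_masks
    corr = cov / (masks.std(axis=0) * scores.std() + 1e-12)
    sim_map = np.clip(corr, 0.0, None)    # keeping the pixel raises the score
    dis_map = np.clip(-corr, 0.0, None)   # keeping the pixel lowers the score
    return sim_map, dis_map

def deletion_curve(image, saliency, reference_emb, embed, steps=10):
    """Zero out pixels from most to least salient, recording the score each step."""
    order = np.argsort(saliency, axis=None)[::-1]
    work = image.astype(np.float64).copy()
    pixels = work.reshape(order.size, -1)  # view onto work: one row per pixel
    scores = [cosine(embed(work), reference_emb)]
    for chunk in np.array_split(order, steps):
        pixels[chunk] = 0.0
        scores.append(cosine(embed(work), reference_emb))
    return np.array(scores)

def toy_embed(x):
    """Hypothetical stand-in model: only the top 14 rows influence the embedding."""
    return np.array([x[:14].sum(), 50.0])

rng = np.random.default_rng(0)
face = np.ones((28, 28))          # placeholder grayscale "face"
ref = toy_embed(face)             # embedding of the unmasked reference image
sim_map, dis_map = corr_saliency(face, ref, toy_embed, n_masks=500, rng=rng)
curve = deletion_curve(face, sim_map, ref, toy_embed)
```

With the toy model, the similarity map should concentrate on the upper half of the image, since only those rows affect the embedding, and the deletion curve should fall as those pixels are removed. An insertion curve would follow the same pattern in reverse, starting from a blank image and adding the most salient pixels first.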