From Pixels to Words: Leveraging Explainability in Face Recognition through Interactive Natural Language Processing

Ivan DeAndres-Tame, Muhammad Faisal, Ruben Tolosana, Rouqaiah Al-Refai, Ruben Vera-Rodriguez, Philipp Terhörst

TL;DR

This work addresses the interpretability gap in contemporary face recognition (FR) by proposing an interactive framework that combines model-agnostic explainable AI (XAI) with natural-language question answering (QA). It pairs an FR pipeline (ArcFace with MTCNN detection and PIC-Score confidence) with a multi-technique saliency analysis that produces an explainability table mapping facial regions to their importance for the decision, and it uses a BERT-based QA interface to answer user queries in natural language. The approach preserves recognition performance while delivering detailed explanations, supported by qualitative demonstrations and quantitative analyses of both the QA system and the overall explanations. The framework aims to improve transparency and user understanding in sensitive FR applications, with demonstrated scalability and potential for extension without retraining.

Abstract

Face Recognition (FR) has advanced significantly with the development of deep learning, achieving high accuracy in several applications. However, the lack of interpretability of these systems raises concerns about their accountability, fairness, and reliability. In the present study, we propose an interactive framework to enhance the explainability of FR models by combining model-agnostic Explainable Artificial Intelligence (XAI) and Natural Language Processing (NLP) techniques. The proposed framework is able to accurately answer a variety of user questions through an interactive chatbot. In particular, the explanations generated by our proposed method take the form of natural language text and visual representations, which can, for example, describe how different facial regions contribute to the similarity measure between two faces. This is achieved through the automatic analysis of the saliency heatmaps computed over the face images and a BERT question-answering model, providing users with an interface that facilitates a comprehensive understanding of the FR decisions. The proposed approach is interactive, allowing users to ask questions and obtain more precise information suited to their background knowledge. More importantly, in contrast to previous studies, our solution does not decrease the face recognition performance. We demonstrate the effectiveness of the method through different experiments, highlighting its potential to make FR systems more interpretable and user-friendly, especially in sensitive applications where decision-making transparency is crucial.
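
As a rough illustration of how a saliency heatmap could be summarized per facial region and turned into a sentence, the following minimal sketch averages saliency inside a few rectangular regions and reports each region's share. The region names, bounding boxes, heatmap values, and the functions `region_importance` and `describe` are hypothetical and chosen only for this example; the paper's actual explainability table and its BERT-based answers may be built differently.

```python
from typing import Dict, List, Tuple

# (top, left, bottom, right); bottom and right are exclusive. Hypothetical layout.
Box = Tuple[int, int, int, int]


def region_importance(heatmap: List[List[float]],
                      regions: Dict[str, Box]) -> Dict[str, float]:
    """Return each region's share of the summed mean saliency (shares sum to 1)."""
    means: Dict[str, float] = {}
    for name, (top, left, bottom, right) in regions.items():
        values = [heatmap[r][c]
                  for r in range(top, bottom)
                  for c in range(left, right)]
        if not values:
            raise ValueError(f"region {name!r} covers no pixels")
        means[name] = sum(values) / len(values)
    total = sum(means.values())
    if total == 0:
        # No saliency anywhere: report zero importance for every region.
        return {name: 0.0 for name in means}
    return {name: mean / total for name, mean in means.items()}


def describe(table: Dict[str, float]) -> str:
    """Turn the importance table into a one-sentence, template-based explanation."""
    top_name, top_share = max(table.items(), key=lambda item: item[1])
    return (f"The {top_name} region contributed most to the similarity score "
            f"({top_share:.0%} of the regional saliency).")


# Toy 4x4 heatmap standing in for a real saliency map (assumed values).
heatmap = [
    [0.9, 0.9, 0.8, 0.8],
    [0.9, 0.9, 0.8, 0.8],
    [0.1, 0.3, 0.3, 0.1],
    [0.1, 0.2, 0.2, 0.1],
]
regions = {
    "left eye": (0, 0, 2, 2),
    "right eye": (0, 2, 2, 4),
    "nose": (2, 1, 3, 3),
    "mouth": (3, 1, 4, 3),
}

table = region_importance(heatmap, regions)
print(describe(table))
```

Normalizing the per-region means to shares keeps the numbers comparable across heatmaps with different overall intensity, which is one plausible way to make the values usable as context for a question-answering model.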

Paper Structure

This paper contains 15 sections, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: Graphical representation of the proposed framework, focused on the combination of model-agnostic XAI and NLP techniques to leverage the explainability of the FR systems.
  • Figure 2: Graphical representation of the FR module [neto2023pic].
  • Figure 3: XAI methods presented in [mery2022black] to plot the saliency heatmap over a face image.
  • Figure 4: Graphical representation showing how the explainability table is created.
  • Figure 5: Graphical representation of the proposed QA interface based on the BERT NLP model [devlin2018bert].
  • ...and 2 more figures