Found in Translation: semantic approaches for enhancing AI interpretability in face verification

Miriam Doh, Caroline Mazini Rodrigues, N. Boutry, L. Najman, Matei Mancas, Bernard Gosselin

TL;DR

In face verification, the paper frames explainability as aligning model decisions with human semantic understanding. It introduces semantic feature sets derived from Mediapipe landmarks and a hybrid global-local XAI framework. The framework integrates multiple concept-extraction approaches (LIME, MAGE EaOC, KernelSHAP), a weighted single-removal similarity map (S0), and LLM-generated textual explanations to produce human-friendly narratives. Quantitative occlusion analyses and a user study (61 participants) show that semantic explanations, especially with the finest granularity (SET_2), yield clearer, more detailed, and more human-aligned interpretations than traditional pixel-based heatmaps. By combining semantic concepts with narrative explanations, the work advances XAI 2.0, aiming to foster trust and acceptance in critical applications of face verification. It also shows how global concept aggregation and local attributions can be combined to diagnose model decisions.

Abstract

The increasing complexity of machine learning models in computer vision, particularly in face verification, requires the development of explainable artificial intelligence (XAI) to enhance interpretability and transparency. This study extends previous work by integrating semantic concepts derived from human cognitive processes into XAI frameworks to bridge the comprehension gap between model outputs and human understanding. We propose a novel approach combining global and local explanations, using semantic features defined by user-selected facial landmarks to generate similarity maps and textual explanations via large language models (LLMs). The methodology was validated through quantitative experiments and user feedback, demonstrating improved interpretability. Results indicate that our semantic-based approach, particularly the most detailed set, offers a more nuanced understanding of model decisions than traditional methods. User studies highlight a preference for our semantic explanations over traditional pixel-based heatmaps, emphasizing the benefits of human-centric interpretability in AI. This work contributes to the ongoing efforts to create XAI frameworks that align AI models' behaviour with human cognitive processes, fostering trust and acceptance in critical applications.

Paper Structure (29 sections, 4 equations, 12 figures, 4 tables, 1 algorithm)

Figures (12)

  • Figure 1: On the left, an illustration shows how humans perform face recognition by focusing on specific facial areas. On the right, we present an adaptation of the XAI Perceptual Processing Framework, originally proposed by Zhang et al. [Zhang2022], specifically tailored for face verification, drawing inspiration from how humans process visual stimuli.
  • Figure 2: This figure contrasts the previously established framework (left) with the newly proposed framework presented in this paper (right). The new framework includes several additional components, highlighted in green. Specifically, it introduces three hypothetical semantic sets to evaluate the variability of the proposed method. Moreover, the new framework evaluates three concept extraction methods (KernelSHAP, MAGE, LIME), whereas the previous work used only KernelSHAP, without evaluation. The explanation visualization has been expanded to include textual descriptions generated by LLMs. Additionally, user feedback has been incorporated, which was not collected in the prior work.
  • Figure 3: The process of creating human-based semantic features. (1) Mediapipe's landmarks projected onto an example face. Using Mediapipe's facemesh, users can define semantic areas by selecting specific landmarks, as shown in the example. (2) Three sets of human-based concepts with varying granularity (SET_0, SET_1, SET_2) created from these user-defined areas. SET_0 and SET_1 have 13 features each, while SET_2 has 30 features.
  • Figure 4: Illustrative example of EaOC behavior under an occlusion. Each image shown stands for the embedding that a trained model produces for it. Initially, given a set of images, we order them by their distance to the origin. After each occlusion, we recompute the order. The occluded image may change its position in the order if the occlusion is impactful.
  • Figure 5: Methodology to extract globally important concepts. (1) We use the human-segmented regions to obtain explanations for all images, using XAI methods such as LIME, KernelSHAP and MAGE. (2) For each image, we order the segmented regions by importance according to the XAI method. (3) Finally, we combine the per-image orders into a final ranking that shows the most important face segments globally.
  • ...and 7 more figures
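
The ordering idea in the Figure 4 caption and the three-step aggregation in the Figure 5 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`norm_order`, `order_shift`, `aggregate_rankings`), the toy embeddings, and the region names are all hypothetical. The captions do not say how the per-image orders are combined, so a Borda-style sum of rank positions is assumed here purely for illustration.

```python
import math
from collections import defaultdict


def norm_order(embeddings):
    """Return image indices sorted by Euclidean distance of each embedding to the origin."""
    norms = [math.sqrt(sum(x * x for x in e)) for e in embeddings]
    return sorted(range(len(embeddings)), key=lambda i: norms[i])


def order_shift(embeddings, index, occluded_embedding):
    """Number of positions image `index` moves in the order once its
    embedding is replaced by the embedding of its occluded version."""
    before = norm_order(embeddings).index(index)
    perturbed = list(embeddings)
    perturbed[index] = occluded_embedding
    after = norm_order(perturbed).index(index)
    return abs(after - before)


def aggregate_rankings(per_image_rankings):
    """Assumed Borda-style aggregation (not confirmed by the source):
    sum each region's rank position over all images; a lower total
    means the region is more important globally."""
    totals = defaultdict(int)
    for ranking in per_image_rankings:
        for position, region in enumerate(ranking):
            totals[region] += position
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(totals, key=lambda r: (totals[r], r))


# Toy 2-D embeddings standing in for model outputs (hypothetical values).
embeddings = [(1.0, 0.0), (0.0, 2.0), (3.0, 0.0)]
impactful = order_shift(embeddings, 0, (0.0, 2.5))   # image 0 moves past image 1
negligible = order_shift(embeddings, 0, (1.1, 0.0))  # image 0 keeps its position

# Per-image region orders, most important first (hypothetical region names).
rankings = [
    ["eyes", "nose", "mouth"],
    ["nose", "eyes", "mouth"],
    ["eyes", "mouth", "nose"],
]
global_ranking = aggregate_rankings(rankings)
```

In this sketch an occlusion counts as impactful when `order_shift` is non-zero. Any other rank-aggregation rule (for example, a median rank) could be substituted in `aggregate_rankings` without changing the surrounding steps.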