
FaceX: Understanding Face Attribute Classifiers through Summary Model Explanations

Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos, Christos Diou

TL;DR

FaceX introduces the first summary model explanations for face attribute classifiers by aggregating instance-level attributions across 19 facial regions to produce region-wise IoR heatmaps, complemented by high-impact patches to reveal visual cues driving decisions. The approach combines face parsing, Grad-CAM-based instance explanations, and a region-level aggregation to deliver a single, global explanation per class, enabling robust bias detection and interpretability. Evaluations on CelebA, CelebAMask-HQ, FairFace, and RFW (including bias mitigation with FLAC) demonstrate FaceX’s ability to identify single- and multi-attribute biases and to reveal how training data shape region focus. The work offers a scalable, interpretable tool for fairness auditing in facial analysis and suggests avenues for fairness-aware training and broader domain applications.
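The TL;DR names Grad-CAM as the source of the instance-level explanations that FaceX aggregates. As a hedged illustration, the standard Grad-CAM map for one image can be computed from a convolutional layer's feature maps and the gradients of the class score with respect to them. This is a minimal sketch of generic Grad-CAM, not FaceX's own code: the function name `grad_cam` and the `(K, H, W)` array layout are assumptions, and upsampling the map to image resolution is omitted.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Generic Grad-CAM map for one image (hypothetical helper, not FaceX's API).

    activations: feature maps A^k of one conv layer, shape (K, H, W).
    gradients:   gradients dy_c / dA^k of the target class score, shape (K, H, W).
    Returns an (H, W) map scaled to [0, 1] (all zeros if no positive evidence).
    """
    # Channel importance weights: global-average-pool the gradients spatially.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum of the feature maps, then ReLU to keep positive evidence only.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # (H, W)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

In a real pipeline the activations and gradients would come from framework hooks on the classifier, and the resulting map would be resized to the input image so that it aligns with the face-parsing masks.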

Abstract

EXplainable Artificial Intelligence (XAI) approaches are widely applied for identifying fairness issues in Artificial Intelligence (AI) systems. However, in the context of facial analysis, existing XAI approaches, such as pixel attribution methods, offer explanations for individual images. This makes it difficult to assess the overall behavior of a model: doing so would require labor-intensive manual inspection of a very large number of instances and would leave to the human the task of drawing a general impression of the model's behavior from the individual outputs. Addressing this limitation, we introduce FaceX, the first method that provides a comprehensive understanding of face attribute classifiers through summary model explanations. Specifically, FaceX leverages the presence of distinct regions across all facial images to compute a region-level aggregation of model activations, allowing visualization of the model's region attribution across 19 predefined regions of interest in facial images, such as hair, ears, or skin. Beyond spatial explanations, FaceX enhances interpretability by visualizing the specific image patches with the highest impact on the model's decisions for each facial region within a test benchmark. Through extensive evaluation in various experimental setups, including scenarios with or without intentional biases and with bias mitigation, on four benchmarks, namely CelebA, FairFace, CelebAMask-HQ, and Racial Faces in the Wild, FaceX demonstrates high effectiveness in identifying the models' biases.
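The abstract describes two outputs: a region-level aggregation of per-image attributions and, for each region, the image patches with the highest impact. A minimal sketch of both follows, under loudly stated assumptions: each test image has an attribution map in [0, 1] (e.g., from Grad-CAM) and a face-parsing label map with hypothetical integer ids `0..18` for the 19 regions; a region's score is taken as its mean attribution, averaged over the images in which it appears; and a region's "high-impact patch" in an image is a fixed-size window centred on the strongest attribution inside that region. FaceX's exact formulas are defined in the paper and may differ; `region_scores`, `top_patches`, and `NUM_REGIONS` are illustrative names only.

```python
import numpy as np

NUM_REGIONS = 19  # hypothetical label ids 0..18 produced by a face parser


def region_scores(cams, parsings, num_regions=NUM_REGIONS):
    """Average attribution per facial region over a test set (illustrative).

    cams:     list of (H, W) attribution maps in [0, 1].
    parsings: list of (H, W) integer label maps aligned with `cams`.
    Returns a (num_regions,) array; NaN for regions that never appear.
    """
    totals = np.zeros(num_regions)
    counts = np.zeros(num_regions)
    for cam, seg in zip(cams, parsings):
        for r in np.unique(seg):
            if 0 <= r < num_regions:
                totals[r] += cam[seg == r].mean()  # mean attribution inside region r
                counts[r] += 1                     # one more image containing region r
    with np.errstate(invalid="ignore", divide="ignore"):
        return totals / counts  # 0 / 0 -> NaN for absent regions


def top_patches(cams, parsings, region, k=5, size=16):
    """Top-k (image index, top, left) windows of side `size` (assumed <= H, W),
    each centred on an image's peak attribution inside `region`, ranked by peak."""
    candidates = []
    for i, (cam, seg) in enumerate(zip(cams, parsings)):
        masked = np.where(seg == region, cam, -np.inf)
        if not np.isfinite(masked).any():
            continue  # region absent from this image
        y, x = np.unravel_index(np.argmax(masked), masked.shape)
        # Clamp the window so it stays inside the image.
        top = int(np.clip(y - size // 2, 0, cam.shape[0] - size))
        left = int(np.clip(x - size // 2, 0, cam.shape[1] - size))
        candidates.append((float(cam[y, x]), i, top, left))
    candidates.sort(reverse=True)  # strongest peaks first
    return [(i, top, left) for _, i, top, left in candidates[:k]]
```

Averaging per image before averaging across images gives every image equal weight regardless of how large a region is in that face; weighting by pixel count instead would be an equally plausible design choice.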

Paper Structure

This paper contains 18 sections, 7 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Grad-CAM instance-level explanations for six random samples from a gender classifier biased towards the Wearing_Lipstick attribute. Two key limitations are evident: a) attribution to the mouth region is inconsistent, hindering the user's ability to pinpoint where the biased attribute occurs; b) interpreting the visual characteristics of the region of interest (i.e., lipstick) is not straightforward. The corresponding FaceX summary model explanation is provided in the FaceX figure (fig:facex).
  • Figure 2: FaceX heatmap outputs for single-attribute bias experiments on CelebA (train) and CelebAMask-HQ (test). The heatmap color scale runs from blue (lowest activation) to red (highest activation).
  • Figure 3: FaceX heatmap output and high-impact patches for the single-attribute bias experiment on FairFace (train) and RFW (test). The model's target is gender, and the correlated attribute is race. The heatmap color scale runs from blue (lowest activation) to red (highest activation).
  • Figure 4: FaceX heatmap outputs for the single-attribute bias experiment on CelebA (train) and CelebAMask-HQ (test), before (i.e., the vanilla model) and after applying a bias mitigation approach (i.e., FLAC [sarridis2023flac]). The model's target is gender, and the correlated attribute is Blond_Hair. The heatmap color scale runs from blue (lowest activation) to red (highest activation).
  • Figure 5: FaceX heatmap outputs for experiments on the default CelebA and FairFace training sets (i.e., without intentionally introduced bias), with Gender as the target attribute and CelebAMask-HQ as the test benchmark. The heatmap color scale runs from blue (lowest activation) to red (highest activation).
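The captions describe region-wise heatmaps colored from blue (lowest activation) to red (highest activation). A minimal sketch of how such a heatmap could be rendered is shown below, assuming per-region scores (such as per-region average attributions) and a face-parsing template image whose integer labels index those scores. The plain linear blue-to-red interpolation is a simplification (the figures may use a standard multi-hue colormap), and `region_heatmap` is a hypothetical name.

```python
import numpy as np


def region_heatmap(scores, template):
    """Paint each region of a face-parsing `template` ((H, W) int labels) with a
    colour between blue (lowest score) and red (highest score).

    Returns an (H, W, 3) uint8 RGB image; regions with a NaN score, and labels
    with no entry in `scores`, stay black. Illustrative sketch only.
    """
    scores = np.asarray(scores, dtype=float)
    lo, hi = np.nanmin(scores), np.nanmax(scores)
    span = hi - lo if hi > lo else 1.0  # avoid division by zero for flat scores
    out = np.zeros(template.shape + (3,), dtype=np.uint8)
    for r, s in enumerate(scores):
        if np.isnan(s):
            continue
        t = float((s - lo) / span)  # 0 -> lowest activation, 1 -> highest
        colour = (round(255 * t), 0, round(255 * (1 - t)))  # linear blue -> red
        out[template == r] = colour
    return out
```

Min-max normalising the scores makes the colour scale relative to each model, which suits comparing where a single model focuses; comparing two models (e.g., before and after mitigation) on one shared scale would instead call for fixed `lo` and `hi` values.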