Beyond Spatial Explanations: Explainable Face Recognition in the Frequency Domain

Marco Huber, Naser Damer

TL;DR

This paper addresses the opacity of deep face verification decisions by introducing explainability in the frequency domain, moving beyond traditional spatial saliency maps. It presents a training-free masking framework in which face images are transformed via the discrete Fourier transform, frequency bands are masked, and the inverse transform reconstructs spatial images to measure impact on cosine similarity of embeddings; the influence of each band is quantified as $h_{B} = |cs_{ref} - cs_{B}|$. Experiments on two state-of-the-art FR models (ElasticFace-Arc and CurricularFace) on the LFW benchmark show that low-frequency bands are particularly influential, especially for low-resolution inputs, and yield interpretable absolute and directed frequency heat plots (FHPs); supplementary material covers cross-resolution scenarios and morphing attacks. Overall, the work provides the first frequency-domain explainability framework for verification decisions, validated through insertion/deletion analyses and visualizations, with potential to enhance transparency and robustness in FR systems by incorporating human-imperceptible cues.
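The masking step and the influence score $h_{B} = |cs_{ref} - cs_{B}|$ can be illustrated with a minimal NumPy sketch. Several details here are assumptions rather than the authors' implementation: the bands are taken to be concentric rings around the centered DC component, both images of the pair are masked with the same band, inputs are single-channel arrays, and `embed` is a hypothetical stand-in for whatever FR model produces the face embeddings.

```python
import numpy as np

def band_mask(shape, r_lo, r_hi):
    """Boolean mask that is False inside the ring r_lo <= r < r_hi
    of a centered (fftshift-ed) spectrum. Ring-shaped bands are an assumption."""
    h, w = shape
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot(yy - h // 2, xx - w // 2)
    return ~((r >= r_lo) & (r < r_hi))

def mask_band(img, r_lo, r_hi):
    """Zero one frequency band of a 2-D grayscale image and
    return the result in the spatial domain."""
    spec = np.fft.fftshift(np.fft.fft2(img))
    spec = spec * band_mask(img.shape, r_lo, r_hi)
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec)))

def cosine(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def band_influence(img_a, img_b, embed, r_lo, r_hi):
    """h_B = |cs_ref - cs_B| for one band; `embed` is a hypothetical
    callable mapping an image to an embedding vector."""
    cs_ref = cosine(embed(img_a), embed(img_b))
    cs_b = cosine(embed(mask_band(img_a, r_lo, r_hi)),
                  embed(mask_band(img_b, r_lo, r_hi)))
    return abs(cs_ref - cs_b)
```

Calling `band_influence` once per band (for example, rings of width $s$) would give one score per band, which is the quantity the heat plots visualize.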

Abstract

The need for more transparent face recognition (FR), along with other visual-based decision-making systems, has recently attracted more attention in research, society, and industry. The reasons why two face images are matched or not matched by a deep learning-based face recognition system are not obvious due to the high number of parameters and the complexity of the models. However, it is important for users, operators, and developers to ensure trust and accountability of the system and to analyze drawbacks such as biased behavior. While many previous works use spatial semantic maps to highlight the regions that have a significant influence on the decision of the face recognition system, frequency components, which are also considered by CNNs, are neglected. In this work, we take a step forward and investigate explainable face recognition in the unexplored frequency domain. This makes this work the first to propose explainability of verification-based decisions in the frequency domain, thus explaining the relative influence of the frequency components of each input toward the obtained outcome. To achieve this, we manipulate face images in the spatial frequency domain and investigate the impact on verification outcomes. In extensive quantitative experiments, along with investigating two special scenarios, cross-resolution FR and morphing attacks (the latter in supplementary material), we observe the applicability of our proposed frequency-based explanations.

Paper Structure (15 sections, 8 equations, 7 figures)

Figures (7)

  • Figure 1: Between the "semantic" components and the frequency components of the data, a correlation exists [wang2020high]. While the human face matcher relies on the human interpretation (semantic component), the FR model also utilizes frequency components [mi2022duetface, mi2023privacy]. Current explainable FR approaches only focus on highlighting the spatial semantic clues.
  • Figure 2: Overview of the proposed frequency-based explainability approach. In the first step, the images of an image pair are transformed into the frequency domain. Certain frequencies are masked and the images are re-transformed into the spatial domain, in a lossless process. In the next step, the unaltered images and the set of frequency-masked images are processed by an FR model to create face embeddings and to calculate cosine similarity scores. In the last step, the difference between the cosine similarity scores of the masked image pairs and the unaltered image pair is used to assign an influence score to the different frequencies (bands). The normalized influences are either presented as the absolute (a) or directed (b) frequency heat plots (FHPs).
  • Figure 3: Deletion and insertion curves using ElasticFace-Arc [elasticface] on LFW [lfw]. The solid lines are a result of our proposed explanations. Each dotted line indicates the performance of the baseline with the same frequency band size $s$ as the solid line of the same color. Both faster ascending deletion curves and faster descending insertion curves point to the effectiveness of the proposed explanations.
  • Figure 4: Deletion and insertion curves using CurricularFace [curricularface] on LFW [lfw]. The solid lines are a result of our proposed explanations. Each dotted line indicates the performance of the baseline with the same frequency band size $s$ as the solid line of the same color. Both faster ascending deletion curves and faster descending insertion curves point to the effectiveness of the proposed explanations.
  • Figure 5: Matching (genuine) pairs and their low-resolution versions with FHPs. The FHPs provide image-pair-specific, frequency-based explanations of the influence of certain frequency bands. The low-frequency bands are the most influential and become even more influential on low-resolution images that lack details. Under each image pair we show two absolute FHPs with different $s$, followed by a third, directed FHP.
  • ...and 2 more figures
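
The caption of Figure 2 distinguishes absolute from directed FHPs computed from normalized influences. A minimal sketch of that distinction follows. The normalization (dividing by the largest absolute score change) and the sign convention (positive means masking the band raised the similarity) are assumptions for illustration only; the captions above do not specify either.

```python
import numpy as np

def fhp_values(cs_ref, cs_bands):
    """Return (absolute, directed) normalized influences per band.

    cs_ref   : cosine similarity of the unaltered image pair
    cs_bands : cosine similarities after masking each band
    Normalization and sign convention are assumptions, not the paper's definition.
    """
    cs_bands = np.asarray(cs_bands, dtype=float)
    directed = cs_bands - cs_ref           # signed change caused by masking
    scale = np.max(np.abs(directed))
    if scale == 0:                         # no band changed the score
        zeros = np.zeros_like(directed)
        return zeros, zeros.copy()
    return np.abs(directed) / scale, directed / scale

# Hypothetical scores for three bands of one image pair
absolute, directed = fhp_values(0.6, [0.2, 0.6, 0.7])
```

The absolute values only show how strongly each band matters, while the directed values also show whether masking the band pushed the pair toward a match or a non-match.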