Beyond Spatial Explanations: Explainable Face Recognition in the Frequency Domain
Marco Huber, Naser Damer
TL;DR
This paper addresses the opacity of deep face verification decisions by introducing explainability in the frequency domain, moving beyond traditional spatial saliency maps. It presents a training-free masking framework: face images are transformed via the discrete Fourier transform, individual frequency bands are masked, and the inverse transform reconstructs spatial images whose embeddings are compared by cosine similarity. The influence of each band $B$ is quantified as $h_{B} = |cs_{ref} - cs_{B}|$, where $cs_{ref}$ is the cosine similarity of the unmasked pair and $cs_{B}$ is the cosine similarity with band $B$ masked. Experiments with two state-of-the-art face recognition (FR) models (ElasticFace-Arc and CurricularFace) on the LFW benchmark show that low-frequency bands are particularly influential, especially for low-resolution inputs, and yield interpretable absolute and directed frequency heat plots (FHPs). Cross-resolution FR and morphing attacks are examined as special cases, the latter in the supplementary material. Overall, the work provides the first frequency-domain explainability framework for verification decisions, validated through insertion/deletion analyses and visualizations, with potential to enhance transparency and robustness in FR systems by incorporating human-imperceptible cues.
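The masking procedure summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: images are 2-D grayscale arrays, a band is a radial ring around the zero-frequency component, only the probe image is masked, and `embed` is a hypothetical callable that maps an image to an embedding vector (in practice, an FR model). The names `mask_band`, `band_influence`, and `toy_embed` are illustrative only.

```python
import numpy as np


def mask_band(img, r_lo, r_hi):
    """Zero the radial frequency band [r_lo, r_hi) of a 2-D grayscale image.

    Assumption: bands are rings measured in pixels from the zero-frequency
    (DC) component of the centered spectrum.
    """
    spec = np.fft.fftshift(np.fft.fft2(img))        # DC moved to the center
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h // 2, xx - w // 2)     # distance from DC
    spec[(radius >= r_lo) & (radius < r_hi)] = 0    # remove the band
    # The ring mask is symmetric about DC, so the result is real up to
    # floating-point error; .real drops the negligible imaginary part.
    return np.fft.ifft2(np.fft.ifftshift(spec)).real


def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def band_influence(embed, ref_img, probe_img, bands):
    """Return h_B = |cs_ref - cs_B| for every (r_lo, r_hi) pair in bands.

    Assumption: only the probe image is masked; the reference stays intact.
    """
    ref_emb = embed(ref_img)
    cs_ref = cosine_similarity(ref_emb, embed(probe_img))
    influences = []
    for r_lo, r_hi in bands:
        masked = mask_band(probe_img, r_lo, r_hi)
        cs_b = cosine_similarity(ref_emb, embed(masked))
        influences.append(abs(cs_ref - cs_b))
    return influences


def toy_embed(img):
    """Hypothetical stand-in for an FR model: flattened pixel values."""
    return img.ravel()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.random((32, 32))
    probe = ref + 0.1 * rng.random((32, 32))
    bands = [(0, 4), (4, 8), (8, 16), (16, 24)]
    for band, h_b in zip(bands, band_influence(toy_embed, ref, probe, bands)):
        print(band, round(h_b, 4))
```

Replacing `toy_embed` with a real embedding network and plotting the per-band values would give something in the spirit of an absolute frequency heat plot. Keeping the sign of $cs_{ref} - cs_{B}$ instead of its absolute value would correspond to a directed variant, under the same assumptions.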
Abstract
The need for more transparent face recognition (FR), along with other visual-based decision-making systems, has recently attracted more attention in research, society, and industry. The reasons why two face images are matched or not matched by a deep learning-based face recognition system are not obvious due to the high number of parameters and the complexity of the models. However, it is important for users, operators, and developers to ensure trust and accountability of the system and to analyze drawbacks such as biased behavior. While many previous works use spatial semantic maps to highlight the regions that have a significant influence on the decision of the face recognition system, frequency components, which are also considered by CNNs, are neglected. In this work, we take a step forward and investigate explainable face recognition in the unexplored frequency domain. This makes this work the first to propose explainability of verification-based decisions in the frequency domain, thus explaining the relative influence of the frequency components of each input toward the obtained outcome. To achieve this, we manipulate face images in the spatial frequency domain and investigate the impact on verification outcomes. In extensive quantitative experiments, along with investigating two special scenarios, cross-resolution FR and morphing attacks (the latter in supplementary material), we observe the applicability of our proposed frequency-based explanations.
