A Data-Driven Exploration of Elevation Cues in HRTFs: An Explainable AI Perspective Across Multiple Datasets
Juan Antonio De Rus, Mario Montagud, Jesus Lopez-Ballester, Francesc J. Ferri, Maximo Cobos
TL;DR
This study tackles elevation localization in binaural HRTFs by pairing a simple, interpretable 1D-CNN with class activation mapping (CAM), an explainable-AI technique, applied across 11 public HRTF datasets covering over 600 subjects. It systematically compares multiple HRTF pre-processing strategies and evaluates both in-domain and cross-domain generalization, identifying the spectral bands that consistently drive the classification of elevation into seven sectors. The authors report that high-frequency bands above about 5 kHz reliably aid elevation discrimination, while the 1–5 kHz bands contribute for frontal and lateral directions, and that ERB-based filtering and broad data diversity improve cross-dataset robustness. The work offers practical guidance for cross-dataset HRTF modeling and interpretable spectral-cue analysis, and suggests future listening tests and architecture explorations to link the saliency findings more closely to human perception.
Abstract
Precise elevation perception in binaural audio remains a challenge, despite extensive research on head-related transfer functions (HRTFs) and spectral cues. While prior studies have advanced our understanding of sound localization cues, the interplay between spectral features and elevation perception is still not fully understood. This paper presents a comprehensive analysis of over 600 subjects from 11 diverse public HRTF datasets, employing a convolutional neural network (CNN) model combined with explainable artificial intelligence (XAI) techniques to investigate elevation cues. In addition to testing various HRTF pre-processing methods, we focus on both within-dataset and inter-dataset generalization and explainability, assessing the model's robustness across different HRTF variations stemming from subjects and measurement setups. By leveraging class activation mapping (CAM) saliency maps, we identify key frequency bands that may contribute to elevation perception, providing deeper insights into the spectral features that drive elevation-specific classification. This study offers new perspectives on HRTF modeling and elevation perception by analyzing diverse datasets and pre-processing techniques, expanding our understanding of these cues across a wide range of conditions.
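To make the CAM step concrete, the following is a minimal sketch of how a class activation map can be computed in the standard CAM formulation, where the last convolutional layer is followed by global average pooling and a single dense layer. It is not the authors' implementation: the function names, the toy feature maps, and the assumption that the feature-map axis corresponds to frequency bins are all illustrative.

```python
# Minimal CAM sketch in pure Python (illustrative, not the paper's code).
# Assumption: the network ends with global average pooling (GAP) over the
# frequency axis followed by one dense layer, as in the standard CAM setup.

def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps for one class.

    feature_maps:  K lists, each holding F activations (one per frequency bin).
    class_weights: K dense-layer weights connecting the pooled maps to the class.
    Returns a list of F saliency values, one per frequency bin.
    """
    n_bins = len(feature_maps[0])
    return [
        sum(w * fmap[f] for w, fmap in zip(class_weights, feature_maps))
        for f in range(n_bins)
    ]


def class_score(feature_maps, class_weights, bias=0.0):
    """Class logit obtained via GAP followed by the dense layer."""
    pooled = [sum(fmap) / len(fmap) for fmap in feature_maps]
    return sum(w * p for w, p in zip(class_weights, pooled)) + bias


def normalize(cam):
    """Rescale a map to [0, 1] for plotting; a constant map becomes all zeros."""
    lo, hi = min(cam), max(cam)
    if hi == lo:
        return [0.0] * len(cam)
    return [(v - lo) / (hi - lo) for v in cam]


# Toy example: K = 2 feature maps over F = 4 hypothetical frequency bins.
toy_maps = [[0.0, 1.0, 2.0, 0.0],
            [1.0, 0.0, 0.0, 3.0]]
toy_weights = [0.5, -1.0]  # hypothetical weights for one elevation class

cam = class_activation_map(toy_maps, toy_weights)
saliency = normalize(cam)
```

Under these assumptions, the mean of the map over frequency equals the class logit (up to the bias), which is why large positive values in the map can be read as frequency regions that push the model toward that elevation class.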
