Systematic Evaluation of Time-Frequency Features for Binaural Sound Source Localization

Davoud Shariat Panah, Alessandro Ragano, Dan Barry, Jan Skoglund, Andrew Hines

TL;DR

This work provides a systematic evaluation of time-frequency feature design for binaural sound source localization using a CNN. By comparing amplitude-based (magnitude spectrogram, ILD) and phase-based (phase spectrogram, IPD) features across in-domain and out-of-domain data with mismatched HRTFs, the study demonstrates that carefully selected feature sets can outperform simple increases in model complexity. In-domain localization is effectively served by compact combinations like ILD+IPD, while generalization to diverse content benefits from richer inputs that also include phase cues (Phase L/R) alongside ILD and IPD. The findings offer practical guidance for feature design in binaural SSL and establish benchmarks for domain-specific and general-purpose localization, with the code and datasets slated for public release.

Abstract

This study presents a systematic evaluation of time-frequency feature design for binaural sound source localization (SSL), focusing on how feature selection influences model performance across diverse conditions. We investigate the performance of a convolutional neural network (CNN) model using various combinations of amplitude-based features (magnitude spectrogram and interaural level difference, ILD) and phase-based features (phase spectrogram and interaural phase difference, IPD). Evaluations on in-domain and out-of-domain data with mismatched head-related transfer functions (HRTFs) reveal that carefully chosen feature combinations often outperform increases in model complexity. While two-feature sets such as ILD + IPD are sufficient for in-domain SSL, generalization to diverse content requires richer inputs combining channel spectrograms with both ILD and IPD. Using the optimal feature sets, our low-complexity CNN model achieves competitive performance. Our findings underscore the importance of feature design in binaural SSL and provide practical guidance for both domain-specific and general-purpose localization.
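To make the four feature families named in the abstract concrete, the following is a minimal NumPy sketch of how per-channel magnitude and phase spectrograms, ILD, and IPD are commonly derived from a binaural STFT. It is an illustration under stated assumptions, not the authors' implementation: the FFT size, hop length, Hann window, the `eps` regularizer, and the function names (`stft`, `binaural_features`, `stack_features`) are all hypothetical choices, and the paper's own definitions may differ in scaling or sign convention.

```python
import numpy as np


def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT; returns a complex array of shape (freq_bins, frames).

    n_fft and hop are illustrative defaults, not values taken from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop: i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T


def binaural_features(left, right, eps=1e-8):
    """Compute common time-frequency features from left/right ear signals."""
    L, R = stft(left), stft(right)
    mag_l, mag_r = np.abs(L), np.abs(R)
    return {
        # Amplitude-based features
        "mag_l": mag_l,
        "mag_r": mag_r,
        "ild": 20.0 * np.log10((mag_l + eps) / (mag_r + eps)),  # in dB
        # Phase-based features
        "phase_l": np.angle(L),
        "phase_r": np.angle(R),
        "ipd": np.angle(L * np.conj(R)),  # wrapped to [-pi, pi]
    }


def stack_features(feats, names):
    """Stack selected features as channels: (n_features, freq_bins, frames)."""
    return np.stack([feats[name] for name in names], axis=0)


# Usage: a toy binaural pair where the right ear is a delayed, quieter copy.
rng = np.random.default_rng(0)
left = rng.standard_normal(16000)
right = 0.5 * np.roll(left, 8)
feats = binaural_features(left, right)
compact = stack_features(feats, ["ild", "ipd"])  # two-feature input
rich = stack_features(
    feats, ["mag_l", "mag_r", "ild", "ipd"]
)  # channel spectrograms plus ILD and IPD
```

Stacking features along a leading channel axis is one straightforward way to feed different feature combinations to the same CNN, since only the number of input channels changes between configurations.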

Paper Structure

This paper contains 11 sections, 8 equations, 1 figure, and 2 tables.

Figures (1)

  • Figure 1: MAE $\downarrow$ of the features across sources on the SynBAD–Var test set for the CNN model (elevation = 0°).