Binaspect -- A Python Library for Binaural Audio Analysis, Visualization & Feature Generation

Dan Barry, Davoud Shariat Panah, Alessandro Ragano, Jan Skoglund, Andrew Hines

TL;DR

Binaspect addresses the need for interpretable binaural analysis tools by proposing four interconnected representations: the bounded ILR spectrogram, ITD spectrogram, bounded ILR histogram, and ITD histogram. The approach enables blind, head-model-free observation of binaural cues, with degradations from rendering, compression, and down-mixing manifesting as broadened or shifted clusters in the histograms. The library provides both human-friendly histogram visualizations for inspection and exportable features suitable for machine learning workflows, supporting tasks like quality assessment and spatial localization. By making these representations open-source and reproducible, Binaspect offers a practical framework to diagnose binaural cue degradations and to guide the design of binaural rendering and processing pipelines. The work emphasizes interpretability, complements existing auditory modeling resources, and highlights future directions for expanded feature sets and multi-source handling.

Abstract

We present Binaspect, an open-source Python library for binaural audio analysis, visualization, and feature generation. Binaspect generates interpretable "azimuth maps" by calculating modified interaural time and level difference spectrograms, and clustering those time-frequency (TF) bins into stable time-azimuth histogram representations. This allows multiple active sources to appear as distinct azimuthal clusters, while degradations manifest as broadened, diffused, or shifted distributions. Crucially, Binaspect operates blindly on audio, requiring no prior knowledge of head models. These visualizations enable researchers and engineers to observe how binaural cues are degraded by codec and renderer design choices, among other downstream processes. We demonstrate the tool on bitrate ladders, ambisonic rendering, and VBAP source positioning, where degradations are clearly revealed. In addition to their diagnostic value, the proposed representations can be exported as structured features suitable for training machine learning models in quality prediction, spatial audio classification, and other binaural tasks. Binaspect is released under an open-source license with full reproducibility scripts at https://github.com/QxLabIreland/Binaspect.
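The pipeline described in the abstract (per-bin interaural cues, then per-frame histograms) can be sketched roughly as follows. This is an illustration only and does not use Binaspect's API: every function name and parameter here is hypothetical, only NumPy is used, and the library's actual "modified" spectrograms and clustering step are not reproduced. In particular, the level cue below is an assumed stand-in (a normalised magnitude difference bounded to [-1, 1]), the time cue is derived from the interaural phase difference at low frequencies only, and no mapping to azimuth is attempted.

```python
import numpy as np

def stft(x, n_fft, hop):
    """Hann-windowed STFT; returns an array of shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def frame_histograms(values, weights, lo, hi, n_bins):
    """One magnitude-weighted histogram per STFT frame (rows are frames)."""
    return np.stack([
        np.histogram(v, bins=n_bins, range=(lo, hi), weights=w)[0]
        for v, w in zip(values, weights)
    ])

def bin_centres(lo, hi, n_bins):
    edges = np.linspace(lo, hi, n_bins + 1)
    return 0.5 * (edges[:-1] + edges[1:])

def binaural_cue_histograms(left, right, sr, n_fft=2048, hop=1024,
                            n_bins=41, max_itd=1e-3, itd_fmax=1000.0):
    """Hypothetical helper: per-frame level-cue and time-cue histograms."""
    spec_l, spec_r = stft(left, n_fft, hop), stft(right, n_fft, hop)
    mag_l, mag_r = np.abs(spec_l), np.abs(spec_r)
    weight = mag_l + mag_r

    # ASSUMPTION: bounded level cue in [-1, 1]; negative means left-dominant.
    level = (mag_r - mag_l) / (weight + 1e-12)

    # Phase-derived time difference in seconds, restricted to low
    # frequencies where the phase is unambiguous; positive means the
    # right channel lags the left.
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    low = (freqs > 0) & (freqs <= itd_fmax)
    ipd = np.angle(spec_l[:, low] * np.conj(spec_r[:, low]))
    itd = ipd / (2 * np.pi * freqs[low])

    level_hist = frame_histograms(level, weight, -1.0, 1.0, n_bins)
    itd_hist = frame_histograms(itd, weight[:, low], -max_itd, max_itd, n_bins)
    return level_hist, itd_hist

# Synthetic check: one noise source, right channel delayed and attenuated.
rng = np.random.default_rng(0)
sr = 48_000
source = rng.standard_normal(sr)
delay = 10  # samples
left = source
right = 0.5 * np.concatenate([np.zeros(delay), source[:-delay]])

level_hist, itd_hist = binaural_cue_histograms(left, right, sr)
level_peak = bin_centres(-1.0, 1.0, 41)[level_hist.sum(axis=0).argmax()]
itd_peak = bin_centres(-1e-3, 1e-3, 41)[itd_hist.sum(axis=0).argmax()]
print(f"level-cue peak: {level_peak:.3f}, time-cue peak: {itd_peak * 1e6:.0f} us")
```

With a single clean source, both histograms should concentrate around one value per frame. Under this sketch's assumptions, a degraded signal would show up as the broadened or shifted distributions the abstract describes, and the two returned arrays are the kind of frame-by-bin matrix that could be exported as features.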

Paper Structure

This paper contains 12 sections, 4 equations, 4 figures.

Figures (4)

  • Figure 1: Block diagram illustrating the processing blocks to produce the 4 primary features.
  • Figure 2: ITD and ILR histograms showing differences between a binaural render of Higher Order Ambisonics (HOA) and First Order Ambisonics (FOA) versions of an audio source being spatially panned from azimuth 0° to 270° with fixed elevation 30°. The respective ITD and ILR differences are also shown.
  • Figure 3: ITD and ILR histograms showing differences between various Opus codec bitrates. The audio source is being spatially panned from azimuth 0° to 270° with fixed elevation 30°.
  • Figure 4: ITD and ILR histograms showing differences between direct binaural renders of a 7.1 mix and a 5.1 mix of the same material. There is one source at 90° and another at 0°. The 90° source has a discrete channel in 7.1 but is rendered as a virtual source in 5.1.