The Extended SONICOM HRTF Dataset and Spatial Audio Metrics Toolbox

Katarina C. Poole, Julie Meyer, Vincent Martin, Rapolas Daugintis, Nils Marggraf-Turley, Jack Webb, Ludovic Pirard, Nicola La Magna, Oliver Turvey, Lorenzo Picinali

TL;DR

This paper extends the publicly available SONICOM HRTF dataset to 300 measured subjects and adds synthetic HRTFs for 200 of them, derived from processed 3D scans, addressing limitations in dataset size and measurement complexity. The synthetic HRTFs are generated with Mesh2HRTF, and two scan variants (pre-processed and plugged ear canal) are provided to support efficient HRTF generation across a dense measurement grid. The authors also release the Spatial Audio Metrics (SAM) Toolbox, a Python package for standardized analysis of HRTF quality via metrics such as spectral distortion and ITD/ILD, supporting reproducible evaluation and visualization. Together, these contributions provide a scalable platform for ML-driven personalized spatial audio research and prepare the ground for perceptual validation and broader adoption of synthetic HRTFs.

Abstract

Headphone-based spatial audio uses head-related transfer functions (HRTFs) to simulate real-world acoustic environments. HRTFs are unique to each listener, because individual morphology shapes how sound waves interact with the body before reaching the eardrums. Here we present the extended SONICOM HRTF dataset, which expands on the previous version released in 2023. The total number of measured subjects has now been increased to 300, with demographic information for a subset of the participants, providing context for the dataset's population and relevance. The dataset incorporates synthesised HRTFs for 200 of the 300 subjects, generated using Mesh2HRTF, alongside pre-processed 3D scans of the head and ears, optimised for HRTF synthesis. This dataset facilitates rapid and iterative optimisation of HRTF synthesis algorithms, allowing the automatic generation of large datasets. The optimised scans make morphological modifications straightforward, providing insights into how anatomical changes impact HRTFs, and the larger sample size enhances the effectiveness of machine learning approaches. To support analysis, we also introduce the Spatial Audio Metrics (SAM) Toolbox, a Python package designed for efficient analysis and visualisation of HRTF data, offering customisable tools for advanced research. Together, the extended dataset and toolbox offer a comprehensive resource for advancing personalised spatial audio research and development.
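
To make the metrics named in the TL;DR concrete, the following is a minimal NumPy sketch of log-spectral distortion, broadband ILD, and a cross-correlation ITD estimate. It is not the SAM Toolbox API: the function names, argument layout, and the choice of a cross-correlation ITD estimator are assumptions made purely for illustration, and the toolbox may define these quantities differently (for example with frequency-band limits or threshold-based ITD estimation).

```python
import numpy as np


def log_spectral_distortion(h_ref, h_test, eps=1e-12):
    """RMS difference in dB between two transfer functions.

    h_ref, h_test: arrays of (complex or magnitude) frequency-bin values.
    Hypothetical helper; not taken from the SAM Toolbox.
    """
    mag_ref = np.abs(np.asarray(h_ref)) + eps
    mag_test = np.abs(np.asarray(h_test)) + eps
    diff_db = 20.0 * np.log10(mag_ref / mag_test)
    return float(np.sqrt(np.mean(diff_db ** 2)))


def broadband_ild_db(hrir_left, hrir_right, eps=1e-12):
    """Interaural level difference in dB (left relative to right),
    computed from the total energy of each impulse response."""
    e_left = np.sum(np.square(hrir_left)) + eps
    e_right = np.sum(np.square(hrir_right)) + eps
    return float(10.0 * np.log10(e_left / e_right))


def itd_samples(hrir_left, hrir_right):
    """ITD estimate in samples from the cross-correlation peak.

    A positive value means the left-ear response arrives later
    than the right-ear response.
    """
    xcorr = np.correlate(hrir_left, hrir_right, mode="full")
    # Index i of the 'full' output corresponds to lag i - (len(right) - 1).
    return int(np.argmax(np.abs(xcorr)) - (len(hrir_right) - 1))


# Toy example: the right-ear impulse arrives at sample 5,
# the left-ear impulse 3 samples later at twice the amplitude.
right = np.zeros(32)
right[5] = 1.0
left = np.zeros(32)
left[8] = 2.0

lsd = log_spectral_distortion(np.fft.rfft(right), np.fft.rfft(left))
ild = broadband_ild_db(left, right)
itd = itd_samples(left, right)
print(lsd, ild, itd)
```

In this toy case the two responses differ only by a delay and a gain of 2, so the spectral distortion and the ILD both come out at roughly 6 dB and the ITD estimate is 3 samples. Dividing the sample lag by the sampling rate converts it to seconds.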

Paper Structure

This paper contains 7 sections, 1 figure.

Figures (1)

  • Figure 1: Processing of 3D scans for HRTF simulation. (A) Raw 3D scan before processing. (B) Processed scan, aligned to the Frankfurt plane with hair and extraneous features removed. (C) Pre-processed ear canal. (D) Plugged ear canal variant.