Towards HRTF Personalization using Denoising Diffusion Models

Juan Camilo Albarracín Sánchez; Luca Comanducci; Mirco Pezzoli; Fabio Antonacci

TL;DR

The paper tackles HRTF personalization by conditioning a diffusion-based generative process on anthropometric measurements and the direction of arrival (DOA) to synthesize time-domain HRIRs. It shows that a conditional diffusion framework can produce subject-specific HRIRs whose log-spectral distortion (LSD) approaches state-of-the-art results. Perceptually motivated analyses suggest reasonable accuracy of the interaural time difference (ITD), while deviations remain at high frequencies. The method is validated on the HUTUBS dataset using leave-one-out cross-validation (LOOCV), which highlights both the feasibility of diffusion-based HRIR generation and the room for improvement in high-frequency reconstruction and in the choice of feature representations. The approach offers a scalable path toward accessible, personalized spatial audio without full acoustic measurements.
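The TL;DR reports LSD under LOOCV without defining either here. As a minimal sketch, assuming the common LSD definition (root-mean-square over frequency bins of the dB ratio between true and predicted magnitude spectra, then averaged over responses such as DOAs) and a split in which each subject is held out once, the two pieces could look as follows. The paper's exact averaging, FFT size, and frequency range may differ; `log_spectral_distortion`, `loocv_splits`, and `n_fft=256` are hypothetical names and values.

```python
import numpy as np


def log_spectral_distortion(h_true, h_pred, n_fft=256, eps=1e-12):
    """LSD in dB between true and predicted HRIRs (time on the last axis).

    Assumed definition: RMS over frequency bins of 20*log10(|H| / |H_hat|),
    then averaged over all leading axes (e.g. DOAs).
    """
    H = np.abs(np.fft.rfft(h_true, n=n_fft, axis=-1)) + eps
    H_hat = np.abs(np.fft.rfft(h_pred, n=n_fft, axis=-1)) + eps
    ratio_db = 20.0 * np.log10(H / H_hat)
    per_response = np.sqrt(np.mean(ratio_db ** 2, axis=-1))
    return float(np.mean(per_response))


def loocv_splits(subject_ids):
    """Yield (train_ids, test_id): every subject is held out exactly once."""
    ids = list(subject_ids)
    for i, test_id in enumerate(ids):
        yield ids[:i] + ids[i + 1:], test_id
```

Identical responses give an LSD of 0 dB, and a uniform gain of 2 on the prediction gives about 6.02 dB, since every bin then differs by $20\log_{10}2$.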

Abstract

Head-Related Transfer Functions (HRTFs) are fundamental for realistic rendering in immersive audio scenarios. However, they are strongly subject-dependent, since they vary considerably with the shape of the ears, head, and torso. Personalization procedures are therefore required for accurate binaural rendering. Recently, Denoising Diffusion Probabilistic Models (DDPMs), a class of generative learning techniques, have been applied to a variety of signal processing problems. In this paper, we propose a first approach that uses a DDPM conditioned on anthropometric measurements to generate personalized Head-Related Impulse Responses (HRIRs), the time-domain representation of HRTFs. The results show the feasibility of DDPMs for HRTF personalization, with performance in line with state-of-the-art models.
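The abstract does not spell out the diffusion equations, so the following is only a minimal NumPy sketch of the standard DDPM formulation (Ho et al., 2020): forward noising, the epsilon-prediction training loss, and ancestral sampling, driven by a condition vector that concatenates anthropometric features with a DOA. The step count, linear noise schedule, sin/cos angle encoding, and the placeholder `zero_eps_model` (standing in for the authors' encoder/decoder network) are all assumptions, not the paper's method.

```python
import numpy as np

# Hypothetical diffusion hyperparameters (not taken from the paper).
T = 50
betas = np.linspace(1e-4, 0.05, T)  # linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative products \bar{alpha}_t


def make_condition(anthropometry, doa_deg):
    """Concatenate anthropometric features with an (azimuth, elevation) DOA in degrees.

    Encoding each angle as (sin, cos) is an assumption made for this sketch.
    """
    az, el = np.deg2rad(np.asarray(doa_deg, dtype=float))
    doa_feat = np.array([np.sin(az), np.cos(az), np.sin(el), np.cos(el)])
    return np.concatenate([np.asarray(anthropometry, dtype=float), doa_feat])


def q_sample(x0, t, noise):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise


def training_loss(eps_model, x0, cond, rng):
    """Epsilon-prediction MSE for one HRIR x0 at a uniformly drawn step t."""
    t = int(rng.integers(0, T))
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, noise)
    return float(np.mean((noise - eps_model(x_t, t, cond)) ** 2))


def sample(eps_model, cond, length, rng):
    """Ancestral sampling: start from Gaussian noise and denoise for T steps."""
    x = rng.standard_normal(length)
    for t in reversed(range(T)):
        eps_hat = eps_model(x, t, cond)
        # Model mean of x_{t-1}, computed from x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # Add noise with variance beta_t at every step except the last.
        x = mean + np.sqrt(betas[t]) * rng.standard_normal(length) if t > 0 else mean
    return x


def zero_eps_model(x_t, t, cond):
    """Hypothetical stand-in for a trained conditional denoiser."""
    return np.zeros_like(x_t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cond = make_condition([0.1, 0.2, 0.3], (0, 0))  # three hypothetical anthropometric values
    h_hat = sample(zero_eps_model, cond, 256, rng)
    print(h_hat.shape)  # (256,)
```

With a trained denoiser in place of `zero_eps_model`, the same loop would turn noise into an HRIR for the given subject and DOA; the stand-in only exercises the shapes. The sampling variance $\sigma_t^2 = \beta_t$ is one of the two choices discussed by Ho et al.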

Paper Structure (14 sections, 9 equations, 3 figures, 1 table)

Introduction
Problem formulation
Signal Model
Problem Definition
Proposed Method
Conditioning data
HRIR reconstruction via Diffusion model
Architecture
Validation
Setup
Evaluation metrics
HRIR personalization results
Further discussion on the personalization results
Conclusions

Figures (3)

Figure 1: Outline of the (a) training and (b) inference stages of the proposed method for HRIR personalization. Note how the conditioning information is embedded at each encoding/decoding block.
Figure 2: (a) Predicted HRIR $\hat{h}$ and ground-truth HRIR $h$ of subject 16 for DOA $\bm{r}=(0,0)$; (b) their respective HRTFs; (c) the corresponding ITD in the horizontal plane.
Figure 3: Mean PBC computed across the ERB.
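Figure 2 compares predicted and ground-truth responses as HRIRs, HRTFs, and ITDs. A minimal sketch of how the last two views can be derived from HRIRs, assuming the HRTF magnitude comes from an FFT and the ITD is taken at the peak of the interaural cross-correlation; the paper may use a different ITD estimator (for example onset detection or low-pass filtering before correlation). The function names and `n_fft=512` are hypothetical.

```python
import numpy as np


def hrtf_magnitude_db(hrir, fs, n_fft=512):
    """Frequency axis (Hz) and HRTF magnitude (dB) of a single-ear HRIR."""
    spectrum = np.fft.rfft(hrir, n=n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, 20.0 * np.log10(np.abs(spectrum) + 1e-12)


def itd_cross_correlation(h_left, h_right, fs):
    """ITD in seconds from the peak of the interaural cross-correlation.

    Positive values mean the right-ear response lags the left-ear one.
    """
    xcorr = np.correlate(h_right, h_left, mode="full")
    # Index 0 of the "full" output corresponds to lag -(len(h_left) - 1).
    lag = int(np.argmax(np.abs(xcorr))) - (len(h_left) - 1)
    return lag / fs
```

Evaluating `itd_cross_correlation` over the horizontal-plane DOAs for both $h$ and $\hat{h}$ would give curves comparable to Figure 2(c); the estimate is quantized to one sample, so upsampling or interpolating the correlation peak is a common refinement.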
