
Time-of-arrival Estimation and Phase Unwrapping of Head-related Transfer Functions With Integer Linear Programming

Chin-Yun Yu, Johan Pauwels, György Fazekas

TL;DR

Euclidean-based time alignment of HRIRs tends to over-smooth the result. This paper instead formulates the alignment as an integer linear programming problem equivalent to minimising an $L^1$-norm, and shows that the resulting alignments are more accurate than those of the Euclidean-based method.

Abstract

In binaural audio synthesis, aligning head-related impulse responses (HRIRs) in time is an important pre-processing step that enables accurate spatial interpolation and efficient data compression. The maximum-correlation time delay between spatially nearby HRIRs has previously been used to obtain accurate and smooth alignments by solving a matrix equation whose solution has the minimum Euclidean distance to the measured time delays. However, the Euclidean criterion can lead to an over-smoothed solution in practice. In this paper, we solve the over-smoothing issue by formulating the task as an integer linear programming problem equivalent to minimising an $L^1$-norm. Moreover, we incorporate the cross-correlations of 1) inter-aural HRIRs and 2) HRIRs with their minimum-phase responses to provide more reference measurements for the optimisation. We show that the proposed method achieves more accurate alignments than the Euclidean-based method by comparing the spectral reconstruction loss of time-aligned HRIRs using a spherical harmonics representation on seven HRIR sets comprising human and dummy heads. The extra correlation features and the $L^1$-norm are also beneficial in extremely noisy conditions. In addition, the method can be applied to phase unwrapping of head-related transfer functions, where the unwrapped phase could serve as a compact feature for downstream tasks.
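
The contrast between the Euclidean and $L^1$ criteria described in the abstract can be illustrated with a toy example. The sketch below is not the authors' formulation: the four-vertex graph, the delay values, the single corrupted measurement, and the helper names (`residuals`, `best_integer_delays`) are all assumptions made for illustration, and an exhaustive search over a small integer grid stands in for a real integer linear programming solver.

```python
from itertools import product

# Hypothetical toy graph: vertex 0 is a reference whose delay is fixed to 0.
TRUE_TAU = (0, 3, 5, 4)  # made-up delays in samples
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

# Pairwise delay measurements: measured[(i, j)] approximates tau[j] - tau[i].
measured = {(i, j): TRUE_TAU[j] - TRUE_TAU[i] for i, j in EDGES}
measured[(0, 1)] += 8  # one grossly wrong cross-correlation peak (assumed outlier)


def residuals(tau):
    """Mismatch between the delays implied by tau and each measurement."""
    return [tau[j] - tau[i] - measured[(i, j)] for i, j in EDGES]


def l1_cost(tau):
    """Sum of absolute residuals (the L1 criterion)."""
    return sum(abs(r) for r in residuals(tau))


def l2_cost(tau):
    """Sum of squared residuals (the Euclidean criterion)."""
    return sum(r * r for r in residuals(tau))


def best_integer_delays(cost, max_delay=12):
    """Exhaustively search integer delays for vertices 1..3 (vertex 0 stays 0)."""
    candidates = ((0, *rest) for rest in product(range(max_delay + 1), repeat=3))
    return min(candidates, key=cost)


tau_l1 = best_integer_delays(l1_cost)
tau_l2 = best_integer_delays(l2_cost)
print("L1 solution:", tau_l1)  # (0, 3, 5, 4): matches TRUE_TAU
print("L2 solution:", tau_l2)  # (0, 7, 7, 6): the outlier is spread over all vertices
```

In this toy setting the $L^1$ criterion leaves the whole error on the single corrupted edge and recovers the true delays, whereas the Euclidean criterion distributes the error across every vertex, which is one way to picture the over-smoothing mentioned above. For realistically sized graphs, the brute-force search would be replaced by a proper ILP solver.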

Paper Structure (19 sections, 16 equations, 5 figures, 3 tables)


Figures (5)

  • Figure 1: Two simple graphs consisting of three measurement directions $\{a,b,c\}$ and an auxiliary vertex $\delta$ with $\tau_\delta=0$. The elementary cycles in the graphs can be categorised into the five groups shown on the right, with four types of edges: inter-aural time differences, intra-aural time differences, absolute time lags, and inter-frequency phase differences. We use (①-④) for TOA estimation and (①, ⑤) for PU.
  • Figure 2: Noise robustness experiment on the SONICOM HRTF. The log-spectral distance (LSD) in (d-f) is calculated between the measured (clean) and the reconstructed noisy HRTFs. $N$ is the SH order. Panels (g-l) visualise the ITDs obtained using all the correlation features (full). Each row shares the same noise SNR. The Mollweide projection is used to plot the hemisphere, and each dot is a sampled direction. The ITDs are clipped to $\pm 0.8$ ms.
  • Figure 3: Noise robustness experiment on the CIPIC HRTF. The details for each subplot are the same as in Fig. 2.
  • Figure 4: Unwrapped phase delay of SONICOM HRTFs. The left column is the horizontal plane, and the right column is the frontal plane. Rows from top to bottom: Sec. \ref{ssec:freq-pu}, Sec. \ref{ssec:sph-pu}, and Sec. \ref{ssec:sph-freq-pu}.
  • Figure 5: Phase delay distortion as a function of frequency with different unwrapping methods and SH order $N$.