Exploring Frequency-Domain Feature Modeling for HRTF Magnitude Upsampling

Xingyu Chen; Hanwen Bi; Fei Ma; Sipei Zhao; Eva Cheng; Ian S. Burnett

TL;DR

This work addresses the upsampling of sparse HRTF measurements to dense spatial grids for personalized spatial audio, with an emphasis on explicit frequency-domain modeling of the log-magnitude spectrum $H_{\log}$. It introduces the FD-Conformer, a two-module sparse-to-dense network that combines a spatial mapping module with a frequency-domain Conformer: a binaural spectral representation is passed through Conformer blocks to capture both local and long-range spectral dependencies. The model is trained with a combined LSD and spectral gradient loss. On the SONICOM and HUTUBS datasets, the FD-Conformer achieves state-of-the-art ILD and LSD, especially under extreme sparsity (e.g., 3–5 measurements), which indicates that frequency-aware design matters for robust HRTF magnitude upsampling. The approach reduces the measurement burden of personalized spatial audio, and the authors suggest future work on deeper integration of spectral and spatial modeling.

Abstract

Accurate upsampling of Head-Related Transfer Functions (HRTFs) from sparse measurements is crucial for personalized spatial audio rendering. Traditional interpolation methods, such as kernel-based weighting or basis function expansions, rely on measurements from a single subject and are limited by the spatial sampling theorem, resulting in significant performance degradation under sparse sampling. Recent learning-based methods alleviate this limitation by leveraging cross-subject information, yet most existing neural architectures primarily focus on modeling spatial relationships across directions, while spectral dependencies along the frequency dimension are often modeled implicitly or treated independently. However, HRTF magnitude responses exhibit strong local continuity and long-range structure in the frequency domain, which are not fully exploited. This work investigates frequency-domain feature modeling by examining how different architectural choices, ranging from per-frequency multilayer perceptrons to convolutional, dilated convolutional, and attention-based models, affect performance under varying sparsity levels, showing that explicit spectral modeling consistently improves reconstruction accuracy, particularly under severe sparsity. Motivated by this observation, a frequency-domain Conformer-based architecture is adopted to jointly capture local spectral continuity and long-range frequency correlations. Experimental results on the SONICOM and HUTUBS datasets demonstrate that the proposed method achieves state-of-the-art performance in terms of interaural level difference and log-spectral distortion.
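The log-spectral distortion (LSD) named above as an evaluation metric can be illustrated with a short NumPy sketch. This uses the commonly cited definition (root-mean-square of the dB difference across frequency bins) and is an assumption, not the paper's exact formula; the function name `log_spectral_distortion`, the array shapes, and the toy data are all hypothetical.

```python
import numpy as np

def log_spectral_distortion(h_true, h_pred, eps=1e-12):
    """LSD in dB between two magnitude spectra (frequency on the last axis).

    Assumed definition: RMS over frequency of 20*log10(|H| / |H_hat|).
    """
    h_true = np.asarray(h_true, dtype=float)
    h_pred = np.asarray(h_pred, dtype=float)
    # Per-bin magnitude difference in dB; eps guards against log(0).
    diff_db = 20.0 * np.log10((h_true + eps) / (h_pred + eps))
    # RMS over the frequency axis gives one value per direction and ear.
    return np.sqrt(np.mean(diff_db ** 2, axis=-1))

# Toy data (hypothetical shapes): 4 directions, 2 ears, 128 frequency bins.
rng = np.random.default_rng(0)
h_ref = np.abs(rng.normal(size=(4, 2, 128))) + 0.1
h_est = h_ref * 10 ** (1.0 / 20.0)  # estimate that is uniformly 1 dB too loud

lsd = log_spectral_distortion(h_ref, h_est)
mean_lsd = float(lsd.mean())
print(lsd.shape, round(mean_lsd, 3))
```

Because the toy estimate is off by a constant 1 dB in every bin, the resulting LSD is about 1 dB for every direction and ear, which makes the sketch easy to sanity-check.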

Paper Structure (22 sections, 20 equations, 6 figures, 3 tables)

Introduction
Problem Statement
Existing Methods
Distance-Weighted Interpolation
Basis Function Decomposition
Learning-Based Methods
Proposed Method
Network Architecture
Spatial Mapping Module
Binaural Spectral Representation
Frequency-Domain Modeling with Conformer Blocks
Training Objective
Experiments
Dataset and Preprocessing
Experimental Setup
...and 7 more sections

Figures (6)

Figure 1: Conceptual illustration of HRTF upsampling from sparse measurements.
Figure 2: Frequency-frequency Pearson correlation of HRTF log-magnitude responses. Each entry indicates the correlation between two frequency bins, computed over all subjects, directions, and ears. Strong correlations are observed both locally around the diagonal and across distant frequency bins.
Figure 3: Overall framework of the proposed FD-Conformer.
Figure 4: Per-frequency LSD under different numbers of measurements.
Figure 5: Spatial distribution of LSD over azimuth and elevation under sparse measurements. For each method, the left and right panels correspond to the left and right ears, respectively.
...and 1 more figures
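The frequency-frequency Pearson correlation described in the Figure 2 caption can be sketched as follows. This is a minimal illustration on synthetic data, assuming the log-magnitude responses are stored with frequency on the last axis and that subjects, directions, and ears are simply pooled as samples; the function name `frequency_correlation` and all shapes are hypothetical, and the paper's own preprocessing may differ.

```python
import numpy as np

def frequency_correlation(h_log):
    """Pearson correlation between frequency bins.

    h_log: array of shape (..., n_freq) holding log-magnitude responses.
    Returns an (n_freq, n_freq) correlation matrix.
    """
    h_log = np.asarray(h_log, dtype=float)
    # Pool every leading axis (subjects, directions, ears) into one sample axis.
    samples = h_log.reshape(-1, h_log.shape[-1])
    # rowvar=False treats each column (frequency bin) as a variable.
    return np.corrcoef(samples, rowvar=False)

# Synthetic stand-in data: 5 subjects, 20 directions, 2 ears, 16 frequency bins.
rng = np.random.default_rng(0)
shared = rng.normal(size=(5, 20, 2, 1))           # component common to all bins
noise = 0.1 * rng.normal(size=(5, 20, 2, 16))     # small per-bin variation
h_log = shared + noise

corr = frequency_correlation(h_log)
print(corr.shape)
```

In this synthetic case every bin shares a dominant common component, so all off-diagonal entries come out close to 1. Real HRTF data would instead show the structured pattern the caption describes, with strong values near the diagonal and between some distant bins.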
