
Graph Neural Field with Spatial-Correlation Augmentation for HRTF Personalization

De Hu, Junsheng Hu, Cuicui Jiang

TL;DR

This work targets HRTF personalization for unseen subjects by exploiting spatial correlations across directions. It introduces GraphNF-SCA, a three-part framework: a GNN-based HRTF-P module that predicts subject-specific HRTFs, a GNN-based HRTF-U module that models the spatial structure across directions, and a fine-tuning stage that reinforces the predictions through these spatial relationships. By integrating retrieved subject information and spatial correlations through graph neural networks and LoRA-based decoding, the method achieves state-of-the-art log-spectral distortion (LSD) and interaural level difference (ILD) results on the SONICOM, CIPIC, and HUTUBS datasets, particularly in data-scarce scenarios. This enables scalable, accurate HRTF personalization for unseen subjects and, in turn, high-fidelity immersive spatial audio in VR/AR applications.
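The TL;DR attributes part of the decoding to LoRA (Low-Rank Adaptation). As a hedged illustration of the general LoRA idea only, and not the authors' decoder, the sketch below keeps a base weight matrix W frozen and adds a trainable low-rank update scaled by alpha/r, so a subject-specific adjustment needs r·(d_in + d_out) parameters instead of d_in·d_out. The class name `LoRALinear`, the rank, the scaling `alpha`, and the initialisation are hypothetical choices for illustration.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical linear layer computing y = x (W + (alpha / r) * B A)^T.

    W has shape (d_out, d_in) and stays frozen; A has shape (r, d_in) and
    B has shape (d_out, r). Only A and B would be trained. B starts at zero,
    so the layer initially reproduces the frozen base layer exactly.
    """

    def __init__(self, weight: Matrix, rank: int, alpha: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen base weights (shared across subjects)
        self.scale = alpha / rank
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]  # zero init: no change at start

    def effective_weight(self) -> Matrix:
        delta = matmul(self.B, self.A)  # (d_out, d_in), rank at most r
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def __call__(self, x: Matrix) -> Matrix:
        # x: (batch, d_in) -> (batch, d_out)
        w_t = [list(col) for col in zip(*self.effective_weight())]
        return matmul(x, w_t)

    def num_trainable(self) -> int:
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

For example, a 64×32 layer adapted with r = 4 trains 4·32 + 64·4 = 384 parameters instead of the 2,048 entries of the full matrix, which is why low-rank updates suit per-subject adaptation when data are scarce.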

Abstract

To achieve immersive spatial audio rendering on VR/AR devices, high-quality Head-Related Transfer Functions (HRTFs) are essential. In general, HRTFs are subject-dependent and position-dependent, and measuring them is time-consuming and tedious. To address this challenge, we propose the Graph Neural Field with Spatial-Correlation Augmentation (GraphNF-SCA) for HRTF personalization, which can generate individual HRTFs for unseen subjects. GraphNF-SCA consists of three key components: an HRTF personalization (HRTF-P) module, an HRTF upsampling (HRTF-U) module, and a fine-tuning stage. In the HRTF-P module, we predict the HRTFs of the target subject with a Graph Neural Network (GNN) built on an encoder-decoder architecture, in which the encoder extracts universal features and the decoder incorporates target-relevant features to produce individualized HRTFs. The HRTF-U module employs a second GNN to model spatial correlations across HRTFs; it is fine-tuned on the output of the HRTF-P module, thereby enhancing the spatial consistency of the predicted HRTFs. Unlike existing methods that estimate individual HRTFs position by position without modeling spatial correlation, GraphNF-SCA leverages the inherent spatial correlations across HRTFs to improve personalization performance. Experimental results demonstrate that GraphNF-SCA achieves state-of-the-art performance.
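The abstract contrasts position-by-position estimation with modeling spatial correlations across HRTFs through a GNN. A minimal sketch of that idea follows, under assumptions of our own that the source does not state: each measurement direction (azimuth, elevation in degrees) is a graph node linked to its k nearest neighbours by great-circle distance, and one untrained message-passing step blends each node's features with the mean of its neighbours' features. The function names, k, and `self_weight` are hypothetical; the actual HRTF-U architecture is the one shown in Figure 4.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def to_unit_vector(azimuth_deg: float, elevation_deg: float) -> Vec3:
    """Convert a direction on the sphere to a Cartesian unit vector."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def great_circle(u: Vec3, v: Vec3) -> float:
    """Angular distance in radians between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp guards against rounding


def knn_graph(directions: List[Tuple[float, float]], k: int) -> List[List[int]]:
    """Neighbour lists: node i -> indices of its k closest other directions."""
    vecs = [to_unit_vector(az, el) for az, el in directions]
    graph = []
    for i, u in enumerate(vecs):
        others = sorted((great_circle(u, v), j) for j, v in enumerate(vecs) if j != i)
        graph.append([j for _, j in others[:k]])
    return graph


def mean_aggregate(features: List[List[float]], graph: List[List[int]],
                   self_weight: float = 0.5) -> List[List[float]]:
    """One message-passing step: mix each node with the mean of its neighbours."""
    out = []
    for i, nbrs in enumerate(graph):
        mean = [sum(features[j][f] for j in nbrs) / len(nbrs)
                for f in range(len(features[i]))]
        out.append([self_weight * x + (1.0 - self_weight) * m
                    for x, m in zip(features[i], mean)])
    return out
```

With directions at azimuths 0°, 10°, 90°, and 180° on the horizontal plane and k = 1, the neighbour lists are [[1], [0], [1], [2]], so each node's features are pulled toward those of its closest measured direction. A trained GNN would replace the fixed averaging with learned weights.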

Paper Structure

This paper contains 30 sections, 18 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Illustration of the position-dependent and subject-dependent nature of HRTFs: HRTF magnitudes of subjects PP1 (left) and PP3 (right) at an azimuth angle of 45°, obtained from the HUTUBS database [brinkmann2019cross].
  • Figure 2: Overview of the proposed HRTF personalization framework (referred to as GraphNF-SCA), where the pre-trained HRTF-U module is fine-tuned to reinforce the spatial correlation among the outputs of the HRTF-P module.
  • Figure 3: Network architecture for GNN-based HRTF-P module.
  • Figure 4: Network architecture for GNN-based HRTF-U module.
  • Figure 5: Spatial distribution of LSD errors for unmeasured HRTFs at the left ear. Blue and red dots indicate LSD $\leq \zeta$ and LSD $>\zeta$, respectively.
  • ...and 2 more figures
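The TL;DR reports LSD and ILD, and Figure 5 colours each unmeasured direction by whether its LSD exceeds a threshold ζ. The sketch below uses commonly used definitions: LSD as the root-mean-square, over frequency bins, of the dB difference between measured and predicted magnitudes, and ILD as the left-to-right energy ratio in dB. The paper's exact definitions (frequency range, averaging over ears and directions, ILD computation) may differ, and the function names and the ζ value in the usage note are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def lsd_db(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """RMS over frequency bins of 20*log10(|H| / |H_hat|), in dB."""
    sq = [(20.0 * math.log10(h / h_hat)) ** 2 for h, h_hat in zip(measured, predicted)]
    return math.sqrt(sum(sq) / len(sq))


def ild_db(left: Sequence[float], right: Sequence[float]) -> float:
    """Broadband ILD: left-to-right energy ratio summed over bins, in dB."""
    return 10.0 * math.log10(sum(x * x for x in left) / sum(x * x for x in right))


def ild_error_db(left: Sequence[float], right: Sequence[float],
                 left_hat: Sequence[float], right_hat: Sequence[float]) -> float:
    """Absolute ILD difference between measured and predicted HRTF pairs."""
    return abs(ild_db(left, right) - ild_db(left_hat, right_hat))


def split_by_threshold(lsd_per_direction: Dict[str, float],
                       zeta: float) -> Dict[str, List[str]]:
    """Mimic Figure 5: LSD <= zeta ('blue') versus LSD > zeta ('red')."""
    groups: Dict[str, List[str]] = {"blue": [], "red": []}
    for direction, value in lsd_per_direction.items():
        groups["blue" if value <= zeta else "red"].append(direction)
    return groups
```

For example, a prediction that is uniformly 20 dB too quiet (|H| = 10, |Ĥ| = 1 in every bin) gives an LSD of 20 dB. Inputs are linear magnitudes and must be strictly positive, since the logarithm is undefined at zero.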