Photogrammetry-Reconstructed 3D Head Meshes for Accessible Individual Head-Related Transfer Functions

Ludovic Pirard, Lorenzo Picinali, Katarina C. Poole

Abstract

Individual head-related transfer functions (HRTFs) are essential for accurate spatial audio binaural rendering but remain difficult to obtain due to measurement complexity. This study investigates whether photogrammetry-reconstructed (PR) head and ear meshes, acquired with consumer hardware, can provide a practically useful baseline for individual HRTF synthesis. Using the SONICOM HRTF dataset, 72-image photogrammetry captures per subject were processed with Apple's Object Capture API to generate PR meshes for 150 subjects. Mesh2HRTF was used to compute PR synthetic HRTFs, which were compared against measured HRTFs, high-resolution 3D scan-derived HRTFs, KEMAR, and random HRTFs through numerical evaluation, auditory models, and a behavioural sound localisation experiment (N = 27). PR synthetic HRTFs preserved ITD cues but exhibited increased ILD and spectral errors. Auditory-model predictions and behavioural data showed substantially higher quadrant error rates, reduced elevation accuracy, and greater front-back confusions than measured HRTFs, performing worse than random HRTFs on perceptual metrics. Current photogrammetry pipelines can support individual HRTF synthesis but lack the pinna morphological detail and high-frequency spectral fidelity needed for accurate individual HRTFs containing monaural cues.

Paper Structure (32 sections, 2 equations, 14 figures, 1 table)

Figures (14)

  • Figure 1: Photogrammetry data example from the SONICOM dataset.
  • Figure 2: Wireframe views of the high-resolution 3D scan mesh (left, subject P0004, right ear) and the photogrammetry-reconstructed mesh (right, same subject and same ear).
  • Figure 3: Azimuth plane: measured, 3D synthetic, PR synthetic, and random HRTFs for four subjects (left HRTFs).
  • Figure 4: Elevation plane: measured, 3D synthetic, PR synthetic, and random HRTFs for four subjects (left HRTFs).
  • Figure 5: Average metrics across 150 subjects showing median [25th-75th percentile] with significance bars. A. Absolute ITD difference ($\mu$s), B. Absolute ILD difference (dB), C. Log-Spectral Distortion (dB). (* $p < 0.05$, ** $p < 0.01$, *** $p < 0.001$).
  • ...and 9 more figures
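
Figure 5C reports log-spectral distortion (LSD) in dB. As a rough illustration of what such a metric measures, the sketch below uses a commonly used definition: the root-mean-square of the dB magnitude difference between a reference and a test spectrum across frequency bins. This is an assumption, not necessarily the exact formulation in the paper, which may restrict the frequency range or average over directions and ears differently. The function name `log_spectral_distortion` and the example magnitudes are hypothetical.

```python
import math


def log_spectral_distortion(ref_mag, test_mag):
    """RMS of the dB magnitude difference between two spectra.

    Assumes both inputs are equal-length sequences of strictly positive
    linear magnitudes sampled at the same frequency bins.
    """
    if len(ref_mag) != len(test_mag) or not ref_mag:
        raise ValueError("spectra must be non-empty and of equal length")
    # Squared dB difference per frequency bin.
    sq_db = [(20.0 * math.log10(r / t)) ** 2 for r, t in zip(ref_mag, test_mag)]
    return math.sqrt(sum(sq_db) / len(sq_db))


# Hypothetical magnitudes for one direction and one ear (not real data).
measured = [1.0, 0.5, 0.25]
identical = list(measured)
scaled = [10.0 * m for m in measured]  # uniformly 20 dB louder

lsd_identical = log_spectral_distortion(measured, identical)
lsd_scaled = log_spectral_distortion(measured, scaled)
```

Under this definition, identical spectra give an LSD of 0 dB, and a uniform 20 dB gain offset gives an LSD of 20 dB. This shows that the metric is sensitive to overall level mismatches as well as to differences in spectral shape.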