Auptimize: Optimal Placement of Spatial Audio Cues for Extended Reality
Hyunsung Cho, Alexander Wang, Divya Kartik, Emily Liying Xie, Yukang Yan, David Lindlbauer
TL;DR
Auptimize tackles localization errors in XR spatial audio by disentangling audio cue locations from their visual hosts and relocating the cues to optimal azimuth positions. It combines a data-driven analyzer, which estimates localization blur $P(V|S)$ and cone-of-confusion distance $D(V,S)$, with an integer-programming optimizer that assigns each audio cue to one of a set of discrete 12° azimuth bins, yielding placements $S^*$ that maximize a weighted objective. Empirical evaluation shows that Auptimize outperforms generic head-related transfer function (HRTF) and dynamic audio baselines in source identification accuracy and reduces response times, demonstrating practical benefits for multi-source audio guidance in XR. The approach lays a foundation for reliable audio-visual integration in spatial interfaces and points to future work on personalization, complex scenes, and additional audio-visual cues.
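The placement step can be illustrated with a small sketch. Apart from the 12° bin width, everything here is an assumption rather than the authors' method. A Gaussian-softmax model stands in for the data-driven estimate of $P(V|S)$. A pairwise, mirror-aware separation term is a hypothetical proxy for $D(V,S)$. The names `p_identify`, `score`, and `optimize` and the parameters `sigma`, `weight`, and `max_offset` are illustrative. Exhaustive enumeration replaces the integer program, which is only practical for a handful of cues.

```python
import itertools
import math

# Discretization taken from the TL;DR: candidate azimuths every 12 degrees.
BIN_DEG = 12
BINS = list(range(0, 360, BIN_DEG))  # 0, 12, ..., 348 (30 bins)


def ang_dist(a, b):
    """Smallest absolute angular difference between two azimuths, in degrees."""
    d = (a - b) % 360
    return min(d, 360 - d)


def front_back_mirror(az):
    """Reflect an azimuth across the interaural axis (0 = front, 90 = right)."""
    return (180 - az) % 360


def p_identify(s, i, visuals, sigma=15.0):
    """HYPOTHETICAL stand-in for the data-driven P(V|S): probability that a cue
    played at azimuth s is attributed to visual element i, modeled as a
    Gaussian-weighted softmax over all visual elements (sigma is assumed)."""
    weights = [math.exp(-ang_dist(s, v) ** 2 / (2 * sigma**2)) for v in visuals]
    return weights[i] / sum(weights)


def score(assign, visuals, weight=0.5):
    """HYPOTHETICAL objective: summed identification probability plus a weighted
    pairwise separation term standing in for D(V,S). Each pair of cues earns
    more when it is far apart both directly and after front-back mirroring."""
    ident = sum(p_identify(s, i, visuals) for i, s in enumerate(assign))
    sep = 0.0
    for i, j in itertools.combinations(range(len(assign)), 2):
        si, sj = assign[i], assign[j]
        sep += min(ang_dist(si, sj), ang_dist(si, front_back_mirror(sj))) / 180.0
    return ident + weight * sep


def optimize(visuals, max_offset=30, weight=0.5):
    """Exhaustively search bin assignments (small N only) in place of an ILP.
    Each cue stays within max_offset degrees of its visual element, an assumed
    bound on how far the ventriloquist effect can pull a sound."""
    cands = [[b for b in BINS if ang_dist(b, v) <= max_offset] for v in visuals]
    best, best_score = None, -math.inf
    for assign in itertools.product(*cands):
        if len(set(assign)) < len(assign):
            continue  # at most one cue per bin
        s = score(assign, visuals, weight)
        if s > best_score:
            best, best_score = assign, s
    return best, best_score


if __name__ == "__main__":
    # Visual elements at 20 and 160 degrees are front-back mirrors of each other.
    visuals = [20, 160, 300]
    naive = tuple(min(BINS, key=lambda b: ang_dist(b, v)) for v in visuals)
    best, best_score = optimize(visuals)
    print("nearest-bin placement:", naive, round(score(naive, visuals), 3))
    print("optimized placement:  ", best, round(best_score, 3))
```

Because the nearest-bin placement is itself one of the enumerated candidates, the search can never score worse than it. With roughly five candidate bins per cue, enumeration grows as about $5^N$, which is why a solver-based formulation such as integer programming becomes necessary as the number of cues increases.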
Abstract
Spatial audio in Extended Reality (XR) provides users with better awareness of where virtual elements are placed, and efficiently guides them to events such as notifications, system alerts from different windows, or approaching avatars. Humans, however, are inaccurate at localizing sound cues, especially when multiple sources are present, due to limitations in human auditory perception such as angular discrimination error and front-back confusion. This decreases the efficiency of XR interfaces because users misidentify which XR element a sound is coming from. To address this, we propose Auptimize, a novel computational approach for placing XR sound sources, which mitigates such localization errors by exploiting the ventriloquist effect, i.e., the tendency to perceive a sound as originating from a nearby visual element. Auptimize disentangles the sound source locations from the visual elements and relocates the sound sources to optimal positions for unambiguous identification of sound cues, avoiding errors due to inter-source proximity and front-back confusion. Our evaluation shows that Auptimize decreases spatial audio-based source identification errors compared to playing each sound cue at the location of its paired visual element. We demonstrate the applicability of Auptimize for diverse spatial audio-based interactive XR scenarios.
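Why front-back confusion arises can be seen from a simplified model of binaural timing cues. This is a textbook far-field approximation, not Auptimize's analyzer, and it ignores head shadow and pinna effects. For azimuth $\theta$ measured clockwise from straight ahead, inter-ear distance $d$, and speed of sound $c$, the interaural time difference is approximately

$$\mathrm{ITD}(\theta) \approx \frac{d}{c}\,\sin\theta, \qquad \sin\theta = \sin(180^\circ - \theta) \;\Rightarrow\; \mathrm{ITD}(\theta) = \mathrm{ITD}(180^\circ - \theta).$$

Assuming $d \approx 0.18$ m and $c \approx 343$ m/s, sources at $30^\circ$ and $150^\circ$ both give $\mathrm{ITD} \approx (0.18/343)\cdot 0.5 \approx 262\ \mu\text{s}$. Only spectral (pinna) cues separate the two, and generic HRTFs reproduce those imperfectly. Two cues placed at mirrored azimuths are therefore easy to confuse.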
