SonifyAR: Context-Aware Sound Generation in Augmented Reality
Xia Su, Jon E. Froehlich, Eunyee Koh, Chang Xiao
TL;DR
SonifyAR addresses the lack of context-aware sound in AR with a Programming by Demonstration (PbD) workflow that automatically captures event context and uses an LLM to orchestrate four sound acquisition methods: recommendation, retrieval, generation, and transfer. By textualizing context from user actions, virtual objects, and real-world surfaces, the system obtains sound assets matched to materials and animations, enabling in-situ sound generation for complex AR interactions. A modular implementation using ARKit, Dense Material Segmentation, AudioLDM, and GPT-4 demonstrates usability in a study with eight designers and in five application scenarios, including education and accessibility, while highlighting areas for improvement in sound quality and UI design. The work points to a practical path toward more immersive, accessible, and efficient AR sound authoring, with implications for AR content creation and headset safety applications.
Abstract
Sound plays a crucial role in enhancing user experience and immersiveness in Augmented Reality (AR). However, current platforms lack support for AR sound authoring due to limited interaction types, challenges in collecting and specifying context information, and difficulty in acquiring matching sound assets. We present SonifyAR, an LLM-based AR sound authoring system that generates context-aware sound effects for AR experiences. SonifyAR expands the current design space of AR sound and implements a Programming by Demonstration (PbD) pipeline to automatically collect contextual information about AR events, including virtual content semantics and real-world context. This context information is then processed by a large language model to acquire sound effects through Recommendation, Retrieval, Generation, and Transfer methods. To evaluate the usability and performance of our system, we conducted a user study with eight participants and created five example applications, including an AR-based science experiment, a case for improving AR headset safety, and an assistive example for low-vision AR users.
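
As a rough illustration of the context-textualization step described above, the following Python sketch turns one captured AR event into a text prompt that names the four acquisition methods. It is not the authors' implementation: the `AREvent` record, its field names, the helper functions, and the prompt wording are all assumptions made for illustration, and no model is actually called.

```python
from dataclasses import dataclass

# The four acquisition methods named in the abstract.
METHODS = ("Recommendation", "Retrieval", "Generation", "Transfer")


@dataclass
class AREvent:
    """Hypothetical record of one demonstrated AR event (field names are assumptions)."""
    action: str            # user action, e.g. "slide"
    virtual_object: str    # virtual content semantics, e.g. "toy robot"
    surface_material: str  # real-world surface material, e.g. "wood"


def textualize(event: AREvent) -> str:
    """Turn the captured context into a single plain-text description."""
    return (
        f"Event: {event.action}. "
        f"Virtual object: {event.virtual_object}. "
        f"Real-world surface: {event.surface_material}."
    )


def build_prompt(event: AREvent) -> str:
    """Compose a prompt asking a language model to pick among the four methods.

    The wording is illustrative only; the prompt used by SonifyAR is not
    given in this section.
    """
    return (
        "You are assisting with AR sound authoring.\n"
        f"{textualize(event)}\n"
        f"Choose among: {', '.join(METHODS)}, and describe a matching sound effect."
    )


if __name__ == "__main__":
    print(build_prompt(AREvent("slide", "toy robot", "wood")))
```

In a full pipeline, the resulting string would be sent to the language model, and its reply would be routed to the corresponding sound source (for example, a retrieval database or a text-to-audio model).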
