SonoCraftAR: Towards Supporting Personalized Authoring of Sound-Reactive AR Interfaces by Deaf and Hard of Hearing Users

Jaewook Lee, Davin Win Kyi, Leejun Kim, Jenny Peng, Gagyeom Lim, Jeremy Zhengqi Huang, Dhruv Jain, Jon E. Froehlich

TL;DR

This paper addresses the lack of personalized sound visualizations for Deaf and hard‑of‑hearing users in AR. It introduces SonoCraftAR, a proof‑of‑concept system that uses typed prompts and a multi‑agent LLM pipeline to generate runtime Unity/Shapes code and visualize ambient sounds by mapping the dominant frequency to visual properties. The implementation combines real‑time audio processing on a Windows laptop with Roslyn‑based runtime compilation streamed to the HoloLens 2, and showcases eight example interfaces that respond to audio. The work highlights opportunities for AI‑assisted, personalized sound accessibility tools in AR while noting design, usability, and latency challenges that warrant future user studies and technical refinements.

Abstract

Augmented reality (AR) has shown promise for supporting Deaf and hard-of-hearing (DHH) individuals by captioning speech and visualizing environmental sounds, yet existing systems do not allow users to create personalized sound visualizations. We present SonoCraftAR, a proof-of-concept prototype that empowers DHH users to author custom sound-reactive AR interfaces using typed natural language input. SonoCraftAR integrates real-time audio signal processing with a multi-agent LLM pipeline that procedurally generates animated 2D interfaces via a vector graphics library. The system extracts the dominant frequency of incoming audio and maps it to visual properties such as size and color, making the visualizations respond dynamically to sound. This early exploration demonstrates the feasibility of open-ended sound-reactive AR interface authoring and discusses future opportunities for personalized, AI-assisted tools to improve sound accessibility.
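
The dominant-frequency mapping described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 100 to 4000 Hz range, and the size and hue ranges are all assumptions chosen for illustration. A naive DFT keeps the example dependency-free; a real-time system would more likely use an FFT.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    # For real input only bins 1..n/2 are distinct; bin 0 (DC offset) is skipped.
    for k in range(1, n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        mag = abs(acc)
        if mag > best_mag:
            best_bin, best_mag = k, mag
    return best_bin * sample_rate / n


def map_frequency(freq_hz, lo_hz=100.0, hi_hz=4000.0):
    """Map a frequency to (scale, hue). All ranges here are illustrative assumptions."""
    t = (freq_hz - lo_hz) / (hi_hz - lo_hz)
    t = min(1.0, max(0.0, t))   # clamp to [0, 1]
    scale = 0.5 + 1.5 * t       # size multiplier in [0.5, 2.0]
    hue = 240.0 * (1.0 - t)     # 240 (blue) for low pitch, 0 (red) for high pitch
    return scale, hue


# Synthetic test signal: a 1 kHz sine sampled at 8 kHz (256 samples).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(256)]
freq = dominant_frequency(tone, sample_rate)
scale, hue = map_frequency(freq)
```

In this sketch a higher pitch yields a larger, redder shape. The clamp keeps out-of-range frequencies from producing invalid sizes or colors.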

Paper Structure

This paper contains 9 sections and 2 figures.

Figures (2)

  • Figure 1: System overview of SonoCraftAR. A typed user prompt is first expanded by the Prompt Enhancement agent into structured implementation guidelines. The Code Generation agent then produces a Unity C# script that uses the Shapes vector graphics library, which is checked for compilation errors by the Code Checker agent. The finalized code script is compiled at runtime with Roslyn and rendered in AR. A real‑time audio processing server continuously computes the dominant frequency of incoming sound, which drives the visualization’s animations.
  • Figure 2: Eight example sound‑reactive interfaces created with SonoCraftAR. The designs include arrows, waves, pulsing arcs, sound bars, and more. Examples A–C were inspired by GlassEar [jain2015headmounted], while D–H showcase ideas proposed by the research team. These visualizations react to the dominant frequency of music playing through a speaker.
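
The agent flow in the Figure 1 caption (prompt enhancement, then code generation, then code checking) can be sketched as a small control loop. Everything here is hypothetical: the function name `author_interface`, the callable signatures, and in particular the retry-on-error behavior are assumptions, since the caption only says that the generated script is checked for compilation errors before being compiled. The stand-in agents return fixed strings instead of calling an LLM.

```python
from typing import Callable, List, Optional


def author_interface(
    prompt: str,
    enhance: Callable[[str], str],
    generate: Callable[[str, Optional[str]], str],
    check: Callable[[str], List[str]],
    max_attempts: int = 3,
) -> str:
    """Run enhance -> generate -> check, regenerating while the checker reports errors."""
    guidelines = enhance(prompt)
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        script = generate(guidelines, feedback)
        errors = check(script)
        if not errors:
            return script  # finalized script, ready for runtime compilation
        feedback = "\n".join(errors)  # assumed: errors are fed back to the generator
    raise RuntimeError(f"script still has errors after {max_attempts} attempts")


# Stand-in agents (hypothetical); real ones would call an LLM.
attempts = []


def fake_enhance(prompt):
    return f"Guidelines: {prompt}"


def fake_generate(guidelines, feedback):
    attempts.append(feedback)
    return "ok-script" if feedback else "broken-script"


def fake_check(script):
    return [] if script == "ok-script" else ["CS1002: ; expected"]


result = author_interface("pulsing ring", fake_enhance, fake_generate, fake_check)
```

Passing the agents in as callables keeps the loop independent of any particular LLM client, which makes the control flow easy to test in isolation.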