SonoHaptics: An Audio-Haptic Cursor for Gaze-Based Object Selection in XR
Hyunsung Cho, Naveen Sendhilnathan, Michael Nebeling, Tianyi Wang, Purnima Padmanabhan, Jonathan Browder, David Lindlbauer, Tanya R. Jonker, Kashyap Todi
TL;DR
The paper tackles gaze-based object selection in XR when visual feedback is unavailable or unreliable due to display limitations. It proposes SonoHaptics, an audio-haptic cursor that uses data-driven cross-modal mappings from visual features (color lightness, size, position, material) to audio-haptic cues (pitch, amplitude, direction, timbre) and generates feedback automatically as the user's gaze hovers over objects. A perception study establishes reliable mappings (e.g., color lightness ↔ pitch, size ↔ amplitude) that ground the models. The implementation provides global feedback that is unique to each object and local feedback that amplifies differences between nearby objects in clutter. A comparative evaluation shows that SonoHaptics improves accuracy in cluttered scenes relative to non-visual baselines and can match or exceed text-to-speech in certain scenarios, suggesting potential for non-visual XR interaction and low-vision accessibility.
Abstract
We introduce SonoHaptics, an audio-haptic cursor for gaze-based 3D object selection. SonoHaptics addresses challenges around providing accurate visual feedback during gaze-based selection in Extended Reality (XR), e.g., lack of world-locked displays in no- or limited-display smart glasses and visual inconsistencies. To enable users to distinguish objects without visual feedback, SonoHaptics employs the concept of cross-modal correspondence in human perception to map visual features of objects (color, size, position, material) to audio-haptic properties (pitch, amplitude, direction, timbre). We contribute data-driven models for determining cross-modal mappings of visual features to audio and haptic features, and a computational approach to automatically generate audio-haptic feedback for objects in the user's environment. SonoHaptics provides global feedback that is unique to each object in the scene, and local feedback to amplify differences between nearby objects. Our comparative evaluation shows that SonoHaptics enables accurate object identification and selection in a cluttered scene without visual feedback.
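The mapping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the names `SceneObject`, `Cue`, `global_cue`, and `amplify_local`, the 220 to 880 Hz pitch range, the timbre table, and the mapping directions (lighter means higher pitch, larger means louder) are all assumptions chosen for illustration, whereas the paper derives its mappings from perception data.

```python
from dataclasses import dataclass, replace
import math

# Hypothetical pitch range; the paper's actual range is not given here.
PITCH_MIN_HZ, PITCH_MAX_HZ = 220.0, 880.0

# Hypothetical material-to-timbre table, for illustration only.
TIMBRE_BY_MATERIAL = {"metal": "bell", "wood": "marimba", "glass": "chime"}


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


@dataclass(frozen=True)
class SceneObject:
    name: str
    lightness: float    # color lightness, normalized to 0..1
    size: float         # object size, normalized to 0..1
    azimuth_deg: float  # horizontal position, -90 (left) .. 90 (right)
    material: str


@dataclass(frozen=True)
class Cue:
    pitch_hz: float
    amplitude: float    # 0..1
    pan: float          # -1 (left) .. 1 (right)
    timbre: str


def global_cue(obj):
    """Map one object's visual features to an audio-haptic cue."""
    octaves = math.log2(PITCH_MAX_HZ / PITCH_MIN_HZ)
    # Assumed direction: lighter colors map to higher pitch (log scale).
    pitch = PITCH_MIN_HZ * 2 ** (octaves * clamp(obj.lightness, 0.0, 1.0))
    # Assumed direction: larger objects map to higher amplitude.
    amplitude = 0.2 + 0.8 * clamp(obj.size, 0.0, 1.0)
    pan = clamp(obj.azimuth_deg / 90.0, -1.0, 1.0)
    timbre = TIMBRE_BY_MATERIAL.get(obj.material, "sine")
    return Cue(pitch, amplitude, pan, timbre)


def amplify_local(cues, gain=2.0):
    """Spread the pitches of nearby objects away from their shared mean.

    Works in log-pitch space so the spread is perceptually even, and
    clamps the result to the allowed pitch range.
    """
    if len(cues) < 2:
        return list(cues)
    logs = [math.log2(c.pitch_hz) for c in cues]
    center = sum(logs) / len(logs)
    lo, hi = math.log2(PITCH_MIN_HZ), math.log2(PITCH_MAX_HZ)
    return [
        replace(c, pitch_hz=2 ** clamp(center + gain * (lp - center), lo, hi))
        for c, lp in zip(cues, logs)
    ]


if __name__ == "__main__":
    mug = SceneObject("mug", 0.4, 0.3, -10.0, "ceramic")
    cup = SceneObject("cup", 0.5, 0.3, 10.0, "ceramic")
    global_cues = [global_cue(mug), global_cue(cup)]
    for obj, cue in zip([mug, cup], amplify_local(global_cues)):
        print(obj.name, cue)
```

In this sketch, two similar neighboring objects receive similar global cues, and `amplify_local` widens the pitch gap between them so they are easier to tell apart while the gaze moves across a cluttered region.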
