SonoHaptics: An Audio-Haptic Cursor for Gaze-Based Object Selection in XR

Hyunsung Cho, Naveen Sendhilnathan, Michael Nebeling, Tianyi Wang, Purnima Padmanabhan, Jonathan Browder, David Lindlbauer, Tanya R. Jonker, Kashyap Todi

TL;DR

The paper tackles gaze-based object selection in XR when visual feedback is unavailable or unreliable due to display limitations. It proposes SonoHaptics, an audio-haptic cursor that uses data-driven cross-modal mappings from visual features (color lightness, size, position, material) to audio-haptic cues (pitch, amplitude, direction, timbre) and generates feedback automatically as users hover their gaze over objects. A perception study establishes reliable mappings (e.g., color lightness ↔ pitch, size ↔ amplitude) to ground the models, which are then used to provide global, object-level feedback and local feedback that amplifies differences between nearby objects in clutter. A comparative evaluation shows that SonoHaptics improves accuracy in cluttered scenes relative to non-visual baselines and can match or exceed text-to-speech in certain scenarios, suggesting potential for non-visual XR interaction and low-vision accessibility.

Abstract

We introduce SonoHaptics, an audio-haptic cursor for gaze-based 3D object selection. SonoHaptics addresses challenges around providing accurate visual feedback during gaze-based selection in Extended Reality (XR), e.g., lack of world-locked displays in no- or limited-display smart glasses and visual inconsistencies. To enable users to distinguish objects without visual feedback, SonoHaptics employs the concept of cross-modal correspondence in human perception to map visual features of objects (color, size, position, material) to audio-haptic properties (pitch, amplitude, direction, timbre). We contribute data-driven models for determining cross-modal mappings of visual features to audio and haptic features, and a computational approach to automatically generate audio-haptic feedback for objects in the user's environment. SonoHaptics provides global feedback that is unique to each object in the scene, and local feedback to amplify differences between nearby objects. Our comparative evaluation shows that SonoHaptics enables accurate object identification and selection in a cluttered scene without visual feedback.
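
The mapping idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: the linear form, the output ranges (`PITCH_RANGE_HZ`, `AMPLITUDE_RANGE`), the `SceneObject` type, and the rank-based local rescaling are all assumptions made for illustration. Only the direction of the mappings (lighter objects to higher pitch, larger objects to stronger vibration) and the global/local distinction come from the text.

```python
from dataclasses import dataclass

# Hypothetical output ranges; the paper's fitted parameters are not given here.
PITCH_RANGE_HZ = (200.0, 800.0)
AMPLITUDE_RANGE = (0.2, 1.0)


@dataclass
class SceneObject:
    name: str
    lightness: float  # CIELAB L*, 0 (black) to 100 (white)
    size: float       # normalized area, 0 (smallest) to 1 (largest)


def lerp(lo, hi, t):
    """Linear interpolation with t clamped to [0, 1]."""
    t = min(max(t, 0.0), 1.0)
    return lo + (hi - lo) * t


def global_feedback(obj):
    """Map one object's lightness to pitch (Hz) and size to vibration amplitude."""
    pitch = lerp(*PITCH_RANGE_HZ, obj.lightness / 100.0)
    amplitude = lerp(*AMPLITUDE_RANGE, obj.size)
    return pitch, amplitude


def local_pitches(neighbors):
    """Assumed local scheme: rank nearby objects by lightness and spread
    them evenly over the pitch range, so similar objects sound more distinct."""
    ranked = sorted(neighbors, key=lambda o: o.lightness)
    n = len(ranked)
    pitches = {}
    for i, obj in enumerate(ranked):
        # With a single object there is nothing to contrast against,
        # so fall back to the global lightness-based position.
        t = i / (n - 1) if n > 1 else obj.lightness / 100.0
        pitches[obj.name] = lerp(*PITCH_RANGE_HZ, t)
    return pitches


if __name__ == "__main__":
    cups = [SceneObject("cup_a", 50.0, 0.3), SceneObject("cup_b", 52.0, 0.3)]
    for cup in cups:
        print(cup.name, "global:", global_feedback(cup))
    print("local:", local_pitches(cups))
```

In this sketch, two objects of nearly equal lightness receive nearly equal global pitches, while the local step pushes them to opposite ends of the range. That is the kind of amplification between nearby objects the abstract describes, though the actual method may differ.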

Paper Structure

This paper contains 56 sections, 2 equations, 17 figures, and 3 tables.

Figures (17)

  • Figure 1: The "bouba/kiki" effect: Which one is 'bouba'? Which one is 'kiki'? Cross-modal correspondences enable us to perceive features across multiple sensory modalities, such as shapes visually or aurally.
  • Figure 2: Perception Study Setup: Participants were shown a cube that varied in color lightness and size. They used the right thumbstick of a Quest Pro controller to manipulate the pitch of an audio signal (left--right direction) and the amplitude of a vibration signal (up--down direction), and pressed the left controller's trigger button to confirm once they had selected the best-matching pitch and amplitude. In-ear stereo earphones were used for audio feedback. Four linear resonance actuators positioned at cardinal directions on a wristband provided haptic feedback (wristband illustrated to maintain anonymity).
  • Figure 3: Examples of individual lightness-to-audio pitch mappings in black ($R^2$ mean=0.72, SD=0.25, median=0.78) and area size-to-vibration amplitude mappings in blue ($R^2$ mean=0.56, SD=0.24, median=0.65).
  • Figure 4: One-to-one and compound mappings of lightness to pitch ($r$=0.709; $r$=0.530) and size to amplitude ($r$=0.567; $r$=0.345). The x-axis shows the lightness level in CIELAB color space (L0=black, L100=white; left) and the area size of the cube from small to large (right). The y-axis represents the pitch in Hz (left) and amplitude (right).
  • Figure 5: Pearson's correlation coefficients $r$ for lightness/size to pitch/amplitude mappings. In one-to-one mappings, only one of lightness or size changed, and participants could adjust only one of pitch or amplitude at a time. In compound mappings, both the lightness and the size of the cube changed simultaneously, and participants could adjust both pitch and amplitude at once.
  • ...and 12 more figures
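
The captions for Figures 3 to 5 report $R^2$ and Pearson's $r$ for the lightness-to-pitch and size-to-amplitude mappings. The sketch below shows how such statistics can be computed for one participant's responses. The data points are made up for illustration and are not from the study. The helper names `pearson_r` and `linear_fit_r2` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def linear_fit_r2(xs, ys):
    """R^2 of an ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative values only: CIELAB lightness shown vs. pitch (Hz) chosen.
    lightness = [0, 25, 50, 75, 100]
    chosen_pitch_hz = [220, 330, 400, 610, 700]
    print("r   =", pearson_r(lightness, chosen_pitch_hz))
    print("R^2 =", linear_fit_r2(lightness, chosen_pitch_hz))
```

For a single-predictor linear fit, $R^2$ equals $r^2$, so the two printed values are consistent by construction.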