Navig-AI-tion: Navigation by Contextual AI and Spatial Audio

Mathias N. Lystbæk, Haley Adams, Ranjith Kagathi Ananda, Eric J Gonzalez, Luca Ballan, Qiuxuan Wu, Andrea Colaço, Peter Tan, Mar Gonzalez-Franco

Abstract

Audio-only walking navigation can leave users disoriented: instructions rely on vague cardinal directions and lack real-time environmental context, which leads to frequent errors. To address this, we present a system that integrates a Vision Language Model (VLM) with a spatial audio cue. The system extracts environmental landmarks to anchor navigation instructions and, crucially, plays a directional spatial audio signal when the user faces the wrong direction, indicating the precise turn direction. In a user study (n=12), the spatial audio cue with VLM reduced route deviations compared to both the VLM-only and Google Maps (audio-only) baseline systems. Users reported that the spatial audio cue effectively supported orientation and that landmark-anchored instructions provided a better navigation experience than audio-only Google Maps. This work offers an initial look at the value of incorporating directional cues, especially real-time corrective spatial audio, into future audio-only navigation systems.

Paper Structure (26 sections, 18 figures, 1 table)

Figures (18)

  • Figure 1: (a) Three color-coded routes, marked with route start and end for each route RX, where "X" is 1 to 3, along with color-matched stars and circles indicating landmarks along the routes. "RX D-E" indicates the dead end for each route. (b) The apparatus setup for the prototype and evaluation. (c) Box plots showing results on Distance Walked (m) and Route Deviations (count). Whiskers indicate minimum and maximum. Statistical significance is shown as * for $p<0.05$, ** for $p<0.01$, and *** for $p<0.001$.
  • Figure 2: (a) Three color-coded routes, marked with route start and end for each route RX, where "X" is 1 to 3, along with color-matched stars and circles indicating landmarks along the routes. "RX D-E" indicates the dead end for each route. (b) The apparatus setup for the prototype and evaluation. (c) Box plots showing results on Distance Walked (m) and Route Deviations (count). Whiskers indicate minimum and maximum. Statistical significance is shown as * for $p<0.05$, ** for $p<0.01$, and *** for $p<0.001$.
  • Figure 3: Results on the Distance Walked overall and for each route separately.
  • Figure 4: Results on the Number of Deviations overall and for each route separately.
  • Figure 5: Results on Pointing Accuracy overall and for each route separately.
  • ...and 13 more figures