AI Guide Dog: Egocentric Path Prediction on Smartphone

Aishwarya Jadhav, Jeffery Cao, Abhishree Shetty, Urvashi Priyam Kumar, Aditi Sharma, Ben Sukboontip, Jayant Sravan Tamarapalli, Jingyi Zhang, Anirudh Koul

TL;DR

This work tackles safe, autonomous navigation for visually impaired users by delivering a lightweight, on-device system that predicts egocentric directional actions from first-person video. It introduces an intent-conditioned, multi-label framework that supports both goal-based outdoor navigation, which uses GPS and high-level maneuver signals, and exploratory indoor travel without a destination. The authors release a newly collected egocentric video dataset with mobile sensor data and demonstrate the approach on an iPhone 13 with CoreML, enabling real-time, privacy-preserving, on-device inference. Overall, the results show that incorporating intent signals significantly improves outdoor performance while maintaining practical latency, offering a step toward accessible, autonomous navigation for visually impaired individuals.

Abstract

This paper presents AI Guide Dog (AIGD), a lightweight egocentric (first-person) navigation system for visually impaired users, designed for real-time deployment on smartphones. AIGD employs a vision-only multi-label classification approach to predict directional commands, ensuring safe navigation across diverse environments. We introduce a novel technique for goal-based outdoor navigation by integrating GPS signals and high-level directions, while also handling uncertain multi-path predictions for destination-free indoor navigation. As the first navigation assistance system to handle both goal-oriented and exploratory navigation across indoor and outdoor settings, AIGD establishes a new benchmark in blind navigation. We present methods, datasets, evaluations, and deployment insights to encourage further innovations in assistive navigation systems.
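
To make the multi-label framing concrete, the following is a minimal sketch of how per-direction scores might be decoded into a single command. It is an illustration only, not the paper's method: the label set `DIRECTIONS`, the 0.5 threshold, and the helper names `decode_multilabel` and `choose_command` are all assumptions introduced here, and the intent is applied as a simple post-hoc tie-breaker rather than as a model input.

```python
import math

# Assumed label set, for illustration only; the paper's actual labels may differ.
DIRECTIONS = ("left", "straight", "right")


def sigmoid(x: float) -> float:
    """Map a raw logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits, threshold=0.5):
    """Score each direction independently and keep those above the threshold.

    Unlike a softmax, several directions can be active at once, which is how
    a multi-label head can represent an intersection with multiple valid paths.
    """
    scores = {d: sigmoid(z) for d, z in zip(DIRECTIONS, logits)}
    walkable = [d for d in DIRECTIONS if scores[d] >= threshold]
    return scores, walkable


def choose_command(logits, intent=None, threshold=0.5):
    """Pick one command, preferring the intent when it is predicted walkable.

    `intent` stands in for a high-level direction (hypothetical here, e.g.
    derived from a GPS route). Without a usable intent, fall back to the
    highest-scoring direction.
    """
    scores, walkable = decode_multilabel(logits, threshold)
    if intent is not None and intent in walkable:
        return intent
    return max(DIRECTIONS, key=lambda d: scores[d])


if __name__ == "__main__":
    example_logits = [1.2, 0.8, -2.0]  # made-up values, not model output
    print(decode_multilabel(example_logits)[1])
    print(choose_command(example_logits, intent="straight"))
```

In this sketch, an exploratory (destination-free) call simply omits `intent`, while a goal-based call passes the next high-level maneuver and lets it select among the directions the model considers walkable.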

Paper Structure (17 sections, 10 figures, 5 tables)

Figures (10)

  • Figure 1: AIGD requires just a smartphone camera and predicts future navigation direction labels.
  • Figure 2: System Architecture for AIGD.
  • Figure 3: Labeling scheme for frames sampled at 1 FPS. Red blocks denote other walkable directions at an intersection.
  • Figure 4: Data example for one timestep.
  • Figure 5: ConvLSTM with and without intent modifications.
  • ...and 5 more figures