Motor Focus: Fast Ego-Motion Prediction for Assistive Visual Navigation

Hao Wang, Jiayou Qin, Xiwen Chen, Ashish Bastola, John Suchanek, Zihao Gong, Abolfazl Razi

TL;DR

Motor Focus predicts pedestrian ego-motion from monocular video on mobile devices for assistive visual navigation. It introduces a lightweight image-based framework that uses dense optical flow and an SVD-based ego-motion model to separate camera-induced motion from scene dynamics, followed by Gaussian attention smoothing to stabilize the movement focus. The approach is validated on a self-collected dataset and runs in real time (FPS > 40) with an MAE of around 60 pixels and an SNR near 23 dB, outperforming classical feature detectors. This enables real-time prioritization of notifications in complex environments without camera calibration, improving usability for visually impaired users.

Abstract

Assistive visual navigation systems for visually impaired individuals have become increasingly popular thanks to the rise of mobile computing. Most of these devices work by translating visual information into voice commands. In complex scenarios where multiple objects are present, it is imperative to prioritize object detection and provide immediate notifications for key entities in specific directions. This creates the need to identify the observer's motion direction (ego-motion) from visual information alone, which is the key contribution of this paper. Specifically, we introduce Motor Focus, a lightweight image-based framework that predicts ego-motion, i.e., the movement intentions of humans (and humanoid machines), from their visual feeds while filtering out camera motion without any camera calibration. To this end, we implement an optical flow-based pixel-wise temporal analysis method to compensate for the camera motion, with a Gaussian aggregation to smooth out the movement prediction area. To evaluate the performance, we collect a dataset of 50 clips of pedestrian scenes in 5 different scenarios. We compare this framework against classical feature detectors such as SIFT and ORB. Our framework demonstrates its superiority in speed (> 40 FPS), accuracy (MAE = 60 pixels), and robustness (SNR = 23 dB), confirming its potential to enhance the usability of vision-based assistive navigation tools in complex environments.
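
The camera-motion compensation step can be illustrated with a minimal sketch. It assumes that camera-induced motion is well approximated by a single global affine motion model fitted to the dense flow field by least squares (NumPy's `lstsq` uses an SVD-based solver). This is a stand-in technique chosen for illustration, not the paper's exact SVD formulation, and the function name `compensate_ego_motion` is hypothetical.

```python
import numpy as np


def compensate_ego_motion(flow):
    """Split a dense flow field into a global (camera) part and a residual.

    flow: array of shape (H, W, 2) with per-pixel (dx, dy) displacements.
    Returns (camera_flow, residual_flow), both of shape (H, W, 2).

    Assumption: camera motion is modeled as one affine map over the image.
    """
    h, w, _ = flow.shape
    ys, xs = np.mgrid[0:h, 0:w]

    # Design matrix with one row [x, y, 1] per pixel.
    A = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=1)
    A = A.astype(np.float64)
    b = flow.reshape(-1, 2).astype(np.float64)

    # Least-squares fit of the affine parameters (3 x 2 matrix);
    # lstsq relies on an SVD-based LAPACK routine.
    params, *_ = np.linalg.lstsq(A, b, rcond=None)

    # Flow explained by the global model, and what remains after removing it.
    camera_flow = (A @ params).reshape(h, w, 2)
    residual_flow = flow - camera_flow
    return camera_flow, residual_flow
```

In this sketch, pixels whose residual magnitude stays large after the fit are the ones that move differently from the global pattern, which is the kind of signal a compensated optical flow map highlights. A plain least-squares fit is sensitive to large moving objects, so a robust estimator may be preferable in practice.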

Paper Structure

This paper contains 12 sections, 12 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: Concept of Assistive Visual Navigation
  • Figure 2: Motor focus visualization: (a) the raw RGB image, (b) the compensated optical flow map, (c) the identified attention points of 10 consecutive frames, and (d) the attention map aggregated from the Gaussian distributions of the attention points in (c).
  • Figure 3: The proposed framework: (a) a pair of consecutive frames, (b) the original optical flow map (magnitude), (c) the original optical flow field (vector), (d) the compensated optical flow map, (e) the camera motion $\epsilon$, (f) the compensated optical flow field, (g) the probability map of the attention point for $I_2$, (h) the aggregated Gaussian distribution of attention points from (g), and (i) the attention map for the motor focus of frame $I_2$.
  • Figure 4: Samples from the collected dataset.
  • Figure 5: Visualization of ego-motion compensation. Each image consists of four cells, from left to right: grayscale image with the predicted moving direction, the magnitude of camera motion $\epsilon$ (ego-motion), raw optical flow (vanilla dense optical flow), and optical flow with ego-motion compensation.
  • ...and 1 more figures
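
The aggregation described in the Figure 2 caption, where attention points from several consecutive frames are merged into one attention map through Gaussian distributions, can be sketched as follows. This is a minimal illustration under stated assumptions: each attention point contributes an isotropic Gaussian of equal weight, and the standard deviation `sigma` and the helper names `gaussian_attention_map` and `focus_point` are hypothetical choices, not values or names taken from the paper.

```python
import numpy as np


def gaussian_attention_map(points, shape, sigma=15.0):
    """Aggregate (x, y) attention points into a normalized attention map.

    points: iterable of (x, y) pixel coordinates, e.g. one per recent frame.
    shape:  (height, width) of the output map.
    sigma:  standard deviation of each Gaussian in pixels (assumed value).
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros((h, w), dtype=np.float64)

    # Add one isotropic Gaussian bump per attention point.
    for px, py in points:
        sq_dist = (xs - px) ** 2 + (ys - py) ** 2
        heat += np.exp(-sq_dist / (2.0 * sigma ** 2))

    # Normalize so the map sums to 1 (left as all zeros if no points given).
    total = heat.sum()
    return heat / total if total > 0 else heat


def focus_point(attention):
    """Return the (x, y) location of the attention map's maximum."""
    row, col = np.unravel_index(np.argmax(attention), attention.shape)
    return int(col), int(row)
```

Summing Gaussians over a short window of frames means a single outlying attention point shifts the peak only slightly, which is one way to obtain the smoothing effect the caption describes.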