SMF-VO: Direct Ego-Motion Estimation via Sparse Motion Fields

Sangheon Yang, Yeongin Yoon, Hong Mo Jung, Jongwoo Lim

TL;DR

SMF-VO proposes a motion-centric paradigm that directly estimates ego-motion as instantaneous linear and angular velocity from sparse optical flow, bypassing explicit pose estimation and dense landmark maps. It uses a generalized 3D ray-based motion field to accommodate diverse camera models, including fisheye lenses, and solves a linear least-squares problem per frame, with RANSAC for outlier rejection and an optional lightweight nonlinear refinement to curb drift. The approach runs in real time (>100 FPS) on a CPU-only Raspberry Pi 5 while achieving competitive accuracy on the EuRoC, KITTI, and TUM-VI Room benchmarks, which makes it well suited to mobile robotics and wearables. The work offers a practical, scalable alternative to pose-centric VO/VIO for resource-constrained devices and a broad range of camera systems; future extensions include event-camera integration and IMU fusion.

Abstract

Traditional Visual Odometry (VO) and Visual Inertial Odometry (VIO) methods rely on a 'pose-centric' paradigm, which computes absolute camera poses from a local map and thus requires large-scale landmark maintenance and continuous map optimization. This approach is computationally expensive, which limits real-time performance on resource-constrained devices. To overcome these limitations, we introduce Sparse Motion Field Visual Odometry (SMF-VO), a lightweight, 'motion-centric' framework. Our approach directly estimates instantaneous linear and angular velocity from sparse optical flow, bypassing the need for explicit pose estimation or expensive landmark tracking. We also employ a generalized 3D ray-based motion field formulation that works accurately with various camera models, including wide-field-of-view lenses. SMF-VO demonstrates superior efficiency and competitive accuracy on benchmark datasets, achieving over 100 FPS on a Raspberry Pi 5 using only a CPU. Our work establishes a scalable and efficient alternative to conventional methods, making it highly suitable for mobile robotics and wearable devices.
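
To make the idea of a per-frame linear solve concrete, here is a minimal sketch of how linear and angular velocity could be recovered from a ray-based motion field. It is not the authors' implementation. It assumes a static scene, unit bearing rays r_i with known depths d_i (for example from stereo, which is an assumption of this sketch), and the standard motion-field relation dr_i/dt = -(1/d_i)(I - r_i r_i^T) v + r_i x w. The function names `skew`, `ray_motion_field`, and `estimate_velocity` are hypothetical, and RANSAC and nonlinear refinement are omitted.

```python
import numpy as np


def skew(a):
    """Return the 3x3 matrix S such that S @ b equals the cross product a x b."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])


def ray_motion_field(rays, depths, v, w):
    """Predict the time derivative of each unit ray for a static scene.

    rays:   (N, 3) unit bearing vectors in the camera frame
    depths: (N,) distances along each ray (assumed known in this sketch)
    v, w:   camera linear and angular velocity, expressed in the camera frame
    """
    flows = []
    for r, d in zip(rays, depths):
        tangent_proj = np.eye(3) - np.outer(r, r)  # removes the component along r
        flows.append(-tangent_proj @ v / d + skew(r) @ w)
    return np.array(flows)


def estimate_velocity(rays, depths, ray_flow):
    """Solve the stacked linear system A [v; w] = b by ordinary least squares."""
    blocks = []
    for r, d in zip(rays, depths):
        tangent_proj = np.eye(3) - np.outer(r, r)
        # One 3x6 block per ray: the left half multiplies v, the right half w.
        blocks.append(np.hstack([-tangent_proj / d, skew(r)]))
    A = np.vstack(blocks)            # shape (3N, 6)
    b = ray_flow.reshape(-1)         # shape (3N,), same row ordering as A
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[:3], x[3:]


if __name__ == "__main__":
    # Synthetic check: generate noise-free ray flow, then recover the velocities.
    rng = np.random.default_rng(0)
    points = rng.uniform(-1.0, 1.0, size=(50, 3)) + np.array([0.0, 0.0, 4.0])
    depths = np.linalg.norm(points, axis=1)
    rays = points / depths[:, None]
    v_true = np.array([0.3, -0.1, 0.5])
    w_true = np.array([0.02, 0.05, -0.01])
    flow = ray_motion_field(rays, depths, v_true, w_true)
    v_est, w_est = estimate_velocity(rays, depths, flow)
    print("v error:", np.linalg.norm(v_est - v_true))
    print("w error:", np.linalg.norm(w_est - w_true))
```

Because the unknowns enter linearly once the rays and depths are fixed, each frame reduces to a small 6-unknown least-squares problem, which is consistent with the low per-frame cost the abstract reports. Working with bearing rays instead of pixel coordinates is what lets the same equations apply to pinhole and wide-field-of-view cameras alike.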

Paper Structure

This paper contains 14 sections, 15 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: An overall comparison of the accuracy and speed of VO and VIO algorithms. Our algorithm directly estimates camera motion from the visual motion field. Thus, it runs much faster than conventional VO algorithms without sacrificing accuracy.
  • Figure 2: Overview of the framework we propose.
  • Figure 3: Qualitative trajectory results for EuRoC sequences. The estimated trajectories of SMF-VO (ours), ORB-SLAM3 [Campos21] (VO), Basalt [Usenko19] (VO), OpenVINS [Geneva20] (VIO), and OKVIS2 [Leutenegger22] (VO) are aligned to the ground truth pose at the first frame.
  • Figure 4: Qualitative trajectory results for KITTI sequences. The estimated trajectories of SMF-VO (ours), ORB-SLAM3 [Campos21] (VO), Basalt [Usenko19] (VO), and VINS-Fusion [Qin18] (VO) are aligned to the ground truth pose at the first frame.
  • Figure 5: Qualitative trajectory results for TUM-VI Room sequences. The estimated trajectories of SMF-VO (ours), ORB-SLAM3 [Campos21] (VO), Basalt [Usenko19] (VO), and OKVIS2 [Leutenegger22] (VO) are aligned to the ground truth pose at the first frame.