BEV-ODOM: Reducing Scale Drift in Monocular Visual Odometry with BEV Representation

Yufei Wei, Sha Lu, Fuzhang Han, Rong Xiong, Yue Wang

TL;DR

This paper presents BEV-ODOM, a novel monocular visual odometry (MVO) framework that uses a Bird’s Eye View (BEV) representation to address scale drift. The results indicate that BEV-ODOM outperforms current MVO methods, with reduced scale drift and higher accuracy.

Abstract

Monocular visual odometry (MVO) is vital in autonomous navigation and robotics, providing a cost-effective and flexible motion tracking solution, but the inherent scale ambiguity in monocular setups often leads to cumulative errors over time. In this paper, we present BEV-ODOM, a novel MVO framework leveraging the Bird's Eye View (BEV) Representation to address scale drift. Unlike existing approaches, BEV-ODOM integrates a depth-based perspective-view (PV) to BEV encoder, a correlation feature extraction neck, and a CNN-MLP-based decoder, enabling it to estimate motion across three degrees of freedom without the need for depth supervision or complex optimization techniques. Our framework reduces scale drift in long-term sequences and achieves accurate motion estimation across various datasets, including NCLT, Oxford, and KITTI. The results indicate that BEV-ODOM outperforms current MVO methods, demonstrating reduced scale drift and higher accuracy.
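
To illustrate why per-frame scale errors accumulate in a three-degree-of-freedom setting, the following minimal sketch chains relative planar motions (forward, lateral, yaw) into a trajectory. It is not the paper's implementation; the function names `compose_se2` and `integrate`, the (dx, dy, dyaw) parameterization, and the toy inputs are assumptions made for illustration only.

```python
import math

def compose_se2(pose, delta):
    """Apply a relative motion (dx, dy, dyaw), expressed in the current
    body frame, to a global pose (x, y, yaw). Hypothetical helper."""
    x, y, yaw = pose
    dx, dy, dyaw = delta
    c, s = math.cos(yaw), math.sin(yaw)
    # Rotate the body-frame translation into the global frame, then add it.
    nx = x + c * dx - s * dy
    ny = y + s * dx + c * dy
    # Wrap the heading into [-pi, pi).
    nyaw = (yaw + dyaw + math.pi) % (2.0 * math.pi) - math.pi
    return (nx, ny, nyaw)

def integrate(deltas, start=(0.0, 0.0, 0.0)):
    """Chain relative motions into a list of global poses."""
    traj = [start]
    for d in deltas:
        traj.append(compose_se2(traj[-1], d))
    return traj

# Toy example: ten 1 m forward steps, and the same steps with a
# 10% translation scale error applied to every frame.
true_steps = [(1.0, 0.0, 0.0)] * 10
drifted_steps = [(1.1 * dx, 1.1 * dy, dyaw) for dx, dy, dyaw in true_steps]
true_end = integrate(true_steps)[-1]
drifted_end = integrate(drifted_steps)[-1]
```

In this toy case the end-point error grows linearly with the distance travelled, which is the kind of cumulative effect that scale drift produces over long sequences.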

Paper Structure

This paper contains 16 sections, 5 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Comparison of MVO approaches: traditional methods lack consistent scaling; learning-based methods require additional supervision. In contrast, our method achieves low scale drift using only pose supervision with BEV representation.
  • Figure 2: Overview of the proposed framework.
  • Figure 3: BEV-ODOM's intermediate process and outcomes: predicted and actual trajectories (top left), camera images at four positions (A-D, top right), and visualizations of the BEV feature maps and BEV optical flow (bottom).
  • Figure 4: Trajectory comparisons on NCLT, Oxford, and KITTI datasets. For the NCLT (a) and Oxford (b) datasets, the left panels show full test paths and the right panels show selected subsets. For the KITTI (c) dataset, sequences 09 (left) and 10 (right) are displayed.
  • Figure 5: Logarithmic scale factor variation along the path.
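
A quantity like the one plotted in Figure 5 can be approximated as follows. This is a sketch under an assumed definition, since this page does not give the paper's exact formula: the scale factor at each step is taken as the ratio of predicted to ground-truth translation length, and its natural logarithm is reported so that 0 means correct scale. The function name `log_scale_factors` and the sample values are hypothetical.

```python
import math

def log_scale_factors(pred_steps, gt_steps, eps=1e-9):
    """Per-step log of (predicted translation length / true length).

    pred_steps, gt_steps: sequences of (dx, dy) relative translations.
    Steps where either length is below eps yield None, because the
    ratio is undefined (or its log is unbounded) when there is no motion.
    """
    out = []
    for (px, py), (gx, gy) in zip(pred_steps, gt_steps):
        p = math.hypot(px, py)
        g = math.hypot(gx, gy)
        if p < eps or g < eps:
            out.append(None)
        else:
            out.append(math.log(p / g))
    return out

# Made-up sample values: correct scale, doubled scale, stationary frame.
pred = [(1.0, 0.0), (2.0, 0.0), (0.5, 0.0)]
gt = [(1.0, 0.0), (1.0, 0.0), (0.0, 0.0)]
factors = log_scale_factors(pred, gt)
```

On a logarithmic axis, over- and under-estimation by the same ratio appear symmetric around zero, which makes drift in either direction easy to compare.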