MCVO: A Generic Visual Odometry for Arbitrarily Arranged Multi-Cameras

Huai Yu, Junhao Wang, Yao He, Wen Yang, Gui-Song Xia

TL;DR

MCVO introduces a generic multi-camera visual odometry framework that tolerates arbitrary camera arrangements and outputs metric-scale poses without relying on IMUs. It shifts heavy feature processing to a GPU-accelerated frontend, initializes scale via cross-camera trajectory consistency, and refines pose and scale in a backend with loop closure and cross-camera BoW fusion. On KITTI-360 and MultiCamData, the approach shows improved pose accuracy and robustness, outperforming several baselines and generalizing better to non-overlapping camera configurations. This work offers a flexible, scalable solution for vision-only SLAM on diverse robotic platforms, with practical benefits for autonomous navigation in texture-challenged and dynamic environments.

Abstract

Making multi-camera visual SLAM systems easier to set up and more robust to the environment is attractive for vision robots. Existing monocular and binocular vision SLAM systems have a narrow sensing Field-of-View (FoV), resulting in degraded accuracy and limited robustness in textureless environments. Thus multi-camera SLAM systems are gaining attention because they can provide redundancy with a much wider FoV. However, the usual arbitrary placement and orientation of multiple cameras make pose scale estimation and system updating challenging. To address these problems, we propose a robust visual odometry system for rigidly-bundled, arbitrarily-arranged multi-cameras, namely MCVO, which can achieve metric-scale state estimation with high flexibility in the cameras' arrangement. Specifically, we first design a learning-based feature tracking framework to shift the load of processing multiple video streams from the CPU to the GPU. Then we initialize the odometry system with metric-scale poses under the rigid constraints between moving cameras. Finally, we fuse the features of the multi-cameras in the back-end to achieve robust pose estimation and online scale optimization. Additionally, multi-camera features help improve the loop detection for pose graph optimization. Experiments on the KITTI-360 and MultiCamData datasets validate its robustness over arbitrarily arranged cameras. Compared with other stereo and multi-camera visual SLAM systems, our method obtains higher pose accuracy with better generalization ability. Our code and online demos are available at https://github.com/JunhaoWang615/MCVO
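To make the idea of recovering metric scale from rigid constraints between moving cameras more concrete, the following is a minimal sketch and not the authors' implementation. It assumes two cameras with known body-from-camera extrinsics (R_e, t_e) and per-camera monocular motions whose translations u1, u2 are known only up to unknown scales s1, s2. Since both cameras ride on the same rigid body, each motion interval must satisfy s1 R_e1 u1 - s2 R_e2 u2 = (I - R_b)(t_e2 - t_e1), which is linear in the two scales. The function names (`rot`, `estimate_scales`, `synthetic_example`) and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np


def rot(axis, angle):
    """Rodrigues formula: rotation by `angle` radians about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def estimate_scales(extrinsics, motions):
    """Least-squares estimate of the two monocular scales (s1, s2).

    extrinsics: [(R_e1, t_e1), (R_e2, t_e2)], body-from-camera transforms.
    motions:    list of (R_c1, u1, u2) per interval, where R_c1 is the relative
                rotation seen by camera 1 and u1, u2 are the up-to-scale
                relative translations of cameras 1 and 2.
    """
    (R_e1, t_e1), (R_e2, t_e2) = extrinsics
    A_rows, b_rows = [], []
    for R_c1, u1, u2 in motions:
        # Body rotation recovered from camera 1 (rotation is scale-free).
        R_b = R_e1 @ R_c1 @ R_e1.T
        # Three linear equations in (s1, s2) for this interval.
        A_rows.append(np.column_stack([R_e1 @ u1, -(R_e2 @ u2)]))
        b_rows.append((np.eye(3) - R_b) @ (t_e2 - t_e1))
    A = np.vstack(A_rows)
    b = np.concatenate(b_rows)
    scales, *_ = np.linalg.lstsq(A, b, rcond=None)
    return scales


def synthetic_example(true_scales=(2.5, 0.8), n_motions=10, seed=0):
    """Simulate a rigid two-camera rig and recover the hidden scales."""
    rng = np.random.default_rng(seed)
    # Hypothetical extrinsics: camera 2 is rotated 90 degrees about z.
    extrinsics = [
        (np.eye(3), np.array([0.5, 0.0, 0.0])),
        (rot([0, 0, 1], np.pi / 2), np.array([0.0, 0.4, 0.1])),
    ]
    motions = []
    for _ in range(n_motions):
        R_b = rot(rng.normal(size=3), rng.uniform(0.05, 0.3))
        t_b = rng.normal(size=3)
        per_cam = []
        for (R_e, t_e), s in zip(extrinsics, true_scales):
            # Camera motion induced by the body motion: T_c = X^-1 T_b X.
            R_c = R_e.T @ R_b @ R_e
            t_c = R_e.T @ (t_b - (np.eye(3) - R_b) @ t_e)
            per_cam.append((R_c, t_c / s))  # hide the metric scale
        (R_c1, u1), (_, u2) = per_cam
        motions.append((R_c1, u1, u2))
    return estimate_scales(extrinsics, motions)


if __name__ == "__main__":
    print("estimated scales:", synthetic_example())
```

Note that the right-hand side vanishes when the rig undergoes pure translation (R_b = I), in which case this constraint only fixes the ratio of the two scales; some rotation, together with a non-zero baseline between the cameras, is needed for the metric scale to be observable in this sketch.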

Paper Structure

This paper contains 22 sections, 6 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Illustration of the proposed MCVO system. An example of using 3-cam and 4-cam setups on KITTI-360 dataset Liao2023KITTI-360 for state estimation. The proposed MCVO performs better than ORB-SLAM3 Carlos2021Orb-slam3 using the front stereo camera.
  • Figure 2: The pipeline of the proposed MCVO system. Scale estimation and correction ensure scale stability.
  • Figure 3: Distribution of feature points. The proposed approach achieves more uniform feature distribution and better robustness than the score-based method. Red points: tracked features. Blue points: newly added features.
  • Figure 4: Illustration of the multi-camera re-localization.
  • Figure 5: Comparison of ATE for different methods on the KITTI-360 dataset.
  • ...and 3 more figures