MCVO: A Generic Visual Odometry for Arbitrarily Arranged Multi-Cameras
Huai Yu, Junhao Wang, Yao He, Wen Yang, Gui-Song Xia
TL;DR
MCVO introduces a generic multi-camera visual odometry framework that tolerates arbitrary camera arrangements and outputs metric-scale poses without relying on IMUs. It shifts heavy feature processing to a GPU-accelerated frontend, initializes scale via cross-camera trajectory consistency, and refines pose and scale in a backend with loop closure and cross-camera BoW fusion. The approach demonstrates improved pose accuracy and robustness on KITTI-360 and MultiCamData, outperforming several baselines and showing better generalization to non-overlapping camera configurations. This work offers a flexible, scalable solution for vision-only SLAM on diverse robotic platforms, with practical benefits for autonomous navigation in texture-challenged and dynamic environments.
Abstract
Making multi-camera visual SLAM systems easier to set up and more robust to the environment is attractive for vision-based robots. Existing monocular and binocular visual SLAM systems have a narrow sensing Field-of-View (FoV), resulting in degraded accuracy and limited robustness in textureless environments. Multi-camera SLAM systems are therefore gaining attention because they provide redundancy with a much wider FoV. However, the often arbitrary placement and orientation of multiple cameras make pose scale estimation and system updating challenging. To address these problems, we propose a robust visual odometry system for rigidly-bundled, arbitrarily-arranged multi-cameras, namely MCVO, which achieves metric-scale state estimation with high flexibility in the cameras' arrangement. Specifically, we first design a learning-based feature tracking framework that offloads the processing of multiple video streams from the CPU to the GPU. Then we initialize the odometry system with metric-scale poses under the rigid constraints between the moving cameras. Finally, we fuse the features of the multiple cameras in the back-end to achieve robust pose estimation and online scale optimization. Additionally, multi-camera features help improve loop detection for pose graph optimization. Experiments on the KITTI-360 and MultiCamData datasets validate its robustness with arbitrarily arranged cameras. Compared with other stereo and multi-camera visual SLAM systems, our method obtains higher pose accuracy with better generalization ability. Our code and online demos are available at https://github.com/JunhaoWang615/MCVO
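
To illustrate why rigid constraints between moving cameras can yield metric scale without an IMU, the following is a minimal sketch of one standard formulation (the hand-eye relation T_a X = X T_b between two rigidly linked cameras). It is not taken from the MCVO code and may differ from the authors' initialization. The function names, the extrinsics (R_x, t_x), and all numeric values are assumptions made up for this example. Each camera is assumed to supply its relative rotation and an up-to-scale translation over the same time intervals, with one unknown scale per camera.

```python
import numpy as np

def rot_z(angle):
    """Rotation by `angle` radians about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def solve_scales(motions_a, dirs_b, R_x, t_x):
    """Least-squares metric scales (s_a, s_b) of two rigidly linked cameras.

    motions_a: list of (R_a, u_a), camera A relative rotation and
               up-to-scale translation for each interval
    dirs_b:    list of u_b, camera B up-to-scale translation per interval
    R_x, t_x:  known extrinsics mapping camera-B coordinates into camera A
    """
    rows, rhs = [], []
    for (R_a, u_a), u_b in zip(motions_a, dirs_b):
        # Translation part of T_a X = X T_b:
        #   s_a * u_a - s_b * (R_x u_b) = (I - R_a) t_x
        rows.append(np.column_stack([u_a, -R_x @ u_b]))
        rhs.append((np.eye(3) - R_a) @ t_x)
    A = np.vstack(rows)          # shape (3 * n_intervals, 2)
    b = np.concatenate(rhs)      # shape (3 * n_intervals,)
    scales, *_ = np.linalg.lstsq(A, b, rcond=None)
    return scales

# Synthetic check with hypothetical values (not from the paper).
R_x = rot_z(np.pi / 2)                  # camera B looks sideways relative to A
t_x = np.array([0.5, 0.0, 0.2])         # metric baseline between the cameras
true_scales = np.array([2.0, 0.5])      # scales hidden from the solver

metric_motions_a = [
    (rot_z(0.1), np.array([1.0, 0.1, 0.0])),
    (rot_z(-0.2), np.array([0.8, -0.2, 0.05])),
]

motions_a, dirs_b = [], []
for R_a, t_a in metric_motions_a:
    # Metric translation of camera B implied by the rigid link.
    t_b = R_x.T @ (R_a @ t_x + t_a - t_x)
    # Each monocular tracker only observes translation up to its own scale.
    motions_a.append((R_a, t_a / true_scales[0]))
    dirs_b.append(t_b / true_scales[1])

scales = solve_scales(motions_a, dirs_b, R_x, t_x)
print(scales)
```

Note that the right-hand side (I - R_a) t_x vanishes when the rig undergoes pure translation, so in this formulation the scales are observable only when the motion includes rotation and the cameras are separated by a nonzero baseline.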
