Multi-Camera Visual-Inertial Simultaneous Localization and Mapping for Autonomous Valet Parking
Marcus Abate, Ariel Schwartz, Xue Iuan Wong, Wangdong Luo, Rotem Littman, Marc Klinger, Lars Kuhnert, Douglas Blue, Luca Carlone
TL;DR
The paper tackles robust multi-camera VI-SLAM for autonomous valet parking in GPS-denied environments. It extends Kimera to fuse multiple monocular cameras with IMU and wheel odometry, introduces monocular loop-closure alternatives, including scale-less pose and rotation-only variants, and builds a dense free-space map by projecting a CNN-detected ground plane into 3D with a homography and feeding the result to Kimera-Semantics. Across photo-realistic simulations and Ford datasets, the system achieves an average trajectory error of less than $1\%$ of the trajectory length over more than $8\,\text{km}$ of driving. The approach delivers robust, globally consistent localization and mapping for autonomous parking without relying on LiDAR.
Abstract
Localization and mapping are key capabilities for self-driving vehicles. In this paper, we build on Kimera and extend it to use multiple cameras as well as external (e.g., wheel) odometry sensors, to obtain accurate and robust odometry estimates in real-world problems. Additionally, we propose an effective scheme for closing loops that circumvents the drawbacks of common alternatives based on the Perspective-n-Point method and also works with a single monocular camera. Finally, we develop a method for dense 3D mapping of the free space that combines a segmentation network for free-space detection with a homography-based dense mapping technique. We test our system on photo-realistic simulations and on several real datasets collected on a car prototype developed by the Ford Motor Company, spanning both indoor and outdoor parking scenarios. Our multi-camera system is shown to outperform state-of-the-art open-source visual-inertial SLAM pipelines (Vins-Fusion, ORB-SLAM3), and exhibits an average trajectory error under 1% of the trajectory length across more than 8 km of distance traveled (combined across all datasets). A video showcasing the system is available at: youtu.be/H8CpzDpXOI8.
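
To make the headline metric concrete, the following minimal Python sketch shows one way an "error as a percentage of trajectory length" figure could be computed. It is an illustration under stated assumptions, not the evaluation procedure used in the paper: the function names are hypothetical, the two trajectories are assumed to be already time-associated and expressed in the same frame, and the error is taken as the mean Euclidean position error.

```python
import math

def path_length(positions):
    """Sum of straight-line distances between consecutive positions."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))

def mean_position_error(estimated, ground_truth):
    """Mean Euclidean distance between corresponding positions.

    Assumption: both trajectories are already aligned and sampled
    at the same timestamps (no alignment is performed here).
    """
    if len(estimated) != len(ground_truth):
        raise ValueError("trajectories must have the same number of poses")
    if not estimated:
        raise ValueError("trajectories must not be empty")
    total = sum(math.dist(e, g) for e, g in zip(estimated, ground_truth))
    return total / len(estimated)

def relative_error_percent(estimated, ground_truth):
    """Mean position error as a percentage of ground-truth path length."""
    length = path_length(ground_truth)
    if length == 0.0:
        raise ValueError("ground-truth trajectory has zero length")
    return 100.0 * mean_position_error(estimated, ground_truth) / length

# Hypothetical data: a 100 m straight drive, estimate offset by 0.5 m.
ground_truth = [(10.0 * i, 0.0) for i in range(11)]
estimated = [(x, y + 0.5) for x, y in ground_truth]
print(relative_error_percent(estimated, ground_truth))  # 0.5
```

In this toy case a constant 0.5 m offset over a 100 m path gives 0.5%, which would fall under the 1% threshold quoted in the abstract.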
