Multi-Camera Visual-Inertial Simultaneous Localization and Mapping for Autonomous Valet Parking

Marcus Abate; Ariel Schwartz; Xue Iuan Wong; Wangdong Luo; Rotem Littman; Marc Klinger; Lars Kuhnert; Douglas Blue; Luca Carlone

TL;DR

The paper tackles robust multi-camera VI-SLAM for autonomous valet parking in GPS-denied environments. It extends Kimera to fuse multiple monocular cameras with IMU and wheel odometry, introduces monocular loop-closure alternatives including scale-less pose and rotation-only variants, and builds a dense free-space map by detecting the ground plane with a CNN and projecting it to 3D with a homography-based method that feeds Kimera-Semantics. Across photo-realistic simulations and Ford datasets, the system achieves a trajectory error of less than $1\%$ of the trajectory length over more than $8\,\text{km}$. The approach delivers robust, globally consistent localization and mapping for autonomous parking without relying on LiDAR, enabling safe navigation in complex garages.

Abstract

Localization and mapping are key capabilities for self-driving vehicles. In this paper, we build on Kimera and extend it to use multiple cameras as well as external (e.g., wheel) odometry sensors, to obtain accurate and robust odometry estimates in real-world problems. Additionally, we propose an effective scheme for closing loops that circumvents the drawbacks of common alternatives based on the Perspective-n-Point method and also works with a single monocular camera. Finally, we develop a method for dense 3D mapping of the free space that combines a segmentation network for free-space detection with a homography-based dense mapping technique. We test our system on photo-realistic simulations and on several real datasets collected on a car prototype developed by the Ford Motor Company, spanning both indoor and outdoor parking scenarios. Our multi-camera system is shown to outperform state-of-the-art open-source visual-inertial SLAM pipelines (Vins-Fusion, ORB-SLAM3), and exhibits an average trajectory error under 1% of the trajectory length across more than 8 km of distance traveled (combined across all datasets). A video showcasing the system is available at: youtu.be/H8CpzDpXOI8.
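To make the free-space mapping idea concrete, the following is a minimal sketch of how pixels labeled as ground by a segmentation mask could be lifted to 3D. It is not the authors' implementation: it assumes a pinhole camera with known intrinsics `K`, a known camera-to-world pose (`R_wc`, `t_wc`), and a flat ground plane at `z = 0` in the world frame, and it uses a per-pixel ray-plane intersection, which for a planar scene gives the same mapping as a plane-induced homography. The function name and all parameters are hypothetical.

```python
import numpy as np

def backproject_ground_pixels(mask, K, R_wc, t_wc):
    """Lift ground-labeled pixels onto the world plane z = 0.

    mask : (H, W) boolean array, True where the pixel is free-space ground.
    K    : (3, 3) pinhole intrinsics (assumed known and undistorted).
    R_wc : (3, 3) rotation taking camera-frame vectors to the world frame.
    t_wc : (3,) camera center expressed in the world frame.
    Returns an (N, 3) array of world points lying on z = 0.
    """
    v, u = np.nonzero(mask)                          # pixel rows, columns
    pix = np.stack([u, v, np.ones_like(u)]).astype(float)  # 3 x N homogeneous
    rays_w = R_wc @ np.linalg.solve(K, pix)          # viewing rays in world frame

    # Solve t_wc.z + s * ray.z = 0 for the scale s along each ray.
    with np.errstate(divide="ignore", invalid="ignore"):
        s = -t_wc[2] / rays_w[2]

    # Keep only intersections in front of the camera (drops rays at or
    # above the horizon, which never hit the ground plane).
    valid = np.isfinite(s) & (s > 0)
    points = t_wc[:, None] + rays_w[:, valid] * s[valid]
    return points.T
```

In a pipeline like the one described, points of this kind would be accumulated over time using the estimated vehicle poses; the flat-ground assumption is the main limitation of this sketch.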

Paper Structure (10 sections, 2 equations, 4 figures, 6 tables)

Introduction
Related Work
System Architecture
Hardware Architecture and Data Collection
Software Architecture
Experiments
Visual-Inertial Odometry
Loop-Closure Detection
Ground Plane Reconstruction
Conclusions

Figures (4)

Figure 1: (a) Illustration of the Ford test bed and sensor setup. (b) Four sample images from an outdoor dataset; the top two are from the front and right cameras onboard the car and the bottom two are the output of the semantic segmentation network that identifies free-space road for the mapping module. (c) Sample of outdoor trajectories collected on the car in Detroit, Michigan, USA. The pictured trajectories are on average 450 in length.
Figure 2: Overview of the proposed system architecture. Inputs are RGB monocular images from all four sides of the car, as well as a single IMU. Our modified Kimera-VIO processes all camera inputs in parallel and generates a robust state estimate, which is fed to the Robust Pose Graph Optimization (RPGO) module for loop closure detection and correction. Simultaneously, a semantic segmentation network identifies the ground plane in the image, which is used by the modified Kimera-Semantics module to generate a 3D reconstruction of the free space. For a more in-depth description of Kimera's modules, refer to Rosinol21ijrr-Kimera.
Figure 3: Histograms of the proposed loop-closure detection methods. Each bin covers a range of rotation or translation error and counts all loop-closure candidates, summed across all datasets, whose error falls within that range.
Figure 4: 3D reconstructions produced by the proposed free-space mapping approach on several Ford datasets. All four cameras were used for reconstruction, and Kimera's visual-inertial odometry was performed with all four cameras and external odometry. A colormap of the estimated trajectory is plotted over each reconstruction, with cooler colors representing lower ATE RMSE.
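The ATE RMSE mentioned in the Figure 4 caption, and the "under 1% of the trajectory length" figure quoted in the abstract, can be related with a short calculation. The sketch below shows one plausible reading of that metric, not necessarily the exact evaluation protocol used in the paper: it assumes the estimated and ground-truth positions are already time-associated and expressed in the same frame (no alignment step is performed), and all function names are hypothetical.

```python
import numpy as np

def ate_rmse(est, gt):
    """Root-mean-square of per-pose position errors for (N, 3) trajectories."""
    err = np.linalg.norm(est - gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

def trajectory_length(gt):
    """Path length as the sum of distances between consecutive positions."""
    return float(np.sum(np.linalg.norm(np.diff(gt, axis=0), axis=1)))

def relative_error_percent(est, gt):
    """ATE RMSE expressed as a percentage of the distance traveled."""
    return 100.0 * ate_rmse(est, gt) / trajectory_length(gt)

# Synthetic example: a 100 m straight path with a constant 0.5 m lateral offset.
gt = np.column_stack([np.linspace(0.0, 100.0, 101), np.zeros(101), np.zeros(101)])
est = gt + np.array([0.0, 0.5, 0.0])
print(relative_error_percent(est, gt))
```

In practice an alignment step (for example, a rigid-body fit between the two trajectories) usually precedes the error computation; it is omitted here to keep the arithmetic visible.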
