AVM-SLAM: Semantic Visual SLAM with Multi-Sensor Fusion in a Bird's Eye View for Automated Valet Parking

Ye Li, Wenchao Yang, Dekun Lin, Qianlei Wang, Zhe Cui, Xiaolin Qin

TL;DR

AVM-SLAM tackles the challenging localization problem of automated valet parking in indoor garages by fusing data from four surround-view fisheye cameras, wheel encoders, and an IMU to build a semantic BEV map. The system introduces BEV flare removal to enhance road-marking segmentation and a semantic pre-qualification (SPQ) module to improve loop detection in repetitive textures, complemented by a two-layer mapping approach using submaps and a global map. A high-resolution underground garage dataset with synchronized multi-sensor data is released to benchmark and validate the method. Experimental results show improved semantic extraction, robust pose estimation, and higher mapping accuracy compared with traditional SLAM baselines, highlighting practical significance for AVP in challenging indoor environments.

Abstract

Accurate localization in challenging garage environments -- marked by poor lighting, sparse textures, repetitive structures, dynamic scenes, and the absence of GPS -- is crucial for automated valet parking (AVP) tasks. Addressing these challenges, our research introduces AVM-SLAM, a cutting-edge semantic visual SLAM architecture with multi-sensor fusion in a bird's eye view (BEV). This novel framework synergizes the capabilities of four fisheye cameras, wheel encoders, and an inertial measurement unit (IMU) to construct a robust SLAM system. Unique to our approach is the implementation of a flare removal technique within the BEV imagery, significantly enhancing road marking detection and semantic feature extraction by convolutional neural networks for superior mapping and localization. Our work also pioneers a semantic pre-qualification (SPQ) module, designed to adeptly handle the challenges posed by environments with repetitive textures, thereby enhancing loop detection and system robustness. To demonstrate the effectiveness and resilience of AVM-SLAM, we have released a specialized multi-sensor and high-resolution dataset of an underground garage, accessible at https://yale-cv.github.io/avm-slam_dataset, encouraging further exploration and validation of our approach within similar settings.

Paper Structure (18 sections, 4 equations, 6 figures, 2 tables, 1 algorithm)

Figures (6)

  • Figure 1: Semantic visual map of the garage built by our AVM-SLAM system. It fuses data from surround view cameras, wheel encoders and an IMU in a bird’s eye view.
  • Figure 2: The framework of the proposed AVM-SLAM system consists of two core modules: VIWFusion and Mapping. VIWFusion is a loosely coupled, weighted multi-sensor fusion front-end, while the Mapping module serves as a tightly integrated semantic mapping back-end. $w_1$ and $w_2$ are the fusion weights for the IMU and wheel odometry, respectively.
  • Figure 3: Flare removal and semantic segmentation.
  • Figure 4: Schematic of the pose graph with additional kinematic constraints.
  • Figure 5: Mapping results by VIWFusion, ordinary loop detection, SPQ loop detection & additional kinematic constraints, and flare removal.
  • ...and 1 more figure
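
The Figure 2 caption mentions fusion weights $w_1$ (IMU) and $w_2$ (wheel odometry) but this excerpt does not give the fusion rule itself. As a rough illustration only, the sketch below assumes a simple convex blend of planar pose increments from the two sensors. The `Increment` type, the `fuse_increments` function, and the normalisation of the weights are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Increment:
    """Planar motion increment in the vehicle frame (hypothetical type)."""
    dx: float    # forward displacement, metres
    dy: float    # lateral displacement, metres
    dyaw: float  # heading change, radians


def wrap_angle(a: float) -> float:
    """Wrap an angle to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def fuse_increments(imu: Increment, wheel: Increment,
                    w1: float, w2: float) -> Increment:
    """Blend IMU and wheel-odometry increments with weights w1 and w2.

    Assumption: the weights are non-negative and are normalised here
    so that they sum to 1 (a convex combination).
    """
    if w1 < 0 or w2 < 0 or (w1 + w2) == 0:
        raise ValueError("weights must be non-negative and not both zero")
    a = w1 / (w1 + w2)  # normalised IMU weight
    b = 1.0 - a         # normalised wheel-odometry weight
    # Blend the heading through the wrapped difference so that values
    # near +/-pi do not average to a meaningless angle.
    dyaw = wrap_angle(imu.dyaw + b * wrap_angle(wheel.dyaw - imu.dyaw))
    return Increment(
        dx=a * imu.dx + b * wheel.dx,
        dy=a * imu.dy + b * wheel.dy,
        dyaw=dyaw,
    )


if __name__ == "__main__":
    imu = Increment(dx=1.0, dy=0.0, dyaw=0.1)
    wheel = Increment(dx=2.0, dy=0.0, dyaw=0.3)
    print(fuse_increments(imu, wheel, w1=1.0, w2=1.0))
```

With equal weights the result lies halfway between the two inputs; setting `w2=0` returns the IMU increment unchanged. A real system would more likely derive the weights from sensor noise models or use a filter, which this sketch does not attempt.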