SemVecNet: Generalizable Vector Map Generation for Arbitrary Sensor Configurations

Narayanan Elavathur Ranganatha, Hengyuan Zhang, Shashank Venkatramani, Jing-Yan Liao, Henrik I. Christensen

TL;DR

SemVecNet tackles the generalization problem in online vector map generation by introducing a modular pipeline that uses a BEV semantic map as an intermediate representation, decoupling sensor configurations from the final vector map. The semantic mapping stage fuses real-time camera and LiDAR data into an ego-centric BEV semantic grid, which is then vectorized by a MapTRv2-inspired decoder to produce labeled map elements. Cross-dataset experiments demonstrate substantially better transfer than state-of-the-art end-to-end approaches, and real-world campus data validates practical applicability without retraining. The approach reduces the need for extensive labeling and retraining when deploying across platforms with different sensor setups, moving toward sensor-configuration-agnostic autonomous systems.

Abstract

Vector maps are essential in autonomous driving for tasks like localization and planning, yet their creation and maintenance are notably costly. While recent advances in online vector map generation for autonomous vehicles are promising, current models lack adaptability to different sensor configurations. They tend to overfit to specific sensor poses, leading to decreased performance and higher retraining costs. This limitation hampers their practical use in real-world applications. In response to this challenge, we propose a modular pipeline for vector map generation with improved generalization to sensor configurations. The pipeline leverages probabilistic semantic mapping to generate a bird's-eye-view (BEV) semantic map as an intermediate representation. This intermediate representation is then converted to a vector map using the MapTRv2 decoder. By adopting a BEV semantic map robust to different sensor configurations, our proposed approach significantly improves the generalization performance. We evaluate the model on datasets with sensor configurations not used during training. Our evaluation sets include larger public datasets and smaller-scale private data collected on our platform. Our model generalizes significantly better than the state-of-the-art methods.
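
To make the idea of a probabilistic BEV semantic map more concrete, the following is a minimal sketch, not the authors' implementation. It assumes semantically labeled 3D points have already been transformed into the ego frame, and it fuses them into a grid with a per-cell Bayesian update. The grid size, the 0.5 m resolution, the symmetric observation model with `p_correct = 0.8`, and all function names are illustrative assumptions that do not come from the paper.

```python
import math


def make_grid(rows, cols, num_classes):
    """Create a grid whose cells hold uniform log-probabilities over classes."""
    prior = math.log(1.0 / num_classes)
    return [[[prior] * num_classes for _ in range(cols)] for _ in range(rows)]


def point_to_cell(x, y, resolution, rows, cols):
    """Map an ego-frame point (metres) to a grid index, or None if outside.

    The ego vehicle sits at the grid centre (an assumption of this sketch).
    """
    row = int(math.floor(x / resolution)) + rows // 2
    col = int(math.floor(y / resolution)) + cols // 2
    if 0 <= row < rows and 0 <= col < cols:
        return row, col
    return None


def update_cell(log_probs, observed, p_correct):
    """Bayes update of one cell given one observed class label.

    Assumed observation model: the label is right with probability
    p_correct, otherwise uniformly one of the other classes.
    """
    k = len(log_probs)
    p_wrong = (1.0 - p_correct) / (k - 1)
    updated = [
        lp + math.log(p_correct if c == observed else p_wrong)
        for c, lp in enumerate(log_probs)
    ]
    # Normalise with log-sum-exp so the cell stays a valid distribution.
    m = max(updated)
    log_norm = m + math.log(sum(math.exp(v - m) for v in updated))
    return [v - log_norm for v in updated]


def integrate(grid, points, resolution=0.5, p_correct=0.8):
    """Fuse (x, y, label) observations into the grid, in place."""
    rows, cols = len(grid), len(grid[0])
    for x, y, label in points:
        cell = point_to_cell(x, y, resolution, rows, cols)
        if cell is not None:
            r, c = cell
            grid[r][c] = update_cell(grid[r][c], label, p_correct)
    return grid


def cell_label(grid, r, c):
    """Most probable class of a cell."""
    probs = grid[r][c]
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical observations: two votes for class 1, one for class 2,
# and one point that falls outside the grid and is ignored.
grid = make_grid(rows=4, cols=4, num_classes=3)
observations = [(0.1, 0.1, 1), (0.1, 0.1, 1), (0.1, 0.1, 2), (100.0, 100.0, 0)]
integrate(grid, observations)
print(cell_label(grid, 2, 2))
```

Because the update consumes only ego-frame points and labels, the grid does not depend on which camera or LiDAR produced an observation. This is one way to picture how a BEV intermediate representation can decouple the sensor configuration from the downstream vectorization stage.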

Paper Structure

This paper contains 14 sections, 6 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: State-of-the-art vector map generation models such as MapTRv2 perform well when trained and evaluated on the same dataset, but their performance degrades significantly when evaluated on a different dataset. Data labeling and retraining are then required, which limits their real-world application. SemVecNet transfers significantly better by using a semantic map, which generalizes more robustly across sensor configurations, as an intermediate representation.
  • Figure 2: SemVecNet takes camera images and a LiDAR point cloud and generates a BEV semantic map that generalizes across sensor configurations as an intermediate representation. The semantic map is then vectorized into map elements such as centerlines, lane boundaries, crosswalks, and road boundaries.
  • Figure 3: Diagrams (a), (b), and (c) show the significant configuration changes across platforms, in both sensor poses and number of sensors, for the LiDAR and cameras of NuScenes [caesar_nuscenes_2020], Argoverse 2 [Argoverse2], and AVL.
  • Figure 4: Qualitative results from direct inference on campus data. From top to bottom: the satellite image [esri], the BEV semantic map, and the vector map output by SemVecNet for the same region of the UC San Diego campus.
  • Figure 5: The top row shows semantic maps built with a single camera; the bottom row shows semantic maps built with six cameras. At the start of a log (first column), much information is lost if some cameras are left out. By the end of the log (second column), the maps look similar.
  • ...and 1 more figures