Complementing Onboard Sensors with Satellite Map: A New Perspective for HD Map Construction

Wenjie Gao, Jiawei Fu, Yanqing Shen, Haodong Jing, Shitao Chen, Nanning Zheng

TL;DR

This paper addresses two limitations of onboard sensors in HD map construction, limited long-range perception and occlusion, by introducing satellite maps as a complementary data source. It presents a hierarchical fusion framework, with feature-level masked cross-attention and BEV-level alignment, that integrates satellite tiles with bird's-eye-view (BEV) features produced from onboard sensors, improving HD map semantic segmentation and instance detection. A complementary satellite map dataset for nuScenes is released to facilitate research. Across three baseline methods, the proposed fusion approach delivers substantial gains, especially in long-range scenarios, underscoring the practical value of cloud-based auxiliary maps for autonomous driving tasks.

Abstract

High-definition (HD) maps play a crucial role in autonomous driving systems. Recent methods attempt to construct HD maps in real time from vehicle onboard sensors. Because onboard sensors have a limited detection range and are easily occluded by nearby vehicles, the performance of these methods declines significantly in complex scenarios and long-range detection tasks. In this paper, we explore a new perspective that boosts HD map construction by using satellite maps to complement onboard sensors. We first generate satellite map tiles for each sample in nuScenes and release a complementary dataset for further research. To better integrate satellite maps with existing methods, we propose a hierarchical fusion module comprising feature-level fusion and BEV-level fusion. The feature-level fusion, composed of a mask generator and a masked cross-attention mechanism, refines the features from onboard sensors. The BEV-level fusion uses an alignment module to mitigate the coordinate differences between features obtained from onboard sensors and those obtained from satellite maps. Experiments on the augmented nuScenes dataset show that our module integrates seamlessly into three existing HD map construction methods. The satellite maps and our proposed module notably enhance their performance in both HD map semantic segmentation and instance detection tasks.
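
To make the feature-level fusion more concrete, the following is a minimal sketch of masked cross-attention in plain Python. It is an illustration under stated assumptions, not the authors' implementation: it assumes the queries are onboard-sensor features, the keys and values are satellite-map features, and the boolean mask comes from some mask generator whose design is not described here. The function name `masked_cross_attention` and all argument names are hypothetical.

```python
import math


def masked_cross_attention(queries, keys, values, mask):
    """Single-head masked cross-attention on nested lists (illustrative only).

    queries: Nq x d  list of query vectors (assumed: onboard-sensor features)
    keys:    Nk x d  list of key vectors   (assumed: satellite-map features)
    values:  Nk x dv list of value vectors (assumed: satellite-map features)
    mask:    Nq x Nk list of bools; True means the query may attend to that key
    """
    d = len(queries[0])
    dv = len(values[0])
    scale = math.sqrt(d)
    output = []
    for q, allowed_row in zip(queries, mask):
        # Scaled dot-product scores; masked-out positions get -inf.
        scores = [
            sum(a * b for a, b in zip(q, k)) / scale if allowed else -math.inf
            for k, allowed in zip(keys, allowed_row)
        ]
        finite = [s for s in scores if s != -math.inf]
        if not finite:
            # Every key is masked for this query: return a zero vector
            # (an assumption made here to avoid dividing by zero).
            output.append([0.0] * dv)
            continue
        top = max(finite)
        # Numerically stable softmax; exp(-inf) evaluates to 0.0.
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        output.append(
            [sum(w * v[j] for w, v in zip(weights, values)) / total
             for j in range(dv)]
        )
    return output


# Usage: one query, two keys, with the second key masked out.
attended = masked_cross_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0], [20.0]],
    mask=[[True, False]],
)
print(attended)
```

In this sketch a masked position contributes zero attention weight, so each query aggregates information only from the keys the mask allows. How the paper's mask generator decides which positions to keep is not specified in the abstract.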

Paper Structure

This paper contains 16 sections, 5 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: (a) Satellite maps provide comprehensive insights into the surrounding region. (b) The satellite map tile at the ego location can be integrated into the current HD map construction pipeline to complement onboard sensors.
  • Figure 2: Framework overview. PE stands for patch embedding and position embedding, LE stands for linear embedding. The green arrows represent the information flow from onboard sensors. The red arrows represent the information flow from satellite maps. Our framework utilizes two branches to extract features from multi-view images and satellite map tiles, respectively. A hierarchical fusion module, comprising feature-level fusion and BEV-level fusion, is designed to fuse the two features. The final task head is used to generate the HD maps from the fused features.
  • Figure 3: Qualitative results of our method, where sat stands for satellite maps and GT stands for ground truth. After incorporating satellite maps, the model's performance is significantly improved in both complex scenarios and situations of occlusion by other vehicles. Moreover, the model exhibits stable enhancement in areas not covered by satellite maps.