Uncertainty-Instructed Structure Injection for Generalizable HD Map Construction

Xiaolu Liu, Ruizi Yang, Song Wang, Wentong Li, Junbo Chen, Jianke Zhu

TL;DR

This paper tackles the generalization gap in online HD map vectorization by proposing UIGenMap, an uncertainty-instructed framework that injects explicit perspective-view (PV) structure into bird's-eye-view (BEV) map decoding. It combines three components: an uncertainty-aware decoder (UA-Decoder) with probabilistic attention and per-point uncertainty outputs, a UI2DPrompt module that builds prompts from PV detections, and a lightweight Mimic Query Distillation (MQ-Distillation) that enables real-time inference by substituting mimic queries for the PV prompts. On geo-based partitions of nuScenes and Argoverse2, UIGenMap reports state-of-the-art results (e.g., +5.7 mAP on the region-based nuScenes split, +4.3 mAP on the city-based nuScenes split, and 60.4 mAP on the region-based Argoverse2 split) and generalizes robustly to unfamiliar driving scenes. The approach improves generalization for HD map construction in autonomous driving while maintaining real-time performance, and the authors state that source code will be released.

Abstract

Reliable high-definition (HD) map construction is crucial for the driving safety of autonomous vehicles. Although recent studies demonstrate improved performance, their generalization capability across unfamiliar driving scenes remains unexplored. To tackle this issue, we propose UIGenMap, an uncertainty-instructed structure injection approach for generalizable HD map vectorization, which applies uncertainty resampling over statistical distributions and employs explicit instance features to reduce excessive reliance on training data. Specifically, we introduce a perspective-view (PV) detection branch to obtain explicit structural features, in which an uncertainty-aware decoder is designed to dynamically sample probability distributions according to the differences between scenes. With probabilistic embedding and selection, UI2DPrompt is proposed to construct learnable PV prompts. These PV prompts are integrated into the map decoder through a hybrid injection design to compensate for neglected instance structures. To ensure real-time inference, a lightweight Mimic Query Distillation is designed to learn from the PV prompts, serving as an efficient alternative to running the PV branch. Extensive experiments on challenging geographically disjoint (geo-based) data splits demonstrate that our UIGenMap achieves superior performance, with +5.7 mAP improvement on the nuScenes dataset. Source code will be available at https://github.com/xiaolul2/UIGenMap.

Paper Structure

This paper contains 23 sections, 13 equations, 10 figures, 12 tables.

Figures (10)

  • Figure 1: (a) Comparisons on original and different geo-based data partitions among previous methods and ours. (b) Visual comparisons with circles on BEV map to represent the learned uncertainties, in which PV instances serve as compensation and provide a detailed understanding of map perception with generalization.
  • Figure 2: (a) Overview of Our UIGenMap. For training, the PV branch is introduced with the uncertainty-instructed structural injection, in which the MQ-Distillation is designed to mimic PV structural features. (b) Uncertainty-Aware Decoder Architecture. For each layer, UA-Decoder comprises probabilistic UA-Attention and UA-Head for reliable output. (c) UI2DPrompt Design. We construct PV prompts from PV-detected elements and their corresponding uncertainties, which are integrated with the main branch by hybrid injection.
  • Figure 3: UA-Attention Design. The learned probabilistic weights $\alpha_i$ are dynamically resampled from a Gaussian distribution with stochastic disturbance and then multiplied with the sampled feature values to update the map queries.
  • Figure 4: Design of Hybrid Injection. P2BEV-Attention and P2Q-Attention are designed for PV prompts to integrate with BEV features and map instance queries as structural compensation.
  • Figure 5: Qualitative results on the nuScenes dataset with the region-based data partition. Uncertainty outputs are represented as circles.
  • ...and 5 more figures
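
The UA-Attention idea described in the Figure 3 caption (weights $\alpha_i$ resampled from a Gaussian with stochastic disturbance, then multiplied with sampled feature values to update a query) can be illustrated with a small plain-Python sketch. This is not the authors' implementation: the function names, the log-sigma parameterisation, the softmax normalisation, and the choice to use the mean without noise at inference are all assumptions made here for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def resample_weights(mu, log_sigma, rng, training=True):
    """Hypothetical helper: draw attention logits from N(mu, sigma^2).

    Uses the reparameterisation mu + sigma * eps with eps ~ N(0, 1).
    Assumption: at inference (training=False) the mean is used directly.
    """
    if training:
        logits = [m + math.exp(ls) * rng.gauss(0.0, 1.0)
                  for m, ls in zip(mu, log_sigma)]
    else:
        logits = list(mu)
    # Assumption: weights are normalised with a softmax over sampling points.
    return softmax(logits)


def update_query(values, weights):
    """Weighted sum of sampled feature vectors, one weight per vector."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


# Usage: three sampling points with 2-D feature values.
rng = random.Random(0)
mu = [0.2, -0.1, 0.5]            # predicted mean of each attention logit
log_sigma = [-1.0, -2.0, -0.5]   # predicted log standard deviation
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

alpha = resample_weights(mu, log_sigma, rng, training=True)
query = update_query(values, alpha)
```

In this sketch a larger predicted sigma produces noisier weights during training, which is one plausible way to read "stochastic disturbance"; the paper's exact formulation may differ.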