PrevPredMap: Exploring Temporal Modeling with Previous Predictions for Online Vectorized HD Map Construction

Nan Peng; Xun Zhou; Mingming Wang; Xiaojun Yang; Songming Chen; Guisong Chen

TL;DR

PrevPredMap introduces a temporal modeling framework for online vectorized HD map construction by encoding previous predictions into queries via a dedicated previous-predictions-based query generator and a dynamic-position-query decoder. A dual-mode training strategy ensures robust performance in both single-frame and temporal modes, supported by an enhanced single-frame baseline with a memory-efficient group-wise one-to-many branch. On nuScenes and Argoverse2, PrevPredMap sets new state-of-the-art results and demonstrates favorable inference speed, with ablation analyses confirming the contributions of the core modules. The work suggests that high-level predictions can serve as compact temporal priors and points toward integrating map priors and longer histories for further gains.

Abstract

Temporal information is crucial for detecting occluded instances. Existing temporal representations have progressed from BEV or PV features to more compact query features. Compared with these features, predictions offer the highest level of abstraction and provide explicit information. In online vectorized HD map construction, this characteristic of predictions is potentially advantageous for long-term temporal modeling and for integrating map priors. This paper introduces PrevPredMap, a pioneering temporal modeling framework that leverages previous predictions to construct online vectorized HD maps. PrevPredMap comprises two essential modules: the previous-predictions-based query generator and the dynamic-position-query decoder. The previous-predictions-based query generator separately encodes different types of information from previous predictions, and the dynamic-position-query decoder then uses these encodings to generate current predictions. Furthermore, we develop a dual-mode strategy to ensure that PrevPredMap performs robustly in both single-frame and temporal modes. Extensive experiments demonstrate that PrevPredMap achieves state-of-the-art performance on the nuScenes and Argoverse2 datasets. Code will be available at https://github.com/pnnnnnnn/PrevPredMap.
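To make the idea of turning previous predictions into queries more concrete, here is a minimal Python sketch. It is not the authors' implementation: the `MapPrediction` and `Query` containers, the `generate_queries` function, and the `score_threshold` parameter are all hypothetical names introduced for illustration. The sketch only shows one plausible reading of the abstract: class and location information from each previous prediction is kept as separate fields, and when no previous predictions exist (single-frame mode) every query falls back to a default placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class MapPrediction:
    """Hypothetical container for one map element predicted in the previous frame."""
    class_id: int        # category of the element (illustrative integer label)
    score: float         # confidence of the previous prediction
    points: List[Point]  # polyline vertices of the element


@dataclass
class Query:
    """Hypothetical query: class and location information are stored separately."""
    class_id: Optional[int]        # None marks a default (non-temporal) query
    points: Optional[List[Point]]  # None marks a default (non-temporal) query


def generate_queries(prev_predictions, num_queries, score_threshold=0.4):
    """Build a fixed-size query set from previous predictions.

    Temporal mode: confident previous predictions seed the first queries.
    Single-frame mode (no previous predictions): all queries are defaults.
    The threshold is an assumption made for this sketch.
    """
    queries = []
    if prev_predictions:
        confident = [p for p in prev_predictions if p.score >= score_threshold]
        confident.sort(key=lambda p: p.score, reverse=True)
        for pred in confident[:num_queries]:
            queries.append(Query(class_id=pred.class_id, points=list(pred.points)))
    # Pad with default queries so the decoder always sees the same number.
    while len(queries) < num_queries:
        queries.append(Query(class_id=None, points=None))
    return queries
```

In a real model each field would be passed through learned embeddings before reaching the decoder; the sketch stops at the bookkeeping step so that the two modes are easy to compare.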

Paper Structure (15 sections, 7 equations, 3 figures, 6 tables)


Introduction
Related Work
Online Vectorized HD Map Construction
Temporal Modeling of BEV Perception
Method
Overall Architecture
Previous-Predictions-Based Query Generator
Dynamic-Position-Query Decoder
An Enhanced Single-Frame Baseline
Experiment
Experimental Setup
Comparisons with State-of-the-art Methods
Ablation Study
Limitations and Future Work
Conclusion

Figures (3)

Figure 1: Simplified pipelines corresponding to various temporal representations: (a) BEV features, (b) perspective features, (c) query features, and (d) predictions. Items highlighted in yellow represent the temporal modules.
Figure 2: (a) Overall architecture of the proposed PrevPredMap, consisting of three primary modules. The BEV feature extractor is a standard component that obtains BEV features from multi-view images. The previous-predictions-based query generator and the dynamic-position-query decoder are designed to encode and utilize previous predictions for producing current predictions. (b) The dual-mode strategy of the previous-predictions-based query generator. (c) The dynamic update mechanism of the dynamic-position-query decoder. Yellow arrows indicate the generation of dynamic position queries based on location predictions of the preceding decoder layer.
Figure 3: Qualitative comparison of PrevPredMap with single-frame SOTA methods under various occlusion scenarios. Each sub-part displays four qualitative results: Ground Truth, MapTRv2, PrevPredMap Single-Frame, and PrevPredMap Temporal. The * indicates that MapTRv2 has been re-implemented with the number of instance queries set to 100. Green, orange and blue lines denote road boundaries, lane dividers and pedestrian crossings, respectively.
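
The dynamic update described in the Figure 2(c) caption, where position queries are regenerated from the location predictions of the preceding decoder layer, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: the sinusoidal embedding, the `sine_embed`, `position_query`, and `run_decoder` names, and the use of plain callables as stand-ins for decoder layers are all choices made here for clarity. Coordinates are assumed to be normalized to [0, 1].

```python
import math


def sine_embed(value, dim=8, temperature=10000.0):
    """Sinusoidal embedding of one normalized coordinate (assumed encoding)."""
    out = []
    for i in range(dim // 2):
        freq = temperature ** (2 * i / dim)
        angle = 2 * math.pi * value / freq
        out.append(math.sin(angle))
        out.append(math.cos(angle))
    return out


def position_query(points, dim=8):
    """Concatenate the embeddings of the x and y coordinates of every point."""
    query = []
    for x, y in points:
        query.extend(sine_embed(x, dim))
        query.extend(sine_embed(y, dim))
    return query


def run_decoder(initial_points, layers):
    """Run stand-in decoder layers, rebuilding the position query each time.

    Each element of `layers` is a callable (points, pos_query) -> refined points.
    The position query for layer k is computed from the points output by
    layer k-1, which is the dynamic update being illustrated.
    """
    points = initial_points
    history = []
    for refine in layers:
        pos_query = position_query(points)  # regenerated from the latest locations
        points = refine(points, pos_query)
        history.append((pos_query, points))
    return points, history
```

A static alternative would compute the position query once from `initial_points` and reuse it in every layer; the sketch differs only in moving that computation inside the loop.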
