
Stream Query Denoising for Vectorized HD Map Construction

Shuo Wang, Fan Jia, Yingfei Liu, Yucheng Zhao, Zehui Chen, Tiancai Wang, Chi Zhang, Xiangyu Zhang, Feng Zhao

TL;DR

This work tackles the challenge of incorporating temporal information into streaming, vectorized HD-map construction. It introduces Stream Query Denoising (SQD), a training-time strategy that denoises noise-perturbed ground truth from the previous frame to simulate stream-query predictions, so that the model learns temporal consistency among map elements. SQD combines normal query denoising, which perturbs map curves, with a dedicated stream denoising pathway built on Adaptive Temporal Matching (ATM) and Dynamic Query Noising. Training uses a joint loss that couples map predictions with denoising predictions. On nuScenes and Argoverse2, SQD-MapNet outperforms prior streaming approaches at both short and long perception ranges, and ablations confirm the contributions of ATM and dynamic noising. The result is more robust, temporally coherent HD-map construction for autonomous driving.
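The TL;DR does not say how Adaptive Temporal Matching pairs map elements across frames. Purely as an illustration of the kind of cross-frame correspondence such a step must resolve, the sketch below assumes that previous-frame instances have already been aligned to the current frame by ego motion and that pairs are formed greedily by a symmetric Chamfer distance with a cutoff. The function names `chamfer` and `match_instances` and the `max_dist` threshold are hypothetical, and greedy matching is a simplified stand-in for bipartite assignment, not the paper's ATM.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Polyline = List[Point]


def chamfer(a: Polyline, b: Polyline) -> float:
    """Symmetric mean nearest-point distance between two polylines."""
    def one_way(p: Polyline, q: Polyline) -> float:
        return sum(min(math.dist(x, y) for y in q) for x in p) / len(p)
    return 0.5 * (one_way(a, b) + one_way(b, a))


def match_instances(prev: List[Polyline], curr: List[Polyline],
                    max_dist: float = 1.0) -> List[Optional[int]]:
    """Greedily pair each ego-aligned previous-frame instance with a
    current-frame instance. None means 'no counterpart' (e.g. the element
    left the perception range), which a trainer could treat as a negative."""
    # Every candidate pair, cheapest first.
    pairs = sorted(
        (chamfer(p, c), i, j)
        for i, p in enumerate(prev)
        for j, c in enumerate(curr)
    )
    assignment: List[Optional[int]] = [None] * len(prev)
    used = set()
    for d, i, j in pairs:
        if d > max_dist:  # remaining pairs are even farther apart
            break
        if assignment[i] is None and j not in used:
            assignment[i] = j
            used.add(j)
    return assignment
```

For example, a divider that shifted by 0.2 m is matched to its current-frame counterpart, while a previous-frame element with no nearby current-frame element is left unmatched (`None`). Raising `max_dist` lets farther pairs match, which shows how the cutoff trades recall of correspondences against false pairings.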

Abstract

To improve perception in complex, large-scale autonomous-driving scenarios, recent work has focused on temporal modeling, particularly on streaming methods. Most streaming models propagate temporal information through stream queries. However, applying the streaming paradigm directly to the construction of vectorized high-definition maps (HD-maps) does not fully exploit the available temporal information. This paper introduces Stream Query Denoising (SQD), a novel temporal-modeling strategy for HD-map construction. SQD helps the streaming model learn temporal consistency among map elements. It adds noise to the ground truth of the preceding frame and trains the model to denoise the resulting queries into the ground truth of the current frame, thereby simulating the prediction process of stream queries. SQD can be applied to existing streaming methods such as StreamMapNet to strengthen their temporal modeling; the proposed SQD-MapNet is StreamMapNet equipped with SQD. Extensive experiments on nuScenes and Argoverse2 show that our method outperforms existing methods in all close-range and long-range settings. The code will be available soon.
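The abstract describes building training queries by perturbing the previous frame's ground truth and supervising them with the current frame's ground truth. The sketch below illustrates that data flow under stated assumptions: map elements are 2D polylines, the previous-to-current ego motion is a planar rigid transform `(yaw, tx, ty)`, and elements carry a track id shared across frames (a hypothetical field used here only to pick each target). Uniform per-point jitter stands in for the paper's noise scheme; `to_current_frame` and `build_denoising_pairs` are hypothetical names.

```python
import math
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polyline = List[Point]


def to_current_frame(poly: Polyline, yaw: float, tx: float, ty: float) -> Polyline:
    """Apply a planar rigid transform (previous ego frame -> current ego frame)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in poly]


def build_denoising_pairs(prev_gt: Dict[int, Polyline], curr_gt: Dict[int, Polyline],
                          yaw: float, tx: float, ty: float,
                          noise: float = 0.5, seed: int = 0
                          ) -> List[Tuple[Polyline, Polyline]]:
    """Return (noised previous-frame element, current-frame target) pairs.
    Track ids are a hypothetical stand-in for cross-frame correspondence."""
    rng = random.Random(seed)
    pairs = []
    for elem_id, poly in prev_gt.items():
        if elem_id not in curr_gt:  # element no longer visible: no target here
            continue
        aligned = to_current_frame(poly, yaw, tx, ty)
        # Simplified noise: uniform jitter of each point within +/- noise.
        noised = [(x + rng.uniform(-noise, noise), y + rng.uniform(-noise, noise))
                  for x, y in aligned]
        pairs.append((noised, curr_gt[elem_id]))
    return pairs
```

With `noise=0` and a pure translation that exactly explains the motion, the noised input coincides with its target. With noise enabled, the model must recover the current-frame geometry from a perturbed, ego-aligned prior, which is the behavior a stream query is expected to exhibit at inference time.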

Paper Structure

This paper contains 20 sections, 11 equations, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: (a) The decoder process of StreamMapNet yuan2023streammapnet. (b) StreamMapNet with the proposed stream query denoising (SQD). Purple blocks are global map queries learned by the network, orange blocks are stream queries from the memory cache, and green blocks are noised queries generated by adding noise to the ground truth $G_{t-1}$ of the previous frame $t-1$.
  • Figure 2: (a) The ground truth of the previous frame. (b) The ground truth in (a) transformed into the current frame according to ego motion. (c) The ground truth of the current frame.
  • Figure 3: (a) The overall framework of SQD-MapNet. (b) and (c) The implementations of adaptive temporal matching and dynamic query noising, respectively. $G_{t-1}$ is the ground truth of the previous frame $t-1$.
  • Figure 4: (a) The original curve, which consists of a number of points enclosed by its minimum bounding rectangle. (b) and (c) show box shifting and box scaling, respectively. Light-colored curves indicate the curves before noise is added.
  • Figure 5: Qualitative comparison with the single-frame model and StreamMapNet yuan2023streammapnet across different scenarios. In the HD-map, green lines denote road boundaries, red lines denote lane dividers, and blue lines denote pedestrian crossings.
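The Figure 4 caption describes curve noise applied through the curve's bounding rectangle: the box is shifted or scaled, and the curve moves with it. The sketch below shows one way to realize this, assuming an axis-aligned rectangle, a center shift of up to `shift_ratio` of the box's half-size, and per-axis scale factors in `[1 - scale_ratio, 1 + scale_ratio]`. These ratios and the names `bounding_box` and `box_noise` are hypothetical and are not the paper's settings.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def bounding_box(curve: List[Point]) -> Tuple[float, float, float, float]:
    """Axis-aligned box (cx, cy, w, h) enclosing all points of the curve."""
    xs = [x for x, _ in curve]
    ys = [y for _, y in curve]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2,
            max(xs) - min(xs), max(ys) - min(ys))


def box_noise(curve: List[Point], shift_ratio: float = 0.1, scale_ratio: float = 0.1,
              rng: Optional[random.Random] = None) -> List[Point]:
    """Perturb a curve by shifting and scaling its bounding box (hypothetical ratios)."""
    if rng is None:
        rng = random.Random()
    cx, cy, w, h = bounding_box(curve)
    # Box shifting: move the box center by up to shift_ratio of its half-size.
    dx = rng.uniform(-1, 1) * shift_ratio * w / 2
    dy = rng.uniform(-1, 1) * shift_ratio * h / 2
    # Box scaling: stretch each axis by a factor in [1 - scale_ratio, 1 + scale_ratio].
    sx = 1 + rng.uniform(-1, 1) * scale_ratio
    sy = 1 + rng.uniform(-1, 1) * scale_ratio
    # Re-express every point relative to the noised box so the curve follows it.
    return [(cx + dx + (x - cx) * sx, cy + dy + (y - cy) * sy) for x, y in curve]
```

Because every point is mapped by the same positive-scale affine transform, the noised curve keeps its shape and point order. Its new bounding box is exactly the shifted and scaled box, which keeps the perturbation magnitude proportional to the element's size rather than fixed in meters.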