AugMapNet: Improving Spatial Latent Structure via BEV Grid Augmentation for Enhanced Vectorized Online HD Map Construction
Thomas Monninger, Md Zafar Anwar, Stanislaw Antol, Steffen Staab, Sihao Ding
TL;DR
AugMapNet tackles online vectorized HD map construction by enriching the latent BEV grid with dense spatial cues from a raster map while still decoding vectorized map elements directly. It introduces latent BEV grid augmentation with gradient stopping, so that the raster-derived prior is treated as immutable, and adds BEV processing CNN blocks to induce a more structured latent space. Results on nuScenes and Argoverse2 show vectorized map gains of up to 13.3% over the StreamMapNet baseline at 60 m range, larger gains at longer perception ranges, and successful transfer to the SQD-MapNet baseline. Latent-space analyses (PCA and mutual information) indicate closer alignment to ground-truth rasters. These findings suggest that integrating dense spatial supervision into the latent BEV space can meaningfully improve vectorized HD map construction for autonomous driving.
Abstract
Autonomous driving requires understanding infrastructure elements, such as lanes and crosswalks. To navigate safely, this understanding must be derived from sensor data in real time and needs to be represented in vectorized form. Learned Bird's-Eye View (BEV) encoders are commonly used to combine a set of camera images from multiple views into one joint latent BEV grid. Traditionally, an intermediate raster map is predicted from this latent space, providing dense spatial supervision but requiring post-processing into the desired vectorized form. More recent models directly derive infrastructure elements as polylines using vectorized map decoders, providing instance-level information. Our approach, Augmentation Map Network (AugMapNet), proposes latent BEV feature grid augmentation, a novel technique that significantly enhances the latent BEV representation. AugMapNet combines vector decoding and dense spatial supervision more effectively than existing architectures while remaining easier to integrate than other hybrid approaches. It additionally benefits from extra processing on its latent BEV features. Experiments on the nuScenes and Argoverse2 datasets demonstrate significant improvements in vectorized map prediction of up to 13.3% over the StreamMapNet baseline at 60 m range and greater improvements at larger ranges. We confirm transferability by applying our method to another baseline, SQD-MapNet, and find similar improvements. A detailed analysis of the latent BEV grid confirms that AugMapNet has a more structured latent space and shows the value of our novel concept beyond pure performance improvement. The code can be found at https://github.com/tmonnin/augmapnet
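
The augmentation idea summarized above can be illustrated with a minimal PyTorch sketch. It is not the authors' implementation: the class name `AugmentedBEV`, the layer sizes, the sigmoid on the raster logits, and the additive fusion are all assumptions chosen for illustration. It shows only the general pattern of predicting a raster from the latent BEV grid, stopping gradients on it, feeding it back into the grid, and applying an extra CNN block before a vector decoder would consume the result.

```python
import torch
import torch.nn as nn


class AugmentedBEV(nn.Module):
    """Hypothetical sketch: augment a latent BEV grid with a detached raster prediction."""

    def __init__(self, channels: int = 8, num_classes: int = 3):
        super().__init__()
        # Dense head: latent BEV grid -> per-cell class logits (the raster map).
        self.raster_head = nn.Conv2d(channels, num_classes, kernel_size=1)
        # Lifts the raster back to the latent channel width (assumed design).
        self.raster_embed = nn.Conv2d(num_classes, channels, kernel_size=1)
        # Extra BEV processing block applied after fusion (assumed layout).
        self.bev_block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, bev: torch.Tensor):
        raster_logits = self.raster_head(bev)
        # Gradient stopping: the vector branch sees the raster as a fixed prior.
        prior = torch.sigmoid(raster_logits).detach()
        # Additive fusion is an assumption made for this sketch.
        augmented = bev + self.raster_embed(prior)
        return self.bev_block(augmented), raster_logits


torch.manual_seed(0)
model = AugmentedBEV()
bev = torch.randn(2, 8, 16, 32)  # (batch, channels, height, width), toy sizes
features, raster_logits = model(bev)

# Stand-in for a vector decoder loss; a real model would decode polylines here.
features.sum().backward()
```

In this sketch, the stand-in vector loss leaves `model.raster_head.weight.grad` unset, because the raster branch is detached before fusion. The raster head would be trained only by its own dense supervision loss on `raster_logits`, which is omitted here.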
