DuMapNet: An End-to-End Vectorization System for City-Scale Lane-Level Map Generation

Deguo Xia, Weiming Zhang, Xiyan Liu, Wei Zhang, Chenting Gong, Jizhou Huang, Mengmeng Yang, Diange Yang

TL;DR

DuMapNet tackles city-scale lane-level map generation by converting BEV images into vectorized map elements and their topology in a single end-to-end framework. It introduces a Contextual Prompts Encoder (CPE) and Group-wise Lane Prediction (GLP), plus a topology predictor that keeps results globally consistent across adjacent scanned areas. Training uses a hierarchical matching scheme and multi-task losses, enabling end-to-end optimization without heavy post-processing. Deployed in production at Baidu Maps since June 2023, it serves over 360 cities with a 95% cost reduction.

Abstract

Generating city-scale lane-level maps faces significant challenges due to intricate urban environments, such as blurred or absent lane markings. Additionally, a standard lane-level map requires a comprehensive organization of lane groupings, encompassing lane direction, style, boundary, and topology, yet this requirement has not been thoroughly examined in prior research. These obstacles result in labor-intensive human annotation and high maintenance costs. This paper overcomes these limitations and presents an industrial-grade solution named DuMapNet that outputs standardized, vectorized map elements and their topology in an end-to-end paradigm. To this end, we propose a group-wise lane prediction (GLP) system that outputs vectorized results of lane groups by meticulously tailoring a transformer-based network. Meanwhile, to enhance generalization in challenging scenarios, such as road wear and occlusions, as well as to improve global consistency, a contextual prompts encoder (CPE) module is proposed, which leverages the predicted results of spatial neighborhoods as contextual information. Extensive experiments conducted on large-scale real-world datasets demonstrate the superiority and effectiveness of DuMapNet. Additionally, DuMapNet has been deployed in production at Baidu Maps since June 2023, supporting lane-level map generation tasks for over 360 cities while bringing a 95% reduction in costs. This demonstrates that DuMapNet serves as a practical and cost-effective industrial solution for city-scale lane-level map generation.

Paper Structure (18 sections, 7 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: DuMapNet introduces a learning-based methodology for lane-level map vectorization. Our proposed method incorporates a scheme based on contextual prompts and components dedicated to group-wise lane prediction. With these advancements, DuMapNet achieves cost-effective generation of city-scale vectorized maps and significantly supports various applications, such as lane-level navigation in Baidu Maps.
  • Figure 2: Overall architecture of DuMapNet. DuMapNet processes the entire city-scale land area using a sliding window approach. For each local area, an image encoder is utilized to extract image features from the BEV image. Meanwhile, we propose a novel Contextual Prompts Encoder (CPE) to encode the predictions of adjacent scanned areas. To achieve Group-wise Lane Prediction (GLP), we meticulously tailor key network components, including the query, decoder, and prediction heads. Consequently, the network is capable of generating a vectorized map, which encompasses vectorized elements and their topology. Additionally, two auxiliary predictions are generated: the use of group polygons aids in the organization of lane groups, while foreground segmentation enhances lane point localization. For detailed illustrations, please refer to Section \ref{section:dumapnet}.
  • Figure 3: Topology prediction. The topology matrix is produced as an additional output of the decoder to indicate the connections between $N_{ins}$ element instances in the current land area and $M_{ins}$ element instances in the contextual land areas.
  • Figure 4: Comparisons of our method with state-of-the-art models in lane-level map generation.
  • Figure 5: Qualitative visualization of the proposed group-guided supervision.
  • ...and 1 more figure
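
The captions for Figures 2 and 3 describe two mechanics that a small example may make more concrete: scanning the city area window by window while reusing predictions from adjacent areas that have already been scanned, and reading connections out of an $N_{ins} \times M_{ins}$ topology matrix. The following is a minimal sketch under stated assumptions, not the authors' implementation. The row-by-row scan order, the non-overlapping grid, the 8-connected neighborhood, the 0.5 threshold, and all function names (`scan_order`, `scanned_neighbors`, `topology_edges`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterator, List, Tuple

Tile = Tuple[int, int]  # (row, col) index of a local area in the scan grid


def scan_order(rows: int, cols: int) -> Iterator[Tile]:
    """Visit local areas row by row, left to right (assumed scan order)."""
    for r in range(rows):
        for c in range(cols):
            yield (r, c)


def scanned_neighbors(tile: Tile, done: Dict[Tile, object]) -> List[Tile]:
    """Return adjacent (8-connected) areas that already have predictions.

    These are the areas whose outputs could serve as contextual prompts
    for the current area.
    """
    r, c = tile
    return [
        (r + dr, c + dc)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0) and (r + dr, c + dc) in done
    ]


def topology_edges(
    scores: List[List[float]], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Turn an N_ins x M_ins score matrix into (current, contextual) pairs.

    scores[i][j] is an assumed connection score between element instance i
    in the current area and element instance j in the contextual areas.
    """
    return [
        (i, j)
        for i, row in enumerate(scores)
        for j, score in enumerate(row)
        if score >= threshold
    ]


# Scan a 2 x 2 grid and record which neighbors provide context for each area.
done: Dict[Tile, object] = {}
context: Dict[Tile, List[Tile]] = {}
for tile in scan_order(2, 2):
    context[tile] = scanned_neighbors(tile, done)
    done[tile] = f"prediction for {tile}"  # placeholder for a model output

print(context[(0, 0)])  # [] : the first area has no scanned neighbors
print(context[(1, 1)])  # [(0, 0), (0, 1), (1, 0)]

# Made-up scores for N_ins = 2 current and M_ins = 3 contextual instances.
scores = [
    [0.90, 0.10, 0.20],
    [0.05, 0.30, 0.80],
]
print(topology_edges(scores))  # [(0, 0), (1, 2)]
```

In this sketch the first area scanned has no context, and later areas accumulate more scanned neighbors, which matches the caption's description of encoding the predictions of adjacent scanned areas. How the real system orders, overlaps, and encodes those areas is not specified in the text above.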