UniMapGen: A Generative Framework for Large-Scale Map Construction from Multi-modal Data

Yujian Yuan, Changjie Wu, Xinyuan Chang, Sijin Wang, Hang Zhang, Shiyi Liang, Shuang Zeng, Mu Xu, Ning Guo

TL;DR

UniMapGen tackles large-scale lane-level map construction from multi-modal data by reframing it as iterative generative vector prediction. It serializes map vectors into discrete tokens, uses an LLM-based backbone with BEV, PV, and text prompts, and employs a state update strategy to ensure global continuity. The method achieves state-of-the-art results on OpenSatMap and can infer occluded roads and predict roads missing from the dataset annotations. Ablations show benefits from multi-modality, equal-distance serialization, and the state update mechanism. The framework offers a flexible, scalable path toward up-to-date, lane-accurate vector maps for autonomous driving and navigation.

Abstract

Large-scale map construction plays a vital role in applications like autonomous driving and navigation systems. Traditional large-scale map construction approaches mainly rely on costly and inefficient specialized data-collection vehicles and labor-intensive annotation processes. While existing satellite-based methods have demonstrated promising potential in enhancing the efficiency and coverage of map construction, they exhibit two major limitations: (1) inherent drawbacks of satellite data (e.g., occlusions, outdatedness) and (2) inefficient vectorization from perception-based methods, resulting in discontinuous and rough roads that require extensive post-processing. This paper presents a novel generative framework, UniMapGen, for large-scale map construction, offering three key innovations: (1) representing lane lines as discrete sequences and establishing an iterative strategy to generate more complete and smooth map vectors than traditional perception-based methods; (2) proposing a flexible architecture that supports multi-modal inputs, enabling dynamic selection among BEV, PV, and text prompts, to overcome the drawbacks of satellite data; and (3) developing a state update strategy for global continuity and consistency of the constructed large-scale map. UniMapGen achieves state-of-the-art performance on the OpenSatMap dataset. Furthermore, UniMapGen can infer occluded roads and predict roads missing from dataset annotations. Our code will be released.
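
As a rough illustration of the discrete-sequence idea in the abstract, the Python sketch below resamples a lane polyline at equal arc-length intervals and converts the quantized coordinates into tokens. The spacing, bin count, line ordering rule, and token names (`<line>`, `<x_i>`, `<y_i>`) are illustrative assumptions, not details taken from the paper.

```python
import math


def resample_equidistant(points, spacing):
    """Resample a polyline at fixed arc-length steps; the last point is always kept."""
    if len(points) < 2:
        return list(points)
    out = [points[0]]
    dist_to_next = spacing  # arc length left until the next sample is due
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        pos = 0.0  # distance already consumed along this segment
        while seg > 0 and seg - pos >= dist_to_next:
            pos += dist_to_next
            t = pos / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            dist_to_next = spacing
        dist_to_next -= seg - pos
    if out[-1] != points[-1]:
        out.append(points[-1])
    return out


def quantize(points, patch_size, num_bins):
    """Map coordinates in [0, patch_size] to integer bins in [0, num_bins - 1]."""
    scale = (num_bins - 1) / patch_size

    def to_bin(v):
        return min(num_bins - 1, max(0, round(v * scale)))

    return [(to_bin(x), to_bin(y)) for x, y in points]


def serialize(lines, patch_size=100.0, spacing=4.0, num_bins=256):
    """Turn a list of polylines into a flat token sequence (hypothetical vocabulary)."""
    tokens = []
    # Assumed ordering: sort lines by the (y, x) of their first point.
    for line in sorted(lines, key=lambda pts: (pts[0][1], pts[0][0])):
        tokens.append("<line>")
        sampled = resample_equidistant(line, spacing)
        for bx, by in quantize(sampled, patch_size, num_bins):
            tokens.append(f"<x_{bx}>")
            tokens.append(f"<y_{by}>")
        tokens.append("</line>")
    return tokens


if __name__ == "__main__":
    lane = [(0.0, 0.0), (10.0, 0.0)]
    print(serialize([lane], patch_size=10.0, spacing=4.0, num_bins=11))
```

A sequence model can then be trained to emit such tokens one at a time, which is what makes an iterative, generative formulation possible; the actual vocabulary and ordering used by UniMapGen may differ from this sketch.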

Paper Structure

This paper contains 20 sections, 6 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: Methods and challenges in large-scale map construction. (Top) Previous segmentation methods process image patches separately, causing incomplete and discontinuous lines. (Bottom) UniMapGen uses flexible multi-modal inputs to construct complete and continuous maps, overcoming satellite challenges including occlusion, outdatedness, and incomplete annotation.
  • Figure 2: Overview. (a) Model Architecture: UniMapGen supports multi-modal data inputs, including BEV, PV, text, and maps. (b) Map Serialization: we apply equidistant sampling to the raw map vectors, and then reorder them in the specified order. Finally, they are converted into special tokens. (c) State Update: we propose a state update strategy to incrementally construct large-scale maps. This process requires no post-processing, yielding smooth and connected outputs.
  • Figure 3: Examples of data augmentation, including overlapped crop and inclined crop with rotation.
  • Figure 4: Qualitative results of UniMapGen. (a) Comparison with SOTA. Different colors denote different line instances. (b) Ablation on State Update. The black lines are the patch edges. (c) BEV and PV map construction. PV provides up-to-date (purple) and complementary (red) road information. The line marked with purple circles is worn out or outdated in the BEV image but clear in the PV frames. (d) PV-based map construction. (e) Occluded road construction even without PV frames. (f) UniMapGen generates target maps given text prompts. (g) Globally constructed map (the missing intersection is due to the OpenSatMap annotation).
  • Figure 5: Complex examples generated by UniMapGen. The samples come from different cities.
  • ...and 5 more figures