MapExpert: Online HD Map Construction with Simple and Efficient Sparse Map Element Expert

Dapeng Zhang, Dayu Chen, Peng Zhi, Yinda Chen, Zhenlong Yuan, Chenyang Li, Sunjing, Rui Zhou, Qingguo Zhou

TL;DR

MapExpert addresses online HD map construction for autonomous driving by introducing sparse map element experts that specialize in modeling non-cubic map elements, coupled with a Learnable Weighted Moving Descent (LWMD) mechanism that fuses current and historical BEV information without letting history overwhelm the present frame. The sparse expert transformer layer uses routers to select a small subset of expert blocks tailored to lane dividers, pedestrian crossings, and road boundaries, and an auxiliary expert balance loss keeps the load even across experts. The architecture also includes a refined slice head to improve geometric regression. In experiments on nuScenes and Argoverse2, MapExpert achieves state-of-the-art performance with gains over prior methods, and ablation studies validate each component, indicating practical potential for real-time HD map construction in autonomous systems.
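
The routing idea summarized above can be illustrated with a small sketch. It is not the paper's implementation: the function names (`route_top_k`, `balance_loss`) are hypothetical, the choice of two experts per query is an assumption, and the balance term is a Switch-Transformer-style load-balancing loss swapped in for illustration, since the exact auxiliary loss is not given here. Each query gets a softmax distribution over experts, the top-k experts are kept with renormalized gates, and the loss grows when first-choice assignments pile onto a few experts.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=2):
    """Pick the k highest-probability experts for each query.

    router_logits: one list of expert logits per query.
    Returns (choices, probs): choices[q] is a list of (expert, gate) pairs,
    best expert first, with gates renormalized to sum to 1.
    """
    choices, probs = [], []
    for logits in router_logits:
        p = softmax(logits)
        top = sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:k]
        norm = sum(p[i] for i in top)
        choices.append([(i, p[i] / norm) for i in top])
        probs.append(p)
    return choices, probs


def balance_loss(choices, probs, num_experts):
    """Illustrative load-balancing loss (assumed form, not the paper's).

    f[i]: fraction of queries whose first choice is expert i.
    P[i]: mean router probability given to expert i.
    Returns num_experts * sum_i f[i] * P[i], which is 1.0 when both
    f and P are uniform and larger when routing is skewed.
    """
    n = len(choices)
    f = [0.0] * num_experts
    for picked in choices:
        f[picked[0][0]] += 1.0 / n
    P = [sum(p[i] for p in probs) / n for i in range(num_experts)]
    return num_experts * sum(fi * Pi for fi, Pi in zip(f, P))
```

In a trained model the router logits would come from a learned projection of each query, and a term like this would be added to the detection loss with a small coefficient. Both details are assumptions here.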

Abstract

Constructing online High-Definition (HD) maps is crucial for the static environment perception of autonomous driving systems (ADS). Existing solutions typically attempt to detect vectorized HD map elements with unified models; however, these methods often overlook the distinct characteristics of different non-cubic map elements, making accurate distinction challenging. To address these issues, we introduce an expert-based online HD map method, termed MapExpert. MapExpert utilizes sparse experts, distributed by our routers, to describe various non-cubic map elements accurately. Additionally, we propose an auxiliary balance loss function to distribute the load evenly across experts. Furthermore, we theoretically analyze the limitations of prevalent bird's-eye view (BEV) feature temporal fusion methods and introduce an efficient temporal fusion module called Learnable Weighted Moving Descent. This module effectively integrates relevant historical information into the final BEV features. Combined with an enhanced slice head branch, the proposed MapExpert achieves state-of-the-art performance and maintains good efficiency on both nuScenes and Argoverse2 datasets.
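
To make the temporal fusion concern concrete, the sketch below shows a generic weighted moving blend of BEV features with a single learnable gate. It is a stand-in under stated assumptions, not the paper's Learnable Weighted Moving Descent module, whose formulation is not reproduced here. The names `fuse_bev`, `frame_weights`, and `gate_logit` are hypothetical, and BEV features are flattened to plain lists of floats for brevity.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse_bev(current, history, gate_logit):
    """Blend the current BEV feature with the running fused history.

    gate_logit is a hypothetical learnable scalar; the sigmoid keeps the
    current-frame weight w strictly between 0 and 1.
    """
    if history is None:  # first frame: nothing to fuse yet
        return list(current)
    w = sigmoid(gate_logit)
    return [w * c + (1.0 - w) * h for c, h in zip(current, history)]


def frame_weights(num_frames, gate_logit):
    """Effective weight of each frame (oldest first) after repeated fusion.

    A frame that is `age` steps old contributes w * (1 - w) ** age, except
    the very first frame, which entered unweighted and so contributes
    (1 - w) ** (num_frames - 1). The weights sum to 1.
    """
    w = sigmoid(gate_logit)
    weights = [w * (1.0 - w) ** age for age in range(num_frames - 1, -1, -1)]
    weights[0] = (1.0 - w) ** (num_frames - 1)
    return weights
```

With this blend, any `gate_logit` above zero gives the current frame more than half of the total weight, so older frames decay geometrically rather than dominating. How the actual module learns or constrains its weights is left to the paper.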

Paper Structure

This paper contains 43 sections, 11 equations, 7 figures, 8 tables.

Figures (7)

  • Figure 1: Overview of our newly introduced MapExpert: (a) The pipeline of MapExpert, consisting of our BEV feature encoder, Learnable Weighted Moving Descent (LWMD), and sparse expert decoder. This pipeline takes surrounding images as input and generates vectorized map elements in an end-to-end manner. (b) The detailed process of the Learnable Weighted Moving Descent, which extracts critical information from previous BEV features and enhances the representation of BEV HD map elements. (c) The structure of our sparse expert transformer layer, designed to effectively extract features of various map elements, such as lane dividers, pedestrian crossings, and road boundaries.
  • Figure 2: Topology of map elements (red: lane dividers, blue: pedestrian crossings, green: road boundaries). Unlike detection objects, which are typically cube-shaped, map elements are non-cubic and can take various shapes.
  • Figure 3: Ablation studies on the expert quota, evaluated on the original nuScenes split dataset.
  • Figure 4: Proportion of queries assigned to each expert across different frames from the nuScenes dataset, separated by whether the expert was selected as the first choice, the second choice, or either.
  • Figure 5: Additional qualitative results on the nuScenes dataset.
  • ...and 2 more figures