InteractionMap: Improving Online Vectorized HDMap Construction with Interaction

Kuang Wu, Chuan Yang, Zhanbin Li

TL;DR

InteractionMap addresses online vectorized HD map construction by enabling joint spatial-temporal interaction across frames. It introduces point-to-instance position relation embedding, a key-frame-based hierarchical temporal fusion, and geometry-aware alignment losses to harmonize semantic scores with localization accuracy. The framework leverages a BEV encoder, a relation map decoder, and an auxiliary instance segmentation branch to boost vectorized map quality, achieving state-of-the-art results on nuScenes and Argoverse2. The approach improves robustness to occlusion and long-range temporal challenges, enabling more reliable real-time HD-map construction for autonomous driving.

Abstract

Vectorized high-definition (HD) maps are essential for autonomous driving systems. Recent state-of-the-art map vectorization methods are mainly based on DETR-like frameworks that generate HD maps in an end-to-end manner. In this paper, we propose InteractionMap, which improves on previous map vectorization methods by fully leveraging local-to-global information interaction in both time and space. First, we enhance DETR-like detectors with an explicit position relation prior from the point level to the instance level, since map elements contain strong shape priors. Second, we propose a key-frame-based hierarchical temporal fusion module, which lets temporal information interact from local to global. Lastly, separate classification and regression branches lead to misalignment in the output distribution. To address this, we let semantic information interact with geometric information by introducing a novel geometry-aware classification loss in optimization and a geometry-aware matching cost in label assignment. InteractionMap achieves state-of-the-art performance on both the nuScenes and Argoverse2 benchmarks.
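
The abstract does not give the formulas for the geometry-aware classification loss or matching cost, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a soft classification target equal to a geometric quality score and a matching cost that multiplies score and quality, in the spirit of task-aligned detection losses. All function names, the exponential quality mapping, and the `scale` and `alpha` parameters are hypothetical.

```python
import math

def mean_point_distance(pred, gt):
    """Mean Euclidean distance between corresponding points of two polylines."""
    assert len(pred) == len(gt), "polylines must have the same number of points"
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def geometric_quality(pred, gt, scale=1.0):
    """Assumed quality measure in (0, 1]: 1.0 for a perfect fit, decaying with error."""
    return math.exp(-mean_point_distance(pred, gt) / scale)

def geometry_aware_bce(score, quality, eps=1e-7):
    """Binary cross-entropy against a soft target equal to the geometric quality.

    The loss is smallest when the predicted score equals the quality, which
    ties the semantic confidence to localization accuracy.
    """
    s = min(max(score, eps), 1.0 - eps)
    return -(quality * math.log(s) + (1.0 - quality) * math.log(1.0 - s))

def geometry_aware_cost(score, quality, alpha=0.5):
    """Assumed matching cost: lower when both score and quality are high."""
    return -(score ** alpha) * (quality ** (1.0 - alpha))

# Toy example: one ground-truth polyline and two candidate predictions.
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
good = [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)]
bad = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]

q_good = geometric_quality(good, gt)
q_bad = geometric_quality(bad, gt)
cost_good = geometry_aware_cost(0.9, q_good)
cost_bad = geometry_aware_cost(0.9, q_bad)
```

With equal classification scores, the better-localized prediction receives the lower matching cost, so label assignment would prefer it. In a full pipeline such costs would typically fill a cost matrix passed to a Hungarian matcher.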

Paper Structure

This paper contains 27 sections, 24 equations, 11 figures, 9 tables.

Figures (11)

  • Figure 1: Visual comparison between MapTRv2 [liao2023maptrv2] and our improved results. Our method effectively eliminates erroneous map elements, leading to better precision and stability.
  • Figure 1: Visual results under cloudy weather conditions.
  • Figure 2: Overview of the InteractionMap framework. InteractionMap mainly consists of four components: (1) a BEV encoder that transforms sensor input into a unified BEV representation; (2) a key-frame-based temporal fusion module that leverages temporal information from local to global; (3) a relation map decoder that uses relation embedding at the point level and the instance level; (4) a geometry-aware alignment module designed to resolve the misalignment between classification and position outputs.
  • Figure 2: Visual results under cloudy weather conditions.
  • Figure 3: Temporal fusion strategy.
  • ...and 6 more figures