MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping

Jiacheng Chen, Yuefan Wu, Jiaqi Tan, Hang Ma, Yasutaka Furukawa

TL;DR

A vector HD-mapping algorithm that formulates mapping as a tracking task and uses a history of memory latents to ensure consistent reconstructions over time. The paper also makes benchmark contributions by improving the processing code for existing datasets to produce consistent ground truth with temporal alignments.

Abstract

This paper presents a vector HD-mapping algorithm that formulates the mapping as a tracking task and uses a history of memory latents to ensure consistent reconstructions over time. Our method, MapTracker, accumulates a sensor stream into memory buffers of two latent representations: 1) Raster latents in the bird's-eye-view (BEV) space and 2) Vector latents over the road elements (i.e., pedestrian-crossings, lane-dividers, and road-boundaries). The approach borrows the query propagation paradigm from the tracking literature that explicitly associates tracked road elements from the previous frame to the current, while fusing a subset of memory latents selected with distance strides to further enhance temporal consistency. A vector latent is decoded to reconstruct the geometry of a road element. The paper further makes benchmark contributions by 1) Improving processing code for existing datasets to produce consistent ground truth with temporal alignments and 2) Augmenting existing mAP metrics with consistency checks. MapTracker significantly outperforms existing methods on both nuScenes and Argoverse2 datasets by over 8% and 19% on the conventional and the new consistency-aware metrics, respectively. The code and models are available on our project page: https://map-tracker.github.io.
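
To make the phrase "a subset of memory latents selected with distance strides" more concrete, the following is a minimal sketch of one way such a selection could work. It is not taken from the MapTracker code: the function name `select_strided_memory`, the 2D ego-position input, and the default stride values are all assumptions made for illustration. For each stride, the sketch picks the past frame whose ego position lies closest to that distance from the current position.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def select_strided_memory(
    past_positions: Sequence[Point],
    current_position: Point,
    strides_m: Sequence[float] = (1.0, 5.0, 10.0, 15.0),  # assumed values
) -> List[int]:
    """Return indices of past frames to fuse (hypothetical helper).

    For each stride, choose the past frame whose distance to the current
    ego position is closest to that stride. Duplicate picks are skipped,
    so the result may contain fewer indices than there are strides.
    """
    if not past_positions:
        return []

    # Euclidean distance from each buffered frame to the current frame.
    distances = [
        math.hypot(x - current_position[0], y - current_position[1])
        for x, y in past_positions
    ]

    selected: List[int] = []
    for stride in strides_m:
        best = min(range(len(distances)), key=lambda i: abs(distances[i] - stride))
        if best not in selected:
            selected.append(best)
    return selected


# Ego vehicle drove along the x-axis, one frame per metre, and is now at x = 10.
history = [(float(x), 0.0) for x in range(10)]
indices = select_strided_memory(history, (10.0, 0.0))
```

In this usage example the buffer only reaches back 10 m, so the 10 m and 15 m strides resolve to the same oldest frame and it is selected once. Selecting by travelled distance rather than by frame count keeps the fused history spatially spread out even when the vehicle slows down or stops.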

Paper Structure (28 sections, 6 equations, 12 figures, 8 tables, 3 algorithms)

Figures (12)

  • Figure 1: MapTracker produces high-quality and temporally consistent vector HD maps, which are progressively merged into a global vector HD map by a simple online algorithm. The current state-of-the-art methods, MapTRv2 [liao2023maptrv2] and StreamMapNet [yuan2024streammapnet], fail to produce consistent reconstructions, leading to very noisy global maps. The figure shows two challenging scenarios (cars are turning) from the nuScenes [caesar2020nuscenes] dataset.
  • Figure 2: (Top) The overall architecture of MapTracker. (Bottom) The close-up views of the BEV and the Vector fusion layers.
  • Figure 3: The architecture details of the BEV and the Vector modules. The BEV-related representations are in green, while the vector-related representations are in cyan. Details of the attention layers are described in the method section.
  • Figure 4: Qualitative comparisons of the two representative baselines, MapTracker (Ours), and the ground truth. A simple online algorithm merges per-frame vector HD map reconstructions across a single drive-through into a global vector HD map. The top five examples are from nuScenes, while the bottom two are from Argoverse2.
  • Figure 5: Typical examples of problematic pedestrian crossing annotations in existing ground truth for nuScenes. (Top) MapTR's ground truth merges or splits nearby pedestrian crossings at the perception boundary, leading to temporal inconsistencies. (Bottom) StreamMapNet's ground truth does not have the above merge/split issue but sometimes fails to fuse small polygons (from raw annotations) into a global one.
  • ...and 7 more figures