MapGCLR: Geospatial Contrastive Learning of Representations for Online Vectorized HD Map Construction

Jonas Merkert, Alexander Blumberg, Jan-Hendrik Pauls, Christoph Stiller

TL;DR

This work focuses on improving the latent bird's-eye-view (BEV) feature grid representation within a vectorized online HD map construction model by enforcing geospatial consistency between overlapping BEV feature grids as part of a contrastive loss function.
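This summary does not reproduce the paper's two equations, so the following is only a minimal sketch of what a geospatial contrastive objective could look like, assuming a standard InfoNCE-style loss. Cells of two overlapping BEV grids are paired by their world coordinates: co-located cells form positives, and all other cells of the second grid act as negatives. The ego pose format `(x, y, yaw)`, the ego-centered row-major grid layout, the temperature `tau`, and all function names are hypothetical and not the authors' implementation.

```python
# Hypothetical sketch of an InfoNCE-style geospatial contrastive loss; not the authors' code.
import math
from typing import List, Sequence, Tuple

Pose = Tuple[float, float, float]  # assumed ego pose (x, y, yaw) in a shared world frame
Point = Tuple[float, float]


def grid_world_coords(height: int, width: int, cell_size: float, pose: Pose) -> List[Point]:
    """World (x, y) of each BEV cell center, row-major; grid assumed centered on the ego."""
    px, py, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    coords = []
    for r in range(height):
        for col in range(width):
            xe = (col - (width - 1) / 2) * cell_size  # ego-frame x of the cell center
            ye = (r - (height - 1) / 2) * cell_size   # ego-frame y of the cell center
            coords.append((px + c * xe - s * ye, py + s * xe + c * ye))  # rotate, then translate
    return coords


def match_cells(world_a: List[Point], world_b: List[Point], max_dist: float) -> List[Tuple[int, int]]:
    """Positive pairs (i, j): nearest cell j of grid B for each cell i of grid A, if closer than max_dist."""
    pairs = []
    for i, (xa, ya) in enumerate(world_a):
        j, d = min(
            ((j, math.hypot(xa - xb, ya - yb)) for j, (xb, yb) in enumerate(world_b)),
            key=lambda t: t[1],
        )
        if d < max_dist:
            pairs.append((i, j))
    return pairs


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def geospatial_info_nce(z_a, z_b, pairs, tau: float = 0.1) -> float:
    """Mean InfoNCE loss: each matched cell of A must pick its co-located cell among all cells of B."""
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        logits = [cosine(z_a[i], zb) / tau for zb in z_b]
        m = max(logits)  # log-sum-exp trick for numerical stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[j]
    return total / len(pairs)
```

Here `z_a` and `z_b` stand in for the per-cell embeddings $\mathbf{z}$ from the projection head $h$ in Figure 2. Matching by nearest world-frame cell center is one simple choice; an actual implementation might instead resample one grid into the other's frame.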

Abstract

Autonomous vehicles rely on map information to understand the world around them. However, creating and maintaining offline high-definition (HD) maps remains costly. A more scalable alternative is online HD map construction, which requires map annotations only at training time. To further reduce the need for large amounts of annotated training data, self-supervised training offers an alternative. This work focuses on improving the latent bird's-eye-view (BEV) feature grid representation within a vectorized online HD map construction model by enforcing geospatial consistency between overlapping BEV feature grids as part of a contrastive loss function. To ensure geospatial overlap for contrastive pairs, we introduce an approach that analyzes the overlap between traversals within a given dataset and generates subsidiary dataset splits following adjustable multi-traversal requirements. We train the same model in a supervised manner on a reduced set of single-traversal labeled data and in a self-supervised manner on a broader set of unlabeled data that satisfies our multi-traversal requirements, effectively implementing a semi-supervised approach. Our approach outperforms the supervised baseline across the board, both quantitatively in terms of vectorized map perception performance on the downstream task and qualitatively in terms of segmentation in the principal component analysis (PCA) visualization of the BEV feature space.
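The traversal-overlap analysis can be pictured as binning each drive log's ego positions into coarse geospatial tiles and counting how many distinct logs pass through each tile. The sketch below is an illustration under stated assumptions only: positions are taken to lie in a shared city-level frame, and the tile size, the `min_traversals` threshold, and every function name are hypothetical; the paper's actual overlap criterion may differ.

```python
# Hypothetical sketch of traversal-overlap analysis and split generation; not the authors' code.
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Position = Tuple[float, float]  # ego (x, y), assumed to be in a shared city-level frame
Tile = Tuple[int, int]


def tile_of(x: float, y: float, tile_size: float) -> Tile:
    """Coarse geospatial bin that a position falls into."""
    return (math.floor(x / tile_size), math.floor(y / tile_size))


def tile_visitors(logs: Dict[str, List[Position]], tile_size: float) -> Dict[Tile, Set[str]]:
    """Distinct drive logs passing through each tile (revisits within one log count once)."""
    visitors: Dict[Tile, Set[str]] = defaultdict(set)
    for log_id, positions in logs.items():
        for x, y in positions:
            visitors[tile_of(x, y, tile_size)].add(log_id)
    return visitors


def intersecting_log_counts(logs: Dict[str, List[Position]], tile_size: float) -> Dict[str, int]:
    """For each log, the number of other logs sharing at least one tile (histogram input)."""
    partners: Dict[str, Set[str]] = {log_id: set() for log_id in logs}
    for ids in tile_visitors(logs, tile_size).values():
        for log_id in ids:
            partners[log_id] |= ids - {log_id}
    return {log_id: len(p) for log_id, p in partners.items()}


def split_by_traversals(logs: Dict[str, List[Position]], tile_size: float, min_traversals: int):
    """Split samples (log_id, frame_index) by how many distinct logs cover their tile."""
    visitors = tile_visitors(logs, tile_size)
    multi, others = [], []
    for log_id, positions in logs.items():
        for k, (x, y) in enumerate(positions):
            n = len(visitors[tile_of(x, y, tile_size)])
            (multi if n >= min_traversals else others).append((log_id, k))
    return multi, others
```

With `min_traversals=2`, `multi` holds candidate frames for self-supervised contrastive pairs and `others` holds single-traversal frames. Raising the threshold would trade the number of usable samples for stronger overlap guarantees.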

Paper Structure (13 sections, 2 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: Overlapping BEV feature grids visualized using PCA. In the background, we depict ground-truth map labels with road boundaries, lane dividers, centerlines, and pedestrian crossings. Both the geospatial consistency between overlapping BEV feature spaces and their alignment with the target road layout are visible.
  • Figure 2: Schematic overview of the semi-supervised learning pipeline. Data flows are shown for supervised (pink) and self-supervised (blue, orange) samples. $\mathcal{B}$ denotes BEV grids with cell features $\mathbf{f}$ and embeddings $\mathbf{z}$ produced by the projection head $h$. Losses $\mathcal{L}_{\text{sup}}$ and $\mathcal{L}_{\text{GCLR}}$ correspond to the supervised and self-supervised branches, respectively.
  • Figure 3: Single- and multi-traversals within Argoverse 2: The histogram on the left shows the general distribution of intersecting drive logs. On the right, we visualize the geospatial distribution of single- (red) and multi-traversals (blue) within Miami.
  • Figure 4: Performance scaling and relative gains across an increasing share of supervised data.
  • Figure 5: Qualitative PCA visualization of the BEV features of the supervised baseline (middle) and our semi-supervised approach (right), with ground-truth labels in the background for reference. Input surround view on the left; 20% supervised split.
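Figures 1 and 5 color BEV features by their PCA projection. A common way to produce such images, assumed here rather than taken from the paper, is to project each cell's C-dimensional feature onto the top three principal components and min-max scale each component to [0, 1] as one RGB channel. The sketch below uses plain power iteration, orthogonalizing against components already found, so it needs only the standard library; the function names are hypothetical.

```python
# Hypothetical sketch of a PCA-to-RGB visualization of BEV cell features; not the authors' code.
import math
from typing import List, Sequence

Matrix = List[List[float]]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def _orthogonalize(v: List[float], basis: Matrix) -> List[float]:
    """Remove the components of v along already-found (orthonormal) principal axes."""
    for b in basis:
        p = _dot(v, b)
        v = [x - p * y for x, y in zip(v, b)]
    return v


def top_components(cov: Matrix, k: int, iters: int = 200, eps: float = 1e-12) -> Matrix:
    """Top-k eigenvectors of a symmetric covariance matrix via power iteration."""
    dim = len(cov)
    comps: Matrix = []
    for _ in range(k):
        v = _orthogonalize([1.0 + 0.1 * i for i in range(dim)], comps)  # fixed start vector
        n = math.sqrt(_dot(v, v)) or 1.0
        v = [x / n for x in v]
        for _ in range(iters):
            w = _orthogonalize([_dot(row, v) for row in cov], comps)
            n = math.sqrt(_dot(w, w))
            if n < eps:  # no variance left in the remaining subspace
                break
            v = [x / n for x in w]
        comps.append(v)
    return comps


def pca_rgb(features: Matrix, k: int = 3) -> Matrix:
    """Project per-cell features onto the top-k principal components, scaled to [0, 1] per channel."""
    n, dim = len(features), len(features[0])
    mean = [sum(f[c] for f in features) / n for c in range(dim)]
    centered = [[f[c] - mean[c] for c in range(dim)] for f in features]
    cov = [[sum(x[a] * x[b] for x in centered) / n for b in range(dim)] for a in range(dim)]
    comps = top_components(cov, k)
    proj = [[_dot(x, v) for v in comps] for x in centered]
    rgb = [[0.0] * k for _ in range(n)]
    for j in range(k):
        col = [p[j] for p in proj]
        lo, hi = min(col), max(col)
        for i in range(n):
            # min-max scaling; a flat component (no variance) maps to 0.0
            rgb[i][j] = (col[i] - lo) / (hi - lo) if hi - lo > 1e-9 else 0.0
    return rgb
```

To obtain a consistent color mapping across overlapping grids, as in Figure 1, one would presumably fit a single PCA on the concatenated cells of both grids rather than one PCA per grid; otherwise, component signs and scales can differ between the two images.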