Pit30M: A Benchmark for Global Localization in the Age of Self-Driving Cars

Julieta Martinez, Sasha Doubov, Jack Fan, Ioan Andrei Bârsan, Shenlong Wang, Gellért Máttyus, Raquel Urtasun

TL;DR

Pit30M introduces a large-scale, multi-sensor benchmark for global localization in self-driving contexts, enabling sub-metre retrieval-based localization at city scale. The paper shows that simple convolutional backbones with pooling can match specialized retrieval architectures for both image and LiDAR data, with BEV-based LiDAR representations delivering top performance. Rich metadata, including weather, sun angle, and occlusion, enables nuanced analysis of failure modes and modality complementarity, while GPS-restricted retrieval reflects practical deployment considerations. Overall, Pit30M provides a scalable, diverse dataset and initial benchmarks that highlight strong cross-modal performance and point toward multi-sensor fusion as a promising direction for robust global localization.

Abstract

We are interested in understanding whether retrieval-based localization approaches are good enough in the context of self-driving vehicles. Towards this goal, we introduce Pit30M, a new image and LiDAR dataset with over 30 million frames, which is 10 to 100 times larger than those used in previous work. Pit30M is captured under diverse conditions (i.e., season, weather, time of the day, traffic), and provides accurate localization ground truth. We also automatically annotate our dataset with historical weather and astronomical data, as well as with image and LiDAR semantic segmentation as a proxy measure for occlusion. We benchmark multiple existing methods for image and LiDAR retrieval and, in the process, introduce a simple, yet effective convolutional network-based LiDAR retrieval method that is competitive with the state of the art. Our work provides, for the first time, a benchmark for sub-metre retrieval-based localization at city scale. The dataset, its Python SDK, as well as more information about the sensors, calibration, and metadata, are available on the project website: https://pit30m.github.io/
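To make the idea of retrieval-based localization concrete, the following is a minimal sketch, not the authors' implementation. It assumes each database frame already has an embedding (from any image or LiDAR network) and a ground-truth 2D position, and it localizes a query by returning the position of the nearest database embedding. The optional GPS restriction mirrors the GPS-restricted retrieval mentioned in the TL;DR. The function name `localize`, the parameters `gps_xy` and `gps_radius_m`, and the toy data are all hypothetical.

```python
import math


def localize(query_emb, db_embs, db_poses, gps_xy=None, gps_radius_m=None):
    """Return (index, pose) of the database frame closest to the query in
    embedding space. If a coarse GPS fix and a radius are given, only
    database frames within that radius are considered (an assumed,
    simplified form of GPS-restricted retrieval)."""
    candidates = list(range(len(db_embs)))
    if gps_xy is not None and gps_radius_m is not None:
        candidates = [
            i for i in candidates
            if math.dist(db_poses[i], gps_xy) <= gps_radius_m
        ]
    if not candidates:
        raise ValueError("no database frames within the search region")
    best = min(candidates, key=lambda i: math.dist(query_emb, db_embs[i]))
    return best, db_poses[best]


# Toy database: 2D embeddings and 2D positions in metres (made-up values).
db_embs = [(0.0, 1.0), (1.0, 0.0), (0.1, 0.9)]
db_poses = [(0.0, 0.0), (50.0, 0.0), (500.0, 0.0)]
query = (0.1, 0.95)

# Exhaustive search picks the visually closest frame, even if it is far away.
exhaustive = localize(query, db_embs, db_poses)

# Restricting candidates to 100 m around a coarse GPS fix changes the match.
restricted = localize(query, db_embs, db_poses,
                      gps_xy=(10.0, 0.0), gps_radius_m=100.0)
```

In this toy example the exhaustive search matches a look-alike frame 500 m away, while the GPS-restricted search returns a nearby frame. At city scale a real system would replace the linear scan with an approximate nearest-neighbour index.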

Paper Structure

This paper contains 18 sections, 2 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: Our new localization dataset, Pit30M. Left: Each square is 1 km$^2$, for a total area of about 50 km$^2$ plus over 20 km of highway in the Pittsburgh Metropolitan Area. Right: Examples of image and LiDAR point clouds taken in the same place at different times.
  • Figure 2: Probability density functions (PDFs) for metadata in Pit30M. For a complete description of these tags, please refer to the Appendix.
  • Figure 3: LiDAR representations benchmarked in this work. (a) Raw point cloud (not used by any method). (b) Point cloud after ground plane removal and downsampling to 4096 points [pointnetvlad, zhang2019pcan, lpdnet]. (c) BEV voxelization with intensities. We use (c) as input to CNNs.
  • Figure 4: Performance of retrieval-based methods. Left: Image retrieval results. Right: LiDAR retrieval results.
  • Figure 5: Qualitative results under exhaustive search. Left: Query. Middle: Image retrieval method. Right: LiDAR retrieval methods.
  • ...and 9 more figures
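
The BEV voxelization with intensities described in Figure 3(c) can be illustrated with a small sketch. It is an assumption-laden illustration, not the paper's preprocessing: the grid extents, cell size, number of height bins, and the choice to keep the maximum intensity per voxel are all hypothetical, as is the function name `bev_voxelize`.

```python
def bev_voxelize(points, x_range=(-40.0, 40.0), y_range=(-40.0, 40.0),
                 z_range=(-2.0, 4.0), cell=0.5, z_bins=3):
    """Rasterize (x, y, z, intensity) points into a bird's-eye-view grid.

    Returns a nested list grid[k][row][col], where k indexes height bins
    and each cell stores the maximum intensity of the points falling in
    it (0.0 if empty). All default parameters are illustrative guesses."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    z_min, z_max = z_range
    n_cols = int(round((x_max - x_min) / cell))
    n_rows = int(round((y_max - y_min) / cell))
    z_step = (z_max - z_min) / z_bins
    grid = [[[0.0] * n_cols for _ in range(n_rows)] for _ in range(z_bins)]
    for x, y, z, intensity in points:
        # Skip points outside the region of interest.
        if not (x_min <= x < x_max and y_min <= y < y_max
                and z_min <= z < z_max):
            continue
        col = int((x - x_min) / cell)
        row = int((y - y_min) / cell)
        k = int((z - z_min) / z_step)
        grid[k][row][col] = max(grid[k][row][col], intensity)
    return grid


# Toy point cloud: two points in the same voxel, one out of range.
toy_points = [
    (0.0, 0.0, 0.0, 0.5),
    (0.1, 0.1, 0.1, 0.9),
    (100.0, 0.0, 0.0, 1.0),
]
toy_grid = bev_voxelize(toy_points, x_range=(-1.0, 1.0), y_range=(-1.0, 1.0),
                        z_range=(-1.0, 1.0), cell=1.0, z_bins=2)
```

The resulting grid has the shape of a multi-channel image (height bins as channels), which is what makes it a natural input for a standard 2D CNN.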