Pit30M: A Benchmark for Global Localization in the Age of Self-Driving Cars
Julieta Martinez, Sasha Doubov, Jack Fan, Ioan Andrei Bârsan, Shenlong Wang, Gellért Máttyus, Raquel Urtasun
TL;DR
Pit30M is a large-scale, multi-sensor benchmark for global localization in self-driving contexts, built to evaluate sub-metre retrieval-based localization at city scale. The paper shows that simple convolutional backbones with pooling can match specialized retrieval architectures on both image and LiDAR data, with bird's-eye-view (BEV) LiDAR representations giving the strongest results. Rich metadata, including weather, sun angle, and occlusion, supports detailed analysis of failure modes and of how the two modalities complement each other, while GPS-restricted retrieval (searching only among map frames near a coarse GPS fix) reflects practical deployment considerations. Overall, Pit30M provides a large, diverse dataset and initial benchmarks that show strong performance in both modalities and point toward multi-sensor fusion as a promising direction for robust global localization.
Abstract
We are interested in understanding whether retrieval-based localization approaches are good enough in the context of self-driving vehicles. Towards this goal, we introduce Pit30M, a new image and LiDAR dataset with over 30 million frames, which is 10 to 100 times larger than those used in previous work. Pit30M is captured under diverse conditions (i.e., season, weather, time of day, traffic), and provides accurate localization ground truth. We also automatically annotate our dataset with historical weather and astronomical data, as well as with image and LiDAR semantic segmentation as a proxy measure for occlusion. We benchmark multiple existing methods for image and LiDAR retrieval and, in the process, introduce a simple yet effective convolutional network-based LiDAR retrieval method that is competitive with the state of the art. Our work provides, for the first time, a benchmark for sub-metre retrieval-based localization at city scale. The dataset, its Python SDK, as well as more information about the sensors, calibration, and metadata, are available on the project website: https://pit30m.github.io/
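To make the task concrete, the following is a minimal sketch of retrieval-based localization with an optional GPS restriction, written with NumPy only. It is an illustration under stated assumptions, not the paper's implementation and not the Pit30M SDK: the function names, the use of Euclidean distance between descriptors, the 2D positions in metres, and the default 50 m radius are all hypothetical choices made for this example.

```python
import numpy as np


def localize_by_retrieval(query_desc, db_descs, db_positions,
                          gps_prior=None, gps_radius_m=50.0):
    """Return (index, position) of the map frame nearest to the query descriptor.

    All names and defaults here are illustrative assumptions.
    query_desc:   (D,) descriptor of the query frame (image or LiDAR).
    db_descs:     (N, D) descriptors of the map frames.
    db_positions: (N, 2) ground-truth map positions in metres.
    gps_prior:    optional (2,) coarse position used to restrict the search.
    """
    query_desc = np.asarray(query_desc, dtype=float)
    db_descs = np.asarray(db_descs, dtype=float)
    db_positions = np.asarray(db_positions, dtype=float)

    candidates = np.arange(len(db_descs))
    if gps_prior is not None:
        # Keep only map frames within gps_radius_m of the coarse GPS fix.
        offsets = db_positions - np.asarray(gps_prior, dtype=float)
        nearby = candidates[np.linalg.norm(offsets, axis=1) <= gps_radius_m]
        # Assumption: fall back to a global search if nothing is nearby.
        if nearby.size > 0:
            candidates = nearby

    # Nearest neighbour in descriptor space among the remaining candidates.
    desc_dists = np.linalg.norm(db_descs[candidates] - query_desc, axis=1)
    best = int(candidates[np.argmin(desc_dists)])
    return best, db_positions[best]


def is_localized(pred_position, true_position, threshold_m=1.0):
    """True if the retrieved position lies within threshold_m of the ground truth."""
    error = np.linalg.norm(np.asarray(pred_position, dtype=float)
                           - np.asarray(true_position, dtype=float))
    return bool(error <= threshold_m)
```

In this sketch, a query counts as localized when the retrieved map frame lies within the chosen distance threshold of the query's true position; a sub-metre criterion corresponds to `threshold_m=1.0`. At the scale of tens of millions of frames, the brute-force distance computation would normally be replaced by an approximate nearest-neighbour index.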
