Towards classification-based representation learning for place recognition on LiDAR scans

Maksim Konoplia, Dmitrii Khizbullin

TL;DR

The paper investigates a classification-based formulation for LiDAR-based place recognition by discretizing spatial locations across multiple NuScenes maps and training an encoder–decoder to predict location classes. A Masked Cross-Entropy loss is used to stabilize training, and predictions are evaluated via KNN search over a pre-indexed database of embeddings, enabling efficient retrieval. While results are competitive with some baselines, they lag behind contrastive-learning models, highlighting trade-offs between training stability, efficiency, and accuracy. The work also discusses data-splitting strategies, out-of-domain evaluation, and scalability considerations, underscoring the potential and challenges of classification-based localization for large-scale deployment.

Abstract

Place recognition is a crucial task in autonomous driving, allowing vehicles to determine their position using sensor data. While most existing methods rely on contrastive learning, we explore an alternative approach by framing place recognition as a multi-class classification problem. Our method assigns discrete location labels to LiDAR scans and trains an encoder-decoder model to classify each scan's position directly. We evaluate this approach on the NuScenes dataset and show that it achieves competitive performance compared to contrastive learning-based methods while offering advantages in training efficiency and stability.

Paper Structure

This paper contains 34 sections, 5 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Visualization of time overlap among the four NuScenes maps. Black segments indicate sampling periods, while high-intensity colored segments denote intervals between the first and the last recording. Red bounding boxes indicate samples from the validation subset.
  • Figure 2: Proposed method for the formation of class labels for location recognition. Black circles are locations where the scans were taken, the solid curved lines are the trajectory of the vehicle. Dark green boxes are non-empty cells. The 3-tuples are formed as X and Y integer coordinates of cells inside a map, extended with the map identifier M. The class labels are formed as a continuous numbering of all 3-tuples.
  • Figure 3: Model architecture. Training phase: The model consists of a PointNet++ backbone, a linear embedding layer, and a linear classification head that outputs class probabilities. Masked cross-entropy loss is used to train the network (see Section "Training setup"). Inference phase: The frozen backbone extracts embeddings from the query point cloud, which are then used to perform a KNN search in the pre-indexed database to retrieve the closest matches.
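
The label construction in Figure 2 and the retrieval step in Figure 3 can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `cell_size` value, integer map identifiers, and the Euclidean distance for the KNN search are all assumptions made for illustration, and the embeddings here are toy vectors rather than PointNet++ outputs.

```python
import numpy as np


def build_class_labels(xy, map_ids, cell_size=10.0):
    """Assign a class label to each scan from its (X, Y, M) cell tuple.

    xy        : (N, 2) scan positions in map coordinates
    map_ids   : (N,) integer map identifiers (assumed to be integers)
    cell_size : grid cell edge length (hypothetical value, not from the paper)
    """
    xy = np.asarray(xy, dtype=float)
    # Integer X and Y cell coordinates inside each map.
    cells = np.floor(xy / cell_size).astype(int)
    tuples = [(int(cx), int(cy), int(m)) for (cx, cy), m in zip(cells, map_ids)]
    # Only non-empty cells receive a label; labels are numbered contiguously from 0.
    tuple_to_label = {t: i for i, t in enumerate(sorted(set(tuples)))}
    labels = np.array([tuple_to_label[t] for t in tuples])
    return labels, tuple_to_label


def knn_search(query, database, k=1):
    """Return indices of the k database embeddings closest to the query.

    Brute-force Euclidean search; the distance metric is an assumption.
    """
    query = np.asarray(query, dtype=float)
    database = np.asarray(database, dtype=float)
    dists = np.linalg.norm(database - query, axis=1)
    return np.argsort(dists)[:k]


# Toy example: four scans, the last one taken on a different map.
positions = [(1.0, 2.0), (3.0, 4.0), (12.0, 2.0), (1.0, 2.0)]
maps = [0, 0, 0, 1]
labels, mapping = build_class_labels(positions, maps, cell_size=10.0)

# Toy embeddings standing in for backbone outputs.
db = [[1.0, 1.0], [0.0, 0.1], [5.0, 5.0]]
nearest = knn_search([0.0, 0.0], db, k=2)
```

Scans that fall into the same cell of the same map share a label, while the same cell coordinates on a different map produce a distinct class. The brute-force search is adequate for a small database; a larger one would likely need an approximate nearest-neighbor index.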