Polar Perspectives: Evaluating 2-D LiDAR Projections for Robust Place Recognition with Visual Foundation Models

Pierpaolo Serio, Giulio Pisaneschi, Andrea Dan Ryals, Vincenzo Infantino, Lorenzo Gentilini, Valentina Donzella, Lorenzo Pollini

TL;DR

This paper systematically compares 2-D LiDAR-to-image projections for place recognition using a state-of-the-art vision foundation model, isolating the effect of the projection by keeping the backbone and aggregation fixed. It introduces a modular pipeline built around DINOv3 that evaluates BEV, polar, range-image, and front-view representations with consistent channels across KITTI, NCLT, HeLiPR, and a challenging warehouse dataset. The results show that the polar projection provides the most robust and discriminative descriptors across outdoor domains, with BEV as a strong secondary option. Range-image and front-view projections lag in outdoor scenarios, though they can be useful in dense indoor environments. The work offers practical guidance for projection selection, shows how lightweight projection-head tuning interacts with the choice of representation, and suggests directions for improvement, such as projection-head co-design and ensemble approaches, to further enhance real-time LiDAR-based place recognition.

Abstract

This work presents a systematic investigation into how alternative LiDAR-to-image projections affect metric place recognition when coupled with a state-of-the-art vision foundation model. We introduce a modular retrieval pipeline that controls for backbone, aggregation, and evaluation protocol, thereby isolating the influence of the 2-D projection itself. Using consistent geometric and structural channels across multiple datasets and deployment scenarios, we identify the projection characteristics that most strongly determine discriminative power, robustness to environmental variation, and suitability for real-time autonomy. Experiments with different datasets, including integration into an operational place recognition policy, validate the practical relevance of these findings and demonstrate that carefully designed projections can serve as an effective surrogate for end-to-end 3-D learning in LiDAR place recognition.
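To make the idea of a LiDAR-to-image projection concrete, the following is a minimal sketch of how a point cloud might be rasterized into polar and BEV images. It is not the authors' implementation: the function names, grid sizes, the 80 m range limit, and the two channels (occupancy and maximum height) are all assumptions chosen for illustration, and empty cells are arbitrarily given a height of zero.

```python
import numpy as np


def _rasterize(rows, cols, z, shape):
    """Build a 2-channel image: occupancy and max height per cell (assumed channels)."""
    height = np.full(shape, -np.inf, dtype=np.float64)
    # Unbuffered maximum so that several points falling in one cell are all considered.
    np.maximum.at(height, (rows, cols), z)
    occupied = np.isfinite(height)
    img = np.zeros((2,) + shape, dtype=np.float32)
    img[0] = occupied.astype(np.float32)
    img[1] = np.where(occupied, height, 0.0)  # empty cells get height 0 (assumption)
    return img


def polar_projection(points, n_rings=64, n_sectors=256, max_range=80.0):
    """Project an (N, 3) point cloud onto a (range, azimuth) grid."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)  # azimuth in [-pi, pi]
    keep = r < max_range
    r, theta, z = r[keep], theta[keep], z[keep]
    rings = np.minimum((r / max_range * n_rings).astype(int), n_rings - 1)
    sectors = np.minimum(
        ((theta + np.pi) / (2 * np.pi) * n_sectors).astype(int), n_sectors - 1
    )
    return _rasterize(rings, sectors, z, (n_rings, n_sectors))


def bev_projection(points, size=128, max_range=80.0):
    """Project an (N, 3) point cloud onto a top-down Cartesian (y, x) grid."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    keep = (np.abs(x) < max_range) & (np.abs(y) < max_range)
    x, y, z = x[keep], y[keep], z[keep]
    cols = np.minimum(((x + max_range) / (2 * max_range) * size).astype(int), size - 1)
    rows = np.minimum(((y + max_range) / (2 * max_range) * size).astype(int), size - 1)
    return _rasterize(rows, cols, z, (size, size))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.uniform(-60.0, 60.0, size=(10_000, 3))  # synthetic points, not real data
    print(polar_projection(cloud).shape, bev_projection(cloud).shape)
```

Either image can then be resized and passed to an image backbone. How the channels are normalized and mapped to the backbone's expected input is left open here, since the text above does not specify it.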

Paper Structure

This paper contains 27 sections, 25 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Example of the different representations obtained from a single point cloud, showing channel information extraction and the resulting multi-channel image.
  • Figure 2: Trajectory followed by the vehicle in the Warehouse environment.
  • Figure 3: The towing vehicle used in the Warehouse environment.
  • Figure 4: Visual concept of the place recognition capabilities across three different sequences (one for each public dataset).
  • Figure 5: Precision-recall curves for the different representations and datasets.