FlashMix: Fast Map-Free LiDAR Localization via Feature Mixing and Contrastive-Constrained Accelerated Training

Raktim Gautam Goswami, Naman Patel, Prashanth Krishnamurthy, Farshad Khorrami

TL;DR

This work proposes FlashMix, which uses a frozen, scene-agnostic backbone to extract local point descriptors, aggregated with an MLP mixer to predict sensor pose, and demonstrates its effectiveness for rapid and accurate LiDAR localization in real-world scenarios.

Abstract

Map-free LiDAR localization systems accurately localize within known environments by predicting sensor position and orientation directly from raw point clouds, eliminating the need for large maps and descriptors. However, their long training times hinder rapid adaptation to new environments. To address this, we propose FlashMix, which uses a frozen, scene-agnostic backbone to extract local point descriptors, aggregated with an MLP mixer to predict sensor pose. A buffer of local descriptors is used to accelerate training by orders of magnitude, combined with metric learning or contrastive loss regularization of aggregated descriptors to improve performance and convergence. We evaluate FlashMix on various LiDAR localization benchmarks, examining different regularizations and aggregators, demonstrating its effectiveness for rapid and accurate LiDAR localization in real-world scenarios. The code is available at https://github.com/raktimgg/FlashMix.
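The pipeline described in the abstract (cached local descriptors from a frozen backbone, mixed by an MLP mixer, pooled, and regressed to a pose) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the random weights, the 6-dimensional pose output, and all function names are assumptions chosen for illustration, and random arrays stand in for real backbone descriptors. Training and the contrastive regularization are omitted.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each point's descriptor over its channels.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def mlp(x, w1, b1, w2, b2):
    # Two-layer perceptron with ReLU, applied along the last axis.
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def init_mlp(rng, dim, hidden):
    # Hypothetical small random initialization, for illustration only.
    return (rng.normal(0, 0.02, (dim, hidden)), np.zeros(hidden),
            rng.normal(0, 0.02, (hidden, dim)), np.zeros(dim))

def mixer_block(x, point_params, channel_params):
    # x has shape (num_points, channels).
    # Point mixing: transpose so the MLP acts across points for each channel.
    x = x + mlp(layer_norm(x).T, *point_params).T
    # Channel mixing: the MLP acts across channels for each point.
    x = x + mlp(layer_norm(x), *channel_params)
    return x

def aggregate_and_regress(local_desc, blocks, head_w, head_b):
    x = local_desc
    for point_params, channel_params in blocks:
        x = mixer_block(x, point_params, channel_params)
    global_desc = x.mean(axis=0)          # global average pooling over points
    pose = global_desc @ head_w + head_b  # linear pose head (assumed)
    return global_desc, pose

rng = np.random.default_rng(0)
num_points, channels, pose_dim = 64, 32, 6  # assumed sizes

blocks = [(init_mlp(rng, num_points, 128), init_mlp(rng, channels, 64))
          for _ in range(2)]
head_w = rng.normal(0, 0.02, (channels, pose_dim))
head_b = np.zeros(pose_dim)

# Stand-in for the descriptor buffer: in the described setup the frozen
# backbone would be run once per scan and its outputs cached; here the
# cached descriptors are simply random numbers.
descriptor_buffer = rng.normal(size=(10, num_points, channels))

global_desc, pose = aggregate_and_regress(descriptor_buffer[0], blocks,
                                          head_w, head_b)
print(global_desc.shape, pose.shape)
```

Because the backbone is frozen, its outputs do not change during training, so only the small aggregator and pose head need to be evaluated on each training step once the buffer is filled.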

Paper Structure

This paper contains 23 sections, 9 equations, 5 figures, and 12 tables.

Figures (5)

  • Figure 1: Comparison of a LiDAR pose regression-based framework (top) with our fast map-free LiDAR localization system.
  • Figure 2: FlashMix framework: A scene-agnostic backbone extracts local descriptors from farthest-point-sampled point clouds, which are stored in a training buffer. An MLP-Mixer followed by global average pooling produces an aggregate descriptor from which the pose is predicted; training uses pose and contrastive losses.
  • Figure 3: MLP-Mixer aggregator that fuses local descriptors using point-mixing and channel-mixing MLPs, followed by average pooling.
  • Figure 4: Analysis of relocalization rate as a function of training time.
  • Figure 5: Visualization of different methods on test trajectories from the Oxford-Radar, DCC, and vReLoC datasets. The ground truth and estimated positions are shown as dark blue and red dots, respectively. The star marks the starting position.
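
The Figure 2 caption mentions farthest point sampling of the input point clouds. As a rough illustration of that standard technique, the sketch below gives a straightforward O(N·k) NumPy version. The function name, the `start_index` parameter, and the toy points are assumptions for this example and are not taken from the FlashMix code.

```python
import numpy as np

def farthest_point_sampling(points, num_samples, start_index=0):
    """Return indices of num_samples points chosen greedily so that each new
    point is the one farthest from all points selected so far."""
    n = points.shape[0]
    if not 0 < num_samples <= n:
        raise ValueError("num_samples must be between 1 and the number of points")
    selected = np.empty(num_samples, dtype=np.int64)
    # Squared distance from every point to its nearest selected point.
    min_dist = np.full(n, np.inf)
    current = start_index
    for i in range(num_samples):
        selected[i] = current
        dist = np.sum((points - points[current]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, dist)
        # Already-selected points have distance 0, so they are not re-picked
        # unless every remaining point is a duplicate.
        current = int(np.argmax(min_dist))
    return selected

# Toy example: four points on a line.
toy_points = np.array([[0.0, 0.0, 0.0],
                       [1.0, 0.0, 0.0],
                       [10.0, 0.0, 0.0],
                       [2.0, 0.0, 0.0]])
toy_indices = farthest_point_sampling(toy_points, 3)
print(toy_indices)  # [0 2 3]
```

Sampling this way spreads the retained points evenly over the scan, which keeps the number of local descriptors fixed regardless of the raw point count.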