Submodular Optimization for Keyframe Selection & Usage in SLAM

David Thorne, Nathan Chan, Yanlong Ma, Christa S. Robison, Philip R. Osteen, Brett T. Lopez

TL;DR

This paper addresses memory- and compute-efficient LiDAR SLAM by proposing online keyframe selection and submap generation guided by submodular optimization, enabled by a neural descriptor for point-cloud similarity. It defines three coupled components: (i) a keyframe selection strategy that uses a neural-descriptor-based diversity objective with provable submodular structure, (ii) a submap generation method that maximizes the minimum Hessian eigenvalue to constrain scan alignment, and (iii) a streaming map summarization approach that produces size-constrained summaries in one pass. The results show fewer saved keyframes, lower memory use, and faster per-scan computation without compromising localization performance. In the reported comparison against DLIOM on the 6.2 km Mout Water dataset, submaps use 2.3 keyframes on average instead of 10.4, and per-scan computation time drops from 67.3 ms to 45.4 ms. Map summarization is demonstrated on a 2.3 km forest loop and on the ARL Graces Quarters datasets. Together, the methods support scalable SLAM with on-the-fly and offline map sharing.
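
To make the idea of a submodular diversity objective concrete, here is a minimal sketch of greedy keyframe selection under a size budget. It is not the paper's objective or algorithm: the facility-location objective, the clipped cosine similarity, and the names `cosine_similarity`, `coverage`, and `greedy_select` are all assumptions chosen for illustration, and the two-dimensional "descriptors" stand in for the neural descriptors described above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity clipped to [0, 1] so the objective stays non-negative."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return max(0.0, dot / norm) if norm > 0 else 0.0


def coverage(selected, descriptors):
    """Facility-location value (an assumed objective, not the paper's):
    for every scan, take its best similarity to any selected keyframe."""
    if not selected:
        return 0.0
    return sum(
        max(cosine_similarity(d, descriptors[j]) for j in selected)
        for d in descriptors
    )


def greedy_select(descriptors, budget):
    """Pick up to `budget` scan indices, each time adding the scan with
    the largest marginal gain in coverage."""
    selected = []
    current = 0.0
    for _ in range(min(budget, len(descriptors))):
        best_idx, best_gain = None, 0.0
        for i in range(len(descriptors)):
            if i in selected:
                continue
            gain = coverage(selected + [i], descriptors) - current
            if gain > best_gain:
                best_idx, best_gain = i, gain
        if best_idx is None:  # no remaining scan adds value
            break
        selected.append(best_idx)
        current += best_gain
    return selected


# Toy descriptors: scans 0 and 1 look alike, scans 2 and 3 look alike.
scans = [
    [1.0, 0.0],
    [0.99, 0.1],
    [0.0, 1.0],
    [0.1, 0.99],
]
keyframes = greedy_select(scans, budget=2)
print(keyframes)  # one index from each of the two groups
```

With a budget of two, the second pick comes from the group the first pick does not already cover, because a near-duplicate scan adds almost no marginal gain. For monotone submodular objectives under a cardinality constraint, this kind of greedy selection is known to achieve at least a $(1 - 1/e)$ fraction of the optimal value, which is the usual reason for proving submodular structure.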

Abstract

Keyframes are LiDAR scans saved for future reference in Simultaneous Localization and Mapping (SLAM), but despite their central importance, most algorithms leave the choices of which scans to save and how to use them to wasteful heuristics. This work proposes two novel keyframe selection strategies for localization and map summarization, as well as a novel approach to submap generation that selects the keyframes that best constrain localization. Our results show that online keyframe selection and submap generation reduce the number of saved keyframes and improve per-scan computation time without compromising localization performance. We also present a map summarization feature for quickly capturing environments under strict map size constraints.

Paper Structure (10 sections, 3 theorems, 11 equations, 8 figures)

Key Result

Proposition 1

The set function $f: \mathcal{K} \rightarrow \mathbb{R}$ defined in the keyframe objective (eq:keyframe_id_obj) is a monotone non-decreasing submodular function.
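
For reference, the two properties named in the proposition can be written out as follows. These are the standard textbook definitions, with $f$ read as a function on subsets of $\mathcal{K}$; the paper's own Definitions may use different notation, and the symbols $A$, $B$, and $e$ are introduced here only for illustration.

```latex
% Monotone non-decreasing: adding elements never lowers the value.
f(A) \le f(B)
  \quad \text{for all } A \subseteq B \subseteq \mathcal{K}

% Submodular (diminishing returns): an element helps a smaller set
% at least as much as it helps any superset.
f(A \cup \{e\}) - f(A) \;\ge\; f(B \cup \{e\}) - f(B)
  \quad \text{for all } A \subseteq B \subseteq \mathcal{K},\; e \in \mathcal{K} \setminus B
```

In keyframe terms, diminishing returns says that a new scan contributes less to a set that already contains similar scans, which matches the goal of choosing keyframes that each capture a unique part of the map.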

Figures (8)

  • Figure 1: (Left) Dense point cloud map of a 2.3 km forest loop with smaller summary maps. The dense map was generated during online SLAM and uses 456 keyframes, whereas the summary maps use 300 and 75 keyframes and are built in under one second. (Top-right) Modified Clearpath Robotics Warthog robot with the LiDAR sensor used for collecting the forest loop dataset. (Bottom-right) Unique keyframe selection: each keyframe (axis) is chosen to capture a unique part of the map.
  • Figure 2: Global descriptor generation. The point cloud is projected into a range image (gray) and fed into the local features extraction module (orange) which consists of convolutional neural networks and a multi-head self-attention block. The NetVLAD module (blue) pools all the local features to generate the global descriptor.
  • Figure 3: Map summaries for the 6.2 km Mout Water dataset from Graces Quarters. Left: the full keyframe map, using 527 keyframes and 186 MB. Right: the summary map, using 300 keyframes and 110 MB. Insets show a detailed comparison of the maps.
  • Figure 4: Submap generation comparison on the 6.2 km Mout Water dataset using identical keyframe selection strategies. Top: keyframes used per submap (DLIOM avg. 10.4; ours avg. 2.3). Middle: minimum Hessian eigenvalue (DLIOM avg. 377; ours avg. 407). Bottom: per-scan computation time (DLIOM avg. 67.3 ms; ours avg. 45.4 ms).
  • ...and 3 more figures

Theorems & Definitions (12)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Remark 1
  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • ...and 2 more