Fast LiDAR Informed Visual Search in Unseen Indoor Environments

Ryan Gupta; Kyle Morgenstein; Steven Ortega; Luis Sentis

TL;DR

This work addresses fast visual search in unseen indoor environments by combining frontier-based planning with a map-free LiDAR perception module that labels non-map points to guide viewpoint selection. The core approach trains a pixel-wise LiDAR classifier from map-based ground truth and uses a history-augmented, autoregressive model to run it online, enabling the planner to prioritize non-permanent features. A two-map occupancy framework and a centroid-based viewpoint generation scheme with four candidates per centroid feed a novel utility function that accelerates target localization while handling unknown space. Results in simulation and in real-world Spot experiments show faster search and robustness to label noise, with performance close to that of ground-truth-informed planning. The method is of practical use for multi-sensor indoor exploration where map information is unavailable or outdated, offering real-time guidance for autonomous search tasks.

Abstract

This paper details a system for fast visual exploration and search without prior map information. We leverage frontier-based planning with both LiDAR and visual sensing and augment it with a perception module that contextually labels points in the surroundings from wide field-of-view 2D LiDAR scans. The goal of the perception module is to recognize surrounding points more likely to be the search target in order to provide an informed prior on which to plan next best viewpoints. The robust map-free scan classifier used to label pixels in the robot's surroundings is trained from expert data collected using a simple cart platform equipped with a map-based classifier. We propose a novel utility function that accounts for the contextual data found from the classifier. The resulting viewpoints encourage the robot to explore points unlikely to be permanent in the environment, leading the robot to locate objects of interest faster than several existing baseline algorithms. Our proposed system is further validated in real-world search experiments for single and multiple search objects with a Spot robot in two unseen environments. Videos of experiments, implementation details, and open-source code can be found at https://sites.google.com/view/lives-2024/home.
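
As a rough illustration of how contextual labels could steer viewpoint choice, the sketch below places four candidate poses around a frontier centroid and picks the one that sees the most non-map points. It is a minimal sketch under stated assumptions, not the paper's utility function: the candidate layout (four poses on a circle, each facing the centroid), the field of view, the sensing range, and the function names `candidate_viewpoints`, `visible_count`, and `best_viewpoint` are all hypothetical.

```python
import math

def candidate_viewpoints(centroid, radius=1.0):
    """Return four (x, y, yaw) poses around a centroid, each facing it.

    Assumption: candidates sit on a circle at 0, 90, 180 and 270 degrees.
    """
    cx, cy = centroid
    poses = []
    for k in range(4):
        angle = k * math.pi / 2.0
        x = cx + radius * math.cos(angle)
        y = cy + radius * math.sin(angle)
        yaw = math.atan2(cy - y, cx - x)  # heading that points at the centroid
        poses.append((x, y, yaw))
    return poses

def visible_count(pose, points, fov=math.radians(90.0), max_range=3.0):
    """Count points inside the camera cone of a pose (occlusion is ignored)."""
    x, y, yaw = pose
    count = 0
    for px, py in points:
        dx, dy = px - x, py - y
        dist = math.hypot(dx, dy)
        if dist == 0.0 or dist > max_range:
            continue
        bearing = math.atan2(dy, dx) - yaw
        # Wrap the bearing into [-pi, pi] before comparing with the half-FOV.
        bearing = math.atan2(math.sin(bearing), math.cos(bearing))
        if abs(bearing) <= fov / 2.0:
            count += 1
    return count

def best_viewpoint(centroid, non_map_points, radius=1.0):
    """Hypothetical utility: prefer the candidate that sees most non-map points."""
    return max(candidate_viewpoints(centroid, radius),
               key=lambda pose: visible_count(pose, non_map_points))

if __name__ == "__main__":
    # One non-map point west of the centroid; the east candidate looks across at it.
    print(best_viewpoint((0.0, 0.0), [(-1.5, 0.0)]))
```

A real planner would also need to weigh travel cost and unknown space, which this sketch deliberately leaves out.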

Paper Structure (20 sections, 4 equations, 8 figures, 1 table, 1 algorithm)

Introduction
Related Works
Methods
Environment
Ground Truth LiDAR Scan Classification
LTF
STF
Map-free LiDAR Scan Classification
Dataset and Data Acquisition
Architecture and Training
Map Updates
Viewpoint Planning
Candidate Viewpoint Generation
Viewpoint Selection
Results
...and 5 more sections

Figures (8)

Figure 1: The cart used for labeled data acquisition and the Spot robot used for deployment. The RealSense provides an odometry estimate to the cart, which is required for ground-truth estimation. Spot is equipped with an RGB-D Azure Kinect for detection.
Figure 2: Pixel-wise LiDAR scan classification model architecture used to speed up search by providing information to the planner. $k$ is the length of the history buffer. (a) $[3,k]$ pose history matrix containing $[x,y,\theta]$. (b) $[n_{i},k]$ LiDAR range history matrix. (c) $[n_{i},k-1] \cup [n_{i},1]$ estimated label history matrix concatenated with its pixel-wise exponential weighted average. (d) The model consists of three temporal-convolutional network (TCN) encoders: a pose encoder, a scan encoder, and a label encoder. The encoded poses, scans, and labels are combined to produce a pixel-wise classification of the LiDAR scan. (e) A threshold (positive/negative) is applied to the raw logits such that each pixel is classified as either a map point or a non-map point.
Figure 3: Map updates are performed from observation at each timestep for visual (a) and LiDAR (b) sensors. White cells are known free and blue/grey cells are unknown. Next, frontiers and their centroids are computed for each of the two sensors (c) and (d).
Figure 4: An overview of the viewpoint sampling process. Four viewpoints are considered at each centroid, shown as red triangles. The highest scoring viewpoint is shown as a yellow triangle to be sent to the robot.
Figure 5: Ablation studies over policy architecture (top) and injected noise during training (bottom). The inclusion of the label history buffer yields 11.63% higher test accuracy. The policy is robust to bit-flipping errors of up to 30% in the label history buffer. The mean accuracy shown is a 5-step moving average.
...and 3 more figures
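
The label-history input and the thresholding step described in the Figure 2 caption can be sketched in a few lines. This is a minimal sketch under stated assumptions rather than the authors' code: the smoothing factor `alpha`, the zero initial average, the convention that a positive logit means "non-map point", and the function names `ewa_per_pixel`, `augment_label_history`, and `threshold_logits` are all hypothetical.

```python
def ewa_per_pixel(label_history, alpha=0.5):
    """Exponential weighted average of past labels for each scan pixel.

    label_history: n_i rows, each holding k-1 past 0/1 labels, oldest first.
    Assumption: the running average starts at 0 and uses smoothing factor alpha.
    """
    averages = []
    for row in label_history:
        avg = 0.0
        for label in row:
            avg = alpha * label + (1.0 - alpha) * avg
        averages.append(avg)
    return averages

def augment_label_history(label_history, alpha=0.5):
    """Append the per-pixel average as one extra column: [n_i, k-1] -> [n_i, k]."""
    averages = ewa_per_pixel(label_history, alpha)
    return [row + [avg] for row, avg in zip(label_history, averages)]

def threshold_logits(logits, threshold=0.0):
    """Binarize raw logits; assumption: 1 marks a non-map point, 0 a map point."""
    return [1 if z > threshold else 0 for z in logits]

if __name__ == "__main__":
    history = [[1, 1, 1], [0, 0, 1]]  # two pixels, three past labels each
    print(augment_label_history(history))
    print(threshold_logits([-1.2, 0.7]))
```

Feeding the thresholded output back as the newest column of the label history is what would make such a model autoregressive at run time.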
