LaserSAM: Zero-Shot Change Detection Using Visual Segmentation of Spinning LiDAR

Alexander Krawciw, Sven Lilge, Timothy D. Barfoot

TL;DR

This work tackles robust change detection for mobile robots operating in unstructured environments by turning spinning LiDAR scans into perspective camera views and applying zero-shot semantic segmentation via the Segment Anything Model (SAM). The method renders two aligned LiDAR submaps (map and live) into a common virtual camera frame, colorizes by range and intensity, and detects changes through IoU-based mask comparisons plus a 3D consistency check, with changes back-projected into 3D for planning. It achieves a segmentation IoU of 73.3% overall and 80.4% within planning corridors on real-world data, while remaining effective under day-to-night illumination changes and enabling a 2–3 Hz planning loop in closed-loop experiments. The approach offers a practical, sensor-efficient pathway to semantically aware obstacle avoidance in off-road robotics, using a single LiDAR and pre-trained vision priors without additional labeling or sensor fusion. Future improvements target temporal tracking, dynamic-change handling, and faster prompting to broaden real-time applicability.
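The IoU-based mask comparison mentioned above can be illustrated with a minimal sketch. It assumes the map and live masks are boolean arrays of the same shape rendered in the common virtual camera frame. The function names and the `iou_threshold` value of 0.5 are hypothetical and not taken from the paper, and the 3D consistency check is not covered here.

```python
import numpy as np


def mask_iou(mask_a, mask_b):
    """Intersection over union of two boolean masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Two empty masks are treated as identical (an assumption).
        return 1.0
    return float(np.logical_and(a, b).sum() / union)


def is_changed(map_mask, live_mask, iou_threshold=0.5):
    """Flag a region as changed when the two masks overlap poorly.

    iou_threshold is an illustrative value, not one reported by the authors.
    """
    return mask_iou(map_mask, live_mask) < iou_threshold


# Example: two 2x2 squares shifted by one pixel share 2 of 6 pixels.
map_mask = np.zeros((4, 4), dtype=bool)
live_mask = np.zeros((4, 4), dtype=bool)
map_mask[0:2, 0:2] = True
live_mask[0:2, 1:3] = True
example_iou = mask_iou(map_mask, live_mask)
```

A region whose mask is nearly identical in both views yields an IoU close to 1 and is left alone, while a mask that appears only in the live view yields a low IoU and is flagged.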

Abstract

This paper presents an approach for applying camera perception techniques to spinning LiDAR data. To improve the robustness of long-term change detection from a 3D LiDAR, range and intensity information are rendered into virtual perspectives using a pinhole camera model. Hue-saturation-value image encoding is used to colourize the images by range and near-IR intensity. The LiDAR's active scene illumination makes it invariant to ambient brightness, which enables night-to-day change detection without additional processing. Using the range-colourized, perspective image allows existing foundation models to detect semantic regions. Specifically, the Segment Anything Model detects semantically similar regions in both a previously acquired map and live view from a path-repeating robot. By comparing the masks in both views, changes in the live scan are detected. Results indicate that the Segment Anything Model accurately captures the shape of arbitrary changes introduced into scenes. The proposed method achieves a segmentation intersection over union of 73.3% when evaluated in unstructured environments and 80.4% when evaluated within the planning corridor. Changes can be detected reliably through day-to-night illumination variations. After pixel-level masks are generated, the one-to-one correspondence with 3D points means that the 2D masks can be used directly to recover the 3D location of the changes. The detected 3D changes are avoided in a closed loop by treating them as obstacles in a local motion planner. Experiments on an unmanned ground vehicle demonstrate the performance of the method.
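As a rough illustration of the rendering and back-projection steps described in the abstract, the sketch below projects points through a pinhole model, encodes them as an HSV image, and keeps a per-pixel point index so that a 2D mask can be mapped back to 3D points. Several details are assumptions rather than the authors' implementation: the points are taken to be already expressed in a camera frame with z pointing forward, range is mapped to hue and intensity to value, and the names `render_perspective`, `mask_to_points`, and `max_range` are hypothetical.

```python
import numpy as np


def render_perspective(points, intensity, fx, fy, cx, cy, width, height,
                       max_range=50.0):
    """Render (N, 3) points into an HSV image plus a pixel-to-point index map.

    Assumes a camera frame with x right, y down, z forward, and intensity
    already normalised to [0, 1]. Returns hsv (H, W, 3) in [0, 1] and
    index (H, W), which holds -1 where no point landed.
    """
    idx = np.nonzero(points[:, 2] > 0.1)[0]  # keep points in front of the camera
    p = points[idx]
    u = np.round(fx * p[:, 0] / p[:, 2] + cx).astype(int)
    v = np.round(fy * p[:, 1] / p[:, 2] + cy).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    idx, u, v = idx[inside], u[inside], v[inside]
    rng = np.linalg.norm(points[idx], axis=1)

    # Keep only the nearest point per pixel: sort by range, then take the
    # first occurrence of each flattened pixel id.
    flat = v * width + u
    order = np.argsort(rng, kind="stable")
    flat, idx, rng = flat[order], idx[order], rng[order]
    flat, first = np.unique(flat, return_index=True)
    idx, rng = idx[first], rng[first]
    v, u = np.divmod(flat, width)

    hsv = np.zeros((height, width, 3))
    index = np.full((height, width), -1, dtype=int)
    hsv[v, u, 0] = np.clip(rng / max_range, 0.0, 1.0)  # hue from range (assumed)
    hsv[v, u, 1] = 1.0                                 # full saturation
    hsv[v, u, 2] = np.clip(intensity[idx], 0.0, 1.0)   # value from intensity (assumed)
    index[v, u] = idx
    return hsv, index


def mask_to_points(mask, index, points):
    """Recover the 3D points behind the pixels selected by a boolean mask."""
    ids = index[mask]
    return points[ids[ids >= 0]]


# Example: two points on the optical axis, one off-axis, one behind the camera.
pts = np.array([[0.0, 0.0, 5.0], [0.0, 0.0, 10.0], [1.0, 0.0, 5.0], [0.0, 0.0, -1.0]])
inten = np.array([0.5, 0.9, 0.2, 1.0])
hsv, index = render_perspective(pts, inten, 100.0, 100.0, 50.0, 40.0, 100, 80)
mask = np.zeros((80, 100), dtype=bool)
mask[40, 70] = True
changed_pts = mask_to_points(mask, index, pts)
```

Storing the index map alongside the image is one simple way to realise the one-to-one pixel-to-point correspondence: any mask produced on the image selects point indices directly, with no depth estimation step.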

Paper Structure (17 sections, 6 equations, 9 figures, 1 table)

Figures (9)

  • Figure 1: The Clearpath Warthog robot driving at night during snowfall, repeating a path that was previously taught during daytime. This paper proposes LaserSAM to detect and mask environmental changes between teach and repeat paths by creating virtual camera views from LiDAR data and applying deep-learning-based segmentations.
  • Figure 2: Data processing pipeline of LaserSAM. The pipeline runs for each new frame obtained from the Ouster LiDAR.
  • Figure 3: The left column contains equirectangular and perspective images aligned with the LiDAR. The right column shows the two projections with a two-meter lateral offset. The perspective view has a smaller blind spot around the base of the robot.
  • Figure 4: Change-detection masks for a sample frame. The left column shows the input images. There are three changed objects, two pedestrians and a static cone, marked in the ground truth. The segmentation results for each algorithm are shown in their respective panel.
  • Figure 5: A temporal sequence of semantic regions generated by SAM. The shared prompt point is highlighted in the pink bulls-eye. The mask colours match between the teach and repeat.
  • ...and 4 more figures