LST-SLAM: A Stereo Thermal SLAM System for Kilometer-Scale Dynamic Environments

Zeyu Jiang, Kuan Xu, Changhao Chen

TL;DR

LST-SLAM is a novel large-scale stereo thermal SLAM system that achieves robust performance in complex, dynamic scenes. It introduces a semantic-geometric hybrid constraint that suppresses potentially dynamic features lacking strong inter-frame geometric consistency.

Abstract

Thermal cameras offer strong potential for robot perception under challenging illumination and weather conditions. However, thermal Simultaneous Localization and Mapping (SLAM) remains difficult due to unreliable feature extraction, unstable motion tracking, and inconsistent global pose and map construction, particularly in dynamic large-scale outdoor environments. To address these challenges, we propose LST-SLAM, a novel large-scale stereo thermal SLAM system that achieves robust performance in complex, dynamic scenes. Our approach combines self-supervised thermal feature learning, stereo dual-level motion tracking, and geometric pose optimization. We also introduce a semantic-geometric hybrid constraint that suppresses potentially dynamic features lacking strong inter-frame geometric consistency. Furthermore, we develop an online incremental bag-of-words model for loop closure detection, coupled with global pose optimization to mitigate accumulated drift. Extensive experiments on kilometer-scale dynamic thermal datasets show that LST-SLAM significantly outperforms recent representative SLAM systems, including AirSLAM and DROID-SLAM, in both robustness and accuracy.
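To make the idea of a semantic-geometric hybrid constraint concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes that matched keypoints between two frames are available, that a fundamental matrix `F` relating the frames has already been estimated, and that some semantic model has flagged which keypoints fall on potentially dynamic classes (the `maybe_dynamic` mask). The function names, the point-to-epipolar-line residual, and the `strict_px` threshold are all hypothetical choices for illustration.

```python
import numpy as np


def epipolar_distance(pts_prev, pts_curr, F):
    """Pixel distance from each current point to the epipolar line of its match.

    pts_prev, pts_curr: (N, 2) arrays of matched pixel coordinates.
    F: (3, 3) fundamental matrix with x_curr^T F x_prev = 0 for static points.
    """
    ones = np.ones((pts_prev.shape[0], 1))
    x_prev = np.hstack([pts_prev, ones])  # homogeneous coordinates, (N, 3)
    x_curr = np.hstack([pts_curr, ones])
    lines = x_prev @ F.T                  # row i is the line F @ x_prev[i]
    num = np.abs(np.sum(lines * x_curr, axis=1))
    den = np.hypot(lines[:, 0], lines[:, 1])
    return num / np.maximum(den, 1e-12)


def hybrid_filter(pts_prev, pts_curr, F, maybe_dynamic, strict_px=1.0):
    """Return a boolean keep-mask over the matches.

    Features outside potentially dynamic regions are kept as they are.
    Features inside such regions are kept only if their epipolar residual
    is below strict_px (an assumed threshold, not a value from the paper).
    """
    d = epipolar_distance(pts_prev, pts_curr, F)
    return ~maybe_dynamic | (d < strict_px)


if __name__ == "__main__":
    # Toy case: pure sideways translation with identity intrinsics,
    # so epipolar lines are horizontal and the residual is |v_curr - v_prev|.
    F = np.array([[0.0, 0.0, 0.0],
                  [0.0, 0.0, -1.0],
                  [0.0, 1.0, 0.0]])
    prev = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])
    curr = np.array([[12.0, 20.0], [33.0, 45.0], [55.0, 60.5]])
    flagged = np.array([True, True, False])
    print(hybrid_filter(prev, curr, F, flagged))
```

In the toy case, the first flagged feature moves along its epipolar line and survives, the second flagged feature violates the epipolar geometry and is suppressed, and the unflagged third feature is kept without the strict test. The design choice illustrated is that a semantic label alone does not discard a feature; a parked vehicle, for example, can still contribute if its motion is geometrically consistent.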

Paper Structure (15 sections, 12 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: LST-SLAM enables robust localization and mapping in kilometer-scale dynamic thermal scenes. The system leverages self-supervised thermal features and a dual-level tracking strategy, while an incremental online BoW supports loop closure detection and global pose optimization.
  • Figure 2: LST-SLAM system pipeline. Stereo inputs are processed with self-supervised feature point and dynamic filtering networks to obtain robust thermal features, which drive tracking, mapping, and local optimization. Binarized descriptors build an incremental BoW (iBoW) for loop closure and global pose optimization.
  • Figure 3: Top: Architecture and details of the STP network. Bottom: Schematic diagram of adaptive self-supervised training in STP networks under thermal modality.
  • Figure 4: Qualitative results of feature extraction and tracking on the M2DGR dataset. Each row visualizes feature matches over continuous video segments sampled at larger frame intervals. The three sequences were chosen to represent different motion directions.
  • Figure 5: The STP network significantly outperforms other methods and substantially improves on SuperPoint in the number of matches and inlier points between frames.
  • ...and 1 more figure