SLAM in the Dark: Self-Supervised Learning of Pose, Depth and Loop-Closure from Thermal Images

Yangfan Xu, Qu Hao, Lilian Zhang, Jun Mao, Xiaofeng He, Wenqi Wu, Changhao Chen

TL;DR

DarkSLAM addresses the challenge of monocular thermal SLAM in outdoor, low-light conditions by combining self-supervised pose and depth learning with targeted architectural enhancements. It introduces Efficient Channel Attention (ECA) for PoseNet and Dino-ResNet50 with a Selective Kernel Attention (SKA)-based DepthNet, along with a Siamese LoopNet for robust loop-closure detection, all integrated into a pose-graph optimization backend. The framework achieves large-scale localization and dense mapping in complex thermal environments, outperforms prior methods in pose accuracy and loop-closure reliability, and runs at real-time-capable speed on a high-end GPU. By reducing reliance on labeled data and improving feature robustness in degraded thermal imagery, DarkSLAM has practical potential for night-time navigation, search-and-rescue, and autonomous monitoring, where visible-light SLAM fails. Future work will target more reliable loop detection under varying thermal conditions, better handling of scene dynamics, and porting the system to resource-constrained edge devices.

Abstract

Visual SLAM is essential for mobile robots, drone navigation, and VR/AR, but traditional RGB camera systems struggle in low-light conditions. This has driven interest in thermal SLAM, which excels in such environments. However, thermal imaging faces challenges such as low contrast, high noise, and a lack of large-scale annotated datasets, which restrict the use of deep learning in outdoor scenarios. We present DarkSLAM, a novel deep learning-based monocular thermal SLAM system designed for large-scale localization and reconstruction in complex lighting conditions. Our approach incorporates the Efficient Channel Attention (ECA) mechanism in visual odometry and the Selective Kernel Attention (SKA) mechanism in depth estimation to enhance pose accuracy and mitigate thermal depth degradation. Additionally, the system includes thermal depth-based loop closure detection and pose optimization, ensuring robust performance in low-texture thermal scenes. Extensive outdoor experiments demonstrate that DarkSLAM significantly outperforms existing methods like SC-Sfm-Learner and Shin et al., delivering precise localization and 3D dense mapping even in challenging nighttime environments.
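
For readers unfamiliar with ECA, the general mechanism can be sketched as follows: each channel is reduced to its spatial mean, a small 1D convolution mixes neighbouring channel means, and a sigmoid turns the result into per-channel gates that rescale the feature map. This is a minimal pure-Python illustration, not the authors' implementation. The function names, the nested-list tensor layout, and the hand-supplied kernel are assumptions made for the example; in a trained network the kernel weights are learned.

```python
import math


def eca_weights(channel_means, kernel):
    """Per-channel gates: zero-padded 1D conv across channels, then sigmoid.

    `kernel` is a hypothetical hand-supplied weight list of odd length.
    """
    k = len(kernel)
    assert k % 2 == 1, "kernel length must be odd"
    pad = k // 2
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    gates = []
    for c in range(len(channel_means)):
        s = sum(kernel[j] * padded[c + j] for j in range(k))
        gates.append(1.0 / (1.0 + math.exp(-s)))
    return gates


def eca(feature_map, kernel):
    """Apply channel attention to a feature map stored as C x H x W nested lists."""
    # Global average pooling: one scalar descriptor per channel.
    means = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    gates = eca_weights(means, kernel)
    # Rescale every value of a channel by that channel's gate.
    return [
        [[g * v for v in row] for row in ch]
        for g, ch in zip(gates, feature_map)
    ]
```

Because the gates depend only on a few neighbouring channel descriptors, the attention step adds very few parameters compared with a fully connected channel-attention layer.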

Paper Structure

This paper contains 28 sections, 10 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: PoseNet computes relative poses and builds a pose graph from the image stream, while LoopNet extracts keyframe features for loop closure detection. Localization and mapping are achieved by combining the optimized pose graph and depth maps from DepthNet.
  • Figure 2: Comparison of the original thermal image (left) and the transformed image (right). Adjusting brightness and contrast enhances image features, clarifying dark details and highlighting grayscale differences. Filtering the detail layer reduces noise, resulting in a cleaner image.
  • Figure 3: In our proposed DarkSLAM framework, the pose and depth estimation modules adopt a self-supervised learning architecture. The predicted pose and depth are used to warp the source image into new neighboring views and to construct a mask for computing the image loss.
  • Figure 4: The figure illustrates the process of training LoopNet using a Siamese network. By minimizing the feature distance of positive samples and maximizing the feature distance of negative samples, we effectively enhance the loop closure detection performance.
  • Figure 5: Experimental platform.
  • ...and 4 more figures
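
The Siamese training objective described in the Figure 4 caption (pull positive pairs together, push negative pairs apart) is commonly realized with a margin-based contrastive loss. The sketch below shows that standard form as an illustration only; the exact loss, the distance metric, and the `margin` value used for LoopNet are not given in this summary, so the names and defaults here are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(feat_a, feat_b, is_loop, margin=1.0):
    """Standard contrastive loss for one pair of keyframe features.

    Positive pairs (is_loop=True) are penalized by their squared distance.
    Negative pairs are penalized only while they are closer than `margin`
    (a hypothetical hyperparameter chosen for this example).
    """
    d = euclidean(feat_a, feat_b)
    if is_loop:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

With this form, a negative pair that is already farther apart than the margin contributes zero loss, so training effort concentrates on the confusable keyframes.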