ThermoStereoRT: Thermal Stereo Matching in Real Time via Knowledge Distillation and Attention-based Refinement

Anning Hu, Ang Li, Xirui Jin, Danping Zou

TL;DR

ThermoStereoRT tackles real-time thermal stereo matching under all-weather conditions by combining a lightweight shallow encoder, multi-scale attention-based aggregation, and a novel spatial attention refinement. A knowledge distillation framework uses a dense teacher to compensate for the sparse ground truth inherent in thermal data, improving disparity accuracy without added inference-time computation. The method achieves state-of-the-art real-time performance on the MS2 and CATS datasets, and ablation studies confirm the critical role of the attention refinement and distillation components. Practical impact includes robust depth perception for night-time robots and drones, with deployable performance on edge hardware. The authors also provide an open-source implementation to facilitate future research and real-world deployment.

Abstract

We introduce ThermoStereoRT, a real-time thermal stereo matching method designed for all-weather conditions that recovers disparity from two rectified thermal stereo images, envisioning applications such as night-time drone surveillance or under-bed cleaning robots. Leveraging a lightweight yet powerful backbone, ThermoStereoRT constructs a 3D cost volume from thermal images and employs multi-scale attention mechanisms to produce an initial disparity map. To refine this map, we design a novel channel and spatial attention module. To address the challenge of sparse ground-truth data in thermal imagery, we use knowledge distillation to boost performance without increasing computational demands. Comprehensive evaluations on multiple datasets demonstrate that ThermoStereoRT delivers both real-time capability and robust accuracy, making it a promising solution for real-world deployment in various challenging environments. Our code will be released at https://github.com/SJTU-ViSYS-team/ThermoStereoRT
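
To make the distillation idea in the abstract concrete, the following is a minimal sketch of how a student disparity prediction might be supervised by sparse ground truth and a dense teacher prediction at the same time. It is not the authors' loss: the smooth L1 penalty, the `teacher_weight` value, and the function names are all assumptions chosen for illustration. Pixels without ground truth are marked `None`.

```python
from typing import List, Optional


def smooth_l1(residual: float, beta: float = 1.0) -> float:
    """Smooth L1 penalty: quadratic near zero, linear for large residuals."""
    magnitude = abs(residual)
    if magnitude < beta:
        return 0.5 * magnitude * magnitude / beta
    return magnitude - 0.5 * beta


def distillation_loss(
    student: List[float],
    ground_truth: List[Optional[float]],
    teacher: List[float],
    teacher_weight: float = 0.5,  # hypothetical weighting, not from the paper
) -> float:
    """Sparse supervised term plus dense teacher term (illustrative only)."""
    if not student:
        raise ValueError("student prediction must not be empty")

    # Supervised term: averaged only over pixels that have ground truth.
    valid = [(s, g) for s, g in zip(student, ground_truth) if g is not None]
    supervised = (
        sum(smooth_l1(s - g) for s, g in valid) / len(valid) if valid else 0.0
    )

    # Distillation term: the teacher provides a target at every pixel.
    distilled = sum(smooth_l1(s - t) for s, t in zip(student, teacher)) / len(student)

    return supervised + teacher_weight * distilled


# Four pixels, only two of which carry ground-truth disparity.
student = [10.0, 20.0, 30.0, 40.0]
ground_truth = [10.5, None, None, 42.0]
teacher = [10.0, 21.0, 30.0, 40.0]
loss = distillation_loss(student, ground_truth, teacher)
```

Because the teacher is only queried during training, a scheme of this kind adds no cost at inference time, which is consistent with the abstract's claim of improved accuracy without increased computational demands.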

Paper Structure

This paper contains 18 sections, 9 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Our method achieves the best trade-off between accuracy and inference speed on the MS2 (Shin et al., 2023) dataset.
  • Figure 2: Results in both indoor and outdoor scenarios of the CATS (Treible et al., 2017) dataset. Our method produces more accurate predictions with smaller disparity errors and more regular object shapes.
  • Figure 3: Overview of our proposed ThermoStereoRT. First, stereo thermal images are fed into (A) the shallow encoder to generate features at different scales and construct a cost volume. Subsequently, (B) the multi-scale aggregation module aggregates the cost and utilizes information from different scales. The initial disparity, along with the merged left and right features, is then fed into (C) the spatial attention refinement module to refine details. The lower part of the figure illustrates the knowledge distillation process, in which Selective-IGEV (Wang et al., 2024) acts as the teacher for our work.
  • Figure 4: Detailed architecture of the shallow encoder.
  • Figure 5: Qualitative results on the MS2 (Shin et al., 2023) dataset. Our method is capable of predicting fine disparity from thermal images with a small error (blue in the error map).
  • ...and 1 more figure
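
The first stages described in the Figure 3 caption (features, cost volume, initial disparity) can be illustrated with a toy example on a single rectified scanline. This is a sketch under stated assumptions, not the paper's architecture: it assumes a correlation-style cost volume and a soft-argmax disparity regression, omits the multi-scale aggregation and refinement modules entirely, and uses hypothetical function names.

```python
import math
from typing import List

Feature = List[float]


def correlation_volume(
    left: List[Feature], right: List[Feature], max_disp: int
) -> List[List[float]]:
    """volume[d][x] = mean channel-wise product of left[x] and right[x - d].

    Positions where x - d falls outside the scanline get a score of 0.
    """
    width = len(left)
    channels = len(left[0])
    volume = []
    for d in range(max_disp):
        row = []
        for x in range(width):
            if x - d >= 0:
                dot = sum(a * b for a, b in zip(left[x], right[x - d]))
                row.append(dot / channels)
            else:
                row.append(0.0)
        volume.append(row)
    return volume


def soft_argmax_disparity(volume: List[List[float]]) -> List[float]:
    """Per pixel: softmax over the disparity axis, then the expected disparity."""
    max_disp = len(volume)
    width = len(volume[0])
    disparities = []
    for x in range(width):
        scores = [volume[d][x] for d in range(max_disp)]
        peak = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        disparities.append(sum(d * w for d, w in enumerate(weights)) / total)
    return disparities


# Toy scanline: each right pixel has a distinct one-hot feature, and the
# left scanline is the right one shifted by 2 pixels (true disparity = 2).
width = 6
right = [[10.0 if c == x else 0.0 for c in range(width)] for x in range(width)]
left = [[0.0] * width, [0.0] * width] + right[: width - 2]

volume = correlation_volume(left, right, max_disp=4)
disparity = soft_argmax_disparity(volume)
```

For pixels with a valid match (x >= 2), the regressed disparity is very close to 2. The first two pixels have no counterpart in the right scanline, so their scores are uniform and the soft-argmax falls back to the mean of the candidate range.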