ThermoStereoRT: Thermal Stereo Matching in Real Time via Knowledge Distillation and Attention-based Refinement
Anning Hu, Ang Li, Xirui Jin, Danping Zou
TL;DR
ThermoStereoRT tackles real-time thermal stereo matching under all-weather conditions by combining a lightweight shallow encoder, multi-scale attention-based aggregation, and a novel channel and spatial attention refinement module. Because ground truth for thermal data is sparse, a knowledge distillation framework lets a teacher network supply dense supervision, improving disparity accuracy without adding computation at inference time. The method achieves state-of-the-art real-time performance on the MS2 and CATS datasets, and ablation studies confirm that the attention refinement and distillation components are both critical. Practical impact includes robust depth perception for night-time robots and drones, with deployable performance on edge hardware. The authors plan to release their code to facilitate future research and real-world deployment.
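The distillation idea above can be illustrated with a minimal sketch. It is not the authors' loss: the smooth L1 penalty, the weight `alpha`, and the function names are assumptions chosen for illustration. It only shows how a sparse ground-truth term (evaluated where labels exist) can be combined with a dense teacher term (evaluated at every pixel).

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 penalty for a single disparity value."""
    diff = abs(pred - target)
    return 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta


def distillation_loss(student, teacher, gt, alpha=0.5):
    """Combine sparse ground-truth supervision with dense teacher supervision.

    student, teacher: flat lists of disparities, one value per pixel.
    gt: flat list of the same length; None marks pixels without ground truth.
    alpha: hypothetical weight of the teacher term (not taken from the paper).
    """
    if not student or not (len(student) == len(teacher) == len(gt)):
        raise ValueError("inputs must be non-empty and of equal length")

    # Sparse term: only pixels that actually carry a ground-truth disparity.
    valid = [(s, g) for s, g in zip(student, gt) if g is not None]
    gt_term = sum(smooth_l1(s, g) for s, g in valid) / len(valid) if valid else 0.0

    # Dense term: the teacher provides a target at every pixel.
    kd_term = sum(smooth_l1(s, t) for s, t in zip(student, teacher)) / len(student)

    return gt_term + alpha * kd_term


if __name__ == "__main__":
    student = [10.0, 20.0, 30.0, 40.0]
    teacher = [10.5, 20.0, 31.0, 40.0]
    gt = [None, 22.0, None, None]  # only one pixel has ground truth
    print(distillation_loss(student, teacher, gt))
```

The teacher is needed only to compute this training loss, which is why distillation of this kind leaves the student's inference cost unchanged.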
Abstract
We introduce ThermoStereoRT, a real-time thermal stereo matching method designed for all-weather conditions that recovers disparity from two rectified thermal stereo images. We envision applications such as night-time drone surveillance and under-bed cleaning robots. Leveraging a lightweight yet powerful backbone, ThermoStereoRT constructs a 3D cost volume from thermal images and employs multi-scale attention mechanisms to produce an initial disparity map. To refine this map, we design a novel channel and spatial attention module. To address the challenge of sparse ground-truth data in thermal imagery, we use knowledge distillation to boost performance without increasing computational demands. Comprehensive evaluations on multiple datasets demonstrate that ThermoStereoRT delivers both real-time capability and robust accuracy, making it a promising solution for real-world deployment in challenging environments. Our code will be released at https://github.com/SJTU-ViSYS-team/ThermoStereoRT
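To make the cost-volume step concrete, here is a minimal sketch under stated assumptions. It uses raw pixel intensities and an absolute-difference cost rather than learned features, and it replaces the paper's attention-based aggregation with a plain soft-argmin; the function names and the `temperature` parameter are hypothetical. It shows only the general pattern: for a rectified pair, compare left pixel `x` with right pixel `x - d` for each candidate disparity `d`, then regress a disparity from the resulting volume.

```python
import math


def build_cost_volume(left, right, max_disp):
    """Return cost[d][y][x] = |left[y][x] - right[y][x - d]| for d in [0, max_disp).

    left, right: rectified images as nested lists (rows of intensities).
    Positions where x - d falls outside the image receive a large cost.
    """
    h, w = len(left), len(left[0])
    big = 1e6
    return [
        [
            [abs(left[y][x] - right[y][x - d]) if x - d >= 0 else big
             for x in range(w)]
            for y in range(h)
        ]
        for d in range(max_disp)
    ]


def soft_argmin(volume, temperature=1.0):
    """Regress a sub-pixel disparity map via a softmax over negative costs."""
    max_disp, h, w = len(volume), len(volume[0]), len(volume[0][0])
    disp = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            costs = [volume[d][y][x] for d in range(max_disp)]
            lowest = min(costs)  # subtract the minimum for numerical stability
            weights = [math.exp(-(c - lowest) / temperature) for c in costs]
            total = sum(weights)
            disp[y][x] = sum(d * wgt for d, wgt in enumerate(weights)) / total
    return disp


if __name__ == "__main__":
    right = [[10, 50, 90, 30, 70, 20]]
    left = [[0, 0, 10, 50, 90, 30]]  # right row shifted by 2 pixels
    volume = build_cost_volume(left, right, max_disp=4)
    print(soft_argmin(volume)[0])
```

In the toy example the left row is the right row shifted by two pixels, so the regressed disparity at columns 2 through 5 is close to 2. The first two columns have no valid match in the right image and should not be interpreted.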
