MonoTher-Depth: Enhancing Thermal Depth Estimation via Confidence-Aware Distillation
Xingxing Zuo, Nikhil Ranganathan, Connor Lee, Georgia Gkioxari, Soon-Jo Chung
TL;DR
MonoTher-Depth addresses thermal monocular depth estimation (MDE) under scarce labeled data by transferring priors from a large RGB MDE model through confidence-aware distillation. A confidence predictor weights the RGB-to-thermal depth guidance using cross-modal and depth-consistency cues, and sub-pixel warping aligns depth maps across the two modalities. The method shows strong gains on the MS² and ViViD++ datasets. In zero-shot evaluation on scenarios without ground-truth depth, it reduces the absolute relative error (AbsRel) by $22.88\%$ relative to a baseline without distillation. It also supports self-supervised fine-tuning and remains robust to imperfect RGB–thermal alignment, which supports real-world robotic deployment without tightly co-registered RGB–T data.
Abstract
Monocular depth estimation (MDE) from thermal images is a crucial technology for robotic systems operating in challenging conditions such as fog, smoke, and low light. The limited availability of labeled thermal data constrains the generalization capabilities of thermal MDE models compared to foundational RGB MDE models, which benefit from datasets of millions of images across diverse scenarios. To address this challenge, we introduce a novel pipeline that enhances thermal MDE through knowledge distillation from a versatile RGB MDE model. Our approach features a confidence-aware distillation method that utilizes the predicted confidence of the RGB MDE to selectively strengthen the thermal MDE model, capitalizing on the strengths of the RGB model while mitigating its weaknesses. Our method significantly improves the accuracy of the thermal MDE, independent of the availability of labeled depth supervision, and greatly expands its applicability to new scenarios. In our experiments on new scenarios without labeled depth, the proposed confidence-aware distillation method reduces the absolute relative error of thermal MDE by 22.88% compared to the baseline without distillation.
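To make the idea of confidence-aware distillation concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the RGB teacher's depth map has already been warped into the thermal camera frame, that a per-pixel confidence in [0, 1] is available, and that a weighted L1 difference is an acceptable stand-in for the actual distillation loss, which this section does not specify. The function names `confidence_weighted_distillation_loss` and `abs_rel` are hypothetical. The second function computes the standard absolute relative error, the metric behind the reported 22.88% reduction.

```python
import numpy as np


def confidence_weighted_distillation_loss(thermal_depth, rgb_depth,
                                          confidence, valid_mask=None):
    """Confidence-weighted L1 loss between student and teacher depth maps.

    Assumptions (not taken from the paper): all inputs share shape (H, W),
    `rgb_depth` is already warped into the thermal frame, and `confidence`
    lies in [0, 1]. Pixels with low confidence contribute little to the loss.
    """
    thermal_depth = np.asarray(thermal_depth, dtype=np.float64)
    rgb_depth = np.asarray(rgb_depth, dtype=np.float64)
    confidence = np.clip(np.asarray(confidence, dtype=np.float64), 0.0, 1.0)

    if valid_mask is None:
        valid_mask = np.ones_like(confidence, dtype=bool)

    # Zero out pixels that have no valid teacher depth after warping.
    weights = confidence * valid_mask
    total_weight = weights.sum()
    if total_weight == 0.0:
        return 0.0  # no trustworthy teacher pixels, so no distillation signal

    residual = np.abs(thermal_depth - rgb_depth)
    return float((weights * residual).sum() / total_weight)


def abs_rel(pred_depth, gt_depth):
    """Absolute relative error: mean(|pred - gt| / gt) over pixels with gt > 0."""
    pred_depth = np.asarray(pred_depth, dtype=np.float64)
    gt_depth = np.asarray(gt_depth, dtype=np.float64)
    valid = gt_depth > 0
    return float(np.mean(np.abs(pred_depth[valid] - gt_depth[valid]) / gt_depth[valid]))


if __name__ == "__main__":
    thermal = np.array([[2.0, 4.0]])
    teacher = np.array([[1.0, 4.0]])
    # The teacher is trusted only at the second pixel, where both agree.
    print(confidence_weighted_distillation_loss(thermal, teacher, [[0.0, 1.0]]))
    print(abs_rel([[2.0, 3.0]], [[1.0, 4.0]]))
```

Normalizing by the sum of the weights keeps the loss scale comparable between images with many and few confident pixels. That is one design choice among several, and a real training setup would express the same computation in an autodiff framework.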
