RIDERS: Radar-Infrared Depth Estimation for Robust Sensing
Han Li, Yukai Ma, Yuehao Huang, Yaqing Gu, Weihua Xu, Yong Liu, Xingxing Zuo
TL;DR
This work tackles dense metric depth estimation in adverse weather and lighting, where short-wavelength sensors falter. It proposes RIDERS, a three-stage Radar–Infrared fusion framework: monocular depth prediction on thermal images with global scale alignment, quasi-dense Radar augmentation via a Transformer-based RC-Net, and local refinement through a Scale Map Learner that yields the final metric depth $\hat{d} = \hat{s} / \hat{z}_{ga}$. The approach demonstrates robust performance across smoky, nighttime, and low-light scenarios on NTU4DRadLM and the authors’ ZJU-Multispectrum dataset, surpassing prior Radar–Camera methods in both accuracy and reliability. By exploiting long-wave sensing modalities, RIDERS estimates depth reliably irrespective of ambient light and scattering particles. The authors plan to release their code and contribute the new ZJU-Multispectrum dataset to spur further research.
Abstract
Dense depth recovery is crucial in autonomous driving, serving as a foundational element for obstacle avoidance, 3D object detection, and local path planning. Adverse weather conditions, including haze, dust, rain, snow, and darkness, introduce significant challenges to accurate dense depth estimation, thereby posing substantial safety risks in autonomous driving. These challenges are particularly pronounced for traditional depth estimation methods that rely on short-wavelength sensors, such as visible-spectrum cameras and near-infrared LiDAR, because these sensors are susceptible to diffraction noise and occlusion in such environments. To fundamentally overcome this issue, we present a novel approach for robust metric depth estimation that fuses a millimeter-wave Radar with a monocular infrared thermal camera, both of which can penetrate atmospheric particles and are unaffected by lighting conditions. Our proposed Radar-Infrared fusion method achieves highly accurate and finely detailed dense depth estimation through three stages: monocular depth prediction with global scale alignment, quasi-dense Radar augmentation by learning Radar-pixel correspondences, and local scale refinement of dense depth using a scale map learner. Our method achieves exceptional visual quality and accurate metric estimation by addressing the ambiguity and misalignment that arise from directly fusing multi-modal long-wave features. We evaluate our approach on the NTU4DRadLM dataset and our self-collected, challenging ZJU-Multispectrum dataset. Especially noteworthy is the unprecedented robustness that our method demonstrates in smoky scenarios. Our code will be released at https://github.com/MMOCKING/RIDERS.
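To make the first and third stages more concrete, the following is a minimal NumPy sketch and not the released implementation. It assumes that the monocular network outputs a relative inverse depth map, that global alignment is a least-squares scale-and-shift fit against sparse Radar depths, and that $\hat{z}_{ga}$ is the resulting aligned inverse depth, so that $\hat{d} = \hat{s} / \hat{z}_{ga}$ scales the aligned depth pixel by pixel. The function names `global_align` and `apply_scale_map` are hypothetical, and the learned components (RC-Net and the scale map learner) are not modeled.

```python
import numpy as np


def global_align(z_mono, radar_depth, mask):
    """Fit a global scale a and shift b so that a * z_mono + b ~= 1 / radar_depth.

    z_mono      : (H, W) relative inverse depth from a monocular network (assumed).
    radar_depth : (H, W) metric depth in meters, valid only where mask is True.
    mask        : (H, W) boolean map of pixels that carry a Radar depth.
    Returns the globally aligned inverse depth map z_ga with shape (H, W).
    """
    x = z_mono[mask]
    y = 1.0 / radar_depth[mask]  # Radar depth converted to inverse depth
    A = np.stack([x, np.ones_like(x)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a * z_mono + b


def apply_scale_map(s_hat, z_ga, eps=1e-6):
    """Final metric depth d = s_hat / z_ga, with z_ga clamped away from zero."""
    return s_hat / np.maximum(z_ga, eps)
```

With a scale map of all ones, `apply_scale_map` simply inverts the aligned inverse depth; a learned scale map would then correct local errors that a single global scale and shift cannot capture.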
