
RIDERS: Radar-Infrared Depth Estimation for Robust Sensing

Han Li, Yukai Ma, Yuehao Huang, Yaqing Gu, Weihua Xu, Yong Liu, Xingxing Zuo

TL;DR

This work tackles dense metric depth estimation in challenging weather where short-wave sensors falter. It proposes RIDERS, a three-stage Radar–Infrared fusion framework: monocular depth prediction on thermal images with global scale alignment, quasi-dense Radar augmentation via a Transformer-based RC-Net, and local refinement through a Scale Map Learner that yields the final metric depth $\hat{d} = \hat{s} / \hat{z}_{ga}$. The approach demonstrates robust performance across smoky, nighttime, and low-light scenarios on NTU4DRadLM and the authors’ ZJU-Multispectrum dataset, surpassing prior Radar–Camera methods in both accuracy and reliability. By exploiting long-wave sensing modalities, RIDERS achieves reliable depth irrespective of ambient light and scattering particles, with publicly released code and a new Multispectrum dataset to spur further research.
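The global alignment step and the final composition $\hat{d} = \hat{s} / \hat{z}_{ga}$ can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the RIDERS implementation. It assumes that the monocular network outputs relative inverse depth $z$, that global alignment is an ordinary least-squares fit of one scale $s_g$ and one shift $t_g$ against the inverse of the sparse Radar depths, and that $\hat{z}_{ga} = s_g z + t_g$. The helper names `fit_scale_shift`, `align`, and `compose_depth` are hypothetical.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def fit_scale_shift(z_rel: Sequence[float], d_radar: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of s_g, t_g so that s_g * z_rel + t_g ~= 1 / d_radar.

    Hypothetical helper (assumption): z_rel holds relative inverse depth sampled
    at the pixels hit by Radar points; d_radar holds the matching metric depths.
    """
    n = len(z_rel)
    if n < 2 or n != len(d_radar):
        raise ValueError("need at least two paired samples")
    targets = [1.0 / d for d in d_radar]  # metric inverse depth at Radar pixels
    mean_z = sum(z_rel) / n
    mean_t = sum(targets) / n
    cov = sum((z - mean_z) * (y - mean_t) for z, y in zip(z_rel, targets))
    var = sum((z - mean_z) ** 2 for z in z_rel)
    if var == 0.0:
        raise ValueError("constant relative inverse depth: scale is undetermined")
    s_g = cov / var
    t_g = mean_t - s_g * mean_z
    return s_g, t_g


def align(z_rel_map: Grid, s_g: float, t_g: float) -> Grid:
    """Globally aligned inverse depth z_ga = s_g * z + t_g for every pixel."""
    return [[s_g * z + t_g for z in row] for row in z_rel_map]


def compose_depth(scale_map: Grid, z_ga: Grid) -> Grid:
    """Final metric depth d_hat = s_hat / z_ga, evaluated per pixel."""
    out = []
    for s_row, z_row in zip(scale_map, z_ga):
        if any(z <= 0.0 for z in z_row):
            raise ValueError("aligned inverse depth must be positive")
        out.append([s / z for s, z in zip(s_row, z_row)])
    return out


if __name__ == "__main__":
    # Synthetic check: the true metric inverse depth is 0.5 * z + 0.1.
    z_samples = [0.2, 0.4, 0.8, 1.0]
    radar_depths = [1.0 / (0.5 * z + 0.1) for z in z_samples]
    s_g, t_g = fit_scale_shift(z_samples, radar_depths)
    z_ga = align([[0.2, 1.0]], s_g, t_g)
    ones = [[1.0, 1.0]]  # stand-in for a learned scale map s_hat
    print(s_g, t_g, compose_depth(ones, z_ga))
```

Fitting in inverse-depth space keeps the problem linear in the two unknowns, so a closed-form solution suffices; the per-pixel scale map then only has to correct the residual local errors that a single global scale and shift cannot.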

Abstract

Dense depth recovery is crucial in autonomous driving, serving as a foundational element for obstacle avoidance, 3D object detection, and local path planning. Adverse weather conditions, including haze, dust, rain, snow, and darkness, introduce significant challenges to accurate dense depth estimation, thereby posing substantial safety risks in autonomous driving. These challenges are particularly pronounced for traditional depth estimation methods that rely on short-wavelength electromagnetic sensors, such as visible-spectrum cameras and near-infrared LiDAR, because of their susceptibility to diffraction noise and occlusion in such environments. To fundamentally overcome this issue, we present a novel approach for robust metric depth estimation that fuses a millimeter-wave Radar and a monocular infrared thermal camera, both of which can penetrate atmospheric particles and are unaffected by lighting conditions. Our proposed Radar-Infrared fusion method achieves highly accurate and finely detailed dense depth estimation through three stages: monocular depth prediction with global scale alignment, quasi-dense Radar augmentation by learning Radar-pixel correspondences, and local scale refinement of dense depth using a scale map learner. Our method achieves exceptional visual quality and accurate metric estimation by addressing the ambiguity and misalignment that arise from directly fusing multi-modal long-wave features. We evaluate the performance of our approach on the NTU4DRadLM dataset and on our self-collected, challenging ZJU-Multispectrum dataset. Especially noteworthy is the unprecedented robustness of our proposed method in smoky scenarios. Our code will be released at https://github.com/MMOCKING/RIDERS.
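The quasi-dense Radar augmentation stage can be illustrated with a small sketch that turns per-point confidence patches into a depth map. The details here are assumptions, not taken from the paper: the association network is taken to emit one confidence patch per Radar point, a pixel inherits that point's depth only when its confidence exceeds a threshold, and conflicts between overlapping patches go to the highest-confidence point. The function name `densify` and the default threshold of 0.5 are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Patch = Sequence[Sequence[float]]  # per-pixel confidence scores, row-major
RadarSample = Tuple[int, int, float, Patch]  # (u0, v0, depth, confidence patch)


def densify(
    height: int,
    width: int,
    samples: Sequence[RadarSample],
    threshold: float = 0.5,
) -> List[List[Optional[float]]]:
    """Build a quasi-dense depth map from per-point confidence patches.

    Hypothetical sketch: (u0, v0) is the top-left pixel of the patch cropped
    around a Radar point. A pixel takes that point's depth only if the
    confidence is strictly above `threshold` and higher than any confidence
    already assigned to the pixel by another point. Unassigned pixels stay None.
    """
    depth: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    best_conf = [[-1.0] * width for _ in range(height)]
    for u0, v0, d, patch in samples:
        for dv, row in enumerate(patch):
            for du, conf in enumerate(row):
                u, v = u0 + du, v0 + dv
                if not (0 <= u < width and 0 <= v < height):
                    continue  # parts of the patch outside the image are ignored
                if conf > threshold and conf > best_conf[v][u]:
                    best_conf[v][u] = conf
                    depth[v][u] = d
    return depth


if __name__ == "__main__":
    a = (0, 0, 10.0, [[0.9, 0.2], [0.6, 0.7]])
    b = (1, 1, 20.0, [[0.8, 0.9], [0.1, 0.95]])
    for row in densify(3, 3, [a, b]):
        print(row)
```

In this toy example, pixel (1, 1) is claimed by both points; the second point wins because its confidence (0.8) beats the first point's (0.7). Pixels that no point claims remain empty, which is what makes the result quasi-dense rather than dense.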

Paper Structure

This paper contains 29 sections, 8 equations, 13 figures, and 5 tables.

Figures (13)

  • Figure 1: Left: Our approach can provide high-quality depth estimation beyond the visible spectrum, unaffected by micrometer-sized particles. Right: Millimeter-wave Radar and infrared thermal cameras have longer operational wavelengths than LiDAR and RGB cameras to penetrate atmospheric particles.
  • Figure 2: The overall framework of our proposed RIDERS comprises three stages: monocular depth estimation from infrared images, quasi-dense augmentation of Radar depth, and a scale map learner for refining the local scale of dense depth.
  • Figure 3: Zero-shot depth predictions. From left to right: the input thermal image, followed by zero-shot depth predictions from DPT ranftl2021vision, MiDaS birkl2023midas, ZoeDepth bhat2023zoedepth, LeReS yin2021learning, yin2022towards, and Depth Anything yang2024depth, all of which are trained on RGB images. The second and third rows correspond to consecutive frames captured about 0.1 seconds apart. LeReS, ZoeDepth, and Depth Anything perform well without fine-tuning. Specifically, LeReS provides precise edges and fine details, while ZoeDepth and Depth Anything remain temporally consistent across consecutive frames.
  • Figure 4: The architecture of our sparse Radar augmentation network, RC-Net. The network takes Radar depths and image patches cropped around each Radar point as input. The architecture consists of two encoder branches, a transformer module, and a multi-scale decoder, aiming to infer pixel-level confidence scores for each Radar point-image patch pairing.
  • Figure 5: Radar augmentation result. Left: sparse Radar points back-projected onto the thermal image plane. Right: augmented quasi-dense depth $\hat{\mathbf{d}}_q$ from our RC-Net.
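The back-projection of sparse Radar points onto the thermal image plane (Figure 5) and the patch cropping around each Radar point (Figure 4) both rest on a standard pinhole projection. The sketch below is hypothetical. It assumes a known Radar-to-thermal extrinsic rotation `R` and translation `t`, undistorted thermal intrinsics `fx`, `fy`, `cx`, `cy`, and a fixed patch half-size. None of these names or values come from the paper, and the helpers `project_radar_point` and `patch_bounds` are illustrative.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def project_radar_point(
    p_radar: Vec3, R: Mat3, t: Vec3,
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> Optional[Tuple[int, int, float]]:
    """Project one Radar point into the thermal image (assumed pinhole model).

    Returns (u, v, depth) with integer pixel indices, or None when the point
    lies behind the camera or falls outside the image.
    """
    # Rigid transform into the camera frame: p_cam = R @ p_radar + t.
    x, y, z = (sum(R[i][k] * p_radar[k] for k in range(3)) + t[i] for i in range(3))
    if z <= 0.0:
        return None  # behind the image plane
    u = math.floor(fx * x / z + cx)
    v = math.floor(fy * y / z + cy)
    if not (0 <= u < width and 0 <= v < height):
        return None  # projects outside the image
    return u, v, z


def patch_bounds(
    u: int, v: int, half_w: int, half_h: int, width: int, height: int,
) -> Tuple[int, int, int, int]:
    """(u_min, v_min, u_max, v_max) of a patch centred on (u, v), clipped to
    the image; the max bounds are exclusive."""
    return (max(0, u - half_w), max(0, v - half_h),
            min(width, u + half_w + 1), min(height, v + half_h + 1))


if __name__ == "__main__":
    # Identity extrinsics assume the Radar axes already match the camera
    # axes (z forward); real Radar frames usually need a non-trivial R.
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    hit = project_radar_point([1.0, 0.5, 10.0], identity, [0.0, 0.0, 0.0],
                              100.0, 100.0, 32.0, 24.0, 64, 48)
    if hit is not None:
        print(hit, patch_bounds(hit[0], hit[1], 4, 8, 64, 48))
```

Points behind the camera or outside the image are dropped before cropping, so every patch passed to the association network is anchored at a valid pixel; clipping the bounds at the image border yields smaller patches near the edges, which a real pipeline might pad instead.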