Advancing Autonomous Driving: DepthSense with Radar and Spatial Attention
Muhammad Ishfaq Hussain, Zubia Naz, Muhammad Aasim Rafique, Moongu Jeon
TL;DR
DepthSense tackles reliable depth estimation for autonomous driving by fusing monocular RGB images with radar data. It combines a deep encoder-decoder RGB network with a Radar Residual Network, a late-fusion pipeline with a Spatial Attention Mechanism, and an ordinal regression head guided by spacing-increasing discretization (SID), together with MER-based radar data augmentation. On the nuScenes dataset, DepthSense with MERs outperforms state-of-the-art monocular and radar-fusion methods in depth accuracy while remaining efficient (approximately 68M parameters and 0.118 s inference for a batch of 3). The approach performs robustly across diverse conditions and highlights the practical potential of radar-assisted monocular depth estimation for real-time autonomous driving perception.
Abstract
Depth perception is crucial for spatial understanding and has traditionally been achieved through stereoscopic imaging. However, the precision of depth estimation using stereoscopic methods depends on the accurate calibration of binocular vision sensors. Monocular cameras, while more accessible, often suffer from reduced accuracy, especially under challenging imaging conditions. Optical sensors, too, face limitations in adverse environments, leading researchers to explore radar technology as a reliable alternative. Although radar provides coarse but accurate signals, its integration with fine-grained monocular camera data remains underexplored. In this research, we propose DepthSense, a novel radar-assisted monocular depth enhancement approach. DepthSense employs an encoder-decoder architecture, a Radar Residual Network, feature fusion with a spatial attention mechanism, and an ordinal regression layer to deliver precise depth estimations. We conducted extensive experiments on the nuScenes dataset to validate the effectiveness of DepthSense. Our methodology not only surpasses existing approaches in quantitative performance but also reduces parameter complexity and inference times. Our findings demonstrate that DepthSense represents a significant advancement over traditional stereo methods, offering a robust and efficient solution for depth estimation in autonomous driving. By leveraging the complementary strengths of radar and monocular camera data, DepthSense sets a new benchmark in the field, paving the way for more reliable and accurate spatial perception systems.
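To make the ordinal regression layer more concrete, the following is a minimal sketch of how depth can be discretized with SID and how ordinal outputs can be decoded back to a depth value. It follows the commonly used SID formulation, in which bin edges are spaced uniformly in log-depth, and is not taken from the DepthSense implementation. The function names `sid_thresholds` and `decode_ordinal`, the depth range of 1 m to 80 m, the number of bins, and the arithmetic-midpoint decoding rule are all assumptions chosen for illustration.

```python
import math


def sid_thresholds(alpha, beta, num_bins):
    """Return num_bins + 1 bin edges spaced uniformly in log-depth.

    alpha and beta are the assumed minimum and maximum depths (hypothetical
    values, not taken from the paper). Edge i is
    exp(log(alpha) + log(beta / alpha) * i / num_bins).
    """
    if not 0 < alpha < beta:
        raise ValueError("expected 0 < alpha < beta")
    if num_bins < 1:
        raise ValueError("expected at least one bin")
    log_ratio = math.log(beta / alpha)
    return [
        math.exp(math.log(alpha) + log_ratio * i / num_bins)
        for i in range(num_bins + 1)
    ]


def decode_ordinal(probs, edges, threshold=0.5):
    """Decode per-threshold probabilities into a single depth value.

    probs[k] is read as the predicted probability that the true depth lies
    beyond the k-th ordinal threshold. The bin index is the number of
    probabilities above `threshold`, clamped to the last bin, and the depth
    is the arithmetic midpoint of that bin (an illustrative choice).
    """
    k = sum(1 for p in probs if p > threshold)
    k = min(k, len(edges) - 2)
    return 0.5 * (edges[k] + edges[k + 1])


if __name__ == "__main__":
    # Hypothetical setup: 4 bins between 1 m and 80 m.
    edges = sid_thresholds(1.0, 80.0, 4)
    print([round(e, 3) for e in edges])
    # Two of the four probabilities exceed 0.5, so bin index 2 is selected.
    print(round(decode_ordinal([0.9, 0.8, 0.2, 0.1], edges), 3))
```

Because the edges are uniform in log-depth, bins are narrow at close range and wide far away, which matches the intuition that absolute depth error matters more for nearby objects than for distant ones.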
