Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion
Huadong Li, Minhao Jing, Jiajun Liang, Haoqiang Fan, Renhe Ji
TL;DR
The paper tackles radar-camera depth completion under sparse LiDAR supervision and identifies LiDAR Distribution Leakage (LDL) as the cause of stripe-like artifacts in the predicted depth maps. It introduces a Disruption-Compensation framework that disrupts LDL during training via Camera Intrinsics Disruption and Radar Disruption, and compensates for the resulting information loss with a Radar-aware Mask Decoder and a Radar-Position Injection Module within a multi-scale Depth Completion Network, guided by a weighted loss. On nuScenes, the approach surpasses state-of-the-art dense supervision methods with an $11.6\%$ improvement in MAE and a $1.6\times$ speedup in FPS, demonstrating that carefully designed sparse supervision can outperform dense supervision while reducing data-noise issues. This work challenges the assumption that denser supervision is always superior and suggests that LDL mitigation strategies may apply more broadly to improve efficiency and accuracy in 3D perception tasks.
Abstract
It is widely believed that sparse supervision is worse than dense supervision in the field of depth completion, but the underlying reasons for this are rarely discussed. To this end, we revisit the task of radar-camera depth completion and present a new method with sparse LiDAR supervision that outperforms previous dense LiDAR supervision methods in both accuracy and speed. Specifically, when trained with sparse LiDAR supervision, depth completion models usually output depth maps containing significant stripe-like artifacts. We find that this phenomenon is caused by the positional distribution pattern that the model implicitly learns from sparse LiDAR supervision, which we term LiDAR Distribution Leakage (LDL). Based on this understanding, we present a novel Disruption-Compensation radar-camera depth completion framework to address the issue. The Disruption part deliberately disrupts the learning of the LiDAR distribution from sparse supervision, while the Compensation part leverages 3D spatial and 2D semantic information to compensate for the information lost through disruption. Extensive experimental results demonstrate that by reducing the impact of LDL, our framework with sparse supervision outperforms the state-of-the-art dense supervision methods with an 11.6% improvement in Mean Absolute Error (MAE) and a 1.6x speedup in Frames Per Second (FPS). The code is available at https://github.com/megvii-research/Sparse-Beats-Dense.
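To make "sparse LiDAR supervision" and the MAE metric concrete, the following minimal sketch computes the mean absolute error only over pixels that carry a LiDAR return. It is an illustration, not the authors' implementation: the function name `masked_mae`, the convention that a depth of zero marks a pixel without a LiDAR return, and the toy values are all assumptions, and the paper's weighted loss is not reproduced here.

```python
def masked_mae(pred, target, invalid=0.0):
    """Mean absolute error over pixels that have a LiDAR depth value.

    pred and target are 2D lists (rows of depths in metres).
    Assumption: target depths <= `invalid` mean "no LiDAR return here".
    """
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t > invalid:  # only supervised (valid) pixels contribute
                total += abs(p - t)
                count += 1
    if count == 0:
        raise ValueError("no valid LiDAR pixels in target")
    return total / count


# Toy 2x3 example: only two pixels carry a LiDAR depth.
pred = [[10.0, 12.0, 30.0],
        [5.0, 7.0, 9.0]]
sparse_gt = [[11.0, 0.0, 0.0],
             [0.0, 7.5, 0.0]]

mae = masked_mae(pred, sparse_gt)
print(mae)
```

Because the error is evaluated only where LiDAR points exist, the positions of those points are themselves a signal the network can pick up on, which is the kind of leakage the abstract describes as LDL.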
