Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion

Huadong Li, Minhao Jing, Jiajun Liang, Haoqiang Fan, Renhe Ji

TL;DR

The paper tackles radar-camera depth completion under sparse LiDAR supervision and identifies LiDAR Distribution Leakage (LDL) as the cause of stripe-like artifacts. It introduces a Disruption-Compensation framework that disrupts LDL during training via Camera Intrinsics Disruption and Radar Disruption, and compensates for the resulting information loss with a Radar-aware Mask Decoder and a Radar-Position Injection Module within a multi-scale Depth Completion Network, guided by a weighted loss. On nuScenes, the approach surpasses state-of-the-art dense supervision methods with an $11.6\%$ improvement in MAE and a $1.6\times$ speedup in FPS, demonstrating that carefully designed sparse supervision can outperform dense supervision while reducing data-noise issues. This work challenges the assumption that denser supervision is always superior and suggests that LDL mitigation strategies may apply more broadly to improve efficiency and accuracy in 3D perception tasks.

Abstract

It is widely believed that sparse supervision is worse than dense supervision in the field of depth completion, but the underlying reasons for this are rarely discussed. To this end, we revisit the task of radar-camera depth completion and present a new method with sparse LiDAR supervision that outperforms previous dense LiDAR supervision methods in both accuracy and speed. Specifically, when trained with sparse LiDAR supervision, depth completion models usually output depth maps containing significant stripe-like artifacts. We find that this phenomenon is caused by the positional distribution pattern implicitly learned from sparse LiDAR supervision, termed LiDAR Distribution Leakage (LDL) in this paper. Based on this understanding, we present a novel Disruption-Compensation radar-camera depth completion framework to address this issue. The Disruption part aims to deliberately disrupt the learning of the LiDAR distribution from sparse supervision, while the Compensation part aims to leverage 3D spatial and 2D semantic information to compensate for the information lost in the disruption process. Extensive experimental results demonstrate that by reducing the impact of LDL, our framework with sparse supervision outperforms the state-of-the-art dense supervision methods with an 11.6% improvement in Mean Absolute Error (MAE) and a 1.6x speedup in Frames Per Second (FPS). The code is available at https://github.com/megvii-research/Sparse-Beats-Dense.

Paper Structure (14 sections, 4 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: (a) Without our proposed framework, directly exploiting sparse supervision from a single LiDAR frame leads to stripe-like scanning patterns in the outputs. Our proposed Disruption-Compensation radar-camera depth completion framework mitigates this issue and makes sparse supervision viable again for this task. (b) Our proposed framework under sparse supervision outperforms state-of-the-art dense supervision methods, with an $\mathbf{11.6\%}$ improvement in MAE (Mean Absolute Error) and a $\mathbf{1.6 \times}$ speedup in FPS (Frames Per Second).
  • Figure 2: Multi-frame dense supervision is noisy. As shown in the red box, the depth supervision of the traffic signs and the background buildings appears messy and inaccurate due to inter-frame noise, which can confuse depth completion models trained under dense supervision [singh2023depth, long2021radar].
  • Figure 3: The architecture of the Disruption-Compensation radar-camera depth completion framework. In the Disruption part, we propose Camera Intrinsics Disruption and Radar Disruption to mitigate the impact of LiDAR Distribution Leakage. In the Compensation part, we design the Radar-Position Injection Module and the Radar-aware Mask Decoder to compensate for the information lost in the disruption process. Notably, the Radar-aware Mask Decoder is only used during training, adding no computational cost at inference.
  • Figure 4: (a) LiDAR Distribution Leakage (LDL). We define the LiDAR distribution as the stripe-like positional distribution pattern in the ground-truth 2D projected LiDAR depth maps. Although certain regions of the LiDAR images (e.g., regions A and B) contain no depth values for supervision, the depth completion model still fills in depth values for these regions, following the stripe-like pattern of the LiDAR distribution (e.g., region C). Such results qualitatively indicate that the LiDAR distribution leaks into the model during training. (b) Quantifying the impact of LDL with our newly proposed metric OMAE. For the truck region, the MAE metric used in [singh2023depth] returns the same value of 0 for both the ideal dense output depth map and the scan-pattern output depth map, since it only considers the limited supervised positions of the sparse LiDAR data (e.g., depth values marked in red) and therefore fails to reflect the impact of LDL. In comparison, OMAE considers object-level depth differences, taking the stripe-like areas of each object into account, and so reflects the impact of LDL.
  • Figure 5: The Disruption part: Camera Intrinsics Disruption and Radar Disruption. (a) In the Camera Intrinsics Disruption process, we randomly upsample the input image/radar data and the ground-truth LiDAR data with a ratio $s$ and randomly crop them with offsets $\Delta x$ and $\Delta y$. This process is equivalent to disrupting the camera intrinsics from $K$ to $K^{\prime}$. Here $f_x$, $f_y$, $c_x$, and $c_y$ are the camera intrinsics. (b) In the Radar Disruption process, we extend the radar points on the input 2D radar image into vertical lines. (c) Analysis results that motivate the Radar Disruption process. Panels (c-1) and (c-2) both show that the camera intrinsics $f_y$ and $c_y$ can be roughly inferred from a simple inverse proportional function, which models can easily capture. Panel (c-3) shows that after applying Radar Disruption, $f_y$ and $c_y$ can no longer be easily obtained through a simple curve function, hindering models from capturing the camera intrinsics. See the Radar Disruption section for details.
  • ...and 2 more figures
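
The two Disruption steps described in the Figure 5 caption can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the updated intrinsics follow the standard pinhole result for a scale-then-crop transform ($f^{\prime} = s f$, $c_x^{\prime} = s c_x - \Delta x$, $c_y^{\prime} = s c_y - \Delta y$), radar points are assumed to be extended over the full image column, and letting the nearer return win on a shared column is a choice made here for the example.

```python
import numpy as np


def disrupt_intrinsics(K, s, dx, dy):
    """Intrinsics K' after upsampling by ratio s and cropping at offset (dx, dy).

    Assumes the standard pinhole model: pixel (u, v) moves to (s*u - dx, s*v - dy).
    """
    A = np.array([[s, 0.0, -dx],
                  [0.0, s, -dy],
                  [0.0, 0.0, 1.0]])
    return A @ K


def resize_crop_sparse(depth, s, dx, dy, out_h, out_w):
    """Apply the same scale and crop to a sparse depth map (0 = no measurement).

    Valid pixels are moved individually instead of interpolated, so depth
    values are never blended with empty pixels.
    """
    out = np.zeros((out_h, out_w), dtype=depth.dtype)
    v, u = np.nonzero(depth)
    u2 = np.round(u * s - dx).astype(int)
    v2 = np.round(v * s - dy).astype(int)
    keep = (u2 >= 0) & (u2 < out_w) & (v2 >= 0) & (v2 < out_h)
    out[v2[keep], u2[keep]] = depth[v[keep], u[keep]]
    return out


def extend_radar_vertically(radar):
    """Spread every radar return over its whole image column (assumed extent).

    Far points are written first so that nearer returns overwrite them when
    several points share a column (a tie-break chosen for this sketch).
    """
    out = np.zeros_like(radar)
    v, u = np.nonzero(radar)
    depths = radar[v, u]
    order = np.argsort(-depths)  # farthest first
    for col, d in zip(u[order], depths[order]):
        out[:, col] = d
    return out
```

In a training loop, `s`, `dx`, and `dy` would be sampled randomly per example and the same values applied to the image, the radar map, and the LiDAR ground truth, so that all three remain aligned under the disrupted intrinsics.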