SD4R: Sparse-to-Dense Learning for 3D Object Detection with 4D Radar

Xiaokai Bai, Jiahao Cheng, Songkai Wang, Yixuan Luo, Lianqing Zheng, Xiaohan Zhang, Si-Yuan Cao, Hui-Liang Shen

TL;DR

SD4R is a novel framework that transforms sparse radar point clouds into dense representations and achieves state-of-the-art performance on the publicly available View-of-Delft dataset.

Abstract

4D radar measurements offer an affordable and weather-robust solution for 3D perception. However, the inherent sparsity and noise of radar point clouds present significant challenges for accurate 3D object detection, underscoring the need for effective and robust point cloud densification. Despite recent progress, existing densification methods often fail to address the extreme sparsity of 4D radar point clouds and exhibit limited robustness when processing scenes with a small number of points. In this paper, we propose SD4R, a novel framework that transforms sparse radar point clouds into dense representations. SD4R first uses a foreground point generator (FPG) to mitigate noise propagation and produce densified point clouds. A logit-query encoder (LQE) then enhances conventional pillarization, yielding robust feature representations. Through these innovations, SD4R demonstrates strong capability in both noise reduction and foreground point densification. Extensive experiments on the publicly available View-of-Delft dataset demonstrate that SD4R achieves state-of-the-art performance. Source code is available at https://github.com/lancelot0805/SD4R.

Paper Structure (15 sections, 14 equations, 5 figures, 4 tables)


Figures (5)

  • Figure 1: Performance of our SD4R in real-scene 3D detection. The sparse points are the original radar measurements, while the virtual points are generated solely from the foreground points. The second row shows the corresponding visualization results. Our SD4R framework demonstrates strong capability in both noise reduction and foreground point densification.
  • Figure 2: (a) The proposed SD4R pipeline begins with voxelization of the 4D radar point cloud, followed by processing through the VoteHead. This step predicts offsets between points and their corresponding object centers, classification logits, and point-wise features. These point-wise features are then concatenated with the logits to classify the points. Subsequently, virtual points are generated at positions determined by the offsets, resulting in a densified point cloud. (b) The densified point cloud undergoes pillarization (RCFusion) to extract features. To further address the sparsity of radar data, we introduce a logit-query encoder (LQE) module, which aggregates features from neighboring points into pillars, leading to more robust representations. Finally, the detection head processes these features to generate the final detection outputs.
  • Figure 3: Based on each virtual point, we select $k$ original points. The upper branch conducts the selection and feature extraction of $k$ points, while the lower branch assigns weights to $k$ points based on distance. Finally, the features of $k$ points are multiplied by their weights and summed to obtain the feature of the virtual point.
  • Figure 4: Details of the LQE. We first compute the aggregation radius $R$ from the points inside the pillar, then aggregate the features of the outside points that fall within this radius $R$. The result is the updated pillar features.
  • Figure 5: Visualization results on the VoD validation set. Each column corresponds to a frame of data containing radar points in BEV and an image, where the red triangle denotes the position of the ego-vehicle. Ground-truth boxes are shown in orange (perspective) and yellow (bird’s-eye), while predicted boxes appear in blue and green, respectively. The second row overlays SD4R’s image-plane predictions.
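
The virtual-point steps described in the captions of Figures 2 and 3 can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `generate_virtual_points` and `interpolate_features` are hypothetical, the foreground mask is assumed to be given rather than predicted, and the normalized inverse-distance weighting is an assumption standing in for whatever distance-based weighting the paper actually uses.

```python
import math


def generate_virtual_points(points, offsets, is_foreground):
    """Place one virtual point per foreground point, shifted by its offset.

    `offsets` stands in for the point-to-object-center offsets predicted by
    the VoteHead; `is_foreground` stands in for the point classification.
    Both are assumed inputs in this sketch.
    """
    return [
        tuple(p + o for p, o in zip(pt, off))
        for pt, off, fg in zip(points, offsets, is_foreground)
        if fg
    ]


def interpolate_features(virtual_pt, points, feats, k=3, eps=1e-8):
    """Feature of a virtual point as a weighted sum over its k nearest
    original points.

    Weights are normalized inverse distances (an assumption; the caption
    only says the weights are based on distance).
    """
    dists = [math.dist(virtual_pt, p) for p in points]
    # Indices of the k original points closest to the virtual point.
    nearest = sorted(range(len(points)), key=dists.__getitem__)[:k]
    raw = [1.0 / (dists[i] + eps) for i in nearest]
    total = sum(raw)
    weights = [w / total for w in raw]
    dim = len(feats[0])
    # Multiply each neighbor's feature by its weight and sum per channel.
    return [
        sum(w * feats[i][d] for w, i in zip(weights, nearest))
        for d in range(dim)
    ]


# Toy example: three radar points with one-channel features.
points = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
feats = [[1.0], [3.0], [100.0]]
offsets = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
is_foreground = [True, True, False]

virtual = generate_virtual_points(points, offsets, is_foreground)
virtual_feats = [interpolate_features(v, points, feats, k=2) for v in virtual]
```

In this toy case both foreground points vote for the same center, so the two virtual points coincide and each takes the average of its two equidistant neighbors' features. The far-away third point is excluded because it is neither foreground nor among the `k` nearest neighbors.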