RadarPillars: Efficient Object Detection from 4D Radar Point Clouds

Alexander Musiat; Laurenz Reichardt; Michael Schulze; Oliver Wasenmüller

TL;DR

RadarPillars tackles efficient 4D radar object detection by designing a pillar-based network that exploits velocity information through a novel velocity decomposition and a pillar-level self-attention mechanism. The approach introduces 4D Radar Features with $v_r$ decomposed into $v_{r,x}$ and $v_{r,y}$, plus velocity-offset features, and a PillarAttention layer that treats each pillar as a token to achieve a global receptive field with low computation. A uniform backbone scaling strategy further aligns model capacity with the extreme sparsity of radar data, yielding a lightweight model of $0.27$M parameters and $1.99$ GFLOPS that achieves state-of-the-art results on View-of-Delft while enabling real-time edge performance. Together, these design choices substantively improve radar-only detection efficiency and accuracy, and point to future work in end-to-end transformer-based radar perception and broader sensor fusion applications.
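The velocity decomposition mentioned above can be illustrated with a minimal sketch. It assumes that the radial direction of a point in the ground plane is given by its azimuth `atan2(y, x)` in the sensor frame, so that $v_{r,x} = v_r \cos\varphi$ and $v_{r,y} = v_r \sin\varphi$. The function names, the sample points, and the mean-based form of the velocity-offset features are illustrative assumptions, not the authors' implementation.

```python
import math


def decompose_radial_velocity(x, y, v_r):
    """Split the radial velocity v_r into x and y components.

    Assumption: the radial direction in the ground plane is the point's
    azimuth phi = atan2(y, x) in the sensor frame.
    """
    phi = math.atan2(y, x)
    return v_r * math.cos(phi), v_r * math.sin(phi)


def offsets_from_mean(values):
    """Offset of each value from the mean of its pillar.

    Assumed form of a velocity-offset feature, by analogy with the
    point-to-pillar-mean offsets used in pillar-based detectors.
    """
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical points (x, y, v_r) that are assumed to share one pillar.
points = [(10.0, 0.0, 2.0), (10.0, 10.0, 2.0)]

# Per-point (v_rx, v_ry) components.
components = [decompose_radial_velocity(*p) for p in points]

# Offsets of v_rx from the pillar mean of v_rx.
v_rx_offsets = offsets_from_mean([c[0] for c in components])
```

The decomposition does not recover the object's true velocity, since the heading stays unknown. It only re-expresses the measured radial component in Cartesian form, so the magnitude of `(v_rx, v_ry)` equals `|v_r|`.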

Abstract

Automotive radar systems have evolved to provide not only range, azimuth and Doppler velocity, but also elevation data. This additional dimension allows 4D radar to be represented as a 3D point cloud. As a result, existing deep learning methods for 3D object detection, which were initially developed for LiDAR data, are often applied to these radar point clouds. However, this neglects the special characteristics of 4D radar data, such as its extreme sparsity, and fails to make optimal use of its velocity information. To address these gaps in the state-of-the-art, we present RadarPillars, a pillar-based object detection network. By decomposing radial velocity data, introducing PillarAttention for efficient feature extraction, and studying layer scaling to accommodate radar sparsity, RadarPillars significantly outperforms state-of-the-art detection results on the View-of-Delft dataset. Importantly, this comes at a significantly reduced parameter count, surpassing existing methods in terms of efficiency and enabling real-time performance on edge devices.

Paper Structure (14 sections, 2 equations, 5 figures, 5 tables)

INTRODUCTION
RELATED WORK
  4D Radar Object Detection
  Transformers in Point Cloud Perception
METHOD
  4D Radar Features
  PillarAttention
  Architecture and Scaling
EVALUATION
  RadarPillars
  4D Radar Features
  PillarAttention
  Backbone Scaling
CONCLUSION

Figures (5)

Figure 1: Example of our RadarPillars detection results on 4D radar. Cars are marked in red, pedestrians in green and cyclists in blue. The radial velocities of the points are indicated by arrows.
Figure 2: Absolute radial velocity $v_{r}$ of 4D radar, compensated for ego motion. As an object moves, $v_{r}$ changes depending on its heading angle to the sensor. The car's actual velocity $v$ remains unknown, as its heading cannot be determined. However, $v_{r}$ can be decomposed into its $x$ and $y$ components to provide additional features. The coordinate system and nomenclature follow the View-of-Delft dataset.
Figure 3: Overview of our PillarAttention. We leverage the sparsity of radar point clouds by using a mask to gather features from non-empty pillars, reducing the spatial size from $H, W$ to $p$. Each pillar feature with $C$ channels is treated as a token for the calculation of self-attention. Our PillarAttention is encapsulated in a transformer layer, with the feed-forward network (FFN) consisting of Layer Norm, followed by two MLPs with the GeLU activation between them. The hidden dimension $E$ of PillarAttention is controlled by an MLP before and after the layer. Finally, the pillar features with $C$ channels are scattered back to their original positions within the grid. Our PillarAttention does not use position embedding.
Figure 4: Combination of our proposed methods forming RadarPillars, in comparison to the baseline PointPillars. Results show 1-frame object detection precision for the entire radar area on the View-of-Delft dataset. The frame rate was evaluated on an NVIDIA AGX Xavier 32GB.
Figure 5: Weight magnitude analysis comparing various channel sizes for uniformly scaling RadarPillars. Results show that the weight strength increases with decreased network size. This visualization excludes dead weights and outliers.
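
The gather, attend, and scatter steps described in the Figure 3 caption can be sketched as follows. This is a deliberately simplified illustration and not the authors' layer: it uses identity query, key, and value projections with a single head, and it omits the MLPs that set the hidden dimension $E$, the Layer Norm, and the FFN. The function names and the nested-list grid representation (with `None` marking an empty pillar) are assumptions made for the example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def pillar_attention(grid):
    """Self-attention over the non-empty pillars of an H x W grid.

    grid[i][j] is a list of C floats, or None for an empty pillar.
    Simplification: Q, K and V are the pillar features themselves.
    """
    # Gather: the mask of non-empty pillars reduces H x W cells to p tokens.
    coords = [(i, j) for i, row in enumerate(grid)
              for j, cell in enumerate(row) if cell is not None]
    tokens = [grid[i][j] for i, j in coords]

    out = [[None] * len(row) for row in grid]
    if not tokens:
        return out

    c = len(tokens[0])
    scale = 1.0 / math.sqrt(c)

    # Attend: every token attends to every other token (global receptive
    # field over occupied pillars), without any position embedding.
    attended_tokens = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        attended_tokens.append(
            [sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(c)]
        )

    # Scatter: write the p updated tokens back to their grid positions.
    for (i, j), tok in zip(coords, attended_tokens):
        out[i][j] = tok
    return out


# Hypothetical 2 x 3 grid with two occupied pillars and C = 2 channels.
grid = [
    [None, [1.0, 0.0], None],
    [None, None, [0.0, 1.0]],
]
attended = pillar_attention(grid)
```

Because attention is computed only over the $p$ occupied pillars, the cost grows with $p^2$ rather than with $(H \cdot W)^2$, which is what makes a global receptive field affordable on sparse radar grids.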
