SpikingRTNH: Spiking Neural Network for 4D Radar Object Detection

Dong-Hee Paek, Seung-Hyun Kong

TL;DR

4D Radar enables robust 3D object detection but imposes high energy costs due to dense point clouds. The authors propose SpikingRTNH, the first spiking neural network for 4D Radar-based detection, which replaces ReLU with leaky integrate-and-fire neurons and introduces Biological Top-down Inference (BTI) to process point clouds sequentially from higher to lower densities. On the K-Radar dataset, SpikingRTNH with BTI achieves a $78\%$ energy reduction relative to the ANN baseline while maintaining competitive accuracy ($AP_{3D}=51.1\%$, $AP_{BEV}=57.0\%$), demonstrating the practicality of SNNs for energy-efficient autonomous driving perception. These results point to a viable path toward low-power, high-performance 4D Radar perception in real-world driving scenarios. Code is released at https://github.com/kaist-avelab/k-radar.

Abstract

Recently, 4D Radar has emerged as a crucial sensor for 3D object detection in autonomous vehicles, offering both stable perception in adverse weather and high-density point clouds for object shape recognition. However, processing such high-density data demands substantial computational resources and energy consumption. We propose SpikingRTNH, the first spiking neural network (SNN) for 3D object detection using 4D Radar data. By replacing conventional ReLU activation functions with leaky integrate-and-fire (LIF) spiking neurons, SpikingRTNH achieves significant energy efficiency gains. Furthermore, inspired by human cognitive processes, we introduce biological top-down inference (BTI), which processes point clouds sequentially from higher to lower densities. This approach effectively utilizes points with lower noise and higher importance for detection. Experiments on the K-Radar dataset demonstrate that SpikingRTNH with BTI reduces energy consumption by 78% while achieving detection performance comparable to its ANN counterpart (51.1% $AP_{3D}$, 57.0% $AP_{BEV}$). These results establish the viability of SNNs for energy-efficient 4D Radar-based object detection in autonomous driving systems. All code is available at https://github.com/kaist-avelab/k-radar.
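The substitution described in the abstract, ReLU replaced by leaky integrate-and-fire (LIF) neurons, can be illustrated with a minimal discrete-time sketch. The decay factor, firing threshold, and hard reset used here are illustrative assumptions rather than the paper's formulation or values, and the class name `LIFNeuron` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LIFNeuron:
    """Discrete-time leaky integrate-and-fire neuron (illustrative only)."""
    decay: float = 0.5      # assumed leak factor applied to the stored potential
    threshold: float = 1.0  # assumed firing threshold
    v: float = 0.0          # membrane potential carried across time steps

    def step(self, current: float) -> int:
        # Leak the previous potential, then integrate the new input.
        self.v = self.decay * self.v + current
        if self.v >= self.threshold:
            self.v = 0.0    # hard reset after emitting a spike (assumed)
            return 1
        return 0


def relu(x: float) -> float:
    """Conventional ANN activation, shown for contrast."""
    return max(0.0, x)


inputs = [0.6, 0.6, 0.6, 0.1, 1.5]

neuron = LIFNeuron()
spikes = [neuron.step(x) for x in inputs]   # binary, depends on input history
relu_out = [relu(x) for x in inputs]        # real-valued, stateless

print(spikes)
print(relu_out)
```

Unlike ReLU, the LIF output is binary and depends on the accumulated potential, so the third input fires only because of the two inputs before it. Because each output is 0 or 1, a downstream layer only has to add the weights of inputs that spiked instead of multiplying every input by a weight, which is the usual argument for the energy savings of SNNs.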

Paper Structure

This paper contains 12 sections, 7 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: 4D Radar system and data representation: (a) Overview of the 4D Radar-based 3D object detection system. Radio waves reflected from target objects (e.g., vehicles) are converted into digital signals in the RF front-end. A 2D FFT then generates a Range-Doppler Map, and an Angle-FFT extracts the 4D Radar tensor. Next, CFAR filtering produces the 4D Radar point cloud, which is finally processed by a neural network to detect objects. (b) Various data modalities of 4D Radar shown with different point cloud densities. From left to right: a camera image (with a yellow 3D bounding box indicating a vehicle), a raw 4D Radar tensor (showing all measurements), and 4D Radar point clouds at different densities (0.1%, 1%, 10%). A LiDAR point cloud is overlaid for reference. The bottom row compares key characteristics (Density, Noise, Computation) between conventional Radar and recent 4D Radar [radar_tutorial]. While conventional Radar uses approximately 0.1% of data points for presence detection (resulting in lower noise and computational cost), modern 4D Radar leverages around 10% density for detailed 3D shape recognition (leading to higher noise but richer shape information).
  • Figure 2: Network architecture comparison between RTNH and SpikingRTNH. Top: RTNH processes a single high-density 4D Radar point cloud using ReLU activation functions, generating three stages of feature maps (FM1, FM2, FM3). The feature maps are projected to bird's-eye-view (BEV) and concatenated for final object detection. Bottom: SpikingRTNH replaces ReLU with leaky integrate-and-fire (LIF) neurons and sequentially processes point clouds from higher to lower densities over time steps $t = 1, \dots, T$. This biological top-down inference (BTI) approach leverages the observation that lower-density point clouds contain fewer noisy points while retaining the most critical points for object detection. The LiDAR point cloud is included for reference only and is not used in the network processing.
  • Figure 3: Qualitative comparison of 3D object detection results across different weather conditions (Normal, Sleet, and Heavy snow). The first column shows ground truth bounding boxes (red), while the second column displays RTNH (ANN) predictions (yellow). The next three columns demonstrate SpikingRTNH (SNN) predictions at different time steps ($T = 1, 2, 3$), where solid yellow boxes indicate predictions and dashed orange boxes represent false alarms. The rightmost column shows corresponding front camera images for reference. The heatmap colors represent normalized power values from 0.0 (blue) to 1.0 (red).
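The density levels in Figure 1(b) and the high-to-low ordering in Figure 2 can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes each density level is obtained by keeping the strongest fraction of radar tensor cells by power, which stands in for CFAR filtering, and it uses random power values in place of real measurements. The helper names `top_fraction` and `bti_schedule` are hypothetical.

```python
import random


def top_fraction(cells, fraction):
    """Keep the given fraction of cells with the highest power (at least one)."""
    k = max(1, round(len(cells) * fraction))
    return sorted(cells, key=lambda cell: cell[1], reverse=True)[:k]


def bti_schedule(cells, fractions=(0.10, 0.01, 0.001)):
    """Return one point cloud per time step, ordered from higher to lower density."""
    return [top_fraction(cells, f) for f in sorted(fractions, reverse=True)]


# Stand-in for a flattened radar tensor: (cell index, normalized power).
random.seed(0)
cells = [(i, random.random()) for i in range(10_000)]

clouds = bti_schedule(cells)
sizes = [len(cloud) for cloud in clouds]
print(sizes)  # [1000, 100, 10]

# Under this assumption every sparser cloud is a subset of the denser one,
# so later time steps revisit only the strongest returns.
index_sets = [{idx for idx, _ in cloud} for cloud in clouds]
print(index_sets[2] <= index_sets[1] <= index_sets[0])  # True
```

In a spiking network, each of these clouds would be fed in at one time step, so the membrane potentials accumulated from the denser input carry over while the sparser, less noisy input arrives.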