RadarMP: Motion Perception for 4D mmWave Radar in Autonomous Driving

Ruiqi Cheng, Huijun Di, Jian Li, Feng Liu, Wei Liang

TL;DR

RadarMP addresses the challenge of accurate 3D motion perception for autonomous driving using sparse, noisy 4D mmWave radar by jointly performing target detection and scene flow estimation on two consecutive radar tesseracts. The method introduces Doppler-aware encoding, cross-frame deformable attention for inter-frame correlation, and global motion pattern-aware self-attention, all trained with tailored self-supervised losses that leverage energy distribution and Doppler cues. Key contributions include a unified architecture that outputs consistent radar point clouds and pointwise 3D scene flow, a Doppler-encoded representation $\digamma_{dv}$, and three loss terms ($L_{se},L_{ef},L_{rfs}$) that supervise both segmentation and flow without explicit annotations. Experiments on the K-Radar dataset show substantial gains over decoupled radar pipelines and LiDAR-supervised baselines, demonstrating robust motion perception across weather and illumination conditions and enabling improved full-scenario autonomous driving perception using radar alone when optical sensors degrade.

Abstract

Accurate 3D scene motion perception significantly enhances the safety and reliability of an autonomous driving system. Benefiting from its all-weather operational capability and unique perceptual properties, 4D mmWave radar has emerged as an essential component in advanced autonomous driving. However, sparse and noisy radar points often lead to imprecise motion perception, leaving autonomous vehicles with limited sensing capabilities when optical sensors degrade under adverse weather conditions. In this paper, we propose RadarMP, a novel method for precise 3D scene motion perception using low-level radar echo signals from two consecutive frames. Unlike existing methods that separate radar target detection and motion estimation, RadarMP jointly models both tasks in a unified architecture, enabling consistent radar point cloud generation and pointwise 3D scene flow prediction. Tailored to radar characteristics, we design specialized self-supervised loss functions guided by Doppler shifts and echo intensity, effectively supervising spatial and motion consistency without explicit annotations. Extensive experiments on the public K-Radar dataset demonstrate that RadarMP achieves reliable motion perception across diverse weather and illumination conditions, outperforming radar-based decoupled motion perception pipelines and enhancing perception capabilities for full-scenario autonomous driving systems.

Paper Structure

This paper contains 50 sections, 9 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Motivation schematic. The red boxes mark a target's inter-frame motion in three modalities. The image and LiDAR are shown for visualization purposes only. In the radar heatmap, target motion aligns with the direction of energy propagation, revealing our key motivation.
  • Figure 2: Tesseract generation pipeline. The radar antenna array transmits multiple chirp signals per cycle. After the echoes are received, the signals are mixed and sampled by ADCs to obtain the raw radar data, which is transformed into a radar tesseract via a multi-dimensional FFT.
  • Figure 3: Pipeline Overview. RadarMP processes two consecutive radar tesseracts through Doppler encoding and correlation feature extraction, followed by global motion pattern perception to derive motion cues, and finally decodes them into segmentation masks and flow predictions.
  • Figure 4: Qualitative results. The left side shows the motion perception output of RadarMP alongside LiDAR point clouds filtered by RoI with ground-truth scene flow. Columns 1–4 on the right side display radar target detection results from three segmentation baseline methods and our RadarMP (RadarMP-P), while rows 1–3 correspond to flow prediction results from different scene flow baselines. A dynamic object in the scene is zoomed in at the bottom right to highlight the accuracy of non-rigid motion estimation. Colors indicate motion vectors in the XY plane only, and the RGB image is used for visualization only.
  • Figure A1: Qualitative results under different scenes. RadarMP achieves excellent performance across four distinct scenes. In particular, under heavy snow where both camera and LiDAR modalities experience severe degradation, our method continues to deliver reliable target detection and motion estimation, highlighting its potential to improve the safety of autonomous driving in adverse environments. The dynamic objects in the scene have been zoomed in at the bottom right to highlight the accuracy of non-rigid motion estimation. Colors indicate motion vectors in the XY plane only, scene conditions are shown at the bottom, and the RGB image is used for visualization only.
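
The tesseract generation step described in the Figure 2 caption (raw ADC samples turned into a 4D power volume by a multi-dimensional FFT) can be illustrated with a small sketch. This is not the authors' implementation: the axis order (fast-time samples, chirps, azimuth channels, elevation channels), the tiny array sizes, the idealized single-target signal, and the helper names `dft_axis`, `make_tesseract`, and `simulate_point_target` are all assumptions made for illustration. A naive DFT is used in place of an optimized FFT so that the example needs only the standard library; windowing and antenna calibration are omitted.

```python
import cmath
from itertools import product


def dft_axis(cube, shape, axis):
    """Naive DFT of `cube` (dict: index tuple -> complex) along one axis."""
    size = shape[axis]
    # Precompute the twiddle factors exp(-2*pi*j*k*n/size).
    twiddle = [[cmath.exp(-2j * cmath.pi * k * n / size) for n in range(size)]
               for k in range(size)]
    out = {}
    for idx in product(*(range(s) for s in shape)):
        k = idx[axis]
        total = 0j
        for n in range(size):
            src = idx[:axis] + (n,) + idx[axis + 1:]
            total += cube[src] * twiddle[k][n]
        out[idx] = total
    return out


def make_tesseract(adc, shape):
    """Apply the DFT along every axis, then return the power per 4D bin."""
    spectrum = adc
    for axis in range(len(shape)):
        spectrum = dft_axis(spectrum, shape, axis)
    return {idx: abs(value) ** 2 for idx, value in spectrum.items()}


def simulate_point_target(shape, bins):
    """Idealized noise-free echo of one target that falls exactly on `bins`."""
    cube = {}
    for idx in product(*(range(s) for s in shape)):
        phase = sum(b * i / s for b, i, s in zip(bins, idx, shape))
        cube[idx] = cmath.exp(2j * cmath.pi * phase)
    return cube


# Hypothetical sizes: samples per chirp, chirps, azimuth and elevation channels.
SHAPE = (8, 8, 4, 2)
# Hypothetical target location as (range, Doppler, azimuth, elevation) bins.
TARGET = (3, 5, 1, 0)

adc = simulate_point_target(SHAPE, TARGET)
tesseract = make_tesseract(adc, SHAPE)
peak = max(tesseract, key=tesseract.get)
print("strongest bin (range, Doppler, azimuth, elevation):", peak)
```

In this toy setting the energy concentrates in the single bin that matches the simulated target, which is the property a detector operating on the tesseract relies on. With real data the energy spreads across neighbouring bins and sits on top of noise and clutter.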