Towards Dense and Accurate Radar Perception Via Efficient Cross-Modal Diffusion Model

Ruibin Zhang, Donglai Xue, Yuhan Wang, Ruixu Geng, Fei Gao

TL;DR

This letter proposes a novel approach to constructing dense and accurate mmWave radar point clouds via cross-modal learning. It introduces diffusion models, which achieve state-of-the-art performance in generative modeling, to predict LiDAR-like point clouds from paired raw radar data.

Abstract

Millimeter-wave (mmWave) radars have attracted significant attention from both academia and industry because they can operate in extreme weather conditions. However, their point clouds are sparse and affected by noise interference, which hinders their use in autonomous navigation of micro aerial vehicles (MAVs). To this end, this paper proposes a novel approach to constructing dense and accurate mmWave radar point clouds via cross-modal learning. Specifically, we introduce diffusion models, which achieve state-of-the-art performance in generative modeling, to predict LiDAR-like point clouds from paired raw radar data. We also incorporate recent techniques for accelerating diffusion-model inference so that the proposed method can run on MAVs with limited computing resources. We validate the proposed method through extensive benchmark comparisons and real-world experiments, demonstrating its superior performance and generalization ability. Code and pretrained models will be available at https://github.com/ZJU-FAST-Lab/Radar-Diffusion.
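The Figure 2 caption below names consistency models [song2023consistency] as the acceleration technique. As a rough illustration of why they allow one-step generation, the sketch below uses the parameterization from that line of work, f(x, σ) = c_skip(σ)·x + c_out(σ)·F(c_in(σ)·x, σ), whose weights force f(x, σ_min) = x. The constants (σ_data = 0.5, σ_min = 0.002, σ_max = 80), the input scaling `c_in`, and all function names are assumptions borrowed from the consistency-model literature, not settings confirmed for Radar-Diffusion; `network` is a hypothetical stand-in for a trained denoiser that takes the RAH as its condition.

```python
import numpy as np

# Assumed constants from the consistency-model literature
# (not confirmed for Radar-Diffusion).
SIGMA_DATA = 0.5   # assumed std of the normalized target BEV images
SIGMA_MIN = 0.002  # smallest noise level (epsilon)
SIGMA_MAX = 80.0   # largest noise level (T)


def c_skip(sigma):
    # Equals 1 at sigma == SIGMA_MIN, so the input passes through unchanged.
    return SIGMA_DATA**2 / ((sigma - SIGMA_MIN) ** 2 + SIGMA_DATA**2)


def c_out(sigma):
    # Equals 0 at sigma == SIGMA_MIN, so the network output is ignored there.
    return SIGMA_DATA * (sigma - SIGMA_MIN) / np.sqrt(SIGMA_DATA**2 + sigma**2)


def c_in(sigma):
    # Scales the noisy input to roughly unit variance before the network.
    return 1.0 / np.sqrt(sigma**2 + SIGMA_DATA**2)


def consistency_fn(network, x_t, sigma, rah):
    """f(x_t, sigma) = c_skip * x_t + c_out * F(c_in * x_t, sigma, rah)."""
    return c_skip(sigma) * x_t + c_out(sigma) * network(c_in(sigma) * x_t, sigma, rah)


def one_step_sample(network, rah, out_shape, rng):
    """Draw x_T ~ N(0, SIGMA_MAX^2 I) and map it to an x_0 estimate in one call."""
    x_T = SIGMA_MAX * rng.standard_normal(out_shape)
    return consistency_fn(network, x_T, SIGMA_MAX, rah)
```

A multi-step diffusion sampler would call `network` once per noise level; `one_step_sample` calls it once, which is the property that matters when inference must fit on a compute-limited MAV.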

Paper Structure

This paper contains 23 sections, 10 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: A reconstruction quality comparison carried out at the Huzhou Institute of Zhejiang University. Whereas LiDAR produces dense and accurate mapping results, a single-chip mmWave radar with traditional radar target detectors generates only sparse and noisy point clouds. With the proposed diffusion model-based method, we generate radar point clouds that are close to the ground truth with one-step generation. Moreover, the model used here was trained on a public dataset with completely different scenes and sensor configurations, demonstrating the generalization ability and robustness of the method.
  • Figure 2: (a) Image restoration tasks, in which low-resolution and noise-corrupted images are restored to ground-truth images using neural networks. (b) Spatially and temporally aligned radar range-azimuth heatmap (RAH) and LiDAR bird's-eye-view (BEV) image, both expressed in polar coordinates. The angular resolution of the mmWave radar is considerably lower than that of the LiDAR, and the radar data are heavily affected by noise. Predicting LiDAR BEV images from paired RAHs can therefore be modeled as image restoration. (c) Architecture of the proposed approach. During training, ground-truth LiDAR point clouds $\mathbf{x}_0$ are corrupted to $\mathbf{x}_t$ through the diffusion process, and the neural network is trained to estimate the ground truth conditioned on the paired radar RAHs. During inference, the neural network directly predicts $\hat{\mathbf{x}}_0$ from pure Gaussian noise $\mathbf{x}_T$ and the radar RAHs. To avoid the iterative sampling of standard diffusion models, we incorporate consistency models [song2023consistency], which enable one-step generation from $\mathbf{x}_T$ to $\hat{\mathbf{x}}_0$.
  • Figure 3: Illustration of the mmWave radar signal model and data preprocessing. In the radar data cube, the elevation dimension is omitted for ease of visualization.
  • Figure 4: Our customized hand-held sensor platform. An embedded computer inside the platform runs the sensor drivers and records the data.
  • Figure 5: Qualitative comparisons of single-frame 2D point clouds on the ColoRadar dataset. Typical examples of the radar RAH input, the LiDAR ground truth, and the generated radar point clouds are shown for the Aspen (indoor), Edgar (mine), Outdoor (outdoor), and Longboard (outdoor) scenes. The results of each method are visualized in different colors. Note that for RPDNet and OS-CFAR, the radar input also includes Doppler information; for ease of visualization, only the RAHs are shown.
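Figure 2(b) treats both sensors as images on the same polar grid. A minimal sketch of that representation, assuming a 2D LiDAR scan given as (x, y) points in the radar frame with +x at boresight, rasterizes the points into a binary range-azimuth image and converts a (predicted) image back into points at bin centers. The function names, grid sizes, and field of view are hypothetical; the paper's actual resolution and encoding are not stated in this summary.

```python
import numpy as np


def lidar_to_polar_bev(points_xy, max_range, n_range_bins, n_azimuth_bins, fov_deg):
    """Rasterize 2D points into a binary image: rows = range bins, cols = azimuth bins."""
    x, y = points_xy[:, 0], points_xy[:, 1]
    rng_m = np.hypot(x, y)
    az = np.degrees(np.arctan2(y, x))  # 0 deg = boresight (+x axis)
    half_fov = fov_deg / 2.0
    # Drop points beyond max_range or outside the radar's field of view.
    keep = (rng_m < max_range) & (np.abs(az) < half_fov)
    r_idx = (rng_m[keep] / max_range * n_range_bins).astype(int)
    a_idx = ((az[keep] + half_fov) / fov_deg * n_azimuth_bins).astype(int)
    image = np.zeros((n_range_bins, n_azimuth_bins), dtype=np.float32)
    image[r_idx, a_idx] = 1.0
    return image


def polar_bev_to_points(image, max_range, fov_deg, threshold=0.5):
    """Turn occupied polar cells back into Cartesian points at bin centers."""
    n_r, n_a = image.shape
    r_idx, a_idx = np.nonzero(image > threshold)
    r = (r_idx + 0.5) * max_range / n_r
    az = np.radians((a_idx + 0.5) * fov_deg / n_a - fov_deg / 2.0)
    return np.stack([r * np.cos(az), r * np.sin(az)], axis=1)
```

Recovered points are quantized to bin centers, so the round trip is only accurate to about one range bin and one azimuth bin; a network predicting this image inherits the same quantization.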
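The training step in Figure 2(c) can be sketched as follows, under stated assumptions: the forward process is the variance-exploding corruption x_t = x_0 + σz used by consistency models, the noise levels follow the Karras et al. (2022) schedule with ρ = 7, and the RAH conditions the network by channel-wise concatenation. None of these choices, nor the hypothetical names `karras_sigmas` and `make_training_input`, is confirmed by the paper summary.

```python
import numpy as np


def karras_sigmas(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """n noise levels from sigma_min to sigma_max, uniform in sigma**(1/rho)."""
    ramp = np.linspace(0.0, 1.0, n)
    lo, hi = sigma_min ** (1.0 / rho), sigma_max ** (1.0 / rho)
    return (lo + ramp * (hi - lo)) ** rho


def make_training_input(x0_bev, rah, sigma, rng):
    """Corrupt a ground-truth BEV image to x_t and stack it with its paired RAH.

    Returns the network input (2 x H x W) and the regression target x_0.
    """
    z = rng.standard_normal(x0_bev.shape)
    x_t = x0_bev + sigma * z          # forward (noising) process
    net_input = np.stack([x_t, rah])  # assumed conditioning: channel concat
    return net_input, x0_bev


rng = np.random.default_rng(0)
sigmas = karras_sigmas(18)
x0 = (rng.random((64, 64)) > 0.95).astype(float)  # toy sparse BEV image
rah = rng.random((64, 64))                         # toy radar heatmap
sigma = sigmas[rng.integers(len(sigmas))]          # random noise level per sample
net_input, target = make_training_input(x0, rah, sigma, rng)
```

Only the noisy channel changes with σ; the RAH channel is always clean, which is what lets the network recover x_0 even from the largest noise level at inference time.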
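Figure 3 refers to the processing chain that turns the raw radar data cube into an RAH. The sketch below shows one common FMCW version of it, assuming complex ADC samples arranged as (samples per chirp, chirps, virtual receive antennas) with half-wavelength antenna spacing: a windowed range FFT over fast time, a zero-padded angle FFT across antennas, and non-coherent integration of power over chirps. The function name, Hann window, bin count, and dB scaling are illustrative choices, not the paper's documented preprocessing.

```python
import numpy as np


def range_azimuth_heatmap(cube, n_angle_bins=64):
    """RAH in dB from a complex cube shaped (n_samples, n_chirps, n_rx)."""
    # Range FFT along fast time; the Hann window suppresses range sidelobes.
    win = np.hanning(cube.shape[0])[:, None, None]
    range_fft = np.fft.fft(cube * win, axis=0)
    # Angle FFT across antennas, zero-padded to n_angle_bins and shifted so
    # that boresight (0 deg) lands in the middle column.
    angle_fft = np.fft.fftshift(np.fft.fft(range_fft, n=n_angle_bins, axis=2), axes=2)
    # Non-coherent integration: sum power over chirps (slow time).
    power = np.sum(np.abs(angle_fft) ** 2, axis=1)
    return 10.0 * np.log10(power + 1e-12)  # shape (n_samples, n_angle_bins)


# Synthetic single target at range bin 10 and boresight (0 deg).
n_samples, n_chirps, n_rx = 128, 32, 8
n = np.arange(n_samples)[:, None, None]
cube = np.exp(2j * np.pi * 10 * n / n_samples) * np.ones((1, n_chirps, n_rx))
heatmap = range_azimuth_heatmap(cube)
peak = np.unravel_index(np.argmax(heatmap), heatmap.shape)
```

With a real FMCW front end, range bin k corresponds to roughly k·c/(2B), where B is the bandwidth swept while sampling; with only a handful of antennas, each target spreads over many angle bins, which is the low angular resolution visible in Figure 2(b).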
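Figures 1 and 5 compare against traditional detectors such as OS-CFAR. For reference, a textbook one-dimensional ordered-statistic CFAR is sketched below: for each cell under test, it takes the k-th smallest value of the surrounding training cells (guard cells excluded) as the noise estimate and declares a detection when the cell exceeds a scale factor times that estimate. The window sizes, k, and scale are illustrative defaults; the baseline in the paper also uses Doppler information, and its configuration is not given here.

```python
import numpy as np


def os_cfar_1d(power, n_train=8, n_guard=2, k=None, scale=3.0):
    """Ordered-statistic CFAR over a 1D power profile (linear scale)."""
    n = len(power)
    if k is None:
        k = int(0.75 * 2 * n_train)  # common rule of thumb: ~3/4 of the window
    detections = np.zeros(n, dtype=bool)
    half = n_train + n_guard
    # Cells too close to either edge lack a full window and are skipped.
    for i in range(half, n - half):
        left = power[i - half : i - n_guard]            # n_train cells
        right = power[i + n_guard + 1 : i + half + 1]   # n_train cells
        ref = np.sort(np.concatenate([left, right]))
        noise = ref[k - 1]  # k-th order statistic (1-indexed)
        detections[i] = power[i] > scale * noise
    return detections
```

Because the noise estimate is an order statistic rather than a mean, a single strong neighbor in the training window barely raises the threshold, which is why OS-CFAR is often chosen for multi-target scenes.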