Enhancing mmWave Radar Point Cloud via Visual-inertial Supervision

Cong Fan; Shengkai Zhang; Kezhong Liu; Shuai Wang; Zheng Yang; Wei Wang

TL;DR

mmEMP addresses the sparsity of mmWave radar point clouds by using visual-inertial (VI) supervision from low-cost cameras and IMUs, rather than LiDAR, to densify radar data. It introduces a dynamic VI 3D reconstruction that recovers the positions of moving features, and a VI-informed refinement pipeline that removes spurious multipath points while densifying the radar point cloud. The method combines a non-linear least-squares formulation for moving features, GAN-based densification of range-Doppler inputs, and a rigid-transformation learning and space-time stability mechanism to align adjacent frames. On a large real-world dataset of radar range-Doppler matrices (RDMs), images, and IMU data, mmEMP performs competitively with the LiDAR-supervised state of the art on point-density and geometry metrics, and improves object detection, localization, and mapping. Because supervision needs only a camera and an IMU, training data can be crowdsourced from commercial vehicles, potentially lowering the cost barrier for robust radar-based perception in adverse weather conditions.
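To make the idea of aligning adjacent frames with a rigid transformation concrete, here is a minimal sketch of the classical closed-form Kabsch (SVD) solution for matched 3D points. This is not mmEMP's learned mechanism; it is a standard substitute shown only for illustration. The function name `estimate_rigid_transform`, the synthetic frames, and the assumption of known point correspondences are all hypothetical.

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Least-squares rotation R and translation t such that dst ~= src @ R.T + t.

    Classical Kabsch algorithm; assumes row i of src corresponds to row i of dst.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_centroid = src.mean(axis=0)
    dst_centroid = dst.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets
    H = (src - src_centroid).T @ (dst - dst_centroid)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_centroid - R @ src_centroid
    return R, t

# Synthetic example (assumed data): frame_b is frame_a rotated by 10 degrees
# about the z-axis and shifted by a small translation.
rng = np.random.default_rng(0)
frame_a = rng.uniform(-5.0, 5.0, size=(50, 3))
angle = np.deg2rad(10.0)
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle),  np.cos(angle), 0.0],
    [0.0,            0.0,           1.0],
])
t_true = np.array([0.5, -0.2, 0.1])
frame_b = frame_a @ R_true.T + t_true

R_est, t_est = estimate_rigid_transform(frame_a, frame_b)
aligned = frame_a @ R_est.T + t_est
```

With noise-free correspondences the estimate matches the true motion up to floating-point error. Real radar frames have no given correspondences and contain spurious points, which is one reason a learned, stability-aware approach may be preferable there.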

Abstract

Complementary to prevalent LiDAR and camera systems, millimeter-wave (mmWave) radar is robust to adverse weather conditions such as fog, rainstorms, and blizzards, but produces sparse point clouds. Current techniques enhance the point cloud under the supervision of LiDAR data. However, high-performance LiDAR is expensive and not commonly available on vehicles. This paper presents mmEMP, a supervised learning approach that enhances radar point clouds using a low-cost camera and an inertial measurement unit (IMU), enabling training data to be crowdsourced from commercial vehicles. Introducing visual-inertial (VI) supervision is challenging because VI reconstruction is spatially agnostic to dynamic objects, i.e., it does not recover their 3D positions correctly. Moreover, spurious radar points caused by RF multipath lead robots to misinterpret the scene. mmEMP first devises a dynamic 3D reconstruction algorithm that restores the 3D positions of dynamic features. Then, we design a neural network that densifies radar data and eliminates spurious radar points. We build a new dataset in the real world. Extensive experiments show that mmEMP achieves competitive performance compared with the state-of-the-art (SOTA) approach trained with LiDAR data. In addition, we use the enhanced point cloud to perform object detection, localization, and mapping to demonstrate mmEMP's effectiveness.
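The radar data in this work enters the model as range-Doppler matrices (RDMs). As background, the sketch below shows how an RDM is commonly formed from one frame of FMCW radar samples using a 2D FFT. It illustrates generic radar processing under stated assumptions and is not taken from mmEMP's pipeline: the function name `range_doppler_matrix`, the frame dimensions, and the single synthetic target are hypothetical.

```python
import numpy as np

def range_doppler_matrix(adc):
    """Magnitude RDM from one frame of complex beat-signal samples.

    adc has shape (num_chirps, num_samples): slow time along axis 0,
    fast time along axis 1.
    """
    num_chirps, num_samples = adc.shape
    # Separable Hann window to reduce spectral leakage in both dimensions
    window = np.hanning(num_chirps)[:, None] * np.hanning(num_samples)[None, :]
    range_fft = np.fft.fft(adc * window, axis=1)   # fast time -> range bins
    doppler_fft = np.fft.fft(range_fft, axis=0)    # slow time -> Doppler bins
    # Center zero Doppler along the Doppler axis
    return np.abs(np.fft.fftshift(doppler_fft, axes=0))

# Synthetic single target (assumed values): range bin 20, Doppler bin -3.
num_chirps, num_samples = 64, 128
range_bin, doppler_bin = 20, -3
m = np.arange(num_chirps)[:, None]    # chirp index (slow time)
n = np.arange(num_samples)[None, :]   # sample index (fast time)
adc = np.exp(2j * np.pi * (range_bin * n / num_samples
                           + doppler_bin * m / num_chirps))

rdm = range_doppler_matrix(adc)
peak_doppler, peak_range = np.unravel_index(np.argmax(rdm), rdm.shape)
```

After the shift, zero Doppler sits at row `num_chirps // 2`, so the peak row is offset from the center by the target's Doppler bin. A conventional radar pipeline would then detect peaks in this matrix to produce points, and the sparsity of those detections is the limitation the paper sets out to address.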

Paper Structure (9 sections, 11 equations, 10 figures, 1 table, 1 algorithm)

Introduction
Related Work
System Design of mmEMP
Dynamic Visual-Inertial 3D Reconstruction
Point Cloud Generation and Refinement
System Implementation and Evaluation
Implementation and Experimental Setup
Performance Evaluation
Conclusion

Figures (10)

Figure 1: mmEMP takes images, inertial measurements, and range-Doppler matrices (RDMs) to train a model for enhancing radar point clouds. In the test, the vehicle uses the enhanced point clouds to improve various applications, e.g., object detection, localization, and mapping.
Figure 2: A preliminary study shows that dynamic visual features with wrong 3D positions significantly degrade the performance of point cloud generation.
Figure 3: The geometry of a dynamic feature between two camera frames.
Figure 4: Dynamic features on a rigid object share the same translation.
Figure 5: An overview of our data processing pipeline. The measurements from the visual-inertial sensor suite are used only in training (modules in the yellow box), so mmEMP still works in adverse weather conditions.
...and 5 more figures
