RadarMOSEVE: A Spatial-Temporal Transformer Network for Radar-Only Moving Object Segmentation and Ego-Velocity Estimation
Changsong Pang, Xieyuanli Chen, Yimin Liu, Huimin Lu, Yuwei Cheng
TL;DR
This work targets moving object segmentation (MOS) and ego-velocity estimation (EVE) using radar data, addressing the limitations of LiDAR and cameras under adverse conditions. It proposes RadarMOSEVE, a spatial-temporal transformer that employs radar-specific self-attention (object and scenario) and cross-attention to exploit sparse radar points and Doppler velocity from two-frame inputs $\boldsymbol{P}_t$ and $\boldsymbol{P}_{t-a}$. A Doppler-based loss guides the ego-velocity estimate $\hat{v}$, and velocity compensation calibrates the radial velocities $v_i$ for MOS. The result is a dual-task framework in which an EVE module feeds a MOS module. It achieves state-of-the-art performance on the VoD dataset and a new radar dataset, and ablation studies support the importance of velocity-aware calibration and the proposed attention mechanisms. This work advances radar-based MOSEVE research, enabling robust ego-velocity estimation and moving-object segmentation in challenging sensing conditions and paving the way for radar-centric autonomous systems.
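To make the idea of velocity compensation concrete, the sketch below removes the ego-motion component from each point's measured radial (Doppler) velocity and flags points whose residual is large. It is a simplified geometric illustration, not the RadarMOSEVE network. The function names, the sign convention (a static point along unit direction $d$ measures $v_r = -d \cdot v_{ego}$), the sensor-frame coordinates, and the 0.5 m/s threshold are all assumptions made for this example.

```python
import math

def compensate_radial_velocities(points, radial_velocities, ego_velocity):
    """Return radial velocities with the ego-motion component removed.

    Assumed convention (not taken from the paper): points are (x, y, z) in the
    sensor frame, none at the origin, and a static point along unit direction d
    measures v_r = -dot(d, ego_velocity).
    """
    compensated = []
    for (x, y, z), v_r in zip(points, radial_velocities):
        rng = math.sqrt(x * x + y * y + z * z)
        # Unit line-of-sight direction from the sensor to the point.
        dx, dy, dz = x / rng, y / rng, z / rng
        # Projection of the ego velocity onto the line of sight.
        ego_along_los = (dx * ego_velocity[0]
                         + dy * ego_velocity[1]
                         + dz * ego_velocity[2])
        # Adding the projection cancels the ego-induced Doppler of a static point.
        compensated.append(v_r + ego_along_los)
    return compensated

def flag_moving(points, radial_velocities, ego_velocity, threshold=0.5):
    """Flag points whose compensated radial speed exceeds a hypothetical threshold (m/s)."""
    residuals = compensate_radial_velocities(points, radial_velocities, ego_velocity)
    return [abs(v) > threshold for v in residuals]

# Toy scene: the sensor moves forward at 5 m/s along x.
points = [(10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (10.0, 10.0, 0.0)]
ego_velocity = (5.0, 0.0, 0.0)
# The first two points are static; the third has an extra 2 m/s of radial motion.
measured = [-5.0, 0.0, -5.0 / math.sqrt(2.0) + 2.0]

compensated = compensate_radial_velocities(points, measured, ego_velocity)
moving = flag_moving(points, measured, ego_velocity)
```

In this toy scene the two static points compensate to roughly zero, while the third keeps a residual of about 2 m/s and is flagged as moving. A fixed threshold on the residual is only a stand-in here; the summary above describes a learned MOS module that consumes the calibrated velocities.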
Abstract
Moving object segmentation (MOS) and ego-velocity estimation (EVE) are vital capabilities for mobile systems to achieve full autonomy. Several approaches have attempted to achieve MOSEVE using a LiDAR sensor. However, LiDAR sensors are typically expensive and susceptible to adverse weather conditions. Millimeter-wave radar (MWR), by contrast, has gained popularity in real-world robotics and autonomous driving applications due to its cost-effectiveness and resilience to bad weather. Nonetheless, publicly available MOSEVE datasets and approaches using radar data are limited. Some existing methods adopt point convolutional networks from LiDAR-based approaches, ignoring the specific artifacts and the valuable radial velocity information of radar measurements, which leads to suboptimal performance. In this paper, we propose a novel transformer network that effectively addresses the sparsity and noise issues and leverages the radial velocity measurements of radar points using our devised radar self- and cross-attention mechanisms. Based on that, our method achieves accurate EVE of the robot and simultaneously performs MOS using only radar data. To thoroughly evaluate the MOSEVE performance of our method, we annotated the radar points in the public View-of-Delft (VoD) dataset and additionally constructed a new radar dataset in various environments. The experimental results demonstrate the superiority of our approach over existing state-of-the-art methods. The code is available at https://github.com/ORCA-Uboat/RadarMOSEVE.
