RadarMOSEVE: A Spatial-Temporal Transformer Network for Radar-Only Moving Object Segmentation and Ego-Velocity Estimation

Changsong Pang; Xieyuanli Chen; Yimin Liu; Huimin Lu; Yuwei Cheng

TL;DR

This work targets moving object segmentation (MOS) and ego-velocity estimation (EVE) from radar data alone, addressing the limitations of LiDAR and cameras under adverse conditions. It proposes RadarMOSEVE, a spatial-temporal transformer that uses radar-specific self-attention (object and scenario attention) and cross-attention to exploit sparse radar points and their Doppler velocities from the two input frames $\boldsymbol{P}_t$ and $\boldsymbol{P}_{t-a}$. A Doppler-based loss guides the ego-velocity estimate $\hat{v}$, and velocity compensation calibrates the radial velocities $v_i$ for MOS. The result is a dual-task framework in which an EVE module feeds a MOS module. It achieves state-of-the-art performance on the VoD dataset and on a new radar dataset, and ablation studies support the importance of velocity-aware calibration and of the proposed attention mechanisms. The work advances radar-based MOSEVE research toward robust ego-velocity estimation and moving-object segmentation in challenging sensing conditions.
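The velocity-compensation idea mentioned above can be illustrated with a small 2D sketch. It is not the paper's implementation: the function names, the 2D simplification, and the sign convention (a static point is measured with Doppler velocity $v_i = -\boldsymbol{d}_i \cdot \hat{v}$, where $\boldsymbol{d}_i$ is the unit direction from the sensor to the point) are assumptions made for illustration. Under that convention, adding the projected ego velocity back leaves a residual near zero for static points and a nonzero residual for moving ones.

```python
import math


def unit_direction(x, y):
    """Unit vector from the sensor origin towards a radar point.

    Assumes the point is not at the origin (range > 0).
    """
    r = math.hypot(x, y)
    return (x / r, y / r)


def compensate_radial_velocity(point, v_radial, ego_velocity):
    """Remove the ego-motion component from a measured Doppler velocity.

    Assumed convention (not taken from the paper): for a static point,
    v_radial = -(d . v_ego), so the compensated value is zero.
    """
    dx, dy = unit_direction(*point)
    vx, vy = ego_velocity
    return v_radial + (dx * vx + dy * vy)


# Hypothetical numbers: the sensor moves forward along x at 5 m/s.
ego = (5.0, 0.0)

# A static point straight ahead appears to approach at 5 m/s.
static_point, static_doppler = (10.0, 0.0), -5.0

# A point to the side that recedes at 2 m/s along its own line of sight.
moving_point, moving_doppler = (0.0, 8.0), 2.0

static_residual = compensate_radial_velocity(static_point, static_doppler, ego)
moving_residual = compensate_radial_velocity(moving_point, moving_doppler, ego)
print(static_residual, moving_residual)
```

In this toy setting the compensated velocity separates the two points directly. With real radar data the measurements are sparse and noisy, which is the motivation given for the learned attention mechanisms.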

Abstract

Moving object segmentation (MOS) and ego-velocity estimation (EVE) are vital capabilities for mobile systems to achieve full autonomy. Several approaches have attempted to achieve MOS and EVE (MOSEVE) using a LiDAR sensor. However, LiDAR sensors are typically expensive and susceptible to adverse weather conditions. Instead, millimeter-wave radar (MWR) has gained popularity in robotics and autonomous driving for real applications due to its cost-effectiveness and resilience to bad weather. Nonetheless, publicly available MOSEVE datasets and approaches using radar data are limited. Some existing methods adopt point convolutional networks from LiDAR-based approaches, ignoring the specific artifacts and the valuable radial velocity information of radar measurements, which leads to suboptimal performance. In this paper, we propose a novel transformer network that effectively addresses the sparsity and noise issues and leverages the radial velocity measurements of radar points using our devised radar self- and cross-attention mechanisms. Based on that, our method simultaneously achieves accurate EVE of the robot and performs MOS using only radar data. To thoroughly evaluate the MOSEVE performance of our method, we annotated the radar points in the public View-of-Delft (VoD) dataset and additionally constructed a new radar dataset in various environments. The experimental results demonstrate the superiority of our approach over existing state-of-the-art methods. The code is available at https://github.com/ORCA-Uboat/RadarMOSEVE.

Paper Structure (19 sections, 7 equations, 4 figures, 4 tables)

Introduction
Related Work
Moving Object Segmentation
Ego-velocity Estimation
Radar Transformer
Radar Self-Attention
Radar Cross-Attention
Radar MOSEVE Network
Overview
Ego-velocity Estimation
Moving Object Segmentation
Network Training
Experimental Results
Experimental Setup
Evaluation on MOS
...and 4 more sections

Figures (4)

Figure 1: Our MOSEVE network takes two frames of radar point clouds as input and outputs the current ego velocity of the robot. The MOS module takes the velocity-calibrated point clouds to provide the moving segmentation.
Figure 2: The orange point is the source point, the green points are those sampled for the source point, and the blue points are the remaining points. (a) shows the sampling result when $k$ is small, (b) the sampling result when $k$ is large, (c) the sampling strategy of Object Attention, and (d) the sampling strategy of Scenario Attention.
Figure 3: The MOSEVE network, comprising the ego-velocity estimation (EVE) module and the moving object segmentation (MOS) module.
Figure 4: Qualitative results of PT, 4DMOS and ours. Red points are the true moving points, gray points are the true static points, green points are the false moving points, and blue points are the false static points. Best viewed in color.
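
The color coding in Figure 4 corresponds to the four outcomes of a binary moving/static prediction. The sketch below maps per-point predictions and labels to those colors and computes the intersection-over-union (IoU) of the moving class. IoU is a commonly used segmentation metric; this excerpt does not state which metric the paper reports, so treat the metric choice, the function names, and the toy data as assumptions.

```python
# Color legend taken from the Figure 4 caption.
COLORS = {
    "true_moving": "red",
    "true_static": "gray",
    "false_moving": "green",
    "false_static": "blue",
}


def categorize(pred_moving, true_moving):
    """Return the outcome category for one point."""
    if pred_moving and true_moving:
        return "true_moving"
    if not pred_moving and not true_moving:
        return "true_static"
    if pred_moving:
        return "false_moving"  # predicted moving, actually static
    return "false_static"      # predicted static, actually moving


def moving_iou(preds, labels):
    """IoU of the moving class: TP / (TP + FP + FN)."""
    counts = {name: 0 for name in COLORS}
    for p, t in zip(preds, labels):
        counts[categorize(p, t)] += 1
    denom = (counts["true_moving"] + counts["false_moving"]
             + counts["false_static"])
    return counts["true_moving"] / denom if denom else 0.0


# Hypothetical predictions and ground-truth labels (True = moving).
preds = [True, True, False, False, True]
labels = [True, False, True, False, True]

point_colors = [COLORS[categorize(p, t)] for p, t in zip(preds, labels)]
iou = moving_iou(preds, labels)
print(point_colors, iou)
```

True static points do not enter the moving-class IoU, so a scene dominated by static background cannot inflate the score.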
