Self-Supervised Moving Object Segmentation of Sparse and Noisy Radar Point Clouds
Leon Schwarzer, Matthias Zeller, Daniel Casado Herraez, Simon Dierl, Michael Heidingsfeld, Cyrill Stachniss
TL;DR
The paper tackles single-scan moving object segmentation (MOS) in sparse, noisy radar point clouds by proposing a self-supervised pretraining strategy built on a clustering-based motion-aware contrastive loss (MACL) within a student–teacher framework over a Radar Instance Transformer backbone. MACL uses clustering (via HDBSCAN) and dynamic points removal (DPR) to form motion-aware representations, and aligns clusters across the student and teacher to shape a discriminative representation space. Pretraining is followed by supervised fine-tuning with limited annotations. Empirical results on View-of-Delft and RadarScenes show improved label efficiency and competitive or superior MOS performance compared to fully supervised baselines and prior self-supervised methods, with the largest gains at low annotation budgets. Overall, the approach reduces annotation requirements while making radar-based perception more robust in dynamic driving scenarios.
Abstract
Moving object segmentation is a crucial task for safe and reliable autonomous mobile systems like self-driving cars, improving the reliability and robustness of subsequent tasks like SLAM or path planning. While the segmentation of camera or LiDAR data is widely researched and achieves great results, it often introduces an increased latency by requiring the accumulation of temporal sequences to gain the necessary temporal context. Radar sensors overcome this problem with their ability to provide a direct measurement of a point's Doppler velocity, which can be exploited for single-scan moving object segmentation. However, radar point clouds are often sparse and noisy, making data annotation for use in supervised learning very tedious, time-consuming, and cost-intensive. To overcome this problem, we address the task of self-supervised moving object segmentation of sparse and noisy radar point clouds. We follow a two-step approach of contrastive self-supervised representation learning with subsequent supervised fine-tuning using limited amounts of annotated data. We propose a novel clustering-based contrastive loss function with cluster refinement based on dynamic points removal to pretrain the network to produce motion-aware representations of the radar data. Our method improves label efficiency after fine-tuning, effectively boosting state-of-the-art performance by self-supervised pretraining.
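To make the idea of a clustering-based contrastive loss more concrete, the following is a minimal sketch and not the paper's implementation. It assumes that cluster labels are already available (for example from HDBSCAN, with -1 marking noise), that the student and teacher networks each produce one feature vector per radar point, and that the loss is an InfoNCE-style objective over cluster-mean features. The function names, the cosine similarity, and the `temperature` parameter are hypothetical choices made for illustration; the cluster refinement via dynamic points removal is omitted.

```python
import math
from collections import defaultdict


def cluster_means(features, labels):
    """Average the per-point features of each cluster; label -1 (noise) is skipped."""
    groups = defaultdict(list)
    for feat, label in zip(features, labels):
        if label >= 0:
            groups[label].append(feat)
    return {
        label: [sum(col) / len(feats) for col in zip(*feats)]
        for label, feats in groups.items()
    }


def cosine(a, b):
    """Cosine similarity between two vectors (small epsilon avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def cluster_contrastive_loss(student_feats, teacher_feats, labels, temperature=0.1):
    """InfoNCE-style loss (an assumed form): each student cluster should match
    the teacher's embedding of the same cluster and differ from all others."""
    student = cluster_means(student_feats, labels)
    teacher = cluster_means(teacher_feats, labels)
    ids = sorted(student)
    if not ids:
        return 0.0  # no valid clusters in this scan
    total = 0.0
    for pos, i in enumerate(ids):
        logits = [cosine(student[i], teacher[j]) / temperature for j in ids]
        peak = max(logits)  # subtract the maximum for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_denominator - logits[pos]
    return total / len(ids)


# Toy scan: four points, two clusters, 2-D features.
labels = [0, 0, 1, 1]
student = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
teacher_aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
teacher_swapped = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]

loss_aligned = cluster_contrastive_loss(student, teacher_aligned, labels)
loss_swapped = cluster_contrastive_loss(student, teacher_swapped, labels)
```

In this toy example the loss is close to zero when the student and teacher agree on the cluster embeddings and large when they are swapped, which is the behavior a contrastive pretraining objective relies on. Averaging features per cluster before comparing them is one simple way to make the objective less sensitive to the noise of individual radar detections.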
