Self-Supervised Moving Object Segmentation of Sparse and Noisy Radar Point Clouds

Leon Schwarzer, Matthias Zeller, Daniel Casado Herraez, Simon Dierl, Michael Heidingsfeld, Cyrill Stachniss

TL;DR

The paper tackles single-scan moving object segmentation (MOS) in sparse, noisy radar point clouds by proposing a self-supervised pretraining strategy built on a clustering-based motion-aware contrastive loss (MACL) within a student–teacher framework over a Radar Instance Transformer backbone. MACL uses clustering (via HDBSCAN) and dynamic points removal (DPR) to form motion-aware representations and aligns clusters across the student and teacher to shape a discriminative representation space. Pretraining is followed by supervised fine-tuning with limited annotations. Empirical results on View-of-Delft and RadarScenes demonstrate improved label efficiency and competitive or superior MOS performance compared to fully supervised baselines and prior self-supervised methods, with notable gains at low annotation budgets. Overall, the approach reduces annotation requirements while enhancing radar-based perception robustness in dynamic driving scenarios.

Abstract

Moving object segmentation is a crucial task for safe and reliable autonomous mobile systems like self-driving cars, improving the reliability and robustness of subsequent tasks like SLAM or path planning. While the segmentation of camera or LiDAR data is widely researched and achieves great results, it often introduces an increased latency by requiring the accumulation of temporal sequences to gain the necessary temporal context. Radar sensors overcome this problem with their ability to provide a direct measurement of a point's Doppler velocity, which can be exploited for single-scan moving object segmentation. However, radar point clouds are often sparse and noisy, making data annotation for use in supervised learning very tedious, time-consuming, and cost-intensive. To overcome this problem, we address the task of self-supervised moving object segmentation of sparse and noisy radar point clouds. We follow a two-step approach of contrastive self-supervised representation learning with subsequent supervised fine-tuning using limited amounts of annotated data. We propose a novel clustering-based contrastive loss function with cluster refinement based on dynamic points removal to pretrain the network to produce motion-aware representations of the radar data. Our method improves label efficiency after fine-tuning, effectively boosting state-of-the-art performance by self-supervised pretraining.
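To illustrate how Doppler velocity allows single-scan motion reasoning, the following is a minimal sketch of a dynamic-points-removal style check, not the authors' implementation. It assumes a 2D sensor frame, a sign convention in which a stationary point at azimuth a has radial velocity v_r = -(vx*cos(a) + vy*sin(a)) for sensor velocity (vx, vy), and a simple iterative least-squares re-fit in place of a more robust estimator such as RANSAC. The function names, the 1.0 m/s threshold, and the iteration limit are hypothetical choices for this example.

```python
import math


def fit_sensor_velocity(azimuths, dopplers):
    """Least-squares fit of (vx, vy) to v_r = -(vx*cos(a) + vy*sin(a))."""
    sxx = sxy = syy = bx = by = 0.0
    for a, d in zip(azimuths, dopplers):
        c, s = math.cos(a), math.sin(a)
        sxx += c * c
        sxy += c * s
        syy += s * s
        bx -= d * c
        by -= d * s
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("degenerate geometry: azimuths span one direction only")
    # Solve the 2x2 normal equations with Cramer's rule.
    vx = (syy * bx - sxy * by) / det
    vy = (sxx * by - sxy * bx) / det
    return vx, vy


def dynamic_points_mask(azimuths, dopplers, threshold=1.0, max_iters=10):
    """Flag points whose Doppler value disagrees with the static-world fit.

    Returns (mask, (vx, vy)); mask[i] is True if point i looks moving.
    The threshold (m/s) is an assumed value for illustration.
    """
    inliers = list(range(len(azimuths)))
    for _ in range(max_iters):
        vx, vy = fit_sensor_velocity(
            [azimuths[i] for i in inliers], [dopplers[i] for i in inliers]
        )
        # Residual of every point against the current static-world model.
        residuals = [
            abs(d + vx * math.cos(a) + vy * math.sin(a))
            for a, d in zip(azimuths, dopplers)
        ]
        new_inliers = [i for i, r in enumerate(residuals) if r <= threshold]
        if new_inliers == inliers or len(new_inliers) < 2:
            break
        inliers = new_inliers
    return [r > threshold for r in residuals], (vx, vy)
```

In this sketch, points that survive as inliers are treated as static and define the sensor velocity, while the remaining points form a coarse motion mask. A plain least-squares re-fit can fail when moving points dominate the scan, which is one reason a robust estimator is usually preferred in practice.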

Paper Structure

This paper contains 12 sections, 3 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Our method improves the moving object segmentation performance of the underlying radar instance transformer architecture [zeller2024tor] by utilizing unannotated data to pretrain the network. The reference image shows that our approach correctly predicts the two cars in front as moving, in contrast to the base architecture trained from scratch. Best viewed in color.
  • Figure 2: Our self-supervised framework uses a student-teacher architecture employing a modified version of the radar instance transformer architecture [zeller2024tor] together with our proposed contrastive cluster-based L2-loss with cluster refinement using dynamic points removal [kellner2013itsc].
  • Figure 3: Illustration of the final process of loss calculation given the representation space centroids and dynamic points removal motion segmentation masks for the student and teacher inputs.
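
The captions above mention representation-space centroids, motion masks, and an L2-loss between student and teacher. As a rough illustration only, the sketch below computes per-cluster feature centroids for both branches and averages their squared L2 distances. It is not the paper's MACL: splitting each cluster by the motion mask is just one plausible reading of "cluster refinement", only the attraction term between matching centroids is shown, and all function names are hypothetical.

```python
def refine_clusters(cluster_ids, moving_mask):
    """Split every cluster into a static and a moving part (assumed refinement)."""
    return [(c, bool(m)) for c, m in zip(cluster_ids, moving_mask)]


def centroids(features, labels):
    """Mean feature vector per label; features is a list of equal-length lists."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        for k, value in enumerate(feat):
            sums[label][k] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}


def centroid_l2_loss(student_feats, teacher_feats, student_labels, teacher_labels):
    """Mean squared L2 distance between centroids of clusters seen by both branches."""
    cs = centroids(student_feats, student_labels)
    ct = centroids(teacher_feats, teacher_labels)
    shared = set(cs) & set(ct)
    if not shared:
        return 0.0
    total = sum(
        sum((a - b) ** 2 for a, b in zip(cs[label], ct[label])) for label in shared
    )
    return total / len(shared)


# Toy usage: three points, two clusters, the last point flagged as moving.
labels = refine_clusters([0, 0, 1], [False, False, True])
student = [[0.0, 0.0], [2.0, 0.0], [5.0, 5.0]]
teacher = [[1.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
loss = centroid_l2_loss(student, teacher, labels, labels)
print(loss)  # 0.5
```

Working on centroids rather than individual points is a natural fit for sparse, noisy data, since averaging over a cluster dampens the effect of single outlier detections. A full contrastive objective would additionally separate centroids of different clusters, which this sketch omits.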