Temporal Overlapping Prediction: A Self-supervised Pre-training Method for LiDAR Moving Object Segmentation
Ziliang Miao, Runjian Chen, Yixi Cai, Buwei He, Wenquan Zhao, Wenqi Shao, Bo Zhang, Fu Zhang
TL;DR
This work tackles the labeling burden in LiDAR Moving Object Segmentation (MOS) by introducing Temporal Overlapping Prediction (TOP), a self-supervised pre-training method that leverages occupancy changes of temporal overlapping points observed across the current and adjacent LiDAR scans. TOP pre-trains a sparse 4D UNet encoder by predicting occupancy states of overlapping points and by reconstructing current scene occupancy, avoiding noisy flow learning inherent in forecasting approaches. Through extensive few-shot and cross-dataset experiments on nuScenes and SemanticKITTI, TOP consistently improves object-level Recall$_{\text{obj}}$ and, to a degree, IoU$_{\text{w/o}}$, demonstrating strong transferability across LiDAR setups and applicability to related tasks like semantic segmentation. The results underscore the method’s practical significance for robust dynamic object perception in autonomous systems, with potential extensions to other temporal perception tasks.
Abstract
Moving object segmentation (MOS) on LiDAR point clouds is crucial for autonomous systems such as self-driving vehicles. Previous supervised approaches rely heavily on costly manual annotations, while LiDAR sequences naturally capture temporal motion cues that can be leveraged for self-supervised learning. In this paper, we propose Temporal Overlapping Prediction (TOP), a self-supervised pre-training method that alleviates the labeling burden for MOS. TOP explores the temporal overlapping points that are commonly observed by the current and adjacent scans, and learns spatiotemporal representations by predicting the occupancy states of these points. Moreover, we use current occupancy reconstruction as an auxiliary pre-training objective, which enhances the model's awareness of the current scene structure. We conduct extensive experiments and observe that the conventional metric, Intersection-over-Union (IoU), is strongly biased toward objects with more scanned points and may therefore neglect small or distant objects. To compensate for this bias, we introduce an additional metric, mIoU$_{\text{obj}}$, to evaluate object-level performance. Experiments on nuScenes and SemanticKITTI show that TOP outperforms both the supervised training-from-scratch baseline and other self-supervised pre-training baselines by up to 28.77% relative improvement, demonstrating strong transferability across LiDAR setups and generalization to other tasks. Code and pre-trained models will be made publicly available upon publication.
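The bias of point-level IoU toward objects with many scanned points can be illustrated with a toy example. The sketch below is not the paper's evaluation code, and the abstract does not define mIoU$_{\text{obj}}$, so it uses a simpler stand-in: the mean, over ground-truth moving objects, of the fraction of each object's points predicted as moving. The function names, the `obj_ids` layout, and the toy scene are assumptions made for illustration only.

```python
def point_iou(pred, gt):
    """Point-level IoU of the 'moving' class over flat 0/1 label lists."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 1.0


def object_averaged_recall(pred, gt, obj_ids):
    """Hypothetical object-level score (a stand-in, not the paper's metric):
    average over ground-truth moving objects of the fraction of each
    object's points that are predicted as moving."""
    hits, totals = {}, {}
    for p, g, o in zip(pred, gt, obj_ids):
        if g:  # only points that belong to a ground-truth moving object
            totals[o] = totals.get(o, 0) + 1
            hits[o] = hits.get(o, 0) + (1 if p else 0)
    if not totals:
        return 1.0
    return sum(hits[o] / totals[o] for o in totals) / len(totals)


# Toy scene (assumed): a nearby car with 98 points (object 0), a distant
# pedestrian with 2 points (object 1), and 20 static points (id -1).
gt = [1] * 98 + [1] * 2 + [0] * 20
obj_ids = [0] * 98 + [1] * 2 + [-1] * 20

# The prediction segments the car perfectly but misses the pedestrian.
pred = [1] * 98 + [0] * 2 + [0] * 20

print(point_iou(pred, gt))                        # 0.98
print(object_averaged_recall(pred, gt, obj_ids))  # 0.5
```

In this toy scene the point-level IoU stays at 0.98 even though one of the two moving objects is missed entirely, while the object-averaged score drops to 0.5. This is the kind of gap an object-level metric is meant to expose.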
