4D-CS: Exploiting Cluster Prior for 4D Spatio-Temporal LiDAR Semantic Segmentation
Jiexi Zhong, Zhiheng Li, Yubo Cui, Zheng Fang
TL;DR
The paper tackles inconsistent LiDAR point segmentation across space and time in autonomous driving. It introduces 4D-CS, a dual-branch architecture with two parts. A point-based branch performs multi-view temporal fusion. A cluster-based branch uses DBSCAN-derived cluster labels and temporal cluster enhancement. An adaptive fusion step then combines the two branches to produce coherent predictions. The key contributions are the explicit generation of cluster labels across frames, multi-view temporal fusion, temporal cluster enhancement, and an adaptive prediction fusion mechanism. Together they achieve state-of-the-art results on SemanticKITTI and nuScenes for multi-scan semantic segmentation and moving-object segmentation. The approach improves segmentation integrity for large foreground objects and enhances motion-state estimation, with practical benefits for robust autonomous perception and mapping.
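The TL;DR attributes the cluster labels to DBSCAN. As a rough illustration only, the sketch below runs a plain O(n²) DBSCAN over foreground points pooled from several scans, so that points of one object in different frames can share a cluster id. It assumes that the scans are already registered into the current frame's coordinates and that a per-point foreground mask is available. The function names and the eps/min_samples values are hypothetical. This is a generic DBSCAN, not the authors' label-generation strategy, which the abstract describes as new.

```python
import math
from collections import deque
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def dbscan(points: Sequence[Point], eps: float, min_samples: int) -> List[int]:
    """Plain O(n^2) DBSCAN: returns a cluster id per point, -1 for noise."""
    n = len(points)
    # Precompute eps-neighborhoods (each point counts as its own neighbor).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [-1] * n
    visited = [False] * n
    cluster_id = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_samples:
            continue  # not a core point; may still join a cluster as a border point
        labels[i] = cluster_id
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster_id  # border or core point joins this cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # expand only through core points
        cluster_id += 1
    return labels


def multi_frame_cluster_labels(
    frames: Sequence[Sequence[Point]],
    foreground_masks: Sequence[Sequence[bool]],
    eps: float = 0.5,
    min_samples: int = 3,
) -> List[List[int]]:
    """Cluster foreground points of all (pre-registered) frames jointly.

    Hypothetical helper: background points keep label -1.
    """
    stacked: List[Point] = []
    owner: List[Tuple[int, int]] = []  # (frame index, point index) of each stacked point
    for t, (pts, mask) in enumerate(zip(frames, foreground_masks)):
        for k, (p, is_fg) in enumerate(zip(pts, mask)):
            if is_fg:
                stacked.append(p)
                owner.append((t, k))
    ids = dbscan(stacked, eps, min_samples)
    out = [[-1] * len(pts) for pts in frames]
    for (t, k), cid in zip(owner, ids):
        out[t][k] = cid
    return out


if __name__ == "__main__":
    frames = [
        [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.6, 0.0, 0.0), (10.0, 10.0, 0.0), (5.0, 5.0, 0.0)],
        [(0.1, 0.1, 0.0), (0.4, 0.1, 0.0), (10.2, 10.0, 0.0)],
    ]
    masks = [[True, True, True, True, False], [True, True, True]]
    print(multi_frame_cluster_labels(frames, masks, eps=0.5, min_samples=2))
    # → [[0, 0, 0, 1, -1], [0, 0, 1]]
```

In the usage example, the object near the origin keeps cluster id 0 in both frames, and the background point stays at -1. A real pipeline would use a spatial index (or a library such as scikit-learn) instead of the quadratic neighbor search.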
Abstract
Semantic segmentation of LiDAR points has significant value for autonomous driving and mobile robot systems. Most approaches exploit spatio-temporal information from multiple scans to identify the semantic class and motion state of each point. However, these methods often overlook segmentation consistency in space and time, so points belonging to the same object may be assigned different categories. To address this issue, our core idea is to generate cluster labels across multiple frames that reflect the complete spatial structure and temporal information of objects. These labels serve as explicit guidance for our dual-branch network, 4D-CS, which integrates point-based and cluster-based branches to enable more consistent segmentation. Specifically, in the point-based branch, we leverage historical information to enrich the current features through temporal fusion across multiple views. In the cluster-based branch, we propose a new strategy to produce cluster labels for foreground objects and use them to aggregate point-wise information into cluster features. We then merge neighboring clusters across multiple scans to restore features missing due to occlusion. Finally, in the point-cluster fusion stage, we adaptively fuse the information from the two branches to optimize the segmentation results. Extensive experiments confirm the effectiveness of the proposed method, which achieves state-of-the-art results on multi-scan semantic segmentation and moving object segmentation on the SemanticKITTI and nuScenes datasets. The code will be available at https://github.com/NEU-REAL/4D-CS.git.
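The interplay between the point-based and cluster-based branches described in the abstract can be pictured with a small sketch under strong simplifying assumptions. Cluster features are taken as the mean of the point-wise vectors in each cluster; here the point-branch logits stand in for features. Adaptive fusion is modeled as a per-point sigmoid gate that blends point-branch and cluster-branch class probabilities. The helper names and the gate parameters gate_w and gate_b are hypothetical stand-ins for learned components. The cross-scan cluster merging step is omitted, and none of this reproduces the actual 4D-CS layers.

```python
import math
from typing import Dict, List, Sequence


def cluster_mean(values: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[int, List[float]]:
    """Average per-point vectors within each cluster id (>= 0); -1 means no cluster."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for v, c in zip(values, labels):
        if c < 0:
            continue
        if c not in sums:
            sums[c] = [0.0] * len(v)
            counts[c] = 0
        sums[c] = [s + x for s, x in zip(sums[c], v)]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fusion(
    point_logits: Sequence[Sequence[float]],
    cluster_logits: Dict[int, List[float]],
    labels: Sequence[int],
    gate_w: Sequence[float],
    gate_b: float,
) -> List[List[float]]:
    """Blend point and cluster class probabilities with a sigmoid gate per point.

    gate_w / gate_b are hypothetical placeholders for learned parameters.
    """
    fused: List[List[float]] = []
    for p_logits, c in zip(point_logits, labels):
        p = softmax(p_logits)
        if c < 0:
            fused.append(p)  # no cluster: keep the point-branch prediction
            continue
        q = softmax(cluster_logits[c])
        # Gate computed from the concatenated probabilities [p, q] (an assumption).
        z = sum(w * x for w, x in zip(gate_w, p + q)) + gate_b
        alpha = 1.0 / (1.0 + math.exp(-z))
        fused.append([alpha * pi + (1.0 - alpha) * qi for pi, qi in zip(p, q)])
    return fused


if __name__ == "__main__":
    labels = [0, 0, 0, -1]  # three points in cluster 0, one unclustered point
    point_logits = [[2.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
    cluster_logits = cluster_mean(point_logits, labels)
    fused = adaptive_fusion(point_logits, cluster_logits, labels, gate_w=[0.0] * 4, gate_b=-1.0)
    print([max(range(2), key=lambda k: probs[k]) for probs in fused])  # → [0, 0, 0, 1]
```

In this toy example, the third point's own logits favor class 1, but its cluster favors class 0. Because the gate (biased by gate_b = -1) leans toward the cluster branch, the fused prediction flips to class 0. The unclustered point keeps its point-branch prediction. This is the kind of within-object consistency the abstract targets.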
