ICP-4D: Bridging Iterative Closest Point and LiDAR Panoptic Segmentation
Gyeongrok Oh, Youngdong Jang, Jonghyun Choi, Suk-Ju Kang, Guang Lin, Sangpil Kim
TL;DR
ICP-4D is a training-free framework for 4D LiDAR panoptic segmentation that associates instances over time by aligning their point sets with ICP, using Sinkhorn-based soft matching to cope with noisy instance predictions. The method partitions instances into static, dynamic, and missing types and uses a memory bank to handle occlusions, which yields robust, occlusion-aware matching without extra point cloud inputs. On SemanticKITTI and panoptic nuScenes, ICP-4D delivers state-of-the-art LSTQ, often using a single scan, and offers substantial memory and runtime advantages over training-dependent approaches. The work shows that strong 3D panoptic models combined with geometry-based temporal registration can provide scalable, high-quality 4D perception without additional training data or multi-scan inputs.
Abstract
Dominant paradigms for 4D LiDAR panoptic segmentation typically either train deep neural networks on large superimposed point clouds or design dedicated modules for instance association. However, these approaches process points redundantly and are therefore computationally expensive, yet they still overlook the rich geometric priors inherent in raw point clouds. To address this, we introduce ICP-4D, a simple yet effective training-free framework that unifies spatial and temporal reasoning through geometric relations among instance-level point sets. Specifically, we apply the Iterative Closest Point (ICP) algorithm to directly associate temporally consistent instances by aligning the source and target point sets through the estimated transformation. To stabilize association under noisy instance predictions, we introduce Sinkhorn-based soft matching, which exploits the underlying instance distribution to obtain accurate point-wise correspondences and thus robust geometric alignment. Furthermore, our carefully designed pipeline, which distinguishes three instance types (static, dynamic, and missing), offers computational efficiency and occlusion-aware matching. Extensive experiments on both SemanticKITTI and panoptic nuScenes demonstrate that our method consistently outperforms state-of-the-art approaches, even without additional training or extra point cloud inputs.
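
To make the core idea concrete, the following is a minimal NumPy sketch of ICP in which the hard nearest-neighbor step is replaced by Sinkhorn-based soft correspondences. It is an illustration under stated assumptions, not the authors' implementation: the function names (`sinkhorn_plan`, `weighted_rigid_fit`, `soft_icp`, `alignment_rmse`), the uniform marginals, the squared-distance cost, and the values of `eps` and the iteration counts are all hypothetical choices made for this example.

```python
import numpy as np


def _logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def sinkhorn_plan(cost, eps=0.05, n_iters=50):
    # Entropy-regularized transport plan with uniform marginals (assumed),
    # computed in the log domain to avoid underflow for large costs.
    n, m = cost.shape
    log_a = np.full(n, -np.log(n))
    log_b = np.full(m, -np.log(m))
    log_k = -cost / eps
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iters):
        f = log_a - _logsumexp(log_k + g[None, :], axis=1)
        g = log_b - _logsumexp(log_k + f[:, None], axis=0)
    return np.exp(log_k + f[:, None] + g[None, :])


def weighted_rigid_fit(src, tgt, w):
    # Weighted Kabsch: rotation R and translation t minimizing
    # sum_i w_i * ||R @ src_i + t - tgt_i||^2.
    w = w / w.sum()
    mu_s = w @ src
    mu_t = w @ tgt
    h = (src - mu_s).T @ (w[:, None] * (tgt - mu_t))
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, mu_t - r @ mu_s


def soft_icp(src, tgt, n_iters=40, eps=0.05):
    # Alternate between soft matching and rigid fitting.
    r_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(n_iters):
        cost = ((cur[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)
        plan = sinkhorn_plan(cost, eps=eps)
        row_mass = plan.sum(axis=1)
        soft_tgt = (plan @ tgt) / row_mass[:, None]  # soft correspondence per source point
        r, t = weighted_rigid_fit(cur, soft_tgt, row_mass)
        cur = cur @ r.T + t
        r_total, t_total = r @ r_total, r @ t_total + t
    return r_total, t_total, cur


def alignment_rmse(aligned, tgt):
    # RMS nearest-neighbor distance after alignment; a hypothetical score
    # that could be used to decide whether two instances match.
    d2 = ((aligned[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)
    return float(np.sqrt(d2.min(axis=1).mean()))
```

In this sketch, `soft_icp(src, tgt)` takes the points of one instance at time t-1 and one candidate instance at time t, each as an `(N, 3)` array, and returns the estimated rotation, translation, and the transformed source points. A low `alignment_rmse` for the transformed points would suggest that the two point sets belong to the same object. How the residual is thresholded, and how static, dynamic, and missing instances are handled, is not specified here and is left to the full method.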
