ICP-4D: Bridging Iterative Closest Point and LiDAR Panoptic Segmentation

Gyeongrok Oh, Youngdong Jang, Jonghyun Choi, Suk-Ju Kang, Guang Lin, Sangpil Kim

TL;DR

ICP-4D introduces a training-free framework for 4D LiDAR panoptic segmentation by aligning temporally consistent instance point sets with ICP, enhanced with Sinkhorn-based soft matching to handle noisy predictions. The method partitions instances into static, dynamic, and missing types and uses a memory bank to address occlusions, achieving robust, occlusion-aware matching with minimal inputs. Across SemanticKITTI and panoptic nuScenes, ICP-4D delivers state-of-the-art association quality (LSTQ), often using a single scan, and offers substantial memory and runtime efficiency advantages over training-dependent approaches. This work demonstrates that strong 3D panoptic models combined with geometry-based temporal registration can realize scalable, high-quality 4D perception without additional training data or multi-scan inputs.

Abstract

Dominant paradigms for 4D LiDAR panoptic segmentation usually either train deep neural networks on large superimposed point clouds or design dedicated modules for instance association. However, these approaches perform redundant point processing and consequently become computationally expensive, yet still overlook the rich geometric priors inherently provided by raw point clouds. To this end, we introduce ICP-4D, a simple yet effective training-free framework that unifies spatial and temporal reasoning through geometric relations among instance-level point sets. Specifically, we apply the Iterative Closest Point (ICP) algorithm to directly associate temporally consistent instances by aligning the source and target point sets through the estimated transformation. To stabilize association under noisy instance predictions, we introduce a Sinkhorn-based soft matching. This exploits the underlying instance distribution to obtain accurate point-wise correspondences, resulting in robust geometric alignment. Furthermore, our carefully designed pipeline, which considers three instance types (static, dynamic, and missing), offers computational efficiency and occlusion-aware matching. Our extensive experiments on both SemanticKITTI and panoptic nuScenes demonstrate that our method consistently outperforms state-of-the-art approaches, even without additional training or extra point cloud inputs.
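To make the ICP step in the abstract concrete, here is a minimal sketch of textbook point-to-point ICP between two instance point sets. It is not the authors' implementation: it uses hard nearest-neighbor correspondences (the paper replaces these with Sinkhorn-based soft matching), brute-force distance computation, and a fixed iteration count. The function names `best_rigid_transform` and `icp`, and the parameter `n_iters`, are hypothetical.

```python
import numpy as np

def best_rigid_transform(src, tgt):
    """Least-squares rotation R and translation t such that
    R @ src[i] + t is close to tgt[i] (Kabsch algorithm)."""
    src_c, tgt_c = src.mean(axis=0), tgt.mean(axis=0)
    H = (src - src_c).T @ (tgt - tgt_c)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t

def icp(src, tgt, n_iters=20):
    """Align src (N x 3) to tgt (M x 3); returns the accumulated R, t
    and the final RMS nearest-neighbor distance."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(n_iters):
        # Hard correspondences: closest target point for every source point.
        d2 = ((cur[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=-1)
        nn = d2.argmin(axis=1)
        R, t = best_rigid_transform(cur, tgt[nn])
        cur = cur @ R.T + t
        # Compose the incremental update with the running transform.
        R_total, t_total = R @ R_total, R @ t_total + t
    d2 = ((cur[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=-1)
    rms = np.sqrt(d2.min(axis=1).mean())
    return R_total, t_total, rms

# Usage: a 3 x 3 x 3 grid moved by a small rotation and translation.
g = np.array([-1.0, 0.0, 1.0])
src = np.stack(np.meshgrid(g, g, g, indexing="ij"), axis=-1).reshape(-1, 3)
theta = np.deg2rad(3.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.05, 0.02])
tgt = src @ R_true.T + t_true
R_est, t_est, rms = icp(src, tgt)
```

In an association setting, the final residual (or a similar alignment score) could serve as the criterion for deciding whether a source instance and a target instance are the same object; how ICP-4D scores matches is not specified in this section, so that use is an assumption.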

Paper Structure

This paper contains 34 sections, 16 equations, 10 figures, 11 tables, 1 algorithm.

Figures (10)

  • Figure 1: Comparison of 4D LiDAR panoptic segmentation methods: (a) IoU-based, (b) Query-propagated, (c) Detect & Track, and (d) ICP-4D. Instead of relying on training with large-scale point clouds, ours achieves reliable association in a fully training-free manner. Icons in the figure mark which methods require training and which use a frozen network.
  • Figure 2: Illustration of Sinkhorn-based soft matching. (a) Nearest neighbor-based matching aligns each source point with its closest counterpart, focusing only on local point proximity. (b) In contrast, correspondences are computed by transport plan $\mathcal{Q}$, enabling instance-aware matching that respects the global geometry of each instance point set.
  • Figure 3: Qualitative comparison on SemanticKITTI validation set. We visualize the association results over five consecutive scans for both the baselines and our ICP-4D. Different colors represent different instances. The dotted boxes zoom in for clear comparison. For clarity, we use circled numbers (①–⑤) to represent the frame indices.
  • Figure 4: Comparison of computational efficiency. We illustrate the efficiency trade-offs between memory usage (left) & runtime (right) and performance across different methods. Yellow triangles show LSTQ scores for each model. Red dashed line denotes our memory & runtime, and yellow dashed line marks our LSTQ score.
  • Figure 5: Point-wise correspondence visualization. We visualize point-wise correspondences to illustrate the effect of the Sinkhorn-based soft matching. $\textcolor{red}{\bullet}$ and $\textcolor{blue}{\bullet}$ represent the target and source points, respectively.
  • ...and 5 more figures
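
The contrast drawn in the Figure 2 caption, hard nearest-neighbor matching versus correspondences read from a transport plan $\mathcal{Q}$, can be sketched as follows. This is an illustration under stated assumptions, not the paper's formulation: it assumes a squared Euclidean cost, uniform marginals over both point sets, an entropic regularization strength `eps`, and a barycentric projection to turn the plan into soft targets. The names `sinkhorn_plan` and `soft_targets` are hypothetical.

```python
import numpy as np

def sinkhorn_plan(src, tgt, eps=0.05, n_iters=100):
    """Entropic optimal-transport plan Q (N x M) between src and tgt,
    assuming uniform mass on every point (an assumption of this sketch)."""
    C = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=-1)
    C = C / max(C.max(), 1e-12)                 # scale costs to [0, 1]
    K = np.exp(-C / eps)                        # Gibbs kernel
    a = np.full(len(src), 1.0 / len(src))       # source marginal
    b = np.full(len(tgt), 1.0 / len(tgt))       # target marginal
    u = np.ones_like(a)
    for _ in range(n_iters):
        # Alternate scaling so columns, then rows, match their marginals.
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def soft_targets(Q, tgt):
    """Barycentric projection: each source point's soft correspondence is
    the Q-weighted average of all target points."""
    return (Q @ tgt) / Q.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
src = rng.normal(size=(6, 3))
tgt = rng.normal(size=(8, 3))

# (a) Hard matching: each source point looks only at its closest target.
d2 = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=-1)
hard = tgt[d2.argmin(axis=1)]

# (b) Soft matching: correspondences depend on the whole point set via Q.
Q = sinkhorn_plan(src, tgt)
soft = soft_targets(Q, tgt)
```

Because every row of `Q` is constrained jointly with every column, several source points cannot all claim the same target point without cost, which is one way to read the caption's "instance-aware matching". A smaller `eps` pushes `Q` toward a hard assignment, while a larger one spreads mass more evenly.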