
Deformation-based In-Context Learning for Point Cloud Understanding

Chengxing Lin, Jinhong Deng, Yinjie Lei, Wen Li

Abstract

Recent advances in point cloud In-Context Learning (ICL) have demonstrated strong multitask capabilities. Existing approaches typically adopt a Masked Point Modeling (MPM)-based paradigm for point cloud ICL. However, MPM-based methods directly predict the target point cloud from masked tokens without leveraging geometric priors, requiring the model to infer spatial structure and geometric details solely from token-level correlations via transformers. Additionally, these methods suffer from a training-inference objective mismatch, as the model learns to predict the target point cloud using target-side information that is unavailable at inference time. To address these challenges, we propose DeformPIC, a deformation-based framework for point cloud ICL. Unlike existing approaches that rely on masked reconstruction, DeformPIC learns to deform the query point cloud under task-specific guidance from prompts, enabling explicit geometric reasoning and consistent objectives. Extensive experiments demonstrate that DeformPIC consistently outperforms previous state-of-the-art methods, achieving reductions of 1.6, 1.8, and 4.7 points in average Chamfer Distance on reconstruction, denoising, and registration tasks, respectively. Furthermore, we introduce a new out-of-domain benchmark to evaluate generalization across unseen data distributions, where DeformPIC achieves state-of-the-art performance.
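The evaluation metric cited above, Chamfer Distance, can be sketched in a few lines of NumPy. This is a minimal illustration of the standard symmetric formulation, not the paper's implementation; point clouds are assumed to be `(N, 3)` arrays:

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer Distance between point sets p (N, 3) and q (M, 3)."""
    # Pairwise squared Euclidean distances via broadcasting, shape (N, M).
    d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    # Average nearest-neighbour distance in both directions.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

Identical point sets score zero; the two directional terms penalize both missing and spurious points.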


Paper Structure

This paper contains 22 sections, 12 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Overall framework of DeformPIC. Our model consists of two core components. The Deformation Extraction Network (DEN) extracts the geometric transformation from an example pair (prompt input → prompt target) into a task embedding. This embedding then modulates the Deformation Transfer Network (DTN), guiding it to apply the same transformation to a new query input to produce the final prediction. The entire network is trained by minimizing the Chamfer Distance [fan2017CD] against the ground truth.
  • Figure 2: Qualitative comparison on the ShapeNet In-Context dataset. From left to right: input point clouds, PIC-Cat [fang2023PIC] results, our DeformPIC results, and ground truth. Our method achieves better geometric detail preservation, improved structural coherence, and fewer visual artifacts across diverse tasks.
  • Figure 3: Task feature visualization of DeformPIC on the ShapeNet / ModelNet40 / ScanObjectNN In-Context datasets. We use t-SNE [maaten2008tsne] to reduce the task features to a 2D space and visualize their distributions.
  • Figure 4: Visualization results on the registration task, comparing PIC [fang2023PIC] with our approach.
  • Figure 5: Visualization results on the reconstruction task, comparing PIC [fang2023PIC] with our approach.
  • ...and 2 more figures
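The DEN → DTN pipeline described in the Figure 1 caption can be sketched at a high level as: pool a task embedding from the (prompt input, prompt target) pair, then condition per-point offsets on that embedding so the query is deformed rather than regenerated. The sketch below is hypothetical and heavily simplified; the paper's DEN and DTN are transformer networks, and every weight and function name here is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    # Tiny two-layer MLP standing in for the paper's transformer blocks.
    return np.maximum(x @ w1, 0) @ w2

# Hypothetical weights (illustration only).
dim = 16
w_den1, w_den2 = rng.normal(size=(6, dim)), rng.normal(size=(dim, dim))
w_dtn1, w_dtn2 = rng.normal(size=(3 + dim, dim)), rng.normal(size=(dim, 3))

def deformpic_sketch(prompt_in, prompt_tgt, query):
    """Deform `query` using a task embedding extracted from the prompt pair."""
    # DEN stand-in: pool a task embedding from the concatenated prompt pair.
    pair = np.concatenate([prompt_in, prompt_tgt], axis=-1)          # (N, 6)
    task = mlp(pair, w_den1, w_den2).mean(axis=0)                    # (dim,)
    # DTN stand-in: condition each query point on the task embedding
    # and predict a per-point offset, so the output deforms the query.
    cond = np.concatenate([query, np.tile(task, (query.shape[0], 1))], axis=-1)
    return query + mlp(cond, w_dtn1, w_dtn2)                         # (M, 3)
```

The residual form `query + offsets` reflects the core idea contrasted with MPM-based methods in the abstract: the prediction starts from the query's geometry instead of being decoded from masked tokens.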