DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding

Jincen Jiang, Qianyu Zhou, Yuhang Li, Xuequan Lu, Meili Wang, Lizhuang Ma, Jian Chang, Jian Jun Zhang

TL;DR

DG-PIC tackles cross-domain generalization in multi-task point cloud understanding by unifying Domain Generalization with Point-In-Context (PIC) learning. It introduces dual-level source prototypes (global and local) and a dual-level test-time feature shifting mechanism (macro-level semantics and micro-level patch relations) that pulls unseen target data toward the known source domains without any model updates at test time. The model is trained with a PIC-based Masked Point Modeling (MPM) Transformer framework and, at test time, selects its prompt from the closest source-domain sample, so that reconstruction, denoising, and registration are handled within a single model. On a new multi-domain multi-task benchmark spanning four datasets (ModelNet40, ShapeNet, ScanNet, ScanObjectNN), DG-PIC generalizes better and is more robust than the compared baselines and ablated variants, which makes it relevant to real-world 3D understanding under distribution shifts.

Abstract

Recent point cloud understanding research suffers from performance drops on unseen data, due to the distribution shifts across different domains. While recent studies use Domain Generalization (DG) techniques to mitigate this by learning domain-invariant features, most are designed for a single task and neglect the potential of testing data. Although In-Context Learning (ICL) showcases multi-task learning capability, it usually relies on high-quality context-rich data and considers a single dataset, and has rarely been studied in point cloud understanding. In this paper, we introduce a novel, practical, multi-domain multi-task setting, handling multiple domains and multiple tasks within one unified model for domain generalized point cloud understanding. To this end, we propose Domain Generalized Point-In-Context Learning (DG-PIC) that boosts the generalizability across various tasks and domains at testing time. In particular, we develop dual-level source prototype estimation that considers both global-level shape context and local-level geometrical structures for representing source domains, and a dual-level test-time feature shifting mechanism that leverages both macro-level domain semantic information and micro-level patch positional relationships to pull the target data closer to the source ones during testing. Our DG-PIC does not require any model updates during testing and can handle unseen domains and multiple tasks, i.e., point cloud reconstruction, denoising, and registration, within one unified model. We also introduce a benchmark for this new setting. Comprehensive experiments demonstrate that DG-PIC outperforms state-of-the-art techniques significantly.

Paper Structure

This paper contains 15 sections, 9 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: (a-b) Previous train-time DG techniques for point cloud understanding are typically designed for a single task and focus on learning domain-invariant features at training time, ignoring the contribution of the testing data. (c) In contrast, our DG-PIC pulls the testing features toward the source ones at dual levels to improve the pre-trained model's generalizability. It requires no model updates at testing time and excels in the newly proposed multi-domain, multi-task setting.
  • Figure 2: The proposed DG-PIC. (a) Pre-training: we select an arbitrary sample from the source domains and form a query-prompt pair with the current one. The point cloud pairs tackle the same task. Then, we randomly mask some patches in the target point clouds through the MPM framework and reconstruct them via the Transformer model. (b) Testing: we freeze the pre-trained model and generalize it toward unseen target samples through two key components: estimating source domain prototypes using global-level and local-level features, and aligning target features with the source domains by considering macro-level semantic information and micro-level positional relationships. We then select the most similar sample from the nearest source domain as the prompt.
  • Figure 3: Illustration of the presented method: (a) Dual-level Source Prototype Estimation. We estimate the prototype of the source domains at the global and local levels and consider the feature distance from the target sample to the prototypes at the dual levels. (b) Dual-level Test-time Target Feature Shifting. We shift the target feature by considering both macro-level semantic information across all source domains and micro-level positional relationships within corresponding patches.
  • Figure 4: Visualization results of our DG-PIC and the corresponding targets (denoted as ground truth) for three tasks: reconstruction, denoising, and registration. The real-world dataset ScanObjectNN serves as the target domain.
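
The test-time procedure described in the captions of Figures 2 and 3 can be illustrated with a small NumPy sketch. This is not the authors' implementation, and the exact formulas are not given in this summary. Everything below is an assumption made for illustration: the function names, max pooling as the global feature, mean features as prototypes, the softmax weighting over domain distances, the interpolation factor `alpha`, and the premise that patches are ordered consistently so that patch `i` of the target corresponds to patch `i` of a source prototype.

```python
import numpy as np

def global_feature(tokens):
    """Pool patch tokens (G, C) into one shape-level vector (C,). Max pooling is an assumption."""
    return tokens.max(axis=0)

def estimate_prototypes(domain_tokens):
    """Dual-level prototypes for one source domain, given tokens of shape (N, G, C).

    Global prototype: mean of the pooled per-sample features, shape (C,).
    Local prototype: mean of the per-patch features, shape (G, C).
    """
    global_proto = domain_tokens.max(axis=1).mean(axis=0)
    local_proto = domain_tokens.mean(axis=0)
    return global_proto, local_proto

def dual_level_distance(target_tokens, global_proto, local_proto):
    """Sum of the global-level and the mean patch-wise local-level L2 distances."""
    d_global = np.linalg.norm(global_feature(target_tokens) - global_proto)
    d_local = np.linalg.norm(target_tokens - local_proto, axis=1).mean()
    return d_global + d_local

def shift_target(target_tokens, prototypes, alpha=0.5):
    """Pull target tokens toward the source domains; no model weights are updated.

    prototypes: list of (global_proto, local_proto), one entry per source domain.
    Returns the shifted tokens and the index of the nearest source domain.
    """
    dists = np.array([dual_level_distance(target_tokens, g, l) for g, l in prototypes])
    # Closer domains get larger weights (softmax over negative distances, assumed).
    weights = np.exp(-(dists - dists.min()))
    weights /= weights.sum()
    # Macro level: one semantic anchor mixed across all source domains, shape (C,).
    macro_anchor = sum(w * g for w, (g, _) in zip(weights, prototypes))
    # Micro level: one anchor per patch position, shape (G, C).
    micro_anchor = sum(w * l for w, (_, l) in zip(weights, prototypes))
    anchor = 0.5 * (macro_anchor[None, :] + micro_anchor)
    shifted = (1.0 - alpha) * target_tokens + alpha * anchor
    return shifted, int(dists.argmin())

def select_prompt(shifted_tokens, domain_tokens):
    """Index of the source sample whose pooled feature is closest to the shifted target."""
    query = global_feature(shifted_tokens)
    candidates = domain_tokens.max(axis=1)  # (N, C)
    return int(np.linalg.norm(candidates - query, axis=1).argmin())
```

In this sketch, one would call `estimate_prototypes` once per source domain on features from the frozen encoder, then run `shift_target` and `select_prompt` for each incoming target sample. The chosen prompt and the shifted target features would then be passed to the frozen Transformer for the task at hand.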