Dynamic Prototype Adaptation with Distillation for Few-shot Point Cloud Segmentation

Jie Liu, Wenzhe Yin, Haochen Wang, Yunlu Chen, Jan-Jakob Sonke, Efstratios Gavves

TL;DR

Dynamic Prototype Adaptation (DPA) addresses the mismatch between support prototypes and query features in few-shot point cloud segmentation by learning task-specific prototypes for each query. It leverages three components—prototype rectification, prototype-to-query attention, and prototype distillation—to adapt vanilla support prototypes into query-specific representations, enabling accurate per-point masks via a non-parametric metric that compares per-point features to prototypes. Evaluated on the S3DIS and ScanNet benchmarks under the $N$-way $K$-shot setting, DPA delivers state-of-the-art performance, exemplified by improvements of $7.43\%$ on S3DIS and $6.39\%$ on ScanNet in the $2$-way $1$-shot setting over prior methods. The results demonstrate improved generalization to unseen classes and object variations with an end-to-end framework that remains practically efficient.

Abstract

Few-shot point cloud segmentation seeks to generate per-point masks for previously unseen categories, using only a minimal set of annotated point clouds as reference. Existing prototype-based methods rely on support prototypes to guide the segmentation of query point clouds, but they encounter challenges when significant object variations exist between the support prototypes and query features. In this work, we present dynamic prototype adaptation (DPA), which explicitly learns task-specific prototypes for each query point cloud to tackle the object variation problem. DPA achieves the adaptation through prototype rectification, which aligns vanilla prototypes from the support set with the query feature distribution, and prototype-to-query attention, which extracts task-specific context from query point clouds. Furthermore, we introduce a prototype distillation regularization term, enabling knowledge transfer between early-stage prototypes and their deeper counterparts during adaptation. By iteratively applying these adaptations, we generate task-specific prototypes for accurate mask predictions on query point clouds. Extensive experiments on two popular benchmarks show that DPA surpasses state-of-the-art methods by a significant margin, e.g., 7.43\% and 6.39\% under the 2-way 1-shot setting on S3DIS and ScanNet, respectively. Code is available at https://github.com/jliu4ai/DPA.
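
The prototype-based pipeline described above can be illustrated with a small NumPy sketch. It is not taken from the DPA code base: the function names, the use of masked average pooling to build prototypes, the cosine similarity metric, and the temperature value are all assumptions chosen as common defaults for prototype-based segmentation. It shows only the non-parametric step, where each query point is labelled by its most similar prototype.

```python
import numpy as np

def masked_average_prototype(features, mask):
    """Average the features of the points selected by a binary mask.

    features: (num_points, dim) per-point support features.
    mask:     (num_points,) binary mask for one class.
    """
    mask = mask.astype(features.dtype)
    # Guard against an empty mask to avoid division by zero.
    return (features * mask[:, None]).sum(axis=0) / max(mask.sum(), 1e-8)

def cosine_logits(features, prototypes, temperature=10.0):
    """Scaled cosine similarity between every point and every prototype.

    The temperature is a hypothetical value, not one reported in the paper.
    """
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8)
    return temperature * (f @ p.T)

def predict_labels(features, prototypes):
    """Softmax over per-class logits, then arg-max per point."""
    logits = cosine_logits(features, prototypes)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs

# Toy episode: four support points in a 2-D feature space, two classes.
support_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
class_masks = np.array([[1, 1, 0, 0], [0, 0, 1, 1]])
prototypes = np.stack(
    [masked_average_prototype(support_feats, m) for m in class_masks]
)

query_feats = np.array([[1.0, 0.2], [0.2, 1.0]])
labels, probs = predict_labels(query_feats, prototypes)
print(labels)
```

Because the classifier has no learned weights of its own, the quality of the masks depends entirely on how well the prototypes match the query features, which is the mismatch that the adaptation steps are meant to reduce.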

Paper Structure (18 sections, 11 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Illustration of the proposed method. Support and query point clouds often exhibit significant object variations, so prototypes obtained from the support data can be unsuitable for accurately segmenting the query point cloud. In this work, we propose to generate task-specific prototypes for the query point cloud by dynamically adapting the prototypes to the query feature distribution, regularized by prototype distillation.
  • Figure 2: Diagram of our proposed method. (a) Encoder: Given the support and query point clouds, the feature encoder $\mathcal{E}$ extracts per-point features $f_s$ and $f_q$, respectively. The prototype encoder transforms support features into the set of vanilla prototypes $\mathbb{P}$. (b) Prototype Decoder: The prototype decoder adapts vanilla prototypes to task-specific prototypes $\tilde{\mathbb{P}}$ through $L$ transformer-based decoder blocks, each composed of prototype rectification, prototype-to-query attention, and prototype distillation. The prototype distillation is introduced to enable early-stage prototypes $\hat{\mathbb{P}}$ to glean insights from their deeper counterparts $\tilde{\mathbb{P}}$. (c) Mask Decoder: The mask prediction module generates per-class mask logits and then produces a final prediction using a softmax.
  • Figure 3: Ablation study of modules and hyper-parameters under the 2-way 1-shot setting on the S3DIS dataset. (a) Effects of different adaptors. (b) t-SNE visualization of the adaptation process on a 3-way 5-shot task. (c) Effects of the prototype distillation (PD) loss coefficient $\gamma$. (d) Effects of the number of decoder layers $L$.
  • Figure 4: Qualitative results of our method on 2-way 1-shot point cloud semantic segmentation, compared with the ground truth and AttMPTI [zhao2021few]. Left: S3DIS. Right: ScanNet. Our method achieves consistently better segmentation results on both S3DIS and ScanNet.
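
The prototype decoder in Figure 2(b) can be approximated by the following NumPy sketch, written under several loud assumptions. The attention here uses no learned projections, prototype rectification is omitted, and the distillation term is a generic KL divergence between the point-to-prototype assignments of an early stage and of the final stage. None of these choices is confirmed by the text above, and all function names are hypothetical. The sketch is meant only to show the data flow: prototypes act as attention queries over the query point features, the update is repeated for $L$ blocks, and the earlier stages are pulled towards the deepest one.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prototype_to_query_attention(prototypes, query_feats):
    """One cross-attention step: prototypes attend to query point features.

    prototypes:  (num_protos, dim), used as attention queries.
    query_feats: (num_points, dim), used as keys and values.
    Learned projection matrices are left out (an assumption for brevity).
    """
    dim = prototypes.shape[1]
    scores = prototypes @ query_feats.T / np.sqrt(dim)  # (num_protos, num_points)
    weights = softmax(scores, axis=1)
    context = weights @ query_feats                      # (num_protos, dim)
    return prototypes + context                          # residual update

def adapt_prototypes(prototypes, query_feats, num_blocks=2):
    """Apply the attention step num_blocks times and keep every stage."""
    stages = []
    current = prototypes
    for _ in range(num_blocks):
        current = prototype_to_query_attention(current, query_feats)
        stages.append(current)
    return stages

def distillation_loss(early, final, query_feats):
    """KL(teacher || student) over point-to-prototype assignments.

    The final-stage prototypes play the teacher and an early stage plays
    the student. This form is an illustrative assumption.
    """
    teacher = softmax(query_feats @ final.T, axis=1)
    student = softmax(query_feats @ early.T, axis=1)
    eps = 1e-12
    kl = teacher * (np.log(teacher + eps) - np.log(student + eps))
    return float(kl.sum(axis=1).mean())

rng = np.random.default_rng(0)
vanilla = rng.normal(size=(3, 4))     # e.g. background plus two classes
query = rng.normal(size=(10, 4))      # ten query points
stages = adapt_prototypes(vanilla, query, num_blocks=3)
print(distillation_loss(stages[0], stages[-1], query))
```

In a trained model the teacher side of such a loss would typically be detached from the gradient, so that only the early stages are pulled towards the deeper prototypes; that detail is left out because plain NumPy has no gradient tracking.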