To Supervise or Not to Supervise: Understanding and Addressing the Key Challenges of Point Cloud Transfer Learning
Souhail Hadgi, Lei Li, Maks Ovsjanikov
TL;DR
This work quantitatively compares supervised and contrastive pre-training for 3D point cloud transfer learning across diverse architectures and datasets. It finds that early 3D layers can already be class-discriminative, and that supervised pre-training often excels under linear probing, while contrastive pre-training can outperform it under full fine-tuning. It then introduces a simple layer-wise geometric regularization, based on predicting point normals, that makes early layers more adaptable and improves transfer on ModelNet40, ScanObjectNN, and S3DIS. The authors provide a consistent evaluation framework and thorough feature analyses (including layer-wise gradient norms and t-SNE visualizations), showing that the regularization improves performance and increases early-layer gradient norms, which suggests better downstream adaptability. These findings offer practical guidance for designing robust 3D pre-training pipelines and point to future work on architecture design and multi-scale geometric representations for transfer learning.
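The normal-prediction regularizer can be illustrated with a minimal sketch. It assumes that an auxiliary head attached to one or more early layers outputs a 3D normal per point, that ground-truth normals are unoriented (so n and -n count as equally correct), and that the penalty is a sign-invariant cosine loss added to the task loss with weight `lam`. The loss form, the weighting, and the names `normal_loss` and `regularized_loss` are hypothetical choices for illustration; the paper's exact formulation may differ. Plain Python lists stand in for the tensors a real training loop would use.

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]


def _unit(v: Vec3, eps: float = 1e-8) -> List[float]:
    """Normalize a 3-vector; eps guards against division by zero."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / (n + eps) for c in v]


def normal_loss(pred: List[Vec3], gt: List[Vec3]) -> float:
    """Mean sign-invariant cosine loss, 1 - |cos(pred_i, gt_i)|, over points.

    Assumption (hypothetical): normals are unoriented, so n and -n are
    both treated as correct predictions.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    total = 0.0
    for p, g in zip(pred, gt):
        cos = sum(a * b for a, b in zip(_unit(p), _unit(g)))
        total += 1.0 - abs(cos)
    return total / len(pred)


def regularized_loss(
    task_loss: float,
    per_layer_preds: List[List[Vec3]],
    gt_normals: List[Vec3],
    lam: float = 0.1,
) -> float:
    """Task loss plus lam times the normal loss averaged over regularized layers.

    Assumption (hypothetical): each entry of per_layer_preds holds the normals
    predicted by an auxiliary head on one early layer.
    """
    if not per_layer_preds:
        return task_loss
    reg = sum(normal_loss(p, gt_normals) for p in per_layer_preds) / len(per_layer_preds)
    return task_loss + lam * reg


if __name__ == "__main__":
    gt = [[0.0, 0.0, 1.0]]
    # A flipped normal is still treated as correct (loss close to 0).
    print(normal_loss([[0.0, 0.0, -5.0]], gt))
    # An orthogonal prediction gets the maximum per-point loss of 1.
    print(normal_loss([[1.0, 0.0, 0.0]], gt))
```

Averaging over the regularized layers keeps the scale of `lam` independent of how many layers are supervised. If normals were consistently oriented, a loss without the absolute value, or an L2 loss on normalized vectors, would be a reasonable alternative.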
Abstract
Transfer learning has long been a key factor in the advancement of many fields, including 2D image analysis. Unfortunately, its applicability in 3D data processing has been relatively limited. While several approaches for point cloud transfer learning have been proposed in recent literature, with contrastive learning gaining particular prominence, most existing methods in this domain have only been studied and evaluated in limited scenarios. Most importantly, there is currently a lack of principled understanding of both when and why point cloud transfer learning methods are applicable. Remarkably, even the applicability of standard supervised pre-training is poorly understood. In this work, we conduct the first in-depth quantitative and qualitative investigation of supervised and contrastive pre-training strategies and their utility in downstream 3D tasks. We demonstrate that layer-wise analysis of learned features provides significant insight into the downstream utility of trained networks. Informed by this analysis, we propose a simple geometric regularization strategy, which improves the transferability of supervised pre-training. Our work thus sheds light on both the specific challenges of point cloud transfer learning and strategies to overcome them.
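One quantity used in the layer-wise analysis, according to the TL;DR, is the gradient norm of each layer, where larger early-layer norms are read as a sign that those layers adapt more during downstream training. The sketch below only illustrates what such a per-layer measurement is, on a toy two-layer ReLU network with a squared-error loss and hand-derived gradients. The network, the loss, and the name `layer_grad_norms` are illustrative assumptions, not the authors' point cloud setup. In practice one would read the gradients from an autodiff framework and aggregate them over training batches.

```python
import math
from typing import Dict, List


def layer_grad_norms(
    x: List[float], target: float, W1: List[List[float]], W2: List[float]
) -> Dict[str, float]:
    """Per-layer gradient L2 (Frobenius) norms for a toy 2-layer ReLU network.

    Hypothetical toy model: y = sum_j W2[j] * relu((W1 @ x)[j]),
    loss = 0.5 * (y - target) ** 2. Gradients are derived by hand via the
    chain rule; no autodiff library is used.
    """
    # Forward pass.
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    a = [max(0.0, zj) for zj in z]
    y = sum(w * aj for w, aj in zip(W2, a))
    dy = y - target  # dL/dy for the squared-error loss

    # Backward pass, layer 2: dL/dW2[j] = dy * a[j].
    g_w2 = [dy * aj for aj in a]
    # Through the ReLU (subgradient 0 at z == 0) into layer 1.
    dz = [dy * w if zj > 0 else 0.0 for w, zj in zip(W2, z)]
    g_w1 = [[dzj * xi for xi in x] for dzj in dz]

    return {
        "layer1": math.sqrt(sum(g * g for row in g_w1 for g in row)),
        "layer2": math.sqrt(sum(g * g for g in g_w2)),
    }


if __name__ == "__main__":
    norms = layer_grad_norms(
        x=[1.0, 2.0], target=0.0, W1=[[1.0, 0.0], [0.0, -1.0]], W2=[2.0, 3.0]
    )
    print(norms)  # compare the early layer ("layer1") with the later one
```

Comparing these norms before and after adding a regularizer to pre-training is one way to probe the TL;DR's claim that the regularizer increases early-layer gradient norms.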
