Transfer Learning with Point Transformers
Kartik Gupta, Rahul Vippala, Sahima Srivastava
TL;DR
The paper investigates transfer learning for Point Transformer-based point-cloud classification by pretraining on ModelNet10 and evaluating on 3D MNIST, with a comparison to training from scratch. It uses Point Transformer v1 with vector attention to capture long-range point relationships, and also evaluates a simple MLP baseline on 3D MNIST. Findings indicate that substantial distribution differences between ModelNet10 and 3D MNIST limit the effectiveness of transfer learning, though fine-tuning may speed up convergence because the pretrained model has already learned foundational features. The work highlights practical considerations when applying transfer learning to highly dissimilar 3D domains and suggests that simpler baselines may outperform attention-based models in certain cross-domain settings.
Abstract
Point Transformers are near state-of-the-art models for classification, segmentation, and detection tasks on point cloud data. They use a self-attention-based mechanism to model long-range spatial dependencies between point sets. In this project we explore two things. First, we measure the classification performance of these attention-based networks on the ModelNet10 dataset. Second, we fine-tune the trained model to classify the 3D MNIST dataset. We also train the model from scratch on 3D MNIST to compare the performance of the fine-tuned and from-scratch models on that dataset. We observe that, because the distributions of the two datasets differ substantially, the transfer-learned models do not outperform the from-scratch models in this case. We do, however, expect transfer-learned models to converge faster, since they have already learned lower-level features such as edges and corners from ModelNet10.
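
To make the attention mechanism concrete, the following is a minimal NumPy sketch of a single vector self-attention step in the style of Point Transformer v1. It is an illustration under stated assumptions, not the implementation used in this project: the names `init_params` and `vector_attention` and the parameter keys are hypothetical, the mappings are plain weight matrices without biases, and normalization, residual connections, and downsampling are omitted. For each point, attention weights are computed per channel over its k nearest neighbours from the difference of transformed features plus a learned encoding of the relative position.

```python
import numpy as np


def init_params(c, rng):
    """Random weights for one attention step (hypothetical layout, no biases)."""
    def w(rows, cols):
        return rng.normal(scale=0.1, size=(rows, cols))
    return {
        "phi": w(c, c), "psi": w(c, c), "alpha": w(c, c),  # query / key / value maps
        "theta1": w(3, c), "theta2": w(c, c),              # position-encoding MLP
        "gamma1": w(c, c), "gamma2": w(c, c),              # attention-weight MLP
    }


def vector_attention(feats, coords, k, params):
    """One vector self-attention step over k-nearest-neighbour sets.

    feats:  (N, C) per-point features
    coords: (N, 3) per-point positions
    Returns the (N, C) output features and the (N, k, C) attention weights.
    """
    # k nearest neighbours by squared Euclidean distance (a point is its own neighbour)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)  # (N, N)
    idx = np.argsort(d2, axis=1)[:, :k]                                 # (N, k)

    q = feats @ params["phi"]     # (N, C)
    key = feats @ params["psi"]   # (N, C)
    val = feats @ params["alpha"] # (N, C)

    # Position encoding: two-layer MLP with ReLU on relative coordinates
    rel = coords[:, None, :] - coords[idx]                              # (N, k, 3)
    delta = np.maximum(rel @ params["theta1"], 0.0) @ params["theta2"]  # (N, k, C)

    # Vector attention: subtraction relation, then an MLP, giving one logit per channel
    relation = q[:, None, :] - key[idx] + delta                         # (N, k, C)
    logits = np.maximum(relation @ params["gamma1"], 0.0) @ params["gamma2"]

    # Softmax over the neighbour axis, independently for every channel
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Channel-wise weighted sum of values plus position encoding
    out = (weights * (val[idx] + delta)).sum(axis=1)                    # (N, C)
    return out, weights


rng = np.random.default_rng(0)
coords = rng.normal(size=(32, 3))   # toy point cloud with 32 points
feats = rng.normal(size=(32, 8))    # 8 feature channels per point
params = init_params(8, rng)
out, weights = vector_attention(feats, coords, k=4, params=params)
print(out.shape, weights.shape)
```

Unlike scalar dot-product attention, the weights here carry a channel dimension, so each feature channel can attend to a different neighbour. In this simplified form, setting `k=1` reduces the step to the value projection alone, because each point then attends only to itself and its relative position is zero.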
