PMT-MAE: Dual-Branch Self-Supervised Learning with Distillation for Efficient Point Cloud Classification
Qiang Zheng, Chao Zhang, Jian Sun
TL;DR
PMT-MAE targets efficient self-supervised learning for 3D point clouds with a dual-branch architecture that combines Transformer and MLP modules, coupled with two-stage distillation from the Point-M2AE teacher. It uses MAE-style masking with a non-pyramidal base and distills knowledge during both pre-training (feature distillation) and fine-tuning (logit distillation), reaching 93.6% accuracy on ModelNet40 with only 40 epochs for both pre-training and fine-tuning. The approach outperforms the baseline Point-MAE (93.2%) and the teacher Point-M2AE (93.4%) at a favorable FLOPs cost, which makes it practical for resource-constrained scenarios. These results highlight the value of feature diversity from dual-branch representations and of targeted distillation for learning robust 3D representations for point cloud classification.
Abstract
Advances in self-supervised learning are essential for enhancing feature extraction and understanding in point cloud processing. This paper introduces PMT-MAE (Point MLP-Transformer Masked Autoencoder), a novel self-supervised learning framework for point cloud classification. PMT-MAE features a dual-branch architecture that integrates Transformer and MLP components to capture rich features. The Transformer branch leverages global self-attention for intricate feature interactions, while the parallel MLP branch processes tokens through shared fully connected layers, offering a complementary feature transformation pathway. A fusion mechanism then combines the outputs of the two branches, enhancing the model's capacity to learn comprehensive 3D representations. Guided by the sophisticated teacher model Point-M2AE, PMT-MAE employs a distillation strategy that applies feature distillation during pre-training and logit distillation during fine-tuning, ensuring effective knowledge transfer. On the ModelNet40 classification task, PMT-MAE achieves an accuracy of 93.6% without a voting strategy, surpassing both the baseline Point-MAE (93.2%) and the teacher Point-M2AE (93.4%) and underscoring its ability to learn discriminative 3D point cloud representations. The framework is also highly efficient, requiring only 40 epochs for both pre-training and fine-tuning. This combination of effectiveness and efficiency makes PMT-MAE well suited to scenarios with limited computational resources and a promising solution for practical point cloud analysis.
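To make the ideas in the abstract concrete, the following NumPy sketch shows one dual-branch block and the two distillation losses. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify the fusion mechanism, so concatenation followed by a linear projection is assumed here. Likewise, mean squared error for feature distillation and temperature-scaled KL divergence for logit distillation are common choices assumed for this sketch. All function names, parameter names, and the temperature value are hypothetical.

```python
import numpy as np

def log_softmax(x, axis=-1):
    """Numerically stable log-softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def self_attention(tokens, w_q, w_k, w_v):
    """Transformer branch: single-head global self-attention, (N, D) -> (N, D)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])   # every token attends to every token
    return np.exp(log_softmax(scores, axis=-1)) @ v

def shared_mlp(tokens, w1, w2):
    """MLP branch: the same two fully connected layers applied to each token."""
    return np.maximum(tokens @ w1, 0.0) @ w2  # ReLU between the two layers

def dual_branch_block(tokens, p):
    """Run both branches in parallel on the same tokens, then fuse their outputs."""
    attn_out = self_attention(tokens, p["w_q"], p["w_k"], p["w_v"])
    mlp_out = shared_mlp(tokens, p["w1"], p["w2"])
    # ASSUMPTION: fusion = concatenate along the feature axis, project back to D.
    fused = np.concatenate([attn_out, mlp_out], axis=-1) @ p["w_fuse"]
    return tokens + fused                     # ASSUMPTION: residual connection

def feature_distill_loss(student_feat, teacher_feat):
    """Pre-training stage: match teacher features (ASSUMPTION: MSE)."""
    return float(np.mean((student_feat - teacher_feat) ** 2))

def logit_distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Fine-tuning stage: KL(teacher || student) on temperature-softened logits.

    ASSUMPTION: standard soft-label distillation; the temperature is hypothetical.
    """
    log_p_t = log_softmax(teacher_logits / temperature)
    log_p_s = log_softmax(student_logits / temperature)
    kl = np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=-1)
    return float(np.mean(kl) * temperature ** 2)

def init_params(dim, hidden, rng):
    """Random weights for one block (hypothetical helper for the demo)."""
    def mat(rows, cols):
        return rng.standard_normal((rows, cols)) / np.sqrt(rows)
    return {
        "w_q": mat(dim, dim), "w_k": mat(dim, dim), "w_v": mat(dim, dim),
        "w1": mat(dim, hidden), "w2": mat(hidden, dim),
        "w_fuse": mat(2 * dim, dim),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(dim=16, hidden=32, rng=rng)
    tokens = rng.standard_normal((8, 16))     # 8 point-patch tokens, 16-dim each
    out = dual_branch_block(tokens, params)
    print(out.shape)
```

In a full pipeline, the feature loss would presumably be combined with the masked-reconstruction objective during pre-training, and the logit loss with the classification loss during fine-tuning. The abstract does not state how the terms are weighted, so no weighting is shown here.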
