Parameter-efficient Prompt Learning for 3D Point Cloud Understanding
Hongyu Sun, Yongcai Wang, Wang Chen, Haoran Deng, Deying Li
TL;DR
This work tackles the challenge of adapting large multi-modal models to 3D point cloud understanding in a parameter- and data-efficient manner. It introduces PPT, consisting of a learnable PromptLearner to replace hand-crafted prompts and a lightweight PointAdapter, with the 3D encoder frozen to maximize efficiency. The method achieves state-of-the-art or strongly competitive results across 3D recognition, few-shot learning, and part segmentation on diverse datasets, while using orders of magnitude fewer trainable parameters than full fine-tuning. The findings demonstrate that parameter-efficient prompt tuning can effectively transfer rich multi-modal knowledge to 3D tasks, with clear gains in data efficiency and practical deployment potential.
Abstract
This paper presents a parameter-efficient prompt tuning method, named PPT, to adapt a large multi-modal model for 3D point cloud understanding. Existing strategies are expensive in computation and storage, and depend on time-consuming prompt engineering. We address these problems in three ways. First, a PromptLearner module is devised to replace hand-crafted prompts with learnable contexts, automating the prompt tuning process. Second, we lock the pre-trained backbone instead of adopting the full fine-tuning paradigm, which substantially improves parameter efficiency. Finally, a lightweight PointAdapter module is arranged near target tasks to enhance prompt tuning for 3D point cloud understanding. Comprehensive experiments demonstrate the superior parameter and data efficiency of the proposed method. Meanwhile, we set new records on 4 public datasets across multiple 3D tasks, i.e., point cloud recognition, few-shot learning, and part segmentation. The implementation is available at https://github.com/auniquesun/PPT.
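The three components described in the abstract can be illustrated with a toy forward pass. The following is a minimal sketch and not the authors' implementation (see the repository above for that). It assumes that class prompts are formed by prepending shared learnable context vectors to a frozen class-name embedding, that the PointAdapter is a residual linear transform applied to the frozen 3D feature, and that classification is by cosine similarity. All names, sizes, and the stand-in encoders (`encode_text`, `encode_points`, `DIM`, `NUM_CTX`, `RESIDUAL`) are hypothetical and chosen only for illustration.

```python
import math
import random

random.seed(0)

DIM = 8                                # hypothetical embedding width
NUM_CTX = 4                            # hypothetical number of learnable context vectors
RESIDUAL = 0.5                         # hypothetical blend ratio inside the adapter
CLASSES = ["chair", "table", "lamp"]   # illustrative class names


def rand_vec(dim):
    """Random vector used as a placeholder for weights or embeddings."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# --- Frozen parts: toy stand-ins for the pre-trained multi-modal model ---
class_token = {name: rand_vec(DIM) for name in CLASSES}   # frozen class-name embeddings
frozen_point_weight = [rand_vec(DIM) for _ in range(3)]   # frozen 3 -> DIM projection


def encode_text(tokens):
    """Stand-in for a frozen text encoder: mean-pool the token vectors."""
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(DIM)]


def encode_points(points):
    """Stand-in for a frozen 3D encoder: per-point linear map, then max-pool."""
    feats = [
        [sum(p[k] * frozen_point_weight[k][i] for k in range(3)) for i in range(DIM)]
        for p in points
    ]
    return [max(f[i] for f in feats) for i in range(DIM)]


# --- Trainable parts: the only values that would receive gradient updates ---
context = [rand_vec(DIM) for _ in range(NUM_CTX)]       # learnable prompt contexts
adapter_weight = [rand_vec(DIM) for _ in range(DIM)]    # assumed adapter: DIM x DIM


def adapt(feature):
    """Assumed residual adapter: blend a linear transform with the input feature."""
    transformed = [
        sum(feature[k] * adapter_weight[k][i] for k in range(DIM)) for i in range(DIM)
    ]
    return [RESIDUAL * t + (1.0 - RESIDUAL) * f for t, f in zip(transformed, feature)]


def classify(points):
    """Score a point cloud against one learned prompt per class."""
    feature = adapt(encode_points(points))
    scores = {
        name: cosine(feature, encode_text(context + [class_token[name]]))
        for name in CLASSES
    }
    return max(scores, key=scores.get), scores


# Parameter bookkeeping for the toy model only.
trainable_params = NUM_CTX * DIM + DIM * DIM
frozen_params = len(CLASSES) * DIM + 3 * DIM

cloud = [rand_vec(3) for _ in range(16)]   # a random toy point cloud
prediction, scores = classify(cloud)
print(prediction, trainable_params, frozen_params)
```

In this toy setup the frozen stand-ins are tiny, so the trainable share is large. With a real pre-trained backbone the frozen parameter count dominates, which is where the parameter efficiency described in the abstract comes from. No training loop is shown; only `context` and `adapter_weight` would be updated during tuning.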
