Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models

Yiwen Tang, Ray Zhang, Zoey Guo, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li

TL;DR

This work introduces Point-PEFT, a framework for adapting pre-trained point cloud models with minimal learnable parameters. It can outperform full fine-tuning on various downstream tasks while tuning only 5% of the parameters that full fine-tuning trains.

Abstract

The popularity of large pre-trained models has revolutionized downstream tasks across diverse fields, such as language, vision, and multi-modality. To minimize the adaptation cost for downstream tasks, many Parameter-Efficient Fine-Tuning (PEFT) techniques have been proposed for language and 2D image pre-trained models. However, specialized PEFT methods for 3D pre-trained models remain under-explored. To this end, we introduce Point-PEFT, a novel framework for adapting point cloud pre-trained models with minimal learnable parameters. Specifically, for a pre-trained 3D model, we freeze most of its parameters and tune only the newly added PEFT modules on downstream tasks. These modules consist of a Point-prior Prompt and a Geometry-aware Adapter. The Point-prior Prompt adopts a set of learnable prompt tokens, for which we construct a memory bank with domain-specific knowledge and use a parameter-free attention to enhance the prompt tokens. The Geometry-aware Adapter aggregates point cloud features within spatial neighborhoods to capture fine-grained geometric information through local interactions. Extensive experiments indicate that Point-PEFT can achieve better performance than full fine-tuning on various downstream tasks while using only 5% of the trainable parameters, demonstrating the efficiency and effectiveness of our approach. Code is released at https://github.com/Ivan-Tang-3D/Point-PEFT.
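
To make the "parameter-free attention" idea concrete, here is a minimal sketch in plain Python. It treats each prompt token as a query and the memory-bank features as both keys and values, with no learned projection matrices. The cosine similarity, the `temperature` argument, the residual addition, and the function names are all assumptions made for illustration; the released code may differ in each of these choices.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def parameter_free_attention(prompts, bank, temperature=1.0):
    """Enhance prompt tokens with features from a fixed memory bank.

    Hypothetical sketch: the attention weights come directly from
    similarities between a prompt and the bank entries, so this step
    introduces no learnable weights of its own.
    """
    enhanced = []
    for p in prompts:
        # Similarity of this prompt to every bank entry, turned into weights.
        weights = softmax([cosine(p, b) / temperature for b in bank])
        # Weighted sum of bank features (the bank serves as keys and values).
        agg = [sum(w * b[d] for w, b in zip(weights, bank))
               for d in range(len(p))]
        # Residual addition (an assumption): keep the prompt, add the prior.
        enhanced.append([pi + ai for pi, ai in zip(p, agg)])
    return enhanced
```

In this sketch the only trainable quantities would be the prompt tokens themselves; the bank is built once before fine-tuning and stays fixed.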

Paper Structure (31 sections, 11 equations, 14 figures, 2 tables)

Figures (14)

  • Figure 1: Our Point-PEFT vs. Full Fine-tuning on the ModelNet40 [wu20153d] dataset. We compare the fine-tuning of three popular pre-trained models, Point-BERT [yu2022point], Point-MAE [pang2022masked], and Point-M2AE [zhang2022point], where our Point-PEFT achieves superior performance and parameter efficiency.
  • Figure 2: Overall Pipeline of Point-PEFT. For efficiently fine-tuning a pre-trained 3D encoder, our Point-PEFT contains two components: a Point-prior Prompt ($P^2$-Prompt) in the first $L$ blocks, which aggregates prior 3D knowledge from a $P^2$-Bank module, and a Geometry-aware Adapter inserted at the end of each block to effectively grasp the local geometric information.
  • Figure 3: Point-prior Prompt. To generate the prompt token with 3D prior knowledge, we construct a point-prior bank before fine-tuning, and conduct parameter-free attention for feature aggregation, which adaptively enhances the learnable prompt token with domain-specific semantics.
  • Figure 4: Geometry-aware Adapter. Inserted into every transformer block, the adapter aims to extract the fine-grained geometric information by local interactions.
  • Figure 5: Real-world 3D Classification on ScanObjectNN. We report the number of learnable parameters (#Param) and the accuracy (%) on the "PB-T50-RS" split of ScanObjectNN. $^\dagger$ indicates utilizing a stronger data augmentation [zhang2023learning] during fine-tuning.
  • ...and 9 more figures
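
The Figure 4 caption describes the Geometry-aware Adapter as extracting fine-grained geometric information through local interactions. A rough way to picture this is k-nearest-neighbor feature pooling: each token gathers features from its spatially closest tokens. The sketch below is an illustration only. The brute-force neighbor search, the mean pooling, the residual addition, and the names `knn_indices` and `aggregate_local` are assumptions, not the adapter's actual design.

```python
def knn_indices(coords, center, k):
    """Indices of the k points nearest to `center` (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(p, center)) for p in coords]
    return sorted(range(len(coords)), key=lambda i: dists[i])[:k]

def aggregate_local(coords, feats, k=2):
    """Add to each token the mean feature of its k nearest spatial neighbors.

    Hypothetical sketch: `coords` holds one 3D position per token and
    `feats` the matching feature vectors. Each point counts as its own
    nearest neighbor, so its feature is part of the pooled mean.
    """
    out = []
    for c, f in zip(coords, feats):
        idx = knn_indices(coords, c, k)
        # Mean-pool the neighbor features, dimension by dimension.
        pooled = [sum(feats[i][d] for i in idx) / len(idx)
                  for d in range(len(f))]
        # Residual addition (an assumption): keep the original feature.
        out.append([fd + pd for fd, pd in zip(f, pooled)])
    return out

# Two nearby points and one distant point, each with a 1-D feature.
coords = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0]]
feats = [[1.0], [3.0], [10.0]]
result = aggregate_local(coords, feats, k=2)
```

The brute-force search costs O(N^2) per call, which is acceptable for a sketch but would normally be replaced by a batched tensor operation in practice.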