PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning

Song Wang, Xiaolu Liu, Lingdong Kong, Jianyun Xu, Chunyong Hu, Gongfan Fang, Wentong Li, Jianke Zhu, Xinchao Wang

TL;DR

This paper proposes PointLoRA, a simple yet effective method that combines low-rank adaptation (LoRA) with multi-scale token selection to efficiently fine-tune point cloud models, reducing the need for tunable parameters while enhancing global feature capture.

Abstract

Self-supervised representation learning for point clouds has demonstrated effectiveness in improving pre-trained model performance across diverse tasks. However, as pre-trained models grow in complexity, fully fine-tuning them for downstream applications demands substantial computational and storage resources. Parameter-efficient fine-tuning (PEFT) methods offer a promising solution to mitigate these resource requirements, yet most current approaches rely on complex adapter and prompt mechanisms that increase tunable parameters. In this paper, we propose PointLoRA, a simple yet effective method that combines low-rank adaptation (LoRA) with multi-scale token selection to efficiently fine-tune point cloud models. Our approach embeds LoRA layers within the most parameter-intensive components of point cloud transformers, reducing the need for tunable parameters while enhancing global feature capture. Additionally, multi-scale token selection extracts critical local information to serve as prompts for downstream fine-tuning, effectively complementing the global context captured by LoRA. Experimental results across various pre-trained models and three challenging public datasets demonstrate that our approach achieves competitive performance with only 3.43% of the trainable parameters, making it highly effective for resource-constrained applications. Source code is available at: https://github.com/songw-zju/PointLoRA.
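
To make the low-rank adaptation idea in the abstract concrete, the following is a minimal sketch of a vanilla LoRA update on a single linear layer. It assumes the standard LoRA formulation h = W x + (alpha / r) B A x with W frozen; the class name `LoRALinear`, the initialization, and the `alpha` scaling are illustrative assumptions and are not taken from the PointLoRA code base.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Illustrative linear layer with a frozen weight and a low-rank update."""

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Frozen pre-trained weight (a random stand-in for this sketch).
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable low-rank factors: A is (rank x d_in), B is (d_out x rank).
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        # B starts at zero, so the adapted layer initially matches the frozen one.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        delta = matvec(self.B, matvec(self.A, x))   # low-rank path B(Ax)
        return [b + self.scale * d for b, d in zip(base, delta)]

    def tunable_ratio(self):
        """Trainable low-rank parameters relative to the frozen weight size."""
        frozen = len(self.W) * len(self.W[0])
        tunable = len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
        return tunable / frozen
```

For a square layer of width d and rank r, the ratio is 2r/d, which is why a small rank keeps the tunable parameter count low. This per-layer ratio is only an illustration and is not how the 3.43% figure reported in the abstract is computed.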

Paper Structure

This paper contains 28 sections, 11 equations, 6 figures, 13 tables.

Figures (6)

  • Figure 1: Comparing our proposed PointLoRA approach against vanilla LoRA methods. Both LoRA and our approach incorporate low-rank adaptation matrices into the pre-trained weights to extract global information from the point cloud sequence. Furthermore, our approach integrates tokens selected at various scales to capture local information, which is refined using a shared Prompt MLP and then output alongside the results derived from the original low-rank matrices.
  • Figure 2: Overview of PointLoRA integrated into the point cloud transformer pipeline. Given an input point cloud, we first tokenize it using the original Point Tokenizer and perform token selection across multiple scales (Multi-Scale Token Selection). The tokens from both components are then concatenated and fed into the Transformer Block. Our approach is injected into the qkv projection and FFN layers, utilizing a shared Prompt MLP within these layers to enhance parameter efficiency.
  • Figure 3: Illustration of Multi-Scale Token Selection. In a two-scale setup, we first sample different numbers of center points, then cluster around each center point and apply Mini-PointNet encoding to generate the corresponding tokens. These tokens are also fed into a Mask Predictor to estimate importance scores, allowing us to select the Top-K tokens at each scale.
  • Figure 4: The t-SNE visualization results on the PB-T50-RS split of the ScanObjectNN dataset (Uy et al., 2019) with different fine-tuning schemes. We adopt Point-MAE (Pang et al., 2022) as the baseline model for fair comparison. TP: Number of tunable parameters. OA: Overall accuracy. Symbol $*$ denotes results reproduced with the official implementation. Best viewed in color and zoomed in for additional details.
  • Figure A1: Visualization results for part segmentation on ShapeNetPart (Yi et al., 2016). We present projected prediction images from PointLoRA across four different viewpoints for the "Airplane", "Bag", "Chair" and "Guitar" categories.
  • ...and 1 more figures
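
The Multi-Scale Token Selection steps described in the Figure 3 caption (sample centers, cluster, encode, score, keep the Top-K per scale) can be sketched roughly as follows. This is a simplified illustration under several assumptions: farthest point sampling and k-nearest-neighbor grouping are assumed for the "sample" and "cluster" steps, `encode_group` is a fixed max-pooling stand-in for the learned Mini-PointNet, and `score_token` is a hypothetical stand-in for the learned Mask Predictor. None of these function names come from the paper's code.

```python
import math


def farthest_point_sampling(points, n_centers):
    """Greedy FPS: repeatedly pick the point farthest from those already chosen."""
    chosen = [0]
    dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < n_centers:
        idx = max(range(len(points)), key=dist.__getitem__)
        chosen.append(idx)
        dist = [min(d, math.dist(p, points[idx])) for d, p in zip(dist, points)]
    return chosen


def knn_group(points, center, k):
    """Return the k points closest to the given center."""
    return sorted(points, key=lambda p: math.dist(p, center))[:k]


def encode_group(group, center):
    """Stand-in for Mini-PointNet: channel-wise max over center-relative coordinates."""
    rel = [[a - c for a, c in zip(p, center)] for p in group]
    return [max(col) for col in zip(*rel)]


def score_token(token):
    """Hypothetical stand-in for the Mask Predictor (a learned module in the paper)."""
    return sum(abs(v) for v in token)


def select_tokens(points, scales, group_size, top_k):
    """For each scale (number of centers), build tokens and keep the top_k by score."""
    selected = []
    for n_centers in scales:
        centers = [points[i] for i in farthest_point_sampling(points, n_centers)]
        tokens = [encode_group(knn_group(points, c, group_size), c) for c in centers]
        ranked = sorted(tokens, key=score_token, reverse=True)
        selected.extend(ranked[:top_k])
    return selected
```

Calling `select_tokens(points, scales=(16, 8), group_size=4, top_k=3)` on a list of 3D points would return six tokens, three from each scale. In the pipeline described in the Figure 2 caption, such selected tokens are concatenated with the tokens from the original Point Tokenizer before entering the Transformer Block.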