GFT: Graph Feature Tuning for Efficient Point Cloud Analysis

Manish Dhakal, Venkat R. Dasari, Rajshekhar Sunderraman, Yi Ding

TL;DR

GFT introduces a point-cloud-specific parameter-efficient fine-tuning (PEFT) method that learns dynamic graph features from transformer tokens via EdgeConv and injects them sparsely into a pretrained Point Transformer. By combining task-specific prompts, a multi-layer EdgeConv feature pyramid, and selective cross-attention, GFT trains only about 0.7M parameters while maintaining competitive accuracy on real-world and synthetic datasets. Ablation studies validate the contribution of each component and show that performance stays stable as the parameter budget changes, though pretraining on synthetic data and added inference cost remain limitations. The work highlights the potential of graph-based feature integration for efficient point-cloud analysis and suggests future directions such as LoRA-like latency-free adaptation and improved real-world pretraining.

Abstract

Parameter-efficient fine-tuning (PEFT) significantly reduces computational and memory costs by updating only a small subset of the model's parameters, enabling faster adaptation to new tasks with minimal loss in performance. Previous studies have introduced PEFTs tailored for point cloud data, as general approaches are suboptimal. To further reduce the number of trainable parameters, we propose a point-cloud-specific PEFT, termed Graph Features Tuning (GFT), which learns a dynamic graph from the initial tokenized inputs of the transformer using a lightweight graph convolution network and passes these graph features to deeper layers via skip connections and efficient cross-attention modules. Extensive experiments on object classification and segmentation tasks show that GFT is competitive with existing methods while reducing the number of trainable parameters. Code is available at https://github.com/manishdhakal/GFT.
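To make the "dynamic graph from tokenized inputs" idea concrete, the following is a minimal NumPy sketch of an EdgeConv-style layer over tokens. It is not the authors' implementation: the function names (`knn_indices`, `edge_features`, `edge_conv`), the Euclidean distance in token-feature space, the concatenation of the centre token with the edge feature, the single linear map followed by ReLU, and the max aggregation are all assumptions borrowed from the common EdgeConv formulation. Only the edge feature itself, neighbour minus centre token, follows the definition given in the Figure 4 caption.

```python
import numpy as np


def knn_indices(tokens: np.ndarray, k: int) -> np.ndarray:
    """For each token, return the indices of its k nearest other tokens."""
    # Pairwise squared Euclidean distances in feature space, shape (n, n).
    diff = tokens[:, None, :] - tokens[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a token is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]


def edge_features(tokens: np.ndarray, k: int) -> np.ndarray:
    """Edge features N(T_i) - T_i for the k nearest neighbours, shape (n, k, d)."""
    neighbours = tokens[knn_indices(tokens, k)]  # (n, k, d)
    return neighbours - tokens[:, None, :]


def edge_conv(tokens: np.ndarray, weight: np.ndarray, k: int) -> np.ndarray:
    """EdgeConv-style layer (assumed form): linear map + ReLU, then max over neighbours."""
    edges = edge_features(tokens, k)                            # (n, k, d)
    centre = np.broadcast_to(tokens[:, None, :], edges.shape)   # (n, k, d)
    feats = np.concatenate([centre, edges], axis=-1)            # (n, k, 2d)
    hidden = np.maximum(feats @ weight, 0.0)                    # (n, k, d_out)
    return hidden.max(axis=1)                                   # (n, d_out)


# Hypothetical sizes: 8 tokens of width 4, projected to 16 graph features.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))
weight = rng.normal(size=(2 * 4, 16))
graph_feats = edge_conv(tokens, weight, k=3)
print(graph_feats.shape)  # (8, 16)
```

The graph is called dynamic because the neighbour indices are recomputed from the current token features rather than fixed from the input coordinates.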

Paper Structure

This paper contains 35 sections, 6 equations, 7 figures, 11 tables.

Figures (7)

  • Figure 1: Attention maps (top view) from the last layer for an airplane point cloud. Brighter points indicate patches contributing more to the global feature. The attention of the pretrained transformer (left) is random and uneven, while that of GFT (right) shows more uniform regional pooling. Additional visualizations of different objects are included in the supplementary material.
  • Figure 2: Within the parameter budget of $(0.6M, 1.7M)$, the baselines (IDPT and DAPT) exhibited maximum performance drops of $0.97\%$ and $0.51\%$, respectively, when evaluated on the OBJ_BG dataset. Another ablation (figure `fig:params_vs_perf`) compares our method with the baselines in terms of the performance-efficiency trade-off.
  • Figure 3: Full fine-tuning (FFT) vs. IDPT (Zha et al., 2023) vs. GFT on the classification task. (a) The whole model is fine-tuned end to end, including the encoder and the task head. (b) IDPT freezes the encoder and trains a heavy graph feature extractor at the last layer. (c) With light blocks, GFT extracts graph features from the earliest tokens of the transformer and selectively injects them into the encoder.
  • Figure 4: Overall architecture of GFT. (a) GFT is the composition of learnable prompts, graph feature extraction, and cross-attention interaction. (b) EdgeConv generates graph features from K-nearest neighbouring tokens, where the edge feature is $\mathcal{E}_i=\mathcal{N}(T_i)-T_i$. (c) Cross-attention interaction blocks use the encoder features $E_i$ as query and the graph features $M$ as key/value for the attention module.
  • Figure 5: Violin plot comparing GFT and other PEFT methods across the $(0.6M, 1.7M)$ parameter range. IDPT and DAPT display higher performance instability as the parameter count changes, whereas GFT is more robust to parameter variation.
  • ...and 2 more figures
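
The cross-attention interaction described in the Figure 4 caption, with encoder features $E_i$ as query and graph features $M$ as key/value, can be sketched as below. This is an illustrative single-head version under stated assumptions, not the paper's module: the projection matrices `W_q`, `W_k`, `W_v`, the scaled dot-product form, and the residual addition back onto the encoder features are hypothetical choices made for the example.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(E, M, W_q, W_k, W_v):
    """Single-head cross-attention: encoder tokens attend to graph features.

    E: (n, d) encoder features, used as queries.
    M: (m, d) graph features, used as keys and values.
    Returns the updated encoder features (n, d) and the attention map (n, m).
    """
    Q = E @ W_q                                # (n, d_a)
    K = M @ W_k                                # (m, d_a)
    V = M @ W_v                                # (m, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])    # (n, m)
    attn = softmax(scores, axis=-1)            # each row sums to 1
    # Assumed residual connection: graph information is added to E.
    return E + attn @ V, attn


# Hypothetical sizes: 6 encoder tokens, 4 graph tokens, width 8.
rng = np.random.default_rng(0)
E = rng.normal(size=(6, 8))
M = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))
updated, attn = cross_attention(E, M, W_q, W_k, W_v)
print(updated.shape, attn.shape)  # (6, 8) (6, 4)
```

In a PEFT setting only the projection matrices of such a block would be trained while the encoder that produces `E` stays frozen; with `W_v` set to zero the block leaves `E` unchanged, which is one common way to initialize an injected module so that it starts from the pretrained behaviour.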