Robust Fine-tuning for Pre-trained 3D Point Cloud Models

Zhibo Zhang, Ximing Yang, Weizhong Zhang, Cheng Jin

TL;DR

This work tackles robustness under distribution shift in downstream fine-tuning of pre-trained 3D point cloud models. It introduces WiSE-FT-LP, which first interpolates the pre-trained and fine-tuned backbones in weight space, then freezes the interpolated backbone and trains only the head (linear probing), achieving a favorable robustness–accuracy balance. Experiments on ReCon and Point-M2AE show improved feature robustness with little or no loss in target-distribution performance, as measured by linear SVM and few-shot evaluations. The approach is simple, cost-efficient, and applicable to point cloud pre-trained models without altering their architectures.
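
The weight-space interpolation step can be written compactly. The form below follows the original WiSE-FT formulation; the symbols \theta_{\mathrm{pre}}, \theta_{\mathrm{ft}}, and \alpha are notation assumed here for illustration and may differ from the paper's own equation.

```latex
\theta_{\alpha} = (1 - \alpha)\,\theta_{\mathrm{pre}} + \alpha\,\theta_{\mathrm{ft}}, \qquad \alpha \in [0, 1]
```

Under this reading, \alpha = 0 recovers the pre-trained backbone and \alpha = 1 the fine-tuned one. Linear probing then trains only the classification head on top of the fixed backbone \theta_{\alpha}.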

Abstract

This paper presents a robust fine-tuning method for pre-trained 3D point cloud models that enhances the feature robustness of downstream fine-tuned models. We highlight the limitations of current fine-tuning methods and the challenges of learning robust models. The proposed method, named Weight-Space Ensembles for Fine-Tuning then Linear Probing (WiSE-FT-LP), combines the original pre-trained model and the fine-tuned model through weight-space integration, followed by linear probing. This approach significantly improves the performance of downstream fine-tuned models under distribution shifts, enhancing feature robustness while maintaining high performance on the target distribution. We apply this robust fine-tuning method to mainstream pre-trained 3D point cloud models and evaluate the quality of the model parameters and the degradation of downstream task performance. Experimental results demonstrate that WiSE-FT-LP balances downstream task performance against feature robustness without altering the model structures.
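
The two stages described above (weight-space integration, then linear probing) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: checkpoints are modeled as plain dictionaries of flat float lists, the "backbone" is a hypothetical one-layer tanh encoder standing in for a point cloud model such as ReCon or Point-M2AE, the head is a binary logistic-regression classifier, and all names (`interpolate_weights`, `linear_probe`, `encoder.weight`) and data values are invented for the example.

```python
import math


def interpolate_weights(theta_pre, theta_ft, alpha):
    """Stage 1: return (1 - alpha) * theta_pre + alpha * theta_ft per parameter."""
    if theta_pre.keys() != theta_ft.keys():
        raise ValueError("checkpoints must share the same parameter names")
    return {
        name: [(1.0 - alpha) * p + alpha * f
               for p, f in zip(theta_pre[name], theta_ft[name])]
        for name in theta_pre
    }


def backbone_features(theta, x, dim_out):
    """Toy encoder (assumption): a single tanh layer with row-major weights."""
    w = theta["encoder.weight"]
    dim_in = len(x)
    return [math.tanh(sum(w[j * dim_in + i] * x[i] for i in range(dim_in)))
            for j in range(dim_out)]


def linear_probe(theta, data, dim_out, lr=0.5, epochs=200):
    """Stage 2: train only a logistic-regression head; theta is never updated."""
    head_w, head_b = [0.0] * dim_out, 0.0
    # Features are computed once because the backbone stays fixed.
    feats = [(backbone_features(theta, x, dim_out), y) for x, y in data]
    for _ in range(epochs):
        for z, y in feats:
            logit = sum(hw * zi for hw, zi in zip(head_w, z)) + head_b
            grad = 1.0 / (1.0 + math.exp(-logit)) - y  # d(loss)/d(logit)
            head_w = [hw - lr * grad * zi for hw, zi in zip(head_w, z)]
            head_b -= lr * grad
    return head_w, head_b


def predict(theta, head, x, dim_out):
    """Classify x with the fixed backbone and the probed head."""
    head_w, head_b = head
    z = backbone_features(theta, x, dim_out)
    return 1 if sum(hw * zi for hw, zi in zip(head_w, z)) + head_b > 0 else 0


# Hypothetical checkpoints and toy data, for illustration only.
theta_pre = {"encoder.weight": [1.0, 0.0, 0.0, 1.0]}
theta_ft = {"encoder.weight": [2.0, 0.5, -0.5, 2.0]}
theta_mix = interpolate_weights(theta_pre, theta_ft, alpha=0.5)

data = [([1.0, 0.2], 1), ([0.8, -0.1], 1), ([-1.0, 0.1], 0), ([-0.7, -0.3], 0)]
head = linear_probe(theta_mix, data, dim_out=2)
```

In this sketch the mixing coefficient `alpha` is the single knob that trades the pre-trained weights against the fine-tuned ones. Sweeping it and re-running the probe at each value is one way to trace a robustness versus accuracy curve of the kind the figures describe.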


Paper Structure

This paper contains 23 sections, 1 equation, 5 figures, 1 table.

Figures (5)

  • Figure 1: Fine-tuning steps for WiSE-FT-LP.
  • Figure 2: Linear SVM classification results on the ModelNet40 dataset for Point-M2AE and ReCon. Three backbones fine-tuned on the ScanObjectNN dataset are compared under weight-space integration: the hardest setting (red), the setting without background (green), and the setting with background (blue).
  • Figure 3: Robust fine-tuning results for Point-M2AE on the three ScanObjectNN settings. Red, green, and blue denote the hardest setting, the setting without background, and the setting with background, respectively. Dashed lines show the original WiSE-FT; solid lines show the proposed WiSE-FT-LP.
  • Figure 4: Robust fine-tuning results for ReCon on the three ScanObjectNN settings. Red, green, and blue denote the hardest setting, the setting without background, and the setting with background, respectively. Dashed lines show the original WiSE-FT; solid lines show the proposed WiSE-FT-LP.
  • Figure 5: Few-shot classification results on the ModelNet40 dataset for ReCon. Three backbones fine-tuned on the ScanObjectNN dataset are compared under weight-space integration. Red, green, and blue denote the hardest setting, the setting without background, and the setting with background, respectively.