Triple Point Masking

Jiaming Liu, Linghe Kong, Yue Wu, Maoguo Gong, Hao Li, Qiguang Miao, Wenping Ma, Can Qin

TL;DR

This paper introduces a triple point masking scheme, named TPM, which serves as a scalable plug-and-play framework for MAE pre-training to achieve multi-mask learning for 3D point clouds.

Abstract

Existing 3D mask learning methods encounter performance bottlenecks under limited data, and our objective is to overcome this limitation. In this paper, we introduce a triple point masking scheme, named TPM, which serves as a scalable framework for pre-training masked autoencoders to achieve multi-mask learning for 3D point clouds. Specifically, we augment the baselines with two additional mask choices (i.e., a medium mask and a low mask), because our core insight is that the recovery process of an object can manifest in diverse ways. Previous high-masking schemes focus on capturing the global representation but lack fine-grained recovery capability, so the resulting pre-trained weights tend to play a limited role in the fine-tuning process. With the support of the proposed TPM, existing methods can exhibit more flexible and accurate completion capabilities, enabling the autoencoder in the pre-training stage to consider multiple representations of a single 3D object. In addition, an SVM-guided weight selection module is proposed to initialize the encoder parameters of downstream networks with the optimal weights during the fine-tuning stage, maximizing linear accuracy and facilitating the acquisition of intricate representations for new objects. Extensive experiments show that four baselines equipped with the proposed TPM achieve comprehensive performance improvements on various downstream tasks. Our code and models are available at https://github.com/liujia99/TPM.
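The core mechanism in the abstract, masking the same point cloud at three different ratios, can be illustrated with a minimal NumPy sketch. The ratio values, the patch count of 64, and the helper names `random_patch_mask` and `triple_masks` are all hypothetical; the abstract only states that a medium and a low mask are added to the baseline's high mask ($m_0$, with $m_1$ and $m_2$ as the additional masks in Figure 1).

```python
import numpy as np

# Hypothetical ratios: m0 is the baseline's high mask, m1 and m2 are the
# added medium and low masks. The exact values are assumptions, not TPM's.
MASK_RATIOS = {"m0": 0.6, "m1": 0.4, "m2": 0.2}


def random_patch_mask(num_patches: int, ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Boolean mask over point patches; True marks a patch hidden from the encoder."""
    num_masked = int(round(num_patches * ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask


def triple_masks(num_patches: int, seed: int = 0) -> dict:
    """Draw one random mask per ratio over the same set of patches."""
    rng = np.random.default_rng(seed)
    return {name: random_patch_mask(num_patches, ratio, rng) for name, ratio in MASK_RATIOS.items()}


if __name__ == "__main__":
    # 64 patches per point cloud is an illustrative assumption.
    for name, mask in triple_masks(num_patches=64).items():
        print(f"{name}: {int(mask.sum())}/{mask.size} patches masked")
```

Each of the three masked views would then pass through the same shared-weight autoencoder and be reconstructed against the same unmasked input, as Figure 3 describes; the sketch covers only the mask generation step.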

Paper Structure (16 sections, 6 equations, 6 figures, 8 tables)

Figures (6)

  • Figure 1: Illustration of TPM. Given additional masks $m_1$ and $m_2$, multi-mask completion is performed under the supervision of the same input (i.e., the ground truth) during pre-training. The resulting optimal weight ($w_0$, $w_1$, or $w_2$) is used to initialize the specific encoder, providing discriminative priors for downstream tasks such as classification and segmentation.
  • Figure 2: Comparison of the training loss (left) and inference accuracy (right) of the original and the proposed $w_0 \mapsto m_0$ during fine-tuning. Results on ScanObjectNN (PB_T50_RS) uy2019revisiting are reported.
  • Figure 3: Overall pipeline of our TPM. Given triple-masked point clouds, we build on a pre-training framework (e.g., Point-MAE pang2022masked) and apply an autoencoder with shared weights to each of the masked inputs. The autoencoder learns the recovery process under all three masks and records the respective optimal pre-trained models. Because all three branches are supervised by the same objective, triple mask learning shapes their respective weights, which the weight selection operation then chooses among during the fine-tuning phase.
  • Figure 4: Completion visualization of the baseline with and without TPM on the ShapeNet chang2015shapenet dataset. With TPM, completion focuses more on fine details.
  • Figure 5: Comparison of SVM classification during pre-training and MLP prediction during fine-tuning under different masks.
  • ...and 1 more figure
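The abstract and Figures 1 and 5 describe choosing among the recorded weights $w_0$, $w_1$, $w_2$ by linear SVM accuracy. The sketch below illustrates that selection rule under several assumptions: each candidate is exposed as a frozen feature extractor (a plain callable here), the SVM is a small hand-rolled NumPy one-vs-rest linear SVM trained by subgradient descent rather than any particular library, and all function names and the toy data are hypothetical.

```python
from typing import Callable, Dict, Tuple

import numpy as np

# A candidate maps raw inputs to frozen global features. In practice this would
# be the pre-trained encoder loaded with w0, w1 or w2; here it is any callable.
FeatureFn = Callable[[np.ndarray], np.ndarray]


def train_linear_svm(x: np.ndarray, y: np.ndarray, num_classes: int,
                     lam: float = 1e-3, lr: float = 0.1, epochs: int = 200):
    """One-vs-rest linear SVM: full-batch subgradient descent on L2-regularized hinge loss."""
    n, d = x.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    # +1 for the sample's own class, -1 for every other class.
    targets = np.where(y[:, None] == np.arange(num_classes)[None, :], 1.0, -1.0)
    for _ in range(epochs):
        margins = targets * (x @ w + b)
        active = (margins < 1.0).astype(float) * targets  # margin violators only
        w -= lr * (lam * w - x.T @ active / n)
        b -= lr * (-active.sum(axis=0) / n)
    return w, b


def svm_accuracy(feat_fn: FeatureFn, train, test, num_classes: int) -> float:
    """Fit the SVM on frozen training features and report test accuracy."""
    (x_tr, y_tr), (x_te, y_te) = train, test
    w, b = train_linear_svm(feat_fn(x_tr), y_tr, num_classes)
    pred = np.argmax(feat_fn(x_te) @ w + b, axis=1)
    return float(np.mean(pred == y_te))


def select_weight(candidates: Dict[str, FeatureFn], train, test,
                  num_classes: int) -> Tuple[str, Dict[str, float]]:
    """Return the candidate with the highest linear SVM accuracy, plus all scores."""
    scores = {name: svm_accuracy(fn, train, test, num_classes) for name, fn in candidates.items()}
    return max(scores, key=scores.get), scores


def make_toy_split(n_per_class: int, rng: np.random.Generator):
    """Two Gaussian blobs separated along the first axis (toy stand-in data)."""
    x0 = rng.normal(0.0, 0.3, size=(n_per_class, 2)) + np.array([-2.0, 0.0])
    x1 = rng.normal(0.0, 0.3, size=(n_per_class, 2)) + np.array([2.0, 0.0])
    y = np.concatenate([np.zeros(n_per_class, dtype=int), np.ones(n_per_class, dtype=int)])
    return np.vstack([x0, x1]), y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train, test = make_toy_split(50, rng), make_toy_split(50, rng)
    # Mock "encoders": informative, uninformative, and constant features.
    candidates = {
        "w0": lambda x: x,
        "w1": lambda x: x[:, 1:2],
        "w2": lambda x: np.zeros((len(x), 2)),
    }
    best, scores = select_weight(candidates, train, test, num_classes=2)
    print(scores, "->", best)
```

The mock candidates only demonstrate the selection logic: the one whose features are linearly separable wins. In a real pipeline, each callable would wrap the encoder loaded with one recorded checkpoint, and the winning weights would initialize the downstream network.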