LCM: Locally Constrained Compact Point Cloud Model for Masked Point Modeling

Yaohua Zha, Naiqi Li, Yanzi Wang, Tao Dai, Hang Guo, Bin Chen, Zhi Wang, Zhihao Ouyang, Shu-Tao Xia

TL;DR

We propose a Locally constrained Compact point cloud Model (LCM), consisting of a locally constrained compact encoder and a locally constrained Mamba-based decoder, that significantly surpasses existing Transformer-based models in both performance and efficiency.

Abstract

Pre-trained point cloud models based on Masked Point Modeling (MPM) have shown substantial improvements across various tasks. However, these models rely heavily on the Transformer, which leads to quadratic complexity and a limited decoder and hinders their practical application. To address this limitation, we first conduct a comprehensive analysis of existing Transformer-based MPM, emphasizing that redundancy reduction is crucial for point cloud analysis. To this end, we propose a Locally constrained Compact point cloud Model (LCM) consisting of a locally constrained compact encoder and a locally constrained Mamba-based decoder. Our encoder replaces self-attention with local aggregation layers to balance performance and efficiency. Considering the varying information density between masked and unmasked patches in the decoder inputs of MPM, we introduce a locally constrained Mamba-based decoder. This decoder ensures linear complexity while maximizing the perception of point cloud geometry information from unmasked patches, which have higher information density. Extensive experimental results show that our compact model significantly surpasses existing Transformer-based models in both performance and efficiency. In particular, compared to its Transformer-based counterpart, our LCM-based Point-MAE model improves average accuracy by 1.84%, 0.67%, and 0.60% on the three variants of ScanObjectNN while reducing parameters by 88% and computation by 73%. Code is available at https://github.com/zyh16143998882/LCM.
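
To make the idea of replacing self-attention with local aggregation concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each patch is summarized by a 3D center and a feature vector, gathers the k nearest centers in geometric space, projects each neighbor with a shared linear map followed by a ReLU, and max-pools over the neighborhood. The names `knn_indices`, `local_aggregate`, and `weight`, as well as the choice to concatenate relative and center features, are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


def knn_indices(centers, k):
    """Indices of the k nearest patch centers for every center (self included)."""
    # Pairwise squared Euclidean distances between centers, shape (N, N).
    diff = centers[:, None, :] - centers[None, :, :]
    dist2 = np.sum(diff * diff, axis=-1)
    # Sort each row by distance and keep the k closest columns.
    return np.argsort(dist2, axis=1)[:, :k]


def local_aggregate(features, centers, k, weight):
    """Hypothetical local aggregation layer: gather, project, max-pool.

    features: (N, C) patch features
    centers:  (N, 3) patch center coordinates
    weight:   (2 * C, C_out) shared projection (an assumption of this sketch)
    """
    idx = knn_indices(centers, k)                # (N, k)
    neighbors = features[idx]                    # (N, k, C)
    # Features of each neighbor relative to the patch being updated.
    relative = neighbors - features[:, None, :]  # (N, k, C)
    center = np.broadcast_to(features[:, None, :], relative.shape)
    combined = np.concatenate([relative, center], axis=-1)  # (N, k, 2C)
    # Shared linear projection followed by ReLU.
    projected = np.maximum(combined @ weight, 0.0)          # (N, k, C_out)
    # Max-pool over the k neighbors to get one vector per patch.
    return projected.max(axis=1)                            # (N, C_out)


rng = np.random.default_rng(0)
centers = rng.normal(size=(16, 3))
features = rng.normal(size=(16, 8))
weight = rng.normal(size=(16, 4))
output = local_aggregate(features, centers, k=4, weight=weight)
print(output.shape)
```

With k fixed, the aggregation step touches N × k neighbor features, whereas full self-attention compares all N × N patch pairs. The brute-force neighbor search in this sketch is itself quadratic in N; it depends only on the coordinates, so it needs to be computed just once per point cloud.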

Paper Structure

This paper contains 31 sections, 1 theorem, 11 equations, 17 figures, and 2 tables.

Key Result

Theorem 1

Let $Y_1^M,Y_2^M$ and $Y_1^T,Y_2^T$ denote the outputs of the Mamba-based and Transformer-based decoders respectively, $I(Y_2^M; X_1, X_2)$ denote the mutual information preserved by the Mamba-based decoder, and $I(Y_2^T; X_1, X_2)$ denote that of the Transformer-based decoder. We have $I(Y_2^M; X_1

Figures (17)

  • Figure 1: Comparison of our LCM and Transformer in terms of performance and efficiency.
  • Figure 2: The effect of the Transformer using top-K attention in feature space and in geometric space on classification performance on ScanObjectNN. All results are averages over ten repeated experiments.
  • Figure 3: Point heatmap.
  • Figure 4: The pipeline of our Locally Constrained Compact Model (LCM) with Point-MAE pre-training. Our LCM consists of a locally constrained compact encoder and a locally constrained Mamba-based decoder.
  • Figure 5: The structure of (a) the $i$-th locally constrained compact encoder layer and (b) the $i$-th locally constrained Mamba-based decoder layer.
  • ...and 12 more figures

Theorems & Definitions (2)

  • Theorem 1
  • proof