GeoMask3D: Geometrically Informed Mask Selection for Self-Supervised Point Cloud Learning in 3D

Ali Bahri; Moslem Yazdanpanah; Mehrdad Noori; Milad Cheraghalikhani; Gustavo Adolfo Vargas Hakim; David Osowiechi; Farzad Beizaee; Ismail Ben Ayed; Christian Desrosiers

TL;DR

GeoMask3D (GM3D) addresses the inefficiency of random masking in self-supervised point-cloud learning by introducing geometry-guided patch masking within a teacher-student framework. It predicts the geometric complexity (GC) of each patch, uses a curriculum to progressively mask high-complexity regions, and adds a knowledge-distillation pathway that aligns student features with those of a frozen teacher. The approach improves the representations learned by Point-MAE and Point-M2AE, yielding stronger performance on ModelNet40, ScanObjectNN, and ShapeNetPart while also accelerating pretraining convergence. GM3D relies solely on 3D coordinates and geometric cues, so it requires no auxiliary modalities.

Abstract

We introduce a pioneering approach to self-supervised learning for point clouds, employing a geometrically informed mask selection strategy called GeoMask3D (GM3D) to boost the efficiency of Masked Autoencoders (MAE). Unlike the conventional method of random masking, our technique uses a teacher-student model to direct the model's attention toward regions with higher geometric complexity. This strategy is grounded in the hypothesis that concentrating on harder patches yields a more robust feature representation, as evidenced by the improved performance on downstream tasks. Our method also presents a complete-to-partial feature-level knowledge distillation technique that guides the prediction of geometric complexity using the comprehensive context available in feature-level information. Extensive experiments confirm our method's superiority over state-of-the-art (SOTA) baselines, demonstrating marked improvements in classification and few-shot tasks.
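The mask selection idea described above can be illustrated with a minimal sketch. It assumes that per-patch GC scores are already available and that the curriculum is a linear ramp between random and GC-guided selection; the function names, the `guided_fraction` parameter, and the linear schedule are hypothetical choices for illustration and are not taken from the paper.

```python
import random


def select_mask(gc_scores, mask_ratio, guided_fraction, rng):
    """Return sorted indices of the patches to mask.

    gc_scores: predicted geometric complexity per patch (higher = harder).
    mask_ratio: fraction of all patches to mask.
    guided_fraction: share of masked patches picked by highest GC; the
        remainder is drawn uniformly at random (assumed curriculum knob).
    """
    n = len(gc_scores)
    n_mask = round(n * mask_ratio)
    n_guided = round(n_mask * guided_fraction)
    # Rank patch indices from highest to lowest predicted complexity.
    ranked = sorted(range(n), key=lambda i: gc_scores[i], reverse=True)
    guided = ranked[:n_guided]
    # Fill the remaining mask slots at random from the unselected patches.
    random_part = rng.sample(ranked[n_guided:], n_mask - n_guided)
    return sorted(guided + random_part)


def guided_fraction_at(epoch, total_epochs):
    """Hypothetical linear curriculum: 0.0 at the first epoch, 1.0 at the last."""
    return min(1.0, epoch / max(1, total_epochs - 1))


rng = random.Random(0)
scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.4, 0.6]
print(select_mask(scores, mask_ratio=0.5, guided_fraction=1.0, rng=rng))
# prints [1, 3, 5, 7]
```

With `guided_fraction=0.0` the selection reduces to plain random masking, which is the baseline the abstract contrasts against; raising it over training shifts the mask toward the hardest patches.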

Paper Structure (22 sections, 9 equations, 10 figures, 11 tables, 1 algorithm)

Introduction
Related Works
Method
Preliminaries
GeoMask3D
Prediction of Geometric Complexity (GC)
Geometric-Guided Masking
Curriculum Mask Selection
Knowledge-Distillation-Guided GC
Experiments
Pretraining Setup
Downstream Tasks
Additional Visualization
Additional Analyses
Ablation Study
...and 7 more sections

Figures (10)

Figure 1: A relative comparison of SOTA point cloud MAE methods on different tasks. The center and the outer circle represent the lowest and highest values on each task, respectively.
Figure 2: Visualization of the estimated GC over the course of training. The color spectrum denotes GC, ranging from low (blue) to high (red). GC values are normalized per object to reflect relative complexity across patches within each object's point cloud. As training progresses (from left to right), the initial GC rankings are randomly distributed (a). After 100 epochs, the model learns to assign lower complexity rankings to smooth areas (b) and higher rankings to complex regions (c). Through GC-guided masking, the model increasingly focuses on complex areas from epochs 200 to 300, resulting in a reduction of GC ranking (d) and a smoothing of the complexity ranking distribution, accompanied by a decrease in the total complexity loss $\mathcal{L}^{GC}$ (e). Eventually, the model converges to a low $\mathcal{L}^{GC}$ value, consistently targeting canonical patches while maintaining a smoother GC distribution (f).
Figure 3: Overview of the GeoMask3D method for self-supervised representation learning in point clouds. The Teacher network predicts the geometric complexity (GC), and patches with the highest GC, denoted by $N^{\mathit{sel}}$, are selected for masking. The Student network is then trained to reconstruct these masked tokens while simultaneously learning GC through the loss $\mathcal{L}^{GC}$. The reconstruction loss is defined as $\mathcal{L}^{\mathit{rec}} = \mathcal{L}^{\mathit{rec}_p} + \mathcal{L}^{\mathit{rec}_f}$. The Teacher network's weights are updated using the Exponential Moving Average (EMA) of the Student's weights, while the Knowledge Teacher remains frozen and is used to generate the encoder features required for training the Student with $\mathcal{L}^{\mathit{rec}_f}$.
Figure 4: Visualization of GC values on diverse point clouds from the ShapeNet dataset (Chang et al., 2015).
Figure 5: Comparison of convergence speed during the training phase (Point-MAE).
...and 5 more figures
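
The EMA update of the Teacher mentioned in the Figure 3 caption can be sketched as follows. This is an illustrative example only: weights are represented as plain dictionaries of float lists, and the momentum value of 0.99 is an assumed placeholder, since the value used by the authors is not stated here.

```python
def ema_update(teacher, student, momentum=0.99):
    """Update teacher weights in place: t <- m * t + (1 - m) * s.

    teacher, student: dicts mapping parameter names to lists of floats
        (a stand-in for real network parameters; an assumption for this sketch).
    momentum: EMA coefficient m (assumed value, not from the paper).
    """
    for name, student_values in student.items():
        teacher_values = teacher[name]
        for i, s in enumerate(student_values):
            teacher_values[i] = momentum * teacher_values[i] + (1.0 - momentum) * s
    return teacher


teacher = {"w": [0.0, 1.0]}
student = {"w": [1.0, 1.0]}
ema_update(teacher, student, momentum=0.9)
# teacher["w"] is now approximately [0.1, 1.0]
```

Because the Teacher is a slowly moving average of the Student, its GC predictions change more smoothly than the Student's, which is the usual motivation for EMA teachers in teacher-student setups.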
