HFBRI-MAE: Handcrafted Feature Based Rotation-Invariant Masked Autoencoder for 3D Point Cloud Analysis

Xuanhua Yin, Dingxin Zhang, Jianhui Yu, Weidong Cai

TL;DR

HFBRI-MAE tackles the challenge of rotation variance in self-supervised learning for 3D point clouds by introducing rotation-invariant handcrafted features (RIHF) and training with an aligned reconstruction target. It combines Rotation-Invariant Local Features (RILF) and Rotation-Invariant Global Features (RIGF) into a patch-based MAE framework, using a transformer encoder-decoder to learn rotation-stable representations. Empirical results on ModelNet40, ScanObjectNN, ShapeNetPart, and few-shot tasks show state-of-the-art performance under diverse rotation settings, demonstrating strong robustness and generalization to real-world 3D data. The work offers a practical path to rotation-invariant SSL for 3D perception with potential impact on robotics, autonomous systems, and augmented reality.

Abstract

Self-supervised learning (SSL) has demonstrated remarkable success in 3D point cloud analysis, particularly through masked autoencoders (MAEs). However, existing MAE-based methods lack rotation invariance, leading to significant performance degradation when processing arbitrarily rotated point clouds in real-world scenarios. To address this limitation, we introduce Handcrafted Feature-Based Rotation-Invariant Masked Autoencoder (HFBRI-MAE), a novel framework that refines the MAE design with rotation-invariant handcrafted features to ensure stable feature learning across different orientations. By leveraging both rotation-invariant local and global features for token embedding and position embedding, HFBRI-MAE effectively eliminates rotational dependencies while preserving rich geometric structures. Additionally, we redefine the reconstruction target to a canonically aligned version of the input, mitigating rotational ambiguities. Extensive experiments on ModelNet40, ScanObjectNN, and ShapeNetPart demonstrate that HFBRI-MAE consistently outperforms existing methods in object classification, segmentation, and few-shot learning, highlighting its robustness and strong generalization ability in real-world 3D applications.
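
To make the idea of rotation-invariant handcrafted features concrete, the following minimal sketch checks numerically that distances and angles measured relative to a reference point do not change when the whole point cloud is rotated, while the raw coordinates do. The function `toy_features`, the helper names, and the sample points are illustrative assumptions; this is not the paper's RILF or RIGF definition, only a demonstration of the property such features rely on.

```python
import math

def rotate_z_then_x(p, a, b):
    """Rotate point p by angle a about the z-axis, then by angle b about the x-axis."""
    x, y, z = p
    x, y = x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a)
    y, z = y * math.cos(b) - z * math.sin(b), y * math.sin(b) + z * math.cos(b)
    return (x, y, z)

def sub(u, v):
    return tuple(ui - vi for ui, vi in zip(u, v))

def norm(u):
    return math.sqrt(sum(c * c for c in u))

def angle(u, v):
    """Angle in radians between vectors u and v (cosine clamped for safety)."""
    cos = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, cos)))

def toy_features(ref, neighbors):
    """Hypothetical features: distances from ref to each neighbor,
    followed by angles between consecutive neighbor offsets."""
    offsets = [sub(q, ref) for q in neighbors]
    dists = [norm(o) for o in offsets]
    angles = [angle(offsets[i], offsets[i + 1]) for i in range(len(offsets) - 1)]
    return dists + angles

# Arbitrary sample points; the first one serves as the reference point.
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
          (1.0, 1.0, 0.0), (0.5, -0.2, 0.8)]
rotated = [rotate_z_then_x(p, 0.7, -1.3) for p in points]

before = toy_features(points[0], points[1:])
after = toy_features(rotated[0], rotated[1:])

# Largest change in any feature (floating-point noise only).
max_diff = max(abs(a - b) for a, b in zip(before, after))
# Largest displacement of any raw coordinate triple under the rotation.
coord_shift = max(norm(sub(p, q)) for p, q in zip(points, rotated))
```

Feeding a network such quantities instead of raw coordinates is what lets the embeddings stay stable across orientations; the aligned reconstruction target addresses the remaining ambiguity on the output side.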

Paper Structure

This paper contains 16 sections, 9 equations, 9 figures, 13 tables.

Figures (9)

  • Figure 1: Comparison between standard MAE and HFBRI-MAE under rotated point clouds. HFBRI-MAE uses handcrafted rotation-invariant features and aligned reconstruction targets to achieve consistent feature learning across rotations.
  • Figure 2: Architecture of the proposed HFBRI-MAE framework. The input point cloud is divided into patches using FPS and KNN. RILF and RIGF are extracted to form token and position embeddings, which are processed by the encoder. The decoder reconstructs masked patches using aligned point cloud coordinates, facilitating self-supervised learning and downstream tasks.
  • Figure 3: Visualization of distance features ($d_{pxi}$) and angle relationships with reference point ($\alpha_0, \alpha_1, \alpha_2$) in RILF.
  • Figure 4: Visualization of inter-neighbor angle relationship ($\phi, \beta_0, \beta_1, \beta_2$) in RILF.
  • Figure 5: Visualization of the RIGF ($d_{p}, d_{pm}, d_{sm}, \alpha, \beta$) construction process using the neighborhood ball, which is centered at the reference point $p$ with a radius $r$ defined by the distance to the farthest neighboring point.
  • ...and 4 more figures
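
The patch division mentioned in the Figure 2 caption (FPS to choose patch centers, KNN to gather each center's neighbors) can be sketched in a few lines. This is a generic, unoptimized version of the two standard procedures, assuming a fixed starting index and a toy grid as input; the function names and parameters are illustrative and are not taken from the authors' code.

```python
import math

def farthest_point_sampling(points, num_centers):
    """Greedy FPS: repeatedly pick the point farthest from all centers chosen so far."""
    chosen = [0]  # assumption: start at index 0; implementations often start at random
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < num_centers:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        # Track each point's distance to its nearest chosen center.
        min_dist = [min(d, math.dist(p, points[nxt]))
                    for d, p in zip(min_dist, points)]
    return chosen

def knn_patches(points, center_ids, k):
    """For each center, return the indices of its k nearest points (center included)."""
    patches = []
    for c in center_ids:
        order = sorted(range(len(points)),
                       key=lambda i: math.dist(points[i], points[c]))
        patches.append(order[:k])
    return patches

# Toy input: a 4 x 4 planar grid standing in for a point cloud.
points = [(float(x), float(y), 0.0) for x in range(4) for y in range(4)]
centers = farthest_point_sampling(points, num_centers=4)
patches = knn_patches(points, centers, k=4)
```

On this grid FPS selects the four corners, so the patches cover the shape evenly. In the framework described by the caption, features would then be computed per patch before masking and encoding.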