HFBRI-MAE: Handcrafted Feature Based Rotation-Invariant Masked Autoencoder for 3D Point Cloud Analysis
Xuanhua Yin, Dingxin Zhang, Jianhui Yu, Weidong Cai
TL;DR
HFBRI-MAE addresses the lack of rotation invariance in self-supervised learning for 3D point clouds by building on rotation-invariant handcrafted features (RIHF) and training against a canonically aligned reconstruction target. It integrates Rotation-Invariant Local Features (RILF) and Rotation-Invariant Global Features (RIGF) into a patch-based MAE framework, using a transformer encoder-decoder to learn rotation-stable representations. Experiments on ModelNet40, ScanObjectNN, and ShapeNetPart, covering classification, segmentation, and few-shot learning, report state-of-the-art performance under diverse rotation settings and indicate robustness and generalization to real-world 3D data. The work offers a practical path to rotation-invariant SSL for 3D perception, with potential applications in robotics, autonomous systems, and augmented reality.
Abstract
Self-supervised learning (SSL) has demonstrated remarkable success in 3D point cloud analysis, particularly through masked autoencoders (MAEs). However, existing MAE-based methods lack rotation invariance, leading to significant performance degradation when processing arbitrarily rotated point clouds in real-world scenarios. To address this limitation, we introduce Handcrafted Feature-Based Rotation-Invariant Masked Autoencoder (HFBRI-MAE), a novel framework that refines the MAE design with rotation-invariant handcrafted features to ensure stable feature learning across different orientations. By leveraging both rotation-invariant local and global features for token embedding and position embedding, HFBRI-MAE effectively eliminates rotational dependencies while preserving rich geometric structures. Additionally, we redefine the reconstruction target to a canonically aligned version of the input, mitigating rotational ambiguities. Extensive experiments on ModelNet40, ScanObjectNN, and ShapeNetPart demonstrate that HFBRI-MAE consistently outperforms existing methods in object classification, segmentation, and few-shot learning, highlighting its robustness and strong generalization ability in real-world 3D applications.
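The two ideas in the abstract, rotation-invariant handcrafted features and a canonically aligned reconstruction target, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the abstract does not specify how RILF, RIGF, or the alignment are computed, so the descriptor below (distances and one angle) and the PCA-based alignment are assumptions chosen for illustration, and the names `invariant_descriptor`, `canonical_align`, and `random_rotation` are hypothetical.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random proper rotation matrix (det = +1) from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # fix column signs so the factorization is unique
    if np.linalg.det(q) < 0:         # turn a reflection into a rotation
        q[:, 0] = -q[:, 0]
    return q

def invariant_descriptor(patch, centroid):
    """Hypothetical handcrafted descriptor built only from lengths and angles.

    Assumption: this is NOT the paper's RILF/RIGF definition, only an example
    of quantities that do not change when the whole cloud is rotated.
    """
    center = patch.mean(axis=0)
    to_point = patch - center                    # patch center -> each point
    to_centroid = centroid - center              # patch center -> cloud centroid
    local_d = np.linalg.norm(to_point, axis=1)
    global_d = np.linalg.norm(patch - centroid, axis=1)
    cos = to_point @ to_centroid / (
        local_d * np.linalg.norm(to_centroid) + 1e-12
    )
    return np.stack([local_d, global_d, cos], axis=1)

def canonical_align(points):
    """Assumed PCA-based alignment: express the cloud in its principal axes."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    _, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    axes = eigvecs[:, ::-1].copy()               # columns sorted by descending variance
    for i in range(2):
        # Resolve the sign ambiguity of each axis with the third moment.
        if np.sum((centered @ axes[:, i]) ** 3) < 0:
            axes[:, i] = -axes[:, i]
    axes[:, 2] = np.cross(axes[:, 0], axes[:, 1])  # keep a right-handed frame
    return centered @ axes

rng = np.random.default_rng(0)
# Skewed, anisotropic toy cloud so principal axes and their signs are well defined.
cloud = rng.exponential(size=(256, 3)) * np.array([3.0, 2.0, 1.0])
R = random_rotation(rng)
rotated = cloud @ R.T

patch_idx = np.arange(32)                        # toy "patch": the first 32 points
desc_a = invariant_descriptor(cloud[patch_idx], cloud.mean(axis=0))
desc_b = invariant_descriptor(rotated[patch_idx], rotated.mean(axis=0))

feat_ok = np.allclose(desc_a, desc_b)
align_ok = np.allclose(canonical_align(cloud), canonical_align(rotated), atol=1e-8)
print("descriptor unchanged by rotation:", feat_ok)
print("aligned target unchanged by rotation:", align_ok)
```

The sketch checks both properties on a toy cloud: the descriptor is the same before and after a random rotation, and the aligned cloud, which would serve as the reconstruction target, is the same for both inputs. PCA alignment becomes unstable for near-symmetric shapes, where eigenvalues or third moments are close to degenerate, so a practical pipeline would need a more careful disambiguation rule.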
