Representation Learning for Point Cloud Understanding
Siming Yan
TL;DR
This work surveys and advances representation learning for 3D point clouds by integrating supervised primitive segmentation, self-supervised learning, and 2D-to-3D transfer. It introduces HPNet, a hybrid-representation network for primitive segmentation that fuses semantic and spectral cues with adaptive weighting and mean-shift clustering. It then proposes an asymmetric Implicit AutoEncoder (IAE) to address sampling variations in self-supervised learning, and a masked 3D feature prediction approach (MaskFeat3D) that emphasizes recovering high-order point features rather than point positions. Finally, it demonstrates a transfer-learning framework (MVNet) that leverages pre-trained 2D models via multi-view projection and cross-view consistency to boost 3D understanding. Across extensive experiments on benchmarks such as ModelNet40, ScanObjectNN, ShapeNetPart, ScanNet, and SUN RGB-D, the methods show robust gains in classification, detection, and segmentation, highlighting practical benefits for 3D scene understanding and autonomous systems.
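To make the clustering step mentioned above concrete, the following is a minimal sketch of Gaussian-kernel mean-shift clustering applied to per-point feature vectors. It is a generic illustration, not HPNet's implementation: the function name `mean_shift`, the parameters `bandwidth`, `n_iters`, and `merge_tol`, and the toy 2D features are all assumptions made for this example.

```python
import numpy as np

def mean_shift(features, bandwidth=0.5, n_iters=50, merge_tol=None):
    """Cluster row vectors with Gaussian-kernel mean shift; returns integer labels.

    Hypothetical helper for illustration only (not taken from the source).
    """
    x = np.asarray(features, dtype=float)
    modes = x.copy()
    for _ in range(n_iters):
        # Squared distance from every current mode to every original point.
        d2 = ((modes[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
        # Gaussian kernel weights; each mode moves to the weighted mean.
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        modes = (w @ x) / w.sum(axis=1, keepdims=True)

    # Modes that ended up close to each other are treated as one cluster.
    if merge_tol is None:
        merge_tol = bandwidth / 2.0
    labels = np.full(len(x), -1, dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < merge_tol:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels

# Toy per-point features forming two well-separated groups.
toy_features = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
])
toy_labels = mean_shift(toy_features, bandwidth=0.5)
print(toy_labels)
```

In a segmentation setting, the rows of `features` would be learned per-point descriptors rather than raw coordinates, and the bandwidth controls how coarsely points are grouped into segments.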
Abstract
As sensing technology has advanced, the acquisition and use of 3D data have become increasingly prevalent across fields such as computer vision, robotics, and geospatial analysis. 3D data, captured by devices such as 3D scanners, LiDARs, and RGB-D cameras, provides rich information about geometry, shape, and scale. When combined with 2D images, 3D data offers machines a comprehensive understanding of their environment, benefiting applications like autonomous driving, robotics, remote sensing, and medical treatment. This dissertation focuses on three main areas: supervised representation learning for point cloud primitive segmentation, self-supervised learning methods, and transfer learning from 2D to 3D. Our transfer learning approach integrates pre-trained 2D models to support 3D network training and significantly improves 3D understanding without merely transforming 2D data. Extensive experiments validate the effectiveness of our methods, showcasing their potential to advance point cloud representation learning by effectively integrating 2D knowledge.
