Point-PNG: Conditional Pseudo-Negatives Generation for Point Cloud Pre-Training
Sutharsan Mahendren, Saimunur Rahman, Piotr Koniusz, Tharindu Fernando, Sridha Sridharan, Clinton Fookes, Peyman Moghadam
TL;DR
Point-PNG tackles the invariant-collapse issue in transformation-aware self-supervised learning for point clouds by introducing conditional pseudo-negatives generated by a COnditional Pseudo-Negatives Embedding (COPE) predictor. The framework jointly trains COPE with a transformer-based masked autoencoder (MAE) backbone, using a loss that combines alignment, pseudo-negative, and uniformity terms to preserve transformation sensitivity while maintaining discriminative power. Empirically, it achieves competitive or superior shape classification on ModelNet40 and ScanObjectNN, and it outperforms supervised baselines on relative pose estimation, including under challenging rotations. The approach is also parameter-efficient and extends to an image-domain evaluation on 3DIEBench, highlighting its broader applicability to transformation-aware representation learning.
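The summary names three loss components without giving their form. As a rough illustration only, the sketch below assumes (i) an alignment term equal to the mean squared distance between L2-normalized predicted and target embeddings, (ii) the hypersphere uniformity term of Wang and Isola (log of the mean Gaussian potential over distinct pairs), and (iii) a hinge that penalizes a prediction lying within a margin of its pseudo-negative. The hinge form, `margin`, the weights, and all function names are hypothetical choices, not Point-PNG's published loss.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    # Project embeddings onto the unit hypersphere.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def alignment_loss(pred, target):
    # Mean squared distance between normalized predicted and target embeddings.
    pred, target = l2_normalize(pred), l2_normalize(target)
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))


def uniformity_loss(z, t=2.0):
    # Log of the mean Gaussian potential over all distinct pairs (lower = more spread out).
    z = l2_normalize(z)
    sq = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(len(z), k=1)
    return float(np.log(np.mean(np.exp(-t * sq[iu]))))


def pseudo_negative_loss(pred, pseudo_neg, margin=0.5):
    # Hypothetical hinge: penalize predictions whose squared distance to the
    # pseudo-negative is smaller than `margin`.
    pred, pseudo_neg = l2_normalize(pred), l2_normalize(pseudo_neg)
    d = np.sum((pred - pseudo_neg) ** 2, axis=1)
    return float(np.mean(np.maximum(0.0, margin - d)))


def total_loss(pred, target, pseudo_neg, w_align=1.0, w_neg=1.0, w_unif=1.0):
    # Weighted sum of the three assumed terms; weights are illustrative.
    return (w_align * alignment_loss(pred, target)
            + w_neg * pseudo_negative_loss(pred, pseudo_neg)
            + w_unif * uniformity_loss(pred))
```

If a predictor's output coincides with its pseudo-negative, the hinge term sits at its maximum value `margin`, which is the kind of trivial solution a pseudo-negative term is meant to discourage.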
Abstract
We propose Point-PNG, a novel self-supervised learning framework that generates conditional pseudo-negatives in the latent space to learn point cloud representations that are both discriminative and transformation-sensitive. Conventional self-supervised learning methods focus on achieving invariance, discarding transformation-specific information. Recent approaches incorporate transformation sensitivity by explicitly modeling relationships between original and transformed inputs. However, they often suffer from an invariant-collapse phenomenon, in which the predictor degenerates into an identity mapping, resulting in latent representations with limited variation across transformations. Point-PNG addresses this by explicitly penalizing invariant collapse through pseudo-negative generation, enabling the network to capture richer transformation cues while preserving discriminative representations. To this end, we introduce a parametric network, COnditional Pseudo-Negatives Embedding (COPE), which learns localized displacements induced by transformations within the latent space. A key challenge arises when jointly training COPE with the masked autoencoder (MAE), as COPE tends to converge to a trivial identity mapping. To overcome this, we design a loss function based on pseudo-negatives conditioned on the transformation, which penalizes such trivial invariant solutions and enforces meaningful representation learning. We validate Point-PNG on shape classification and relative pose estimation tasks, showing competitive performance on ModelNet40 and ScanObjectNN under challenging evaluation protocols and achieving superior accuracy in relative pose estimation compared to supervised baselines.
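The abstract describes COPE as learning localized displacements that transformations induce in the latent space. One way to picture this, assuming a residual MLP conditioned on a rotation angle, is sketched below. The class `ConditionalDisplacement`, the (sin, cos) angle encoding, the layer sizes, and the shifted-angle rule in `pseudo_negative` are all hypothetical and not taken from the paper. The helper `variation_across_transforms` measures how much predictions change across transformations; it drops to zero when the displacement collapses to an identity mapping, which is the invariant-collapse failure mode described above.

```python
import numpy as np


class ConditionalDisplacement:
    """Hypothetical COPE-style module: z_hat = z + MLP([z, sin(a), cos(a)]).

    Architecture, sizes, and the angle encoding are illustrative assumptions.
    """

    def __init__(self, dim, hidden=32, seed=0, scale=0.1):
        rng = np.random.default_rng(seed)
        in_dim = dim + 2  # latent plus (sin, cos) of a rotation angle
        self.w1 = rng.normal(scale=scale, size=(in_dim, hidden))
        self.w2 = rng.normal(scale=scale, size=(hidden, dim))

    def __call__(self, z, angle):
        # z: (batch, dim) latents; angle: (batch,) rotation angles in radians.
        cond = np.stack([np.sin(angle), np.cos(angle)], axis=1)
        h = np.tanh(np.concatenate([z, cond], axis=1) @ self.w1)
        return z + h @ self.w2  # residual, transformation-conditioned displacement


def pseudo_negative(module, z, angle, rng):
    # Hypothetical rule: condition on a deliberately wrong angle (offset by
    # 45 to 315 degrees) so the output mismatches the true transformation.
    wrong = angle + rng.uniform(np.pi / 4, 7 * np.pi / 4, size=angle.shape)
    return module(z, wrong)


def variation_across_transforms(module, z, angles):
    # Mean standard deviation of predictions over a set of angles;
    # a value of zero indicates invariant collapse.
    preds = np.stack([module(z, np.full(len(z), a)) for a in angles])
    return float(preds.std(axis=0).mean())
```

Zeroing the output weights (`module.w2[:] = 0.0`) reproduces invariant collapse: every angle, including the wrong one used for the pseudo-negative, maps back to the unchanged latent, so the pseudo-negative becomes indistinguishable from the positive.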
