Point-PNG: Conditional Pseudo-Negatives Generation for Point Cloud Pre-Training

Sutharsan Mahendren, Saimunur Rahman, Piotr Koniusz, Tharindu Fernando, Sridha Sridharan, Clinton Fookes, Peyman Moghadam

TL;DR

Point-PNG tackles the invariant-collapse issue in transformation-aware self-supervised learning for point clouds by introducing conditional pseudo-negatives generated by a COPE predictor. The framework jointly trains COPE with a transformer-based MAE backbone and a specialized loss that combines alignment, pseudo-negatives, and uniformity to preserve transformation sensitivity while maintaining discriminative power. Empirical results demonstrate competitive or superior performance on shape classification (ModelNet40, ScanObjectNN) and superior relative pose estimation compared to supervised baselines, including robust performance under challenging rotations. The approach also shows favorable parameter efficiency and extends to an image-domain evaluation on 3DIEBench, highlighting its broader applicability to transformation-aware representation learning.

Abstract

We propose Point-PNG, a novel self-supervised learning framework that generates conditional pseudo-negatives in the latent space to learn point cloud representations that are both discriminative and transformation-sensitive. Conventional self-supervised learning methods focus on achieving invariance, discarding transformation-specific information. Recent approaches incorporate transformation sensitivity by explicitly modeling relationships between original and transformed inputs. However, they often suffer from an invariant-collapse phenomenon, where the predictor degenerates into identity mappings, resulting in latent representations with limited variation across transformations. To address this, Point-PNG explicitly penalizes invariant collapse through pseudo-negatives generation, enabling the network to capture richer transformation cues while preserving discriminative representations. To this end, we introduce a parametric network, COnditional Pseudo-Negatives Embedding (COPE), which learns localized displacements induced by transformations within the latent space. A key challenge arises when jointly training COPE with the masked autoencoder (MAE), as COPE tends to converge to trivial identity mappings. To overcome this, we design a loss function based on pseudo-negatives conditioned on the transformation, which penalizes such trivial invariant solutions and enforces meaningful representation learning. We validate Point-PNG on shape classification and relative pose estimation tasks, showing competitive performance on ModelNet40 and ScanObjectNN under challenging evaluation protocols, and achieving superior accuracy in relative pose estimation compared to supervised baselines.
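
To make the invariant-collapse argument concrete, the following is a minimal sketch of a contrastive-style loss with transformation-conditioned pseudo-negatives. It is not the paper's loss: the exact form of $\mathcal{L}_{\text{Point-PNG}}$ is not given in this summary, so the sketch assumes a softmax cross-entropy over cosine similarities, omits the uniformity term, and replaces the learned COPE outputs $\Theta_g$ with a hypothetical `planar_rotation` helper. The names `pseudo_negative_loss`, `planar_rotation`, and `temperature` are all illustrative assumptions.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def normalize(vector):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pseudo_negative_loss(z, z_pos, theta_g, theta_rand, temperature=0.1):
    """Assumed loss: pull Theta_g z towards z_pos, push Theta_{g_r} z away."""
    anchor = normalize(matvec(theta_g, z))
    positive = normalize(z_pos)
    negatives = [normalize(matvec(theta, z)) for theta in theta_rand]

    pos_logit = dot(anchor, positive) / temperature
    logits = [pos_logit] + [dot(n, positive) / temperature for n in negatives]

    # Numerically stable log-sum-exp over the anchor and pseudo-negative logits.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - pos_logit


def planar_rotation(angle):
    """Hypothetical stand-in for a COPE output: rotate the first two latent axes."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


z = [1.0, 0.5, 0.2]  # toy latent of the original input
g = 0.8              # toy transformation parameter

# Collapsed case: the encoder is invariant (z_pos == z) and every predicted
# weight matrix is the identity, so all logits coincide.
identity = planar_rotation(0.0)
collapsed = pseudo_negative_loss(z, z, identity, [identity] * 3)

# Sensitive case: the latent moves with g and the predictor tracks that motion.
z_pos = matvec(planar_rotation(g), z)
random_thetas = [planar_rotation(a) for a in (0.1, 1.9, 2.7)]
sensitive = pseudo_negative_loss(z, z_pos, planar_rotation(g), random_thetas)
```

Under these assumptions, the collapsed configuration cannot drive the loss below $\log(1 + k)$ for $k$ pseudo-negatives, because the anchor and every pseudo-negative score identically against the positive. The transformation-sensitive configuration attains a strictly lower value, which is the sense in which such a loss penalizes identity mappings.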

Paper Structure

This paper contains 30 sections, 21 equations, 7 figures, 7 tables, 1 algorithm.

Figures (7)

  • Figure 1: Overview of Point-PNG. A shared transformer architecture encodes a point cloud $\mathbf{x}_i$ and its transformed counterpart $\mathbf{x}_i^{+}$, extracting their global representations $\mathbf{z}_i$ and $\mathbf{z}_i^{+}$, respectively. Simultaneously, the COPE network outputs the weights $\Theta_g$ and $\Theta_{g_r}$ based on the input transformation $g$ and the random transformations $g_r$. The weights $\Theta_{g_r}$ are used to generate the pseudo-negatives $\tilde{\mathbf{z}}_i^{\,g_r}$.
  • Figure 2: Conceptual visualization of the loss $\mathcal{L}_{\text{Point-PNG}}$. Using COPE, we generate the anchor $\frac{\Theta_g\mathbf{z}_i}{\|\Theta_g\mathbf{z}_i\|}$ and a set of pseudo-negatives $\frac{\Theta_{g_r} \mathbf{z}_i}{\|\Theta_{g_r} \mathbf{z}_i\|}$ for the corresponding positive $\mathbf{z}_i^{+}$. Because of these pseudo-negatives, the model remains sensitive to different transformations rather than becoming completely invariant to them, which is our main goal.
  • Figure 3: Illustration of relative pose estimation between source (yellow) and target (blue) point clouds for KPConv, EPN, E$^2$PN, and Point-PNG. Red indicates transformed source clouds.
  • Figure 4: Relative pose estimation on 7Scenes, evaluated using the mean rotation error under varying maximum rotation angles. The radial axis displays the error values on a logarithmic scale for better visualization. The angular axis represents the specified maximum rotation angle; random rotations are generated by uniformly sampling angles within the specified maximum. The plot compares FMR [Huang_2020_CVPR], EquivReg [zhu2022correspondence], and Point-PNG. (Lower is better.)
  • Figure 5: t-SNE visualisation of features learned by Point-PNG on the ModelNet40 dataset. Circular markers indicate features of non-rotated point clouds and cross markers indicate features of rotated point clouds. Each class is shown in a unique colour.
  • ...and 2 more figures
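
The evaluation described in the Figure 4 caption (random rotations sampled within a maximum angle, scored by rotation error) can be sketched as follows. This is an illustration under stated assumptions rather than the paper's protocol: it assumes per-axis Euler angles drawn uniformly from $[0, \text{max}]$ degrees and the standard geodesic angle $\arccos\big((\operatorname{tr}(R_{\text{est}}^{\top} R_{\text{gt}}) - 1)/2\big)$ as the error. The function names are hypothetical.

```python
import math
import random


def axis_rotation(axis, angle):
    """3x3 rotation matrix about the x, y, or z axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    if axis == "x":
        return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    if axis == "y":
        return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def random_rotation(max_angle_deg, rng):
    """Assumed sampling scheme: one uniform Euler angle per axis in [0, max]."""
    angles = [math.radians(rng.uniform(0.0, max_angle_deg)) for _ in range(3)]
    rx, ry, rz = (axis_rotation(ax, a) for ax, a in zip("xyz", angles))
    return matmul(rz, matmul(ry, rx))


def rotation_error_deg(r_est, r_gt):
    """Geodesic angle between two rotations, in degrees."""
    # trace(r_est^T r_gt) equals the sum of element-wise products.
    trace = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def mean_rotation_error_deg(estimates, ground_truths):
    """Average geodesic error over paired estimated and true rotations."""
    errors = [rotation_error_deg(e, g) for e, g in zip(estimates, ground_truths)]
    return sum(errors) / len(errors)


rng = random.Random(0)
ground_truths = [random_rotation(45.0, rng) for _ in range(5)]
identity = axis_rotation("z", 0.0)
# Error of a trivial estimator that always predicts "no rotation".
baseline_error = mean_rotation_error_deg([identity] * 5, ground_truths)
```

Sweeping `max_angle_deg` and recording the mean error for each method would produce the kind of curve plotted on the angular axis of Figure 4. The always-identity estimator is included only as a reference point for how quickly a rotation-insensitive prediction degrades.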