Dual-Branch Center-Surrounding Contrast: Rethinking Contrastive Learning for 3D Point Clouds

Shaofeng Zhang; Xuanqi Chen; Xiangdong Zhang; Sitong Wu; Junchi Yan

TL;DR

The paper targets the limitation of generative MAE-based SSL for 3D point clouds in learning high-level discriminative features, proposing a contrastive-only approach tailored to 3D geometry. CSCon introduces a dual-branch center-surrounding masking scheme and a novel inner-instance patch-level contrastive loss, all operating without a decoder and with shared encoder parameters. Extensive experiments on ShapeNet and downstream tasks (ScanObjectNN, ModelNet40, ShapeNetPart, S3DIS) show CSCon achieving state-of-the-art results under several protocols, with notable gains over baselines like Point-MAE. Ablation studies substantiate the importance of center-surrounding positives, inner-instance loss, parameter sharing, and masking strategies, demonstrating CSCon’s effectiveness in capturing both global and local 3D structure.

Abstract

Most existing self-supervised learning (SSL) approaches for 3D point clouds are generative methods based on Masked Autoencoders (MAE). However, these generative methods have been shown to struggle to capture high-level discriminative features, leading to poor performance on linear probing and other downstream tasks. In contrast, contrastive methods excel in discriminative feature representation and generalization ability on image data. Despite this, contrastive learning (CL) on 3D data remains scarce. Moreover, simply applying CL methods designed for 2D data to 3D fails to learn 3D local details effectively. To address these challenges, we propose a novel Dual-Branch Center-Surrounding Contrast (CSCon) framework. Specifically, we apply masking to the center and surrounding parts separately, constructing dual-branch inputs with center-biased and surrounding-biased representations to better capture rich geometric information. Meanwhile, we introduce a patch-level contrastive loss to further enhance both high-level information and local sensitivity. Under the FULL and ALL protocols, CSCon achieves performance comparable to generative methods; under the MLP-LINEAR, MLP-3, and ONLY-NEW protocols, our method attains state-of-the-art results, even surpassing cross-modal approaches. In particular, under the MLP-LINEAR protocol, our method outperforms the baseline (Point-MAE) by 7.9%, 6.7%, and 10.3% on the three variants of ScanObjectNN, respectively. The code will be made publicly available.
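The abstract does not spell out how the center and surrounding parts are chosen or how the patch-level loss is formed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the point cloud has already been grouped into patches with known centroids, that a randomly drawn anchor patch and its nearest neighbours form the "center" part, and that a generic InfoNCE objective stands in for the patch-level contrastive loss. The names `center_surrounding_split`, `info_nce`, `center_ratio`, and `temperature` are hypothetical.

```python
import numpy as np


def center_surrounding_split(patch_centers, center_ratio=0.5, rng=None):
    """Split patch indices into a 'center' group and a 'surrounding' group.

    patch_centers: (G, 3) array of patch centroid coordinates.
    Assumption: a random anchor patch is drawn, the patches nearest to it
    form the center group, and the remaining patches form the surrounding group.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_patches = patch_centers.shape[0]
    anchor = rng.integers(num_patches)
    # Euclidean distance from every patch centroid to the anchor centroid.
    dists = np.linalg.norm(patch_centers - patch_centers[anchor], axis=1)
    order = np.argsort(dists)
    num_center = int(round(center_ratio * num_patches))
    return order[:num_center], order[num_center:]


def info_nce(z_a, z_b, temperature=0.1):
    """Generic InfoNCE over rows: z_a[i] and z_b[i] are treated as a positive pair.

    This is a stand-in for a patch-level contrastive loss, not the paper's loss.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    # Subtract the row maximum for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Example: 64 hypothetical patch centroids split evenly into the two groups.
rng = np.random.default_rng(0)
centers = rng.normal(size=(64, 3))
center_idx, surround_idx = center_surrounding_split(centers, 0.5, rng)
# One branch would mask the center group and see only surround_idx;
# the other would mask the surrounding group and see only center_idx.
```

In this sketch the two index sets are disjoint and together cover every patch, so a shared encoder would see complementary views of the same object in the two branches. Which branch sees which part, and how positives are paired across branches, is left open here because the abstract does not state it.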
