GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding

Chengyao Wang; Li Jiang; Xiaoyang Wu; Zhuotao Tian; Bohao Peng; Hengshuang Zhao; Jiaya Jia

TL;DR

This work proposes GroupContrast, which combines segment grouping with semantic-aware contrastive learning. Segment grouping partitions points into semantically meaningful regions, and the contrastive objective uses that grouping as guidance to alleviate the "semantic conflict" problem.

Abstract

Self-supervised 3D representation learning aims to learn effective representations from large-scale unlabeled point clouds. Most existing approaches adopt point discrimination as the pretext task, which assigns matched points in two distinct views as positive pairs and unmatched points as negative pairs. However, this approach often results in semantically identical points having dissimilar representations, leading to a high number of false negatives and introducing a "semantic conflict" problem. To address this issue, we propose GroupContrast, a novel approach that combines segment grouping and semantic-aware contrastive learning. Segment grouping partitions points into semantically meaningful regions, which enhances semantic coherence and provides semantic guidance for the subsequent contrastive representation learning. Semantic-aware contrastive learning augments the semantic information extracted from segment grouping and helps to alleviate the issue of "semantic conflict". We conducted extensive experiments on multiple 3D scene understanding tasks. The results demonstrate that GroupContrast learns semantically meaningful representations and achieves promising transfer learning performance.
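The difference between point discrimination and a group-aware objective can be illustrated with a small sketch. This is not the authors' implementation: the function name `info_nce`, the temperature of 0.1, the toy features, and the group labels are all assumptions chosen for illustration, and the multi-positive form used here is only one common variant that may differ from the paper's exact loss. Point discrimination treats only the matched point as positive, whereas the group-aware mask treats every key point in the query's group as positive.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def info_nce(query, keys, positive_mask, temperature=0.1):
    """InfoNCE-style loss for one query against a set of keys.

    positive_mask[i] is True when keys[i] counts as a positive.
    loss = -log( sum_pos exp(s_i / t) / sum_all exp(s_i / t) )
    where s_i is the cosine similarity between the query and keys[i].
    """
    q = normalize(query)
    logits = [dot(q, normalize(k)) / temperature for k in keys]
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - shift) for l in logits]
    pos = sum(e for e, is_pos in zip(exps, positive_mask) if is_pos)
    return -math.log(pos / sum(exps))

# Hypothetical toy data: three key points, the first two in the same group.
keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
key_groups = [0, 0, 1]
query = [1.0, 0.05]
query_group = 0
matched_index = 0  # the key point matched to the query across views

# Point discrimination: only the matched point is positive, so key 1
# (same group as the query) is pushed away as a false negative.
point_mask = [i == matched_index for i in range(len(keys))]

# Group-aware: every key in the query's group is positive.
group_mask = [g == query_group for g in key_groups]

loss_point = info_nce(query, keys, point_mask)
loss_group = info_nce(query, keys, group_mask)
```

In this toy setting the group-aware loss is lower than the point-discrimination loss, because the semantically similar key is no longer penalized as a negative.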

Paper Structure (16 sections, 9 equations, 8 figures, 5 tables)

Introduction
Related Work
Method
Overall Framework
Segment Grouping
Semantic-aware Contrastive Learning
Overall Optimization Objective
Experiments
Main Properties
Results Comparison
Conclusion
Implementation Details
Pre-training
Fine-tuning
Collaboration with Foundation Models
...and 1 more section

Figures (8)

Figure 1: Visualization of activation maps depicting cosine similarity to the query point (indicated by a yellow cross) in the scene. Our approach demonstrates superior effectiveness in discriminating semantically similar points compared to CSC [csc].
Figure 2: Overview of our proposed GroupContrast framework. Our framework uses two neural networks, each comprising a backbone and two projectors for segment grouping and contrastive learning. The parameters of the teacher network are updated as an exponential moving average (EMA) of the parameters of the student network. The student network includes an additional asymmetric predictor for contrastive learning. The Segment Grouping module assigns each point to one of $n$ prototypes, and this clustering result serves as a guide for effective contrastive representation learning.
Figure 3: Segment Grouping is optimized by distilling the assignment scores between each segment and the $n$ prototypes from the teacher network to the student network. An informative weight is employed to make the student network focus on more challenging segments.
Figure 4: The result of Segment Grouping. We compare the grouping results with the original geometric segments [graph_segment] and the semantic ground truth. Segment grouping effectively groups points into semantically meaningful regions without human supervision.
Figure 5: Contrastive Learning. We use an InfoNCE loss [cpc] to aggregate points within the same group and scatter points across different groups, as indicated by the Segment Grouping result. Here the red point in view $V_q$ serves as a query, the red points in view $V_k$ are positive samples, and the blue points in view $V_k$ are negative samples. Both modules operate only on the overlapping regions of the two augmented views, which are highlighted with darker colors in the figure.
...and 3 more figures
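
The teacher update described in the Figure 2 caption, where teacher parameters track an exponential moving average (EMA) of the student parameters, can be sketched as follows. This is a minimal illustration rather than the paper's code: the function name `ema_update`, the dictionary-of-lists parameter layout, and the momentum values are assumptions.

```python
def ema_update(teacher, student, momentum=0.99):
    """Return teacher parameters after one EMA step.

    teacher, student: dicts mapping a parameter name to a list of floats.
    new_teacher = momentum * teacher + (1 - momentum) * student
    """
    return {
        name: [
            momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher[name], student[name])
        ]
        for name in teacher
    }

# Hypothetical parameters for illustration only.
teacher = {"w": [0.0, 1.0]}
student = {"w": [1.0, 1.0]}
teacher = ema_update(teacher, student, momentum=0.9)
```

A momentum close to 1 makes the teacher change slowly, so it provides more stable targets than the student it is averaged from.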
