On the Effectiveness of Supervision in Asymmetric Non-Contrastive Learning

Jeongheon Oh; Kibok Lee

TL;DR

This work addresses the gap in applying supervision to asymmetric non-contrastive learning (ANCL) by proposing SupSiam and SupBYOL, supervised adaptations that incorporate a supervised target via a target pool. The method blends self-supervised and supervised objectives as $\ell = \alpha \ell_{ssl} + (1-\alpha) \ell_{sup}$, which reduces intra-class variance and mitigates collapse; a theoretical analysis shows the optimal predictor aligns eigenstructures and reduces within-class dispersion by a factor $\alpha$. Empirically, supervised ANCL improves linear evaluation, object detection, and transfer, with the best performance on fine-grained tasks while avoiding collapse, and exhibits robust behavior across architectures and datasets. The results suggest supervision can enhance ANCL effectiveness with competitive computational costs, offering practical benefits for diverse downstream applications.
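The blended objective $\ell = \alpha \ell_{ssl} + (1-\alpha) \ell_{sup}$ can be illustrated with a minimal NumPy sketch. It assumes each term is a SimSiam-style negative cosine similarity between the predictor output and a target treated as a constant; that choice of per-term loss, and the names `neg_cosine`, `blended_loss`, `p1`, `z2`, and `z2_sup`, are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def neg_cosine(p, z):
    """Mean negative cosine similarity between matching rows of p and z.

    Assumption: this SimSiam-style loss stands in for both l_ssl and l_sup.
    """
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return float(-(p * z).sum(axis=1).mean())

def blended_loss(p1, z2, z2_sup, alpha):
    """Compute alpha * l_ssl + (1 - alpha) * l_sup.

    p1     : predictor outputs of the online branch, shape (batch, dim)
    z2     : self-supervised targets (other view of the same image)
    z2_sup : supervised targets (a sample with the same class label)
    NumPy has no autograd, so the targets are constants here; in a real
    framework they would be wrapped in a stop-gradient.
    """
    l_ssl = neg_cosine(p1, z2)
    l_sup = neg_cosine(p1, z2_sup)
    return alpha * l_ssl + (1.0 - alpha) * l_sup

# Hypothetical random features, used only to exercise the function.
rng = np.random.default_rng(0)
p1 = rng.normal(size=(4, 8))
z2 = rng.normal(size=(4, 8))
z2_sup = rng.normal(size=(4, 8))

loss_ssl_only = blended_loss(p1, z2, z2_sup, alpha=1.0)  # purely self-supervised
loss_sup_only = blended_loss(p1, z2, z2_sup, alpha=0.0)  # purely supervised
loss_half = blended_loss(p1, z2, z2_sup, alpha=0.5)      # equal mix
```

Setting `alpha=1.0` recovers the self-supervised term alone, and `alpha=0.0` uses only the supervised target, so intermediate values trade off the two, which matches the abstract's point that the contribution of supervision should be tuned.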

Abstract

Supervised contrastive representation learning has been shown to be effective in various transfer learning scenarios. However, while asymmetric non-contrastive learning (ANCL) often outperforms its contrastive learning counterpart in self-supervised representation learning, the extension of ANCL to supervised scenarios is less explored. To bridge the gap, we study ANCL for supervised representation learning, coined SupSiam and SupBYOL, leveraging labels in ANCL to achieve better representations. The proposed supervised ANCL framework improves representation learning while avoiding collapse. Our analysis reveals that providing supervision to ANCL reduces intra-class variance, and the contribution of supervision should be adjusted to achieve the best performance. Experiments demonstrate the superiority of supervised ANCL across various datasets and tasks. The code is available at: https://github.com/JH-Oh-23/Sup-ANCL.

Paper Structure (35 sections, 6 theorems, 32 equations, 3 figures, 14 tables)

Introduction
Related Works
Method
Preliminary: Self-Supervised ANCL
Supervised ANCL
Analysis of the Effect of Supervision
Problem Setup
Supervision Reduces Intra-Class Variance
Effect of Reducing Intra-Class Variance
Experiment
Pretraining
Linear Evaluation
Object Detection
Transfer Learning via Linear Evaluation
Few-Shot Classification
...and 20 more sections

Key Result

Proposition 4.4

The covariance matrices of features $\mathbb{E}\left[ z_1 z_1^\top \right]$, $\mathbb{E}\left[ z_2 z_1^\top \right]$, and $\mathbb{E}\left[ z_2^\text{sup} z_1^\top \right]$ share the same eigenspace in the data space.

Figures (3)

Figure 1: Our proposed supervised ANCL framework. The components we added to the standard ANCL are highlighted with a red box. We manage a target pool to ensure the existence of positive samples sharing the same class label in the form of $z_2^\text{sup}$. Stop-gradient ($\operatorname{sg}$) applied to $z_2$ and $z_2^\text{sup}$ ensures that gradients propagate only through the online branch, which contains the predictor. The target branch, which has no predictor, either shares parameters with the online branch (SupSiam) or is a momentum network (SupBYOL).
Figure 2: t-SNE visualization of SupSiam features with different $\alpha$ on the toy dataset.
Figure 3: t-SNE visualization of SupSiam features with different $\alpha$ on 15 dog and 5 bird classes from ImageNet-100.

Theorems & Definitions (10)

Proposition 4.4
proof
Theorem 4.5
Theorem 4.6
Proposition 1.0
proof
Theorem 1.1
proof
Theorem 1.1
proof
