On the Universality of Self-Supervised Learning

Wenwen Qiang; Jingyao Wang; Changwen Zheng; Hui Xiong; Gang Hua

TL;DR

The paper asks what makes a self-supervised representation good. It defines SSL universality as the combination of discriminability, generalizability, and transferability, and models these properties explicitly with General SSL (GeSSL). GeSSL uses a bi-level optimization framework: an inner loop learns a proxy model $f'$ on a support set with a discriminative loss $L_{disc}$ guided by an auxiliary network $g$ (also referred to as the threshold model), and an outer loop updates the base model $f$ and the network $g$ on a query set to enforce cross-task generalization. The authors prove a generalization bound on the risk for unseen tasks and report empirical gains across unsupervised, semi-supervised, transfer, and few-shot benchmarks.
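The inner/outer structure described above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a linear model, swaps the discriminative loss $L_{disc}$ for a plain squared error, omits the auxiliary network $g$ entirely, and uses a first-order approximation of the outer gradient. All function names, learning rates, and the synthetic tasks are hypothetical.

```python
import numpy as np


def loss_and_grad(w, X, y):
    """Mean squared error of the linear model X @ w, and its gradient in w."""
    residual = X @ w - y
    return float(np.mean(residual ** 2)), 2.0 * X.T @ residual / len(y)


def inner_adapt(w, X_s, y_s, inner_lr=0.1, steps=1):
    """Inner loop: adapt a copy of the base parameters on the support set."""
    w_proxy = w.copy()  # proxy model; the base parameters stay untouched
    for _ in range(steps):
        _, grad = loss_and_grad(w_proxy, X_s, y_s)
        w_proxy = w_proxy - inner_lr * grad
    return w_proxy


def outer_step(w, tasks, inner_lr=0.1, outer_lr=0.05):
    """Outer loop: move the base parameters along the query-set gradient
    evaluated at each adapted proxy (first-order approximation)."""
    meta_grad = np.zeros_like(w)
    for X_s, y_s, X_q, y_q in tasks:
        w_proxy = inner_adapt(w, X_s, y_s, inner_lr)
        _, grad_q = loss_and_grad(w_proxy, X_q, y_q)
        meta_grad += grad_q
    return w - outer_lr * meta_grad / len(tasks)


def adapted_query_loss(w, tasks, inner_lr=0.1):
    """Average query loss after adapting to each task's support set."""
    losses = []
    for X_s, y_s, X_q, y_q in tasks:
        w_proxy = inner_adapt(w, X_s, y_s, inner_lr)
        losses.append(loss_and_grad(w_proxy, X_q, y_q)[0])
    return float(np.mean(losses))


def make_tasks(rng, n_tasks=8, n_points=10):
    """Toy tasks (an assumption for illustration): related linear-regression
    problems, each with its own support/query split."""
    w_shared = np.array([1.0, -2.0, 0.5])
    tasks = []
    for _ in range(n_tasks):
        w_task = w_shared + 0.1 * rng.normal(size=w_shared.shape)
        X_s = rng.normal(size=(n_points, w_shared.size))
        X_q = rng.normal(size=(n_points, w_shared.size))
        tasks.append((X_s, X_s @ w_task, X_q, X_q @ w_task))
    return tasks


rng = np.random.default_rng(0)
train_tasks = make_tasks(rng)
w = np.zeros(3)
loss_before = adapted_query_loss(w, train_tasks)
for _ in range(200):
    w = outer_step(w, train_tasks)
loss_after = adapted_query_loss(w, train_tasks)
```

Comparing `loss_before` with `loss_after` shows whether the outer loop has found base parameters from which a single inner step adapts well to each task. A faithful version would differentiate through the inner loop and train $g$ jointly, which this sketch deliberately leaves out.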

Abstract

In this paper, we investigate what constitutes a good representation or model in self-supervised learning (SSL). We argue that a good representation should exhibit universality, characterized by three essential properties: discriminability, generalizability, and transferability. While these capabilities are implicitly desired in most SSL frameworks, existing methods lack an explicit modeling of universality, and its theoretical foundations remain underexplored. To address these gaps, we propose General SSL (GeSSL), a novel framework that explicitly models universality from three complementary dimensions: the optimization objective, the parameter update mechanism, and the learning paradigm. GeSSL integrates a bi-level optimization structure that jointly models task-specific adaptation and cross-task consistency, thereby capturing all three aspects of universality within a unified SSL objective. Furthermore, we derive a theoretical generalization bound, ensuring that the optimization process of GeSSL consistently leads to representations that generalize well to unseen tasks. Empirical results on multiple benchmark datasets demonstrate that GeSSL consistently achieves superior performance across diverse downstream tasks, validating its effectiveness in modeling universal representations.

Paper Structure (38 sections, 1 theorem, 26 equations, 11 figures, 20 tables)

Introduction
Revisiting SSL from a Task Perspective
Methodology
Definition and Explanation of Universality
Explicit Modeling Universality in SSL
Theoretical Analysis
Empirical Evaluation
Performance Comparison
Ablation Study and Analysis
Related Work
Conclusion
Proofs
Implementation Details
Benchmark Datasets
Baselines
...and 23 more sections

Key Result

Theorem 4.1

Let $\theta^*$ denote the parameter after bi-level training over $N$ tasks (mini-batches). For any new task $\tau_{\text{test}} \sim \mathcal{T}$, let $\theta^*_{\text{test}} = A(\theta^*, S_{\tau_{\text{test}}})$ denote the adapted parameter. Under Assumption ass:main, with probability at least $1$ … where $\theta'$ is the adapted parameter for training task $\tau_i$ (the $i$-th mini-batch).
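To make the adaptation operator $A$ concrete, one common instantiation is a few steps of gradient descent on the task's support set. The form below is an illustrative assumption, not the paper's definition: the step size $\alpha$, the step count $m$, and the support-set loss $\mathcal{L}$ are hypothetical symbols introduced here.

```latex
\theta^{(0)} = \theta^*, \qquad
\theta^{(j+1)} = \theta^{(j)} - \alpha \, \nabla_{\theta} \mathcal{L}\bigl(\theta^{(j)}; S_{\tau}\bigr)
\quad (j = 0, \dots, m-1), \qquad
A(\theta^*, S_{\tau}) = \theta^{(m)}
```

Under this reading, $\theta^*_{\text{test}}$ is simply $\theta^*$ after $m$ such steps on $S_{\tau_{\text{test}}}$, and $\theta'$ plays the same role for a training task $\tau_i$.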

Figures (11)

Figure 1: Overview of GeSSL. The meaning of the different components is marked below the figure.
Figure 2: Model efficiency.
Figure 3: Ablation study of $\mu$.
Figure 4: Ablation study of $k$.
Figure 5: Universality performance of different models on five image-based tasks (top row) and five video-based tasks (bottom row), measured by the $\sigma$-measure; a smaller $\sigma$-measure score indicates better performance. The scores are normalized across datasets, and baselines are compared via the corresponding branches of the fan chart.
...and 6 more figures

Theorems & Definitions (2)

Definition 3.1: Universality
Theorem 4.1
