Understanding Representation Learnability of Nonlinear Self-Supervised Learning

Ruofeng Yang; Xiangyuan Li; Bo Jiang; Shuai Li

TL;DR

This work advances the theory of representation learnability for nonlinear SSL by analyzing gradient-descent training of a 1-layer nonlinear SSL model on a toy distribution containing a label-related feature $e_1$ and a hidden feature $e_2$. It introduces a two-step analysis: first, study a simplified objective $\tilde{L}$ to locate a local minimum; then, apply the exact Inverse Function Theorem to transfer the result to the original nonconvex objective $L$, establishing the existence of a local minimum and the precise features learned by the SSL model. The main result is that SSL learns both $e_1$ and $e_2$ (i.e., the label-related and hidden features), whereas nonlinear supervised learning (SL) learns only $e_1$, highlighting the superior representation learnability of SSL. The findings are validated by simulation experiments that illustrate the learning dynamics and the projections of the features onto the learned subspace, and they offer a theoretical basis for designing SSL models that capture richer data features with nonlinear structures.
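The TL;DR mentions projecting features onto the learned subspace. Below is a minimal sketch of one such measurement, assuming (our assumption, not necessarily the paper's exact metric) that the learned subspace is the row span of the weight matrix $W$ and that the fraction of a feature "captured" is the norm of its orthogonal projection onto that span divided by the feature's own norm. The function name `projection_ratio` and the example matrices are hypothetical illustrations, not results from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(rows: Sequence[Sequence[float]], tol: float = 1e-10) -> List[Vector]:
    """Modified Gram-Schmidt on the rows of W; (near-)dependent rows are dropped."""
    basis: List[Vector] = []
    for row in rows:
        v = [float(x) for x in row]
        for q in basis:
            c = dot(v, q)  # component of the current residual along q
            v = [a - c * b for a, b in zip(v, q)]
        n = math.sqrt(dot(v, v))
        if n > tol:
            basis.append([a / n for a in v])
    return basis


def projection_ratio(W: Sequence[Sequence[float]], feature: Sequence[float]) -> float:
    """Hypothetical metric: ||P_W feature|| / ||feature||, where P_W projects
    onto the row span of W. 1.0 = feature fully captured, 0.0 = orthogonal."""
    fnorm = math.sqrt(dot(feature, feature))
    if fnorm == 0.0:
        raise ValueError("feature must be a nonzero vector")
    coeffs = [dot(feature, q) for q in orthonormal_basis(W)]
    return math.sqrt(sum(c * c for c in coeffs)) / fnorm


if __name__ == "__main__":
    e2 = [0.0, 1.0, 0.0, 0.0]  # "hidden" feature direction in a made-up 4-d space
    W_both = [[1.0, 0.5, 0.0, 0.0], [0.2, 1.0, 0.0, 0.0]]  # rows span e1 and e2
    W_e1_only = [[1.0, 0.0, 0.0, 0.0]]  # rows span only e1
    print(projection_ratio(W_both, e2))     # approximately 1.0
    print(projection_ratio(W_e1_only, e2))  # prints 0.0
```

In this toy example, a weight matrix whose rows span both directions captures $e_2$ fully, while one spanning only $e_1$ captures none of it, which mirrors the qualitative SSL-versus-SL contrast described above.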

Abstract

Self-supervised learning (SSL) has empirically demonstrated its ability to learn data representations that are useful in many downstream tasks. Only a few theoretical works study this representation learnability, and many of them focus on the final data representation, treating the nonlinear neural network as a "black box". However, the precise learning results of the neural network are crucial for describing which features of the data distribution an SSL model learns. Our paper is the first to analyze the learning results of a nonlinear SSL model precisely. We consider a toy data distribution that contains two features: a label-related feature and a hidden feature. Unlike previous work in the linear setting, which relies on closed-form solutions, we use gradient descent to train a 1-layer nonlinear SSL model from a specific initialization region and prove that the model converges to a local minimum. Furthermore, instead of a complex iterative analysis, we propose a new analysis process that uses the exact version of the Inverse Function Theorem to describe precisely the features learned at this local minimum. With this local minimum, we prove that the nonlinear SSL model captures the label-related feature and the hidden feature at the same time. In contrast, the nonlinear supervised learning (SL) model learns only the label-related feature. We also present the learning processes and results of the nonlinear SSL and SL models via simulation experiments.
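For orientation, the classical (qualitative) Inverse Function Theorem is stated below. The abstract refers to an "exact" version, presumably a quantitative variant with explicit neighborhood sizes; those constants are not given here, so this is only the standard statement, not the paper's lemma. One natural way to apply such a theorem in this setting (our reading, not confirmed by the text above) is to take $F = \nabla L$ near the local minimizer found for $\tilde{L}$, so that an approximate stationary point can be upgraded to an exact one.

```latex
\textbf{Inverse Function Theorem (classical form).}
Let $U \subseteq \mathbb{R}^n$ be open, let $F : U \to \mathbb{R}^n$ be
continuously differentiable, and let $x_0 \in U$ be such that the Jacobian
$DF(x_0)$ is invertible. Then there exist open sets $V \ni x_0$ and
$W \ni F(x_0)$ such that $F|_V : V \to W$ is a bijection whose inverse
$F^{-1} : W \to V$ is continuously differentiable, with
\[
  D\bigl(F^{-1}\bigr)\bigl(F(x)\bigr) = \bigl[DF(x)\bigr]^{-1}
  \qquad \text{for all } x \in V .
\]
```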

Paper Structure (32 sections, 67 equations, 10 figures)

Introduction
Related Work
Theoretical analyses for final data representation.
Theoretical analyses for learning results of SSL.
Theoretical guarantees for supervised learning.
Problem Formulation
Data Distribution
Model
The SSL model.
The SL model.
Definitions and notations.
Notations.
SSL is Superior to SL in Learning Representation
The Learning Abilities of SSL and SL
Discussion
...and 17 more sections

Figures (10)

Figure 1: The structure of SSL model.
Figure 2: Theoretical results of the convergence theorem (thm:Convergence_noiseepexp_pro)
Figure 3: Final weight matrix $W$
Figure 4: Learning curve
Figure 5: The projection of $e_2$
...and 5 more figures

Theorems & Definitions (9)

Proof
Proof of Theorem 1
Proof
Proof
Proof
Proof
Proof
Proof
Proof