Understanding the Benefits of SimCLR Pre-Training in Two-Layer Convolutional Neural Networks

Han Zhang; Yuan Cao

TL;DR

This paper considers training a two-layer convolutional neural network to learn a toy image data model and shows that, under certain conditions on the number of labeled samples, SimCLR pre-training combined with supervised fine-tuning achieves almost optimal test loss.

Abstract

SimCLR is one of the most popular contrastive learning methods for vision tasks. It pre-trains deep neural networks on large amounts of unlabeled data by teaching the model to distinguish between positive and negative pairs of augmented images. SimCLR is believed to learn efficient representations that improve the performance of subsequent supervised fine-tuning. Despite its effectiveness, the theoretical understanding of its underlying mechanisms remains limited. In this paper, we present a theoretical case study of SimCLR. Specifically, we consider training a two-layer convolutional neural network (CNN) to learn a toy image data model. We show that, under certain conditions on the number of labeled samples, SimCLR pre-training combined with supervised fine-tuning achieves almost optimal test loss. Notably, the label complexity required with SimCLR pre-training is far less demanding than that of direct training on supervised data. Our analysis sheds light on the benefits of SimCLR in learning with fewer labels.
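
To make the "positive and negative pairs" idea concrete, the following is a minimal NumPy sketch of the standard NT-Xent contrastive loss commonly associated with SimCLR. It is an illustration under stated assumptions, not the loss analyzed in this paper: the exact objective used for the two-layer CNN may differ, and the function name `nt_xent_loss`, the temperature value `tau=0.5`, and the random embeddings are all hypothetical choices made for the example.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, tau=0.5):
    """NT-Xent loss for paired embeddings (illustrative sketch).

    z_a[i] and z_b[i] are assumed to be embeddings of two augmentations
    of the same image (a positive pair); every other embedding in the
    batch acts as a negative. `tau` is a temperature (assumed value).
    """
    batch = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)             # shape (2B, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize rows
    sim = (z @ z.T) / tau                              # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity

    # Index of the positive partner for each of the 2B rows.
    pos = np.concatenate([np.arange(batch, 2 * batch), np.arange(batch)])

    # Numerically stable log-sum-exp over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))

    # Cross-entropy of picking the positive partner among all candidates.
    return float(np.mean(log_norm - sim[np.arange(2 * batch), pos]))


# Hypothetical embeddings: aligned pairs vs. unrelated pairs.
rng = np.random.default_rng(0)
z_a = rng.normal(size=(8, 16))
aligned = nt_xent_loss(z_a, z_a + 0.01 * rng.normal(size=(8, 16)))
mismatched = nt_xent_loss(z_a, rng.normal(size=(8, 16)))
print(aligned, mismatched)
```

When the two views of each image map to nearby embeddings, the loss is smaller than when the paired embeddings are unrelated, which is the signal that drives pre-training on unlabeled data.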

Paper Structure (39 sections, 30 theorems, 246 equations, 4 figures)

Introduction
Notation.
Related Work
Self-supervised Learning.
Feature Learning Theory of Neural Networks.
Problem Setting
A data model for the case study
Self-supervised pre-training with SimCLR
Supervised Fine-tuning
Main result
Proof Sketch
Conclusion
Experiments
Synthetic-data experiments.
Real-data experiments on MNIST dataset
...and 24 more sections

Key Result

Theorem 4.2

Under Condition con:pre-train_fine_tune, for any $\epsilon > 0$, if $n_0\cdot \mathrm{SNR}^2 = \widetilde{\Omega}(1)$, then within $T_{\mathrm{SimCLR}} = \widetilde{\Omega}( \eta^{-1}\tau \|\bm{\mu} \|_2^{-2} )$ iterations of pre-training and $T = \widetilde{\Theta}( \eta^{-1} m\sigma_0^{-(q-2)} \|

Figures (4)

Figure 1: Illustration of the SimCLR pre-training and supervised fine-tuning stages.
Figure 2: Synthetic-data experiments: under the same conditions on label complexity, SimCLR pre-training combined with supervised fine-tuning ($n_0=250, n=40$) achieves much smaller test loss than direct supervised learning ($n=40$).
Figure 3: Signals and the dataset in the real-data experiments. The signals are taken from the MNIST dataset. Following the data model in Definition 3.1, noise is added to the signals to obtain the dataset used in training.
Figure 4: Real-data experiments: under the same conditions on label complexity ($n=40$), SimCLR pre-training combined with supervised fine-tuning ($n_0=200, n=40$) achieves much smaller test loss than direct supervised learning.

Theorems & Definitions (32)

Definition 3.1
Theorem 4.2
Theorem 4.3 (Theorems 4.3 and 4.4 in Cao et al., 2022; bounds for direct supervised learning)
Remark 4.4
Lemma 5.1
Lemma 5.2
Theorem 5.3
Lemma 5.4
Theorem 5.5
Lemma B.1
...and 22 more
