Explicit Mutual Information Maximization for Self-Supervised Learning

Lele Chang; Peilin Liu; Qinghai Guo; Fei Wen

TL;DR

This work shows that, based on the invariance property of MI, explicit MI maximization can be applied to SSL under a generic distribution assumption, i.e., a relaxed condition on the data distribution.

Abstract

Self-supervised learning (SSL) has been studied extensively in recent years. In theory, mutual information maximization (MIM) is an optimal criterion for SSL and has a strong foundation in information theory. However, MIM is difficult to apply directly in SSL because the data distribution is not analytically available in applications. In practice, many existing methods can be viewed as approximate implementations of the MIM criterion. This work shows that, based on the invariance property of MI, explicit MI maximization can be applied to SSL under a generic distribution assumption, i.e., a relaxed condition on the data distribution. We further illustrate this by analyzing the generalized Gaussian distribution. Building on this result, we derive an MIM-based loss function that uses only second-order statistics. We implement the new loss for SSL and demonstrate its effectiveness through extensive experiments.
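To make the "only second-order statistics" idea concrete, here is a minimal sketch of the closed-form MI between two jointly Gaussian embedding sets, computed from covariance matrices alone. It is an illustration of the standard Gaussian formula, not the paper's loss: the function name `gaussian_mi`, the ridge term `eps`, and the plain Gaussian assumption (rather than the generalized Gaussian analyzed in the paper) are assumptions made here for the example.

```python
import numpy as np


def gaussian_mi(z1, z2, eps=1e-6):
    """Estimate I(Z; Z') in nats, assuming (Z, Z') is jointly Gaussian.

    z1, z2: arrays of shape (n_samples, dim) holding paired embeddings.
    eps: hypothetical ridge term that keeps the covariances well conditioned.
    """
    n, d = z1.shape
    # Stack the two views and center them so the covariance is unbiased.
    joint = np.concatenate([z1, z2], axis=1)
    joint = joint - joint.mean(axis=0, keepdims=True)
    cov = joint.T @ joint / (n - 1) + eps * np.eye(2 * d)

    # Marginal covariances are the diagonal blocks of the joint covariance.
    _, logdet_z1 = np.linalg.slogdet(cov[:d, :d])
    _, logdet_z2 = np.linalg.slogdet(cov[d:, d:])
    _, logdet_joint = np.linalg.slogdet(cov)

    # I(Z; Z') = H(Z) + H(Z') - H(Z, Z'); the Gaussian constants cancel.
    return 0.5 * (logdet_z1 + logdet_z2 - logdet_joint)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal((2000, 4))
    z_similar = z + 0.1 * rng.standard_normal((2000, 4))   # strongly dependent view
    z_unrelated = rng.standard_normal((2000, 4))            # independent view
    print("dependent views:  ", gaussian_mi(z, z_similar))
    print("independent views:", gaussian_mi(z, z_unrelated))
```

Used as a training signal, one would maximize this quantity (or minimize its negative) over the encoder parameters. Collapsed embeddings shrink the marginal log-determinants, so the same expression penalizes trivial constant solutions.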

Paper Structure (17 sections, 26 equations, 4 figures, 7 tables)


Introduction
Method
The Maximum Mutual Information Criterion
Implementation of the MMI Criterion for SSL
Experiments
Linear Probing
Ablation on Batch Size
Conclusions
Related Work
Siamese Networks Based Self-Supervised Representation Learning
Mutual Information Maximization For Self-Supervised Learning
Proof of Theorem 1
Proof of Theorem 2
Ablation Study
Loss Function
...and 2 more sections

Figures (4)

Figure 1: The MMI objective explicitly measures the MI $I(Z;Z^{\prime})$ between the embeddings $Z$ and $Z^{\prime}$ generated by two identical networks $f_{\omega}(\cdot)$ that are fed transformed versions of a sample $S$. Maximizing $I(Z;Z^{\prime})$ not only maximizes the dependency between the embeddings $Z$ and $Z^{\prime}$ by minimizing their joint entropy $H(Z,Z^{\prime})$, but also maximizes their marginal entropies $H(Z)$ and $H(Z^{\prime})$, which naturally avoids trivial constant solutions.
Figure 2: Univariate generalized Gaussian distribution with different values of the shape parameter.
Figure 3: Illustration of a fourth-order approximation of the log function in Eq. (10) of the main paper.
Figure 4: The convergence curves of our method on CIFAR-100 for different values of the parameter $\beta$ used for the rescaling operation.
