Explicit Mutual Information Maximization for Self-Supervised Learning

Lele Chang, Peilin Liu, Qinghai Guo, Fei Wen

TL;DR

This work shows that, based on the invariance property of MI, explicit MI maximization can be applied to SSL under a generic distribution assumption, i.e., a relaxed condition of the data distribution.

Abstract

Recently, self-supervised learning (SSL) has been extensively studied. Theoretically, mutual information maximization (MIM) is an optimal criterion for SSL, with a strong theoretical foundation in information theory. However, it is difficult to directly apply MIM in SSL since the data distribution is not analytically available in applications. In practice, many existing methods can be viewed as approximate implementations of the MIM criterion. This work shows that, based on the invariance property of MI, explicit MI maximization can be applied to SSL under a generic distribution assumption, i.e., a relaxed condition of the data distribution. We further illustrate this by analyzing the generalized Gaussian distribution. Based on this result, we derive a loss function based on the MIM criterion using only second-order statistics. We implement the new loss for SSL and demonstrate its effectiveness via extensive experiments.
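
To make the idea of an MI objective built from second-order statistics concrete, here is a minimal sketch. It is not the loss derived in the paper. It assumes the two embedding batches are jointly Gaussian, in which case MI has the well-known closed form $\frac{1}{2}\left(\log\det\Sigma_{Z} + \log\det\Sigma_{Z^{\prime}} - \log\det\Sigma_{ZZ^{\prime}}\right)$. The function name `gaussian_mi` and the ridge term `eps` are hypothetical choices made for this illustration.

```python
import numpy as np

def gaussian_mi(z1, z2, eps=1e-6):
    """Estimate I(Z; Z') in nats from two (n, d) embedding batches.

    Assumption (not from the paper): Z and Z' are jointly Gaussian, so MI
    depends only on covariance matrices (second-order statistics).
    """
    n, d = z1.shape
    joint = np.concatenate([z1, z2], axis=1)           # shape (n, 2d)
    joint = joint - joint.mean(axis=0, keepdims=True)  # center each feature
    cov = joint.T @ joint / (n - 1)                    # joint covariance, (2d, 2d)
    cov += eps * np.eye(2 * d)                         # small ridge for numerical stability
    _, logdet_z1 = np.linalg.slogdet(cov[:d, :d])      # marginal covariance of Z
    _, logdet_z2 = np.linalg.slogdet(cov[d:, d:])      # marginal covariance of Z'
    _, logdet_joint = np.linalg.slogdet(cov)           # joint covariance of (Z, Z')
    return 0.5 * (logdet_z1 + logdet_z2 - logdet_joint)

rng = np.random.default_rng(0)
z = rng.normal(size=(5000, 4))
z_dependent = z + 0.1 * rng.normal(size=(5000, 4))    # strongly dependent "view"
z_independent = rng.normal(size=(5000, 4))            # unrelated embeddings

mi_dependent = gaussian_mi(z, z_dependent)
mi_independent = gaussian_mi(z, z_independent)
print(mi_dependent, mi_independent)
```

Under this assumption the dependent pair yields a much larger estimate than the independent pair, whose estimate stays close to zero. A training loss in this spirit would minimize the negative of the estimate.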

Paper Structure (17 sections, 26 equations, 4 figures, 7 tables)

This paper contains 17 sections, 26 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: The MMI objective explicitly measures the MI $I(Z;Z^{\prime})$ between the embeddings $Z$ and $Z^{\prime}$ generated by two identical networks $f_{\omega}(\cdot)$ that are fed transformed versions of a sample $S$. Maximizing $I(Z;Z^{\prime})$ not only increases the dependency between $Z$ and $Z^{\prime}$ by minimizing their joint entropy $H(Z,Z^{\prime})$, but also maximizes the marginal entropies $H(Z)$ and $H(Z^{\prime})$, which naturally avoids trivial constant solutions.
  • Figure 2: Univariate generalized Gaussian distribution with different values of the shape parameter.
  • Figure 3: Illustration of a fourth-order approximation of the log function in Eq. (10) of the main paper.
  • Figure 4: The convergence curves of our method on CIFAR-100 for different values of the parameter $\beta$ used for the rescaling operation.
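
The decomposition described in the Figure 1 caption is the standard entropy identity for mutual information, written here with the caption's symbols:

$$
I(Z;Z^{\prime}) = H(Z) + H(Z^{\prime}) - H(Z,Z^{\prime})
$$

Raising $I(Z;Z^{\prime})$ therefore rewards large marginal entropies and a small joint entropy at the same time. A constant embedding carries no information about the other view, so $I(Z;Z^{\prime}) = 0$ and the collapsed solution cannot be a maximizer.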