Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax

Ivan Butakov, Alexander Semenenko, Alexander Tolmachev, Andrey Gladkov, Marina Munkhoeva, Alexey Frolov

TL;DR

The paper tackles the challenge of learning representations that conform to a specified latent distribution, which benefits generative modeling, disentanglement, and outlier detection. It introduces a simple, non-intrusive distribution-matching (DM) mechanism for Deep InfoMax (DIM): independent noise $Z$ is injected into the encoder outputs while the DIM objective is left unchanged, so that the distribution of $f(X)$ converges to $\mathcal{N}(0, \mathrm{I})$ or $\mathrm{U}([0,1]^d)$. The authors provide rigorous information-theoretic results. In the Gaussian DM setting, $I(f(X'); f(X)+Z) \le \frac{d}{2} \log(1+\frac{1}{\sigma^2})$, with equality when $f$ is weakly invariant and $f(X) \sim \mathcal{N}(0, \mathrm{I})$. In the Uniform DM setting, $I(f(X'); f(X)+Z) \le d \log(1+\frac{1}{2\varepsilon})$, with a corresponding equality condition. The approach is further generalized to other distributions via the probability integral transform and normalizing flows. Experiments on MNIST, CIFAR, and ImageNet demonstrate a controllable trade-off between DM quality and downstream task performance, and show that DM embeddings can condition generative models such as cGANs. Overall, the method achieves DM without extra networks, comes with theoretical guarantees, and makes distribution-controlled SSRL practical across a broad range of applications.
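
The generalization via the probability integral transform can be illustrated in one dimension: if a representation coordinate is standard normal, applying the normal CDF makes it uniform on $[0, 1]$, and applying the inverse CDF of a target distribution then yields samples from that target. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses synthetic Gaussian samples in place of real encoder outputs, the exponential target is an arbitrary example, and the helper names (`gaussian_to_uniform`, `uniform_to_exponential`, `gaussian_to_exponential`) are hypothetical. The normalizing-flow part of the generalization is not shown.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def gaussian_to_uniform(z: float) -> float:
    """Probability integral transform: maps an N(0, 1) sample to U(0, 1)."""
    return STD_NORMAL.cdf(z)


def uniform_to_exponential(u: float, rate: float = 1.0) -> float:
    """Inverse CDF of Exponential(rate); the exponential target is only an example."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    return -math.log1p(-u) / rate


def gaussian_to_exponential(z: float, rate: float = 1.0) -> float:
    """Chain the two maps: N(0, 1) -> U(0, 1) -> Exponential(rate)."""
    return uniform_to_exponential(gaussian_to_uniform(z), rate)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for one coordinate of Gaussian-matched representations f(X).
    zs = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
    us = [gaussian_to_uniform(z) for z in zs]
    xs = [gaussian_to_exponential(z, rate=2.0) for z in zs]
    print(f"mean of transformed U: {fmean(us):.3f} (target 0.5)")
    print(f"mean of transformed Exp(2): {fmean(xs):.3f} (target 0.5)")
```

Because both maps are monotone and applied coordinate-wise, this only transforms marginals; it relies on the coordinates of $f(X)$ already being independent, which is what matching $\mathcal{N}(0, \mathrm{I})$ provides.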

Abstract

Deep InfoMax (DIM) is a well-established method for self-supervised representation learning (SSRL) based on maximization of the mutual information between the input and the output of a deep neural network encoder. Although DIM, and contrastive SSRL in general, are well explored, the task of learning representations that conform to a specific distribution (i.e., distribution matching, DM) remains under-addressed. Motivated by the importance of DM for several downstream tasks (including generative modeling, disentanglement, and outlier detection), we enhance DIM to enable automatic matching of learned representations to a selected prior distribution. To achieve this, we propose injecting independent noise into the normalized outputs of the encoder, while keeping the same InfoMax training objective. We show that this modification allows for learning uniformly and normally distributed representations, as well as representations following other absolutely continuous distributions. Our approach is tested on various downstream tasks. The results indicate a moderate trade-off between performance on the downstream tasks and the quality of DM.
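
The mechanism described in the abstract (normalize the encoder outputs, add independent noise, keep the InfoMax objective) can be sketched for the Gaussian case with an InfoNCE-style loss on two views. This is a minimal stdlib-only illustration under explicit assumptions: per-dimension batch standardization as the normalization, Gaussian noise with scale `sigma`, a dot-product critic without temperature, and synthetic "views" in place of a real encoder and augmentations. The names `normalize_batch`, `inject_noise`, and `info_nce` are hypothetical, and the paper's actual architecture and normalization may differ.

```python
import math
import random
from typing import List

Vector = List[float]


def normalize_batch(batch: List[Vector], eps: float = 1e-8) -> List[Vector]:
    """Standardize each dimension to zero mean and (approximately) unit variance
    over the batch; an assumed normalization that fixes per-dimension power."""
    n, d = len(batch), len(batch[0])
    out = [row[:] for row in batch]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        std = math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = (batch[i][j] - mean) / std
    return out


def inject_noise(batch: List[Vector], sigma: float, rng: random.Random) -> List[Vector]:
    """Add independent N(0, sigma^2) noise to every coordinate."""
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in batch]


def info_nce(a: List[Vector], b: List[Vector]) -> float:
    """Mean InfoNCE loss with a dot-product critic: row i of `a` and row i of `b`
    form the positive pair, all other rows of `b` act as in-batch negatives."""
    n = len(a)
    loss = 0.0
    for i in range(n):
        scores = [sum(x * y for x, y in zip(a[i], b[k])) for k in range(n)]
        m = max(scores)  # shift for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        loss += log_sum_exp - scores[i]
    return loss / n


if __name__ == "__main__":
    rng = random.Random(0)
    n, d, sigma = 256, 2, 0.5
    # Synthetic stand-ins for encoder outputs of two augmented views of the same inputs.
    base = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    view1 = [[x + rng.gauss(0.0, 0.1) for x in row] for row in base]
    view2 = [[x + rng.gauss(0.0, 0.1) for x in row] for row in base]
    z1 = inject_noise(normalize_batch(view1), sigma, rng)
    z2 = inject_noise(normalize_batch(view2), sigma, rng)
    loss = info_nce(z1, z2)
    capacity = d / 2 * math.log(1 + 1 / sigma**2)
    print(f"log(n) - InfoNCE = {math.log(n) - loss:.3f} nats; capacity = {capacity:.3f} nats")
```

The quantity `log(n) - loss` is the usual InfoNCE lower-bound estimate of mutual information; it can never exceed `log(n)`, and with unit-power inputs the noise keeps the true mutual information below the Gaussian capacity $\frac{d}{2}\log(1 + 1/\sigma^2)$, which is what ties the noise level to the DM behavior.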

Paper Structure

This paper contains 24 sections, 6 theorems, 57 equations, 7 figures, 3 tables.

Key Result

Lemma B.1

Assume the conditions of the Gaussian distribution matching theorem are satisfied. Then, for Gaussian distribution matching, $I(f(X'); f(X)+Z) \le \frac{d}{2} \log\left(1 + \frac{1}{\sigma^2}\right)$, with equality holding exactly when $f$ is weakly invariant and $f(X) \sim \mathcal{N}(0, \mathrm{I})$.
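
The Gaussian bound $\frac{d}{2}\log(1 + 1/\sigma^2)$ is the capacity of an additive Gaussian noise channel under a unit per-dimension power constraint, and the requirement that $f(X)$ be Gaussian for equality can be checked numerically in one dimension. The sketch below is an illustration of that classical channel fact, not the paper's proof, and it ignores the weak-invariance part of the condition: it compares the closed-form value for a standard normal input with $I(U; U+Z)$ for a binary input $U = \pm 1$ of the same power, where $Z \sim \mathcal{N}(0, \sigma^2)$, computing the latter as $h(U+Z) - h(Z)$ by numerical integration. It also inverts the capacity formula to choose $\sigma$ for a target capacity, which is one plausible way (an assumption) to parameterize a capacity sweep like the one in Figure 1. All function names are hypothetical.

```python
import math


def gaussian_capacity(d: int, sigma: float) -> float:
    """C = d/2 * log(1 + 1/sigma^2), in nats."""
    return 0.5 * d * math.log1p(1.0 / sigma**2)


def sigma_for_capacity(d: int, capacity: float) -> float:
    """Invert C = d/2 * log(1 + 1/sigma^2) for the noise scale sigma."""
    if capacity <= 0.0:
        raise ValueError("capacity must be positive")
    return 1.0 / math.sqrt(math.expm1(2.0 * capacity / d))


def normal_pdf(y: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def binary_input_mi(sigma: float, steps: int = 20_000) -> float:
    """I(U; U + Z) in nats for U uniform on {-1, +1} and Z ~ N(0, sigma^2),
    computed as h(U + Z) - h(Z) with a midpoint rule over +-10 sigma."""
    lo, hi = -1.0 - 10.0 * sigma, 1.0 + 10.0 * sigma
    dy = (hi - lo) / steps
    h_y = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * dy
        p = 0.5 * (normal_pdf(y, -1.0, sigma) + normal_pdf(y, 1.0, sigma))
        if p > 0.0:
            h_y -= p * math.log(p) * dy
    h_z = 0.5 * math.log(2.0 * math.pi * math.e * sigma**2)
    return h_y - h_z


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 2.0):
        gauss = gaussian_capacity(1, sigma)
        binary = binary_input_mi(sigma)
        print(f"sigma={sigma}: Gaussian input {gauss:.4f} nats, binary input {binary:.4f} nats")
    print(f"sigma giving C = 2 nats at d = 2: {sigma_for_capacity(2, 2.0):.4f}")
```

For every $\sigma$ the binary input stays strictly below the Gaussian value, consistent with the bound being attained only by a Gaussian $f(X)$.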

Figures (7)

  • Figure 1: Results for the MNIST dataset in the Gaussian DM setup for $d = 2$ with varying capacity $C = \frac{d}{2} \log \left( 1 + 1/\sigma^2 \right)$, measured in nats (units of information based on natural logarithms). The dotted line denotes the minimal capacity required to preserve the information about the class labels in $f(X) + Z$. The dashed line represents the theoretical upper bound on the mutual information. We run $5$ experiments for each point and report mean values and $99\%$ asymptotic confidence intervals. The InfoNCE loss is used to approximate the Donsker-Varadhan representation of the mutual information.
  • Figure 2: Visualization of two-dimensional representations of the MNIST handwritten digits dataset.
  • Figure 3: Results of conditional generation.
  • Figure 4: Results of unconditional generation.
  • Figure 5: Visualization of 2D representations of the CIFAR-10 dataset, lr=0.1, 400 epochs.
  • ...and 2 more figures
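
Figure 1 reports means over $5$ runs with $99\%$ asymptotic confidence intervals. A common reading of "asymptotic" is the normal-approximation interval $\bar{x} \pm z_{0.995}\, s / \sqrt{n}$; the sketch below assumes that construction (the summary does not state the exact formula), and `asymptotic_ci` is a hypothetical helper. The per-run values in the usage block are placeholders, not results from the paper.

```python
from statistics import NormalDist, fmean, stdev
from typing import Sequence, Tuple


def asymptotic_ci(values: Sequence[float], level: float = 0.99) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) of a normal-approximation confidence interval."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two runs to estimate the spread")
    mean = fmean(values)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 2.576 for level = 0.99
    half_width = z * stdev(values) / n ** 0.5  # stdev uses the n - 1 denominator
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    # Placeholder per-run scores (hypothetical), one per training seed.
    runs = [0.91, 0.93, 0.90, 0.92, 0.94]
    mean, lo, hi = asymptotic_ci(runs)
    print(f"mean={mean:.4f}, 99% CI=[{lo:.4f}, {hi:.4f}]")
```

With only five runs, a Student-t interval (quantile about 4.604 for 4 degrees of freedom at the 99% level) would be noticeably wider than this normal approximation.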

Theorems & Definitions (16)

  • Remark 2.1
  • Remark 2.2
  • Definition 3.1
  • Remark 3.2 (Butakov et al., 2024)
  • Lemma B.1
  • Proof
  • Lemma B.2
  • Proof
  • Corollary B.3
  • Proof
  • ...and 6 more