Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax
Ivan Butakov, Alexander Semenenko, Alexander Tolmachev, Andrey Gladkov, Marina Munkhoeva, Alexey Frolov
TL;DR
The paper tackles the challenge of learning representations that conform to a specified latent distribution, which benefits generative modeling, disentanglement, and outlier detection. It introduces a simple, non-intrusive distribution-matching mechanism for Deep InfoMax by injecting independent noise $Z$ into the encoder outputs and retaining the DIM objective, enabling $f(X)$ to converge to $\mathcal{N}(0,I)$ or $\mathrm{U}([0,1]^d)$ distributions. The authors provide rigorous information-theoretic results: in the Gaussian DM setting, $I(f(X'); f(X)+Z) \le \frac{d}{2} \log(1+\frac{1}{\sigma^2})$ with equality under ideal conditions, and in the Uniform DM setting, $I(f(X'); f(X)+Z) \le d \log(1+\frac{1}{2\varepsilon})$ with a corresponding equality scenario, plus generalization via probability integral transform and normalizing flows. Experiments on MNIST, CIFAR, and ImageNet demonstrate a controllable trade-off between distribution matching quality and downstream task performance, and show DM embeddings can condition generative models like cGANs. Overall, the method achieves DM without extra networks and provides theoretical guarantees, enabling practical distribution-controlled SSRL with broad applicability.
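To get a feel for the two upper bounds quoted above, the following minimal sketch evaluates them numerically. The function names are hypothetical, and the natural logarithm (so the result is in nats) is an assumption, since the summary does not fix a log base. Reading $\sigma$ as the standard deviation of the Gaussian noise and $\varepsilon$ as the half-width of the uniform noise is also an assumption, as neither symbol is defined in the summary.

```python
import math


def gaussian_dm_bound(d: int, sigma: float) -> float:
    """Evaluate (d / 2) * log(1 + 1 / sigma^2), in nats (assumed log base e)."""
    if d <= 0 or sigma <= 0:
        raise ValueError("d and sigma must be positive")
    return 0.5 * d * math.log1p(1.0 / sigma**2)


def uniform_dm_bound(d: int, eps: float) -> float:
    """Evaluate d * log(1 + 1 / (2 * eps)), in nats (assumed log base e)."""
    if d <= 0 or eps <= 0:
        raise ValueError("d and eps must be positive")
    return d * math.log1p(1.0 / (2.0 * eps))


if __name__ == "__main__":
    # Hypothetical settings, chosen only to show how the bounds scale.
    for d in (2, 16):
        for scale in (0.1, 0.5, 1.0):
            g = gaussian_dm_bound(d, sigma=scale)
            u = uniform_dm_bound(d, eps=scale)
            print(f"d={d:2d} scale={scale:.1f} gaussian={g:.3f} uniform={u:.3f}")
```

Both bounds grow linearly in the dimension $d$ and increase as the noise scale shrinks. For instance, with $d = 2$ and $\sigma = 1$ the Gaussian expression reduces to $\log 2 \approx 0.693$ nats.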
Abstract
Deep InfoMax (DIM) is a well-established method for self-supervised representation learning (SSRL) based on maximizing the mutual information between the input and the output of a deep neural network encoder. Although DIM, and contrastive SSRL in general, are well explored, the task of learning representations that conform to a specific distribution (i.e., distribution matching, DM) remains under-addressed. Motivated by the importance of DM for several downstream tasks (including generative modeling, disentanglement, and outlier detection, among others), we enhance DIM to enable automatic matching of learned representations to a selected prior distribution. To achieve this, we propose injecting independent noise into the normalized outputs of the encoder while keeping the same InfoMax training objective. We show that this modification allows for learning uniformly and normally distributed representations, as well as representations following other absolutely continuous distributions. Our approach is tested on various downstream tasks. The results indicate a moderate trade-off between downstream task performance and the quality of DM.
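The noise-injection step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names are hypothetical, the encoder is replaced by an array of precomputed outputs, and both normalization choices (per-coordinate standardization over the batch for the Gaussian case, a logistic squashing into $[0,1]^d$ for the uniform case) are assumptions made only to keep the example self-contained. The InfoMax objective itself is omitted; in training it would be computed between the clean and the noisy representations.

```python
import numpy as np


def inject_gaussian_noise(reps: np.ndarray, sigma: float,
                          rng: np.random.Generator) -> np.ndarray:
    """Standardize each coordinate over the batch, then add N(0, sigma^2 I) noise.

    Assumption: batch standardization stands in for the unspecified
    normalization of encoder outputs.
    """
    mean = reps.mean(axis=0, keepdims=True)
    std = reps.std(axis=0, keepdims=True) + 1e-8  # guard against zero variance
    normalized = (reps - mean) / std
    return normalized + sigma * rng.standard_normal(reps.shape)


def inject_uniform_noise(reps: np.ndarray, eps: float,
                         rng: np.random.Generator) -> np.ndarray:
    """Squash outputs into [0, 1]^d, then add U([-eps, eps]^d) noise.

    Assumption: a logistic function stands in for the unspecified
    normalization of encoder outputs.
    """
    squashed = 1.0 / (1.0 + np.exp(-reps))
    return squashed + rng.uniform(-eps, eps, size=reps.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for encoder outputs f(X): 4096 samples, 8 dimensions.
    fake_outputs = 3.0 * rng.standard_normal((4096, 8)) + 1.0
    noisy_gauss = inject_gaussian_noise(fake_outputs, sigma=0.5, rng=rng)
    noisy_unif = inject_uniform_noise(fake_outputs, eps=0.1, rng=rng)
    print(noisy_gauss.shape, noisy_unif.shape)
```

Because the noise is drawn independently of the input, the only trainable part remains the encoder, which is consistent with the abstract's description of keeping the training objective unchanged.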
