Neural Entropy
Akhil Premkumar
TL;DR
The paper defines neural entropy $S_{\rm NN}$ as the information stored by a diffusion model about its data distribution, tying it to the total entropy $S_{\rm tot}$ produced during forward diffusion and to the difficulty of reconstructing $p_{\rm d}$ from $p_{\rm eq}$. It introduces entropy-matching and quasi-invariant distributions $p_{\rm eq}^{(t)}$, deriving bounds that connect $S_{\rm tot}$, $S_{\rm NN}$, and KL divergences, thereby providing a thermodynamic interpretation of diffusion learning. Through experiments on Gaussian mixtures and image datasets (MNIST, CIFAR-10), it shows that $S_{\rm NN}$ grows slowly with the number of samples, often logarithmically, implying that diffusion models compress ensemble statistics efficiently rather than memorizing every example. The framework also yields a thermodynamic speed limit linking entropy production to diffusion speed and the Wasserstein distance between distributions, offering insights into how forward-process design influences learning efficiency and sample quality.
Abstract
We explore the connection between deep learning and information theory through the paradigm of diffusion models. A diffusion model converts noise into structured data by reinstating, imperfectly, information that is erased when data is diffused to noise. This information is stored in a neural network during training. We quantify this information by introducing a measure called neural entropy, which is related to the total entropy produced by diffusion. Neural entropy is a function of not just the data distribution, but also the diffusive process itself. Measurements of neural entropy on a few simple image diffusion models reveal that they are extremely efficient at compressing large ensembles of structured data.
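The idea that forward diffusion erases information can be illustrated with a toy calculation. The sketch below is not the paper's definition of neural entropy. It assumes, purely for illustration, a one-dimensional Gaussian data distribution and an Ornstein-Uhlenbeck forward process $dx = -x\,dt + \sqrt{2}\,dW$, whose equilibrium distribution is the standard normal. The function names (`ou_moments`, `kl_to_standard_normal`, `kl_trajectory`) and the starting values are hypothetical. Under those assumptions, the KL divergence between the diffused distribution and equilibrium has a closed form and decays toward zero as the data is noised.

```python
import math


def ou_moments(m0, v0, t):
    """Mean and variance at time t of dx = -x dt + sqrt(2) dW,
    started from a Gaussian N(m0, v0). Assumed toy forward process."""
    decay = math.exp(-t)
    mean = m0 * decay
    var = v0 * decay**2 + (1.0 - decay**2)
    return mean, var


def kl_to_standard_normal(mean, var):
    """KL( N(mean, var) || N(0, 1) ) in nats (closed form for Gaussians)."""
    return 0.5 * (var + mean**2 - 1.0 - math.log(var))


def kl_trajectory(m0, v0, times):
    """KL divergence to equilibrium at each requested time."""
    return [kl_to_standard_normal(*ou_moments(m0, v0, t)) for t in times]


# Hypothetical "data distribution": a narrow Gaussian far from equilibrium.
times = [0.0, 0.5, 1.0, 2.0, 4.0]
kls = kl_trajectory(m0=3.0, v0=0.25, times=times)

for t, kl in zip(times, kls):
    print(f"t = {t:3.1f}   KL to equilibrium = {kl:.4f} nats")
```

In this toy setting the divergence shrinks monotonically with diffusion time, which is one simple way to picture the information that a reverse process would have to reinstate. How this quantity relates to $S_{\rm tot}$ and $S_{\rm NN}$ is specified by the paper's own definitions, not by this sketch.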
