Neural Entropy

Akhil Premkumar

TL;DR

The paper defines neural entropy ${S_{\rm NN}}$ as the information stored by a diffusion model about its data distribution, tying it to the total entropy ${S_{\rm tot}}$ produced during forward diffusion and to the difficulty of reconstructing ${p}_{\rm d}$ from ${p}_{eq}$. It introduces entropy-matching and quasi-invariant distributions ${p_{eq}^{(t)}}$, deriving bounds that connect ${S_{\rm tot}}$, ${S_{\rm NN}}$, and KL divergences, thereby providing a thermodynamic interpretation of diffusion learning. Through experiments on Gaussian mixtures and image datasets (MNIST, CIFAR-10), it shows that ${S_{\rm NN}}$ grows slowly with the number of samples, often logarithmically, implying that diffusion models compress ensemble statistics efficiently rather than memorizing every example. The framework also yields a thermodynamic speed limit linking entropy production to diffusion speed and the Wasserstein distance between distributions, offering insights into how forward-process design influences learning efficiency and sample quality.

Abstract

We explore the connection between deep learning and information theory through the paradigm of diffusion models. A diffusion model converts noise into structured data by reinstating, imperfectly, information that is erased when data is diffused to noise. This information is stored in a neural network during training. We quantify this information by introducing a measure called neural entropy, which is related to the total entropy produced by diffusion. Neural entropy is a function of not just the data distribution, but also the diffusive process itself. Measurements of neural entropy on a few simple image diffusion models reveal that they are extremely efficient at compressing large ensembles of structured data.

Paper Structure (29 sections, 98 equations, 18 figures)

This paper contains 29 sections, 98 equations, 18 figures.

Introduction
Schrödinger's Gedankenexperiment
Diffusion models and Maxwell's demon
Entropy matching
Thermodynamic uncertainty
Experiments
Transport experiments
Storage experiments
Conclusion
Limitations
Related work
Random walk on a lattice
Reversal
Entropy production
Stochastic control
...and 14 more sections

Figures (18)

Figure 1: Neural entropy vs. number of samples for two image diffusion models.
Figure 2: Entropy production rate and total entropy as ${p}_{\rm d}$ is diffused to ${p}_{0}$ by the VPx and SL processes from \ref{eq:FwdVPx} and \ref{eq:FwdSLDM} respectively. The dashed lines are the ideal curves for $\dot{S}_{\rm tot}$ and $S_{\rm tot}$, while the solid lines are $\dot{S}_{\rm NN}$ and $S_{\rm NN}$ at the end of the $n_{\rm ep}$-th training epoch.
Figure 3: The evolution of neural entropy, cross-entropy, and loss over training epochs for an unconditional image diffusion model (VP) trained on the MNIST dataset. The different colors correspond to models trained on $n_c$ samples per class; $n_c=6000$ means the model was trained on the entire dataset. The growth in neural entropy with the number of samples is nearly logarithmic. The values of $S_{\rm NN}(T)$ at the end of training are shown in \ref{fig:SNNvsN_All_VP}.
Figure 4: Neural entropy vs. number of samples for a diffusion model with an MLP trained on Gaussian mixtures.
Figure 5: Time variables in the forward (top) and reverse (bottom) directions.
...and 13 more figures
