Neural Entropy

Akhil Premkumar

TL;DR

The paper defines neural entropy $S_{\rm NN}$ as the information stored by a diffusion model about its data distribution, tying it to the total entropy $S_{\rm tot}$ produced during forward diffusion and to the difficulty of reconstructing $p_{\rm d}$ from $p_{\rm eq}$. It introduces entropy-matching and quasi-invariant distributions $p_{\rm eq}^{(t)}$, deriving bounds that connect $S_{\rm tot}$, $S_{\rm NN}$, and KL divergences, thereby providing a thermodynamic interpretation of diffusion learning. Through experiments on Gaussian mixtures and image datasets (MNIST, CIFAR-10), it shows that $S_{\rm NN}$ grows slowly with the number of samples, often logarithmically, implying that diffusion models compress ensemble statistics efficiently rather than memorizing every example. The framework also yields a thermodynamic speed limit linking entropy production to diffusion speed and the Wasserstein distance between distributions, offering insights into how forward-process design influences learning efficiency and sample quality.

Abstract

We explore the connection between deep learning and information theory through the paradigm of diffusion models. A diffusion model converts noise into structured data by reinstating, imperfectly, information that was erased when the data was diffused to noise. This information is stored in a neural network during training. We quantify this information by introducing a measure called neural entropy, which is related to the total entropy produced by diffusion. Neural entropy is a function of not just the data distribution, but also the diffusive process itself. Measurements of neural entropy on a few simple image diffusion models reveal that they are extremely efficient at compressing large ensembles of structured data.
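To make the idea of information being erased by forward diffusion concrete, here is a minimal sketch. It is not taken from the paper. It assumes a one-dimensional Gaussian data distribution and a variance-preserving forward process $dx = -\tfrac{1}{2}\beta x\,dt + \sqrt{\beta}\,dW$ with constant $\beta$, whose equilibrium is the standard normal. It tracks the KL divergence from the diffused distribution to that equilibrium, which is used here only as a stand-in for how much distinguishable structure remains; whether this quantity coincides with the paper's $S_{\rm tot}$ or $S_{\rm NN}$ is not claimed. The function names and the parameter values are hypothetical.

```python
import math

def vp_moments(mean0, var0, beta, t):
    """Mean and variance at time t of dx = -0.5*beta*x dt + sqrt(beta) dW,
    started from N(mean0, var0). Constant beta is an assumption of this sketch."""
    decay = math.exp(-beta * t)
    mean = mean0 * math.sqrt(decay)      # mean decays as exp(-beta*t/2)
    var = 1.0 + (var0 - 1.0) * decay     # variance relaxes towards 1
    return mean, var

def kl_to_standard_normal(mean, var):
    """KL( N(mean, var) || N(0, 1) ) in nats."""
    return 0.5 * (var + mean ** 2 - 1.0 - math.log(var))

# Illustrative (hypothetical) data distribution N(2, 0.25) and beta = 1.
times = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
kls = [kl_to_standard_normal(*vp_moments(2.0, 0.25, 1.0, t)) for t in times]

for t, kl in zip(times, kls):
    print(f"t={t:4.1f}  KL to equilibrium = {kl:.4f} nats")
```

The printed divergence shrinks monotonically towards zero as the distribution relaxes to noise. In this toy picture, that lost structure is what a reverse process would have to put back.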

Paper Structure (29 sections, 98 equations, 18 figures)

Figures (18)

  • Figure 1: Neural entropy vs. number of samples for two image diffusion models.
  • Figure 2: Entropy production rate and total entropy as $p_{\rm d}$ is diffused to $p_0$ by the VPx and SL processes from \ref{eq:FwdVPx} and \ref{eq:FwdSLDM}, respectively. The dashed lines are the ideal curves for $\dot{S}_{\rm tot}$ and $S_{\rm tot}$, while the solid lines are $\dot{S}_{\rm NN}$ and $S_{\rm NN}$ at the end of the $n_{\rm ep}$-th training epoch.
  • Figure 3: The evolution of neural entropy, cross-entropy, and loss over training epochs for an unconditional image diffusion model (VP) trained on the MNIST dataset. The different colors correspond to models trained on $n_c$ samples per class; $n_c=6000$ means the model was trained on the entire dataset. The growth in neural entropy with the number of samples is nearly logarithmic. The values of $S_{\rm NN}(T)$ at the end of training are shown in \ref{fig:SNNvsN_All_VP}.
  • Figure 4: Neural entropy vs. number of samples for a diffusion model with an MLP trained on Gaussian mixtures.
  • Figure 5: Time variables in the forward (top) and reverse (bottom) directions.
  • ...and 13 more figures