Mean-Field Microcanonical Gradient Descent

Marcus Häggbom, Morten Karlsmark, Joakim Andén

TL;DR

The paper addresses entropy collapse in microcanonical gradient descent for high-dimensional energy-based sampling by proposing a mean-field variant (MF--MGDM) that updates a batch of samples via the batch mean energy. It provides a theoretical bound on the entropy loss that tightens as the batch size increases, and demonstrates empirically that MF--MGDM preserves entropy better than MGDM while maintaining competitive likelihood on synthetic AR$(p)$ and CIR processes and on real financial time series (e.g., S&P 500, yields). The approach combines micro- and macrocanonical concepts with efficient gradient-based sampling, yielding improved KL trade-offs and more stable entropy dynamics. Limitations include the stationarity assumption, the requirement that the energy function be differentiable, and the need for further exploration of forward KL and richer energy features.
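The difference between the per-sample update and the batch-mean update can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the energy $\Phi$ is taken to be the circular lag-0 and lag-1 autocovariance, the objective is the squared distance to the target energy $\alpha$, and the names `energy`, `descend`, `lr`, and `mean_field` are hypothetical. The target $\alpha = (1, 0.1)$ corresponds to the stationary moments of an AR(1) process with $\varphi = 0.1$ and $\sigma^2 = 0.99$.

```python
import numpy as np

def energy(x):
    """Assumed energy Phi(x): circular lag-0 and lag-1 autocovariance per sample."""
    lag0 = np.mean(x * x, axis=-1)
    lag1 = np.mean(x * np.roll(x, -1, axis=-1), axis=-1)
    return np.stack([lag0, lag1], axis=-1)  # shape (N, 2)

def descend(x, alpha, steps=300, lr=0.1, mean_field=False):
    """Gradient descent on 0.5 * ||residual||^2 in energy space.

    mean_field=False: each sample uses its own residual Phi(x_i) - alpha.
    mean_field=True:  all samples share the residual mean_i Phi(x_i) - alpha;
                      the 1/N factor of the exact gradient is absorbed into lr.
    """
    d = x.shape[-1]
    for _ in range(steps):
        r = energy(x) - alpha                      # per-sample residuals, (N, 2)
        if mean_field:
            r = np.broadcast_to(r.mean(axis=0), r.shape)
        # Analytic gradients of the two energy components with respect to x.
        g0 = 2.0 * x / d
        g1 = (np.roll(x, -1, axis=-1) + np.roll(x, 1, axis=-1)) / d
        grad = r[:, :1] * g0 + r[:, 1:] * g1
        x = x - lr * d * grad                      # gradients are O(1/d), so rescale by d
    return x

rng = np.random.default_rng(0)
alpha = np.array([1.0, 0.1])            # AR(1) moments: variance 1, lag-1 covariance 0.1
x0 = rng.standard_normal((64, 256))     # N = 64 white-noise samples of length d = 256

e_mgdm = energy(descend(x0, alpha))
e_mf = energy(descend(x0, alpha, mean_field=True))

print("per-sample energy spread, MGDM-style:   ", e_mgdm.std(axis=0))
print("per-sample energy spread, mean-field:   ", e_mf.std(axis=0))
print("batch-mean energy, mean-field:          ", e_mf.mean(axis=0))
```

In this toy setting the per-sample variant drives every sample's energy onto $\alpha$, so the spread of energies across the batch collapses, while the mean-field variant matches $\alpha$ only on average and keeps the per-sample spread inherited from the white-noise initialization. This is only a caricature of the entropy behavior described above, not a reproduction of the paper's experiments.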

Abstract

Microcanonical gradient descent is a sampling procedure for energy-based models allowing for efficient sampling of distributions in high dimension. It works by transporting samples from a high-entropy distribution, such as Gaussian white noise, to a low-energy region using gradient descent. We put this model in the framework of normalizing flows, showing how it can often overfit by losing an unnecessary amount of entropy in the descent. As a remedy, we propose a mean-field microcanonical gradient descent that samples several weakly coupled data points simultaneously, allowing for better control of the entropy loss while paying little in terms of likelihood fit. We study these models in the context of financial time series, illustrating the improvements on both synthetic and real data.
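The link between a gradient step and entropy loss can be made concrete with the standard change-of-variables identity for invertible maps. The notation below is an assumption for illustration and is not taken from the paper: $L$ denotes a descent objective of the form $\tfrac{1}{2}\lVert \Phi(x) - \alpha \rVert^2$, $\gamma$ a step size small enough that the step map $g$ is a diffeomorphism, and $g_{\#} q$ the pushforward of the current sample distribution $q$.

```latex
\begin{align*}
g(x) &= x - \gamma \nabla L(x), \\
H(g_{\#} q) &= H(q) + \mathbb{E}_{x \sim q}\!\left[ \log \left| \det\!\left( I - \gamma \nabla^2 L(x) \right) \right| \right].
\end{align*}
```

Under these assumptions, directions in which $\nabla^2 L(x)$ has positive curvature contribute factors below one to the determinant, so each step in such a region lowers the entropy. Accumulated over many steps, this is one way to read the entropy loss described in the abstract.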

Paper Structure (21 sections, 1 theorem, 36 equations, 11 figures, 1 table)


Key Result

Theorem 4.1

Assume $\Phi \in \bm{\mathrm{C}}^2$, with $\beta$ and $\eta$ denoting the Lipschitz constants of $\Phi$ and $\nabla \Phi$, respectively. Denote $\overline{q}_T^N$ as the distribution of the MF--MGDM model with $N$ particles after $T$ iterations. Then the entropy rate $(Nd)^{-1}H(\overline{q}_T^N)$ a

Figures (11)

  • Figure 1: Densities of $\Phi(X)$, using fitted 2D Gaussians, at different stages of the descent for MGDM and MF--MGDM. In (b) and (c), $T$ is the respective optimal number of steps to minimize KL divergence. The true distribution $p$ is an AR(1) process with $\varphi=0.1$ and $\sigma^2 = 0.99$.
  • Figure 2: Reverse KL divergence for the AR(1) example. The negative entropy and expected log-likelihood are plotted on the left-hand side, and the divergence on the right.
  • Figure 3: Illustration of $\Phi$-pushforward measures of the true distribution in blue centered close to the target energy $\alpha$, and the approximation in orange. In the regular MGDM, each particle individually seeks to minimize its distance to the origin in energy space, potentially causing a collapse; in the mean-field version, the particles move approximately in parallel.
  • Figure 4: Reverse KL divergence through gradient descent with respect to the true model AR(1) for MF--MGDM with different mean-field batch sizes $N$, and with Monte Carlo sample size 128.
  • Figure 5: Reverse KL divergence (top), negative entropy (bottom, solid) and log-likelihood (bottom, dashed) through the descent. Blue is regular MGDM and orange is MF--MGDM. The energy function used for each distribution is the corresponding optimal energy function according to Table \ref{tab:min-kl}, i.e., (a) and (d) use ACF while (b) and (c) use scattering spectra. $N=128$.
  • ...and 6 more figures

Theorems & Definitions (1)

  • Theorem 4.1