Uniform-in-time convergence bounds for Persistent Contrastive Divergence Algorithms

Paul Felix Valsecchi Oliva, O. Deniz Akyildiz, Andrew Duncan

TL;DR

This work recasts persistent contrastive divergence (PCD) for energy-based models as a continuous-time, two-time-scale Langevin diffusion, which enables uniform-in-time (UiT) bounds between the PCD iterates and the maximum likelihood solution via averaging. A Poisson-equation-based corrector and a rigorous averaging framework connect the slow parameter dynamics to an averaged Langevin gradient flow, with explicit weak-error and UiT bounds. The authors develop and analyse two discretisations, Euler-Maruyama and S-ROCK, provide finite-time and UiT error guarantees for them, and show that the S-ROCK scheme stabilises training in stiff regimes. Experiments on synthetic data and MNIST demonstrate improved stability and sample quality with SPCD/S-ROCK relative to conventional PCD.

Abstract

We propose a continuous-time formulation of persistent contrastive divergence (PCD) for maximum likelihood estimation (MLE) of unnormalised densities. Our approach expresses PCD as a coupled, multiscale system of stochastic differential equations (SDEs), which simultaneously optimise the parameter and sample from the associated parametrised density. From this novel formulation, we derive explicit bounds for the error between the PCD iterates and the MLE solution for the model parameter. This is made possible by deriving uniform-in-time (UiT) bounds for the difference in moments between the multiscale system and the averaged regime. An efficient implementation of the continuous-time scheme is introduced, leveraging a class of explicit, stable integrators, stochastic orthogonal Runge-Kutta-Chebyshev (S-ROCK) methods, for which we provide explicit error estimates in the long-time regime. This leads to a novel method for training energy-based models (EBMs) with explicit error guarantees.
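
To make the coupled slow-fast structure concrete, the following is a minimal sketch of an Euler-Maruyama discretisation on a toy problem. It is not the authors' implementation. Everything specific here is an assumption chosen for illustration: the quadratic energy $E(\theta, x) = (x-\theta)^2/2$ (so the model is $N(\theta, 1)$ and the MLE is the sample mean), the function name `pcd_euler_maruyama`, the parameter values, and the choice to drive the parameter with a noise-free drift estimated from a finite set of persistent particles.

```python
import math
import random


def pcd_euler_maruyama(data, eps=0.1, delta=0.01, n_steps=3000,
                       n_particles=64, seed=0):
    """Toy slow-fast PCD with E(theta, x) = (x - theta)^2 / 2 (illustrative only).

    Slow variable: theta follows the estimated likelihood gradient.
    Fast variable: persistent particles follow Langevin dynamics sped up by 1/eps.
    """
    rng = random.Random(seed)
    data_mean = sum(data) / len(data)
    theta = 0.0
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    noise_scale = math.sqrt(2.0 * delta / eps)

    for _ in range(n_steps):
        particle_mean = sum(particles) / n_particles
        # grad_theta E = theta - x, so the gradient estimate
        # mean_data(theta - y) - mean_particles(theta - x)
        # reduces to particle_mean - data_mean for this energy.
        grad = particle_mean - data_mean
        # Fast Langevin step with grad_x E = x - theta, using the current theta.
        particles = [
            x - (delta / eps) * (x - theta) + noise_scale * rng.gauss(0.0, 1.0)
            for x in particles
        ]
        # Slow parameter step.
        theta -= delta * grad
    return theta, data_mean


# Usage: synthetic data from N(2, 1); theta_hat should land near the sample mean.
data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(200)]
theta_hat, mle = pcd_euler_maruyama(data)
```

In this toy model the particle update multiplies each particle by $1 - \delta/\varepsilon$, so the explicit scheme is only stable when $\delta/\varepsilon < 2$. This is the kind of step-size restriction for small $\varepsilon$ that motivates stabilised integrators such as S-ROCK.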

Paper Structure

This paper contains 15 sections, 23 theorems, 205 equations, 3 figures.

Key Result

Lemma 4.1

Suppose that Assumptions ass:dissx, ass:dissav and ass:poly hold for the system eq:sde, which generates the semigroup $\widetilde{\mathcal{P}}$. Then $\Phi$, given by [...], is of polynomial order in both $\theta$ and $z$, and is the unique solution to the Poisson equation eq:poissoneq.

Figures (3)

  • Figure 1: The accuracy of the S-ROCK (red) and Euler-Maruyama (blue) integrators is compared over 50 simulations to highlight the greater stability of S-ROCK for small values of $\varepsilon$. Panel (a) uses the larger step-size $\delta=0.01$ and panel (b) the smaller step-size $\delta=0.001$; the latter has a larger stability region, in which the Euler-Maruyama integrator converges. For further details see the appendix.
  • Figure 2: Samples obtained by training the SPCD and PCD schemes for 60 epochs (details of the learning routine are given in the appendix). In the top row the algorithms are trained on images of the digit 1, whilst in the second row they are trained on images of the digit 4. The samples shown are chosen at random from those generated.
  • Figure 3: The model structure of $E(\theta, x)$, where the pyramids represent convolutions and the vectors represent fully connected linear layers. On the left we have a realisation of $x$ and on the right the scalar output of $E(\theta, x)$. We note that between convolutions we apply spectral normalisation and Swish activations (the Swish activation is given as $x\mapsto x\sigma(x)$, with $\sigma$ corresponding to the sigmoid activation). For the linear transformations we similarly normalise and apply Swish activations, except for the last layer.

Theorems & Definitions (46)

  • Remark 1
  • Example 1
  • Remark 2
  • Remark 3
  • Example 2
  • Lemma 4.1
  • proof
  • Lemma 4.2
  • proof
  • Lemma 4.3
  • ...and 36 more