Learning Energy-Based Models by Self-normalising the Likelihood

Hugo Senetaire, Paul Jeha, Pierre-Alexandre Mattei, Jes Frellsen

TL;DR

The paper tackles the challenge of training energy-based models with intractable normalisation constants by introducing the self-normalised log-likelihood (SNL). SNL adds a single learnable parameter $b$ so that maximising $\ell_{\mathrm{SNL}}(\theta,b)$ yields the maximum-likelihood solution and recovers $\log Z_{\theta}$, while enabling unbiased gradient estimates via a proposal distribution. The paper proves that SNL is concave in the model parameters for exponential families and offers an information-theoretic perspective linking SNL to a generalised KL divergence; the framework extends to regression via $b_{\phi}(x)$ and to VAEs through SNELBO. Empirically, SNL-based EBMs outperform traditional methods on density estimation and regression tasks, and SNELBO enables VAEs with latent-EBM priors to achieve improved objective scores, all while keeping training simple and stable.
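
The summary above does not write out the objective. As a sketch, one form consistent with its description (an assumption here, not quoted from the paper) takes an unnormalised model $e^{-E_\theta(x)}$ with $Z_\theta = \int e^{-E_\theta(x)}\,dx$ and applies the elementary bound $\log z \le z e^{-b} + b - 1$, which holds for all $z > 0$ and $b \in \mathbb{R}$ with equality at $b = \log z$:

```latex
\begin{align*}
\ell(\theta) &= -\frac{1}{n}\sum_{i=1}^{n} E_\theta(x_i) - \log Z_\theta, \\
\ell_{\mathrm{SNL}}(\theta, b) &= -\frac{1}{n}\sum_{i=1}^{n} E_\theta(x_i) - b - e^{-b} Z_\theta + 1, \\
\ell_{\mathrm{SNL}}(\theta, b) &\le \ell(\theta),
  \quad \text{with equality if and only if } b = \log Z_\theta .
\end{align*}
```

Under this assumed form, $Z_\theta$ enters linearly rather than through a logarithm, so writing $Z_\theta = \mathbb{E}_{x \sim q}\!\left[e^{-E_\theta(x)} / q(x)\right]$ for a proposal $q$ gives an unbiased Monte Carlo estimate of the objective and of its gradient, in line with the unbiased-gradient claim.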

Abstract

Training an energy-based model (EBM) with maximum likelihood is challenging due to the intractable normalisation constant. Traditional methods rely on expensive Markov chain Monte Carlo (MCMC) sampling to estimate the gradient of the logarithm of the normalisation constant. We propose a novel objective called self-normalised log-likelihood (SNL) that, compared to the regular log-likelihood, introduces a single additional learnable parameter representing the normalisation constant. SNL is a lower bound of the log-likelihood, and its optimum corresponds to both the maximum likelihood estimate of the model parameters and the normalisation constant. We show that the SNL objective is concave in the model parameters for exponential family distributions. Unlike the regular log-likelihood, the SNL can be directly optimised using stochastic gradient techniques by sampling from a crude proposal distribution. We validate the effectiveness of our proposed method on various density estimation tasks as well as EBMs for regression. Our results show that the proposed method, while simpler to implement and tune, outperforms existing techniques.
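
To make the "stochastic gradient techniques with a crude proposal" idea concrete, here is a minimal sketch rather than the authors' implementation. It assumes the objective has the form $-\frac{1}{n}\sum_i E_\theta(x_i) - b - e^{-b} Z_\theta + 1$, uses a hypothetical toy energy $E_\theta(x) = (x-\theta)^2/2$ (a unit-variance Gaussian with unknown mean), and estimates $Z_\theta$ by importance sampling from a wide Gaussian proposal. All function names, the step size, and the sample sizes are illustrative choices.

```python
import math
import random

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)

def energy(x, theta):
    """Assumed toy energy: unit-variance Gaussian with mean theta, unnormalised."""
    return 0.5 * (x - theta) ** 2

def proposal_logpdf(x, scale):
    """Log-density of the zero-mean Gaussian proposal q with standard deviation `scale`."""
    return -0.5 * (x / scale) ** 2 - math.log(scale) - LOG_SQRT_2PI

def snl_gradients(theta, b, data_mean, xs, scale):
    """Unbiased gradient estimates of the assumed SNL objective w.r.t. theta and b."""
    # Importance weights exp(-E(x)) / q(x); their average estimates Z_theta without bias.
    ws = [math.exp(-energy(x, theta) - proposal_logpdf(x, scale)) for x in xs]
    n = len(xs)
    z_hat = sum(ws) / n
    # For this energy, d/dtheta exp(-E(x)) = exp(-E(x)) * (x - theta).
    dz_hat = sum(w * (x - theta) for w, x in zip(ws, xs)) / n
    # The data term -mean(E(x_i)) has gradient mean(x_i) - theta for this energy.
    grad_theta = (data_mean - theta) - math.exp(-b) * dz_hat
    grad_b = -1.0 + math.exp(-b) * z_hat
    return grad_theta, grad_b

def fit_snl(data, steps=1500, lr=0.05, n_proposal=128, scale=3.0, seed=0):
    """Plain stochastic gradient ascent on (theta, b), with fresh proposal samples per step."""
    rng = random.Random(seed)
    data_mean = sum(data) / len(data)
    theta, b = 0.0, 0.0
    for _ in range(steps):
        xs = [rng.gauss(0.0, scale) for _ in range(n_proposal)]
        g_theta, g_b = snl_gradients(theta, b, data_mean, xs, scale)
        theta += lr * g_theta
        b += lr * g_b
    return theta, b

# Hypothetical synthetic data drawn from a Gaussian with mean 1.5 and unit variance.
data_rng = random.Random(1)
data = [data_rng.gauss(1.5, 1.0) for _ in range(2000)]
theta_hat, b_hat = fit_snl(data)
print("theta:", theta_hat, "sample mean:", sum(data) / len(data))
print("b:", b_hat, "log Z:", LOG_SQRT_2PI)
```

In this toy setting the maximum-likelihood estimate is the sample mean and $\log Z_\theta = \tfrac{1}{2}\log(2\pi)$, so the two printed pairs should roughly agree. No MCMC is involved: each step only needs independent draws from the proposal.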

Paper Structure

This paper contains 39 sections, 7 theorems, 59 equations, 4 figures, 15 tables, 2 algorithms.

Key Result

Lemma 2.1

For all $z>0$,

Figures (4)

  • Figure 1: The SNL for a Gaussian with unknown mean $\theta \in \mathbb{R}$ and unit variance. The SNL is a function of both $\theta$ and the additional parameter $b$, which estimates the normalising constant. The black line corresponds to maximising over $b$ for each given $\theta$, which exactly recovers the log-likelihood. The red star is the maximum of the log-likelihood, which is also the maximum of $\ell_\textrm{SNL}(\theta, b)$; see the section on the Gaussian case for details.
  • Figure 2: Performance evolution of the EBMs for regression trained on the Cell Count and UTKFaces datasets.
  • Figure 3: Visualisation of the two toy regression datasets.
  • Figure 4: Each row is a dataset. The first column displays samples from the dataset, the second column displays the energy function of an EBM trained with the self-normalised log-likelihood (ours), and the third column displays the energy function of an EBM trained with NCE. We use a standard Gaussian as the base distribution for both training methods. These parameterisations correspond to the first two lines of the table of small toy distribution estimation results.
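
The behaviour described in the Figure 1 caption can be checked numerically on a grid. The sketch below assumes the same Gaussian setting (energy $(x-\theta)^2/2$, for which the normaliser is exactly $\sqrt{2\pi}$ for every $\theta$) and assumes the objective has the form $-\frac{1}{n}\sum_i E_\theta(x_i) - b - e^{-b} Z_\theta + 1$. The five data points and the grid ranges are hypothetical.

```python
import math

Z = math.sqrt(2.0 * math.pi)  # exact normaliser of exp(-energy); it does not depend on theta

def energy(x, theta):
    """Energy of a unit-variance Gaussian with mean theta."""
    return 0.5 * (x - theta) ** 2

def mean_energy(theta, data):
    return sum(energy(x, theta) for x in data) / len(data)

def log_likelihood(theta, data):
    return -mean_energy(theta, data) - math.log(Z)

def snl(theta, b, data):
    """Assumed SNL form: the log-likelihood with log Z replaced by b + exp(-b) * Z - 1."""
    return -mean_energy(theta, data) - b - math.exp(-b) * Z + 1.0

data = [-0.3, 0.4, 1.1, 1.9, 2.4]          # hypothetical toy sample
thetas = [i / 10 for i in range(-10, 31)]  # theta grid on [-1, 3]
bs = [i / 100 for i in range(0, 201)]      # b grid on [0, 2]

# Largest gap between the log-likelihood and the SNL profiled over b (the "black line").
max_gap = max(
    log_likelihood(t, data) - max(snl(t, b, data) for b in bs) for t in thetas
)

# Joint maximiser of the SNL over the grid (the "red star").
best_theta, best_b = max(
    ((t, b) for t in thetas for b in bs),
    key=lambda tb: snl(tb[0], tb[1], data),
)

print("max gap:", max_gap)
print("argmax (theta, b):", best_theta, best_b)
print("sample mean:", sum(data) / len(data), "log Z:", math.log(Z))
```

The gap stays non-negative and shrinks as the $b$ grid is refined, and the joint maximiser lands on the grid points closest to the sample mean and to $\log Z_\theta$.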

Theorems & Definitions (11)

  • Lemma 2.1
  • Theorem 2.1
  • Theorem 3.1
  • Lemma A.1
  • proof
  • Theorem A.1
  • proof
  • Theorem A.1
  • proof
  • Theorem A.1
  • ...and 1 more