On the connection between Noise-Contrastive Estimation and Contrastive Divergence
Amanda Olmin, Jakob Lindqvist, Lennart Svensson, Fredrik Lindsten
TL;DR
The paper addresses the training of unnormalised probabilistic models by connecting noise-contrastive estimation (NCE) to two maximum likelihood (ML) approaches: ML with importance sampling (ML-IS) and contrastive divergence (CD). It shows that ranking NCE (RNCE) is equivalent to ML estimation with conditional importance sampling (CIS), and that both RNCE and conditional NCE (CNCE) are special cases of CD. This places NCE within the ML/CD framework, so that techniques from either literature can be applied to the other. A key practical insight is that the noise proposal should resemble the model distribution (q ≈ p_θ). Building on this, the authors propose adaptive proposals, persistent variants, and sequential Monte Carlo (SMC) extensions of RNCE and CNCE, supported by theoretical arguments. Experiments with autoregressive energy-based models (EBMs) on multiple datasets show gains from RNCE, MH-CNCE, persistence, and SMC-RNCE, suggesting that the approach is robust and scalable for unnormalised models.
Abstract
Noise-contrastive estimation (NCE) is a popular method for estimating unnormalised probabilistic models, such as energy-based models, which are effective for modelling complex data distributions. Unlike classical maximum likelihood (ML) estimation that relies on importance sampling (resulting in ML-IS) or MCMC (resulting in contrastive divergence, CD), NCE uses a proxy criterion to avoid the need for evaluating an often intractable normalisation constant. Despite apparent conceptual differences, we show that two NCE criteria, ranking NCE (RNCE) and conditional NCE (CNCE), can be viewed as ML estimation methods. Specifically, RNCE is equivalent to ML estimation combined with conditional importance sampling, and both RNCE and CNCE are special cases of CD. These findings bridge the gap between the two method classes and allow us to apply techniques from the ML-IS and CD literature to NCE, offering several advantageous extensions.
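To make the ranking criterion concrete, the following is a minimal sketch of an RNCE loss for a single data point, written in its commonly used form: the negative log-probability of picking the data point out of a set containing it and J noise samples, with weights w(x) = p̃_θ(x) / q(x). The function names (`rnce_loss`, `make_model`, `log_q`) and the toy Gaussian setup are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def rnce_loss(log_p_tilde, log_q, x_data, x_noise):
    """Ranking-NCE loss for one data point and J noise samples.

    log_p_tilde: unnormalised model log-density (hypothetical callable)
    log_q:       proposal log-density (hypothetical callable)
    """
    candidates = [x_data] + list(x_noise)
    # Log importance weights: log w(x) = log p~(x) - log q(x).
    log_w = [log_p_tilde(x) - log_q(x) for x in candidates]
    # Negative log-probability that the data point (index 0) is selected.
    return -(log_w[0] - log_sum_exp(log_w))


# Toy setup, assumed for illustration only:
# an unnormalised Gaussian model and a standard normal proposal.
def make_model(mu, offset=0.0):
    # 'offset' shifts the log-density, i.e. rescales the unnormalised model.
    return lambda x: -0.5 * (x - mu) ** 2 + offset


def log_q(x):
    return -0.5 * x ** 2 - 0.5 * math.log(2 * math.pi)


rng = random.Random(0)
x_data = 1.3
x_noise = [rng.gauss(0.0, 1.0) for _ in range(8)]

loss = rnce_loss(make_model(1.0), log_q, x_data, x_noise)
loss_shifted = rnce_loss(make_model(1.0, offset=5.0), log_q, x_data, x_noise)
```

In this sketch, `loss` and `loss_shifted` agree up to floating-point error: a constant added to the unnormalised log-density cancels in the ratio, which is why the criterion never needs the normalisation constant. The self-normalised weights inside the ratio are also what suggests the link to importance sampling described in the abstract.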
