Contrastive Bayesian Inference for Unnormalized Models

Naruki Sonobe, Shonosuke Sugasawa, Daichi Mochihashi, Takeru Matsuda

TL;DR

This work builds on noise contrastive estimation, which recasts inference as a binary classification problem between observed and noise samples, and treats the normalizing constant as an additional unknown parameter within the resulting likelihood.
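
Here is a minimal sketch of the resulting classification likelihood. It assumes the standard NCE setup, and the notation is ours, so it may differ from the paper's. $x_1,\ldots,x_n$ are the observations, $y_1,\ldots,y_m$ are noise draws from a known density $q$, and $\nu = m/n$. $\tilde p(u;\theta)$ is the unnormalized model, and $c$ is the free parameter standing in for $-\log Z(\theta)$.

$$
h(u;\theta,c) = \frac{\tilde p(u;\theta)\,e^{c}}{\tilde p(u;\theta)\,e^{c} + \nu\,q(u)} = \sigma\big(\log\tilde p(u;\theta) + c - \log q(u) - \log\nu\big),
\qquad
L(\theta,c) = \prod_{i=1}^{n} h(x_i;\theta,c)\prod_{j=1}^{m}\{1 - h(y_j;\theta,c)\},
$$

Here $\sigma(z) = 1/(1+e^{-z})$. For an exponential family, $\log\tilde p(u;\theta) = \theta^\top T(u)$. The log-odds is then linear in $(\theta, c)$ with the known offset $-\log q(u) - \log\nu$, so the classification likelihood is a logistic regression. Placing a prior on $(\theta, c)$ gives a posterior without evaluating $Z(\theta)$.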

Abstract

Unnormalized (or energy-based) models provide a flexible framework for capturing the characteristics of data with complex dependency structures. However, the application of standard Bayesian inference methods has been severely limited because the parameter-dependent normalizing constant is either analytically intractable or computationally prohibitive to evaluate. A promising approach is score-based generalized Bayesian inference, which avoids evaluating the normalizing constant by replacing the likelihood with a scoring rule. However, this approach requires careful tuning of the likelihood information, and it may fail to yield valid inference without appropriate control. To overcome this difficulty, we propose a fully Bayesian framework for inference on unnormalized models that does not require such tuning. We build on noise contrastive estimation, which recasts inference as a binary classification problem between observed and noise samples, and treat the normalizing constant as an additional unknown parameter within the resulting likelihood. For exponential families, the classification likelihood becomes conditionally Gaussian via Pólya-Gamma data augmentation, leading to a simple Gibbs sampler. We demonstrate the proposed approach through two models: time-varying density models for temporal point process data and sparse torus graph models for multivariate circular data. Simulation studies and real-data analyses show that the proposed method provides accurate point estimation and enables principled uncertainty quantification.

Paper Structure (24 sections, 19 equations, 8 figures, 7 tables)

Figures (8)

  • Figure 1: True time-varying density (upper), posterior mean of the time-varying density model fitted by NC-Bayes (middle), and time-wise KDE (lower) at four selected time points under Scenario 2.
  • Figure 2: Estimated spatial density functions for gun assault incident locations in Washington, DC, for four selected months ($t=1,5,9,12$) in 2022. The observed points (upper) and the posterior mean of the time-varying density model fitted by NC-Bayes (middle) are compared with the time-wise KDE (lower).
  • Figure 3: Edges detected by NC-Bayes. The true edge structure is a linear chain in which each node $j$ is connected to node $j+1$, for $j = 1, \ldots, d-1$.
  • Figure 4: Pairwise scatter plots for one selected channel from each of the four regions (CA3, DG, Sub, and PFC). Each axis shows the phase of the beta oscillation, ranging from $-\pi$ to $\pi$.
  • Figure 5: Inferred torus graph structure from the phase angle data using NC-Bayes.
  • ...and 3 more figures