BayesSum: Bayesian Quadrature in Discrete Spaces

Sophia Seulkee Kang, François-Xavier Briol, Toni Karvonen, Zonghao Chen

TL;DR

BayesSum extends Bayesian quadrature to discrete domains by placing a Gaussian process prior on the integrand f and deriving a Gaussian posterior for the intractable sum I = E[f(X)]. It achieves superior sample efficiency over traditional Monte Carlo baselines and provides finite-sample uncertainty, with practical variants for mixed discrete-continuous domains and active query strategies. The approach is validated on synthetic benchmarks and realistic unnormalized models (CMP, Potts), showing improved normalization-constant estimation and parameter learning with fewer function evaluations. The work also develops closed-form kernel mean embeddings for discrete distributions and introduces Stein BayesSum to handle cases lacking such embeddings, highlighting strong potential for discrete probabilistic numerics and complex Bayesian inference tasks.
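The Gaussian posterior mentioned above has the standard Bayesian quadrature form. As a sketch, assume a zero-mean prior $f \sim \mathcal{GP}(0, k)$, noise-free evaluations at points $x_1, \dots, x_N$, and a target distribution with probability mass function $p$ on $\mathcal{X}$. The symbols $m_N$, $\sigma_N^2$, $\mu$ and $K$ are notation chosen for this illustration and may differ from the paper's:

$$
\mu(x) = \sum_{y \in \mathcal{X}} k(x, y)\, p(y), \qquad K_{ij} = k(x_i, x_j),
$$

$$
I \mid f(x_{1:N}) \sim \mathcal{N}\big(m_N, \sigma_N^2\big), \qquad
m_N = \mu(x_{1:N})^\top K^{-1} f(x_{1:N}), \qquad
\sigma_N^2 = \sum_{x, y \in \mathcal{X}} k(x, y)\, p(x)\, p(y) - \mu(x_{1:N})^\top K^{-1} \mu(x_{1:N}).
$$

Here $\mu$ is the kernel mean embedding of $p$. The estimator is only practical when $\mu$ and the double sum are available in closed form, which is why the closed-form embeddings and the Stein variant matter.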

Abstract

This paper addresses the challenging computational problem of estimating intractable expectations over discrete domains. Existing approaches, including Monte Carlo and Russian Roulette estimators, are consistent but often require a large number of samples to achieve accurate results. We propose a novel estimator, BayesSum, which is an extension of Bayesian quadrature to discrete domains. It is more sample efficient than alternatives due to its ability to make use of prior information about the integrand through a Gaussian process. We show this through theory, deriving a convergence rate significantly faster than Monte Carlo in a broad range of settings. We also demonstrate empirically that our proposed method does indeed require fewer samples on several synthetic settings as well as for parameter estimation for Conway-Maxwell-Poisson and Potts models.
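To make the idea concrete, here is a minimal sketch of a Bayesian-quadrature-style estimator on a discrete domain. It is not the authors' implementation. It assumes a uniform target over $\{0, \dots, q-1\}^L$, a zero-mean Gaussian process prior, and an exponential Hamming kernel of the form $k(x, y) = \exp(-d_H(x, y)/\ell)$, whose exact parameterization in the paper may differ. The function names (`hamming_kernel`, `uniform_kernel_mean`, `bayes_sum_uniform`) and the test integrand are hypothetical. Under these assumptions the kernel factorizes over coordinates, so its mean under the uniform distribution is the constant $\big((1 + (q-1)e^{-1/\ell})/q\big)^L$.

```python
import itertools
import numpy as np

def hamming_kernel(X, Y, lengthscale=1.0):
    """Assumed kernel: exp(-d_H(x, y) / lengthscale), d_H = number of differing coordinates."""
    d = (X[:, None, :] != Y[None, :, :]).sum(axis=-1)
    return np.exp(-d / lengthscale)

def uniform_kernel_mean(L, q, lengthscale=1.0):
    """Closed-form kernel mean under the uniform distribution on {0..q-1}^L.

    The kernel is a product over coordinates, so the mean is a product of
    per-coordinate averages and does not depend on x.
    """
    a = np.exp(-1.0 / lengthscale)
    return ((1.0 + (q - 1) * a) / q) ** L

def bayes_sum_uniform(X, fX, q, lengthscale=1.0, jitter=1e-10):
    """Posterior mean and variance of E[f(X)] from distinct points X and values fX."""
    N, L = X.shape
    kmean = uniform_kernel_mean(L, q, lengthscale)
    K = hamming_kernel(X, X, lengthscale) + jitter * np.eye(N)
    mu = np.full(N, kmean)
    weights = np.linalg.solve(K, mu)   # quadrature weights K^{-1} mu
    mean = weights @ fX
    # The double sum of the kernel equals kmean because the kernel mean is constant.
    var = kmean - weights @ mu
    return mean, max(var, 0.0)

def f(X):
    """Hypothetical smooth integrand used only for illustration."""
    return np.exp(-X.sum(axis=1) / X.shape[1])

q, L = 3, 4
grid = np.array(list(itertools.product(range(q), repeat=L)))  # all q**L points
exact = f(grid).mean()                                        # ground truth by enumeration

rng = np.random.default_rng(0)
idx = rng.choice(len(grid), size=20, replace=False)           # non-repetitive samples
est, var = bayes_sum_uniform(grid[idx], f(grid[idx]), q)
print(f"exact={exact:.5f}  estimate={est:.5f}  posterior std={np.sqrt(var):.5f}")
```

The points are drawn without replacement because repeated points would make the kernel matrix singular. When every point of the domain is evaluated, the weights reduce to $1/q^L$ and the estimate matches the exact sum.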

Paper Structure

This paper contains 27 sections, 1 theorem, 26 equations, 6 figures, 2 tables.

Key Result

Theorem 1

Suppose that $k(x, y) \leq C^2$ for all $x, y \in \mathcal{X}$. Suppose that the $N$ samples $\{x_i\}_{i=1}^N$ are non-repetitive. Without loss of generality, let $\mathcal{X}$ admit an enumeration $\mathcal{X} = \{x_i\}_{i=1}^\infty$ where the first $N$ samples $\{x_i\}_{i=1}^N$ correspond to the o

Figures (6)

  • Figure 1: Comparison of BayesSum / active BayesSum against baselines: Monte Carlo (MC), Russian roulette (RR), importance sampling (IS) and stratified sampling (SS). The reported results are absolute error on (a) Poisson distribution (Left), (b) uniform distribution over $\mathcal{X}=\{0,1,2\}^L$ (Middle) and (c) un-normalized distribution characterized by a Potts model (Right). Results are averaged over 50 independent runs, while the shaded regions give the 25%-75% quantiles.
  • Figure 2: Left: Comparison of BayesSum+BQ and MC over the mixed domain. Middle: Comparison of BayesSum, active BayesSum and MC with increasing dimension $d$ in case (b). Right: Comparison of BayesSum and MC in estimating the normalization constant for maximum likelihood inference of the CMP model. BayesSum uses $N=10$ samples while MC uses $N=30$ samples.
  • Figure 3: The optimization trajectory of training the CMP model with the normalization constant estimated via BayesSum and MC.
  • Figure 4: Comparison of a large Potts model trained via maximum log likelihood, with the normalization constant estimated by BayesSum and Monte Carlo, under the same computational time (top) and sample size $N$ (bottom). Error bars represent the standard error over $30$ random seeds.
  • Figure 5: Left: Calibration of the posterior variance of BayesSum in synthetic settings (a)-(d). Middle: Ablation study on different kernels in synthetic setting (a): polynomial kernels of degree 2, 5, and 8 and Brownian motion kernel. Right: Ablation study of kernel lengthscales of the exponential Hamming kernel in synthetic setting (b).
  • ...and 1 more figure

Theorems & Definitions (4)

  • Remark 1
  • Theorem 1
  • Proof
  • Remark 2: Non-repetitive points