Scalable Expectation Estimation with Subtractive Mixture Models

Lena Zellinger, Nicola Branchini, Víctor Elvira, Antonio Vergari

TL;DR

The paper addresses the challenge of estimating expectations when target distributions are complex by using subtractive mixture models (SMMs) as expressive proposals in importance sampling. It introduces ΔEx, an unbiased estimator that avoids direct sampling from SMMs by decomposing an SMM into the difference of two additive MMs and sampling from the positive and negative parts separately. ΔEx is shown to be unbiased and strongly consistent, with a variance analysis guiding how samples are allocated across the two parts; a variant that mixes in a "safe" component further stabilizes the estimator when the proposal has low-density valleys. Empirical results show that ΔEx can match the estimation quality of costly autoregressive sampling while running considerably faster, and initial normalizing-constant experiments highlight the importance of robust proposal design (including safe components) for practical use. The work lays groundwork for adaptive IS with SMMs and suggests directions toward hierarchical, circuit-based mixtures and more refined variance-reduction strategies.

Abstract

Many Monte Carlo (MC) and importance sampling (IS) methods use mixture models (MMs) for their simplicity and ability to capture multimodal distributions. Recently, subtractive mixture models (SMMs), i.e. MMs with negative coefficients, have shown greater expressiveness and success in generative modeling. However, their negative parameters complicate sampling, requiring costly auto-regressive techniques or accept-reject algorithms that do not scale in high dimensions. In this work, we use the difference representation of SMMs to construct an unbiased IS estimator ($\Delta\text{Ex}$) that removes the need to sample from the SMM, enabling high-dimensional expectation estimation with SMMs. In our experiments, we show that $\Delta\text{Ex}$ can achieve comparable estimation quality to auto-regressive sampling while being considerably faster in MC estimation. Moreover, we conduct initial experiments with $\Delta\text{Ex}$ using hand-crafted proposals, gaining first insights into how to construct safe proposals for $\Delta\text{Ex}$.
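
To make the difference representation concrete, here is a minimal one-dimensional sketch in Python. It is an illustration under stated assumptions, not the paper's implementation: the coefficients `C_POS` and `C_NEG`, the Gaussian parts, the target, the test function `f`, and the function name `delta_ex` are all hypothetical choices. The sketch assumes the SMM can be written as $q = c_+ q_+ - c_- q_-$ with $c_+ - c_- = 1$, so that $\mathbb{E}_p[f] = c_+\,\mathbb{E}_{q_+}[fp/q] - c_-\,\mathbb{E}_{q_-}[fp/q]$, and estimates each expectation with samples from the corresponding additive part.

```python
import numpy as np

def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

# Hypothetical SMM: q(x) = C_POS * N(x; 0, 2^2) - C_NEG * N(x; 0, 1).
# Since N(x; 0, 2^2) / N(x; 0, 1) >= 0.5 everywhere, q(x) > 0 whenever
# C_NEG < 0.5 * C_POS; with C_POS - C_NEG = 1 it also integrates to one.
C_POS, C_NEG = 1.4, 0.4

def q_smm_pdf(x):
    """Normalized SMM density, written as a difference of two additive parts."""
    return C_POS * normal_pdf(x, 0.0, 2.0) - C_NEG * normal_pdf(x, 0.0, 1.0)

def target_pdf(x):
    """Assumed target p = N(1, 1); here it is normalized for simplicity."""
    return normal_pdf(x, 1.0, 1.0)

def f(x):
    """Assumed test function; E_p[f] = 1^2 + 1 = 2 for the target above."""
    return x ** 2

def delta_ex(num_pos, num_neg, rng):
    """Difference-of-expectations IS estimate of E_p[f].

    Samples are drawn from the positive and negative parts separately;
    the SMM itself is only evaluated, never sampled.
    """
    x_pos = rng.normal(0.0, 2.0, size=num_pos)  # samples from q_+
    x_neg = rng.normal(0.0, 1.0, size=num_neg)  # samples from q_-

    def integrand(x):
        return f(x) * target_pdf(x) / q_smm_pdf(x)

    return C_POS * integrand(x_pos).mean() - C_NEG * integrand(x_neg).mean()

rng = np.random.default_rng(0)
estimate = delta_ex(100_000, 100_000, rng)
print(f"estimate of E_p[x^2]: {estimate:.3f} (exact value: 2)")
```

The equal split of samples between the two parts is an arbitrary choice made for this sketch; the variance analysis mentioned in the TL;DR is what the paper uses to guide that allocation.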

Paper Structure

This paper contains 22 sections, 1 theorem, 15 equations, 3 figures, 3 tables, 1 algorithm.

Key Result

Proposition 1

See the proofs appendix of the paper.

Figures (3)

  • Figure 1: A squared mixture can be split into its positive and negative parts, as illustrated via its representation as a computational graph, also called a circuit (Choi et al., 2020; Loconte et al., 2024).
  • Figure 2: $\Delta\text{Ex}$ can result in high variance when $q$ has low-density valleys, but we can fix that with a "safe" component. We show here that (i) choosing a good proposal for $\Delta\text{Ex}$ does not simply amount to having a small KL divergence, as coefficient of variation (CoV) can be very large even when KL is small and (ii) low-density valleys in $q$ can cause high CoV, but we can fix that with the inclusion of a "safe" component. The integral of interest here is the normalizing constant of $p$, i.e., $I=\int \widetilde{p}({\bm{x}})d {\bm{x}}$. Estimates are computed from $S=15000$ samples, $\text{CoV}$ and average error are based on $100$ estimates, and the KL is estimated from $200000$ samples.
  • Figure 3: ARITS samples directly from the full SMM $q_{\text{SMM}}$ while $\Delta\text{ExS}$ samples from $q_+$ and $q_-$ in isolation. The figure shows 1500 samples obtained by ARITS (first row) and $\Delta\text{ExS}$ (second row) respectively. All depicted densities are normalized and visualized using the same color map.
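
The split described in the Figure 1 caption can be written out explicitly. The notation below is an assumption introduced for illustration (real-valued weights $w_i$, non-negative component functions $f_i$, and normalizers $Z_+$, $Z_-$); it follows from expanding the square and grouping terms by the sign of $w_i w_j$, and may differ from the paper's own notation.

```latex
\begin{align*}
c(\mathbf{x})^2
  &= \Big(\sum_{i=1}^{K} w_i f_i(\mathbf{x})\Big)^{2}
   = \sum_{i=1}^{K}\sum_{j=1}^{K} w_i w_j\, f_i(\mathbf{x}) f_j(\mathbf{x}) \\
  &= \underbrace{\sum_{(i,j)\,:\,w_i w_j > 0} |w_i w_j|\, f_i(\mathbf{x}) f_j(\mathbf{x})}_{\widetilde{q}_+(\mathbf{x})}
   \;-\; \underbrace{\sum_{(i,j)\,:\,w_i w_j < 0} |w_i w_j|\, f_i(\mathbf{x}) f_j(\mathbf{x})}_{\widetilde{q}_-(\mathbf{x})}, \\[4pt]
Z_{\pm} &= \int \widetilde{q}_{\pm}(\mathbf{x})\, d\mathbf{x},
  \qquad q_{\pm}(\mathbf{x}) = \frac{\widetilde{q}_{\pm}(\mathbf{x})}{Z_{\pm}}, \\[4pt]
q_{\text{SMM}}(\mathbf{x})
  &= \frac{c(\mathbf{x})^2}{Z_+ - Z_-}
   = \frac{Z_+\, q_+(\mathbf{x}) - Z_-\, q_-(\mathbf{x})}{Z_+ - Z_-}.
\end{align*}
```

Both $q_+$ and $q_-$ are additive mixtures with non-negative coefficients, so each can be sampled with standard mixture sampling even though $q_{\text{SMM}}$ itself cannot.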

Theorems & Definitions (1)

  • Proposition 1: Properties of $\Delta\text{Ex}$