Conditional neural control variates for variance reduction in Bayesian inverse problems

Ali Siahkoohi, Hyunwoo Oh

TL;DR

This paper introduces conditional neural control variates, a modular method that learns amortized control variates from joint model-data samples to reduce the variance of Monte Carlo (MC) estimators, and demonstrates substantial variance reduction on stylized and partial differential equation-constrained Darcy flow inverse problems.

Abstract

Bayesian inference for inverse problems involves computing expectations under posterior distributions (e.g., posterior means, variances, or predictive quantities), typically via Monte Carlo (MC) estimation. When the quantity of interest varies significantly under the posterior, accurate estimates demand many samples, a cost often prohibitive for partial differential equation-constrained problems. To address this challenge, we introduce conditional neural control variates, a modular method that learns amortized control variates from joint model-data samples to reduce the variance of MC estimators. To scale to high-dimensional problems, we leverage Stein's identity to design an architecture based on an ensemble of hierarchical coupling layers with tractable Jacobian trace computation. Training requires: (i) samples from the joint distribution of unknown parameters and observed data; and (ii) the posterior score function, which can be computed from physics-based likelihood evaluations, neural operator surrogates, or learned generative models such as conditional normalizing flows. Once trained, the control variates generalize across observations without retraining. We validate our approach on stylized and partial differential equation-constrained Darcy flow inverse problems, demonstrating substantial variance reduction, even when the analytical score is replaced by a learned surrogate.
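To make the role of Stein's identity concrete, the following is a minimal sketch of a Stein control variate on a toy problem. It is not the paper's architecture. It assumes a hypothetical one-dimensional Gaussian "posterior" with a known score, a hand-picked linear function `phi` in place of the learned coupling-layer network, and no conditioning on data. The names `MU`, `SIGMA`, `phi`, `stein_cv`, and `run` are illustrative. The variance reduction factor is taken here to be the ratio of the controlled estimator's variance to the plain estimator's variance, which is an assumption about the paper's definition.

```python
import random
import statistics

# Hypothetical 1D Gaussian "posterior" N(MU, SIGMA^2); an assumption for illustration.
MU, SIGMA = 1.0, 2.0


def score(x):
    """Score function d/dx log p(x) of N(MU, SIGMA^2)."""
    return -(x - MU) / SIGMA**2


def h(x):
    """Quantity of interest; its exact expectation is MU^2 + SIGMA^2 = 5."""
    return x * x


def stein_cv(x, scale):
    """Stein control variate phi'(x) + phi(x) * score(x) for a linear phi.

    By Stein's identity this term has zero mean under the target, so adding
    it to h(x) leaves the expectation unchanged. scale = 1 is the optimal
    linear choice for this particular h; other values are suboptimal.
    """
    a = scale * SIGMA**2
    b = scale * SIGMA**2 * MU
    phi = a * x + b   # hand-picked stand-in for a learned network
    dphi = a          # derivative of phi (the 1D "Jacobian trace")
    return dphi + phi * score(x)


def run(n=20000, scale=0.5, seed=0):
    """Return (plain MC mean, controlled MC mean, empirical variance ratio)."""
    rng = random.Random(seed)
    xs = [rng.gauss(MU, SIGMA) for _ in range(n)]
    plain = [h(x) for x in xs]
    controlled = [h(x) + stein_cv(x, scale) for x in xs]
    vrf = statistics.variance(controlled) / statistics.variance(plain)
    return statistics.fmean(plain), statistics.fmean(controlled), vrf


if __name__ == "__main__":
    plain_mean, controlled_mean, vrf = run()
    print(f"plain mean      = {plain_mean:.4f}")
    print(f"controlled mean = {controlled_mean:.4f}")
    print(f"variance ratio  = {vrf:.4f}")
```

For this toy setup the controlled integrand simplifies algebraically to `(1 - scale) * x**2 + scale * (MU**2 + SIGMA**2)`, so the variance ratio is `(1 - scale)**2` and vanishes at `scale = 1`. In the paper's setting the corresponding function is learned and conditioned on the observed data rather than derived by hand.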

Paper Structure

This paper contains 37 sections, 20 equations, 18 figures, 4 tables, and 1 algorithm.

Figures (18)

  • Figure 1: Dimension scaling of CNCV. (a) VRF for mean and variance estimation across $d \in \{2,4,8,16\}$ (lower is better; $\text{VRF} = 1$ means no improvement). (b) Correlation between $h$ and $g$ across dimensions.
  • Figure 2: Sample efficiency for $d=4$ mean estimation. (a) VRF is sample-size invariant at ${\sim}\,0.04$. (b) MSE follows the $1/N$ rate; the CV estimator provides a ${\sim}\,25\times$ effective sample increase. Index $i$ denotes the component.
  • Figure 3: Amortized CNCV on Rosenbrock posterior. (a–c) Test observations from left tail, right tail, and ridge (stars) yield diverse posteriors; VRF values are shown in each panel. (d) Per-component VRF for each observation; the same trained model achieves VRF $\in [0.08, 0.23]$ across all cases (lower is better).
  • Figure 4: Posterior variance estimation on the Rosenbrock problem. (a) Per-component VRF: $x_1$ achieves strong reduction while $x_2$ (banana direction) is harder. (b) Per-observation VRF across 10 test observations (mean $0.44$).
  • Figure 5: Corner plot of the nonlinear posterior ($d=4$) for three test observations (blue, red, green). Contours show the 50% and 90% highest-density regions (HDRs); stars mark the true parameters.
  • ...and 13 more figures
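
The ${\sim}\,25\times$ effective sample increase quoted in the Figure 2 caption is consistent with reading VRF as a variance ratio. As a brief sketch, assuming both estimators are unbiased (so that MSE equals variance) and that VRF denotes the ratio of the controlled integrand's variance to $\operatorname{Var}[h]$, where $N_{\text{eff}}$ is notation introduced here for the equivalent number of plain MC samples:

$$
\mathrm{MSE}_{\text{CV}}(N) = \frac{\mathrm{VRF} \cdot \operatorname{Var}[h]}{N} = \frac{\operatorname{Var}[h]}{N_{\text{eff}}},
\qquad
N_{\text{eff}} = \frac{N}{\mathrm{VRF}} \approx \frac{N}{0.04} = 25\,N .
$$

Under these assumptions, a sample-size-invariant VRF shifts the $1/N$ error curve down by a constant factor without changing its slope.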