
Theoretical guarantees for neural control variates in MCMC

Denis Belomestny, Artur Goldman, Alexey Naumov, Sergey Samsonov

TL;DR

A variance reduction approach for Markov chains based on additive control variates and the minimization of an appropriate estimate for the asymptotic variance is proposed.

Abstract

In this paper, we propose a variance reduction approach for Markov chains based on additive control variates and the minimization of an appropriate estimate for the asymptotic variance. We focus on the particular case when control variates are represented as deep neural networks. We derive the optimal convergence rate of the asymptotic variance under various ergodicity assumptions on the underlying Markov chain. The proposed approach relies upon recent results on the stochastic errors of variance reduction algorithms and function approximation theory.
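The idea in the abstract, subtracting a zero-mean control variate from the target function and choosing it to minimize an estimate of the asymptotic variance, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the target is a standard normal sampled by random-walk Metropolis, the control variates form a two-parameter linear Stein-type family (swapped in for the deep neural networks the paper studies), and the asymptotic variance is estimated with a Bartlett-window spectral estimator with a hand-picked lag of 50. Because the family is linear, the estimated variance is quadratic in the coefficients and is minimized in closed form.

```python
import numpy as np


def rwm_chain(n, step=1.0, seed=0):
    """Random-walk Metropolis chain targeting N(0, 1) (illustrative sampler)."""
    rng = np.random.default_rng(seed)
    x = 0.0
    chain = np.empty(n)
    for i in range(n):
        prop = x + step * rng.normal()
        # Log acceptance ratio for pi(x) proportional to exp(-x^2 / 2).
        if np.log(rng.uniform()) < 0.5 * (x * x - prop * prop):
            x = prop
        chain[i] = x
    return chain


def spectral_cov(h, lag):
    """Bartlett-window estimate of the asymptotic covariance of the columns of h."""
    n = h.shape[0]
    hc = h - h.mean(axis=0)
    cov = hc.T @ hc / n
    for k in range(1, lag + 1):
        gamma_k = hc[k:].T @ hc[:-k] / n  # lag-k cross-covariance
        cov += (1.0 - k / (lag + 1)) * (gamma_k + gamma_k.T)
    return cov


chain = rwm_chain(20000)

# Function whose mean under N(0, 1) we want; the exact value is exp(1/8).
f = np.exp(0.5 * chain)

# Stein-type control variates g = phi' + phi * (log pi)' with (log pi)'(x) = -x.
# For phi(x) = x and phi(x) = x^2 these have mean zero under N(0, 1).
g = np.column_stack([1.0 - chain**2, 2.0 * chain - chain**3])

lag = 50  # truncation lag of the spectral estimator (assumed, not tuned)
S = spectral_cov(np.column_stack([f, g]), lag)
S_ff, S_fg, S_gg = S[0, 0], S[0, 1:], S[1:, 1:]

# V(theta) = S_ff - 2 theta^T S_fg + theta^T S_gg theta is minimized here.
theta = np.linalg.solve(S_gg, S_fg)

var_plain = S_ff                  # estimated asymptotic variance of f
var_cv = S_ff - S_fg @ theta      # same estimate for f - theta^T g
est_plain = f.mean()
est_cv = (f - g @ theta).mean()

print("plain estimate:", est_plain, "estimated asymptotic variance:", var_plain)
print("CV estimate:   ", est_cv, "estimated asymptotic variance:", var_cv)
```

With a neural-network family the closed-form solve would be replaced by numerical minimization of the same variance estimate over the network weights; the linear family is used here only to keep the sketch short and exactly solvable.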

Paper Structure

This paper contains 21 sections, 19 theorems, 166 equations, 3 figures, 11 tables, 2 algorithms.

Key Result

Theorem 1

Assume assu:AUF, assu:ge, and assu:br. Then for any $x_0 \in \mathsf{S}$ and $\delta \in (0,1)$ there exists $n_0=n_0(\delta, R_0, d, \beta, \mathcal{D})>0$ such that for all $n\geqslant n_0$, $n\in\mathbb{N}$, by setting $R=\log n$, $K = n^{\frac{1}{2\beta + d}}$, $b_n=2(\log(1/\rho))^{-1}\log(n)$, where $C_{\text{th:bound\_approx},1}=C_{\text{th:bound\_approx},1}(\beta, \mathcal{D})$ is independent of the pro

Figures (3)

  • Figure 1: Funnel distribution
  • Figure 2: Banana-shaped
  • Figure 3: Logistic regression, Pima dataset

Theorems & Definitions (34)

  • Theorem 1
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Lemma 4: belomestny_variance_2020_esvm
  • Lemma 5
  • proof
  • Lemma 6
  • proof
  • ...and 24 more