Langevin Monte Carlo: random coordinate descent and variance reduction

Zhiyan Ding, Qin Li

TL;DR

This work analyzes the efficiency of Langevin Monte Carlo in high dimensions by introducing random coordinate descent (RCD) and two variance-reduction strategies. Naive RCD on LMC fails to improve the total cost because of the increased variance, but integrating SVRG or SAGA variance reduction with RCD yields substantial computational gains, especially for underdamped Langevin Monte Carlo, where the per-iteration cost can be reduced to a single directional derivative while preserving the iteration complexity. The authors provide non-asymptotic convergence results, counterexamples showing the limits of RCD, and detailed analyses of four variance-reduced variants (SVRG-O/U-LMC and RCAD-O/U-LMC), supported by numerical experiments. The results demonstrate that variance-reduced RCD-LMC achieves a favorable total cost in high dimensions, and they offer practical guidelines for choosing step sizes and epoch lengths in stochastic gradient-based MCMC.

Abstract

Langevin Monte Carlo (LMC) is a popular Bayesian sampling method. For log-concave target distributions, the method converges exponentially fast, up to a controllable discretization error. However, the method requires the evaluation of a full gradient in each iteration, and for a problem on $\mathbb{R}^d$, this amounts to $d$ partial derivative evaluations per iteration. The cost is high when $d\gg1$. In this paper, we investigate how to enhance computational efficiency through the application of RCD (random coordinate descent) to LMC. There are two sides to the theory: (1) By blindly applying RCD to LMC, one replaces the full gradient with a randomly selected directional derivative in each iteration. Although the cost per iteration is reduced, the total number of iterations needed to achieve a preset error tolerance increases, so ultimately there is no computational gain. (2) We then incorporate variance reduction techniques, such as SAGA (stochastic average gradient) and SVRG (stochastic variance reduced gradient), into RCD-LMC. We prove that the cost is reduced compared with classical LMC, and that in the underdamped case, convergence is achieved with the same number of iterations while each iteration requires merely one directional derivative. This means we obtain the best possible computational cost in the underdamped-LMC framework.
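To make the per-iteration cost difference concrete, the following is a minimal sketch (not the authors' implementation) of one overdamped LMC step next to one naive RCD-LMC step. It assumes the standard Euler-Maruyama update $x^{m+1} = x^m - h\nabla f(x^m) + \sqrt{2h}\,\xi^m$ and a uniformly sampled coordinate with the rescaled estimator $d\,\partial_r f(x)\,e_r$, which is one common unbiased choice. The names `lmc_step`, `rcd_gradient`, `rcd_lmc_step`, and the Gaussian test target are hypothetical and chosen only for illustration.

```python
import numpy as np


def lmc_step(x, grad_f, h, rng):
    """One overdamped LMC step; needs a full gradient (d partial derivatives)."""
    noise = rng.standard_normal(x.shape[0])
    return x - h * grad_f(x) + np.sqrt(2.0 * h) * noise


def rcd_gradient(x, partial_f, r):
    """Single-coordinate surrogate for grad f(x) (assumed form: d * d_r f * e_r).

    Averaged over a uniformly random coordinate r it equals the full gradient,
    but its variance grows with the dimension d.
    """
    d = x.shape[0]
    g = np.zeros(d)
    g[r] = d * partial_f(x, r)
    return g


def rcd_lmc_step(x, partial_f, h, rng):
    """One naive RCD-LMC step; needs only one partial derivative."""
    r = rng.integers(x.shape[0])  # uniform coordinate in {0, ..., d-1}
    noise = rng.standard_normal(x.shape[0])
    return x - h * rcd_gradient(x, partial_f, r) + np.sqrt(2.0 * h) * noise


# Hypothetical target for illustration: standard Gaussian, f(x) = |x|^2 / 2.
def grad_f(x):
    return x


def partial_f(x, i):
    return x[i]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, h = 10, 1e-3
    x_full = np.ones(d)
    x_rcd = np.ones(d)
    for _ in range(1000):
        x_full = lmc_step(x_full, grad_f, h, rng)
        x_rcd = rcd_lmc_step(x_rcd, partial_f, h, rng)
    print(np.linalg.norm(x_full), np.linalg.norm(x_rcd))
```

The single-coordinate estimator is cheap but noisy, which is why the naive scheme needs more iterations; the variance-reduced variants described in the abstract aim to remove that extra noise while keeping the one-derivative cost.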

Paper Structure

This paper contains 40 sections, 22 theorems, 202 equations, 1 table, 4 algorithms.

Key Result

Theorem 1

[DALALYAN20195278, Theorem 5] Assume $h<\frac{2}{\mu +L}$ and that $f$ satisfies Assumptions assum:Cov through assum:Hessian. Denote by $q^O_m(x)$ the probability density function of $x^m$ computed using O-LMC, and define $W_m=W_2(q^O_{m},p)$, the $L_2$-Wasserstein distance between $q^O_m$ and $p$. Then we have:

Theorems & Definitions (32)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Remark 4
  • Remark 5
  • Theorem 6
  • Theorem 7
  • Remark 8
  • Theorem 9
  • Theorem 10
  • ...and 22 more