Tamed Langevin sampling under weaker conditions

Iosif Lytras, Panayotis Mertikopoulos

TL;DR

This work addresses sampling from target distributions whose log-gradients are not globally Lipschitz and which are only weakly dissipative. It introduces two taming schemes, wd-TULA and reg-TULA, that operate under a Poincaré inequality (PI), with and without weak convexity, and under a log-Sobolev inequality (LSI). The authors derive non-asymptotic convergence guarantees in KL divergence, total variation, and Wasserstein distance, using a differential-inequality framework and isoperimetric inequalities; the bounds depend polynomially on the dimension under PI and attain optimal rates under LSI. A regularized taming approach extends the results to the PI-only setting by constructing a regularized target that satisfies a Poincaré inequality and, under suitable conditions, a log-Sobolev inequality. Numerical experiments on a high-dimensional double-well potential corroborate the theoretical findings and demonstrate the practical stability of wd-TULA compared to vanilla ULA and standard TULA.
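
For reference, the two isoperimetric conditions named in this summary are usually stated as follows. This is the standard textbook form, not copied from the paper; the constants $C_{\mathrm{PI}}$ and $C_{\mathrm{LSI}}$ are notation assumed here, and the paper's normalization may differ by constant factors.

```latex
% Poincare inequality (PI) for a target \pi, for all smooth f:
\operatorname{Var}_{\pi}(f)
  \;\le\; C_{\mathrm{PI}} \int \lVert \nabla f \rVert^{2} \, d\pi .

% Log-Sobolev inequality (LSI) for the same target, for all smooth f:
\operatorname{Ent}_{\pi}(f^{2})
  \;\le\; 2\, C_{\mathrm{LSI}} \int \lVert \nabla f \rVert^{2} \, d\pi ,
\qquad
\operatorname{Ent}_{\pi}(g)
  = \int g \log g \, d\pi - \Big(\int g \, d\pi\Big) \log \Big(\int g \, d\pi\Big).
```

With this normalization, LSI implies PI with the same constant, so the PI regime is the weaker of the two assumptions.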

Abstract

Motivated by applications to deep learning which often fail standard Lipschitz smoothness requirements, we examine the problem of sampling from distributions that are not log-concave and are only weakly dissipative, with log-gradients allowed to grow superlinearly at infinity. In terms of structure, we only assume that the target distribution satisfies either a log-Sobolev or a Poincaré inequality and a local Lipschitz smoothness assumption with modulus growing possibly polynomially at infinity. This set of assumptions greatly exceeds the operational limits of the "vanilla" unadjusted Langevin algorithm (ULA), making sampling from such distributions a highly involved affair. To account for this, we introduce a taming scheme which is tailored to the growth and decay properties of the target distribution, and we provide explicit non-asymptotic guarantees for the proposed sampler in terms of the Kullback-Leibler (KL) divergence, total variation, and Wasserstein distance to the target distribution.
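
To illustrate why taming matters when the log-gradient grows superlinearly, here is a minimal sketch comparing a vanilla ULA step with the classical TULA taming, in which the gradient is divided by $1 + \lambda\lVert\nabla U(x)\rVert$. It is not the wd-TULA or reg-TULA scheme of the paper, whose taming functions are not reproduced on this page. The potential $U(x) = \lVert x\rVert^4/4 - \lVert x\rVert^2/2$, the step size, the starting point, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def grad_double_well(x):
    # Gradient of the illustrative potential U(x) = |x|^4/4 - |x|^2/2,
    # i.e. (|x|^2 - 1) * x. It grows cubically, so it is not globally Lipschitz.
    sq_norm = sum(v * v for v in x)
    return [(sq_norm - 1.0) * v for v in x]


def ula_step(x, lam, rng):
    # Vanilla ULA: x - lam * grad U(x) + sqrt(2 * lam) * Gaussian noise.
    g = grad_double_well(x)
    scale = math.sqrt(2.0 * lam)
    return [v - lam * gv + scale * rng.gauss(0.0, 1.0) for v, gv in zip(x, g)]


def tamed_step(x, lam, rng):
    # Classical TULA taming (assumed here, not the paper's wd-TULA):
    # the drift lam * g / (1 + lam * |g|) has norm strictly below 1.
    g = grad_double_well(x)
    g_norm = math.sqrt(sum(gv * gv for gv in g))
    damp = 1.0 / (1.0 + lam * g_norm)
    scale = math.sqrt(2.0 * lam)
    return [v - lam * damp * gv + scale * rng.gauss(0.0, 1.0)
            for v, gv in zip(x, g)]


def run(step, x0, lam, n_steps, seed=0):
    # Iterate the chosen update rule from x0 with a fixed random seed.
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_steps):
        x = step(x, lam, rng)
    return x


def norm(x):
    return math.sqrt(sum(v * v for v in x))
```

Starting both chains far from the origin, for example `run(ula_step, [10.0] * 10, lam=0.1, n_steps=50)` against `run(tamed_step, [10.0] * 10, lam=0.1, n_steps=1000)`, shows the difference: the untamed drift overshoots and the iterates overflow, while the tamed drift moves at most one unit per step and the chain stays bounded.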

Paper Structure

This paper contains 28 sections, 36 theorems, 174 equations, 1 figure, 2 tables.

Key Result

Theorem 1

Suppose that Assumptions asm:drift, asm:target, and asm:WC hold, and let $\rho_{n}$ denote the distribution of the $n$-th iterate of wd-TULA (eq-wdTULA) run with $\lambda < \lambda_{\max} = \min\left\{\frac{1}{4(2AC^{*}+2L+1)^{2}}, \frac{1}{\dot{c}_{0} H_\pi(\rho_0)}, \frac{2}{\mu^{2}}\right\}$, where the constants are given in the proof of Proposition … where $c_{1}$ depends polynomially on $d$ and $\dot{c}_{0}$ is an explicit function of the Poincaré …

Figures (1)

  • Figure 1: Performance of wd-TULA compared to TULA (left and right, respectively); lower values are better.

Theorems & Definitions (71)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma A.1
  • proof
  • Lemma A.2
  • proof
  • Lemma A.3
  • Proposition A.1
  • proof
  • ...and 61 more