Penalized Overdamped and Underdamped Langevin Monte Carlo Algorithms for Constrained Sampling

Mert Gürbüzbalaban; Yuanhan Hu; Lingjiong Zhu

TL;DR

This work addresses constrained sampling from a Gibbs target $\pi(x)\propto e^{-f(x)}$ restricted to a convex set $\mathcal{C}$ by sampling instead from the penalized target $\pi_{\delta}(x)\propto e^{-f(x)-S(x)/\delta}$ on all of $\mathbb{R}^d$, where the penalty $S$ vanishes on $\mathcal{C}$ and penalizes points outside it. It develops penalized Langevin methods (PLD, PULMC) and their stochastic-gradient variants (PSGLD, PSGULMC), and establishes non-asymptotic convergence guarantees with explicit iteration complexities under smoothness and boundary-regularity assumptions. With deterministic gradients, the complexities in total variation distance are $\tilde{\mathcal{O}}(d/\varepsilon^{10})$ for PLD and $\tilde{\mathcal{O}}(\sqrt{d}/\varepsilon^{7})$ for PULMC; to the authors' knowledge, these are the first convergence results for underdamped Langevin Monte Carlo in constrained sampling that handle non-convex $f$. With stochastic gradients, the paper gives $\mathcal{W}_2$ complexities of $\tilde{\mathcal{O}}(d/\varepsilon^{18})$ (PSGLD) and $\tilde{\mathcal{O}}(d\sqrt{d}/\varepsilon^{39})$ (PSGULMC) when $f$ is strongly convex, and finite-time bounds when $f$ is non-convex. The authors also show how to avoid projections through alternative penalty constructions and demonstrate the methods on Bayesian LASSO regression and Bayesian constrained deep learning tasks. Overall, the penalty-based approach turns constrained sampling into unconstrained sampling with provable guarantees, enabling scalable Bayesian inference under convex constraints, including for non-convex objectives.
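To make the penalty idea concrete without any projection, here is a minimal sketch under assumptions of our own: $\mathcal{C}$ is a polytope $\{x : Ax \le b\}$ and the penalty is the squared constraint violation $S(x)=\|\max(0, Ax-b)\|^2$, which vanishes exactly on $\mathcal{C}$ and whose gradient $2A^\top\max(0,Ax-b)$ needs only matrix-vector products. This is one common projection-free choice and is not claimed to be the construction used in the paper; all function names are hypothetical. As an example, an $\ell_1$-ball constraint $\|x\|_1\le t$ (the kind of $1$-norm constraint used in the constrained regression experiments) is written as $2^d$ half-spaces.

```python
import itertools

import numpy as np


def polytope_penalty(x, A, b):
    """S(x) = ||max(0, A x - b)||^2: zero exactly on C = {x : A x <= b} (assumed penalty)."""
    viol = np.maximum(0.0, A @ x - b)
    return float(viol @ viol)


def polytope_penalty_grad(x, A, b):
    """grad S(x) = 2 A^T max(0, A x - b); only matrix-vector products, no projection onto C."""
    return 2.0 * A.T @ np.maximum(0.0, A @ x - b)


def l1_ball_constraints(d, t):
    """Write {x : ||x||_1 <= t} as the 2^d half-spaces s . x <= t for s in {-1, +1}^d."""
    A = np.array(list(itertools.product([-1.0, 1.0], repeat=d)))
    b = np.full(len(A), t)
    return A, b


def penalized_potential(x, f, A, b, delta):
    """U_delta(x) = f(x) + S(x) / delta, so pi_delta(x) is proportional to exp(-U_delta(x))."""
    return f(x) + polytope_penalty(x, A, b) / delta
```

Any unconstrained Langevin-type sampler can then be run on $f + S/\delta$ unchanged. The $2^d$-row encoding of the $\ell_1$ ball is only practical in low dimension; the point of the sketch is that the penalty and its gradient never call a projection.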

Abstract

We consider the constrained sampling problem where the goal is to sample from a target distribution $\pi(x)\propto e^{-f(x)}$ when $x$ is constrained to lie in a convex body $\mathcal{C}$. Motivated by penalty methods from continuous optimization, we propose penalized Langevin dynamics (PLD) and penalized underdamped Langevin Monte Carlo (PULMC) methods that convert the constrained sampling problem into an unconstrained one by introducing a penalty function for constraint violations. When $f$ is smooth and gradients are available, we obtain $\tilde{\mathcal{O}}(d/\varepsilon^{10})$ iteration complexity for PLD to sample the target up to $\varepsilon$-error, where the error is measured in total variation (TV) distance and $\tilde{\mathcal{O}}(\cdot)$ hides logarithmic factors. For PULMC, we improve the result to $\tilde{\mathcal{O}}(\sqrt{d}/\varepsilon^{7})$ when the Hessian of $f$ is Lipschitz and the boundary of $\mathcal{C}$ is sufficiently smooth. To our knowledge, these are the first convergence results for underdamped Langevin Monte Carlo methods in constrained sampling that handle non-convex $f$, and they provide guarantees with the best dimension dependency among existing methods that use deterministic gradients. If unbiased stochastic estimates of the gradient of $f$ are available, we propose PSGLD and PSGULMC methods that handle stochastic gradients and scale to large datasets without requiring Metropolis-Hastings correction steps. For PSGLD and PSGULMC, when $f$ is strongly convex and smooth, we obtain $\tilde{\mathcal{O}}(d/\varepsilon^{18})$ and $\tilde{\mathcal{O}}(d\sqrt{d}/\varepsilon^{39})$ iteration complexity, respectively, in $\mathcal{W}_2$ distance. When $f$ is smooth and possibly non-convex, we provide finite-time performance bounds and iteration complexity results. Finally, we illustrate the performance on Bayesian LASSO regression and Bayesian constrained deep learning problems.
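A minimal sketch of the two deterministic-gradient samplers, under assumptions of our own: $\mathcal{C}$ is a box $[\mathrm{lo},\mathrm{hi}]^d$ (infinite bounds allowed), the penalty is the squared distance $S(x)=\operatorname{dist}(x,\mathcal{C})^2$ with gradient $2(x-P_{\mathcal{C}}(x))$, PLD is taken as Euler-Maruyama steps on $U_\delta = f + S/\delta$, and PULMC as a simple semi-implicit Euler discretization of underdamped Langevin dynamics with friction $\gamma$. The paper's PULMC may use a different scheme, and the function names are hypothetical.

```python
import numpy as np


def penalized_grad(x, grad_f, delta, lo, hi):
    """Gradient of U_delta(x) = f(x) + S(x)/delta with S(x) = dist(x, C)^2, C = [lo, hi]^d.

    For a box, the projection P_C is coordinate-wise clipping and grad S(x) = 2 (x - P_C(x)).
    """
    return grad_f(x) + 2.0 * (x - np.clip(x, lo, hi)) / delta


def pld(x0, grad_f, delta, lo, hi, step, n_steps, rng):
    """Penalized overdamped Langevin: Euler-Maruyama steps on U_delta.

    x0 has shape (n_chains, d); all chains are updated in parallel.
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        g = penalized_grad(x, grad_f, delta, lo, hi)
        x = x - step * g + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x


def pulmc(x0, grad_f, delta, lo, hi, step, n_steps, gamma, rng):
    """Penalized underdamped Langevin with friction gamma (semi-implicit Euler sketch).

    Discretizes dx = v dt, dv = -gamma v dt - grad U_delta(x) dt + sqrt(2 gamma) dB,
    whose x-marginal at stationarity is pi_delta. Velocity is updated first.
    """
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(n_steps):
        g = penalized_grad(x, grad_f, delta, lo, hi)
        noise = np.sqrt(2.0 * gamma * step) * rng.standard_normal(x.shape)
        v = v - step * (gamma * v + g) + noise
        x = x + step * v
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad_f = lambda x: x                        # f(x) = |x|^2 / 2
    x0 = np.ones((2000, 1))                     # 2000 parallel chains in d = 1
    box = dict(delta=1e-2, lo=0.0, hi=np.inf)   # C = [0, inf): half-normal target
    xs_pld = pld(x0, grad_f, step=1e-3, n_steps=5000, rng=rng, **box)
    xs_ulmc = pulmc(x0, grad_f, step=1e-3, n_steps=10000, gamma=2.0, rng=rng, **box)
    print("PLD mean:", xs_pld.mean(), "PULMC mean:", xs_ulmc.mean())
```

In this demo the penalized target has mean $(1-1/a)/(\sqrt{\pi/2}+\sqrt{\pi/(2a)})\approx 0.74$ with $a=1+2/\delta=201$, versus $\sqrt{2/\pi}\approx 0.80$ for the half-normal target, so both samplers should land near the former. Shrinking $\delta$ narrows that gap but makes the penalty stiffer (its gradient is $2/\delta$-Lipschitz), so the step size has to shrink with it. Passing an unbiased noisy estimate as `grad_f` gives a stochastic-gradient variant in the spirit of PSGLD/PSGULMC.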

Paper Structure (23 sections, 26 theorems, 198 equations, 9 figures, 3 tables)

Introduction
Our Approach and Contributions
Related Work
Main Results
Bounding the Distance Between $\pi_{\delta}$ and $\pi$
Penalized Langevin Algorithms with Deterministic Gradient
Penalized Langevin Dynamics
Penalized Underdamped Langevin Monte Carlo
Penalized Langevin Algorithms with Stochastic Gradient
Strongly Convex Case
Non-Convex Case
Avoiding Projections
Numerical Experiments
Synthetic Experiment for Dirichlet Posterior
Bayesian Constrained Linear Regression
...and 8 more sections

Key Result

Lemma 2.3

Suppose Assumption assump:S:0 holds and $e^{-f}$ is integrable over $\mathcal{C}$. For any $\delta>0$, …

If $e^{-\frac{1}{\delta}S(y)-f(y)}$ is not integrable over $\mathbb{R}^{d}\backslash\mathcal{C}$, we take the term $\int_{\mathbb{R}^{d}\backslash\mathcal{C}}e^{-\frac{1}{\delta}S(y)-f(y)}\,dy$ to be …
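Whatever the precise form of the bound in Lemma 2.3, the integral over $\mathbb{R}^d\setminus\mathcal{C}$ appearing in it governs the gap between $\pi_\delta$ and $\pi$ through an elementary identity: because $S$ vanishes on $\mathcal{C}$, both densities are proportional to $e^{-f}$ there, and a direct computation gives $\mathrm{TV}(\pi_\delta,\pi)=Z_{\mathrm{out}}/(Z_{\mathcal{C}}+Z_{\mathrm{out}})$ with $Z_{\mathcal{C}}=\int_{\mathcal{C}}e^{-f}$ and $Z_{\mathrm{out}}=\int_{\mathbb{R}^d\setminus\mathcal{C}}e^{-S/\delta-f}$. The sketch below is our own one-dimensional check, assuming $f(x)=x^2/2$, $\mathcal{C}=[0,\infty)$ and $S(x)=\min(x,0)^2$; there the identity reduces to $1/(1+\sqrt{1+2/\delta})\approx\sqrt{\delta/2}$ for small $\delta$. It illustrates the quantity involved and is not a restatement of the lemma.

```python
import math


def tv_closed_form(delta):
    """TV(pi_delta, pi) for f(x) = x^2/2, C = [0, inf), S(x) = min(x, 0)^2 (assumed example)."""
    a = 1.0 + 2.0 / delta                 # for x < 0: f + S/delta = a x^2 / 2
    return 1.0 / (1.0 + math.sqrt(a))     # Z_out / (Z_C + Z_out), since Z_out / Z_C = 1/sqrt(a)


def tv_numeric(delta, n=100_000, lim=12.0):
    """Same quantity from midpoint-rule integrals of Z_C and Z_out on truncated ranges."""
    h = lim / n
    grid = [(i + 0.5) * h for i in range(n)]
    z_c = h * sum(math.exp(-0.5 * x * x) for x in grid)                     # over [0, lim]
    z_out = h * sum(math.exp(-(0.5 + 1.0 / delta) * x * x) for x in grid)   # over [-lim, 0] by symmetry
    return z_out / (z_c + z_out)


if __name__ == "__main__":
    for delta in (1.0, 0.1, 0.01):
        print(delta, tv_closed_form(delta), tv_numeric(delta))
```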

Figures (9)

Figure 1: Wasserstein distance between the target distribution and the distributions produced by our proposed methods.
Figure 2: Density plots of the target distribution and samples obtained by PLD and PULMC.
Figure 3: Average number of iterations required to achieve a target accuracy $\varepsilon$ (measured in terms of the Wasserstein distance) for the Dirichlet sampling problem as $\varepsilon$ is varied for PLD (left panel) and PULMC (right panel).
Figure 4: Dimension ($d$) dependency of PLD and PULMC on the Dirichlet distribution sampling problem.
Figure 5: Prior and posterior distributions with a $1$-norm constraint in dimension 2.
...and 4 more figures
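Figures 1 and 3 report accuracy in Wasserstein distance, and how it was estimated is not stated here. As a minimal sketch, assuming two equally sized one-dimensional samples (for instance one coordinate of the sampler output against exact draws from the target), $\mathcal{W}_2$ between the two empirical measures is obtained by pairing order statistics, because the monotone coupling is optimal in one dimension. The function name is hypothetical.

```python
import numpy as np


def w2_empirical_1d(x, y):
    """W2 between two 1-D empirical measures with the same number of equally weighted atoms.

    In one dimension the optimal coupling pairs sorted samples, so
    W2^2 = mean((sort(x) - sort(y))^2).
    """
    x = np.sort(np.asarray(x, dtype=float).ravel())
    y = np.sort(np.asarray(y, dtype=float).ravel())
    if x.shape != y.shape:
        raise ValueError("samples must have the same size")
    return float(np.sqrt(np.mean((x - y) ** 2)))
```

For example, `w2_empirical_1d(samples[:, 0], rng.dirichlet(alpha, size=n)[:, 0])` would compare the first coordinate of sampler output with exact Dirichlet draws, where `samples`, `alpha`, `n` and `rng` are placeholders. Full multivariate $\mathcal{W}_2$ requires solving an optimal transport problem instead.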

Theorems & Definitions (31)

Lemma 2.3
Lemma 2.4
Lemma 2.5
Lemma 2.6
Theorem 2.7
Remark 2.8
Lemma 2.10
Proposition 2.11
Remark 2.12
Lemma 2.13
...and 21 more
