Stochastic Langevin Differential Inclusions with Applications to Machine Learning

Fabio V. Difonzo, Vyacheslav Kungurtsev, Jakub Marecek

TL;DR

This paper studies Langevin-type dynamics with a set-valued drift given by the Clarke subdifferential of a continuous tame potential, addressing non-smooth losses common in machine learning. It proves strong existence for the stochastic differential inclusion $dX_t \in -F(X_t)\,dt + \sqrt{2\sigma}\,dB_t$ and develops a weak Fokker-Planck formulation, showing the distribution evolves toward a Gibbs-type stationary measure $\pi(x) \propto \exp(-f(x)/\sigma)$. It further extends the variational (JKO) framework to nonsmooth drifts, linking the dynamics to a free-energy functional $\mathcal{F}(\rho)=\int f(x)\rho\,dx + \int \rho \log \rho\,dx$ and demonstrating entropy-regularized gradient flow behavior. Numerical experiments on a one-dimensional, piecewise-smooth example and Bayesian ReLU networks illustrate the expected ergodic behavior and practical implications for learning with non-smooth objectives. Overall, the results provide a rigorous basis for using Langevin-type dynamics in non-smooth ML settings and open avenues for studying mixing properties under nonsmooth conditions.
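
To make the inclusion concrete, here is a minimal Python sketch of an Euler-Maruyama discretization in which the set-valued drift is replaced by a single measurable selection. The potential $f(x)=|x|$, the function names, the step size, and the choice of $0$ at the kink are all illustrative assumptions and are not taken from the paper. For this potential the Gibbs target with $\sigma=1$ is the Laplace density, for which $\mathbb{E}|X|=1$, so the empirical mean of $|X|$ gives a rough sanity check.

```python
import math
import random


def clarke_selection_abs(x):
    """Return one element of the Clarke subdifferential of f(x) = |x|.

    Away from 0 the subdifferential is the singleton {sign(x)}.
    At 0 it is the interval [-1, 1]; picking 0 is an arbitrary choice.
    """
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0


def langevin_inclusion_samples(selection, sigma, step, n_steps, x0=0.0, seed=0):
    """Euler-Maruyama iterates for dX in -F(X) dt + sqrt(2 sigma) dB.

    `selection` maps a point x to one element of F(x).
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * sigma * step)
    x = x0
    path = []
    for _ in range(n_steps):
        x = x - step * selection(x) + noise_scale * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


samples = langevin_inclusion_samples(
    clarke_selection_abs, sigma=1.0, step=0.02, n_steps=500_000
)
burn_in = 10_000  # discard early iterates before averaging
kept = samples[burn_in:]
mean_abs = sum(abs(x) for x in kept) / len(kept)
print(f"empirical E|X| = {mean_abs:.3f} (Laplace target value: 1.0)")
```

The discretized chain is only an approximation of the continuous-time dynamics, so a small step-size-dependent bias in the empirical mean is expected.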

Abstract

Stochastic differential equations of Langevin-diffusion form have received significant attention, thanks to their foundational role in both Bayesian sampling algorithms and optimization in machine learning. In the latter, they serve as a conceptual model of the stochastic gradient flow in training over-parameterized models. However, the literature typically assumes smoothness of the potential, whose gradient is the drift term. Nevertheless, there are many problems for which the potential function is not continuously differentiable, and hence the drift is not Lipschitz continuous everywhere. This is exemplified by robust losses and Rectified Linear Units in regression problems. In this paper, we show some foundational results regarding the flow and asymptotic properties of Langevin-type Stochastic Differential Inclusions under assumptions appropriate to the machine-learning settings. In particular, we show strong existence of the solution, as well as an asymptotic minimization of the canonical free-energy functional.
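
As a worked illustration of why minimizing the free energy singles out the Gibbs measure, consider the unit-temperature functional $\mathcal{F}(\rho)=\int f(x)\rho\,dx + \int \rho\log\rho\,dx$ over probability densities $\rho$, and assume (an assumption made here for the sketch, not a statement from the paper) that $Z=\int e^{-f(x)}\,dx$ is finite. Writing $\pi = e^{-f}/Z$, so that $f = -\log\pi - \log Z$, one obtains

$$
\mathcal{F}(\rho)
= \int \rho\,(-\log\pi - \log Z)\,dx + \int \rho\log\rho\,dx
= \int \rho\log\frac{\rho}{\pi}\,dx - \log Z
= \mathrm{KL}(\rho\,\|\,\pi) - \log Z .
$$

Since $\mathrm{KL}(\rho\,\|\,\pi)\ge 0$ with equality exactly when $\rho=\pi$, the functional is minimized at the Gibbs density $\pi \propto e^{-f}$, with minimum value $-\log Z$.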

Paper Structure

This paper contains 14 sections, 11 theorems, 61 equations, 6 figures.

Key Result

Theorem 8

Let $F \colon \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ be a definable conservative field that admits a tame potential $f\colon \mathbb{R}^n \to \mathbb{R}$ with bounded variation. Let $F$ be Lipschitz on $\mathbb{R}^n\setminus B_\delta(0)$ with constant $L$, for some $L,\delta>0$. Then $F$ is pi

Figures (6)

  • Figure 1: Stationary Distribution $e^{-f(x)}$ Associated with \ref{eq:example}.
  • Figure 2: Histogram of 100k Metropolis Samples of \ref{eq:example}.
  • Figure 3: Histogram for \ref{eq:iterationexample} at $\epsilon\in\{0.01,0.001,0.0001\}$, in this order.
  • Figure 4: Wasserstein Distance of the Langevin-type iteration \ref{eq:iterationexample} and Metropolis-generated samples for $\epsilon\in\{0.01,0.001,0.0001\}$, in this order.
  • Figure 5: Test Loss on the sampled parameters generated by the unadjusted as well as the Metropolis-adjusted Langevin method on a ReLU Neural Network on a regression task for the dataset E2006. The discretization rate is 1e-05.
  • ...and 1 more figure
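
The kind of comparison described in the captions for Figures 2 to 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions only: the potential $f(x)=|x|$ is a hypothetical stand-in for the paper's example, and the function names, proposal width, sample size, and step sizes are all chosen here for convenience. The sketch draws reference samples with random-walk Metropolis, runs an unadjusted Langevin-type iteration with a subgradient selection, and measures the empirical 1-Wasserstein distance between the two one-dimensional samples by matching sorted values.

```python
import math
import random


def f(x):
    # Hypothetical nonsmooth potential; NOT the paper's example.
    return abs(x)


def subgradient(x):
    # One selection from the Clarke subdifferential of |x| (0 at the kink).
    return (x > 0) - (x < 0)


def metropolis(n, proposal_std=1.0, seed=1):
    """Random-walk Metropolis chain targeting the density proportional to exp(-f)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        y = x + proposal_std * rng.gauss(0.0, 1.0)
        # Accept with probability min(1, exp(f(x) - f(y))).
        if rng.random() < math.exp(min(0.0, f(x) - f(y))):
            x = y
        out.append(x)
    return out


def unadjusted_langevin(n, eps, seed=2):
    """Langevin-type iteration with step eps and no accept/reject correction."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = x - eps * subgradient(x) + math.sqrt(2.0 * eps) * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def wasserstein1(a, b):
    """W1 distance between two equal-size empirical measures on the real line."""
    assert len(a) == len(b)
    return sum(abs(u - v) for u, v in zip(sorted(a), sorted(b))) / len(a)


n = 100_000
reference = metropolis(n)
distances = {}
for eps in (0.1, 0.02):
    distances[eps] = wasserstein1(unadjusted_langevin(n, eps), reference)
    print(f"eps={eps}: W1 to Metropolis samples = {distances[eps]:.3f}")
```

With finite chains the measured distance mixes discretization bias and Monte Carlo error, so smaller steps need proportionally longer runs before the distance reflects the bias alone.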

Theorems & Definitions (22)

  • Definition 1: Structure, cf. pillay1986definable
  • Definition 2: Definable functions, cf. van1996geometric
  • Definition 3: Conservative set-valued fields of bolte2021conservative
  • Definition 4: Potential function of bolte2021conservative
  • Definition 5
  • Definition 6: van1996geometric, bolte2007clarke
  • Remark 7
  • Theorem 8
  • Example 1
  • Theorem 9: Theorem 9.4.3 in aubinFrankowska
  • ...and 12 more