Langevin dynamics based algorithm e-TH$\varepsilon$O POULA for stochastic optimization problems with discontinuous stochastic gradient

Dong-Young Lim; Ariel Neufeld; Sotirios Sabanis; Ying Zhang

TL;DR

The paper tackles stochastic optimization with discontinuous stochastic gradients, a setting arising in quantile/CVaR problems, vector quantization, and ReLU-based neural nets. It introduces e-TH$\varepsilon$O POULA, a Langevin-dynamics–based algorithm that uses polygonal Euler approximations and a taming/boosting gradient scheme, and it proves non-asymptotic convergence to the invariant measure $\pi_{\beta}\propto e^{-\beta u}$. Theoretical contributions include explicit non-asymptotic error bounds in the Wasserstein distances $W_1$ and $W_2$, established under a local Lipschitz-in-average condition and a convexity-at-infinity condition on the stochastic gradient, plus an explicit bound on the expected excess risk that can be made arbitrarily small through the choice of $\beta$ and $\lambda$. Empirically, e-TH$\varepsilon$O POULA outperforms SGLD, TUSLA, ADAM, and AMSGrad in high-dimensional, discontinuous-gradient tasks, including multi-period portfolio optimization with transfer learning and nonlinear Gamma regression on real data, which makes it relevant for finance, insurance, and deep learning contexts where convergence guarantees are critical.
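
For orientation, the standard overdamped Langevin setup behind the invariant measure $\pi_{\beta}$ can be written out as below. This is textbook background rather than the paper's exact scheme: the symbols $Z_t$, $B_t$, $h$, $H$, $X_{n+1}$, and $\xi_{n+1}$ are notation assumed here, and the last line is the plain stochastic gradient Langevin (SGLD) recursion. e-TH$\varepsilon$O POULA replaces the raw stochastic gradient in that recursion with a tamed and boosted version, whose precise form is given in the paper's Algorithm section and is not reproduced in this summary.

```latex
\begin{align*}
  \mathrm{d}Z_t &= -h(Z_t)\,\mathrm{d}t + \sqrt{2\beta^{-1}}\,\mathrm{d}B_t,
      \qquad h = \nabla u, \\
  \pi_{\beta}(\mathrm{d}\theta) &\propto e^{-\beta u(\theta)}\,\mathrm{d}\theta, \\
  \theta_{n+1} &= \theta_n - \lambda\, H(\theta_n, X_{n+1})
      + \sqrt{2\lambda\beta^{-1}}\,\xi_{n+1},
      \qquad \mathbb{E}\big[H(\theta, X_0)\big] = h(\theta).
\end{align*}
```

Here $\lambda$ is the step size, $\beta$ the inverse temperature, and $\xi_{n+1}$ a standard Gaussian vector; a larger $\beta$ concentrates $\pi_{\beta}$ around the minimizers of $u$.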

Abstract

We introduce a new Langevin dynamics based algorithm, called e-TH$\varepsilon$O POULA, to solve optimization problems with discontinuous stochastic gradients which naturally appear in real-world applications such as quantile estimation, vector quantization, CVaR minimization, and regularized optimization problems involving ReLU neural networks. We demonstrate both theoretically and numerically the applicability of the e-TH$\varepsilon$O POULA algorithm. More precisely, under the conditions that the stochastic gradient is locally Lipschitz in average and satisfies a certain convexity at infinity condition, we establish non-asymptotic error bounds for e-TH$\varepsilon$O POULA in Wasserstein distances and provide a non-asymptotic estimate for the expected excess risk, which can be controlled to be arbitrarily small. Three key applications in finance and insurance are provided, namely, multi-period portfolio optimization, transfer learning in multi-period portfolio optimization, and insurance claim prediction, which involve neural networks with (Leaky)-ReLU activation functions. Numerical experiments conducted using real-world datasets illustrate the superior empirical performance of e-TH$\varepsilon$O POULA compared to SGLD, TUSLA, ADAM, and AMSGrad in terms of model accuracy.
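
To make "discontinuous stochastic gradient" concrete, the sketch below runs plain SGLD (one of the baselines named above, not e-TH$\varepsilon$O POULA itself) on a toy quantile estimation problem. Everything in it is an illustrative assumption: the function name `sgld_quantile`, the standard normal data, the small quadratic regularizer, and all hyperparameter values. The stochastic gradient of the pinball loss is an indicator minus the quantile level, so it jumps whenever the parameter crosses a data point.

```python
import numpy as np


def sgld_quantile(q=0.95, n_steps=20_000, step=0.01, beta=1e4,
                  reg=1e-3, batch=32, seed=0):
    """Estimate the q-quantile of N(0, 1) with plain SGLD (illustrative only)."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    for _ in range(n_steps):
        x = rng.standard_normal(batch)  # fresh mini-batch of data
        # Stochastic gradient of the pinball loss E[rho_q(X - theta)]:
        # 1{x < theta} - q, which is discontinuous in theta at theta == x.
        grad = np.mean((x < theta).astype(float) - q)
        # Gradient of the assumed regularizer (reg / 2) * theta**2.
        grad += reg * theta
        noise = rng.standard_normal()
        # Langevin step: gradient move plus Gaussian noise of
        # standard deviation sqrt(2 * step / beta).
        theta = theta - step * grad + np.sqrt(2.0 * step / beta) * noise
    return theta


theta_hat = sgld_quantile()
print(f"SGLD estimate of the 0.95-quantile: {theta_hat:.3f}")
```

The estimate should land near the true 0.95-quantile of the standard normal distribution (about 1.645), up to a small bias from the regularizer and fluctuations from the injected noise. The gradient here is bounded, so no taming is needed; the taming in the paper's algorithm targets settings where the gradient can grow quickly, such as neural network training.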

Paper Structure (26 sections, 15 theorems, 173 equations, 3 figures, 7 tables)

Introduction
e-TH$\varepsilon$O POULA: Setting and definition
Setting
Algorithm
Numerical Experiments
Multi-period portfolio optimization
Black-Scholes model
AR($1$) model
Transfer learning in the multi-period portfolio optimization
Transfer learning setting
Comparison with full learning setting
Non-linear Gamma regression
Conclusion of numerical experiments
Non-asymptotic convergence bounds for e-TH$\varepsilon$O POULA
Assumptions
...and 11 more sections

Key Result

Proposition 3.1

The optimization problem \eqref{eq:optim_tl_reg} satisfies Assumptions \ref{asm:AI}-\ref{asm:AC} in Section \ref{sec:main}.

Figures (3)

Figure 1: Test score $V_K^*(s_0)$ of each optimizer for different numbers of assets under the Black-Scholes model. The parameter settings are summarized in Table \ref{tab:iid}.
Figure 2: Test score $V_K^*(s_0)$ of each optimizer for different values of $\nu$ under the AR($1$) model.
Figure 3: Negative likelihood curve on training and test set. The colored area corresponds to the mean $\pm$ standard deviation for each algorithm.

Theorems & Definitions (43)

Remark 2.1
Remark 2.2
Proposition 3.1
proof
Remark 3.2
Remark 4.1
Remark 4.2
Remark 4.3
Theorem 4.4
Corollary 4.5
...and 33 more
