Robust SGLD algorithm for solving non-convex distributionally robust optimisation problems

Ariel Neufeld, Matthew Ng Cheng En, Ying Zhang

TL;DR

This work develops a robust Stochastic Gradient Langevin Dynamics (SGLD) method to solve a class of non-convex distributionally robust optimisation problems. By applying Wasserstein-type duality, finite-grid discretisation, and Nesterov smoothing, the authors obtain a differentiable objective suitable for SGLD and establish non-asymptotic excess-risk bounds with explicit constants. The framework is demonstrated on a regression task with adversarial perturbations, showing both theoretical convergence guarantees and empirical improvements over vanilla SGLD in test accuracy. The approach highlights the practical value of incorporating model uncertainty in data-driven stochastic optimisation and provides implementable methodology and code for robust learning under perturbations.
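
To illustrate why smoothing makes a maximum over a finite grid differentiable, the following is a standard log-sum-exp (entropy-type) smoothing of a finite maximum. It is a generic sketch and not necessarily the exact construction used in the paper; the values $a_1,\dots,a_J$ and the smoothing parameter $\mu>0$ are illustrative symbols introduced here.

$$
\max_{1\le j\le J} a_j \;\le\; \mu \log \sum_{j=1}^{J} \exp\!\left(\frac{a_j}{\mu}\right) \;\le\; \max_{1\le j\le J} a_j + \mu \log J .
$$

The middle expression is smooth in $(a_1,\dots,a_J)$, and its gap to the true maximum is at most $\mu \log J$, so a smaller $\mu$ gives a tighter but less smooth approximation.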

Abstract

In this paper we develop a Stochastic Gradient Langevin Dynamics (SGLD) algorithm tailored for solving a certain class of non-convex distributionally robust optimisation (DRO) problems. By deriving non-asymptotic convergence bounds, we build an algorithm which for any prescribed accuracy $\varepsilon>0$ outputs an estimator whose expected excess risk is at most $\varepsilon$. As a concrete application, we consider the problem of identifying the best non-linear estimator of a given regression model involving a neural network using adversarially corrupted samples. We formulate this problem as a DRO problem and demonstrate both theoretically and numerically the applicability of the proposed robust SGLD algorithm. Moreover, numerical experiments show that the robust SGLD estimator outperforms the estimator obtained using vanilla SGLD in terms of test accuracy, which highlights the advantage of incorporating model uncertainty when optimising with perturbed samples.
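
For orientation, the following is a minimal sketch of the vanilla SGLD update, $\theta_{n+1} = \theta_n - \lambda\, H(\theta_n, X_{n+1}) + \sqrt{2\lambda/\beta}\,\xi_{n+1}$ with standard Gaussian noise $\xi_{n+1}$. It is not the paper's robust SGLD algorithm: the function names, the toy quadratic objective (which is convex, chosen only so the result is easy to check), and the parameter values are all assumptions made for illustration.

```python
import numpy as np


def sgld(grad_estimate, theta0, step_size, beta, n_steps, rng):
    """Run vanilla SGLD and return the final iterate.

    grad_estimate(theta, rng) must return a stochastic gradient at theta.
    step_size plays the role of lambda, beta is the inverse temperature.
    """
    theta = np.array(theta0, dtype=float)
    noise_scale = np.sqrt(2.0 * step_size / beta)
    for _ in range(n_steps):
        g = grad_estimate(theta, rng)            # stochastic gradient
        xi = rng.standard_normal(theta.shape)    # injected Gaussian noise
        theta = theta - step_size * g + noise_scale * xi
    return theta


def noisy_quadratic_grad(theta, rng):
    # Hypothetical toy objective: 0.5 * ||theta - x||^2 with a noisy
    # sample x centred at 1, so the minimiser is the all-ones vector.
    x = 1.0 + 0.1 * rng.standard_normal(theta.shape)
    return theta - x


rng = np.random.default_rng(0)
theta_hat = sgld(noisy_quadratic_grad, theta0=np.zeros(3),
                 step_size=0.01, beta=1e4, n_steps=5000, rng=rng)
```

A large `beta` keeps the injected noise small, so the iterate concentrates near the minimiser. A smaller `beta` increases exploration, which is the property that makes SGLD attractive for non-convex objectives.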

Paper Structure (20 sections, 14 theorems, 132 equations, 2 figures, 1 table, 1 algorithm)

Key Result

Theorem 2.5

Let Assumptions 1, 2, 3, and 4 hold. Let $\beta,\delta>0$, and let $\bar{\theta}_0\in L^4(\Omega, \mathcal{F},\mathbb{P};\mathbb{R}^{d+1})$. Moreover, let $(\hat{\theta}^{\lambda, \delta,\ell,\mathfrak{j}}_n)_{n\in\mathbb{N}}$ denote the first $d$ components ... The dependence of the constants on key parameters is summarised in Appendix A.

Figures (2)

  • Figure 1: Path of robust SGLD for different values of $\eta_2$
  • Figure 2: Mean squared loss for vanilla SGLD and robust SGLD on test dataset

Theorems & Definitions (38)

  • Remark 2.1
  • Definition 2.2
  • Remark 2.3
  • Remark 2.4
  • Theorem 2.5
  • Corollary 2.6
  • proof
  • Remark 2.7
  • Proposition 3.1
  • proof
  • ...and 28 more