Large-Scale Non-convex Stochastic Constrained Distributionally Robust Optimization

Qi Zhang; Yi Zhou; Ashley Prater-Bennette; Lixin Shen; Shaofeng Zou

TL;DR

This work tackles large-scale constrained distributionally robust optimization (DRO) with non-convex losses, focusing on uncertainty sets defined by the general Cressie-Read divergence family. It develops a stochastic algorithm (SFK-DRO) whose per-iteration cost is independent of the dataset size: the method works with a dual formulation and a smooth, Lipschitz approximation of it, combining stochastic gradient updates for the decision variable with Frank-Wolfe updates for the dual parameters. The authors prove convergence to an $ε$-stationary point with computational complexity $\mathcal O(ε^{-3k_*-5})$, where $k_*$ is the parameter of the Cressie-Read divergence, through careful bias control and variance analysis, and they show that the method extends to smoothed CVaR DRO. Empirical results on imbalanced CIFAR-10 show faster convergence and improved robustness compared with baselines, which supports its use for large-scale, non-convex DRO tasks under distribution shift.
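To make the update pattern concrete, here is a minimal sketch of one way to pair a stochastic gradient step on the model with a Frank-Wolfe step on the dual variables. It is not the authors' SFK-DRO implementation. It assumes the $χ^2$ case with $f(t) = (t-1)^2/2$, a toy squared-error loss, and box constraints on the dual variables $(λ, η)$; the function names, box bounds, and step sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def conj_chi2(s):
    """Conjugate f*(s) = ((s + 1)_+^2 - 1) / 2 of f(t) = (t - 1)^2 / 2."""
    return 0.5 * (np.maximum(s + 1.0, 0.0) ** 2 - 1.0)


def conj_chi2_grad(s):
    """Derivative of the conjugate: (s + 1)_+."""
    return np.maximum(s + 1.0, 0.0)


def dual_grads(x, lam, eta, A, b, rho):
    """Dual value lam * mean f*((loss - eta) / lam) + lam * rho + eta
    on the given samples, plus its gradients in x, lam and eta."""
    resid = A @ x - b
    losses = 0.5 * resid ** 2              # toy per-sample loss (assumption)
    s = (losses - eta) / lam
    w = conj_chi2_grad(s)                  # per-sample reweighting
    value = lam * conj_chi2(s).mean() + lam * rho + eta
    g_x = (A * (w * resid)[:, None]).mean(axis=0)
    g_lam = conj_chi2(s).mean() - (w * s).mean() + rho
    g_eta = 1.0 - w.mean()
    return value, g_x, g_lam, g_eta


def sgd_frank_wolfe(A, b, rho=0.5, lam_box=(0.5, 5.0), eta_box=(-1.0, 2.0),
                    steps=300, batch=32, lr=0.01, seed=0):
    """Hypothetical loop: SGD on x, Frank-Wolfe on (lam, eta) over a box."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x, lam, eta = np.zeros(d), 1.0, 0.0
    for t in range(steps):
        # Each iteration touches only `batch` samples, not the whole dataset.
        idx = rng.choice(n, size=batch, replace=False)
        _, g_x, g_lam, g_eta = dual_grads(x, lam, eta, A[idx], b[idx], rho)
        x = x - lr * g_x                   # stochastic gradient step
        # Frank-Wolfe: minimizing a linear function over a box picks a corner.
        corner_lam = lam_box[0] if g_lam > 0 else lam_box[1]
        corner_eta = eta_box[0] if g_eta > 0 else eta_box[1]
        gamma = 2.0 / (t + 3.0)            # classical diminishing step size
        lam = lam + gamma * (corner_lam - lam)
        eta = eta + gamma * (corner_eta - eta)
    return x, lam, eta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.normal(size=(256, 3))
    b = A @ np.array([0.5, -0.3, 0.2]) + 0.05 * rng.normal(size=256)
    x, lam, eta = sgd_frank_wolfe(A, b)
    print("x:", x, "lambda:", lam, "eta:", eta)
```

Because each Frank-Wolfe iterate is a convex combination of the previous iterate and a corner of the box, $(λ, η)$ stays feasible without a projection. Keeping $λ$ bounded away from zero is what keeps the $1/λ$ terms well behaved in this sketch.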

Abstract

Distributionally robust optimization (DRO) is a powerful framework for training models that are robust to data distribution shifts. This paper focuses on constrained DRO, which has an explicit characterization of the robustness level. Existing studies on constrained DRO mostly focus on convex loss functions and exclude the practical and challenging case of non-convex loss functions, e.g., neural networks. This paper develops a stochastic algorithm for non-convex constrained DRO and analyzes its performance. The computational complexity of our stochastic algorithm at each iteration is independent of the overall dataset size, and thus the algorithm is suitable for large-scale applications. We focus on uncertainty sets defined by the general Cressie-Read family of divergences, which includes the $χ^2$-divergence as a special case. We prove that our algorithm finds an $ε$-stationary point with a computational complexity of $\mathcal O(ε^{-3k_*-5})$, where $k_*$ is the parameter of the Cressie-Read divergence. The numerical results indicate that our method outperforms existing methods. Our method also applies to the smoothed conditional value at risk (CVaR) DRO.
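For reference, the objects named in the abstract can be written out as follows. The abstract does not give these formulas; they follow the parameterization commonly used in the DRO literature, so the exact normalization, the convention $k_* = k/(k-1)$, and the symbols $ρ$, $λ$, $η$, $P_0$ should be read as assumptions rather than as the paper's own notation. The duality holds under standard regularity conditions.

```latex
% Cressie-Read family for k > 1, with conjugate exponent k_* = k/(k-1)
f_k(t) = \frac{t^k - kt + k - 1}{k(k-1)}, \qquad
D_{f_k}(Q \,\|\, P_0) = \int f_k\!\left(\frac{dQ}{dP_0}\right) dP_0 .

% Constrained DRO with radius \rho, and its dual form
\min_x \sup_{Q:\, D_{f_k}(Q \| P_0) \le \rho} \mathbb{E}_{S \sim Q}\,[\ell(x; S)]
  = \min_x \inf_{\lambda \ge 0,\ \eta \in \mathbb{R}}
    \lambda\, \mathbb{E}_{S \sim P_0}\!\left[ f_k^*\!\left( \frac{\ell(x; S) - \eta}{\lambda} \right) \right]
    + \lambda \rho + \eta ,

% Convex conjugate of f_k
f_k^*(s) = \frac{1}{k} \Big[ \big( (k-1)s + 1 \big)_+^{k_*} - 1 \Big] .

% Special case k = 2 (chi-square): f_2(t) = (t-1)^2 / 2 and k_* = 2
```

Under this convention the $χ^2$ case has $k_* = 2$, so the stated bound $\mathcal O(ε^{-3k_*-5})$ would read $\mathcal O(ε^{-11})$. The dual is an expectation under the nominal distribution $P_0$, so it can be estimated from mini-batches, which is consistent with the per-iteration cost being independent of the dataset size.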

Paper Structure

This paper contains 27 sections, 5 theorems, 103 equations, 1 figure, 1 table, 1 algorithm.

Introduction
Challenges and Contributions
Related Work
Various Uncertainty Sets.
Penalized DRO.
Constrained DRO with Convex Loss.
Preliminaries and Problem Model
Notations
Assumptions
DRO Objective and Its Dual Form
Analysis of Constrained DRO
Smooth and Lipschitz Approximation
Convexity and Smoothness on Parameters
Mini-Batch Algorithm
Proof Sketch of Theorem 1
...and 12 more sections

Key Result

Lemma 1

$\forall x\in \mathbb R^d, 0\le \lambda_0\le \bar{\lambda}$,

Figures (1)

Figure 1: Training curve of classification task.

Theorems & Definitions (10)

Lemma 1
Lemma 2
Theorem 1
Lemma 3
Lemma 4
proof
proof
proof
proof
proof
