Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond

Lijun Zhang; Haomin Bai; Peng Zhao; Tianbao Yang; Zhi-Hua Zhou

TL;DR

This paper formulates group distributionally robust optimization (GDRO) as a stochastic convex-concave saddle-point problem and solves it by stochastic mirror descent (SMD) with nearly optimal sample complexity. It also introduces a weighted variant of GDRO whose distribution-dependent convergence rates depend on the number of samples available from each distribution.

Abstract

This paper investigates group distributionally robust optimization (GDRO) with the goal of learning a model that performs well over $m$ different distributions. First, we formulate GDRO as a stochastic convex-concave saddle-point problem, which is then solved by stochastic mirror descent (SMD) with $m$ samples in each iteration, and attain a nearly optimal sample complexity. To reduce the number of samples required in each round from $m$ to 1, we cast GDRO as a two-player game, where one player conducts SMD and the other executes an online algorithm for non-oblivious multi-armed bandits, maintaining the same sample complexity. Next, we extend GDRO to address scenarios involving imbalanced data and heterogeneous distributions. In the first scenario, we introduce a weighted variant of GDRO, enabling distribution-dependent convergence rates that rely on the number of samples from each distribution. We design two strategies to meet the sample budget: one integrates non-uniform sampling into SMD, and the other employs the stochastic mirror-prox algorithm with mini-batches, both of which deliver faster rates for distributions with more samples. In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of outlier distributions. Similar to the case of vanilla GDRO, we develop two stochastic approaches: one uses $m$ samples per iteration via SMD, and the other consumes $k$ samples per iteration through an online algorithm for non-oblivious combinatorial semi-bandits.
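
For orientation, the two objectives named in the abstract can be written out as follows. The notation is assumed here and does not appear on this page: $R_i(\mathbf{w})$ is the risk of model $\mathbf{w}$ under the $i$-th distribution, $\mathcal{W}$ is the model domain, $\Delta_m$ is the probability simplex over the $m$ distributions, and $R_{[i]}(\mathbf{w})$ is the $i$-th largest of the $m$ risks. The maximum risk becomes a convex-concave saddle-point problem because a linear function over the simplex is maximized at a vertex:

$$
\min_{\mathbf{w}\in\mathcal{W}} \max_{i\in[m]} R_i(\mathbf{w})
\;=\;
\min_{\mathbf{w}\in\mathcal{W}} \max_{\mathbf{q}\in\Delta_m} \sum_{i=1}^{m} q_i R_i(\mathbf{w}).
$$

The average top-$k$ risk replaces the maximum with the mean of the $k$ largest risks, and reduces to the maximum risk when $k=1$:

$$
\min_{\mathbf{w}\in\mathcal{W}} \frac{1}{k}\sum_{i=1}^{k} R_{[i]}(\mathbf{w}).
$$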

Paper Structure (59 sections, 27 theorems, 266 equations, 8 figures, 1 table, 9 algorithms)

Introduction
Extension to Imbalanced Data
Extension to Heterogeneous Distributions
Related Work
SA Approaches to GDRO
Preliminaries
Stochastic Mirror Descent for GDRO
Comparisons with Gouop_DRO
Comparisons with Online:Multiple:Distribution
Anytime Extensions
Non-oblivious Online Learning for GDRO
Comparisons with DRO:Online:Game
Anytime Extensions
Weighted GDRO for Imbalanced Data
Stochastic Mirror Descent with Non-uniform Sampling
...and 44 more sections

Key Result

Theorem 1

Under Assumptions ass:1, ass:2a, ass:2 and ass:3, and setting $\eta_w = D^2\sqrt{\frac{8 }{5T (D^2 G^2+\ln m)}}$ and $\eta_q = (\ln m)\sqrt{\frac{8}{5T (D^2 G^2+\ln m)}}$ in Algorithm alg:1, we have and with probability at least $1-\delta$,
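
The step sizes quoted in Theorem 1 can be computed directly, and a minimal sketch of an SMD loop that uses them is given below. This is an illustration under stated assumptions, not the paper's Algorithm alg:1: it assumes a Euclidean update with projection onto a ball for $\mathbf{w}$, an entropic (multiplicative-weights) update for $\mathbf{q}$, and plain averaging of the iterates. The names `smd_step_sizes`, `smd_gdro`, `sample_loss_grad`, and `radius`, as well as the toy quadratic losses, are hypothetical, and the values of `D` and `G` are placeholders rather than constants derived from the paper's assumptions.

```python
import math
import numpy as np


def smd_step_sizes(D, G, m, T):
    """Step sizes eta_w and eta_q as quoted in Theorem 1."""
    scale = math.sqrt(8.0 / (5.0 * T * (D**2 * G**2 + math.log(m))))
    return D**2 * scale, math.log(m) * scale


def smd_gdro(sample_loss_grad, m, dim, T, D, G, radius=1.0):
    """Illustrative SMD loop (assumed form): Euclidean step on w, entropic step on q."""
    eta_w, eta_q = smd_step_sizes(D, G, m, T)
    w = np.zeros(dim)
    q = np.full(m, 1.0 / m)  # start from the uniform distribution over groups
    w_sum, q_sum = np.zeros(dim), np.zeros(m)
    for _ in range(T):
        w_sum += w
        q_sum += q
        # One fresh sample per group: losses has shape (m,), grads has shape (m, dim).
        losses, grads = sample_loss_grad(w)
        # Descent step on w with the q-weighted stochastic gradient, then projection.
        w_new = w - eta_w * (q @ grads)
        norm = np.linalg.norm(w_new)
        if norm > radius:
            w_new *= radius / norm
        # Ascent step on q: multiplicative weights, computed in log space for stability.
        logits = np.log(q) + eta_q * losses
        logits -= logits.max()
        q = np.exp(logits)
        q /= q.sum()
        w = w_new
    return w_sum / T, q_sum / T


# Toy usage (hypothetical data): three groups with noisy quadratic losses.
rng = np.random.default_rng(0)
centers = np.array([[0.5, 0.0], [-0.5, 0.0], [0.0, 0.5]])


def sample_loss_grad(w):
    z = centers + 0.1 * rng.standard_normal(centers.shape)  # one sample per group
    diff = w - z
    return 0.5 * np.sum(diff**2, axis=1), diff


w_bar, q_bar = smd_gdro(sample_loss_grad, m=3, dim=2, T=2000, D=1.0, G=2.0)
```

Both step sizes share the factor $\sqrt{8/(5T(D^2G^2+\ln m))}$, so they shrink as $1/\sqrt{T}$ and their ratio is fixed at $D^2/\ln m$. Each iteration of the loop draws one sample from every group, which matches the "$m$ samples in each iteration" setting described in the abstract.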

Figures (8)

Figure 1: Graphical illustrations of Example example:GDRO&ATk.
Figure 2: Balanced settings: maximum risk versus the number of iterations.
Figure 3: Balanced settings: maximum risk versus the number of samples.
Figure 4: Imbalanced settings with the synthetic data set: individual risk versus the number of iterations.
Figure 5: Imbalanced settings with the Adult data set: individual risk versus the number of iterations.
...and 3 more figures

Theorems & Definitions (40)

Theorem 1
Remark 1
Theorem 2
Remark 2
Theorem 3
Theorem 4
Theorem 5
Remark 3
Remark 4
Theorem 6
...and 30 more
