Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond

Lijun Zhang, Haomin Bai, Peng Zhao, Tianbao Yang, Zhi-Hua Zhou

TL;DR

This paper investigates group distributionally robust optimization (GDRO) as a stochastic convex-concave saddle-point problem, which is solved by stochastic mirror descent (SMD) with nearly optimal sample complexity, and introduces a weighted variant of GDRO, enabling distribution-dependent convergence rates that rely on the number of samples from each distribution.
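For reference, the stochastic convex-concave saddle-point problem mentioned above is commonly written as follows. The notation (domain $\mathcal{W}$, probability simplex $\Delta_m$, per-distribution risk $R_i$, loss $\ell$, distributions $\mathcal{P}_1,\dots,\mathcal{P}_m$) is assumed here for illustration rather than quoted from this summary.

```latex
\min_{\mathbf{w}\in\mathcal{W}} \max_{\mathbf{q}\in\Delta_m}
  \Big\{ \phi(\mathbf{w},\mathbf{q}) = \sum_{i=1}^{m} q_i R_i(\mathbf{w}) \Big\},
\qquad
R_i(\mathbf{w}) = \mathbb{E}_{\mathbf{z}\sim\mathcal{P}_i}\big[\ell(\mathbf{w};\mathbf{z})\big],
\qquad
\Delta_m = \Big\{\mathbf{q}\in\mathbb{R}^m : q_i \ge 0,\ \sum_{i=1}^{m} q_i = 1\Big\}.
```

Because $\phi$ is linear in $\mathbf{q}$, the inner maximum over the simplex equals $\max_i R_i(\mathbf{w})$, so this is exactly minimizing the worst-case risk over the $m$ distributions; the problem is convex-concave whenever $\ell(\cdot;\mathbf{z})$ is convex.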

Abstract

This paper investigates group distributionally robust optimization (GDRO) with the goal of learning a model that performs well over $m$ different distributions. First, we formulate GDRO as a stochastic convex-concave saddle-point problem, which is then solved by stochastic mirror descent (SMD) with $m$ samples in each iteration, and attain a nearly optimal sample complexity. To reduce the number of samples required in each round from $m$ to 1, we cast GDRO as a two-player game, where one player conducts SMD and the other executes an online algorithm for non-oblivious multi-armed bandits, maintaining the same sample complexity. Next, we extend GDRO to address scenarios involving imbalanced data and heterogeneous distributions. In the first scenario, we introduce a weighted variant of GDRO, enabling distribution-dependent convergence rates that rely on the number of samples from each distribution. We design two strategies to meet the sample budget: one integrates non-uniform sampling into SMD, and the other employs the stochastic mirror-prox algorithm with mini-batches, both of which deliver faster rates for distributions with more samples. In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of outlier distributions. Similar to the case of vanilla GDRO, we develop two stochastic approaches: one uses $m$ samples per iteration via SMD, and the other consumes $k$ samples per iteration through an online algorithm for non-oblivious combinatorial semi-bandits.
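The first approach in the abstract (SMD with $m$ samples per iteration) can be illustrated with a minimal sketch. This is not the authors' Algorithm; it assumes a Euclidean mirror map for $\mathbf{w}$ on a ball-shaped domain (i.e., projected SGD), the entropic mirror map for $\mathbf{q}$ on the simplex (i.e., exponentiated gradient), hand-picked constant step sizes instead of those in Theorem 1, and a hypothetical toy problem with four Gaussian groups in $\mathbb{R}^2$ under the squared loss. All function names (`smd_gdro`, `true_risks`, and so on) are illustrative.

```python
import numpy as np


def project_ball(w, radius):
    """Euclidean projection onto the ball {w : ||w||_2 <= radius}."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)


def smd_gdro(samplers, loss, grad, dim, T, eta_w, eta_q, radius, rng):
    """Sketch of SMD for min_w max_q sum_i q_i R_i(w).

    samplers[i](rng) draws one sample from distribution i, so every
    iteration consumes m = len(samplers) samples.
    Returns the averaged iterates (w_bar, q_bar).
    """
    m = len(samplers)
    w = np.zeros(dim)
    q = np.full(m, 1.0 / m)
    w_sum, q_sum = np.zeros(dim), np.zeros(m)
    for _ in range(T):
        w_sum += w
        q_sum += q
        zs = [draw(rng) for draw in samplers]            # one sample per group
        losses = np.array([loss(w, z) for z in zs])      # unbiased estimates of R_i(w)
        g_w = sum(qi * grad(w, z) for qi, z in zip(q, zs))  # estimate of grad_w phi
        # Descent on w: Euclidean mirror map, i.e. projected SGD.
        w = project_ball(w - eta_w * g_w, radius)
        # Ascent on q: entropic mirror map, i.e. exponentiated gradient.
        q = q * np.exp(eta_q * losses)
        q /= q.sum()
    return w_sum / T, q_sum / T


# Hypothetical toy problem: four Gaussian groups in R^2 with the squared loss.
MUS = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [2.0, 0.0]])
SIGMA = 0.1


def make_sampler(mu):
    return lambda rng: mu + SIGMA * rng.standard_normal(mu.shape)


def sq_loss(w, z):
    return 0.5 * float(np.sum((w - z) ** 2))


def sq_grad(w, z):
    return w - z


def true_risks(w):
    """Exact R_i(w) = 0.5 * ||w - mu_i||^2 + 0.5 * d * sigma^2 for this toy model."""
    return 0.5 * np.sum((w - MUS) ** 2, axis=1) + 0.5 * MUS.shape[1] * SIGMA**2


rng = np.random.default_rng(0)
w_bar, q_bar = smd_gdro([make_sampler(mu) for mu in MUS], sq_loss, sq_grad,
                        dim=2, T=20000, eta_w=0.01, eta_q=0.01, radius=3.0, rng=rng)
print("w_bar:", w_bar, "max risk:", true_risks(w_bar).max())
print("average-risk minimizer max risk:", true_risks(MUS.mean(axis=0)).max())
```

Three of the four group means sit at the origin, so minimizing the average risk would place $\mathbf{w}$ at $(0.5, 0)$ and leave the fourth group with a large risk. The min-max objective is instead minimized at $(1, 0)$, where all groups have equal risk, so the averaged iterate should land near that point with roughly half of the weight $\bar{\mathbf{q}}$ on the fourth group.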

Paper Structure

This paper contains 59 sections, 27 theorems, 266 equations, 8 figures, 1 table, 9 algorithms.

Key Result

Theorem 1

Under Assumptions ass:1, ass:2a, ass:2 and ass:3, and setting $\eta_w = D^2\sqrt{\frac{8}{5T (D^2 G^2+\ln m)}}$ and $\eta_q = (\ln m)\sqrt{\frac{8}{5T (D^2 G^2+\ln m)}}$ in Algorithm alg:1, we have … and, with probability at least $1-\delta$, …
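As a quick arithmetic check, the two step sizes in Theorem 1 can be evaluated directly. The helper name below is hypothetical, and $D$, $G$, $m$, $T$ are simply the constants appearing in the theorem (presumably a bound on the domain size, a gradient-norm bound, the number of distributions, and the number of iterations; this summary does not define $D$ and $G$). The numeric inputs are illustrative only.

```python
import math


def theorem1_step_sizes(D, G, m, T):
    """Evaluate eta_w and eta_q as written in Theorem 1 (hypothetical helper)."""
    # Shared factor sqrt(8 / (5 T (D^2 G^2 + ln m))).
    common = math.sqrt(8.0 / (5.0 * T * (D**2 * G**2 + math.log(m))))
    eta_w = D**2 * common
    eta_q = math.log(m) * common
    return eta_w, eta_q


# Illustrative constants only; not taken from the paper's experiments.
eta_w, eta_q = theorem1_step_sizes(D=1.0, G=1.0, m=10, T=1000)
print(f"eta_w = {eta_w:.4f}, eta_q = {eta_q:.4f}")
```

Both step sizes share the same factor, so their ratio is $\eta_q/\eta_w = \ln m / D^2$ regardless of $T$, and both shrink as $1/\sqrt{T}$: quadrupling $T$ halves them.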

Figures (8)

  • Figure 1: Graphical illustrations of the example on GDRO and the average top-$k$ (ATk) risk.
  • Figure 2: Balanced settings: maximum risk versus the number of iterations.
  • Figure 3: Balanced settings: maximum risk versus the number of samples.
  • Figure 4: Imbalanced settings with the synthetic data set: individual risk versus the number of iterations.
  • Figure 5: Imbalanced settings with the Adult data set: individual risk versus the number of iterations.
  • ...and 3 more figures

Theorems & Definitions (40)

  • Theorem 1
  • Remark 1
  • Theorem 2
  • Remark 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Remark 3
  • Remark 4
  • Theorem 6
  • ...and 30 more