Efficient Algorithms for Empirical Group Distributionally Robust Optimization and Beyond

Dingzhi Yu, Yunuo Cai, Wei Jiang, Lijun Zhang

TL;DR

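A double-looped stochastic primal-dual algorithm called ALEG is developed for empirical Group Distributionally Robust Optimization (GDRO). Building on ALEG, a two-stage optimization algorithm called ALEM is developed to deal with the empirical Minimax Excess Risk Optimization (MERO) problem. The computation complexity of ALEM nearly matches that of ALEG, surpassing the rates of existing methods.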

Abstract

In this paper, we investigate the empirical counterpart of Group Distributionally Robust Optimization (GDRO), which aims to minimize the maximal empirical risk across $m$ distinct groups. We formulate empirical GDRO as a $\textit{two-level}$ finite-sum convex-concave minimax optimization problem and develop an algorithm called ALEG to benefit from its special structure. ALEG is a double-looped stochastic primal-dual algorithm that incorporates variance reduction techniques into a modified mirror prox routine. To exploit the two-level finite-sum structure, we propose a simple group sampling strategy to construct the stochastic gradient with a smaller Lipschitz constant and then perform variance reduction for all groups. Theoretical analysis shows that ALEG achieves $\varepsilon$-accuracy within a computation complexity of $\mathcal{O}\left(\frac{m\sqrt{\bar{n}\ln{m}}}{\varepsilon}\right)$, where $\bar n$ is the average number of samples among $m$ groups. Notably, our approach outperforms the state-of-the-art method by a factor of $\sqrt{m}$. Based on ALEG, we further develop a two-stage optimization algorithm called ALEM to deal with the empirical Minimax Excess Risk Optimization (MERO) problem. The computation complexity of ALEM nearly matches that of ALEG, surpassing the rates of existing methods.
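
For orientation, the two-level finite-sum structure described in the abstract can be written out as below. This is a sketch of the standard GDRO formulation, not a quotation from the paper: the symbols $\mathbf{w}$, $\mathcal{W}$, $\mathbf{q}$, $\Delta_m$ (the probability simplex over the $m$ groups) and $n_i$ (the number of samples in group $i$) are assumed notation, while $R_i$, $\ell$ and $\xi_{ij}$ match the symbols used in the figure captions.

$$
\min_{\mathbf{w}\in\mathcal{W}} \max_{i\in[m]} R_i(\mathbf{w})
\;=\;
\min_{\mathbf{w}\in\mathcal{W}} \max_{\mathbf{q}\in\Delta_m} \sum_{i=1}^{m} q_i R_i(\mathbf{w}),
\qquad
R_i(\mathbf{w}) = \frac{1}{n_i}\sum_{j=1}^{n_i} \ell(\mathbf{w};\xi_{ij}),
\qquad
\bar{n} = \frac{1}{m}\sum_{i=1}^{m} n_i .
$$

The equality holds because a linear function of $\mathbf{q}$ attains its maximum over the simplex at a vertex. The outer sum over groups and the inner sum over samples are the two levels of the finite sum.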

Paper Structure

This paper contains 38 sections, 21 theorems, 110 equations, 2 figures, 1 table, 3 algorithms.

Key Result

Lemma 4.3

For any $s\in[S]^0, k\in[K_s]^0$, $\nabla F(\mathbf{z};\xi_{k}^s)$ is $L_z$-Lipschitz continuous, where

Figures (2)

  • Figure 1: Comparison of the max empirical risk $\max_{i\in[m]}R_i(\cdot)$ with respect to the number of stochastic gradient evaluations $\# \text{ of } \nabla\ell(\cdot;\xi_{ij})$ on the synthetic dataset and the CIFAR-100 dataset.
  • Figure 2: Comparison of the max excess empirical risk $\max_{i\in[m]}\underline{R}_i(\cdot)$ with respect to the number of stochastic gradient evaluations $\# \text{ of } \nabla\ell(\cdot;\xi_{ij})$ on the synthetic dataset and the CIFAR-100 dataset.
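
The two quantities plotted in the figures can be illustrated on a toy problem. The following is a minimal sketch, assuming a scalar linear model with squared loss and two hand-made groups; the function names and the data are hypothetical and are not taken from the paper or its experiments. It assumes the excess risk is $\underline{R}_i(w) = R_i(w) - \min_{w'} R_i(w')$, the usual MERO definition.

```python
# Toy illustration (hypothetical data and names, not the paper's setup):
# scalar model w, loss l(w; (x, y)) = 0.5 * (w * x - y) ** 2.

def risk(w, group):
    """Empirical risk R_i(w): average loss over the samples of one group."""
    return sum(0.5 * (w * x - y) ** 2 for x, y in group) / len(group)

def optimal_risk(group):
    """Minimal risk of one group, using the closed-form least-squares solution."""
    w_star = sum(x * y for x, y in group) / sum(x * x for x, _ in group)
    return risk(w_star, group)

def max_risk(w, groups):
    """GDRO objective: the largest empirical risk over all groups."""
    return max(risk(w, g) for g in groups)

def max_excess_risk(w, groups, r_star):
    """MERO objective: the largest gap between R_i(w) and its own minimum."""
    return max(risk(w, g) - r for g, r in zip(groups, r_star))

groups = [
    [(1.0, 1.0), (2.0, 2.0)],  # noiseless group, fit exactly by w = 1
    [(1.0, 3.0), (1.0, 1.0)],  # noisy group, best w = 2 with risk 0.5
]
r_star = [optimal_risk(g) for g in groups]

print(max_risk(1.0, groups))                 # 1.0
print(max_excess_risk(1.0, groups, r_star))  # 0.5
```

At $w = 1$ the noisy group dominates the max risk, but half of its risk is irreducible, so the max excess risk is smaller. This is the distinction between the GDRO metric in Figure 1 and the MERO metric in Figure 2.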

Theorems & Definitions (56)

  • Definition 3.1
  • Definition 3.2
  • Remark 3.5
  • Remark 3.7
  • Remark 4.1
  • Remark 4.2
  • Lemma 4.3
  • Theorem 4.4
  • Remark 4.5
  • Corollary 4.6
  • ...and 46 more