Beyond Minimax Rates in Group Distributionally Robust Optimization via a Novel Notion of Sparsity

Quan Nguyen; Nishant A. Mehta; Cristóbal Guzmán

TL;DR

This work introduces a novel sparsity structure, $(\lambda, \beta)$-sparsity, for group distributionally robust optimization (GDRO) to surpass traditional minimax sample-complexity bounds. By linking GDRO to sleeping bandits and a two-player zero-sum game, the authors develop SB-GDRO, which leverages time-varying active sets (dominant sets) to reduce the leading dependence on the number of groups from $O(K)$ to $O(\beta_\lambda)$. They also present adaptive algorithms (SB-GDRO-A and SB-GDRO-SA) that achieve near-optimal or dimension-free bounds without knowing the optimal $\lambda^*$ in advance, with SolveOpt guiding the efficient estimation of $\lambda^*$. Experiments on synthetic and real data validate the practical sparsity structure and demonstrate improved sample efficiency and convergence over baselines, highlighting potential for scalable, robust performance in heterogeneous data settings. Overall, the paper provides a principled way to exploit structure in group distributions to achieve significantly better sample complexity in GDRO than classical minimax guarantees.

Abstract

The minimax sample complexity of group distributionally robust optimization (GDRO) has been determined up to a $\log(K)$ factor, where $K$ is the number of groups. In this work, we venture beyond the minimax perspective via a novel notion of sparsity that we dub $(\lambda, \beta)$-sparsity. In short, this condition means that at any parameter $\theta$, there is a set of at most $\beta$ groups whose risks at $\theta$ are all at least $\lambda$ larger than the risks of the other groups. To find an $\epsilon$-optimal $\theta$, we show via a novel algorithm and analysis that the $\epsilon$-dependent term in the sample complexity can swap a linear dependence on $K$ for a linear dependence on the potentially much smaller $\beta$. This improvement leverages recent progress in sleeping bandits, showing a fundamental connection between the two-player zero-sum game optimization framework for GDRO and per-action regret bounds in sleeping bandits. We next show an adaptive algorithm which, up to log factors, gets a sample complexity bound that adapts to the best $(\lambda, \beta)$-sparsity condition that holds. We also show how to get a dimension-free semi-adaptive sample complexity bound with a computationally efficient method. Finally, we demonstrate the practicality of the $(\lambda, \beta)$-sparsity condition and the improved sample efficiency of our algorithms on both synthetic and real-life datasets.
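The sparsity condition described in the abstract can be illustrated with a small sketch. It assumes the exact group risks at a fixed parameter are available as a list, which is not the setting of the paper (there the risks must be estimated from samples), and it reads "dominant set" as a set whose smallest risk exceeds the largest risk outside it by at least $\lambda$. The function names `smallest_dominant_set` and `is_sparse_at` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def smallest_dominant_set(risks: Sequence[float], lam: float) -> List[int]:
    """Indices of the smallest set whose risks all exceed every other risk by >= lam.

    Assumption: `risks` holds the exact group risks at one fixed parameter.
    """
    # Rank groups from highest to lowest risk.
    order = sorted(range(len(risks)), key=lambda i: risks[i], reverse=True)
    for k in range(1, len(order)):
        # Gap between the k-th and (k+1)-th largest risks.
        if risks[order[k - 1]] - risks[order[k]] >= lam:
            return sorted(order[:k])
    # No gap of size lam: only the full set qualifies (there are no other groups).
    return sorted(order)


def is_sparse_at(risks: Sequence[float], lam: float, beta: int) -> bool:
    """Check the sparsity condition at one parameter: some dominant set has size <= beta."""
    return len(smallest_dominant_set(risks, lam)) <= beta


if __name__ == "__main__":
    group_risks = [0.9, 0.2, 0.85, 0.1]  # made-up risks for K = 4 groups
    print(smallest_dominant_set(group_risks, lam=0.5))
    print(is_sparse_at(group_risks, lam=0.5, beta=2))
```

Because any dominant set must consist of the top-ranked groups when $\lambda > 0$, scanning the sorted risks for the first gap of size at least $\lambda$ is enough. The condition in the abstract requires this to hold at every parameter $\theta$, whereas the sketch checks a single one.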

Paper Structure (36 sections, 30 theorems, 161 equations, 5 figures, 1 table, 11 algorithms)

Introduction
Contributions and Techniques
Related Works
Problem Setup
$(\lambda, \beta)$-Sparsity Structure
Two-Player Zero-Sum Game Approach
Computing the Dominant Sets
Non-Oblivious Sleeping Bandits
Sample Complexity of SB-GDRO
$\lambda^*$-Adaptive Sample Complexity
$\lambda^*$-Adaptive Sample Complexity for GDRO
First Step: Constructing $\hat{g}$
Second Step: Solving for $\lambda_{C, \hat{g}}^*$
A Semi-Adaptive Bound in High-Precision Settings
Experimental Results
...and 21 more sections

Key Result

Lemma 3.1

Let $m = \frac{384n\ln(\frac{741GDK}{\delta})}{0.01\lambda^2}$. With probability at least $1-\delta/2$, for any $t \in [T]$, DominantSet returns a $0.4\lambda$-dominant set $\hat{S}_{\theta_t}$ at $\theta_t$ satisfying $\lvert\hat{S}_{\theta_t}\rvert \leq \beta_\lambda$.
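To get a feel for the scale of $m$ in Lemma 3.1, the formula can be evaluated numerically. The sketch below is plain arithmetic on the stated expression. The symbols $n$, $G$ and $D$ are not defined in this excerpt, so they are treated as opaque positive inputs, and rounding up to an integer is an assumption made here, not something the lemma states. The function name `dominant_set_sample_size` is hypothetical.

```python
import math


def dominant_set_sample_size(n: float, G: float, D: float, K: int,
                             lam: float, delta: float) -> int:
    """Evaluate m = 384 * n * ln(741*G*D*K / delta) / (0.01 * lam^2), rounded up.

    Assumption: n, G, D are the positive problem constants from the lemma,
    whose meaning is not given in this excerpt.
    """
    if lam <= 0 or not (0 < delta < 1):
        raise ValueError("need lam > 0 and 0 < delta < 1")
    if min(n, G, D, K) <= 0:
        raise ValueError("n, G, D, K must be positive")
    m = 384 * n * math.log(741 * G * D * K / delta) / (0.01 * lam ** 2)
    # Rounding up is a choice made for this sketch, so that m is an integer count.
    return math.ceil(m)


if __name__ == "__main__":
    # Made-up values, only to show how m scales with lam and K.
    for lam in (0.4, 0.2, 0.1):
        print(lam, dominant_set_sample_size(n=1, G=1.0, D=1.0, K=10, lam=lam, delta=0.05))
```

The expression grows as $1/\lambda^2$ but only logarithmically in $K$ and $1/\delta$, so halving $\lambda$ roughly quadruples $m$, while doubling the number of groups adds only a small amount.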

Figures (5)

Figure 1: SB-GDRO with a known $\lambda$
Figure 2: Sizes of the dominant sets in the first $10000$ rounds computed by SB-GDRO-SA.
Figure 3: The optimality gap of SB-GDRO-SA and SMD-GDRO on GDRO with the Adult dataset. Lower is better.
Figure 4: The number of times a group is selected by the max-player, displayed in natural log. The highest group (group 8) is female Amer-Indian-Eskimo people.
Figure 5: The construction for the $\Omega\left(\frac{G^2D^2 + \beta}{{\epsilon}^2}\right)$ lower bound.

Theorems & Definitions (59)

Definition 2.2
Definition 2.3
Lemma 3.1
Theorem 3.2
Lemma 3.3
Theorem 3.4
Theorem 3.5
Theorem 4.1
Theorem 4.2
Lemma A.1
...and 49 more