
Zeroth-Order Stochastic Mirror Descent Algorithms for Minimax Excess Risk Optimization

Zhihao Gu, Zi Xu

TL;DR

This work tackles minimax excess risk optimization (MERO), a distributionally robust problem, by reformulating it as a stochastic convex-concave saddle-point problem with φ(w,q) = ∑ q_i [R_i(w) - R_i^*]. It introduces a zeroth-order stochastic mirror descent (ZO-SMD) algorithm that uses UniGE-based gradient estimators for smooth losses, together with a non-smooth variant, enabling gradient-free optimization across m distributions. The authors prove optimal convergence rates of O(1/√t) for both the estimates of the minimal risks R_i^* and the saddle-point optimization error, in the smooth and non-smooth settings alike. This provides the first zeroth-order guarantees for MERO and demonstrates the practical viability of gradient-free approaches in distributionally robust minimax settings, with potential extensions to nonconvex regimes and broader stochastic saddle-point problems.
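
For reference, the reformulation mentioned above can be written out as follows. This is a sketch under the usual MERO setup: $R_i(\mathbf{w})$ is taken to be the expected loss under the $i$-th distribution, $\mathcal{W}$ the feasible set, and $\Delta_m$ the probability simplex over the $m$ distributions. The symbol $\Delta_m$ and the explicit definition of $R_i^*$ do not appear in this summary and are assumptions here.

```latex
\min_{\mathbf{w} \in \mathcal{W}} \max_{i \in [m]} \left[ R_i(\mathbf{w}) - R_i^* \right]
= \min_{\mathbf{w} \in \mathcal{W}} \max_{\mathbf{q} \in \Delta_m} \phi(\mathbf{w}, \mathbf{q}),
\qquad
\phi(\mathbf{w}, \mathbf{q}) = \sum_{i=1}^{m} q_i \left[ R_i(\mathbf{w}) - R_i^* \right],
\qquad
R_i^* = \min_{\mathbf{w} \in \mathcal{W}} R_i(\mathbf{w}).
```

The equality holds because $\phi$ is linear in $\mathbf{q}$, so its maximum over the simplex is attained at a vertex, that is, at a single index $i$. When each $R_i$ is convex, $\phi$ is convex in $\mathbf{w}$ and linear (hence concave) in $\mathbf{q}$, which is the convex-concave structure the algorithm relies on.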

Abstract

The minimax excess risk optimization (MERO) problem is a new variation of the traditional distributionally robust optimization (DRO) problem, which achieves uniformly low regret across all test distributions under suitable conditions. In this paper, we propose a zeroth-order stochastic mirror descent (ZO-SMD) algorithm, applicable to both smooth and non-smooth MERO, to estimate the minimal risk of each distribution and then solve MERO as a (non-)smooth stochastic convex-concave (linear) minimax optimization problem. The proposed algorithm is proven to converge at the optimal rates of $\mathcal{O}\left(1/\sqrt{t}\right)$ on the estimate of $R_i^*$ and $\mathcal{O}\left(1/\sqrt{t}\right)$ on the optimization error of both smooth and non-smooth MERO. Numerical results show the efficiency of the proposed algorithm.
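
To make the idea concrete, here is a minimal sketch of a zeroth-order stochastic mirror descent loop on a toy problem. It is not the paper's algorithm: it uses a generic two-point estimator with directions drawn uniformly from the unit sphere (which may differ from the UniGE estimator), a Euclidean mirror map for $\mathbf{w}$, an entropy mirror map for $\mathbf{q}$, and step sizes of $1/\sqrt{T}$. The names `two_point_grad`, `zo_smd`, and `sq_loss`, the toy data, and the assumption that the minimal risks `r_star` are already known (the paper estimates them first) are all hypothetical choices made for illustration.

```python
import numpy as np

def two_point_grad(loss, w, z, mu, rng):
    """Two-point zeroth-order estimate of the gradient of loss(., z) at w."""
    u = rng.standard_normal(w.shape)
    u /= np.linalg.norm(u)  # direction uniform on the unit sphere
    return w.size * (loss(w + mu * u, z) - loss(w, z)) / mu * u

def zo_smd(loss, samplers, r_star, w0, radius, T, mu=1e-3, seed=0):
    """Gradient-free mirror descent-ascent on phi(w, q) = sum_i q_i [R_i(w) - R_i^*].

    Assumption: r_star holds (estimates of) the minimal risks R_i^*.
    Returns the averaged iterates (w_bar, q_bar).
    """
    rng = np.random.default_rng(seed)
    m = len(samplers)
    w = np.asarray(w0, dtype=float).copy()
    q = np.full(m, 1.0 / m)
    eta_w = eta_q = 1.0 / np.sqrt(T)  # assumed step sizes, not tuned
    w_sum, q_sum = np.zeros_like(w), np.zeros(m)
    for _ in range(T):
        w_sum += w
        q_sum += q
        zs = [draw(rng) for draw in samplers]  # one sample per distribution
        # Stochastic (estimated) partial gradients of phi at the current (w, q).
        g_w = sum(q[i] * two_point_grad(loss, w, zs[i], mu, rng) for i in range(m))
        g_q = np.array([loss(w, zs[i]) - r_star[i] for i in range(m)])
        # Descent in w: Euclidean mirror map, i.e. projection onto a ball.
        w = w - eta_w * g_w
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm
        # Ascent in q: entropy mirror map, i.e. multiplicative weights.
        q = q * np.exp(eta_q * g_q)
        q /= q.sum()
    return w_sum / T, q_sum / T

# Toy usage: two Gaussian distributions with squared loss.
def sq_loss(w, z):
    return 0.5 * np.sum((w - z) ** 2)

sigma = 0.1
centers = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
samplers = [lambda rng, c=c: c + sigma * rng.standard_normal(2) for c in centers]
r_star = np.full(2, 0.5 * 2 * sigma**2)  # exact minimal risks for this toy setup
w_bar, q_bar = zo_smd(sq_loss, samplers, r_star, w0=[0.5, 0.5], radius=2.0, T=2000)
```

In this toy setup the two excess risks are balanced at the midpoint of the two centers, so `w_bar` should land near the origin and `q_bar` near the uniform weights. Only function values of `sq_loss` are queried; no gradient is ever computed analytically.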

Paper Structure (13 sections, 7 theorems, 131 equations, 2 algorithms)


Key Result

Lemma 1

If Assumptions 3 and 4 hold, for all $i \in [m]$ and any $\mathbf{w} \in \mathcal{W}$, we have where $v(\mu,\mathbf{w}) \in \mathbb{R}^d$ is an error vector with $\left\|v(\mu,\mathbf{w})\right\|_{w,*} \leq \tau_2$, and

Theorems & Definitions (7)

  • Lemma 1
  • Theorem 2
  • Theorem 3
  • Lemma 4
  • Lemma 5
  • Theorem 6
  • Theorem 7