Zeroth-Order Stochastic Mirror Descent Algorithms for Minimax Excess Risk Optimization

Zhihao Gu; Zi Xu

TL;DR

This work tackles minimax excess risk optimization (MERO), a distributionally robust problem, by reformulating it as a stochastic convex-concave saddle-point problem with objective φ(w,q) = ∑ q_i [R_i(w) - R_i^*]. It introduces a zeroth-order stochastic mirror descent (ZO-SMD) algorithm that uses UniGE-based gradient estimators, with variants for smooth and non-smooth losses, enabling gradient-free optimization across m distributions. The authors prove optimal convergence rates of O(1/√t) for the estimates of the minimal risks R_i^* and O(1/√t) for the saddle-point optimization error, in both the smooth and non-smooth settings. The summary presents these as the first zeroth-order guarantees for MERO and as evidence that gradient-free approaches are practically viable in distributionally robust minimax settings, with potential extensions to nonconvex regimes and broader stochastic saddle-point problems.
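
For orientation, the saddle-point form referred to above can be written out as follows. This is a sketch built from the objective φ quoted in the summary. The simplex symbol $\Delta_m$, the loss $\ell$, the distributions $\mathcal{P}_i$, and the reading of $R_i$ as an expected loss are notational assumptions added here, not taken from the paper.

$$
\min_{\mathbf{w} \in \mathcal{W}} \; \max_{\mathbf{q} \in \Delta_m} \; \phi(\mathbf{w},\mathbf{q}) = \sum_{i=1}^{m} q_i \left[ R_i(\mathbf{w}) - R_i^* \right], \qquad R_i(\mathbf{w}) = \mathbb{E}_{\mathbf{z} \sim \mathcal{P}_i}\left[ \ell(\mathbf{w};\mathbf{z}) \right], \qquad R_i^* = \min_{\mathbf{w} \in \mathcal{W}} R_i(\mathbf{w}).
$$

Because φ is linear in $\mathbf{q}$, the inner maximum over the simplex equals $\max_{i \in [m]} \left[ R_i(\mathbf{w}) - R_i^* \right]$, the worst-case excess risk over the $m$ distributions.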

Abstract

The minimax excess risk optimization (MERO) problem is a new variation of the traditional distributionally robust optimization (DRO) problem, which achieves uniformly low regret across all test distributions under suitable conditions. In this paper, we propose a zeroth-order stochastic mirror descent (ZO-SMD) algorithm, applicable to both smooth and non-smooth MERO, that first estimates the minimal risk of each distribution and then solves MERO as a (non-)smooth stochastic convex-concave (linear) minimax optimization problem. The proposed algorithm is proved to converge at optimal rates of $\mathcal{O}\left(1/\sqrt{t}\right)$ on the estimate of $R_i^*$ and $\mathcal{O}\left(1/\sqrt{t}\right)$ on the optimization error of both smooth and non-smooth MERO. Numerical results show the efficiency of the proposed algorithm.
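
To make the idea of a gradient-free mirror descent step concrete, here is a minimal Python sketch on a toy problem. It is not the authors' algorithm. It assumes a generic two-point estimator with a uniform random direction on the unit sphere, a Euclidean mirror map for $\mathbf{w}$ (projection onto a ball), an entropic mirror map for $\mathbf{q}$ (exponentiated gradient), and minimal risks `r_star` that are already known. The names `zo_grad`, `zo_smd_sketch`, and all parameter values are hypothetical.

```python
import numpy as np

def zo_grad(f, w, mu, rng):
    """Two-point zeroth-order gradient estimate along a uniform unit direction (generic estimator, assumed)."""
    d = w.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # uniform direction on the unit sphere
    return d * (f(w + mu * u) - f(w)) / mu * u

def zo_smd_sketch(loss, samplers, r_star, w0, steps=4000, eta_w=0.05,
                  eta_q=0.05, mu=1e-4, radius=2.0, seed=0):
    """Illustrative zeroth-order mirror descent-ascent; returns averaged iterates."""
    rng = np.random.default_rng(seed)
    m = len(samplers)
    w = np.asarray(w0, dtype=float).copy()
    q = np.full(m, 1.0 / m)
    w_sum, q_sum = np.zeros_like(w), np.zeros(m)
    for _ in range(steps):
        w_sum += w
        q_sum += q
        zs = [draw(rng) for draw in samplers]  # one sample per distribution
        # Estimated gradient in w: q-weighted sum of per-distribution estimates.
        g_w = sum(q[i] * zo_grad(lambda v, z=zs[i]: loss(v, z), w, mu, rng)
                  for i in range(m))
        # Stochastic gradient in q: sampled loss minus the (assumed known) minimal risk.
        g_q = np.array([loss(w, zs[i]) - r_star[i] for i in range(m)])
        # Euclidean mirror step for w, then projection onto a ball of the given radius.
        w = w - eta_w * g_w
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm
        # Entropic mirror step for q (exponentiated gradient ascent on the simplex).
        q = q * np.exp(eta_q * g_q)
        q /= q.sum()
    return w_sum / steps, q_sum / steps

# Toy problem (assumed): two Gaussian distributions with squared-distance loss.
centers = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
sigma = 0.1
samplers = [lambda rng, c=c: c + sigma * rng.standard_normal(2) for c in centers]
loss = lambda w, z: 0.5 * np.sum((w - z) ** 2)
r_star = [0.5 * 2 * sigma ** 2] * 2  # R_i^* = d * sigma^2 / 2 for this toy loss
w_bar, q_bar = zo_smd_sketch(loss, samplers, r_star, w0=[0.8, 0.5])
```

In this toy setting the excess risk of distribution $i$ is $\tfrac{1}{2}\|\mathbf{w} - \mathbf{c}_i\|^2$, so the minimax solution is the midpoint of the two centers, and the averaged iterate `w_bar` should land near the origin. The paper estimates each $R_i^*$ with a separate zeroth-order procedure before solving the saddle-point problem; the sketch skips that stage by plugging in the closed-form value.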

Paper Structure (13 sections, 7 theorems, 131 equations, 2 algorithms)


Introduction
Related Works
Preliminaries
Notations
Zeroth-Order Gradient Estimator
Zeroth-Order Stochastic Mirror Descent Algorithm
ZO-SMD for Smooth MERO
Technical Preparation
Complexity Analysis
ZO-SMD for Non-smooth MERO
Technical Preparation
Complexity Analysis
Conclusions and Future Work

Key Result

Lemma 1

If Assumptions 3 and 4 hold, for all $i \in [m]$ and any $\mathbf{w} \in \mathcal{W}$, we have where $v(\mu,\mathbf{w}) \in \mathbb{R}^d$ is an error vector with $\left\|v(\mu,\mathbf{w})\right\|_{w,*} \leq \tau_2$, and

Theorems & Definitions (7)

Lemma 1
Theorem 2
Theorem 3
Lemma 4
Lemma 5
Theorem 6
Theorem 7
