Towards Sharper Risk Bounds for Minimax Problems

Bowei Zhu, Shaojie Li, Yong Liu

TL;DR

This work derives sharper risk bounds for minimax problems through a uniform localized convergence framework that measures generalization via the gradients of primal functions. By replacing Lipschitz requirements with Bernstein conditions and using generic chaining, the authors obtain high-probability gradient generalization bounds that become dimension-free when the outer layer satisfies a Polyak-Lojasiewicz (PL) condition. They show $O(1/n^2)$ excess primal risk under additional assumptions and extend the results to ESP, GDA, and SGDA, with explicit bounds in both the high-probability and expectation settings. The approach yields improved sample efficiency and sharper theoretical guarantees for adversarial training, robust optimization, and related minimax problems in machine learning.

Abstract

Minimax problems have achieved success in machine learning applications such as adversarial training, robust optimization, and reinforcement learning. For theoretical analysis, the current optimal excess risk bounds, which are composed of generalization error and optimization error, exhibit 1/n rates in strongly-convex-strongly-concave (SC-SC) settings. Existing studies mainly focus on the optimization error of specific algorithms for minimax problems, with only a few studies on generalization performance, which limits progress toward better excess risk bounds. In this paper, we study generalization bounds measured by the gradients of primal functions using uniform localized convergence. We obtain a sharper high-probability generalization error bound for nonconvex-strongly-concave (NC-SC) stochastic minimax problems. Furthermore, we provide dimension-independent results under the Polyak-Lojasiewicz condition for the outer layer. Based on our generalization error bound, we analyze some popular algorithms such as empirical saddle point (ESP), gradient descent ascent (GDA), and stochastic gradient descent ascent (SGDA). We derive better excess primal risk bounds under further reasonable assumptions, which, to the best of our knowledge, are n times faster than existing results for minimax problems.
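To make the algorithms named in the abstract concrete, the following is a minimal sketch of GDA and SGDA on a toy problem. The loss $f(x, y; z) = \tfrac{1}{2}(x - z)^2 + xy - \tfrac{1}{2}y^2$, the step sizes, and the function names are illustrative assumptions, not taken from the paper. The toy loss is strongly concave in $y$, so the empirical primal function is $\tfrac{1}{2}\,\mathrm{mean}_i (x - z_i)^2 + \tfrac{1}{2}x^2$, and the empirical saddle point (the target that ESP solves for) is $x = y = \bar{z}/2$.

```python
import numpy as np

# Toy loss (an assumption for illustration, not the paper's setting):
#   f(x, y; z) = 0.5*(x - z)**2 + x*y - 0.5*y**2
# It is strongly concave in y, and the empirical saddle point is
# x = y = mean(z) / 2.

def grad_x(x, y, z):
    """Partial derivative in x, averaged over the samples in z."""
    return np.mean(x - z) + y

def grad_y(x, y):
    """Partial derivative in y (the toy loss has no z-dependence in y)."""
    return x - y

def gda(z, eta=0.1, steps=500):
    """Gradient descent ascent with simultaneous full-batch updates."""
    x, y = 0.0, 0.0
    for _ in range(steps):
        gx, gy = grad_x(x, y, z), grad_y(x, y)
        x, y = x - eta * gx, y + eta * gy  # descend in x, ascend in y
    return x, y

def sgda(z, steps=20000, rng=None):
    """Stochastic GDA: one uniformly sampled z_i per step, decaying step size."""
    rng = np.random.default_rng(0) if rng is None else rng
    x, y = 0.0, 0.0
    for t in range(steps):
        zi = z[rng.integers(len(z))]  # draw one training sample
        eta = 1.0 / (t + 10)          # hypothetical step-size schedule
        gx, gy = grad_x(x, y, zi), grad_y(x, y)
        x, y = x - eta * gx, y + eta * gy
    return x, y

if __name__ == "__main__":
    data = np.random.default_rng(1).normal(loc=1.0, scale=1.0, size=200)
    print("empirical saddle point:", data.mean() / 2)
    print("GDA: ", gda(data))
    print("SGDA:", sgda(data))
```

On this toy problem, full-batch GDA contracts linearly to the empirical saddle point for any step size in $(0, 1)$, while SGDA only approaches it up to sampling noise that shrinks as the step size decays. The gap between such an algorithm's output and the best achievable point is the optimization-error component of excess risk that the abstract refers to.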

Paper Structure

This paper contains 20 sections, 35 theorems, 195 equations, 1 figure, 1 table, 3 algorithms.

Key Result

Theorem 1

Under the NC-SC assumption and the minimax Bernstein condition, for any $\delta \in (0,1)$, with probability at least $1-\delta$, it holds for all $\mathbf{x} \in \mathcal{X}$ that […], where $C$ is an absolute constant.

Theorems & Definitions (75)

  • Definition 1: Primal (empirical/population) function
  • Definition 2: Strongly convex function
  • Definition 3: Smooth function
  • Definition 4: Bernstein condition
  • Remark 1
  • Remark 2
  • Theorem 1
  • Theorem 2: Theorem in zhang2022uniform
  • Remark 3
  • Remark 4
  • ...and 65 more
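
For orientation, the first few definitions listed above and the PL condition mentioned in the abstract are commonly written as follows. These are the standard textbook forms, given here as an assumption; the paper's exact statements, constants, and notation (for example $F$, $F_S$, $\Phi$, $\Phi_S$, $\mu$, $L$) may differ.

```latex
% Population and empirical objectives, and their primal functions
F(\mathbf{x}, \mathbf{y}) = \mathbb{E}_{z}\big[ f(\mathbf{x}, \mathbf{y}; z) \big],
\qquad
F_S(\mathbf{x}, \mathbf{y}) = \frac{1}{n} \sum_{i=1}^{n} f(\mathbf{x}, \mathbf{y}; z_i),
\\
\Phi(\mathbf{x}) = \max_{\mathbf{y} \in \mathcal{Y}} F(\mathbf{x}, \mathbf{y}),
\qquad
\Phi_S(\mathbf{x}) = \max_{\mathbf{y} \in \mathcal{Y}} F_S(\mathbf{x}, \mathbf{y}).

% \mu-strong convexity of a differentiable function g
g(\mathbf{u}) \ge g(\mathbf{v})
  + \langle \nabla g(\mathbf{v}), \mathbf{u} - \mathbf{v} \rangle
  + \frac{\mu}{2} \| \mathbf{u} - \mathbf{v} \|^2
\quad \text{for all } \mathbf{u}, \mathbf{v}.

% L-smoothness of g
\| \nabla g(\mathbf{u}) - \nabla g(\mathbf{v}) \| \le L \, \| \mathbf{u} - \mathbf{v} \|
\quad \text{for all } \mathbf{u}, \mathbf{v}.

% Polyak-Lojasiewicz (PL) condition on the primal function
\| \nabla \Phi(\mathbf{x}) \|^2 \ge 2 \mu \Big( \Phi(\mathbf{x}) - \min_{\mathbf{x}' \in \mathcal{X}} \Phi(\mathbf{x}') \Big)
\quad \text{for all } \mathbf{x} \in \mathcal{X}.
```

A function $f$ is strongly concave in $\mathbf{y}$ when $-f$ is strongly convex in $\mathbf{y}$, which is the "SC" half of the NC-SC setting.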