Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback

Michael I. Jordan; Tianyi Lin; Zhengyuan Zhou

TL;DR

The paper develops parameter-free, adaptive online learning algorithms for both single-agent and multi-agent gradient-feedback settings. AdaOGD achieves near-optimal regret $O(\log^2(T))$ in the strongly convex single-agent case and, when used by all players in strongly monotone games, yields near-optimal last-iterate convergence to a unique Nash equilibrium at a rate of $O\big(\frac{\log^3(T)}{T}\big)$. Extending to exp-concave (EC) costs and games, AdaONS delivers near-optimal regret $O(d\log^2(T))$ in the single-agent EC setting, and MA-AdaONS attains $O\big(\frac{d\log^2(T)}{T}\big)$ convergence in EC games. All of these guarantees hold without prior knowledge of the problem parameters, thanks to a unified randomization mechanism based on geometric random variables. The framework combines no-regret optimization with last-iterate convergence to a Nash equilibrium (NE) in a decentralized, parameter-free fashion, and applies to practical problems such as the newsvendor problem with lost sales and power management. Overall, the results advance the design of feasible, doubly optimal online learning algorithms that adapt to problem curvature without requiring problem-parameter estimation.

Abstract

Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions: (1) in the single-agent setting, it achieves an optimal regret of $Θ(\log T)$ for strongly convex cost functions; and (2) in the multi-agent setting of strongly monotone games, with each agent employing OGD, we obtain last-iterate convergence of the joint action to a unique Nash equilibrium at an optimal rate of $Θ(\frac{1}{T})$. While these finite-time guarantees highlight its merits, OGD has the drawback that it requires knowing the strong convexity/monotonicity parameters. In this paper, we design a fully adaptive OGD algorithm, \textsf{AdaOGD}, that does not require a priori knowledge of these parameters. In the single-agent setting, our algorithm achieves $O(\log^2(T))$ regret under strong convexity, which is optimal up to a log factor. Further, if each agent employs \textsf{AdaOGD} in strongly monotone games, the joint action converges in a last-iterate sense to a unique Nash equilibrium at a rate of $O(\frac{\log^3 T}{T})$, again optimal up to log factors. We illustrate our algorithms in a learning version of the classical newsvendor problem, where due to lost sales, only (noisy) gradient feedback can be observed. Our results immediately yield the first feasible and near-optimal algorithm for both the single-retailer and multi-retailer settings. We also extend our results to the more general setting of exp-concave cost functions and games, using the online Newton step (ONS) algorithm.
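To make the single-agent baseline concrete, the following is a minimal sketch of classical OGD with the step size $\eta_t = 1/(\mu t)$, which requires the strong convexity parameter $\mu$ to be known. It is not the paper's \textsf{AdaOGD}. The quadratic costs, the interval $[-1, 1]$, the deterministic target sequence, and all function names are illustrative assumptions. The sketch compares the realized regret with the standard $\frac{G^2}{2\mu}(1 + \log T)$ bound for this step size.

```python
import math

def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval [lo, hi]."""
    return min(max(x, lo), hi)

def ogd_regret(T, mu=2.0):
    """Run OGD with step 1/(mu*t) on f_t(x) = 0.5*mu*(x - a_t)^2 and return the regret.

    The targets a_t are a hypothetical deterministic sequence in [-0.5, 0.5].
    """
    targets = [0.5 * math.sin(t) for t in range(1, T + 1)]
    cost = lambda x, a: 0.5 * mu * (x - a) ** 2

    x = 0.0
    plays = []
    for t, a in enumerate(targets, start=1):
        plays.append(x)                    # commit to x_t before seeing f_t
        grad = mu * (x - a)                # gradient feedback at the played point
        x = project(x - grad / (mu * t))   # step size eta_t = 1 / (mu * t)

    # Best fixed action in hindsight: the (projected) mean of the targets.
    best = project(sum(targets) / T)
    incurred = sum(cost(p, a) for p, a in zip(plays, targets))
    benchmark = sum(cost(best, a) for a in targets)
    return incurred - benchmark

def regret_bound(T, mu=2.0):
    """Standard bound G^2 / (2*mu) * (1 + log T) for strongly convex OGD."""
    G = mu * 1.5  # |f_t'(x)| <= mu * (1 + 0.5) on [-1, 1] with targets in [-0.5, 0.5]
    return G ** 2 / (2 * mu) * (1 + math.log(T))

if __name__ == "__main__":
    for T in (10, 100, 1000):
        print(T, ogd_regret(T), regret_bound(T))
```

The step size is the only place where $\mu$ enters, which is exactly the dependence that the adaptive algorithm described in the abstract is designed to remove.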

Paper Structure (27 sections, 13 theorems, 130 equations, 4 algorithms)

Introduction
Related Work
Our Contributions
Feasible Multi-Agent Online Learning in Strongly Monotone Games
Basic Definitions and Notations
Algorithmic Scheme
Finite-Time Last-Iterate Convergence Guarantee
Proof of Theorem \ref{Thm:AdaOGD-rate}
Applications: Feasible Multi-Agent Learning for Power Management and Newsvendors with Lost Sales
Extensions to Exp-Concave Cost Functions and Games
Single-Agent Learning with Exp-Concave Cost
Feasible Single-Agent Online Learning under Exp-Concave Cost
Exp-Concave (EC) Games
Multi-Agent Online Learning in EC Games
Feasible Multi-Agent Online Learning in EC Games
...and 12 more sections

Key Result

Proposition 2.4

If all cost functions in a continuous game $\mathcal{G}$ are individually convex, then the joint action $x^\star \in \mathcal{X}$ is a Nash equilibrium if and only if $(x - x^\star)^\top v(x^\star) \geq 0$ for all $x \in \mathcal{X}$.
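The variational characterization can be checked numerically on a small example. The sketch below assumes that $v(x)$ stacks each player's gradient of their own cost with respect to their own action, and it uses a hypothetical two-player quadratic game on $[0, 1]^2$ that is strongly monotone with modulus $1.5$. It tests the inequality on a grid and then runs plain OGD for both players with the known-parameter step size $1/(\mu t)$. This is the classical baseline, not the paper's adaptive scheme, and every name below is illustrative.

```python
from itertools import product

MU = 1.5  # strong monotonicity modulus: smallest eigenvalue of [[2, 0.5], [0.5, 2]]

def v(x):
    """Individual cost gradients for the hypothetical game
    u1 = x1^2 + 0.5*x1*x2 - 3*x1 and u2 = x2^2 + 0.5*x1*x2 - x2."""
    x1, x2 = x
    return (2 * x1 + 0.5 * x2 - 3.0, 2 * x2 + 0.5 * x1 - 1.0)

def project(z):
    """Projection of one coordinate onto the action interval [0, 1]."""
    return min(max(z, 0.0), 1.0)

def satisfies_vi(candidate, grid_size=21, tol=1e-12):
    """Check (x - candidate)^T v(candidate) >= 0 for every x on a grid of [0, 1]^2."""
    g = v(candidate)
    pts = [k / (grid_size - 1) for k in range(grid_size)]
    return all(
        (x1 - candidate[0]) * g[0] + (x2 - candidate[1]) * g[1] >= -tol
        for x1, x2 in product(pts, pts)
    )

def run_ogd(T, start=(0.0, 0.0)):
    """Both players take projected gradient steps with step size 1/(MU*t)."""
    x = start
    for t in range(1, T + 1):
        g = v(x)
        step = 1.0 / (MU * t)
        x = (project(x[0] - step * g[0]), project(x[1] - step * g[1]))
    return x

if __name__ == "__main__":
    print(satisfies_vi((1.0, 0.25)))  # boundary equilibrium of this game
    print(satisfies_vi((0.4, 0.4)))   # a non-equilibrium point
    print(run_ogd(2000))              # last iterate of the joint action
```

In this game the equilibrium $(1, 0.25)$ lies on the boundary of the action set, so $v(x^\star) \neq 0$ and the inequality is doing real work: the first player's gradient points outside the feasible set, and no feasible deviation lowers either player's cost.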

Theorems & Definitions (27)

Definition 2.1
Definition 2.2
Definition 2.3
Proposition 2.4
Theorem 2.5
Remark 2.6
Theorem 2.7
Remark 2.8
Remark 2.9: Importance of doubly optimality
Lemma 2.10
...and 17 more
