Decentralized Online Learning in General-Sum Stackelberg Games

Yaolong Yu; Haipeng Chen

TL;DR

This work investigates online, decentralized learning in repeated general-sum Stackelberg games with a leader–follower structure, under two information regimes for the follower: a myopic follower with limited information, and a manipulative follower with side information (omniscient or noisy). It develops learning algorithms for both players and proves last-iterate convergence and sample complexity bounds in each regime. The follower's manipulation strategies, FBM (follower's best manipulation) and FMUCB, show that a follower with information about the leader's rewards has an intrinsic advantage: in online settings, manipulation can outperform best-response dynamics. Empirical results on synthetic games corroborate the theoretical guarantees, showing convergence and measurable gains for the follower under the manipulation strategies. Overall, the paper clarifies how information asymmetry and strategic behavior shape learning dynamics in Stackelberg games and provides online learning algorithms for decentralized settings.

Abstract

We study an online learning problem in general-sum Stackelberg games, where players act in a decentralized and strategic manner. We consider two settings, distinguished by the information available to the follower: (1) the limited information setting, where the follower observes only its own reward, and (2) the side information setting, where the follower has extra side information about the leader's reward. We show that myopically best responding to the leader's action is the follower's best strategy in the limited information setting, but not necessarily in the side information setting: there, the follower can manipulate the leader's reward signals through strategic actions, and hence induce the leader's strategy to converge to an equilibrium that is more favorable to the follower. Based on these insights, we study decentralized online learning for both players in the two settings. Our main contribution is to derive last-iterate convergence and sample complexity results in both settings. Notably, we design a new manipulation strategy for the follower in the side information setting and show that it has an intrinsic advantage over the best response strategy. Our theoretical results are also supported by empirical results.
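
The contrast between best response and manipulation can be illustrated with a small sketch. The 2x2 reward tables below are hypothetical numbers chosen for illustration, and the brute-force search over response maps is a stand-in for the idea of manipulation, not the paper's FBM or FMUCB algorithm. It assumes an idealized leader that already knows its expected reward for each action under the follower's response map and plays the action that maximizes it.

```python
from itertools import product

# Hypothetical reward tables (assumption, not from the paper).
# Rows: leader action a. Columns: follower action b.
LEADER_REWARD = [[0.2, 0.8], [0.6, 0.1]]
FOLLOWER_REWARD = [[1.0, 0.9], [0.5, 0.2]]


def leader_choice(response):
    """Leader action maximizing its own reward, given the follower's response map.

    response[a] is the follower action played whenever the leader plays a.
    """
    return max(range(len(LEADER_REWARD)),
               key=lambda a: LEADER_REWARD[a][response[a]])


def follower_payoff(response):
    """Follower reward at the action the leader ends up choosing."""
    a = leader_choice(response)
    return FOLLOWER_REWARD[a][response[a]]


def best_response():
    """Myopic map: for each leader action, maximize the follower's own reward."""
    return tuple(max(range(len(row)), key=lambda b: row[b])
                 for row in FOLLOWER_REWARD)


def best_manipulation():
    """Brute force over all response maps; keep the one best for the follower."""
    n_a, n_b = len(FOLLOWER_REWARD), len(FOLLOWER_REWARD[0])
    return max(product(range(n_b), repeat=n_a), key=follower_payoff)


if __name__ == "__main__":
    for name, response in [("best response", best_response()),
                           ("manipulation", best_manipulation())]:
        print(name, response, "leader plays", leader_choice(response),
              "follower gets", follower_payoff(response))
```

In this toy game, the myopic map leads the leader to its second action, where the follower earns 0.5. The manipulating follower instead answers the leader's second action with a reply that is worse for both players, which makes the leader's first action look better to the leader and raises the follower's payoff to 1.0. Choosing that reply requires knowing the leader's rewards, which is why manipulation belongs to the side information setting.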

Paper Structure (25 sections, 15 theorems, 71 equations, 3 figures, 2 tables, 3 algorithms)

Introduction
Related Work
Preliminary
A myopic follower with limited information
Algorithm for the myopic follower
Last-iterate convergence of EXP3-UCB
Last-iterate convergence of UCBE-UCB
A manipulative and omniscient follower
Follower's best manipulation (FBM)
Manipulative follower with noisy side information
Follower's manipulation strategy
Last iterate convergence analysis
Empirical Results
Conclusion
Acknowledgment
...and 10 more sections

Key Result

Proposition 1

In a repeated general-sum Stackelberg game, if the follower uses UCB as the learning algorithm, with probability at least $1-\delta$, the follower's regret is bounded as:

Figures (3)

Figure 1: Limited information
Figure 2: Omniscient follower
Figure 3: Noisy side information

Theorems & Definitions (20)

Proposition 1
Theorem 1
Theorem 2
Theorem 3
Proposition 2
Proposition 3
Theorem 4: Last iterate convergence of EXP3-FMUCB with noisy side information
Theorem 5: Last iterate convergence of UCBE-FMUCB with noisy side information
Theorem: Last iterate convergence of EXP3-UCB under limited information
Lemma 1
...and 10 more
